The Indo-European Cognate Relationships (IE-CoR) dataset is an open-access relational dataset showing how related, inherited words (‘cognates’) pattern across 160 languages of the Indo-European family. IE-CoR is intended as a benchmark dataset for computational research into the evolution of the Indo-European languages. It is structured around 170 reference meanings in core lexicon, and contains 25731 lexeme entries, analysed into 4981 cognate sets. Novel, dedicated structures are used to code all known cases of horizontal transfer. All 13 main documented clades of Indo-European, and their main subclades, are well represented. Time calibration data for each language are also included, as are relevant geographical and social metadata. Data collection was performed by an expert consortium of 89 linguists drawing on 355 cited sources. The dataset is extendable to further languages and meanings and follows the Cross-Linguistic Data Format (CLDF) protocols for linguistic data. It is designed to be interoperable with other cross-linguistic datasets and catalogues, and provides a reference framework for similar initiatives for other language families.
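As a rough illustration of the structure described above (lexeme entries grouped into cognate sets and published in CLDF), the following minimal sketch groups a few forms by cognate set. The rows are hypothetical toy data, not taken from IE-CoR, and the column names (`ID`, `Language_ID`, `Parameter_ID`, `Form`, `Form_ID`, `Cognateset_ID`) are assumed to follow the usual CLDF wordlist conventions; the actual IE-CoR tables may contain further columns.

```python
import csv
import io
from collections import defaultdict

# Hypothetical miniature tables for illustration only (NOT IE-CoR data).
# Column names are assumed to follow common CLDF wordlist conventions.
FORMS_CSV = """ID,Language_ID,Parameter_ID,Form
f1,english,water,water
f2,german,water,Wasser
f3,french,water,eau
f4,spanish,water,agua
"""

COGNATES_CSV = """ID,Form_ID,Cognateset_ID
c1,f1,set1
c2,f2,set1
c3,f3,set2
c4,f4,set2
"""


def cognate_sets(forms_text, cognates_text):
    """Map each cognate set ID to the languages whose forms belong to it."""
    # Index the form rows by their ID so cognate judgements can look them up.
    forms = {row["ID"]: row for row in csv.DictReader(io.StringIO(forms_text))}
    sets = defaultdict(list)
    for row in csv.DictReader(io.StringIO(cognates_text)):
        language = forms[row["Form_ID"]]["Language_ID"]
        sets[row["Cognateset_ID"]].append(language)
    return dict(sets)


sets = cognate_sets(FORMS_CSV, COGNATES_CSV)
for set_id, languages in sorted(sets.items()):
    print(set_id, sorted(languages))
```

The same grouping applies to real CLDF files by reading them from disk instead of from the inline strings.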

The Indo-European Cognate Relationships dataset / Anderson, Cormac; Scarborough, Matthew; Jocz, Lechosław; Kümmel, Martin Joachim; Jügel, Thomas; Irslinger, Britta; Pooth, Roland; Liljegren, Henrik; Strand, Richard F.; Haig, Geoffrey; Geupel, Ulrich; Macak, Martin; Kim, Ronald I.; Anonby, Erik; Pronk, Tijmen; Belyaev, Oleg; Dewey-Findell, Tonya Kim; Boutilier, Matthew; Freiberg, Cassandra; Tegethoff, Robert; Serangeli, Matilde; Stroński, Krzysztof; Falileyev, Alexander; Liosis, Nikos; Schulte, Kim; Gupta, Ganesh Kumar; Izadifar, Raheleh; Markus, Patrycja; Williams, Nicholas; Loi, Simone; Sims-Williams, Nicholas; Findell, Martin; Adibifar, Shirin; Abete, Giovanni; Atanasov, Petar; Baiwir, Esther; Bastardas, Maria-Reina; Benkato, Adam; Bevevino, Lisa Shugert; Buchi, Éva; Cadorini, Giorgio; Cathcart, Chundra; Cheveau, Loïc; Christodoulou, Charalambos; Delorme, Jérémie; Dworkin, Steven N.; Ekici, Deniz; Farridnejad, Shervin; Gheitasi, Mojtaba; Hammarström, Harald; Hewitt, Steve; Khan, Afsar Ali; Khan, Muhammad Kamal; Khokhlova, Liudmila; Kim, Deborah; Lewin, Christopher; Lushaj, Borana; Mahmoudveysi, Parvin; Mahommadirad, Masoud; Mersch, Sam; Mustafa, Baydaa; Nemati, Fatemeh; Nourzaei, Maryam; Muircheartaigh, Peadar Ó; Oogjen, Virginia; Ourang, Muhammed; Pagan, Heather; Palmer, Timothy S.; Pepper, Steve; Purandare, Mandar; Rehman, Khwaja; Rhys, Guto; Røyneland, Unn; Sagar, Muhammad Zaman; Sandstedt, Jade Jørgen; Steensland, Lars; Taheri-Ardali, Mortaza; Talebi-Dastenaei, Mahnaz; Tittel, Sabine; Tresoldi, Tiago; De Vaan, Michiel; Verkerk, Annemarie; Versloot, Arjen; Videsott, Paul; Vuletić, Nikola; Widmer, Manuel; Zeini, Arash; Bibiko, Hans-Jörg; Runge, Fiona; Gray, Russell D.; Heggarty, Paul. - In: SCIENTIFIC DATA. - ISSN 2052-4463. - 12:1(2025), pp. 1-27. [10.1038/s41597-025-05445-3]

The Indo-European Cognate Relationships dataset

Abete, Giovanni
Member of the Collaboration Group
2025


Use this identifier to cite or link to this document: https://hdl.handle.net/11588/1046014