How to Build a Global Tree of All Known Languages? - A Brief Demonstration.
By Julien d'Huy
[I] (https://unive-paris1.academia.edu/JuliendHuy) used [Dediu’s binary coding (with MrBayes)] (http://rspb.royalsocietypublishing.org/content/royprsb/suppl/2010/08/27/rspb.2010.1595.DC1/rspb20101595supp1.pdf) of the [WALS dataset] (http://wals.info/) together with the software packages [Structure 2.3.4] (http://pritchardlab.stanford.edu/structure.html) (fig.1, see supplementary material) and [StructureHarvester] (http://taylor0.biology.ucla.edu/structureHarvester/) (fig.2) to estimate the most likely number (“K”) of founding populations needed to explain the structure of the corpus (here, K = 20). For each cultural area, the Fst score ranges from 42% to 88%, [which is much higher than the genetic variation among human populations] (http://www.pnas.org/content/106/42/17671.short); this suggests that language families, once established, have been very conservative. I then excluded from the corpus every language in which more than 30% of the data reflect an acculturation signal (i.e. languages with less than 70% single-origin ancestry). Because these languages have a complex history, they are expected to fall between many others and to “blur” the phylogenetic signal. Using the filtered corpus, I built many trees and networks (fig.3-5), which correctly group, for the first time, all languages into known language families and show evidence for some higher-level clusters. The Austronesian language family may not form a monophyletic group.
This approach, which has yet to be tested on larger datasets, may be a major step toward a complete understanding of the historical relationships among the world's languages and toward many other important questions, such as reconstructing the structure of the human proto-language and its evolution.
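The filtering step described above (dropping languages with less than 70% single-origin ancestry) can be illustrated with a minimal Python sketch. It assumes that each language comes with a list of K membership coefficients summing to 1, as Structure reports for each individual. The function name `filter_single_origin`, the dictionary layout and the toy values are hypothetical and are not taken from the study or its supplementary material.

```python
from typing import Dict, List

# Threshold taken from the text: a language is kept only if at least 70%
# of its ancestry is assigned to one single cluster.
SINGLE_ORIGIN_THRESHOLD = 0.70


def filter_single_origin(
    memberships: Dict[str, List[float]],
    threshold: float = SINGLE_ORIGIN_THRESHOLD,
) -> Dict[str, int]:
    """Return {language: index of its dominant cluster} for kept languages.

    `memberships` maps a language name to its K membership coefficients
    (assumed layout; one coefficient per inferred founding population).
    """
    kept = {}
    for language, q in memberships.items():
        # Index of the cluster with the largest membership coefficient.
        dominant = max(range(len(q)), key=lambda k: q[k])
        # Languages below the threshold are treated as admixed and dropped.
        if q[dominant] >= threshold:
            kept[language] = dominant
    return kept


# Toy coefficients for K = 3 (invented values, for illustration only).
toy_q = {
    "lang_a": [0.92, 0.05, 0.03],
    "lang_b": [0.40, 0.35, 0.25],  # admixed: no cluster reaches 70%
    "lang_c": [0.10, 0.75, 0.15],
}

kept = filter_single_origin(toy_q)
print(kept)  # {'lang_a': 0, 'lang_c': 1}
```

In this toy example `lang_b` is removed because none of its coefficients reaches 0.70; only the remaining languages would be passed on to tree and network building.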
Attachment: Electronic_supplementary_material.pdf (286 KB)
Published: 21 Sep, 2015
These results can be compared with [Greenhill, Simon J., et al. "The shape and tempo of language evolution." Proceedings of the Royal Society of London B: Biological Sciences 277.1693 (2010): 2443-2450.] (http://rspb.royalsocietypublishing.org/content/277/1693/2443).
Julien d'Huy · 22 Sep, 2015
The cluster grouping Austronesian_iaa, Afro_Asiatic_heb, Indo_European_iri and Indo_European_bre remains a problem, one that a larger and more complex database might eliminate.
Julien d'Huy · 22 Sep, 2015
See also [another attempt to reconstruct a global tree] (http://asjp.clld.org/download), although that tree apparently fails to produce acceptable classifications in some cases (but see [here] (https://www.academia.edu/8716268/Jackknifing_the_black_sheep_ASJP_classification_performance_and_Austronesian)).
Julien d'Huy · 23 Sep, 2015