[I] used [Dedius’s binary coding (with MrBayes)] of the [WALS dataset], together with the software packages [Structure 2.3.4] (fig. 1, see supplementary material) and [StructureHarvester] (fig. 2), to estimate the most likely number (“K”) of founding populations (here, K = 20) needed to explain the structure of the corpus. Depending on the cultural area, the Fst score ranges from 42% to 88%, [which is much higher than the genetic variation among human populations], suggesting that language families, once established, have been very conservative. I then excluded from the corpus every language in which more than 30% of the data reflects an acculturation signal (i.e. less than 70% assigned to a single origin). Because these languages have a complex history, they are expected to be intermediate between many others and to “blur” the phylogenetic signal. Using the reduced corpus, I built many trees and networks (figs. 3–5), which correctly group, for the first time, all languages into known language families and show evidence for some higher-level clusters. The Austronesian language family may not form a monophyletic group.
This idea, which has yet to be tested on larger datasets, may be a major step toward a complete understanding of the historical relationships among the world's languages, and toward answering many other important questions, such as reconstructing the structure of the human proto-language and its evolution.
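
The exclusion step described above can be sketched roughly as follows. This is only a minimal illustration, not the script actually used: it assumes the Structure membership coefficients have already been parsed into a dictionary mapping each language to its list of K coefficients. The function name `filter_single_origin`, the language labels and the numbers are all hypothetical.

```python
from typing import Dict, List


def filter_single_origin(
    membership: Dict[str, List[float]], threshold: float = 0.70
) -> Dict[str, int]:
    """Keep languages whose largest membership coefficient is >= threshold.

    Returns a mapping {language: index of its dominant cluster}.
    Languages below the threshold (more than 30% mixed signal) are dropped.
    """
    kept = {}
    for language, coefficients in membership.items():
        # Index of the cluster with the highest coefficient for this language.
        dominant = max(range(len(coefficients)), key=coefficients.__getitem__)
        if coefficients[dominant] >= threshold:
            kept[language] = dominant
    return kept


# Made-up coefficients for illustration only (K = 3 here, not K = 20).
example = {
    "lang_a": [0.95, 0.03, 0.02],  # clearly single-origin: kept
    "lang_b": [0.40, 0.35, 0.25],  # strongly mixed: excluded
    "lang_c": [0.10, 0.72, 0.18],  # just above the 70% cut-off: kept
}

kept = filter_single_origin(example)
print(kept)  # {'lang_a': 0, 'lang_c': 1}
```

Only the languages that pass this filter would then be used to build the trees and networks.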

These results should be compared with [Greenhill, Simon J., et al. "The shape and tempo of language evolution." Proceedings of the Royal Society of London B: Biological Sciences 277.1693 (2010): 2443-2450.]

The cluster grouping Austronesian_iaa, Afro_Asiatic_heb, Indo_European_iri and Indo_European_bre remains a problem, which a larger and more complex database might resolve.

See also [another attempt to reconstruct a global tree], although in some cases that tree apparently fails to produce acceptable classifications (however, see [here]).

