Category Archives: West Eurasia

Italian haploid genetics (second round)

More than a year ago I commented (as much as I could) on the study of Italian haploid genetics by Francesca Brisighelli et al. Sadly, the study was published with several major errors in the figures, making it impossible to get anything straight.
I know directly from the lead author that the team has been trying ever since to get the paper corrected, but the correction was delayed time and again by the apparent inefficiency of PLoS ONE’s management, much to their frustration. Finally, this week the correction has been published and the figures fixed.
So let’s give this study another chance:
Francesca Brisighelli et al., Uniparental Markers of Contemporary Italian Population Reveals Details on Its Pre-Roman Heritage. PLoS ONE 2012 (formally corrected in February 2014). Open access [doi:10.1371/journal.pone.0050794]
Please notice that you have to read the formal correction in order to access the new figures; the wrong ones are still in the paper itself.
The corrected figures are central to the study:

Figure 1 (corrected). Map showing the location of the samples analyzed in the present study and those collected from the literature (see Table 1). Pie charts on the left display the distribution of mtDNA haplogroup frequencies, and those on the right the Y-chromosome haplogroup frequencies.

So now we know that the Northern mtDNA pie was duplicated in the original graph and that Central Italians stand out for R0(xH,V), which reaches 14% (probably mostly HV*), while they have some other peculiarities relative to their neighbors to the North and South: somewhat less U and no detected V.
Other variations are more clinal: H decreases from North to South while J and T do the opposite.

Figure 3 (corrected). Phylogeny of Y-chromosome SNPs and haplogroup frequencies in different Italian populations.

On the Y-DNA side, the most obvious transition is between the high frequencies of R1b1a2-M269 (R1b3 in the paper) in the North versus much lower frequencies in the South. But also:
  • J2 is notable in the Central region (and also the South) but rare in the North.
  • G frequencies in the South are double those of the Center and North.
  • The same happens with lesser intensity regarding E1b1b1-M35 (E3b in the study).
  • In contrast haplogroup I is most common in the North. However the Sardinian and sub-Pyrenean clade I2a1a-M26 (I1b2 in the paper), which is also the one documented in Chalcolithic Languedoc, is rare in all regions.

The study also deals with several isolated populations:

Figure 4. Haplogroup frequencies of Ladins, Grecani Salentini and Lucera compared to the rest of the Italian populations analyzed in the present study.

All of them show high frequencies of mtDNA H relative to their regions. The Grecani Salentini do have some extra Y-DNA E1b1b1 (E3b) and J2, which may indeed underline their partial Greek origins. The Ladini show unusually high frequencies of R1b*(xR1b1a2) and K*(xR1a,R1b,L,T,N3), while the Lucerans stand out for their percentage of G.
I want to end this entry with a much needed scolding to the staff of PLoS ONE for their totally unacceptable original sloppiness and delay in the correction. And my personal thanks and appreciation to Francesca Brisighelli for her indefatigable persistence and enthusiasm for her work, which is no doubt of great interest.

Ancient European DNA and some debatable conclusions

There is a rather interesting paper, still in preparation, available online and causing some debate.
Iosif Lazaridis, Nick Patterson, Alissa Mittnik, et al., Ancient human genomes suggest three ancestral populations for present-day Europeans. bioRxiv 2013 (preprint). Freely accessible [doi:10.1101/001552]


Analysis of ancient DNA can reveal historical events that are difficult to discern through study of present-day individuals. To investigate European population history around the time of the agricultural transition, we sequenced complete genomes from a ~7,500 year old early farmer from the Linearbandkeramik (LBK) culture from Stuttgart in Germany and an ~8,000 year old hunter-gatherer from the Loschbour rock shelter in Luxembourg. We also generated data from seven ~8,000 year old hunter-gatherers from Motala in Sweden. We compared these genomes and published ancient DNA to new data from 2,196 samples from 185 diverse populations to show that at least three ancestral groups contributed to present-day Europeans. The first are Ancient North Eurasians (ANE), who are more closely related to Upper Paleolithic Siberians than to any present-day population. The second are West European Hunter-Gatherers (WHG), related to the Loschbour individual, who contributed to all Europeans but not to Near Easterners. The third are Early European Farmers (EEF), related to the Stuttgart individual, who were mainly of Near Eastern origin but also harbored WHG-related ancestry. We model the deep relationships of these populations and show that about ~44% of the ancestry of EEF derived from a basal Eurasian lineage that split prior to the separation of other non-Africans.

Haploid DNA
The Loschbour skull. The prominent browridge is very unusual for Paleolithic Europeans.
The new European hunter-gatherer samples all carried Y-DNA I and mtDNA U5 (U5a or U5b) or U2e.
More specifically, the hunter-gatherer mtDNA lineages are:
  • Loschbour (Luxembourg): U5b1a
  • Motala (Sweden):
    • Motala 1 & 3: U5b1a
    • Motala 2 & 12: U2e1
    • Motala 4 & 6: U5a2d
    • Motala 9: U5a2
Additionally the Stuttgart Linear Pottery farmer (female) carried the mtDNA lineage T2c1d1.
The Y-DNA lineages are:
  • Loschbour: I2a1b*(xI2a1b1, I2a1b2, I2a1b3)
  • Motala 2: I*(xI1, I2a2,I2a1b3)
  • Motala 3: I2*(xI2a1a, I2a2, I2b)
  • Motala 6: uncertain (L55+ would make it Q1a2a but L232- forces it out of Q1)
  • Motala 9: I*(xI1)
  • Motala 12: I2a1b*(xI2a1b1, I2a1b3)
These are with certainty the oldest Y-DNA sequences from Europe so far, and the fact that all of them fall within haplogroup I(xI1) supports the notion of this lineage having once been common in the subcontinent, at least in some areas. Today I2 is most common in Sardinia, the NW Balkans (Croatia, Bosnia, Montenegro), North Germany and areas around Moldavia.
I2a1b (which may well include all of them) is currently found (often at high frequencies) in the Balkans and Eastern Europe, with some presence also in the eastern areas of Central Europe. Its relative I2a1a is most common in Sardinia, with some presence in SW Europe, especially around the Pyrenees. I2a1 (probably I2a1a but not tested for the relevant SNPs) was also found, together with G2a, in a Chalcolithic population of the Treilles group (Languedoc) and seems to be somehow associated with the Cardium Pottery Neolithic.
If you want my opinion, I’d think that before the Neolithic I2a was dominant, like mtDNA U5 (and satellites U4 and U2e), in much of Central and Eastern Europe but probably not in SW Europe, where mtDNA U5 does not seem so hyper-dominant either, being instead quite secondary to haplogroup H (at least in Western Iberia). But we’ll have to wait until geneticists manage to sequence Y-DNA from several SW European Paleolithic remains to be sure.

Autosomal DNA and derived speculations
Most of the study (incl. the must-read supplemental materials) deals however with the autosomal DNA of these and other hunter-gatherers, as well as of some Neolithic farmers from Central Europe and Italy (Ötzi) and their comparison with modern Europeans. 
To begin with, they generated a PCA plot of West Eurasians (with way too many pointless Bedouins and Jews, it must be said) and projected the ancient Europeans, as well as a whole bunch of Circum-Pacific peoples on it:
The result is a bit weird because, as you can see, the East Asians, Native Americans and Melanesians appear to fall way too close to the peoples of the Caucasus and Anatolia. This seems to be a distorting effect of the “projection” method, which forces the projected samples to align relative to a set of already defined parameters, in this case the West Eurasian (modern) PCA. 
So the projection basically formulates the question: if East Asians, etc. must forcibly be defined in West Eurasian (WEA) terms, what would they be? And then answers it as follows: Caucasian/Anatolian/Iranian peoples more or less (whatever the hidden reasons, which are not too clear).
Similarly, it is possible (but uncertain) that the ancient European and Siberian sequences show some of this kind of distortion. However I have found experimentally that the PCA’s dimension 1 (but not the dimension 2, which corresponds largely to the Asian-specific distinctions) still correlates quite well with the results of other formal tests that the authors develop in the study and is therefore a valuable tool for visualization.
But more on this later. For the moment, the PCA is asking and answering three or four questions by projecting ancient European and Siberian samples onto the West Eurasian plot:
  • If ancient Siberians are forced to be defined in modern WEA terms, what would they be? Answer: roughly Mordvins (Afontova Gora 2) or intermediate between these and North Caucasus peoples (Mal’ta 1).
  • If ancient Scandinavian hunter-gatherers are forced in modern WEA terms, what would they be? Answer: extreme but closest (Skoglund) to Northern European peoples like Icelanders or Lithuanians.
  • If ancient Western European hunter-gatherers are forced in modern WEA terms, what would they be? Answer: extreme too but closest (La Braña 2) to SW European peoples like Basques and Southern French.
  • If ancient Neolithic/Chalcolithic farmers from around the Alps and Sweden are forced in modern WEA terms, what would they be? Answer: Canarians (next close: Sardinians, then Spaniards).
Whatever the case, there seems to be quite a bit of autosomal diversity among ancient Western hunter-gatherers, at the very least when compared with modern peoples. This makes good sense because Europe was a big place already in Paleolithic times and must have harbored some notable diversity. It is a diversity that we may well fail to grasp if we only sample people from the same areas again and again.
On the other hand, they seem to cluster in the same extreme periphery of the European cluster, opposed to the position of West Asians, and therefore suggesting that there has been some West Asian genetic flow into Europe since then (something we all assume, of course). 
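The projection logic discussed above can be sketched in a few lines. This is only a toy illustration with made-up data (the arrays, group sizes and the `fit_pca`/`project` helper names are assumptions of mine, not anything taken from the paper): the axes are computed from a reference panel alone, and any other sample is then forced onto those axes, so whatever makes it different but is not captured by the reference axes simply vanishes from the plot.

```python
import numpy as np

def fit_pca(reference, n_components=2):
    """Compute PCA axes from the reference samples only (rows = samples)."""
    mean = reference.mean(axis=0)
    # SVD of the centered matrix: the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(samples, mean, axes):
    """Place samples on axes that were defined without them."""
    return (samples - mean) @ axes.T

rng = np.random.default_rng(0)

# Toy reference panel: two groups that differ only in the first 5 markers.
group_a = rng.normal(0.0, 0.1, size=(20, 10))
group_a[:, :5] += 1.0
group_b = rng.normal(0.0, 0.1, size=(20, 10))
group_b[:, :5] -= 1.0
reference = np.vstack([group_a, group_b])

mean, axes = fit_pca(reference)

# An outsider that differs from everyone in markers 5..9, a direction
# that the first reference axis barely captures.
outsider = np.zeros((1, 10))
outsider[0, 5:] = 3.0

coords = project(outsider, mean, axes)
true_distance = np.linalg.norm(outsider - mean)

print("outsider PC1 coordinate:", coords[0, 0])
print("outsider distance from panel center:", true_distance)
```

In this toy case the outsider is far from every reference sample, yet on the first axis it lands near the middle of the plot, between the two reference groups. That is at least one way in which a projected population can end up looking "intermediate" without being so.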
Using Loschbour as proxy for the WHG (Western hunter-gatherer) component, Mal’ta 1 as proxy for the ANE (Ancient North Eurasian) one and Stuttgart as proxy for the EEF (Early European Farmer) one, they produce the following graph (to which I added an important note in gray):
The note in gray is mine: it highlights the contradictory position in which the other Western hunter-gatherers may fall because of assuming Loschbour to be a valid proxy, when it is clearly very extreme. This was not tested in the study, so it is inferred from PC1, which seems to best approximate the results of their formal tests on the WHG vs EEF axis, as well as those of the WHG vs Near East comparisons.
I tried to figure out how these formal tests are reflected, if at all, in the PCA, mostly because the PCA is a much easier tool for comprehension, being so visual. Eventually I found that dimension 1 (horizontal axis) is very close to the genetic distances measured by the formal tests (except those for the ANE component, obviously), allowing a visualization of some of the possible problems caused by their use of Loschbour as the only reference, without any control. Let’s see it:

The same PCA as above with a few annotations in magenta and green
While not exact, the dashed vertical magenta line (the median in dimension 1 between Loschbour and Stuttgart) approximates quite well the WHG vs EEF values measured in the formal tests. Similarly, the dashed green axis (the median in PC1 between Loschbour and a good-looking Bedouin) approximates to a great extent the less precise results of the formal tests the authors applied to guesstimate the West Asian and WHG ancestry of EEFs, which ranged between 60% and almost 100% West Asian (my line is much closer to the 60% value, which seems more reasonable).
When I tried to find an alternative median WHG/West Asian line, using La Braña 2 and the first non-Euro-drifted Turk I could spot (Anatolia is much more likely to be the direct source of West Asian ancestry in Europe than Bedouins), I got exactly the same result, so there is no need to plot any second option (two wrongs sometimes do make a right, it seems). But when I did the same with La Braña 2 and Stuttgart I got a genuinely good-looking alternative median line, which is the dash-and-dot magenta axis.
This alternative line is probably a much more reasonable 50% WHG-EEF approximation in fact, and it goes right through Spain, which makes good sense for all I know.
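For readers who want to reproduce this kind of eyeballing, the "median line" reading amounts to a linear interpolation along PC1 between two proxy samples. The sketch below is only that, a visual heuristic and not one of the formal tests of the study; the coordinates are hypothetical placeholders, not values read from the figure, and the function names are mine.

```python
def fraction_along_axis(x, x_start, x_end):
    """Relative position of x between two reference coordinates.

    Returns 0.0 at x_start, 1.0 at x_end and 0.5 on the median line.
    """
    if x_end == x_start:
        raise ValueError("reference coordinates must differ")
    return (x - x_start) / (x_end - x_start)

def median_line(x_start, x_end):
    """PC1 coordinate of the 50/50 line between two proxies."""
    return (x_start + x_end) / 2.0

# Hypothetical PC1 coordinates (illustration only).
pc1 = {"WHG_proxy": -0.10, "EEF_proxy": 0.02, "population": -0.01}

share = fraction_along_axis(pc1["population"], pc1["WHG_proxy"], pc1["EEF_proxy"])
line = median_line(pc1["WHG_proxy"], pc1["EEF_proxy"])

print(f"position towards the EEF proxy: {share:.2f}")
print(f"median line at PC1 = {line:.3f}")
```

Swapping the WHG proxy for a less extreme sample moves the median line, which is exactly why the choice of proxy matters for where modern populations appear to fall.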
Of course the ideal solution would be for someone to perform good formal tests, similar to those done in the study, with La Braña 2 and/or Skoglund, which should be more similar to the actual WHG ancestry of modern Europeans than the extremely divergent Loschbour sequence. An obvious problem is that La Braña produced only very poor sequences but, well, use Skoglund instead or sample some other Franco-Cantabrian or Iberian Paleolithic remains.
Whatever the solution, I think that we do have a problem with the use of Loschbour as the only WHG proxy and that it demands some counter-testing.
What about the ANE component? I do not dare to give any alternative opinion because I lack the tools to counter-analyze it. What seems clear is that its influence on modern Europeans is almost uniformly weak and that it can be ignored for the most part. As happens with the WHG, it’s quite possible that the ANE would be enhanced if the sequence from Afontova Gora were used instead of that of Mal’ta, but I can’t foresee by how much.
Finally some speculative food-for-thought. Again using the visual tool of the PCA, I spotted some curiosities:

Speculative annotations on the PCA

Most notably it is apparent that the two WHG populations (Western and Scandinavian) are aligned along natural axes, which seem to act as clusters. Extending both (dotted lines), they converge at a point closest to some French, notably the only “French” that tend towards “Southern France” and Basques. So I wonder: is it possible that these two WHG cluster-lines represent ancient branches derived from an original population of SW France? We know that since the LGM the area of Dordogne (Perigord) was like the megalopolis of Paleolithic Europe, with population densities that must have been several times those of other areas. We know that this region was at the origin of both the Solutrean and Magdalenian cultures and probably still played an important role in the Epipaleolithic period.
So I do wonder: is that “knot” a mere artifact of a mediocre representation or is it something much more real? Only with due research in the Franco-Cantabrian region will we find out.

The Mal’ta aDNA findings

The recent sequencing of ancient DNA from the remains of a young Central Siberian boy from the Gravettian site of Mal’ta, west of Lake Baikal, dated to c. 24,000 years calBP, has caught the interest of many anthropology enthusiasts. During my hiatus of more than two months, most people who asked me to resume blogging with a specific request mentioned these findings. Let’s see:
Maanasa Raghavan et al., Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature 2013. Pay per view [doi:10.1038/nature12736]


The origins of the First Americans remain contentious. Although Native Americans seem to be genetically most closely related to east Asians [1-3], there is no consensus with regard to which specific Old World populations they are closest to [4-8]. Here we sequence the draft genome of an approximately 24,000-year-old individual (MA-1), from Mal’ta in south-central Siberia [9], to an average depth of 1×. To our knowledge this is the oldest anatomically modern human genome reported to date. The MA-1 mitochondrial genome belongs to haplogroup U, which has also been found at high frequency among Upper Palaeolithic and Mesolithic European hunter-gatherers [10-12], and the Y chromosome of MA-1 is basal to modern-day western Eurasians and near the root of most Native American lineages [5]. Similarly, we find autosomal evidence that MA-1 is basal to modern-day western Eurasians and genetically closely related to modern-day Native Americans, with no close affinity to east Asians. This suggests that populations related to contemporary western Eurasians had a more north-easterly distribution 24,000 years ago than commonly thought. Furthermore, we estimate that 14 to 38% of Native American ancestry may originate through gene flow from this ancient population. This is likely to have occurred after the divergence of Native American ancestors from east Asian ancestors, but before the diversification of Native American populations in the New World. Gene flow from the MA-1 lineage into Native American ancestors could explain why several crania from the First Americans have been reported as bearing morphological characteristics that do not resemble those of east Asians [2, 13]. Sequencing of another south-central Siberian, Afontova Gora-2 dating to approximately 17,000 years ago [14], revealed similar autosomal genetic signatures as MA-1, suggesting that the region was continuously occupied by humans throughout the Last Glacial Maximum.
Our findings reveal that western Eurasian genetic signatures in modern-day Native Americans derive not only from post-Columbian admixture, as commonly thought, but also from a mixed ancestry of the First Americans.

Haploid lineages
The Mal’ta boy, MA-1, carried distinct yDNA R* and mtDNA U* lineages. While both are clearly related to those dominant in Europe and parts of Asia (West, South) nowadays, they are also distinct from any specific dominant lineage today.
R* (yDNA) is neither R1 nor R2 but another distinct branch of R. This kind of R(xR1, R2) is very rare today and found mostly in and around NW South Asia. Following Wikipedia, this “other R” is found in:
  • 10.3% among the Burusho
  • 6.8% among the Kalash
  • 3.4% among the Gujarati
However, I must say that I recall from old discussions that some R(xR1) is also found among Mongols and some North American Natives. I would have to find the relevant studies though (maybe in an update).
U* (mtDNA) is also quite rare today but has been found in Swabian Magdalenian hunter-gatherers, as well as in some Neolithic samples, although it may well be a totally different kind of U* (I could not discern the specific markers in the paper or the supplementary materials, and it must be remembered that the asterisk only means “others”).
Autosomal DNA
The study also shows some statistical inferences from the autosomal (or nuclear) DNA of the Mal’ta boy:
Figure 1 [b & c]
b, PCA (PC1 versus PC2) of MA-1 and worldwide human populations for which genomic tracts from recent European admixture in American and Siberian populations have been excluded [19].
c, Heat map of the statistic f3(Yoruba; MA-1, X) where X is one of 147 worldwide non-African populations (standard errors shown in Supplementary Fig. 21). The graded heat key represents the magnitude of the computed f3 statistics.

Here we can appreciate that MA-1 is closest to Native Americans but still rather intermediate between them and South and West Eurasians. Interestingly East Asians are quite distant instead, suggesting that MA-1 was still not too much admixed with that continental population, unlike what happens with Native Americans, who are essentially East Asian in the autosomal and mtDNA aspects. So this kid appears to be some sort of a “missing link” in the Paleolithic ethnogenesis of Native Americans.
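To make the heat map of panel c less of a black box: an "outgroup f3" of the form f3(Yoruba; MA-1, X) measures how much genetic drift MA-1 and X share after their common split from the outgroup, so higher values mean closer affinity. The following is a deliberately naive sketch on made-up allele frequencies. Real implementations add finite-sample corrections and block-jackknife standard errors, which are omitted here, and the function name and all numbers are my own assumptions.

```python
import numpy as np

def f3_outgroup(outgroup, pop_a, pop_b):
    """Naive f3(outgroup; A, B): mean over SNPs of (o - a) * (o - b).

    Inputs are allele frequencies per SNP. No bias correction is applied.
    """
    o, a, b = (np.asarray(v, dtype=float) for v in (outgroup, pop_a, pop_b))
    return float(np.mean((o - a) * (o - b)))

# Made-up allele frequencies at six SNPs (illustration only).
outgroup = [0.1, 0.9, 0.5, 0.2, 0.8, 0.5]
ancient = [0.6, 0.3, 0.5, 0.7, 0.2, 0.5]
close_pop = [0.6, 0.4, 0.5, 0.6, 0.3, 0.5]    # drifted together with `ancient`
distant_pop = [0.2, 0.8, 0.5, 0.3, 0.8, 0.4]  # stayed near the outgroup

f3_close = f3_outgroup(outgroup, ancient, close_pop)
f3_distant = f3_outgroup(outgroup, ancient, distant_pop)

print("f3(outgroup; ancient, close_pop)   =", round(f3_close, 4))
print("f3(outgroup; ancient, distant_pop) =", round(f3_distant, 4))
```

Ranking many populations X by this statistic is what produces a map like the one in the figure, with the populations sharing most drift with MA-1 shown in the warmest colors.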

Figure 2 | Admixture graph for MA-1 and 16 complete genomes. An admixture graph with two migration edges (depicted by arrows) was fitted using TreeMix [21] to relate MA-1 to 11 modern genomes from worldwide populations [22], 4 modern genomes produced in this study (Avar, Mari, Indian and Tajik), and the Denisova genome [22]. Trees without migration, graphs with different number of migration edges, and residual matrices are shown in Supplementary Information, section 11. The drift parameter is proportional to 2Ne generations, where Ne is the effective population size. The migration weight represents the fraction of ancestry derived from the migration edge. The scale bar shows ten times the average standard error (s.e.) of the entries in the sample covariance matrix. Note that the length of the branch leading to MA-1 is affected by this ancient genome being represented by haploid genotypes.
Even if I am not too keen on TreeMix, in this case the results seem consistent.
We can appreciate here that a sample of Native Americans (the Karitiana, maybe not as “pure” as the Xavantes but still very much so) shows up in a different branch from MA-1, reflecting their overwhelmingly East Asian ancestry, mostly by the maternal side (mtDNA). MA-1 instead hangs from the South-West Eurasian branch, soon after the split between South Asians and West Eurasians. Both have extremely drifted branches, surely indicating the small size of their founder populations, typical of the Far North.
In addition to this basic tree, two admixture events are signaled: one is the already known weak Denisovan (H. erectus?) one into Australasian Natives (represented by Papuans) and the other one, considerably more intense, is the one hanging from upstream of MA-1 to Native Americans (Karitiana), reflecting the partial South-West Eurasian ancestry of Native Americans (noticeable also in their dominant paternal ancestry: haplogroup Q).
The fact that the admixture signal stems from quite upstream of MA-1 indicates that this boy (or rather his relatives) were not direct ancestors of Native Americans in any significant way but rather a different branch from the same trunk. Probably proto-Amerindians were already in this period at the North Pacific coasts, not sure if in Beringia or around Okhotsk or what but certainly they had already separated from the Mal’ta population.
What did we know of Native American genesis before this finding?
There are three principal lines of evidence:
  1. Y-DNA, which among Native Americans is essentially haplogroup Q (plus some C3, which is from NE Asia). By phylogenetically hierarchical diversity, haplogroup Q must have coalesced in West or Central Asia (or maybe South Asia?), very possibly in or near Iran. The NE Asian and Native American branches are clearly derived, even if more important numerically today.
  2. mtDNA, which among Native Americans is essentially from NE Asia (A, C, D) and middle East Asia (B), but also in a small amount from West Asia (X2).
  3. Archaeology: we can track, more or less directly, the proto-NAs by means of following the Upper Paleolithic sequence in Siberia and nearby areas. 
    1. C. 47,000 years ago (calBP) H. sapiens with Aurignacoid technology (i.e. linked to West Eurasian earliest Upper Paleolithic) reached Altai, displacing the Neanderthals to the Northern fringes of the district.
    2. C. 30,000 years ago, Upper Paleolithic (“mode 4”) technology with roots in Altai reached other parts of Siberia, Mongolia and North China, from where it expanded eastwards and southwards gradually in a process of, probably, cultural diffusion. 
    3. By c. 17,000 years ago they were already in North America and c. 15,000 years ago in South America. In the LGM they were probably in Beringia already (but this is only indirectly attested so far). 
So we already had a good idea about the origins of Native Americans: their ultimate roots, at least patrilineally, seem to be in Altai (where they were part of the wider West Eurasian colonization at the expense of Neanderthals with Aurignacian-like technology and dogs). Then, probably around 30,000 years ago, they expanded eastwards through Siberia and maybe nearby areas, entering into intense and intimate contact with the already existing East Asian populations, with whom they admixed again and again, mostly by the female side.
It would seem therefore that their society was already patrilocal because otherwise their patrilineages would have just got dissolved among the locals and would have never reached Beringia nor America in such dominant position.
Overall this is the quite clear notion that I have of the earliest Native American genesis, and for me there is no reasonable doubt about this narrative (except maybe in the fine details). However I must acknowledge that some individuals have reacted very negatively to it. But no matter how much they yell, I fail to see their arguments.
How does this new finding affect this narrative?
It simply confirms it with further evidence. By 24,000 calBP the proto-NAs were surely already, as I said before, in NE Asia close to the Pacific coasts, so this Mal’ta population is a branch left behind in their migration (plus whatever new inflows from the West, which we can’t evaluate). The very low affinity level with East Asians, in spite of its quite Eastern location, shows that early East Asians had not yet reached so far North, at least in significant numbers. If they had, they probably did so only at more eastern longitudes, probably near the sea, where resources were more plentiful.
In other words: the first Central Siberians were of South+West Eurasian stock and the current East Asian genetic and phenotype hegemony in that area reflects post-LGM flows, mostly led by yDNA N1.
Early Native Americans were the product of admixture of these earliest Siberians with NE Asians, admixture that surely happened East of Lake Baikal, although the exact details are still unclear. 
What does MA-1 say about the West?
His mtDNA is generally consistent with other common U-derived lineages found in the West Eurasian Upper Paleolithic, so it says not much other than that he was somehow related, which is confirmed by autosomal analysis.
His yDNA is maybe more interesting, not least because it is probably the oldest sequence of this kind, but also because it belongs to haplogroup R. It certainly rules out any “molecular clock” guesstimates for R that are shorter than this site’s age, but on its own it is not able to set a real age other than a bare minimum.
So, for example, Eupedia’s estimate of 29 Ka for R as such could still be valid, although I would say that it is extremely unlikely.
Indirectly however it does say something by confirming the overall narrative of Native American origins as above, and that means that Eupedia’s estimate of a mere 24 Ka age for haplogroup Q is almost certainly wrong by a lot.
Using that tree, we would have to at least double the age of Q in order to fit the Altai narrative (which begins c. 47 Ka ago), which, extrapolating, implies an age for R of at least 58 Ka. I have estimated some 48 Ka of age for R1 and 68 Ka for P, so it makes good sense after these much-needed corrections. The exact ages we may never know but the approximate ages should be something like these.
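The extrapolation above is just a proportional rescaling of the tree. A minimal sketch, assuming that every node in the tree is stretched by the same factor (a simplification of mine; nothing guarantees that a wrong clock is wrong by the same proportion everywhere):

```python
def rescale_ages(ages_ka, factor):
    """Multiply every age estimate (in thousands of years) by one factor."""
    return {clade: age * factor for clade, age in ages_ka.items()}

# Eupedia figures as quoted in the text, in Ka.
eupedia_ka = {"Q": 24, "R": 29}

# "At least double", so the ages of Q reach the c. 47 Ka Altai horizon.
rescaled = rescale_ages(eupedia_ka, 2)

for clade, age in rescaled.items():
    print(f"{clade}: {eupedia_ka[clade]} Ka -> {age} Ka")
```

With a factor of 2 the two quoted figures become 48 Ka for Q and 58 Ka for R, which is where the "at least 58 Ka" comes from.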
And that’s about all I can say. More in comments (and/or updates) if need be.

Update (Dec 6): R* and P* (and other rare clades) among Central Asians

A reader sent me a copy of the study by Wei-Hua Shou et al. (2010), titled Y-chromosome distributions among populations in Northwest China identify significant contribution from Central Asian pastoralists and lesser influence of western Eurasians, published in the Journal of Human Genetics, a Nature Publishing Group journal (doi:10.1038/jhg.2010.30).

While it is not the bit of info I was recalling above, it does add some information about unmistakable R(xR1,R2) and P(xQ,R) among Central Asian populations (within the territory of the P.R. China). In detail:

  • R* is found in 5/31 Tajiks, 1/41 Kazakhs and 1/50 Uyghurs.
  • P* is found in 1/31 Tajiks and 1/43 Kyrgyz.

Also of interest should be the presence of:

  • Q(xQ1) in 8/35 Dongxiang (a Mongol ethnicity), 1/45 Kyrgyz and 1/50 Tu (another Mongol ethnicity).
  • F(xG,H,I,J,K) in 2/32 Yugu (Yugurs, a distinct Uyghur sub-ethnicity), 2/41 Kazakhs, 1/31 Tajiks and 1/50 Tu.
  • K(xN,O,P) in 32/533 total (i.e. 6% in Easternmost Central Asia), among which the most notable are: 9/50 Uyghurs, 6/23 Uzbeks, 6/27 Bao’an (another small Mongol ethnicity), 3/32 Xibo (a Tungusic ethnicity), 2/32 Yugu and 2/5 Mongols. I guess that it is possible that this is a distinct K subclade, although it may well be either part of MNOPS (NO*?) or belong to LT (L?).
  • R2 in 1/31 Tajiks and 2/27 Bao’an.
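
Because most of the counts listed above come from samples of only a few dozen men, it may help to see how wide the uncertainty around each frequency is. The sketch below uses the Wilson score interval, which is my own choice of method (the study does not necessarily report intervals this way), applied to a few of the counts quoted above.

```python
from math import sqrt

def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a proportion k/n (z = 1.96 gives ~95%)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n and n > 0")
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Counts as quoted in the list above (carriers, sample size).
counts = {
    "R* Tajiks": (5, 31),
    "R* Kazakhs": (1, 41),
    "R* Uyghurs": (1, 50),
    "K(xN,O,P) total": (32, 533),
}

for label, (k, n) in counts.items():
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.1%} (approx. 95% interval {low:.1%} to {high:.1%})")
```

A single carrier in 41 or 50 men is compatible with a fairly wide range of true frequencies, so the presence of these rare clades is more informative than their exact percentages.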

Belarusian uniparental ancestry

A new study on Belarusian haploid genetics provides some interesting insights for the wider European and West Eurasian picture.
Alena Kushniarevich et al., Uniparental Genetic Heritage of Belarusians: Encounter of Rare Middle Eastern Matrilineages with a Central European Mitochondrial DNA Pool. PLoS ONE 2013. Open access [doi:10.1371/journal.pone.0066499]


Ethnic Belarusians make up more than 80% of the nine and half million people inhabiting the Republic of Belarus. Belarusians together with Ukrainians and Russians represent the East Slavic linguistic group, largest both in numbers and territory, inhabiting East Europe alongside Baltic-, Finno-Permic- and Turkic-speaking people. Till date, only a limited number of low resolution genetic studies have been performed on this population. Therefore, with the phylogeographic analysis of 565 Y-chromosomes and 267 mitochondrial DNAs from six well covered geographic sub-regions of Belarus we strove to complement the existing genetic profile of eastern Europeans. Our results reveal that around 80% of the paternal Belarusian gene pool is composed of R1a, I2a and N1c Y-chromosome haplogroups – a profile which is very similar to the two other eastern European populations – Ukrainians and Russians. The maternal Belarusian gene pool encompasses a full range of West Eurasian haplogroups and agrees well with the genetic structure of central-east European populations. Our data attest that latitudinal gradients characterize the variation of the uniparentally transmitted gene pools of modern Belarusians. In particular, the Y-chromosome reflects movements of people in central-east Europe, starting probably as early as the beginning of the Holocene. Furthermore, the matrilineal legacy of Belarusians retains two rare mitochondrial DNA haplogroups, N1a3 and N3, whose phylogeographies were explored in detail after de novo sequencing of 20 and 13 complete mitogenomes, respectively, from all over Eurasia. Our phylogeographic analyses reveal that two mitochondrial DNA lineages, N3 and N1a3, both of Middle Eastern origin, might mark distinct events of matrilineal gene flow to Europe: during the mid-Holocene period and around the Pleistocene-Holocene transition, respectively.

Mitochondrial DNA
Belarusians have typical Central-Eastern European mtDNA pools, with some 37% H (of which: c. 11% H1 and 15% unclassified H*), c. 12% U5 (roughly half for each major subclade: U5a and U5b) and a diversity of other lineages:

Figure 2. Phylogeny of mtDNA haplogroups and their relative frequencies in Belarusians.
The tree is rooted relative to the RSRS according to [51]. Belarusian sub-populations are designated as BeE – East, BeWP – West Polesie, BeEP – East Polesie, BeN – North, BeC – Centre, BeW – West. Sample sizes and absolute frequencies are also given.

From the paper:

Frequencies of Belarusian mtDNA haplogroups do not differ considerably from other eastern European and Balkan populations, at least when major clades such as H1, H2, V, U5a and U5b, K, T and J are considered (Table S3). However, populations from the easternmost fringe of the eastern European region, the Volga-Uralic, have a decreased share of overall H mtDNAs and a noticeably increased frequency of haplogroup U4 as well as M-lineages compared to Belarusians (Table S3).

Of interest is the rare lineage N3, also found around the Eastern Mediterranean (from Albania to Egypt) and in Iran, where it probably originated (highest diversity), spreading to Europe possibly in the Neolithic.
Another rare lineage, N1a3, is most frequent among Peninsular Arabs (but with low HVS-I diversity), being also found in the Eastern Mediterranean (from Sicily to Palestine) and the Caucasus. It is very rare in Central and Eastern Europe, except among Mordvins (but again with low diversity). Its precise origins remain unclear (either West Asia or the European SE, including Italy, where it seems quite diverse). Based on their own age estimates, the authors suggest a rather old diversification of this lineage, still in Paleolithic times.
Y chromosome DNA
Belarusian paternal ancestry is dominated by R1a (51%), I2a1 (“I2a”: 17%) and N1c (10%). Other notable lineages are R1b (6%) and I1 (5%). All of them are within expectations.

Figure 6. Phylogeny of NRY haplogroups and their relative frequencies in Belarusians.
Haplogroup-defining biallelic markers are in parentheses. Belarusian sub-populations are designated as BeN – North, BeC – Centre, BeE – East, BeW – West, BeWP – West Polesie, BeEP – East Polesie. Sample sizes and absolute frequencies are also given.

I2a1 (“I2a” in the paper) is more common towards the South or SE. Instead N1c shows the opposite distribution, being more common in the North and West. This distribution is concordant with the wider East European picture.
Both lineages show star-like STR structures. Based on them, the authors suggest that, for Belarusian lineages only, N1c (fig. 7) may have spread from the Baltic area (Lithuania, Latvia, NW Belarus), while I2a1 (fig. 8) would have spread from the Balkans and maybe Belarus itself. We should not read beyond Belarus here because of the sampling bias of the study (notably regarding N1c, with very few Uralic samples).

HERC2 haplotypes, phylogeny and frequencies

Palisto at Kurdish DNA has a most interesting report of his own production on the eye color gene HERC2, its variant haplotypes, their phylogeny and their frequency in West Eurasian and Pakistani populations.
Based on Kurdish haplotypes, he developed the following phylogeny:

All branches produce dark eye color, except the two colored in blue, which are associated with light eye color.
The defining transitions from branch#3 to branch#1 are rs1129038 and rs12913832 (demonstrated to cause blue eyes in 99% of cases), while the transition to branch#2 is found at rs11636232.
He also produced haplotype frequency tables for the two light eye color haplotypes (here the one sorted by branch#1 frequencies):

Population Branch#1 Branch#2
Brahui 2% 2%
Balochi 8% 2%
Balochi 12% 6%
Kalash 12% 16%
Sardinian 16% 4%
Palestinian 18% 3%
Burusho 18% 12%
Basque 19% 21%
Italians 25% 19%
Adygei 26% 6%
Orcadian 28% 41%
Galician 30% 17%
French 32% 30%
Russians 36% 46%
Italians 42% 27%
Swedes 42% 54%
Germans 46% 33%
Danes 52% 32%
Austrian 55% 28%
Swiss 69% 25%
In West Asia and Pakistan (the most plausible ancient origin of the trait), the ancestral #1 variant is generally dominant, the only exception being the Kalash. Among the studied populations of that area it reaches its highest frequencies (18%) among the Burusho and Palestinians.
This pattern continues (at overall quite higher frequencies) in Central Europe, Denmark, Italy and Galicia, with a peak among the Swiss (69%). The derived haplotype #2 instead seems dominant among Swedes, Russians and Orcadians. French and Basques are balanced for both types.
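The dominance pattern just described can be checked directly against the table. What follows is only a minimal sketch: the percentages are transcribed from the table above, while the 2-point margin for calling a population “balanced” and the function name `classify` are my own arbitrary choices for illustration, not Palisto’s.

```python
# Light-eye haplotype frequencies (%) copied from the table above:
# (population, branch#1, branch#2). Repeated names are separate samples.
TABLE = [
    ("Brahui", 2, 2), ("Balochi", 8, 2), ("Balochi", 12, 6),
    ("Kalash", 12, 16), ("Sardinian", 16, 4), ("Palestinian", 18, 3),
    ("Burusho", 18, 12), ("Basque", 19, 21), ("Italians", 25, 19),
    ("Adygei", 26, 6), ("Orcadian", 28, 41), ("Galician", 30, 17),
    ("French", 32, 30), ("Russians", 36, 46), ("Italians", 42, 27),
    ("Swedes", 42, 54), ("Germans", 46, 33), ("Danes", 52, 32),
    ("Austrian", 55, 28), ("Swiss", 69, 25),
]

BALANCED_MARGIN = 2  # assumption: a gap of <= 2 points counts as balanced


def classify(branch1, branch2, margin=BALANCED_MARGIN):
    """Return the more frequent haplotype, or 'balanced' within the margin."""
    if abs(branch1 - branch2) <= margin:
        return "balanced"
    return "branch#1" if branch1 > branch2 else "branch#2"


# Group populations by which haplotype dominates, keeping table order.
groups = {"branch#1": [], "branch#2": [], "balanced": []}
for population, b1, b2 in TABLE:
    groups[classify(b1, b2)].append(population)

# Population with the highest branch#1 frequency.
peak_branch1 = max(TABLE, key=lambda row: row[1])

print("branch#2 dominant:", groups["branch#2"])
print("balanced:", groups["balanced"])
print("branch#1 peak:", peak_branch1[0], peak_branch1[1])
```

With this margin the Brahui (2% for each haplotype) also land in the balanced group, which means little given how rare both haplotypes are among them.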

Update (Jun 25): map:

Includes also Kurdish data from Palisto’s update.

The two Balochi samples are pooled into one (same weight for each), while the two Italian samples were kept separate and assumed to be from South and North Italy respectively (not sure but it makes sense).
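For clarity, this is the kind of equal-weight pooling meant here, as a minimal sketch. The sample sizes are not given in the tables, so each sample simply counts once; the function name `pool_equal_weight` is hypothetical.

```python
# Frequencies (%) of the two light-eye haplotypes for the two Balochi
# samples, as listed in the table above.
balochi_samples = [
    {"branch1": 8, "branch2": 2},
    {"branch1": 12, "branch2": 6},
]


def pool_equal_weight(samples):
    """Average each haplotype frequency, giving every sample the same weight."""
    keys = samples[0].keys()
    return {k: sum(s[k] for s in samples) / len(samples) for k in keys}


pooled = pool_equal_weight(balochi_samples)
print(pooled)  # {'branch1': 10.0, 'branch2': 4.0}
```

If the sample sizes were known, a size-weighted average would be the more defensible choice.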

Update (Jun 27): Kurdish DNA just published HERC2 data for a much wider sample of populations from all over Eurasia, no longer focusing only on the blue eye haplotypes but on all of them instead.

It is very interesting that ht3, ancestral to the blue eye haplotype ht1 and, through this one, also to ht2, is widespread through the continent with very few exceptions: Russians, Belarusians, Lithuanians and a Mordvin tribe in Europe, as well as the Kurmi, Nihali, Chenchu and Puliyar in India.

Ht5 and ht6 are also very common in Eurasia, ht7 is rare in most groups but dominant in a few (Kurmi, Melanesians) while ht4 (ancestral to ht3) is rather rare as well (highest in South and Central Asia, as well as Lebanon). Other (undetermined) haplotypes are also concentrated in some populations like the Chenchu and have some importance across Asia.


Posted by on June 9, 2013 in pigmentation, West Eurasia


Algerian haploid genetics

This new study is of particular interest for data miners willing to dig into the supplemental materials. It also has some other points of interest that I will discuss below, and its general approach is broadly sound. However, the very complex NW African genetic landscape has many nuances that deserve in-depth discussion, and the authors’ tentative conclusions seem to lack enough depth of analysis (who grabs too much, squeezes little). The complexity is too great for me to go issue by issue offering criticism, so I will leave most of that open for discussion, if the readers so wish.

Asmadan Bekada et al., Introducing the Algerian Mitochondrial DNA and Y-Chromosome Profiles into the North African Landscape. PLoS ONE 2013. Open access [doi:10.1371/journal.pone.0056775]
Mitochondrial DNA
The mtDNA landscape of Algeria and Northwest Africa is dominated (using HVS-I only to estimate it) by R-CRS (“H/HV” in table S2), with levels of 18-34% (29% in Algeria) that approach those of Western Europe (~45%). We know from previous studies that this fraction is composed almost only of H1, H3, H4 and H7, all of them attributed by Cherni to an origin (judging by diversity) in SW Europe (Iberia, France). Along with them, HV0/V (7% in Algeria, 5-9% regionally) must be mentioned as also plausibly coming from that part of Europe (4-7%).
Another notable lineage is U6 (typical and most diverse in NW Africa), which reaches frequencies of 11% in Algeria (somewhat less in neighboring countries). Outside this area it is only notable in the Levant (~1%) and Iberia (~1.4%).
M1, reaching 7% in Algeria (~1-4% elsewhere in NW Africa, <1% in Europe and Highland West Asia, 1.2% in the Levant, 2.4% in Peninsular Arabia), is also very much worth a mention, especially because the authors find a specifically NW African node centered in Algeria (HT2):

Figure 3. Reduced median network relating HVS-1 sequences of subhaplogroup M1.
(…) Black circles correspond to haplotypes observed in Algeria, whereas grey triangles pentagons correspond to lineages found in Egypt. Haplotype observed both in Algeria and Egypt are indicated using a black triangle. Grey circles indicate haplotypes observed in other geographical regions. (…)
The pattern suggests an Egypt-centered expansion for this lineage, however notice that East African M1 was not considered. 
Synthesis of mtDNA haplogroups or paragroups found in NW Africa at frequencies >2.5% (see table S2 for details and the many low frequency lineages as well), nomenclature as in table S2 (but some annotations in [square brackets] by me), frequencies for Algeria first (in brackets NW African range):
  • HV/H[R-CRS]: 28.8% (17.9-34.2%)
  • HV0/HV0a/V: 6.7% (4.6-8.3%)
  • R0a: 0.8% (0.8-3.2%)
  • U3*: 3.2% (1.1-3.2%)
  • U6a[U6a*]: 1.9% (1.9-7.8%)
  • U6a1’2’3: 9.4% (2.6-9.4%)
  • K*: 1.6% (0.7-4.8%)
  • T1a: 3.5% (0.0-5.6%)
  • T2b*: 1.9% (0.0-2.2%)
  • J[*]/J1c/J2[*]: 3.8% (1.3-3.8%)
  • M1[*]: 7.3% (0.7-7.3%)
  • L3b[*]: 0.3% (0.3-2.8%)
  • L3b1a3: 1.3% (0.0-2.8%)
  • L3e5: 1.6% (0.0-2.9%)
  • L2*: 0.5% (0.0-4.1%)
  • L2a[*]: 0.8% (0.0-3.2%)
  • L2a1*: 1.3% (0.7-4.8%)
  • L2a1b: 1.3% (0.8-3.5%)
  • L2d: 0.0% (0.0-2.8%)
  • L1b*: 3.0% (2.7-9.0%)
Notice that in nearly all cases the highest frequency of L(xM,N) lineages corresponds to West Sahara. The exceptions are L2a* (Tunisian “Andalusians”) and L3e5 (Tunisians), suggesting maybe a deep local NW African rooting rather than ancient or recent flows from Tropical Africa. There are other lineages in the low frequency range in a similar situation.
For this and other reasons I decided to color-code the list above according to my best guess about the origin of each lineage: NW African in deep red, Tropical African in brown, Egyptian in light brown, West Asian in green and European in blue. Unclear cases I left in black type.
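As a sanity check, the sublineage figures in the list can be summed and compared with the totals quoted in the text (U6 at about 11% in Algeria). A minimal sketch: the dictionary just transcribes the Algerian values above, and the helper name `pooled_frequency` is hypothetical. Note that the L(xM,N) sum covers only the lineages listed here (those above 2.5% somewhere in the region), so it is a lower bound and not the full figure from table S2.

```python
# Algerian frequencies (%) transcribed from the list above (table S2 names).
ALGERIA = {
    "U6a": 1.9,
    "U6a1'2'3": 9.4,
    "L3b": 0.3,
    "L3b1a3": 1.3,
    "L3e5": 1.6,
    "L2*": 0.5,
    "L2a": 0.8,
    "L2a1*": 1.3,
    "L2a1b": 1.3,
    "L2d": 0.0,
    "L1b*": 3.0,
}


def pooled_frequency(frequencies, prefix):
    """Sum the frequencies of the listed lineages whose name starts with prefix."""
    total = sum(f for name, f in frequencies.items() if name.startswith(prefix))
    return round(total, 1)  # the source data carry one decimal place


u6_total = pooled_frequency(ALGERIA, "U6")       # all listed U6 sublineages
listed_l_total = pooled_frequency(ALGERIA, "L")  # listed L(xM,N) lineages only

print("U6 in Algeria:", u6_total)
print("Listed L(xM,N) in Algeria (lower bound):", listed_l_total)
```

The U6 sum agrees with the roughly 11% mentioned earlier for Algeria.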
Y chromosome DNA
Algerian and NW African Y-DNA is overwhelmingly dominated by E1b1b1b (M81), a lineage specific to NW Africa, which reaches 44% in Algeria (44-67% in the region). The second most important lineage by frequency is J1 (M267) with 22% in Algeria (0-22% in the region, 6-22% if we exclude Libya). None of the rest of the lineages reaches 7%, except E1b1b1c (M123) but only in West Sahara (11%, elsewhere it is very minor).
List of Y-DNA haplo-/paragroups with frequencies above 2.5% anywhere in NW Africa follows (based on table S6). Same notation as with mtDNA (Algerian frequency first, NW African range in brackets):
  • E1a (M33): 0.6% (0.0-5.3%)
  • E1b1[*] (P2): 5.2% (0.7-38.6%)
  • E1b1b1[*] (M35): 0.6% (0.0-4.2%)
  • E1b1b1a4 (V65): 1.9% (0.0-4.8%)
  • E1b1b1b (M81): 44.2% (44.2-67.4%)
  • E1b1b1c (M123): 1.3% (0.0-11.1%)
  • F[*] (M89): 3.9% (0.0-3.9%)
  • J1 (M267): 21.8% (0.0-21.8%)
  • J2a2 (M67): 3.9% (0.0-3.9%)
  • R1b1a (V88): 2.6% (0.9-6.9%)
  • R1b1b1a1b[*] (U198): 2.6% (0.0-2.6%)
  • R1b1b1a1b1 (U152): 2.6% (0.0-2.6%)
For more diverse samples of NW African Y-DNA (from previous studies), Wikipedia has a nice table.
I would like to highlight the problem of J1 in Africa in general (including NW Africa). While there is no reasonable doubt that J1 as a whole originated in West Asia, it is found at rather high frequencies in East/NE Africa (Sudan, the Horn, Upper Egypt) and NW Africa with only very limited company (at best) from J2. West Asian populations instead show a much more balanced apportionment of the two major J sublineages: even in Saudi Arabia the J1:J2 proportion is 8:3, somewhat under 3:1. We do see this kind of apportionment in Lower Egypt, suggesting a “recent” (Neolithic or later) demic colonization from West Asia, but we see it nowhere else in Africa, where J1 is always found much more frequently than J2 (if the latter is found at all).
In my understanding this excludes colonization from West Asia after the Pre-Pottery Neolithic B, which seems the most plausible scenario for the spread of “Highlander” J2 into “Lowland” West Asia (probably dominated by J1 initially). So J1 in Africa (except in Lower Egypt) cannot be easily argued to be of “recent” Neolithic origin, much less of Semitic or Arab origin: it must be older.
Also Ethio Helix commented in this very interesting discussion at his blog that Tofanelli 2009 found low diversity in NW African J1. However, to my knowledge, nobody has looked at NE/East African J1 diversity, nor has a proper study been done on the substructure of this lineage in Africa. This leaves wide open the possibility that NW African J1 has a NE African origin, surely related to the expansion of Capsian culture or internal African Neolithic flows.
As long as this matter is not properly addressed, researchers will oversimplify and imagine J1 as simply West Asian influx. Ultimately it is, of course, but I strongly suspect that it has a secondary and distinct NE African center in the Nile basin and this is being totally ignored.
This study offers several rough comparisons with nearby regions (but not West Africa), however they oversimplify some things (the already mentioned Y-DNA J1, or assigning all mtDNA L(xM,N) to East Africa, when it seems obvious that some lineages may be deeply rooted in NW Africa and others probably come from West Africa). For whatever it is worth, here are two such questionable comparisons:

Table 2. Geographic components (%) considered in Y-chromosome and mtDNA lineages.

Figure 2. Graphical relationships among the studied populations.
PCA plots based on mtDNA (a) and Y-chromosome (b) polymorphism. Codes are as in Supplementary Tables S2 and S6.


Posted by on February 23, 2013 in African genetics, mtDNA, North Africa, West Eurasia, Y-DNA


Lineages of West Asia compared to Africa and Europe

Just a quick mention of this new paper on the matri- and patrilineages of West Asia:
Danielle A. Badro et al., Y-Chromosome and mtDNA Genetics Reveal Significant Contrasts in Affinities of Modern Middle Eastern Populations with European and African Populations. PLoS ONE 2013. Open access [doi:10.1371/journal.pone.0054616]


The Middle East was a funnel of human expansion out of Africa, a staging area for the Neolithic Agricultural Revolution, and the home to some of the earliest world empires. Post LGM expansions into the region and subsequent population movements created a striking genetic mosaic with distinct sex-based genetic differentiation. While prior studies have examined the mtDNA and Y-chromosome contrast in focal populations in the Middle East, none have undertaken a broad-spectrum survey including North and sub-Saharan Africa, Europe, and Middle Eastern populations. In this study 5,174 mtDNA and 4,658 Y-chromosome samples were investigated using PCA, MDS, mean-linkage clustering, AMOVA, and Fisher exact tests of FST’s, RST’s, and haplogroup frequencies. Geographic differentiation in affinities of Middle Eastern populations with Africa and Europe showed distinct contrasts between mtDNA and Y-chromosome data. Specifically, Lebanon’s mtDNA shows a very strong association to Europe, while Yemen shows very strong affinity with Egypt and North and East Africa. Previous Y-chromosome results showed a Levantine coastal-inland contrast marked by J1 and J2, and a very strong North African component was evident throughout the Middle East. Neither of these patterns were observed in the mtDNA. While J2 has penetrated into Europe, the pattern of Y-chromosome diversity in Lebanon does not show the widespread affinities with Europe indicated by the mtDNA data. Lastly, while each population shows evidence of connections with expansions that now define the Middle East, Africa, and Europe, many of the populations in the Middle East show distinctive mtDNA and Y-haplogroup characteristics that indicate long standing settlement with relatively little impact from and movement into other populations.

Maybe most interesting is this map:
Figure 1. Geographic distribution of mtDNA haplogroups.
Frequencies distribution from the current study and from the published data [30], [31], [35][48] as reported in Table 1.
A very important issue with this map is that “Malians” and “Burkinabe” are actually Tuaregs from those countries (samples taken from Pereira 2010), hence their large fractions of Eurasian lineage H (H1 in fact). Also I am a bit perplexed at the large portions of “other”, which in the region considered can only be: L4, L5, L6 (all of them small lineages from Africa and Yemen), R0(xH,V), N1, W, X and M1 (important in West Asia and some parts of Africa; I wonder why they are not listed in their own right) and maybe some other very rare lineages.
Regarding Y-DNA (and also largely mtDNA) the study focuses on statistical comparisons, not providing any comprehensive table nor map of haplogroup distribution. However for those interested in data mining the whole list of haplotypes (with purported haplogroup) of this study is available in table S2.

Posted by on January 31, 2013 in African genetics, mtDNA, West Asia, West Eurasia, Y-DNA