Category Archives: Y-DNA

Ancient DNA from Clovis culture is Native American (also Tianyuan affinity mystery)

Figure 4 | [c] (…) maximum likelihood tree. 
A recent study on the ancient DNA of human remains from Anzick (Montana, USA), dated to c. 12,500 calBP, confirms close ties to modern Native Americans, definitively discarding the far-fetched and outlandishly Eurocentric “Solutrean hypothesis” for the origins of Clovis culture (which pleases me greatly, I must admit).
While this fits well with expectations (at least mine), there is some hidden data that surprised me quite a bit: it sits at the bottom of an undiscussed formal test graph in which modern populations are compared with both Anzick and Tianyuan (c. 40,000 BP, North China). See below.
Morten Rasmussen et al., The genome of a Late Pleistocene human from a Clovis burial site in western Montana. Nature 2014. Pay per view (LINK) [doi:10.1038/nature13025]


Clovis, with its distinctive biface, blade and osseous technologies, is the oldest widespread archaeological complex defined in North America, dating from 11,100 to 10,700 14C years before present (bp) (13,000 to 12,600 calendar years bp)1, 2. Nearly 50 years of archaeological research point to the Clovis complex as having developed south of the North American ice sheets from an ancestral technology3. However, both the origins and the genetic legacy of the people who manufactured Clovis tools remain under debate. It is generally believed that these people ultimately derived from Asia and were directly related to contemporary Native Americans2. An alternative, Solutrean, hypothesis posits that the Clovis predecessors emigrated from southwestern Europe during the Last Glacial Maximum4. Here we report the genome sequence of a male infant (Anzick-1) recovered from the Anzick burial site in western Montana. The human bones date to 10,705 ± 35 14C years bp (approximately 12,707–12,556 calendar years bp) and were directly associated with Clovis tools. We sequenced the genome to an average depth of 14.4× and show that the gene flow from the Siberian Upper Palaeolithic Mal’ta population5 into Native American ancestors is also shared by the Anzick-1 individual and thus happened before 12,600 years bp. We also show that the Anzick-1 individual is more closely related to all indigenous American populations than to any other group. Our data are compatible with the hypothesis that Anzick-1 belonged to a population directly ancestral to many contemporary Native Americans. Finally, we find evidence of a deep divergence in Native American populations that predates the Anzick-1 individual.

Haploid DNA
The Y-DNA lineage of Anzick is Q1a2a1* (L54) to the exclusion of the common Native American subhaplogroup Q1a2a1a1 (M3). Among the modern sequences compared, that of a Maya is the closest.

The mtDNA belongs to the common Native American lineage D4h3a at its underived stage (root). 
For starters, I must explain that such underived haplotypes can only be found in mtDNA and never in modern Y-DNA (a common misconception), because the Y chromosome accumulates mutations nearly every generation, while the much shorter mtDNA does so only occasionally. Hypothetically, we could find the exact ancestor of some modern Y-DNA haplogroup in ancient remains, but that would be like finding the proverbial needle in a haystack. On the other hand, finding the underived stage in mtDNA, be it ancient or modern, does not mean that we are looking at a direct ancestor, just at an unmutated relative of hers, who may in fact be very distant.
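To put rough numbers on this contrast, the sketch below uses a Poisson model for the chance that a single lineage picks up no mutation at all over a given span of time. The rates are assumptions chosen for illustration (about 0.82×10⁻⁹ per base pair per year over an assumed ~10 Mb of well-callable Y sequence, and about 1.67×10⁻⁸ per site per year over the 16,569 bp mitogenome); published estimates vary, and the constant names are hypothetical.

```python
import math

# Illustrative, assumed rates (published estimates vary considerably)
Y_RATE_PER_BP_YEAR = 0.82e-9     # assumed Y-chromosome rate per bp per year
Y_CALLABLE_BP = 10_000_000       # assumed ~10 Mb of well-callable Y sequence
MT_RATE_PER_BP_YEAR = 1.665e-8   # assumed whole-mitogenome rate per site per year
MT_LENGTH_BP = 16_569            # length of the human mitochondrial genome


def expected_mutations(rate_per_bp_year: float, length_bp: int, years: float) -> float:
    """Expected number of new mutations along one lineage over `years`."""
    return rate_per_bp_year * length_bp * years


def prob_unmutated(rate_per_bp_year: float, length_bp: int, years: float) -> float:
    """Poisson probability that a lineage picks up zero mutations in `years`."""
    return math.exp(-expected_mutations(rate_per_bp_year, length_bp, years))


if __name__ == "__main__":
    for years in (500, 3_000, 12_000):
        p_y = prob_unmutated(Y_RATE_PER_BP_YEAR, Y_CALLABLE_BP, years)
        p_mt = prob_unmutated(MT_RATE_PER_BP_YEAR, MT_LENGTH_BP, years)
        print(f"{years:>6} years: P(no Y mutation)={p_y:.2e}  P(no mtDNA mutation)={p_mt:.2f}")
```

Under these assumed rates, a Y lineage is unlikely to stay unchanged for more than a few centuries, whereas an mtDNA lineage has a fair chance of remaining identical over several thousand years, which is why an underived mtDNA root can still turn up while an unmutated Y ancestor is essentially never found.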

Autosomal DNA

In this respect, the Anzick infant clearly shows the strongest affinities to Native Americans, followed at some distance by Siberian peoples, particularly those near the Bering Strait.

Figure 2 | Genetic affinity of Anzick-1. a, Anzick-1 is most closely related to Native Americans. Heat map representing estimated outgroup f3-statistics for shared genetic history between the Anzick-1 individual and each of 143 contemporary human populations outside sub-Saharan Africa. (…)
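The outgroup f3-statistic behind this heat map can be sketched in a few lines. Assuming per-SNP allele frequencies for an outgroup O and two populations A and B, f3(O; A, B) is the average of (o - a)(o - b) across SNPs, so it grows with the genetic drift A and B have shared since separating from O. This minimal version is an illustration only: it omits the small-sample bias correction and block-jackknife errors used in real studies, and the frequencies are invented.

```python
from typing import Sequence


def outgroup_f3(o: Sequence[float], a: Sequence[float], b: Sequence[float]) -> float:
    """Uncorrected outgroup f3(O; A, B) from per-SNP allele frequencies."""
    if not (len(o) == len(a) == len(b)):
        raise ValueError("frequency vectors must have equal length")
    # Shared drift of A and B away from the outgroup, averaged over SNPs
    total = sum((oi - ai) * (oi - bi) for oi, ai, bi in zip(o, a, b))
    return total / len(o)


# Invented toy frequencies at four SNPs (not real data)
outgroup = [0.10, 0.80, 0.50, 0.30]
ancient = [0.60, 0.20, 0.90, 0.70]
close_pop = [0.55, 0.25, 0.85, 0.75]
distant_pop = [0.20, 0.70, 0.40, 0.35]
print(outgroup_f3(outgroup, ancient, close_pop))
print(outgroup_f3(outgroup, ancient, distant_pop))
```

The population whose frequencies track the ancient sample more closely gets the higher value, which is what the warmer colors of the heat map encode.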
However, Anzick-1 clearly shows closer affinity to the indigenous peoples of Mesoamerica, Central and South America (collectively labeled SA) than to those of Canada and the American Arctic (labeled NA). No data were available from the USA.
This was pondered by the authors in several competing models of Native American ancestry:
Figure 3 | Simplified schematic of genetic models. Alternative models of the population history behind the closer shared ancestry of the Anzick-1 individual to Central and Southern American (SA) populations than Northern Native American (NA) populations; see main text for further definition of populations. We find that the data are consistent with a simple tree-like model in which NA populations are historically basal to Anzick-1 and SA. We base this conclusion on two D-tests conducted on the Anzick-1 individual, NA and SA. We used Han Chinese as outgroup. a, We first tested the hypothesis that Anzick-1 is basal to both NA and SA populations using D(Han, Anzick-1; NA, SA). As in the results for each pairwise comparison between SA and NA populations (Extended Data Fig. 4), this hypothesis is rejected. b, Next, we tested D(Han, NA; Anzick-1, SA); if NA populations were a mixture of post-Anzick-1 and pre-Anzick-1 ancestry, we would expect to reject this topology. c, We found that a topology with NA populations basal to Anzick-1 and SA populations is consistent with the data. d, However, another alternative is that the Anzick-1 individual is from the time of the last common ancestral population of the Northern and Southern lineage, after which the Northern lineage received gene flow from a more basal lineage.
The model they consider most plausible is “c”, in which Anzick-1 is close to the origin of the SA population, while NA diverged before him. However, model “d”, in which Anzick-1 is close to the overall Native American root but NA received further inputs from a mystery population (presumably some Siberians, related to the Na-Dené and Inuit waves), is also consistent with the data. Choosing between the two “consistent” models (or something in between) clearly requires further investigation.
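The D-tests quoted in the caption of Figure 3 can be illustrated in the same spirit. The sketch below assumes one common frequency-based formulation, D(W, X; Y, Z) = Σ(w - x)(y - z) / Σ(w + x - 2wx)(y + z - 2yz), which is not necessarily the exact estimator used by the authors, who also report block-jackknife Z-scores omitted here. A value near zero is compatible with the tree ((W, X), (Y, Z)); with Han as W and Anzick-1 as X, a clearly nonzero D(Han, Anzick-1; NA, SA) means that Anzick-1 shares more alleles with one of NA or SA than with the other.

```python
from typing import Sequence


def d_statistic(w: Sequence[float], x: Sequence[float],
                y: Sequence[float], z: Sequence[float]) -> float:
    """Frequency-based D(W, X; Y, Z), pooled over SNPs, without standard errors."""
    if not (len(w) == len(x) == len(y) == len(z)):
        raise ValueError("frequency vectors must have equal length")
    numerator = 0.0
    denominator = 0.0
    for wi, xi, yi, zi in zip(w, x, y, z):
        # Positive terms: W shares alleles with Y (or X with Z);
        # negative terms: W shares alleles with Z (or X with Y)
        numerator += (wi - xi) * (yi - zi)
        denominator += (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
    if denominator == 0:
        raise ValueError("no informative SNPs")
    return numerator / denominator


# Toy 0/1 frequencies (invented): two sites with opposite patterns cancel out
print(d_statistic([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]))
```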

Tianyuan and East Asian origins
All of the above is very much within expectations, though refreshingly clarifying. But there is something in the formal tests (Extended Data Fig. 5) that is most unexpected and is not discussed in the paper.
The formal f3 tests in ED Fig. 5a to 5e all fall within reasonable expectations. Perhaps the most notable finding is that, after all, the pre-Inuit Paleo-Eskimos (represented by the Saqqaq remains) left some legacy in Greenland, but they also show some extra affinity with several Siberian populations (notably the Naukan, Chukchi, Koryak and Yukaghir, in this order) before any other Native Americans, including Aleuts.
But the really striking stuff is in panels f and g, where it becomes obvious that the Tianyuan remains from Northern China show not the slightest greater affinity to East Asians (or to Native Americans) than to West Eurasians. Also, two East Asian populations (Tujia and Oroqen) are considerably more distant than the bulk of East Asian peoples both from Tianyuan and from Anzick.
Extended Data Figure 5 | Outgroup f3-statistics contrasted for different combinations of populations. (…) f, g, Shared genetic history with Anzick-1 compared to shared genetic history with the 40,000-year-old Tianyuan individual from China.
This is very difficult to explain, all the more so because Tianyuan’s mtDNA haplogroup B4’5 is part of the East Asian and Native American gene pool, and the authors make no attempt to explain it.
The previous study by Qiaomei Fu et al. (open access) placed Tianyuan’s autosomal DNA near the very root of Circum-Pacific populations (East Asians, Native Americans and Australasian Aborigines) but after divergence from West Eurasians:
From Qiaomei Fu 2013
They even had doubts about the position of Papuans (the only Australasian representatives) in that tree, which they suspected was an artifact of some sort.
Ever since I saw that graph (h/t to an anonymous commenter at Fennoscandian Ancestry), I have been racking my brain trying to figure out a reasonable explanation, considering that the formal f3 test almost certainly carries more weight than a tree produced by a maximum-likelihood algorithm.
My first tentative explanation would be to imagine a shared triple-branch origin for Tianyuan, East Asians and West Eurasians, maybe c. 60 Ka ago (it must have been before the colonization of West Eurasia), to the exclusion of other, perhaps isolated, ancient populations, whose admixture with the ancestors of the Tujia, Oroqen and Melanesians (maybe via Austronesians?) would cause those strikingly low affinity values.
This would be a mechanism similar to the one explaining the lower affinity of Tianyuan (and, generally, of all ancient Eurasians) for Palestinians (incl. Negev Bedouins) and the Makrani, who have some African admixture and, in the Palestinian case, most likely also residual inputs from the remnants of the first Out-of-Africa episode in Arabia.
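The proposed mechanism, in which admixture from an isolated deep lineage dilutes the drift a population shares with Tianyuan, can be checked with a toy simulation. Everything here is assumed for illustration: allele frequencies drift by binomial sampling along an invented tree in which a hypothetical "ghost" lineage splits from the root alongside the outgroup and the Eurasian ancestor, and an East Asian-like population receives a share of ghost ancestry. Drift amounts, population labels and the 30% share are arbitrary.

```python
import numpy as np


def drift(rng: np.random.Generator, p: np.ndarray, n: int) -> np.ndarray:
    """One round of binomial sampling of n gene copies: a crude model of drift."""
    return rng.binomial(n, p) / n


def outgroup_f3(o: np.ndarray, a: np.ndarray, b: np.ndarray) -> float:
    """Uncorrected outgroup f3(O; A, B) averaged over SNPs."""
    return float(np.mean((o - a) * (o - b)))


def simulate(n_snps: int = 50_000, ghost_share: float = 0.3, seed: int = 1):
    """Return (f3 with unadmixed East Asian-like pop, f3 with ghost-admixed pop)."""
    rng = np.random.default_rng(seed)
    root = rng.uniform(0.1, 0.9, n_snps)      # ancestral allele frequencies
    outgroup = drift(rng, root, 50)           # e.g. an African-like outgroup
    eurasian = drift(rng, root, 20)           # shared ancestor of Tianyuan and East Asians
    ghost = drift(rng, root, 20)              # hypothetical isolated deep lineage
    tianyuan = drift(rng, eurasian, 50)
    east_asian = drift(rng, eurasian, 50)
    # Admixed population: allele frequencies are a weighted mix of both sources
    admixed = (1 - ghost_share) * east_asian + ghost_share * ghost
    return (outgroup_f3(outgroup, tianyuan, east_asian),
            outgroup_f3(outgroup, tianyuan, admixed))


plain, diluted = simulate()
print(f"f3(O; Tianyuan, East Asian)          = {plain:.4f}")
print(f"f3(O; Tianyuan, ghost-admixed group) = {diluted:.4f}")
```

Under this model the outgroup f3 between Tianyuan and the admixed population drops by roughly the ghost share times the drift on the shared Eurasian branch, even though nothing about Tianyuan itself has changed.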
However, to this day we have no idea which those hypothetical ancient isolated populations of East Asia could have been. In standard comparisons such as ADMIXTURE analyses, the Tujia and Oroqen look totally normal within their geographic context, but this may be an artifact of not doing enough runs to reach the higher K values that, according to the cross-validation test, are much more likely to discern the actual, realistic components.
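Since the argument hinges on choosing K by cross-validation, here is a small helper sketch. It assumes log files from runs made with ADMIXTURE's `--cv` option, which print lines of the form `CV error (K=3): 0.51020`; the function names and example numbers are hypothetical. The lowest cross-validation error is a guide to the best-supported K, not proof that its components are historically real.

```python
import re

# Assumed log format produced by ADMIXTURE runs with --cv
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")


def parse_cv_errors(log_text: str) -> dict[int, float]:
    """Collect {K: cross-validation error} from concatenated ADMIXTURE logs."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors: dict[int, float]) -> int:
    """K with the lowest cross-validation error."""
    if not cv_errors:
        raise ValueError("no cross-validation errors found")
    return min(cv_errors, key=cv_errors.get)


# Invented example log lines, not real results
example_log = """CV error (K=2): 0.5210
CV error (K=3): 0.5102
CV error (K=4): 0.5150
"""
errors = parse_cv_errors(example_log)
print(errors, "-> best K:", best_k(errors))
```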
The matter certainly requires further research, which may well open new avenues for understanding the genesis of Eurasian populations, particularly those of the East.

Italian haploid genetics (second round)

More than a year ago I commented (as much as I could) on the study of Italian haploid genetics by Francesca Brisighelli et al. Sadly, the study was published with several major errors in the figures, making it impossible to get anything straight.
I know directly from the lead author that the team has been trying ever since to get the paper corrected, but the correction was delayed again and again by the apparent inefficiency of PLoS ONE’s management, much to their frustration. Finally, this week, the correction has been published and the figures corrected.
So let’s give this study another chance:
Francesca Brisighelli et al., Uniparental Markers of Contemporary Italian Population Reveals Details on Its Pre-Roman Heritage. PLoS ONE 2012 (formally corrected in February 2014). Open access (LINK) [doi:10.1371/journal.pone.0050794]
Please note that you have to read the formal correction in order to access the new figures; the wrong ones are still in the paper itself.
The corrected figures are central to the study:

Figure 1 (corrected). Map showing the location of the samples analyzed in the present study and those collected from the literature (see Table 1). Charts on the left display the distribution of mtDNA haplogroup frequencies, and those on the right the Y-chromosome haplogroup frequencies.

So now we know that the Northern mtDNA pie was duplicated in the original graph, and that Central Italians stand out for R0(xH,V), which reaches 14% (probably mostly HV*), while they have some other peculiarities relative to their neighbors to the North and South: somewhat less U and no detected V.
Other variations are more clinal: H decreases from North to South while J and T do the opposite.

Figure 3 (corrected). Phylogeny of Y-chromosome SNPs and haplogroup frequencies in different Italian populations.

On the Y-DNA side, the most obvious transition is between the high frequencies of R1b1a2-M269 (R1b3 in the paper) in the North and the much lower frequencies in the South. But also:
  • J2 is conspicuous in the Central region (and also in the South) but rare in the North.
  • G frequencies in the South are double those of the Center and North.
  • The same happens, less intensely, with E1b1b1-M35 (E3b in the study).
  • In contrast, haplogroup I is most common in the North. However, the Sardinian and sub-Pyrenean clade I2a1a-M26 (I1b2 in the paper), which is also the one documented in Chalcolithic Languedoc, is rare in all regions.

The study also deals with several isolated populations:

Figure 4. Haplogroup frequencies of Ladins, Grecani Salentini and Lucera compared to the rest of the Italian populations analyzed in the present study.

All of them show high frequencies of mtDNA H relative to their regions. The Grecani Salentini do have some extra Y-DNA E1b1b1 (E3b) and J2, which may indeed underline their partial Greek origins. The Ladini show unusually high frequencies of R1b*(xR1b1a2) and K*(xR1a,R1b,L,T,N3), while the Lucerans stand out for their percentage of G.
I want to end this entry with a much needed scolding to the staff of PLoS ONE for their totally unacceptable original sloppiness and delay in the correction. And my personal thanks and appreciation to Francesca Brisighelli for her indefatigable persistence and enthusiasm for her work, which is no doubt of great interest.

La Braña 1 carried the very rare Y-DNA haplogroup C (possibly C6-V20)

La Braña 1 without makeup
(Check for the updates below, please).

The late Epipaleolithic forager from NW Iberia (previously discussed here) carried the patrilineal haplogroup C6, found so far only very rarely among modern Europeans (Scozzari 2012). This, I must say, I know for the moment only from secondary sources (Eurogenes, Dienekes and a personal communication), because I have not yet been able to get my hands on the relevant paper and this key detail is not mentioned in the abstract.

Iñigo Olalde et al., Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European. Nature 2014. Pay per view (LINK) [doi:10.1038/nature12960]

Freely available supplementary materials.


Ancient genomic sequences have started to reveal the origin and the demographic impact of farmers from the Neolithic period spreading into Europe1, 2, 3. The adoption of farming, stock breeding and sedentary societies during the Neolithic may have resulted in adaptive changes in genes associated with immunity and diet4. However, the limited data available from earlier hunter-gatherers preclude an understanding of the selective processes associated with this crucial transition to agriculture in recent human evolution. Here we sequence an approximately 7,000-year-old Mesolithic skeleton discovered at the La Braña-Arintero site in León, Spain, to retrieve a complete pre-agricultural European human genome. Analysis of this genome in the context of other ancient samples suggests the existence of a common ancient genomic signature across western and central Eurasia from the Upper Paleolithic to the Mesolithic. The La Braña individual carries ancestral alleles in several skin pigmentation genes, suggesting that the light skin of modern Europeans was not yet ubiquitous in Mesolithic times. Moreover, we provide evidence that a significant number of derived, putatively adaptive variants associated with pathogen resistance in modern Europeans were already present in this hunter-gatherer.

Relevance for the overall understanding of macro-haplogroup C
Until the discovery of this C6 lineage, there were some strong reasons to suspect that Y-DNA C may have coalesced already in SE Asia or, at least, very close to it, with its subclades forming, in pairs, a three-pointed star geographically centered on that area: C1 and C3 in NE Asia (and America), C2 and C4 in Wallacea and Australasia, and C5 and some rather homogeneous C* in India.
The discovery of this C6 lineage and its confirmation as a Paleolithic one in Europe (i.e. not a “recent” arrival from somewhere else) add phylogenetic weight to the western geography of haplogroup C, one of the two main subdivisions of the main non-African Y-DNA lineage CF. However, we cannot yet reach conclusions about the “exact” origins of C, because the phylogenetic structure of the macro-lineage at its basal levels still needs to be resolved.
In plain language: it is quite likely that C2 and C4 form a monophyletic clade, and I would not be surprised at all if C1 and C3 do the same. But then it is also possible that C5 and the Indian C* and/or the European C6 also form their own distinct branches. It is even possible that some of these lineages are related across subcontinental regions, as was recently found within MNOPS (aka K(xLT)). So we first need to know how they relate to each other at the top phylogenetic level before we can rush to any conclusion. In any case, the discovery of C6 adds some preliminary weight to the hypothesis that C coalesced while still in South Asia.

Pigmentation genetics

There has been some rushing to conclusions about the pigmentation of this and another Western European hunter-gatherer based only on genetics. I think that some of these conclusions are most likely incorrect, at least to some extent, because they are based on a SNP that accounts for only ~15% of skin coloration.

Judging from the figures (freely accessible, it seems), La Braña 1 carried two pigmentation alleles, in genes SLC45A2 and SLC24A5, that are now rare among Europeans (but common elsewhere, i.e. the ancestral variants):

  • rs16891982 (SLC45A2), which affects hair color (7x the chance of black hair among Europeans)
  • rs1426654 (SLC24A5), which affects skin pigmentation to some degree (correlated with skin color in Indians, uninformative among modern Europeans because of fixation, and accounting for only ~15% of skin coloration in Cape Verdeans).

Notice that while you can find online reconstructions that give La Braña 1 a very dark complexion, this is not necessarily the case at all, but rather an oversimplistic interpretation based on a single allele, one that predominates not just among West Asians and Europeans but also, for example, among Gujaratis, who are quite dark by European standards.

    It does seem correct that this allele only reached Europe with Neolithic farmers (Stuttgart had it), but its alleged effect on pigmentation seems greatly exaggerated.

    Fig. 4 from Beleza 2013 highlights that no single gene is decisive in skin pigmentation.
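The point that no single locus is decisive can be made concrete with the standard additive formula for the variance contributed by one biallelic locus, 2p(1 - p)β², where p is the allele frequency and β the per-allele effect (assuming Hardy-Weinberg proportions and no dominance). The numbers below are arbitrary and only illustrate two points made above: a locus explaining ~15% of the variation leaves most of it to other genes, and an allele that is fixed in a population (p = 1) explains none of that population's variation, whatever its effect.

```python
def locus_variance(p: float, beta: float) -> float:
    """Additive variance from one biallelic locus: 2p(1-p)*beta^2.

    Assumes Hardy-Weinberg proportions and a purely additive per-allele effect.
    """
    return 2.0 * p * (1.0 - p) * beta ** 2


def fraction_explained(p: float, beta: float, total_variance: float) -> float:
    """Share of total phenotypic variance attributable to this one locus."""
    return locus_variance(p, beta) / total_variance


# Arbitrary units: same allele effect (beta = 1) in two hypothetical populations
total = 0.5 / 0.15   # chosen so the locus explains 15% where p = 0.5
print(fraction_explained(0.5, 1.0, total))   # intermediate frequency, admixed population
print(fraction_explained(1.0, 1.0, total))   # allele fixed: explains nothing
```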

    It is nonetheless probable that La Braña 1 had black hair.
    It is even more plausible that he had blue eyes, because eye color is regulated much more directly by simple genetics.
    Continuity of immunity genetics
    La Braña 1 also had three immunity-related alleles (derived variants) that have been retained, at least to some extent, by modern Europeans:
    • rs2745098 (PTX4)
    • rs11755393 (UHRF1BP1, related to lupus)
    • rs10421769 (GPATCH1)
    Comparison with global populations
    Extended Data Fig. 5 offers various comparisons of La Braña 1 and Mal’ta 1 (from Siberia) with modern humans from around the world:

    Extended Data Figure 5: Pairwise outgroup f3 statistics.
    a, Sardinian versus Karitiana. b, Sardinian versus Han.
    c, La Braña 1 versus Mal’ta. d, Sardinian versus Mal’ta.
    e, La Braña 1 versus Karitiana. The solid line represents y = x.
    In them we can see that La Braña 1 clusters well with modern Europeans, while Mal’ta instead leans strongly towards other Asians, often clustering with Pakistanis (“Central/South Asia” metapopulation).
    Maybe the most interesting graph is c, where we can see how the various populations deviate from the y=x line in the direction of La Braña (Europeans, West Asians) or Mal’ta (Native Americans particularly).
    Comparison with Neolithic samples and modern Europeans

    Extended Data Figure 4: Allele-sharing analysis.
    Each panel shows the allele-sharing of a particular Neolithic sample from refs 1 and 3 with La Braña 1 sample. The sample IDs are presented in the upper left of each panel (Ajv52, Ajv70, Ire8, Gok4 and Ötzi). In the upper right of each panel, the Pearson’s correlation coefficient is given with the associated P value.

    In all cases Swedes (SE), followed by Poles (PL), etc., share the most alleles with La Braña 1, although I’m not sure the differences are really that relevant (is 69.3% really significantly different from 68.7%?).
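Whether 69.3% differs meaningfully from 68.7% depends on the standard error of the difference, not on the raw percentages. A minimal sketch, assuming hypothetical per-SNP 0/1 indicators of whether each modern population shares the allele carried by La Braña 1, computes a delete-one-block jackknife over contiguous SNP blocks, the usual way ancient-DNA studies account for correlation between nearby SNPs. The function names and input format are assumptions, the data below are simulated, and the paper's own procedure may differ.

```python
import math
import random
from typing import Sequence


def block_jackknife_se(values: Sequence[float], n_blocks: int) -> float:
    """Delete-one-block jackknife standard error of the mean of `values`.

    Values are split into `n_blocks` contiguous blocks of equal size (any
    remainder at the end is dropped), standing in for genomic blocks that
    absorb the correlation between nearby SNPs.
    """
    size = len(values) // n_blocks if n_blocks > 0 else 0
    if n_blocks < 2 or size == 0:
        raise ValueError("need at least two non-empty blocks")
    used = list(values[: size * n_blocks])
    total = sum(used)
    block_sums = [sum(used[i * size:(i + 1) * size]) for i in range(n_blocks)]
    # Mean of the data with block i left out, for each i
    leave_one_out = [(total - s) / (len(used) - size) for s in block_sums]
    center = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((m - center) ** 2 for m in leave_one_out)
    return math.sqrt(variance)


def sharing_difference_z(shared_a: Sequence[float], shared_b: Sequence[float],
                         n_blocks: int = 100) -> float:
    """Z-score for the difference in allele-sharing rate between two populations."""
    if len(shared_a) != len(shared_b):
        raise ValueError("inputs must cover the same SNPs")
    diffs = [a - b for a, b in zip(shared_a, shared_b)]
    se = block_jackknife_se(diffs, n_blocks)
    size = len(diffs) // n_blocks
    used = diffs[: size * n_blocks]   # same SNPs as the jackknife uses
    return (sum(used) / len(used)) / se


# Simulated, independent SNPs with true sharing rates of 69.3% and 68.7%
rng = random.Random(42)
n_snps = 20_000
pop_a = [1.0 if rng.random() < 0.693 else 0.0 for _ in range(n_snps)]
pop_b = [1.0 if rng.random() < 0.687 else 0.0 for _ in range(n_snps)]
print(f"Z = {sharing_difference_z(pop_a, pop_b):.2f}")
```

The answer depends on how many SNPs are compared and how strongly neighboring SNPs are correlated, which is exactly what a block-based standard error is meant to capture.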
    On the vertical axis we can observe how the various populations lean more or less strongly towards the various Neolithic samples (again with the same doubts about the significance of the differences). In the first row they are compared with Gotland’s Pitted Ware individuals (of plausible Eastern European origins: strong cultural connections with the Dniepr-Don Neolithic). Here Central Europeans show the greatest affinity with Ajv52 and Ajv70 (Basques Bulgarians also score high). There are some differences in the case of individual Ire8, whose closest modern relatives seem to be the Dutch. Swedes score high only relative to Ajv52 and low relative to the others, while Finns score neutral to low relative to all of them.
    The lower row compares them with two mainstream Neolithic samples: Gok4 was a Megalithic farmer from SW Sweden and Ötzi was a Chalcolithic shepherd from South Tyrol. The Swedish farmer is best approached by the Dutch, followed by various West-Central Europeans, while Basques Bulgarians, Finns and Swedes score low here. In the case of Ötzi nobody scores particularly high (with some tendency in Switzerland and nearby areas), while Finns score clearly low.
    And that’s all I can say without direct access to the study. Enjoy.

    Update: I have now got the paper (thanks again to the donor); I will update as needed once I have time to read it. Minor urgent edits above in red (and struck-through text).

    Update (Jan 29): The supplementary data is freely available (LINK), but I could not find it earlier. Almost all the information is in it, including a long list (much longer than mentioned above) of the SNPs found in La Braña 1, compared with their frequencies in various modern populations. I don’t have time right now to dwell on it, but from a first read I guess I will have to amend some of the comments made above on the issue of pigmentation.

    Regarding the Y-DNA haplogroup, it is important to notice that its assignment within haplogroup C seems very clear, but its assignment to C6-V20 is more dubious because of the low quality of the genome. Only the V20 marker could be assigned, so the authors themselves are in doubt and wonder whether it could alternatively be C* or C5, both with a South Asian affinity.

    In this sense, I think it is worth noticing that the reference Y-DNA site ISOGG has recently revised the phylogeny of macro-haplogroup C and has already renamed C6-V20 as C1a2, making it a relative of the minor Japanese lineage earlier known as C1 (now renamed C1a1); similarly, South Asian C5-M356 has been renamed C1b. So C1 is now perceived as a lineage that spans all of Eurasia with an arguable South Asian centrality.

    Another (Papuan?) lineage once known as “C6” vanished from the phylogeny long ago because of a lack of multiple samples, I understand.


    Human Y chromosome undergoes purifying selection

    A somewhat technical yet interesting study on Y chromosome evolution in humans:

    Melissa A. Wilson Sayres et al., Natural Selection Reduced Diversity on Human Y Chromosomes. PLoS Genetics 2014. Open access (LINK) [doi:10.1371/journal.pgen.1004064]


    The human Y chromosome exhibits surprisingly low levels of genetic diversity. This could result from neutral processes if the effective population size of males is reduced relative to females due to a higher variance in the number of offspring from males than from females. Alternatively, selection acting on new mutations, and affecting linked neutral sites, could reduce variability on the Y chromosome. Here, using genome-wide analyses of X, Y, autosomal and mitochondrial DNA, in combination with extensive population genetic simulations, we show that low observed Y chromosome variability is not consistent with a purely neutral model. Instead, we show that models of purifying selection are consistent with observed Y diversity. Further, the number of sites estimated to be under purifying selection greatly exceeds the number of Y-linked coding sites, suggesting the importance of the highly repetitive ampliconic regions. While we show that purifying selection removing deleterious mutations can explain the low diversity on the Y chromosome, we cannot exclude the possibility that positive selection acting on beneficial mutations could have also reduced diversity in linked neutral regions, and may have contributed to lowering human Y chromosome diversity. Because the functional significance of the ampliconic regions is poorly understood, our findings should motivate future research in this area.

    Positive selection (or directional selection) happens when a variant is so advantageous that everything else becomes disadvantageous by comparison. This may simply be because an environmental change, possibly caused by migration (or any other reason), substantially alters the rules of the game. Much more rarely, a novel mutation (or an accumulation of several) may happen to generate a phenotype that is much fitter even under pre-existing conditions. As I understand it, positive selection happens only rarely (but spectacularly). An example in humans is the selection of lighter skin shades at latitudes far from the tropics (because of the “photosynthesis” of vitamin D in the skin, crucial for early brain development); another, more generalized one is the selection for improved brains (not necessarily just bigger), able to face changing conditions more dynamically and to develop more efficient tools and weapons.
    Purifying selection (or negative selection) is quite different and surely much more common. As novel mutations arise randomly, in many cases (the vast majority, I dare say) they happen to be harmful to a previously well-tuned genotype (and its derived phenotype). As a result, the carriers have decreased opportunities for reproduction, when they don’t just die right away. Natural selection acts mostly this way, and for this reason types can become very stable, as happens with lineages that have been successful on this planet since long before humankind arose, such as sharks or crocodiles.
    The latter is what seems to be happening to the human Y chromosome: novel mutations are quite often harmful (perhaps causing sterility or other traits that reduce a male’s reproductive success) and are regularly pruned off the tree by natural selection.

    Purifying selection slows down the effective mutation rate

    Interestingly the authors mention that:

    … if purifying selection is the dominant force on the Y chromosome, the topology of the tree should remain intact, but the coalescent times are expected to be reduced.

    That would be, I understand, because the observed mutation rate bears little relation to the actual accumulated (effective) mutation rate, which is much slower because of the continuous pruning by negative selection.
    Purifying selection has also been observed in the mitochondrial DNA, having the same kind of slowing impact on the “molecular clock”.
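The slowing described here can be illustrated with a deliberately simple toy model, assumed for illustration and not taken from the paper: a fraction f of new mutations is slightly harmful and is purged with a fixed probability s in each later generation, while the rest are neutral. Over T generations the expected number of mutations still present is μ(1 - f)T + μf[1 - (1 - s)^T]/s. Counted over a few generations (as in pedigree studies), the apparent rate includes harmful mutations not yet removed; counted over long spans, it approaches the neutral rate μ(1 - f). Parameter values are arbitrary.

```python
def apparent_rate(mu: float, deleterious_fraction: float, purge_prob: float,
                  generations: int) -> float:
    """Apparent mutations per generation along a lineage sampled today (toy model).

    Assumptions (illustrative only): each generation adds `mu` new mutations on
    average; a fraction `deleterious_fraction` of them is slightly harmful and is
    purged with probability `purge_prob` in each later generation, while the rest
    are neutral and persist.
    """
    if generations < 1 or not 0 < purge_prob <= 1:
        raise ValueError("need generations >= 1 and 0 < purge_prob <= 1")
    neutral = mu * (1 - deleterious_fraction) * generations
    # A harmful mutation that arose t generations ago is still present with
    # probability (1 - purge_prob)**t; summing over t = 0..generations-1 gives:
    still_present = (mu * deleterious_fraction
                     * (1 - (1 - purge_prob) ** generations) / purge_prob)
    return (neutral + still_present) / generations


for g in (1, 10, 100, 1_000, 10_000):
    rate = apparent_rate(mu=1.0, deleterious_fraction=0.4, purge_prob=0.01, generations=g)
    print(g, round(rate, 3))
```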

    Posted by on January 26, 2014 in evolution, human evolution, molecular clock, Y-DNA


    Ancient European DNA and some debatable conclusions

    There is a rather interesting paper, still in preparation, that is available online and causing some debate.
    Iosif Lazaridis, Nick Patterson, Alissa Mittnik, et al., Ancient human genomes suggest three ancestral populations for present-day Europeans. bioRxiv 2013 (preprint). Freely accessible (LINK) [doi:10.1101/001552]


    Analysis of ancient DNA can reveal historical events that are difficult to discern through study of present-day individuals. To investigate European population history around the time of the agricultural transition, we sequenced complete genomes from a ~7,500 year old early farmer from the Linearbandkeramik (LBK) culture from Stuttgart in Germany and an ~8,000 year old hunter-gatherer from the Loschbour rock shelter in Luxembourg. We also generated data from seven ~8,000 year old hunter-gatherers from Motala in Sweden. We compared these genomes and published ancient DNA to new data from 2,196 samples from 185 diverse populations to show that at least three ancestral groups contributed to present-day Europeans. The first are Ancient North Eurasians (ANE), who are more closely related to Upper Paleolithic Siberians than to any present-day population. The second are West European Hunter-Gatherers (WHG), related to the Loschbour individual, who contributed to all Europeans but not to Near Easterners. The third are Early European Farmers (EEF), related to the Stuttgart individual, who were mainly of Near Eastern origin but also harbored WHG-related ancestry. We model the deep relationships of these populations and show that about ~44% of the ancestry of EEF derived from a basal Eurasian lineage that split prior to the separation of other non-Africans.

    Haploid DNA
    The Loschbour skull. The prominent browridge is very unusual for Paleolithic Europeans.
    The new European hunter-gatherer samples all carried (with one uncertain case) Y-DNA I, and mtDNA U5 or U2e.
    More specifically, the hunter-gatherer mtDNA lineages are:
    • Loschbour (Luxembourg): U5b1a
    • Motala (Sweden):
      • Motala 1 & 3: U5b1a
      • Motala 2 & 12: U2e1
      • Motala 4 & 6: U5a2d
      • Motala 9: U5a2
    Additionally the Stuttgart Linear Pottery farmer (female) carried the mtDNA lineage T2c1d1.
    The Y-DNA lineages are:
    • Loschbour: I2a1b*(xI2a1b1, I2a1b2, I2a1b3)
    • Motala 2: I*(xI1, I2a2,I2a1b3)
    • Motala 3: I2*(xI2a1a, I2a2, I2b)
    • Motala 6: uncertain (L55+ would make it Q1a2a but L232- forces it out of Q1)
    • Motala 9: I*(xI1)
    • Motala 12: I2a1b*(xI2a1b1, I2a1b3)
    These are certainly the oldest Y-DNA sequences from Europe so far, and the fact that all of them fall within haplogroup I(xI1) supports the notion that this lineage was once common in the subcontinent, at least in some areas. Today I2 is most common in Sardinia, the NW Balkans (Croatia, Bosnia, Montenegro), North Germany and areas around Moldavia.
    I2a1b (which may well include all of them) is currently found (often at high frequencies) in the Balkans and Eastern Europe, with some presence also in the eastern areas of Central Europe. Its relative I2a1a is most common in Sardinia, with some presence in SW Europe, especially around the Pyrenees. I2a1 (probably I2a1a, but not tested for the relevant SNPs) was also found, together with G2a, in a Chalcolithic population of the Treilles group (Languedoc) and seems to be somehow associated with Cardium Pottery Neolithic.
    If you want my opinion, I would think that before the Neolithic I2a was dominant, like mtDNA U5 (and its satellites U4 and U2e), in much of Central and Eastern Europe, but probably not in SW Europe, where mtDNA U5 does not seem to be so hyper-dominant either, being instead quite secondary to haplogroup H (at least in Western Iberia). But we’ll have to wait until geneticists manage to sequence Y-DNA from several SW European Paleolithic remains to be sure.

    Autosomal DNA and derived speculations
    Most of the study (incl. the must-read supplemental materials), however, deals with the autosomal DNA of these and other hunter-gatherers, as well as of some Neolithic farmers from Central Europe and Italy (Ötzi), and with their comparison with modern Europeans.
    To begin with, they generated a PCA plot of West Eurasians (with way too many pointless Bedouins and Jews, it must be said) and projected the ancient Europeans, as well as a whole bunch of Circum-Pacific peoples on it:
    The result is a bit weird because, as you can see, the East Asians, Native Americans and Melanesians appear to fall way too close to the peoples of the Caucasus and Anatolia. This seems to be a distorting effect of the “projection” method, which forces the projected samples to align relative to a set of already defined parameters, in this case the West Eurasian (modern) PCA. 
    So the projection basically formulates the question: if East Asians, etc. must be forcibly defined in West Eurasian (WEA) terms, what would they be? And then it answers as follows: more or less Caucasian/Anatolian/Iranian peoples (whatever the hidden reasons, which are not too clear).
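The projection step can be sketched to show why it can distort outsiders. In this minimal version, assumed for illustration (real genotype PCA software handles missing data and scaling differently), principal axes are fitted on a reference panel only, and new samples are centered on the panel mean and multiplied by those axes. Any way in which the new samples differ from the panel along directions the axes do not capture is simply discarded. The toy "panel" below varies along two of three dimensions, and the outsider differs only along the third.

```python
import numpy as np


def fit_pca(reference: np.ndarray, k: int = 2):
    """Mean and top-k principal axes of a reference panel (rows = individuals)."""
    mean = reference.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:k]


def project(samples: np.ndarray, mean: np.ndarray, axes: np.ndarray) -> np.ndarray:
    """Coordinates of samples on the reference axes; the axes are not refitted."""
    return (samples - mean) @ axes.T


rng = np.random.default_rng(0)
# Toy panel of 200 "individuals" varying along the first two of three dimensions
reference = np.column_stack([rng.normal(0, 3, 200), rng.normal(0, 1, 200), np.zeros(200)])
mean, axes = fit_pca(reference)
# An outsider that differs from the whole panel only along the third dimension
outsider = mean + np.array([0.0, 0.0, 10.0])
print(project(outsider[None, :], mean, axes))
```

The outsider, although far from every reference sample, lands at the center of the reference plot, a simplified analogue of distant populations being pulled into the middle of a West Eurasian PCA.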
    Similarly, it is possible (but uncertain) that the ancient European and Siberian sequences show some of this kind of distortion. However, I have found experimentally that the PCA’s dimension 1 (but not dimension 2, which largely corresponds to Asian-specific distinctions) still correlates quite well with the results of other formal tests that the authors carry out in the study, and is therefore a valuable visualization tool.
    But more on this later. For now, by projecting ancient European and Siberian samples onto the West Eurasian plot, the PCA is asking and answering three or four questions:
    • If ancient Siberians are forced to be defined in modern WEA terms, what would they be? Answer: roughly Mordvins (Afontova Gora 2) or intermediate between these and North Caucasus peoples (Mal’ta 1).
    • If ancient Scandinavian hunter-gatherers are forced into modern WEA terms, what would they be? Answer: extreme, but closest (Skoglund) to Northern European peoples like Icelanders or Lithuanians.
    • If ancient Western European hunter-gatherers are forced into modern WEA terms, what would they be? Answer: extreme too, but closest (La Braña 2) to SW European peoples like Basques and Southern French.
    • If ancient Neolithic/Chalcolithic farmers from around the Alps and Sweden are forced into modern WEA terms, what would they be? Answer: Canarians (next closest: Sardinians, then Spaniards).
    Whatever the case, there seems to be quite a bit of autosomal diversity among ancient Western hunter-gatherers, at the very least when compared with modern peoples. This makes good sense because Europe was already a big place in Paleolithic times and must have harbored notable diversity, diversity that we may well fail to grasp if we keep sampling people from the same areas again and again.
    On the other hand, they seem to cluster at the same extreme periphery of the European cluster, opposite to the position of West Asians, which suggests that there has been some West Asian gene flow into Europe since then (something we all assume, of course). 
    Using Loschbour as proxy for the WHG (Western hunter-gatherer) component, Mal’ta 1 as proxy for the ANE (Ancient North Eurasian) one and Stuttgart as proxy for the EEF (Early European Farmer) one, they produce the following graph (to which I added an important note in gray):
    The note in gray is mine: it highlights the contradictory position in which the other Western hunter-gatherers may end up when Loschbour is assumed to be a valid proxy, even though it is clearly very extreme. This was not tested in the study, so it is inferred from PC1, which seems to best approximate the results of their formal tests on the WHG vs EEF axis, as well as those of the WHG vs Near East comparisons.
    I tried to figure out how, if at all, these formal tests are reflected in the PCA, mostly because the PCA is a much easier tool to understand, being so visual. Eventually I found that dimension 1 (horizontal axis) is very close to the genetic distances measured by the formal tests (except those for the ANE component, obviously), which allows us to visualize some of the possible problems caused by their use of Loschbour as the only reference, without any control. Let’s see it:
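As a rough geometric stand-in for the three-proxy model (the study itself used formal tests, not this), a sample can be expressed as a weighted mix of three source points on a 2-D PCA plot by solving for barycentric weights. This sketch assumes that admixed samples fall approximately at the weighted average of their sources in PC space; the coordinates are hypothetical.

```python
# Hypothetical 2-D PCA coordinates for the three proxies (invented for illustration).
SOURCES = {
    "WHG": (0.0, 0.0),
    "EEF": (4.0, 0.0),
    "ANE": (0.0, 4.0),
}


def three_way_weights(p, a, b, c):
    """Barycentric weights (w_a, w_b, w_c), summing to 1, with p = w_a*a + w_b*b + w_c*c."""
    denom = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if denom == 0:
        raise ValueError("the three sources are collinear; no unique solution")
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / denom
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / denom
    return w_a, w_b, 1.0 - w_a - w_b


sample = (1.0, 1.0)  # a hypothetical modern population
print(dict(zip(SOURCES, three_way_weights(sample, *SOURCES.values()))))
# {'WHG': 0.5, 'EEF': 0.25, 'ANE': 0.25}

# Same sample, but with a less extreme (hypothetical) WHG proxy:
print(three_way_weights(sample, (2.0, 0.0), SOURCES["EEF"], SOURCES["ANE"]))
```

In the second call the EEF weight comes out negative, meaning the sample no longer sits inside the triangle of proxies: one simple way to see why the choice of WHG proxy matters so much.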

    The same PCA as above with a few annotations in magenta and green
    Although not exact, the dashed vertical magenta line (the PC1 midpoint between Loschbour and Stuttgart) approximates the WHG vs EEF values measured in the formal tests quite well. Similarly, the dashed green axis (the PC1 midpoint between Loschbour and a typical-looking Bedouin) approximates to a great extent the less precise results of the formal tests the authors applied to guesstimate the West Asian and WHG ancestry of EEFs, which ranged between 60% and almost 100% West Asian (my line is much closer to the 60% value, which seems more reasonable). 
    When I tried to find an alternative WHG/West Asian midpoint line, using La Braña 2 and the first non-Euro-drifted Turk I could spot (Anatolians are much more likely than Bedouins to represent the direct source of West Asian ancestry in Europe), I got exactly the same result, so there was no need to plot a second option (two wrongs sometimes do make a right, it seems). But when I did the same with La Braña 2 and Stuttgart, I got a genuinely good-looking alternative midpoint line, which is the dash-dot magenta axis.
    This alternative line is in fact probably a much more reasonable approximation of 50% WHG-EEF, and it goes right through Spain, which makes good sense as far as I know.
    Of course, the ideal solution would be for someone to perform good formal tests, similar to those done in the study, with La Braña 2 and/or Skoglund, which should be more similar to the actual WHG ancestry of modern Europeans than the extremely divergent Loschbour sequence. An obvious problem is that La Braña produced only very poor sequences but, well, use Skoglund instead or sample some other Franco-Cantabrian or Iberian Paleolithic remains.
    Whatever the solution, I think that we do have a problem with the use of Loschbour as the only WHG proxy and that it demands some counter-testing. 
    What about the ANE component? I do not dare to give any alternative opinion because I lack the tools to counter-analyze it. What seems clear is that its influence on modern Europeans is almost uniformly weak and that it can be ignored for the most part. As happens with the WHG, it’s quite possible that the ANE signal would be enhanced if the sequence from Afontova Gora were used instead of that of Mal’ta, but I can’t foresee by how much. 
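The midpoint lines discussed above amount to simple linear interpolation along PC1. A minimal sketch, assuming a sample is a two-way mix lying on the straight line between two references; the coordinates are made up for illustration, not read off the figure:

```python
def pc1_midpoint(x_a, x_b):
    """The 50% line between two references on PC1."""
    return (x_a + x_b) / 2.0


def share_of_b(x, x_a, x_b):
    """Interpolated share of reference b in a sample at PC1 position x.

    Assumes a two-way mix lying on the line between the references;
    values below 0 or above 1 mean the sample lies beyond one of them.
    """
    if x_a == x_b:
        raise ValueError("references coincide on PC1")
    return (x - x_a) / (x_b - x_a)


# Hypothetical PC1 coordinates (arbitrary units).
EXTREME_WHG, MILDER_WHG, EEF = -6.0, -3.0, 2.0
sample = -1.0

print(pc1_midpoint(EXTREME_WHG, EEF), share_of_b(sample, EXTREME_WHG, EEF))  # -2.0 0.625
print(pc1_midpoint(MILDER_WHG, EEF), share_of_b(sample, MILDER_WHG, EEF))    # -0.5 0.4
```

Swapping an extreme WHG reference for a milder one moves the 50% line and changes the estimated EEF share of the very same sample, from 62.5% to 40% in this toy case.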
    Finally some speculative food-for-thought. Again using the visual tool of the PCA, I spotted some curiosities:

    Speculative annotations on the PCA

    Most notably, it is apparent that the two WHG populations (Western and Scandinavian) are aligned along natural axes, which seem to act as clusters. When both are extended (dotted lines), they converge at a point closest to some French individuals, notably the only “French” one that tends towards “Southern France” and the Basques. So I wonder: is it possible that these two WHG cluster-lines represent ancient branches derived from an original population of SW France? We know that from the LGM onwards the area of Dordogne (Perigord) was like the megalopolis of Paleolithic Europe, with population densities that must have been several times those of other areas. We know that this region was at the origin of both the Solutrean and Magdalenian cultures and probably still played an important role in the Epipaleolithic period. 
    So I do wonder: is that “knot” a mere artifact of a mediocre representation or is it something much more real? Only with due research in the Franco-Cantabrian region will we find out. 
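The “knot” is simply the intersection of two lines fitted through the two WHG clusters. A minimal sketch with hypothetical PC coordinates; fitting PC2 on PC1 by ordinary least squares is my simplifying choice (a total least-squares fit would treat both axes symmetrically):

```python
def fit_line(points):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    if sxx == 0:
        raise ValueError("vertical cluster: fit x on y instead")
    b = sxy / sxx
    return my - b * mx, b


def convergence_point(cluster_1, cluster_2):
    """Where the extended cluster-lines meet, or None if they are parallel."""
    a1, b1 = fit_line(cluster_1)
    a2, b2 = fit_line(cluster_2)
    if b1 == b2:
        return None
    x = (a2 - a1) / (b1 - b2)
    return x, a1 + b1 * x


# Hypothetical (PC1, PC2) positions for the two WHG clusters.
western = [(0, 1), (1, 3), (2, 5)]
scandinavian = [(0, 4), (2, 2), (4, 0)]
print(convergence_point(western, scandinavian))  # (1.0, 3.0)
```

Whether real modern samples sit near such a point, and whether that means anything, is exactly the open question raised above.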

    Siberian haploid DNA

    A new study is available with plenty of data on the haploid genetics of Siberian populations with focus on Tungusic peoples.
    Anna T. Duggan et al., Investigating the Prehistory of Tungusic Peoples of Siberia and the Amur-Ussuri Region with Complete mtDNA Genome Sequences and Y-chromosomal Markers. PLoS ONE 2013. Open access LINK [doi:10.1371/journal.pone.0081605]
    Maybe the most informative graphic is Fig. 1, which shows the distribution of mitochondrial DNA haplogroups:
    Figure 1. Map of Siberia showing approximate locations of sampled populations and their basic haplogroup composition.
    For the meaning of abbreviations, check table 1.
    Typical NE Asian haplogroups like C and D are so widely distributed that it is difficult to say much about them. A, by contrast, is more concentrated (Nyukhza and Iengra, both of them Evenks, and particularly Koryaks), while Z appears to show a similar pattern (but with a presence in Kamchatka instead of among Koryaks, and a relevant distribution in NE Siberia: Berezovka and some Yakuts). 
    Haplogroup B, on the other hand, is rare, showing up only among Southern Yakuts. It must be mentioned anyway because of its relevance to the original peopling of America. 
    G is not too common, with the partial exception of G1, which shows an Eastern Siberian concentration.
    Y is concentrated among Nivkhs (no surprise here), while F seems most important in Yakutia (like B, it is not a typical northern lineage, its bulk distribution lying further south).
    West Eurasian lineages, marked in brown, are concentrated among the Evens of Nyukhza, as well as among some Yakuts. Their presence among Yakuts is easy to understand considering their partial Turkic ancestry, but the even larger share among the Nyukhza seems to me to derive from some other kind of contact with the Altai and the steppe, although the authors seem to favor Yakut admixture instead.
    Pre-emptive FAQ: 
    What is the difference between “M_N” and “Other”? 
    No idea: ask the authors. But I’m quite positive that “Other” cannot mean L(xM,N) but rather “other M and N”. Speculatively, it could indicate the difference between some M and N sublineages they have tested for and others which they did not. It’s sloppy nomenclature in any case.

    [Important post-script note: except for the basal SNP markers for C and N, which were tested, all the haplogroups are defined based on STR markers, which may lead to misassignments.]

    Table 4 lists the Y-DNA haplogroups for Evenks, Evens, Yakuts and Yukaghirs only. C3c1 is strongly dominant in the Tungusic populations: 87/127 among Evenks and 43/89 among Evens, but the very opposite among Yakuts (1/184) and rather weak among Yukaghirs too (2/13).
    Yakuts are dominated by N1c (173/184), a lineage that also has some presence among the other sampled populations: Evenks 18/127 (Nyukhza and Iengra groups), Evens 30/89 (particularly the Sakkyryyr and Sebjan groups), Yukaghirs 4/13.
    Q1 is found mostly among Yukaghirs (4/13), with a single other case among Yakuts.
    N1b is also of some importance among Tungusic peoples: 18/127 among Evenks (Taimyr and Stony Tunguska) and 13/89 among Evens (essentially in Tompo).
    C3* is found mostly among Nyukhza Evens (13/78), who also harbor most of the Western lineage I detected in the area (4/78). 
    The other meaningful Western lineage spotted is, of course, R1a, which is found in two variants: R1a(xR1a1) is concentrated among Taimyr Evenks (3/18), with only one other sample among Stony Tunguska Evenks (1/40). R1a1, in contrast, is concentrated among Yakuts (4/184).
    There are also erratics (isolated single-individual samples) of C*, J2, O and F*.
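Several of the Y-DNA counts above come from small samples (only 13 Yukaghirs), so a rough uncertainty estimate helps when comparing them. A minimal sketch using the counts quoted in the text and a Wilson score interval; the interval is my addition, not something the study reports, and it only reflects sampling noise, not possible misassignment from STR-based haplogroup calls:

```python
import math

# Counts as summarised in the text above (Table 4 of Duggan et al. 2013).
Y_COUNTS = {
    "Evenks":    {"n": 127, "C3c1": 87, "N1c": 18, "N1b": 18},
    "Evens":     {"n": 89,  "C3c1": 43, "N1c": 30, "N1b": 13},
    "Yakuts":    {"n": 184, "C3c1": 1,  "N1c": 173, "R1a1": 4, "Q1": 1},
    "Yukaghirs": {"n": 13,  "C3c1": 2,  "N1c": 4,  "Q1": 4},
}


def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for the proportion k/n."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


for pop, counts in Y_COUNTS.items():
    n = counts["n"]
    for hg, k in counts.items():
        if hg == "n":
            continue  # skip the sample-size entry
        lo, hi = wilson_interval(k, n)
        print(f"{pop:10} {hg:5} {k:>3}/{n:<3} {k / n:6.1%}  95% CI {lo:.1%}-{hi:.1%}")
```

For the Yukaghirs, 4/13 is compatible with anything from roughly 13% to 58%, so their figures are best read as indicative only.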
    There is also other interesting material in the study, but I can only stretch this post so far. I strongly recommend reading it to anyone interested in Siberian and related populations, be they Uralic peoples, Native Americans or East and Central Asians in general.

    Posted by on December 21, 2013 in East Asia, mtDNA, population genetics, Siberia, Y-DNA


    Ancient East Asian Y-DNA maps

    I’m fusing here data from two different and complementary sources:
    • Hui Li et al. Y chromosomes of prehistoric people along the Yangtze River. Human Genetics 2007. → LINK (PDF) [doi:10.1007/s00439-007-0407-2]
    • A 2012 study written entirely in Chinese (so entirely that I don’t even know who the authors are → LINK), but whose content was discussed in English (after machine translation) at the Eurogenes blog. It deals with a variety of ancient Y-DNA from the northern parts of P.R. China.

    Update (Dec 25): much of the Northeastern aDNA is also discussed in an English-language study (h/t Kristiina):

    Yinqiu Cui et al. Y Chromosome analysis of prehistoric human populations in the West Liao River Valley, Northeast China. BMC Evolutionary Biology 2013. Open access LINK [doi:10.1186/1471-2148-13-216]

      Combining the data from both sources, I produced the following maps:

      Neolithic (before ~4000 BP):

      Metal Ages (after ~4000 BP):

      I find the first map particularly interesting because it outlines what seem to be three distinct ethnic (or at the very least genetic) regions in the Neolithic period:
      • A Central-South region dominated by O3
      • An Eastern area around modern Shanghai dominated by O1
      • A Northern region dominated by N
      Later on, in the Metal Ages, a colonization of the North/NE by these O3 peoples seems apparent, followed, probably at a later time, by a colonization of the West (Taojiazhai).
      We do not have such ancient data for the West, but we can still see a diversity of lineages, notably Q (largely, if not entirely, Q1), C (most likely C3, also present in the NE) and N (also present in the NE). While the arrival of O3 in this area was probably late, the arrival of R1a1a is quite old; however, it is still almost certainly related to the first Indo-European migrations eastwards, which founded the Afanasevo culture in the Altai area.
      I also find very interesting the presence, often locally dominant, of N (including an instance of N1c) and Q in the northern parts of P.R. China, because these lineages are now rather uncommon there but are still dominant in Northern Asia, Northeastern Europe and Native America. The fact that they were still so important on the northern Chinese frontier in the Neolithic and even in the Metal Ages should tell us something about their respective histories and, in the case of N, its origins as well.
      It is also notable that no D was detected anywhere. However, the regions with the greatest D frequencies, like Tibet, Yunnan or Japan, were not studied.

      Posted by on December 15, 2013 in aDNA, Bronze Age, China, East Asia, Iron Age, Neolithic, Y-DNA