Category Archives: Finnic peoples

A review of haplogroup N (Y-DNA)

Haplogroup N (Y-DNA) is spread from the Baltic to the South China Sea, being one of those rare genetic links between East and West Eurasia (other than ultimate common ancestry) and one of the two Y-DNA lineages which expanded across Northern Eurasia (the other one being Q).
While it is apparent to me and many others that the lineage originated in East Asia and expanded first Northwards to Siberia and later Westwards to Europe, I have sometimes found reluctance to accept this fact or difficulty understanding why. Some of the data of this paper may be of help in this regard. It is also a good exercise for those learning to understand how haploid genetics can be decoded into a meaningful pattern that reveals key parts of the untold history of peoples.
Hong Shi et al., Genetic Evidence of an East Asian Origin and Paleolithic Northward Migration of Y-chromosome Haplogroup N. PLoS ONE 2013. Open access → LINK [doi:10.1371/journal.pone.0066102]


The Y-chromosome haplogroup N-M231 (Hg N) is distributed widely in eastern and central Asia, Siberia, as well as in eastern and northern Europe. Previous studies suggested a counterclockwise prehistoric migration of Hg N from eastern Asia to eastern and northern Europe. However, the root of this Y chromosome lineage and its detailed dispersal pattern across eastern Asia are still unclear. We analyzed haplogroup profiles and phylogeographic patterns of 1,570 Hg N individuals from 20,826 males in 359 populations across Eurasia. We first genotyped 6,371 males from 169 populations in China and Cambodia, and generated data of 360 Hg N individuals, and then combined published data on 1,210 Hg N individuals from Japanese, Southeast Asian, Siberian, European and Central Asian populations. The results showed that the sub-haplogroups of Hg N have a distinct geographical distribution. The highest Y-STR diversity of the ancestral Hg N sub-haplogroups was observed in the southern part of mainland East Asia, and further phylogeographic analyses supports an origin of Hg N in southern China. Combined with previous data, we propose that the early northward dispersal of Hg N started from southern China about 21 thousand years ago (kya), expanding into northern China 12–18 kya, and reaching further north to Siberia about 12–14 kya before a population expansion and westward migration into Central Asia and eastern/northern Europe around 8.0–10.0 kya. This northward migration of Hg N likewise coincides with retreating ice sheets after the Last Glacial Maximum (22–18 kya) in mainland East Asia.

Hong Shi has previously produced very interesting materials and this is no exception. However, I find the use of chronological guesstimates as if they were objective findings, treated as part of the central discourse (and not as the mere side note where they belong), a bit nauseating and a cause of confusion.

Figure 4. Proposed prehistoric migration routes for Hg N lineage.
(the pattern is correct but the dates are mere hunches, not any sort of objective facts)

Above we can see the reconstructed pattern of expansion of Y-DNA N in three phases. In my understanding the dates are not way off, although I can only imagine that there is still room for improvement, especially regarding the “red” phase. After all, NO may have split c. 60 Ka ago and the main branch, O, c. 50 Ka BP, not the mere 25-30 Ka that Shi calculated (in a previous study but mentioned again here).
But the really interesting part is not molecular-clock-o-logy but this:

Figure 3. Median-joining networks for sub-haplogroups of Hg N lineage using Y-STR alleles.

The diagnostic mutations used to classify the sub-haplogroups are labeled on the tree branches. Each node represents a haplotype and its size is proportional to the haplotype frequency, and the length of a branch is proportional to the mutation steps. The colored areas indicate the geographic origins of the studied populations or language groups.

Here we can appreciate, with the labyrinthine limitations of the use of (too few?) STR markers, the apparent structure of the various haplogroups and paragroups under N. We can also see the STR diversity in numerical terms:

Table 3. Y-STRs diversity of Hg N sub-haplogroups.

Sadly the category “Han Chinese” is almost useless and one wonders why Shi et al. changed from the North/South polarity in the key paragroup N* to such a confusing terminology in N1.
In any case, it is quite evident that N arose in South China, spread, already as N1, to NE Asia and, later, some of that N1 (N1c mostly but also some N1b) spread Westwards, reaching Finland and other Eastern European populations. In the haplotype graph we can appreciate a distinct European-specific branch within N1b.
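The diversity argument can be made concrete with a small sketch. The function below computes Nei's unbiased haplotype diversity, which is one common measure and not necessarily the statistic reported in Table 3. The haplotypes and region names are hypothetical placeholders, not data from the paper.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical Y-STR haplotypes: repeat counts at four made-up loci.
south = [(14, 12, 23, 10), (15, 12, 23, 10), (14, 13, 24, 11), (16, 12, 22, 10)]
north = [(14, 12, 23, 10), (14, 12, 23, 10), (14, 12, 23, 10), (15, 12, 23, 10)]

for name, sample in (("south", south), ("north", north)):
    print(name, round(haplotype_diversity(sample), 3))
```

In this toy case every southern haplotype is different while the northern sample is dominated by a single one, so the southern value is higher. That is the kind of contrast usually read as pointing to the older, source population.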

Update (Jul 28): some new findings (not considered in the study) and updated nomenclature.
See comments’ section for greater details. Special thanks to Palamede for his efforts in clarifying the matter.
Commercial testing company FTDNA has recently detected some new markers within haplogroup N1 that alter the phylogeny. A synthesis of these findings can be seen in this graph.
This new nomenclature was adopted by ISOGG but the study discussed here does not include it, using instead a 2011 nomenclature. Hence we must understand that:
  • N* and N1* remain as such
  • “N1a” (M128) is now known as N1c2a
  • “N1b” (P43) is now N1c2b
  • “N1c” (M46/Tat) is now N1c1
Therefore the N1 tree splits as:
  • N1a (new clade, P189)
  • N1b (new clade, L732)
  • N1c (new clade including all previously named subhaplogroups)
    • N1c1 (M46/Tat, former N1c)
    • N1c2 (new clade, L666)
      • N1c2a (M128, former N1a)
      • N1c2b (P43, former N1b)
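For anyone cross-checking the paper's tables against the newer tree, the renaming above can be written down as a simple lookup. This is only a convenience sketch: the names `OLD_TO_NEW` and `to_new_label` are mine, and the mapping covers just the labels listed in this update.

```python
# Labels as used in the paper (2011 nomenclature) -> newer ISOGG-style labels.
OLD_TO_NEW = {
    "N*": "N*",      # unchanged
    "N1*": "N1*",    # unchanged
    "N1a": "N1c2a",  # M128
    "N1b": "N1c2b",  # P43
    "N1c": "N1c1",   # M46/Tat
}


def to_new_label(old_label):
    """Translate a label from the paper into the newer nomenclature."""
    try:
        return OLD_TO_NEW[old_label]
    except KeyError:
        raise KeyError(f"no mapping known for {old_label!r}") from None


print(to_new_label("N1b"))
```

Failing loudly on unknown labels is deliberate: the new N1a and N1b (P189 and L732) share their names with old clades, so a silent fallback would invite exactly the confusion this list tries to avoid.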
As far as I could gather, N1(xN1c) is so far only clearly represented by two FTDNA-tested singletons: a Slovakian (N1a) and someone of Polish surname (N1b1). However I may be missing some details. Whatever the case, it is possible that, unless more samples show up in these groupings, the tree may later be reverted to the original state (or something in between) because isolated individuals or families do not haplogroups make.
Also it is important to understand that commercial DNA testing companies have very unbalanced samples, clearly dominated by people of NW European (and to a lesser extent other European) ancestry, which is not too useful when discerning what is where, sometimes producing the false impression of greater European diversity just because of the greater number of samples.
On the other hand, the Hong Shi data reported above clearly show a great number (and diversity) of East Asians within N1*, so the most likely conclusion is that the few Europeans within N1* are mere erratics within clades of East Asian origin, surely brought Westward by the overall N1 tide.
So in essence the conclusions of the paper remain unchallenged.

Autosomal DNA of NE Europeans

A paper of some interest is available these days at the Public Library of Science:
Andrey V. Khrunin et al., A Genome-Wide Analysis of Populations from European Russia Reveals a New Pole of Genetic Diversity in Northern Europe. PLoS ONE 2013. Open access → LINK [doi:10.1371/journal.pone.0058552]


Several studies examined the fine-scale structure of human genetic variation in Europe. However, the European sets analyzed represent mainly northern, western, central, and southern Europe. Here, we report an analysis of approximately 166,000 single nucleotide polymorphisms in populations from eastern (northeastern) Europe: four Russian populations from European Russia, and three populations from the northernmost Finno-Ugric ethnicities (Veps and two contrast groups of Komi people). These were compared with several reference European samples, including Finns, Estonians, Latvians, Poles, Czechs, Germans, and Italians. The results obtained demonstrated genetic heterogeneity of populations living in the region studied. Russians from the central part of European Russia (Tver, Murom, and Kursk) exhibited similarities with populations from central–eastern Europe, and were distant from Russian sample from the northern Russia (Mezen district, Archangelsk region). Komi samples, especially Izhemski Komi, were significantly different from all other populations studied. These can be considered as a second pole of genetic diversity in northern Europe (in addition to the pole, occupied by Finns), as they had a distinct ancestry component. Russians from Mezen and the Finnic-speaking Veps were positioned between the two poles, but differed from each other in the proportions of Komi and Finnic ancestries. In general, our data provides a more complete genetic map of Europe accounting for the diversity in its most eastern (northeastern) populations.

I’m not too sure of how to analyze this paper because, on one side, there’s some missing data, especially in regards to the ADMIXTURE analysis (FST distances between components), and then for some reason the Chinese control was totally removed from further analysis as well, making it very difficult, for example, to estimate if and how much East Asian admixture exists in these NE European populations. Then on the other side, nearly all Finno-Ugrian peoples (as well as the Mezen Russians, genetically Finno-Ugrian as well) are highly endogamous peoples, which almost invariably distorts ADMIXTURE analysis by creating many localized components of dubious relevance.
The ADMIXTURE analysis was presented, quite incorrectly (as often happens), only for values under the cross-validation optimum, which in this case is at least known: K=6 and K=7 (very similar lowest values):

Figure 4. ADMIXTURE clustering of individuals from the populations studied.
Results obtained at K = 2 to 5 are shown. Each individual is represented by a vertical line composed of colored segments, in which each segment represents the proportion of an individual’s ancestry derived from one of the K ancestral populations. Individuals are grouped by population (labeled on the bottom of the graph). In addition to populations used in principal component analysis, a Chinese sample (Han Chinese from Beijing [22]) was included. The results at K = 5 are also accompanied by average ancestral proportions by population (*). Population designations are the same as in Figure 1.
[From fig. 1:] Key: Komi_Izh – Izhemski Komi, Komi_Pr – Priluzski Komi, Rus_Tv – Russians from Tver, Rus_Ku – Russians from Kursk, Rus_Mu – Russians from Murom, Rus_Me – Russians from Mezen, Finns_He – Finns from Helsinki, Finns_Ku – Finns from Kuusamo, Rus_HGDP – Russians from the Human Genome Diversity Panel.
At least in the supplemental materials we find the missing K-values:

Figure S4. Results of ADMIXTURE clustering at K = 6 to 8. The number of populations and their order are the same as at Figure 4.
[Note: per fig. S5, the optimal K-values are K=6 and K=7]
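To illustrate what “optimal” means here, below is a minimal sketch of picking K from cross-validation errors, allowing for near-ties. The function name `best_k` and the error values are placeholders of my own, not the numbers behind fig. S5.

```python
def best_k(cv_errors, tolerance=0.0):
    """Return every K whose CV error lies within `tolerance` of the minimum."""
    lowest = min(cv_errors.values())
    return sorted(k for k, err in cv_errors.items() if err - lowest <= tolerance)


# Placeholder CV errors per K (hypothetical, NOT read from the paper).
cv = {2: 0.520, 3: 0.515, 4: 0.512, 5: 0.510, 6: 0.5082, 7: 0.5080, 8: 0.511}

print(best_k(cv))                    # strict minimum only
print(best_k(cv, tolerance=0.0005))  # near-ties count as equally good
```

With a small tolerance two adjacent K values can come out as equally good, which is the situation described for K=6 and K=7.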

Something that may call your attention is the relatively high value of the Chinese component in Italians (Tuscans, judging by the locator map). This anomalous effect (unheard of in other studies) may well arise because a West Asian control is clearly missing and Italians have relatively high West Asian affinity, being otherwise relatively isolated within this Northern European sample.
Notice also how every single endogamous Finno-Ugric population forms its own cluster: a generic Finno-Ugrian component at K=3 (red), a distinction between the Komi and the Finnic component at K=4 (red and purple), then at K=5 we get a mini-break with a more general North/South Europe distinction showing up (yellow and blue components), but at “optimal” K=6 and K=7, we still see other localized components forming: first Komi_Pr (brown) and then the Vepsian one (grey). So out of seven “optimal” components (K=7), four are local, corresponding to highly endogamous populations.
But I’m running a bit ahead of myself, admittedly. Endogamy is gauged here through runs of homozygosity (ROH): nROH for the number of runs and cROH for their cumulative length:

Table 2. Summary of ROH statistics of 16 European populations.

We can see here that large and relatively cosmopolitan populations like Germans and Italians have low ROH values. Czechs and Central Russians come next, with Poles already showing a slightly higher endogamy index. Latvians and Estonians are still relatively low but Northern Finno-Ugrian peoples (including Mezen Russians) deviate a lot, with values (in the non-asterisk columns) that are at best almost double those of Estonians and, at worst, six times higher.
So in this particular case, and quite exceptionally, I’d say that K=2 or K=3 are the most realistic K-values, in spite of scoring quite poorly in the cross-validation test. Of course, the N-S European distinction shown at K=5 is also real and not caused by any “effect”, but otherwise the clusters showing up correspond to extreme drift caused by isolation and endogamy and therefore only tell us about that peculiarity of the European Far North.
K=2 is surely the most informative level for East Asian genetic influence, except for the already mentioned Italian anomaly (which may also affect, to a lesser extent, Central Europeans). However, because this study is so limited in this aspect, I’d encourage the development of more informative studies, which could for example ponder the FST distances between components, always informative, and/or use other population sampling strategies that better capture this aspect.
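As a rough illustration of what such a table summarizes, here is a sketch that derives two per-population averages from detected ROH segments. It assumes that nROH means the number of runs per individual and cROH their cumulative length, which is my reading of the labels. The individuals and coordinates are invented.

```python
def roh_summary(segments_by_individual):
    """Average ROH count and cumulative ROH length (in Mb) per individual.

    segments_by_individual: {individual_id: [(start_bp, end_bp), ...]}
    """
    n = len(segments_by_individual)
    if n == 0:
        raise ValueError("no individuals given")
    counts = [len(segs) for segs in segments_by_individual.values()]
    totals_bp = [
        sum(end - start for start, end in segs)
        for segs in segments_by_individual.values()
    ]
    return {
        "nROH": sum(counts) / n,            # mean number of runs (assumed meaning)
        "cROH_Mb": sum(totals_bp) / n / 1e6,  # mean cumulative length (assumed meaning)
    }


# Hypothetical individuals with made-up ROH coordinates (base pairs).
population = {
    "ind_a": [(0, 2_000_000), (5_000_000, 6_000_000)],
    "ind_b": [(1_000_000, 2_000_000)],
}
print(roh_summary(population))
```

An endogamous population would show up with both more runs and longer cumulative length per person than a large, outbred one.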
After all this is a study focused on Russia, even if that way it has also produced some valuable information for much of NE Europe.
Figure 3. Principal component analysis of the combined autosomal genotypic data of individuals from Russia and seven European countries (Finland, Estonia, Latvia, Poland, Czech Republic, Germany [5] and Italy [22]).

The first two PCs are shown. The color legend for the predefined population labels is indicated within the plot. Population designations are the same as in Figure 1.

Appendix: Finno-Ugrian peoples/languages map by Marting/Nug (anti-copyright):


Der Sarkissian’s NE European ancient DNA formally published

This entry is a bit redundant because a previous version of this paper (presented as a doctoral thesis) was already discussed at length months ago. Let us recall that this study provided the first confirmed (by coding region testing) mtDNA sequence belonging to haplogroup H in Northern Europe prior to the Neolithic.*

Clio Der Sarkissian et al., Ancient DNA Reveals Prehistoric Gene-Flow from Siberia in the Complex Human Population History of North East Europe. PLoS Genetics, 2013. Open access → LINK [doi:10.1371/journal.pgen.1003296]


North East Europe harbors a high diversity of cultures and languages, suggesting a complex genetic history. Archaeological, anthropological, and genetic research has revealed a series of influences from Western and Eastern Eurasia in the past. While genetic data from modern-day populations is commonly used to make inferences about their origins and past migrations, ancient DNA provides a powerful test of such hypotheses by giving a snapshot of the past genetic diversity. In order to better understand the dynamics that have shaped the gene pool of North East Europeans, we generated and analyzed 34 mitochondrial genotypes from the skeletal remains of three archaeological sites in northwest Russia. These sites were dated to the Mesolithic and the Early Metal Age (7,500 and 3,500 uncalibrated years Before Present). We applied a suite of population genetic analyses (principal component analysis, genetic distance mapping, haplotype sharing analyses) and compared past demographic models through coalescent simulations using Bayesian Serial SimCoal and Approximate Bayesian Computation. Comparisons of genetic data from ancient and modern-day populations revealed significant changes in the mitochondrial makeup of North East Europeans through time. Mesolithic foragers showed high frequencies and diversity of haplogroups U (U2e, U4, U5a), a pattern observed previously in European hunter-gatherers from Iberia to Scandinavia. In contrast, the presence of mitochondrial DNA haplogroups C, D, and Z in Early Metal Age individuals suggested discontinuity with Mesolithic hunter-gatherers and genetic influx from central/eastern Siberia. We identified remarkable genetic dissimilarities between prehistoric and modern-day North East Europeans/Saami, which suggests an important role of post-Mesolithic migrations from Western Europe and subsequent population replacement/extinctions. 
This work demonstrates how ancient DNA can improve our understanding of human population movements across Eurasia. It contributes to the description of the spatio-temporal distribution of mitochondrial diversity and will be of significance for future reconstructions of the history of Europeans.

As you may realize, the Sardinian sequences also discussed in the thesis are not part of this paper. Also, the emphasis is on the presence of Oriental lineages (C*, C1, C5, D* and Z1a) in North-Eastern Europe prior to the Neolithic, which is, no doubt, another element of interest.

Table 1. Results for mitochondrial DNA typing
Yuzhni Oleni Ostrov is in Karelia, Popovo in Northern Russia and Bol’shoy in Sápmi (Lapland).

As mentioned in the previous entry, there is another (not shown) 18th century Sámi sample (Chalmny-Varre) which is totally modern in composition: dominated by haplogroup V7e and complemented by U5b1b1 and U5a1.
The simulations performed by the authors suggest that:

The model of genetic continuity between aUzPo and present-day Saami was found to fit the observed data better than the model of genetic continuity between aUzPo and present-day NEE.

The model also suggests that modern NE Europeans from the area (Russians, Finns and even to some extent Karelians and Volga-Ural peoples) are the product of later migration from Central Europe. However, they could not test this for the Saami because they could not find a plausible source population for them.
This is maybe best visualized in figure 2:

Figure 2. Principal Component Analysis of mitochondrial haplogroup frequencies.

In this graph the Sámi look rather continuous with ancient locals but they show even more continuity with ancient Pitted Ware populations from the Baltic (aPWC, Chalcolithic semi-foragers with possible roots in Eastern European Neolithic) and related “foragers” from NE Poland and Lithuania (corresponding to various periods, even Chalcolithic in some cases), as well as more genuine pre-Neolithic hunter-gatherers from Germany (all of them pooled as aHG).
It must be mentioned in any case that there is not a single known case of haplogroup V in any pre-Neolithic sample (neither in Europe nor anywhere else). And this one makes up the bulk of the modern Sámi mtDNA pool.


* Other such strongly confirmed haplogroup H (mtDNA) sequences were also reported last year for Northern Iberia Paleolithic and Epipaleolithic remains. Many other Paleolithic sequences are suspect but have only been tested for the Hyper-Variable Region (HVS), which is often not conclusive for this haplogroup, causing in the recent past some people to (wrongly) reject the presence of this lineage, now the most common in Europe, before the Neolithic. Now we know that it did exist in both Northern and Southern Europe, although many questions remain on its commonality and history.

Ancient DNA from Eastern Europe and Sardinia

A very interesting doctoral thesis has become known these days (h/t Jean). The thesis by Clio S. I. Dersarkissian (directed by A. Cooper and W. Haak) includes novel ancient mtDNA, especially from North Eastern Europe (Karelia and surroundings), and also from some Scythian and Sardinian burials from the Metal Ages.
Clio Simone Irmgard Dersarkissian, Mitochondrial DNA in ancient human populations of Europe. University of Adelaide, 2011 (thesis). Freely accessible ··> LINK [identifier:
The most interesting findings may be those from Karelia:
  • First pre-Neolithic mtDNA H in Northern and Eastern Europe and one of the few strongly confirmed findings of this haplogroup before the Neolithic. It clearly reinforces the already well established notion that mtDNA H existed in Europe before the Neolithic.
  • U2e – which might well be a descendant of, or otherwise related to, the U2 of Kostenki.
  • C1 – suggesting pre-Neolithic Siberian influences in Northern and Eastern Europe. The specific sublineage (named as “C1f”) has not yet been sequenced elsewhere.
There are some more interesting data regarding ancient NE Europeans, Scythians and Sardinians, but let’s go through them part by part.

Epipaleolithic peoples from Karelia and Northern Russia

Possibly the most striking findings of this paper are those regarding two Epipaleolithic sites in Karelia (Uznyi Oleni Ostrov) and nearby parts of Northern Russia (Popovo, in Russia proper but not far from the Karelian border), as well as one more recent site from Sápmi (Lapland).

As I mentioned above, the U2e and C1 (“C1f”) findings are unusual and suggestive of ancestral connections with Kostenki (Early Upper Paleolithic site from Southern Russia with U2 mtDNA) and Central Asia and Siberia. In fact, an overall comparison with modern populations shows strong affinities with West Siberians and Uyghurs for these Epipaleolithic Karelians.
Instead the Bronze Age Sami site shows more generic or distributed Siberian affinities, although there are populations in West Siberia (Nenets?) that also fit well with that mtDNA genetic pool. Bashkirs show similar affinity to both ancient populations (see ch. 1, fig. 3 – p. 103).
Not shown here are the results for the 18th century Sami site of Chalmny-Varre, which look like a very modern Sami mtDNA pool, dominated by V7e and complemented by U5b1b1 and U5a1.

Confirming the existence of mtDNA H in pre-Neolithic Europe

I really want to underline this, because certain influential people have been dead set on denying the existence of mtDNA haplogroup H in Europe altogether before the Neolithic. Why? Because they have a theory (a hypothesis, more properly speaking) and they can’t accept being wrong about it.
That hypothesis (very popular in some circles) states that European aboriginal hunter-gatherers were radically annihilated by Neolithic invaders from West Asia (never mind that archaeology alone is much more complicated than that; they don’t seem to like thinking too much, much less looking at the matter from all angles).
And a central battle they have fought is denying the possibility that mtDNA H (the most common haplogroup today in Western Europe) existed in the continent before the Neolithic. The whole haplogroup, in their imaginary reality, could only have arrived with the industrious (and seemingly quite genocidal) farmers from West Asia (who almost never even mixed with any aboriginals, how odd).
Reality has been questioning their findings since 2005, but back in the day only HVS-I or at best HVS-II (control regions of the mtDNA chain) were used, leading to inconclusive results, especially with regard to short-stemmed haplogroup H. So they could still deny and deny…
But, recently, two different new studies have found unmistakable mtDNA H in Magdalenian people from Cantabria and Epipaleolithic people from the Basque Country. The reaction of some such knowledgeable aficionados has been simply unbelievable: they have flatly rejected the results without any reason; these findings are simply truths too inconvenient for their conjectures to be accepted. They are so obsessed with their fantasies that they can’t even accept mounting evidence against them: they have stopped being scientific and begun being fanatics.
Very sad, really.
This finding in Karelia adds to the mounting, unquestionable evidence on the matter: mtDNA haplogroup H not only existed in pre-Neolithic Europe but was quite widespread, roughly through the areas in which it is abundant today (and not just SW Europe, as I came to suspect for some time). However, in most regions it was still far less common than it is today (or even totally missing, as seems to be the case in Central Europe).
That said, it is not yet too clear where all the improved knowledge of ancient genetics leads us, but what is clear is that mtDNA H is older, and specifically older-in-Europe, than some (too many) people have been insisting on.
Also it seems more and more obvious that the popular Neolithic farmers did not define the modern genetic landscape of Europe at all. They certainly introduced lineages that surely did not exist before, but their overall influence seems limited and it does look like, after an initial burst, they also declined quite abruptly.
This is something that has been in the news these days (but no paper yet) and that I also observed in 2009 in relation to some similar studies (see: here and here). The age at which we begin seeing modern-like mtDNA pools actually varies a lot, for example:
  • SW Europe: Basque Country: Neolithic (at least) ··> Hervella 2009 (discussed here).
  • Central Europe: Elbe Basin: Bronze Age or Chalcolithic ··> Schilz 2006[de], Schweitzer 2008.
  • Far North Europe: Sápmi: some time after the Bronze Age and before the 18th century (this study).
  • Central Asia: Iron Age (see below).
I conjecture here that (before the Medieval agricultural revolution) Northern latitudes could in general support lower population densities, being also more susceptible to the effect of climatic fluctuations. But more data is needed before we can have some consolidated certainty.
In any case, I took some time to make a couple of updated maps of the known ancient mtDNA of Europe and North Africa: (1) Late Upper Paleolithic (Magdalenian and Oranian cultures) and (2) Epipaleolithic. With the latter I found some conceptual difficulties, so I had to make some decisions, which were:
  • A most recent date boundary of 4000 BCE (which already overlaps with the Neolithic, by 1500 years or more, in most regions). Actually the most recent sites are c. 4200 BCE from Lithuania and c. 4600 BCE from Navarre.
  • No inclusion of any Neolithic data even if contemporary. The only possible exception was Franchthi Cave (Greece), which has a sequence beginning in the Epipaleolithic (or Mesolithic) but is largely Neolithic. The exact ascription of the sequenced individual is not known.

The results are:

Late Upper Paleolithic mtDNA from Europe and North Africa
R* and especially R*-CRS may well be H and have often been reported as such, but we do not know for sure

Epipaleolithic mtDNA from Europe (until 4000 BCE)
R* and especially R*-CRS may well be H and have often been reported as such, but we do not know for sure

Some of these data (and others from more recent periods) can be seen in the dedicated Ancient mtDNA maps page at this blog. It needs some updating however: not much time has passed since I created those maps but new findings do pile up quickly these days. 

Ancient Scythian mtDNA

Another point of interest of the thesis is the ancient Scythian tombs from the Don basin (Iron Age, proto-historical). The results show somewhat greater Eastern genetic influence than modern peoples of the area (Russians) do.

The results, which place ancient Scythians closer to modern Central Asians than to Eastern Europeans, are consistent with other recent studies that show an inflow of Eastern Asian mtDNA lineages into Central Asia even before the Turkic invasions of the Roman period and early Middle Ages.

Bronze Age Sardinian mtDNA

Finally the thesis deals with Sardinians from the Bronze Age (Nuraghic period). The sites are both from the most central parts of Sardinia, so they may be more representative of an early refuge population than of the overall Bronze Age of the island, but still they are curious and interesting:

Dersarkissian argues that this suggests continuity but with many doubts, partly because the source of the genetic data (isolated teeth) did not allow for any certain identification of individuals. Still the resulting mtDNA pool (no matter how you look at it) is not really modern but is rather reminiscent of Central European and Mediterranean Neolithic sites.
They may well be some of the last Neolithic immigrants, who, instead of replacing the hunter-gatherer aborigines all around (as some imagined too dearly), were the ones taking refuge in this turbulent period in the highlands of Sardinia.
Who knows?!