Category Archives: self-research

Egyptian autosomal genetics in the regional context (quick ‘Admixture’ run)

[Important caveat: apparently both Egyptian samples are from the Delta region, the one most affected historically by Eurasian influence. The one labeled Egypt or Egypt (1) is Henn’s sample (n=18), while the one labeled egyptan (sic) or Egyptian (2) is Behar’s (n=12)].

Some readers questioned whether the strong Iberian affinity apparently found in Egypt in the previous Admixture run focused on North Africans was actually masked Highland West Asian or otherwise non-Peninsular Arab West Asian influence. I was initially skeptical because I had expected by default that Saudi Arabs would represent all possible West Asian influences better than Iberians.

I was mostly wrong as shown here.

Methods: I just ran ADMIXTURE for K=4, K=6 and K=8 on a selection from the 1000 Genomes sample, following GNXP’s instructions and using the following populations: both Egyptian samples (n=18 and n=12), plus 10 individuals from each of the following populations: Spaniards, Moroccans, Maasai, Ethiopians, Saudi Arabs, Palestinians, Turks and Kurds. Maasai and Ethiopians were selected to represent Tropical Africans because previous research (mine and Henn’s) showed these two to be, of the available samples, the ones that best represent ultra-Saharan influence in Egypt specifically.
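For readers who want to repeat the three runs, the loop over K can be sketched as below. This is only a minimal sketch, not the exact procedure used for this post: the file name `egypt_subset.bed` is hypothetical, and the function merely builds the command lines (in the usual `admixture <file.bed> <K>` form, optionally with `--cv` to report cross-validation error) without executing them.

```python
from typing import List


def admixture_commands(bed_file: str, ks: List[int],
                       cross_validate: bool = False) -> List[List[str]]:
    """Build one ADMIXTURE command line per requested K value."""
    commands = []
    for k in ks:
        cmd = ["admixture"]
        if cross_validate:
            cmd.append("--cv")  # ask ADMIXTURE to report cross-validation error
        cmd += [bed_file, str(k)]
        commands.append(cmd)
    return commands


# Hypothetical input file; the K values are the ones used in this post.
for cmd in admixture_commands("egypt_subset.bed", [4, 6, 8]):
    print(" ".join(cmd))
```

Each list could then be handed to `subprocess.run` once the PLINK-format files are in place.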

Results: I am showing only the K=8 results because the lower K levels do not seem overly informative (if anyone wants them, feel free to ask).

1. The K=8 graph:

2. The K=8 numerical apportions (same as above but in figures, minimally edited by me to improve visualization):

3. The K=8 ADMIXTURE summary, showing Fst distances between components (‘pops’), minimally edited to improve quick understanding (component ‘ethnic’ labels):

Highlights:

Egypt:

There is (and I could eventually detect) an Egyptian-specific component of West Eurasian affinity (look at the Fst table), which implies that it surely descends from the pre-Neolithic Egyptians of Asian origin. These are Paleolithic Egyptians that I presume existed based on other genetics (mtDNA X1, M1 and such), as well as on Eurasian-like iconography like the Qurta rock art, similar to materials from SW Europe and Anatolia. Admittedly the Egyptian Paleolithic, with a few exceptions, is not well known on archaeological grounds, this being such a sedimentary and then desert area overall, and also because archaeology in Egypt has largely focused on the quite impressive pharaonic period.

This Egyptian-specific component represents 29% of one sample but only 19% of the other one, being also of some relevance in Ethiopia (9%). This and other differences between the two samples suggest some structure to be unveiled within Egypt but I lack the means (diverse enough samples) to do it. Anyhow the two samples are only somewhat different.

Besides this component, Egyptians show a diverse array of external influences, possibly from Neolithic immigrants (?). The most important ones are the Kurdish or Highland West Asian component (17-20%) and the two Arab components together (14-25%) but others (Berber, Palestinian, East African) are also quite influential. The Iberian influence was largely a mirage (although it still weighs 4-8%).

East Africans:

Among the other populations, the most interesting finding of this run is that the Maasai appear, unlike in other research, to be 96% themselves (but still less distant from Eurasians than the average Tropical African, which is in the Fst=0.2 range), with at most residual admixture from Eurasians (mostly Egyptian/Palestinian). Ethiopians in turn appear here as somewhat admixed Maasai, with North African (mostly Egyptian) and peninsular Arab influences. However my previous relevant exercise showed that, at sufficient K-depth (or with a different sample strategy), Ethiopians eventually converge into their own specific and dominant genetic cluster (91%), which, as in the Fulani case, is similarly distant (and not too distant) from West Eurasians and Tropical Africans, indicating (I understand) very ancient homogenized intercontinental admixture. It surely requires a specifically designed run to understand these matters well enough.

Arabian-Egyptian (Arab 2) very distant component:

Also notice the extremely distant Fst values of Arab 2, in the >0.2 range. On first impression I thought this was the Maasai component for that reason, but no. We may be before another OoA remnant here, one which is very relevant in the Arabian peninsula and also in the second Egyptian sample (c. 12% in both cases) but totally absent in Iberia.

In any case, it is again evident that different sample strategies can produce quite different results and therefore it is good to look at these matters with an open mind and many complementary perspectives.

Update: K=4 and K=6 graphs, for the record and because some kind of speculation may have some use for them:

Update (Feb 1, after realizing that both samples are from the Delta):

The finding of an Egyptian-specific component may be even more relevant further South. If some areas of the Delta have retained some 30% of this component, it’s probable that it’d be even better preserved towards the interior. On the other hand I’d also expect more Tropical African influence further South but that should be at least balanced by a significant decrease of (post-)Neolithic West Asian influences.

Of course only real samples will provide real answers.

Posted by Maju on January 30, 2012 in African genetics, autosomal DNA, Egypt, North Africa, population genetics, self-research, West Eurasia

(Sephardi) Jews in the context of the Levant and Anatolia

02 Jan

In June 2010, in rapid succession, we had the opportunity of learning from two papers which studied Jewish genetics in the context of wider samples.

The first one (Atzmon 2010 – PPV, discussed by me here) showed us, even if hidden in the supplementary material only, that the main Jewish cluster was extremely close to their “Turks and Cypriots” sample, especially to some of them (Cypriots apparently):

This central or pivotal group of Jews, marked as GRK+TUR and SYR, are the so-called Sephardi, which, in spite of their name (Sepharad means Spain) do not necessarily originate in the Iberian peninsula but rather share historically a pan-Mediterranean type of ritual, distinct from that of Ashkenazim and other Jewish populations (see also this article if you wish to understand better the complexity of Jewish ethnic divisions).

Just a few days later we got Behar 2010 (also PPV, discussed here), which did not go deep enough into the component analysis, in my understanding, but determined at least that Western Jews (Sephardi, Ashkenazi and Moroccan Jews) were apparently of West Asian origin.

However how exactly they related to West Asians was mostly unresolved (except that Palestinians have a lot of genetic distinctiveness and that this was not shared with modern Jews nor with nearly any other population at any meaningful level).

So, as I began toying around with ADMIXTURE, using the very basic but functional instructions by Razib, one of the ideas I had was to explore Western Jews (other Jewish communities seem to be derived from their host populations but Western Jews appeared as West Asian in these studies).

Extreme distinctiveness of Ashkenazim and Moroccan Jews

I hesitated over whether to retain the Ashkenazi and Moroccan Jewish samples or just work with Sephardites. In the end I retained them… but then I had to correct and start all over. Why? Ashkenazim especially cluttered the analysis with their extreme specificity (possibly because of some extreme bottleneck in their origins and/or inbreeding, although I can’t say for sure).

The results were hardly a satisfactory answer to the question I meant to ask: how do Western Jews relate to the diversity of Anatolia and the Levant, the area where they probably originated? Rather, they placed these two communities as extremes in the area studied, as unlikely references instead of being referred to the wider native populations. Example:

Instead Sephardi Jews showed up as less strikingly monolithic and had been found in previous studies to be, quite obviously, central to the Jewish diaspora. So I decided to start all over retaining only Sephardi Jews, which should be enough to give the key answers about Western Jewish origins.

Analysis using only Sephardites

This analysis was more productive. The whole run is as follows:

The deeper K levels are not really too informative, especially not for the matter at hand, the origin of Sephardi Jews (and by extension probably all Western Jews). Probably the greatest interest is between K=4 and K=6. I decided to retain the last one as the main snapshot:

The labels of the components in the last two images (bottom and right respectively) are a mere conceptual reference.

Kebaran (source)

From K=2 onwards there is an obvious distinction between Palestinians and the rest. This coincides with what we find in Behar 2010, for example, and with what I have found in previous analyses: that Palestinians show a marked distinctiveness even in West Asia. My hypothesis is that they best retain a distinctiveness that may be as old as the Kebaran culture or even older. I understand that this means that, very possibly, Palestinians are the true descendants of historical Jews, Canaanites, etc.

Early PPNB (source)

However this analysis was not designed to discern Palestinian affinities but those of Jews, so I won’t discuss this further. Just to say that if the Palestinian pole is akin to Kebaran-Natufian-PPNA, then the other main detected cluster, shared abundantly by everyone in the region, could well be akin to PPNB.

In any case, the result is that, no matter how deep you go, Sephardi Jews are not clearly distinct from other West Asians, especially not from Cypriots and, more often than not, not from Turks either. The main difference is a certain very weak and slippery relation with Palestinians and other Levantines.

Particularly, component K5 (pop4 in the Fst table, blue in the bar graph), which I labeled as Palestine1, struck me as quite interesting. This component is most common among Palestinians (15% at K=6) but second most common among Sephardi Jews (5%), being smaller among all other populations analyzed. It is possible, I speculate, that this component is a remnant of a genetic link between the Palestinian population, long ago of Jewish religion and identity most likely, and the Diaspora Jews, most of whose ancestry seems to have other origins.

It would be, in this regard, most interesting to analyze genuine Palestinian Jews, descendants of those c. 10% Palestinians of Jewish religion who existed before the Zionist colonial project began. Sadly, even this notion of Palestinian Jews has vanished or been erased, even if it was once common enough.

Regardless, in almost everything else, Sephardi Jews are identical to Cypriots and Turks, which supports my idea of Western Jews having originated not in Palestine but in the Hellenistic Diaspora, which was largely the product not of emigration but of proselytism (and in this context early Christianity was just a Jewish sect of messianic character).

It remains to be seen how Ashkenazi and Moroccan Jews fit in this description. But while I had to give up on analyzing them, all the previous data strongly suggest that they are not too distinct from Sephardi Jews and should share at least partly that same origin, followed by intense bottlenecks.

_______________________________________

Update (Jan 11): before I forget, I must mention this other paper (that I did not know about) mentioned by PConroy in the discussion:

Avshalom Zoossmann-Diskin, The origin of Eastern European Jews revealed by autosomal, sex chromosomal and mtDNA polymorphisms. Biology Direct 2010. Open access.

The author finds Eastern European Jews (the bulk of Ashkenazim) to be of essentially European origin mtDNA-wise, with special mention of Italy.

Posted by Maju on January 2, 2012 in population genetics, self-research, West Asia

North African genetics through the prism of ADMIXTURE

29 Dec

I believe that with this exercise, which took me just a morning’s time, I’m walking a path that has not been explored before: analyzing the autosomal genetics of North Africans in their own right, without being part of a larger context, be it African or West Eurasian or global. At least I’m not aware of any such paper or self-research exercise in the blogosphere.

That said, I did include in the study five exogenous samples, in order to estimate possible external influences. These are: 10 Fulani, 10 Mandinka, 10 Ethiopians, 10 Saudi Arabs and 10 Spaniards. I did not alter the diverse HGDP North African samples (including two different Egyptian samples), except for two things: I removed the Moroccan Jews altogether and cut the Mozabite sample to 10 individuals, because of suspicion that their alleged isolation might distort the larger analysis.

More or less as I expected, at K=10, which was my preliminary goal, each of the exogenous ethnicities described one distinct component, while the other five components were North African specific.

What I did not expect at all was that Tunisians would show up as distinctive as they did (see below). I wonder if there is something special in that sample or if the measure applies to all (or most) Tunisians. Very strange and unexpected in any case.

In the end, concerned that I might be missing something of relevance, I made two more runs and one of them struck “genetic gold”, it seems to me: a small South Moroccan component very distant from everything else, which might well be a remnant of the Aterian period or something like that.

Method: I used a fraction (as described in the previous lines) of the global HGDP sample following the method explained at Gene Expression to operate ADMIXTURE and associated programs (Plink, R).

Results:

K=2 – Without surprises: Tropical African vs. West Eurasian components.

K=3 – Big surprise: the first North African specific component is concentrated in Tunisia, not Morocco, not Mozabites… but Tunisians, uh?

K=4 – As another North African specific component (red, most common among Sahrawis, then Moroccans) shows up, the Tunisian component (green) retracts, so to say, to the Tunisian borders.

K=5 – Not happy with one, the algorithm finds a second Tunisian component, also restricted to that country. I’m as perplexed as you may be.

K=6 – West Asian (Saudi Arab) and European (Spaniard) components diverge.

K=7 – Second (non-Tunisian) NW African component shows up. This one (turquoise) is most concentrated among Mozabites.

K=8 – A Fulani-specific component shows up. Intriguingly it is almost equidistant by Fst measure from the Mandenka and the Sahrawi components (0.105 and 0.115 respectively). All the North African specific components are much closer to West Eurasian ones than to the Mandenka component, so this might suggest a very old kind of trans-Saharan admixture, then homogenized in a single component.

K=9 – Not happy with one, the Fulani show a second component in a row. This one is neatly Tropical African (very distant from all and only somewhat close to the Mandinka component and the other Fulani component but at the 0.163 and 0.173 Fst values, which is also very distant). I imagine that this has to do with the Fulani L1b mtDNA lineage but never mind because the component will vanish again as we move on.

K=10 – A Morocco-centered component shows up here (green), also found in Algerians and Libyans. A distinct Ethiopian-specific component is also defined (influencing Egypt and Libya significantly and to much lesser extent also NW Africa).

K=11 – A small and very interesting component exclusive of South Morocco shows up.

(Note: at K=12 there is a third Tunisian component, go figure!, but I don’t think that is informative at all so it’s not shown).

Note: A reader suggested that some North Africans in these samples are heavily admixed with Tropical Africans, distorting the results in that respect. I can’t say but, if I manage to get the program variant working that should show individual instead of population bars, then we will find out.

Fst Distances at K=11:

Notice please that the South Moroccan component is extremely distant to all (Eurasians and Africans alike). I will speculate (as I have done before seeing this) that this component, now almost only restricted to Southern Morocco and heavily admixed, is a residue of the Aterian period and is related to a vaguely “Khoisanid” or equally vaguely “Mongoloid” phenotype found in the region.

Component apportions (numerical) at K=11:

Detail of K=11 graph:

Posted by Maju on December 29, 2011 in African genetics, autosomal DNA, North Africa, population genetics, self-research, West Eurasia

Playing around with ADMIXTURE

26 Dec

I decided to gift myself these Saturnalia with the basic knowledge of how to use the ADMIXTURE program. It is not easy but with the help of Razib’s instructions, a good dose of patience and some computer savvy-ness I managed yesterday to have something done, even if not exactly what I wanted.

First of all I cleaned up the population file from all populations that have no apparent relation with West Eurasia and also a bunch of tiny minorities like Druzes, Bedouins, etc., which tend to be rather non-informative, and so on. I still retained a number of populations from all around Europe: several North Africans, even more West Asians and Caucasians and then also some peoples from Central Asia and Siberia. I committed two errors however: I removed most NW European representatives by taking out both the CEU (Utah Euroamericans) and North European samples and I accidentally retained two Caucasian Jewish populations.

Good enough for a draft, not good enough for the strategy I had in mind. I went all the way down to K=7 but I will show here only one panel, and only because it offers a perspective that my second attempt, today, did not achieve so neatly (different strategy, different results): to show a clear cut of the European and West Asian components:

example from a previous run: Europe – West Asia duality

We can see here four components:

Red: West Asian
Purple: European
Green: North African
Cyan: Siberian

North African genetic influence in Europe is almost trivial and concentrated in Iberia and the Balkans, although this influence is more apparent in West Asia. Siberian influence is also minor, except among the Chuvash and, to a much lesser extent, Russians and other East Europeans.

However West Asian influence is more important and concentrates in the Balkans and Italy. North Caucasian peoples are clearly West Asians genetically speaking, even if they technically live in Europe. In turn European genetic influence outside the subcontinent is concentrated along the Northern African coast, Asia Minor and Cyprus.

I’d say that the West Asian (red) component correlates quite strictly with the extent of demic replacement in the Neolithic (although, naturally, the demic wave would have been each generation more European and less West Asian).

Today’s strategy

Today I decided to be more methodical and also to reduce population numbers in order to speed up the process. I decided to keep only one North African and one Siberian population (Moroccans and Selkups) and to greatly reduce the West Asian and Caucasian array of samples (I retained: Palestinians, Kurds, Turks and Georgians). I retained all non-Caucasus European populations, including the omissions of the previous day: CEU and North Europeans.

However I cut all samples to 10 members. Actually Belarus (only 9 available) and, by error, another sample that I have not identified have just 9, but that should not affect the results. I hesitated about retaining higher numbers for larger populations like North Europeans, Russians, French and Spaniards but at the last moment I chose not to (next time I will probably include 20 of each instead of just 10). In any case the smaller number of samples allowed me to go faster with the runs and reach deeper levels quite easily.
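The trimming step described above (at most 10 individuals per population) can be sketched as follows. This is an illustration under assumptions, not the procedure actually followed here: the toy individuals and IDs are hypothetical, and random selection with a fixed seed is my own choice. The resulting two-column lines (family ID, individual ID) are in the format that PLINK’s `--keep` option reads, which is the counterpart of `--remove`.

```python
import random
from collections import defaultdict


def subsample(individuals, max_per_pop=10, seed=0):
    """Keep at most max_per_pop individuals per population.

    individuals: list of (population, family_id, individual_id) tuples.
    Returns the kept (family_id, individual_id) pairs, grouped by population.
    """
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    by_pop = defaultdict(list)
    for pop, fid, iid in individuals:
        by_pop[pop].append((fid, iid))
    kept = []
    for pop in sorted(by_pop):
        members = by_pop[pop]
        if len(members) > max_per_pop:
            members = rng.sample(members, max_per_pop)
        kept.extend(members)
    return kept


def keep_file_lines(kept):
    """Format the pairs as 'FID IID' lines, one individual per line."""
    return ["{} {}".format(fid, iid) for fid, iid in kept]


# Hypothetical toy input: 12 "French" and 9 "Belarus" individuals.
toy = [("French", "French", "FR{}".format(i)) for i in range(12)]
toy += [("Belarus", "Belarus", "BY{}".format(i)) for i in range(9)]
kept = subsample(toy)
print(len(kept))  # 19: all 9 Belarus plus 10 of the 12 French
```

Populations with fewer than 10 members, like Belarus here, simply pass through whole.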

And I went on with the runs, getting this:

… and this:

The color code is a bit crazy and absolutely un-cool but I have managed to figure out that it gives red to pop0 and then similarly spaced hues until blue or magenta. I’d prefer it if the program were able to keep the same color for each comparable component but that seems to require human intervention (dyeing).
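The coloring behavior described above can be imitated, and then overridden, with a few lines of Python. This is a sketch of the idea only, not the plotting script’s actual code: `spaced_hues` assumes evenly spaced hues around the full color wheel starting at red, and the fixed palette simply reuses the four labels and colors from the list earlier in this post.

```python
import colorsys


def spaced_hues(k):
    """Return k hex colors with evenly spaced hues, starting at red for pop0."""
    colors = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 1.0, 1.0)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors


# Manual "dyeing": one fixed color per labeled component, whatever its popN index.
FIXED_COLORS = {
    "West Asian": "red",
    "European": "purple",
    "North African": "green",
    "Siberian": "cyan",
}


def colors_for(labels, palette=FIXED_COLORS, fallback="gray"):
    """Look up a stable color for each component label."""
    return [palette.get(label, fallback) for label in labels]


print(spaced_hues(3))
print(colors_for(["Siberian", "European", "Unlabeled"]))
```

The catch is that the labels still have to be assigned by eye at each K, since the program numbers its components arbitrarily from run to run.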

I decided that it was best to spend my time putting them side by side as above (also human intervention).

Points of interest

K=3

As in the previous trial, the first detached populations were North Africans (Moroccans) and Siberians (Selkups). Nothing unexpected. The Siberian component is clearly more distant than the North African one from the main component (European in this case, because the West Asian specificity is masked between Europe and North Africa once the samples have been reduced).

Fst (components):

Siberian–Berber 0.131
Siberian–European 0.112
Berber–European 0.054

It’s clear (and is consistent along runs) that North Africans (Berber for short) are much closer to Europeans than Siberian natives (including the partly European Selkups). West Asians generally stay 50-50 between the European and North African components (because their specificity has not yet been unveiled because of the effects of sample size, smaller than usual).

I did not run K=2 but I imagine that it’d result in Selkups vs the rest, meaning East Asians vs West Eurasians overall.

I could express the distances in a neutral form pop0, pop1 as the program does but I think it’s more confusing (I get confused myself), so maybe better to use a label and hope it is a good choice.

Most Fst distances are in the 0.040-0.070 range. I won’t emphasize them.

K=4

The division of Europe into two components takes place at this stage. I decided to label them NE European and SW European because the latter is too influential in NW Europe and too low in the Balkans to be merely “South” (more presence among Northern Europeans than in Romania or Turkey), even if the NE component is more of a general presence. I wonder where they come from: are they the product of a duality in the early colonization of Europe, something like Aurignacian vs Gravettian, or what? In any case both seem equally European and not originated outside the subcontinent. They are persistent across runs.

K=5

The West Asian specificity shows up, with focus in Georgia. West Asians finally stop looking like a mere amalgam of Europeans and North Africans and display their unique personality.

I insist on this being a mere effect of the sampling strategy: more West Asian samples would have caused this specificity to show up earlier in the runs (K=4) but, maybe more importantly, the European difference would have been the one eclipsed by the West Asian component. I actually have one example from yesterday’s exercise:

counter-example from a previous run

Here Europeans and West Asians appear all mostly Green, which is primarily the West Asian component (and not the European one yet). While some North African affinity persists, this has nothing to do with the 50-50 eclipse of West Asian specificity that we can see in the main exercise.

This is a good example of why we must be wary of the exactness of the components produced by these algorithms: differences in sample strategies and depth of analysis may often show or hide critical insight.

K=6 – Slovenian Neanderthals or what?!

From this level of analysis onwards we get a small and quite puzzling new component that almost only exists in Slovenes and is not even dominant among them. Usually you don’t get such a lesser component, much less one that shows up again and again at several K-depths. It is also just the third European-specific component, what the heck?!

The explanation may be that it is extremely distant from all the rest, so even if small it had little choice but to surface.

The Fst distances of the Slovenian odd component are extreme: 0.312, 0.233, 0.241, 0.284, 0.239 with each of the other components. By comparison, the largest distance of the Selkup component is just 0.155, while the largest distance I got between World populations in an ad-hoc K=3 run was 0.195.
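The comparison just made is easy to check numerically. The snippet below only restates the figures quoted in the previous paragraph and summarizes them; nothing in it comes from the underlying output files.

```python
# Fst distances of the odd Slovenian component to the other five components (K=6).
slovene_fst = [0.312, 0.233, 0.241, 0.284, 0.239]
selkup_max = 0.155    # largest distance of the Selkup (Siberian) component
world_max_k3 = 0.195  # largest distance in the ad-hoc world K=3 run


def summarize(distances):
    """Return (minimum, mean, maximum) of a list of Fst distances."""
    return min(distances), sum(distances) / len(distances), max(distances)


low, mean, high = summarize(slovene_fst)
print("min={:.3f} mean={:.3f} max={:.3f}".format(low, mean, high))
# Even the smallest Slovenian distance exceeds both reference maxima.
print(low > selkup_max and low > world_max_k3)  # True
```

So the point does not hinge on one outlying value: all five distances sit above the largest intercontinental figure.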

So this component, whatever it means, is significantly more distant from everything else in the region than continental populations are from each other. I can only think of massive local Neanderthal admixture but I know this is so weird and unlikely that a mere algorithm error is probably the truth.

If you have any idea… I welcome it.

K=7

New component: Palestinian!

K=8

An Orcadian component shows up (but vanishes at K=10).

K=9

A lesser Kurdish component shows up but it does not have the weird Fst distances of the Slovenian one, in spite of the similarity at first sight.

K=10

The Orcadian and Kurdish components vanish (may they resurface in further runs? – I never ran them). Instead a Chuvash, a Basque and a distinct Sardinian-specific component show up.

I stopped here because it was taking longer and longer (some 50 mins for just this last run) and my patience is limited (especially when I have no clear goal).

This is the detailed spreadsheet snapshot of the exact distribution of the components at K=10:

And the K=10 detail:

Mini update: the K5 detail, which is in a sense a simplified display of the same general scheme of things: showing the two main European components, one West Asian (Caucasus) component, the North African and the Siberian components:

Many doubts

The toy seems curious and I did at least manage to make it work at the basics. But I’d like to know:

How to sort populations so they show up in some logical order, like all Moroccan samples side by side and such.
Can I command Plink to retain populations instead of just remove them?
Where can I get other samples? I’m particularly interested in samples of SW Europe but really whatever will do: I’ll follow the candy bait, I reckon.
How can I make the results show individual instead of whole-population bars?
How can I get the data (cross-ref-validation?) that indicates when the likelihood of meaning of a run is low or high.
Etc. (surely a lot remains in the ink jar – I just forgot)
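On the sorting and individual-bars questions, one possible approach is to work directly on ADMIXTURE’s `.Q` output, which has one row per individual (in the same order as the input `.fam` file) and one column per component. The sketch below assumes a population label is available for each individual, for example from the family-ID column; that assumption, the function names and the toy values are mine and purely illustrative.

```python
from collections import defaultdict


def parse_q(text):
    """Parse the text of a .Q file: one row per individual, K fractions per row."""
    return [[float(x) for x in line.split()] for line in text.strip().splitlines()]


def sort_by_population(q_rows, pops):
    """Return (population, row) pairs ordered so each population is contiguous."""
    order = sorted(range(len(pops)), key=lambda i: pops[i])
    return [(pops[i], q_rows[i]) for i in order]


def population_means(q_rows, pops):
    """Average the individual rows of each population (whole-population bars)."""
    sums, counts = {}, defaultdict(int)
    for row, pop in zip(q_rows, pops):
        previous = sums.get(pop, [0.0] * len(row))
        sums[pop] = [s + v for s, v in zip(previous, row)]
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in total] for pop, total in sums.items()}


# Toy example with made-up values: three individuals, K=2.
q = parse_q("0.9 0.1\n0.2 0.8\n0.7 0.3")
labels = ["Moroccan", "Selkup", "Moroccan"]
for pop, row in sort_by_population(q, labels):
    print(pop, row)
print(population_means(q, labels))
```

Plotting the sorted rows one bar per individual gives the individual view, while the averaged rows correspond to the whole-population bars shown in this post.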

Thanks in advance.

Update (Dec 28): Fst distances

Table of Fst genetic distances at K=10:

I marked with red stars the extreme (>0.2) Fst distances of the Slovene component, with orange ones those in the highest quintile (after removing the Slovene oddity), which are all from the Siberian component, and with green ones the lowest quintile Fst distances.

I also made an Euler diagram sketching Fst genetic distances between the various West Eurasian components:

Where Fst distances in the lowest quintile (after removing the Slovene oddity; <0.084) are shown with continuous lines and the second quintile (0.084-0.107) are shown with dotted lines. (Note: image corrected from first posted version, which had an error).

I think it gives an interesting impression of the possible relations between the various components, in which the NE Euro and Caucasian components (and to a slightly lesser extent the Basque one) seem pivotal, almost as if all the other West Eurasian components were peripheral outgrowths. The short Fst distance between the NE Euro and Caucasus (or Highland West Asia) components already showed up in some of Dienekes’ analyses, raising some eyebrows, at least mine. However, as he does not use the smaller components, some of the correlations, notably that the Basque component is also in that pivotal zone, were not apparent at the time.

PS- highly tentative reconstruction of pop. history (excluding the Slovene odd component), based on average Fst (Fst(core)) towards the “core” Caucasus/NE Euro components:

Fst(core)=0.125 – Divergence of Siberian/East Asian component (0.110 Chinese/CEU per Wikipedia): Eurasian expansion after the OoA.
Fst(core)=0.102-0.100 – Divergence of Sardinian (?) and North African components: Dabban industries?
Fst(core)=0.091 – Divergence of SW European component (Aurignacian?)
Fst(core)=0.084 – Divergence of the Palestinian component
Fst(core)=0.079 – Chuvash component
Fst(core)=0.065 – Basque component
Fst=0.060 – Caucasus and NE Euro divergence

A rough estimate of the possible Caucasus/NE Euro divergence timing (by comparing the Fst values with those of presumably Aurignacoid divergences) would place it c. 24 to 30 Ka ago (depending on what value is used for the Aurignacoid divergence, 40 or 44 Ka ago, and on which component is taken as reference, the SW Euro or the North African one). So I’d dare say that the Basque, Caucasian and NE Euro components appear to have split ways (with all reservations) in the Gravettian period.

(Not sure how well it fits but this kind of maths would place the Siberian/East Asian divergence c. 55-60 Ka ago, a bit too recently IMO and the odd Slovenian component’s divergence, if real, c. 110 Ka ago, weirdly old but H. sapiens rather than Neanderthal).
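The post does not spell out “this kind of maths”. The simplest reading, and it is only my assumption, is a direct proportion: age = Fst / reference Fst × reference age. The sketch below applies that rule to the Fst(core) values listed above, taking 0.100 and 0.102 as the North African/Sardinian bounds and 0.091 as the SW European reference.

```python
def scaled_age_ka(fst, ref_fst, ref_age_ka):
    """Age in Ka, assuming divergence time is directly proportional to Fst."""
    return fst / ref_fst * ref_age_ka


def age_range_ka(fst, ref_fsts, ref_ages_ka):
    """Lowest and highest estimate over every reference Fst / reference age pair."""
    estimates = [scaled_age_ka(fst, rf, ra) for rf in ref_fsts for ra in ref_ages_ka]
    return min(estimates), max(estimates)


AURIGNACOID_FST = [0.091, 0.100, 0.102]  # SW European; North African/Sardinian bounds
AURIGNACOID_AGE_KA = [40, 44]

low, high = age_range_ka(0.060, AURIGNACOID_FST, AURIGNACOID_AGE_KA)
print("Caucasus/NE Euro split: {:.1f} to {:.1f} Ka".format(low, high))

low_sib, high_sib = age_range_ka(0.125, [0.091], AURIGNACOID_AGE_KA)
print("Siberian/East Asian split: {:.1f} to {:.1f} Ka".format(low_sib, high_sib))
```

Under this reading the Caucasus/NE Euro split comes out at roughly 23.5 to 29 Ka and the Siberian one at roughly 55 to 60 Ka (with the SW European reference), which is close to the figures quoted above. A strictly linear relation between Fst and time is of course a strong simplification.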

Posted by Maju on December 26, 2011 in autosomal DNA, European origins, population genetics, self-research, West Eurasia

On the origin of mitochondrial macro-haplogroup N

12 Dec

The notion that the migration of Homo sapiens out of Africa had to pivot around West Asia has been deeply entrenched in our minds, partly because of geographical common sense, partly because of Eurocentrism, partly maybe because of the Judeo-Christian-Muslim religious background of most influential researchers historically…

However in recent years this idea has been challenged by the coastal migration theory, which proposes a migration mostly along the coasts of the Indian Ocean rather than through the interior of Asia. This theory was first outlined by population geneticists, who needed to explain the facts of haplogroup distribution in Eurasia, which is not at all more diverse towards the West, as we could expect from the classical models pivoting around the Fertile Crescent, but rather towards the East and very especially in South Asia. Later it has also been corroborated, with lesser shadings maybe, by archaeologists who have sought material support in Arabia and India and found it.

While the origin of mitochondrial macro-haplogroup M in South Asia is seldom contested, that of its “sister” N is seldom agreed upon. The reason is that it is distributed somewhat evenly through all Eurasia, Australasia and even America.

This map, from the Metspalu 2005 paper (open access), illustrates the issue and how even renowned geneticists were, not long ago, in doubt about where to place the urheimat of the haplogroup:

The phylogeny has anyhow been refined in these six and a half years and you may notice that Australasia is not even included in the map, although it does play an important role, being surely more important than West Eurasia. In any case the map is illustrative of this state of confusion. Confusion that I will try (once again and hopefully for good) to dispel in this article.

The facts of mtDNA N

Macro-haplogroup N has 15 acknowledged basal haplogroups scattered through all Eurasia and Aboriginal Australia. They have diverse numerical importance but what matters to me here is how many mutations (coding region transitions, to be more precise) they are downstream of the N node. Why? Because this is surely indicative of the timing of their respective expansions in relation with N as such.

Looking at this measure we find the following classes of N sub-haplogroups:

Elder daughters: one coding region mutation downstream of N: N1’5, N9, N11, S and R. Notice that among these R holds a special place, not for any phylogenetic reason but because it has a scatter as wide as that of her mother N, suggestive of a very early coalescence and some sort of association between both expansions.
Two mutations downstream of N: N10 and O.
Four mutations downstream of N: N2 (incl. W), A and X.
Extremely long stems, rare clades without any known node under N: N8, N13, N14, N21, N22.

This distinction is not very important but I always keep it in mind in any case, because it implies that the various classes of subhaplogroups expanded at different moments after the N node. Notably there is a “pause” at the place of the third mutation and then after the fourth. So we can well imagine the expansion of N as a double explosion: first the first two categories and then the third and maybe the fourth.
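For readers who prefer to see the classes above as data, here is a minimal sketch in Python. The haplogroup names and step counts are the ones listed above; the names `N_CLASSES`, `count_basal` and `early_wave` are only illustrative, and using `None` for the long-stem clades is my own convention for "no known node under N".

```python
# Classes of basal N sub-haplogroups, as listed in the text.
# Key: coding-region mutations downstream of the N node.
# None marks the very long stems with no known intermediate node
# (an illustrative convention, not a standard one).
N_CLASSES = {
    1: ["N1'5", "N9", "N11", "S", "R"],
    2: ["N10", "O"],
    4: ["N2", "A", "X"],
    None: ["N8", "N13", "N14", "N21", "N22"],
}


def count_basal(classes):
    """Total number of basal sub-haplogroups across all classes."""
    return sum(len(members) for members in classes.values())


def early_wave(classes, max_steps=2):
    """Clades at most max_steps mutations below N (the first 'explosion')."""
    return [
        name
        for steps, members in classes.items()
        if steps is not None and steps <= max_steps
        for name in members
    ]


if __name__ == "__main__":
    print(count_basal(N_CLASSES))
    print(early_wave(N_CLASSES))
```

Nothing deep here: it just checks that the four classes add up to the 15 basal haplogroups and separates the clades before the "pause" at step 3 from the rest.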

Representing each haplogroup as a dot, where they might have coalesced (often a hunch within the local region), the result is as follows:

1.- Estimated coalescence of basal subhaplogroups of N

The size of the dots represents only the “class”, that is, how many mutational steps they are under N: the larger the dot, the closer it is to N and the earlier it must have coalesced (according to the laws of probability). The peculiar macro-haplogroup R (whose approximate coalescence location was estimated in the past and which I will not explain here) has been painted in a lighter blue and given a slightly larger size.

I have also outlined the cloud of N expansion at mutational steps 1 and 2 (no difference), which are followed by an apparent pause at mutational step 3, as mentioned above. The cloud has been pushed northwards a bit in East Asia in order to avoid disputes on where exactly N9 coalesced (in the end it does not make much of a difference if you prefer Beijing over Shanghai for this clade’s coalescence).

Notice that this N cloud is almost identical to what the M cloud would be (not shown but look here for a reference if you wish). Whether they were simultaneous or, as I think, N coalesced and expanded a bit after M did, their geography was the same: South Asia, East Asia and Australasia without distinctions. This T-shaped region (with the East on top) was the homeland of the first Eurasian (or more properly non-African) population of Homo sapiens (except those who remained in Arabia, who are another story).

The geographic origin of N

Alright, I have described the scatter of N subhaplogroups and the most likely sequence of the expansion but my main purpose here is to estimate the origin, the urheimat of N: where did the N matriarch, the ultimate matrilineal ancestor of all N people today, live?

I apply the statistical principle by which the derived basal haplogroups should tend to remain not too far away from the common origin, the most removed ones being the exception and never the rule. It does make sense, right?

Hence if we can estimate the centroid of the geometry described by the 15 haplogroups, we will have found the origin of N, or at least a raw estimate of it. There are several methods to estimate centroids but I chose to use the geometric one. In fact, for simplicity, I divided the subhaplogroups into three sets of five (so they all weigh the same) and estimated their centroids by geometric decomposition. Then I estimated the centroid of the resulting triangle.
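The decomposition just described can be sketched in a few lines of Python. This is only an illustration under assumptions of mine: each haplogroup is treated as a point of equal weight on a flat map, and the coordinates in `toy_points` are HYPOTHETICAL placeholders in arbitrary map units, not the locations I actually used. The function names are also just for the example.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y) flat-map coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def centroid_by_thirds(points):
    """Split 15 points into three sets of five, take each set's centroid,
    then take the centroid of the resulting triangle."""
    if len(points) != 15:
        raise ValueError("expected exactly 15 points")
    groups = [points[i:i + 5] for i in range(0, 15, 5)]
    triangle = [centroid(group) for group in groups]
    return centroid(triangle)


# HYPOTHETICAL placeholder coordinates (arbitrary map units),
# NOT real estimates of where any haplogroup coalesced.
toy_points = [
    (1.0, 2.0), (3.0, 5.0), (4.0, 1.0), (6.0, 7.0), (2.0, 8.0),
    (9.0, 3.0), (7.0, 4.0), (5.0, 6.0), (8.0, 9.0), (0.0, 5.0),
    (3.0, 3.0), (6.0, 2.0), (4.0, 9.0), (7.0, 7.0), (1.0, 6.0),
]

if __name__ == "__main__":
    by_thirds = centroid_by_thirds(toy_points)
    direct = centroid(toy_points)
    # Equal-sized sets weigh the same, so both routes should agree.
    print(by_thirds, direct)
    print(math.isclose(by_thirds[0], direct[0]), math.isclose(by_thirds[1], direct[1]))
```

Because the three sets have the same size, the centroid of the triangle coincides with the plain average of all 15 points; the split is only a convenience when working by hand on a map, and how the points are assigned to the sets does not change the result.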

If I am correct the raw centroid of N is at the lower Mekong:

2.- Possible origins of mtDNA N (blue flowers): A – ‘raw’ geometric centroid, B – corrected against directionality.

I have argued on occasion that, in order to compensate for the directionality of the expansion, a correction can be applied to the geometric centroid or raw estimate of the origin. This correction should pull the origin towards the parent node, in this case L3 in East Africa (estimated here). How much? Maybe 1/4, maybe 1/3… this step, even if probably very reasonable, is a guess and not an exact science. Here I chose to use 1/4 and then look for the closest coast, which is that of Bengal. Alternatively I can use a crooked line that follows the geography and get the same result (Bengal again, even less ambiguously).

If I had chosen a 1/3 value for the correction, it would fall in a more central part of India; if 1/5, surely in Burma. We can’t be sure of where exactly that happened but we can be more than reasonably sure that it was between India and Cambodia.
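The correction against directionality amounts to sliding the raw centroid along the line towards the parent node by a chosen fraction. A minimal sketch, assuming a straight line on a flat map (not the crooked coastal line, and without the final "closest coast" step); `raw` and `parent` are HYPOTHETICAL coordinates in arbitrary units, and `pull_towards` is just an illustrative name:

```python
def pull_towards(origin, parent, fraction):
    """Move origin towards parent by the given fraction of the
    straight-line distance between them (0 = no move, 1 = at parent)."""
    ox, oy = origin
    px, py = parent
    return (ox + fraction * (px - ox), oy + fraction * (py - oy))


# HYPOTHETICAL flat-map coordinates in arbitrary units:
# raw stands for centroid A, parent for the L3 node.
raw = (12.0, 0.0)
parent = (0.0, 0.0)

if __name__ == "__main__":
    for fraction in (1 / 5, 1 / 4, 1 / 3):
        print(fraction, pull_towards(raw, parent, fraction))
```

The larger the fraction, the further the corrected origin B moves away from A towards the parent node, which is why 1/3 lands further west than 1/4 and 1/5 further east.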

And nowhere else: not in West Asia, not in Altai… thanks for the suggestions but I have heard that before… many times… always without a single piece of evidence or well-reasoned backing of any sort.

The data says otherwise: around the Bay of Bengal or even further East maybe.

Getting R into the picture

I have said before (and it is obvious to anyone interested in population genetics) that mtDNA R is peculiar. While it is not different phylogenetically from other subclades of N which are separated by just one coding region mutation, its geographic distribution is very different, because R, like its mother N, is everywhere.

In order to show it more clearly, I drew approximate origins of all basal R-subclades (in lighter blue). The size of the circles follows the same logic as those of N above, representing only the distance from the mother node (R in this case, which means one step further downstream in relation to N), and hence a probable order of coalescence:

3.- Scatter of N (deep blue) and R (cyan) subhaplogroups. The flower indicates the possible common origin.

The scatter of R fits very curiously within that of N(xR). They do not overlap too much maybe and it looks at first sight like R could have pushed other N around to the margins of the common expansion cloud. However this does not seem to happen with M, so maybe another explanation is needed, like undifferentiated N and R traveling together, mostly under the leadership of the latter and causing different founder effects in different locations.

Whatever the case it is worth a good meditation, because it is possible that both haplogroups (mother N and daughter R) coalesced in rapid succession in a single region (Bengal probably).


Posted by Maju on December 12, 2011 in coastal route, Eurasia, Eurasian colonization, mtDNA, Oceania, self-research

For what they were… we are
