The notion that the migration of Homo sapiens out of Africa had to pivot around West Asia has been deeply entrenched in our minds, partly because of geographical common sense, partly because of Eurocentrism, and partly, perhaps, because of the Judeo-Christian-Muslim religious background of most historically influential researchers…
However, in recent years this idea has been challenged by the coastal migration theory, which proposes a migration mostly along the coasts of the Indian Ocean rather than through the interior of Asia. This theory was first outlined by population geneticists, who needed to explain the facts of haplogroup distribution in Eurasia: diversity does not increase towards the West, as we would expect from the classical models pivoting around the Fertile Crescent, but rather towards the East, and very especially in South Asia. Later it was also corroborated, perhaps with some nuances, by archaeologists who sought material support in Arabia and India and found it.
While the origin of mitochondrial macro-haplogroup M in South Asia is seldom contested, that of its “sister” N is seldom agreed upon. The reason is that it is distributed somewhat evenly through all Eurasia, Australasia and even America.
This map, from the Metspalu 2005 paper (open access), illustrates the issue and shows how even renowned geneticists were, not long ago, unsure where to place the urheimat of the haplogroup:
The phylogeny has in any case been refined in these six and a half years, and you may notice that Australasia is not even included in the map, although it plays an important role, surely a more important one than West Eurasia. In any case, the map illustrates this state of confusion, a confusion that I will try (once again and hopefully for good) to dispel in this article.
The facts of mtDNA N
Macro-haplogroup N has 15 acknowledged basal haplogroups scattered throughout Eurasia and Aboriginal Australia. They vary in numerical importance, but what matters to me here is how many mutations (coding region transitions, to be more precise) each one is downstream of the N node. Why? Because this is surely indicative of the timing of their respective expansions relative to N itself.
Looking at this measure we find the following classes of N sub-haplogroups:
- Elder daughters: one coding region mutation downstream of N: N1’5, N9, N11, S and R. Notice that among these R holds a special place, not for any phylogenetic reason but because it has a scatter as wide as that of her mother N, suggestive of a very early coalescence and some sort of association between both expansions.
- Two mutations downstream of N: N10 and O.
- Four mutations downstream of N: N2 (incl. W), A and X.
- Extremely long stems, rare clades without any known node under N: N8, N13, N14, N21, N22.
This distinction is not very important, but I always keep it in mind, because it implies that the various classes of subhaplogroups expanded at different moments after the N node. Notably there is a “pause” at the third mutation and then again after the fourth. So we can well imagine the expansion of N as a double explosion: first the first two categories, and then the third and maybe the fourth.
Representing each haplogroup as a dot placed where it might have coalesced (often a hunch within the local region), the result is as follows:
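The classification above can be written down as a small lookup table. The following is only a sketch of the bookkeeping: the step counts are the ones listed in this article, while the names `STEPS_BELOW_N`, `group_by_steps` and `empty_steps` are hypothetical helpers introduced for illustration. The long-stem clades are marked `None` because no intermediate node is known for them.

```python
from collections import defaultdict

# Coding-region steps below the N node, as listed in the text above.
# None marks the long-stem clades with no known intermediate node under N.
STEPS_BELOW_N = {
    "N1'5": 1, "N9": 1, "N11": 1, "S": 1, "R": 1,
    "N10": 2, "O": 2,
    "N2": 4, "A": 4, "X": 4,
    "N8": None, "N13": None, "N14": None, "N21": None, "N22": None,
}


def group_by_steps(steps):
    """Return {step count: sorted clade names}; the None key holds the long stems."""
    groups = defaultdict(list)
    for clade, n in steps.items():
        groups[n].append(clade)
    return {n: sorted(clades) for n, clades in groups.items()}


def empty_steps(steps):
    """Step counts, from 1 up to the largest known one, at which nothing branches."""
    known = {n for n in steps.values() if n is not None}
    return [n for n in range(1, max(known) + 1) if n not in known]


if __name__ == "__main__":
    for n, clades in group_by_steps(STEPS_BELOW_N).items():
        print(n, clades)
    print("no branching at step(s):", empty_steps(STEPS_BELOW_N))
```

The only step count with no branching in this table is the third, which is the “pause” mentioned in the text.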
|1.- Estimated coalescence of basal subhaplogroups of N|
The size of the dots represents only the “class”, that is, how many mutational steps they are under N: the larger the dot, the closer the clade is to N and the earlier it must have coalesced (according to the laws of probability). The peculiar macro-haplogroup R (whose approximate coalescence location was estimated in the past and which I will not explain here) has been painted in a lighter blue and given a slightly larger size.
I have also outlined the cloud of N expansion at mutational steps 1 and 2 (no difference), which are followed by an apparent pause at mutational step 3, as mentioned above. The cloud has been pushed northwards a bit in East Asia in order to avoid disputes on where exactly N9 coalesced (in the end it does not make much of a difference if you prefer Beijing over Shanghai for this clade’s coalescence).
Notice that this N cloud is almost identical to what the M cloud would be (not shown but look here for a reference if you wish). Whether they were simultaneous or, as I think, N coalesced and expanded a bit after M did, their geography was the same: South Asia, East Asia and Australasia without distinctions. This T-shaped region (with the East on top) was the homeland of the first Eurasian (or more properly non-African) population of Homo sapiens (except those who remained in Arabia, who are another story).
The geographic origin of N
Alright, I have described the scatter of N subhaplogroups and the most likely sequence of the expansion but my main purpose here is to estimate the origin, the urheimat of N: where did the N matriarch, the ultimate matrilineal ancestor of all N people today, live?
I apply the statistical principle by which the derived basal haplogroups should tend to remain not too far away from the common origin, the most remote ones being exceptions and never the rule. It does make sense, right?
Hence, if we can estimate the centroid of the geometry described by the 15 haplogroups, we will have found the origin of N, or at least a raw estimate of it. There are several methods to estimate centroids, but I chose to use the geometric one. In fact, for simplicity, I divided the subhaplogroups into three sets of five (so they all weigh the same) and estimated their centroids by geometric decomposition. Then I estimated the centroid of the resulting triangle.
If I am correct, the raw centroid of N is at the lower Mekong:
|2.- Possible origins of mtDNA N (blue flowers): A – ‘raw’ geometric centroid, B – corrected against directionality.|
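The grouping procedure for the raw centroid can be sketched in a few lines. This is a sketch under stated assumptions, not the exact construction used for the map: it assumes that each group's centroid is the plain average of its five points (an area-weighted polygon centroid would give a different answer), it treats longitude and latitude as flat coordinates, and the helper names `centroid` and `centroid_of_groups` are hypothetical. The points in the demo are synthetic placeholders, not coalescence estimates.

```python
from statistics import fmean


def centroid(points):
    """Arithmetic mean of (x, y) pairs, treated as flat coordinates (assumption)."""
    xs, ys = zip(*points)
    return (fmean(xs), fmean(ys))


def centroid_of_groups(points, group_size=5):
    """Split points into consecutive groups, average each, then average those.

    For three groups, the last step is the centroid of the triangle whose
    corners are the three group centroids.
    """
    if len(points) % group_size != 0:
        raise ValueError("points must split evenly into groups")
    groups = [points[i:i + group_size] for i in range(0, len(points), group_size)]
    return centroid([centroid(g) for g in groups])


if __name__ == "__main__":
    # Synthetic placeholder points, NOT real coalescence locations.
    demo = [(float(i), float((i * i) % 7)) for i in range(15)]
    print(centroid_of_groups(demo))
    print(centroid(demo))
```

Under these assumptions the grouping is only a convenience: with equal-sized groups, the centroid of the three group centroids coincides with the plain average of all 15 points, whichever way the points are split.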
I have argued on occasion that, in order to compensate for the directionality of the expansion, a correction can be applied to the geometric centroid or raw estimate of the origin. This correction should pull the origin towards the parent node, in this case L3 in East Africa (estimated here). How much? Maybe 1/4, maybe 1/3… this step, even if probably very reasonable, is a guess and not rocket science. Here I chose to use 1/4 and then look for the closest coast, which is that of Bengal – alternatively I can use a crooked line that follows the geography and get the same result (even less ambiguously Bengal again).
If I had chosen a 1/3 value for the correction, the origin would fall in a more central part of India; with 1/5, surely in Burma. We can’t be sure of where exactly that happened, but we can be more than reasonably sure that it was between India and Cambodia.
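The directionality correction amounts to moving the raw estimate a chosen fraction of the way towards the parent node. A minimal sketch, assuming a straight line on flat coordinates (it does not include the snap to the nearest coast or the crooked line that follows the geography), with a hypothetical helper `pull_toward` and round-number stand-in coordinates that are not the estimates used in this article:

```python
def pull_toward(raw, parent, fraction):
    """Move `raw` the given fraction of the way along a straight line to `parent`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return tuple(r + fraction * (p - r) for r, p in zip(raw, parent))


if __name__ == "__main__":
    # Round-number stand-ins for illustration only, NOT the article's estimates.
    raw_centroid = (100.0, 10.0)
    parent_node = (40.0, 0.0)
    for fraction in (1 / 5, 1 / 4, 1 / 3):
        print(fraction, pull_toward(raw_centroid, parent_node, fraction))
```

A larger fraction pulls the estimate further towards the parent node, which is why the choice between 1/5, 1/4 and 1/3 shifts the result along the same line rather than changing its direction.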
And nowhere else: not in West Asia, not in Altai… thanks for the suggestions but I have heard that before… many times… always without a single piece of evidence or well-reasoned backing of any sort.
The data says otherwise: around the Bay of Bengal or even further East maybe.
Getting R into the picture
I have said before (and it is obvious to anyone interested in population genetics) that mtDNA R is peculiar. While it is not different phylogenetically from other subclades of N which are separated by just one coding region mutation, its geographic distribution is very different, because R, like its mother N, is everywhere.
In order to show it more clearly, I drew approximate origins of all basal R-subclades (in lighter blue). The size of the circles follows the same logic as those of N above, representing only the distance from the mother node (R in this case, which means one step further downstream in relation to N), and hence a probable order of coalescence:
|3.- Scatter of N (deep blue) and R (cyan) subhaplogroups. The flower indicates the possible common origin.|
The scatter of R fits very curiously within that of N(xR). They do not overlap too much, perhaps, and at first sight it looks as if R could have pushed other N lineages around to the margins of the common expansion cloud. However, this does not seem to happen with M, so maybe another explanation is needed, such as undifferentiated N and R traveling together, mostly under the leadership of the latter, and causing different founder effects in different locations.
Whatever the case it is worth a good meditation, because it is possible that both haplogroups (mother N and daughter R) coalesced in rapid succession in a single region (Bengal probably).