Estonian geneticist Mait Metspalu has in the past carried out leading research on the gene pool of South Asia, which is crucial to understanding not just the subcontinental populations but all of Eurasia as a matter of fact. Again he and his team provide us with valuable material to understand this region and its wider continental context:
The authors added 142 samples from India to pre-existing catalogs and found that:
30% of SNPs found in Indian populations were not seen in HapMap populations and that compared to these populations (including Africans) some Indian populations displayed higher levels of genetic variation, whereas some others showed unexpectedly low diversity.
This reinforces the generally acknowledged notion that India hosts very large, albeit largely untapped, genetic diversity.
Nothing really new in the wider picture, but it is always worth recalling the basics (principal component analysis of Eurasians):
Supp. Fig. 12
Supp. Fig. 2 (part)
The Pakistan-India (ANI-ASI) duality
These two components are apparent in the PC analysis (PC2 and PC4) but maybe more clearly in the ADMIXTURE cluster analysis. The authors decided to use K=8 where I would have used K=13 (preferred by the combination of both check algorithms shown in Supp. Fig. 4 b and c), but for this purpose the only difference in the result is whether or not Caucasian populations are included in the ANI-equivalent component (k5 in the maps below).
Iranians are always included, as are Central Asians, although noticeably less emphatically at K=13 than at K=8, as the affinity splits between the Baloch (ANI) component and the Caucasus-specific one. However, Russians do not show any Caucasus-specific affinity and instead show strong influence of the ANI component, which seems to correlate well with Y-DNA R1a, especially once the Caucasus affinity is detached at K=13.
Whatever the case at K=8:
The authors do in fact make an effort to discern whether the Baloch-ANI component could represent the much discussed Indo-European (or Aryan) invasion (hardly doubted on the linguistic plane but not clearly supported on the genetic one). They conclude, however, that the arrival of the ANI component in South Asia should be much older, at least 12,500 years old; that is, clearly pre-Neolithic and in any case not related to the Indo-Aryan invasion.
Barely outlined South Asian internal structure
It is interesting that at deeper K levels (K=18) a Gujarat-centered component (middle green), distinct from the two mentioned so far, appears and takes a dominant role in most populations, particularly displacing the Baloch (light green) component:
Cut from Supp. Fig. 4a
I would like to encourage readers to transcend the limitations of the chosen K=8 level of analysis and dive into the K=18 analysis found in the Supplemental Figures’ PDF (fig. 4). As said before, the optimal level of analysis seems to be K=13 or maybe K=12, rather than the chosen K=8; above K=10 in any case. However, many of the improvements of greater resolution take place outside of South Asia, so for most purposes there is no difference (other than the inclusion or exclusion of the Caucasus’ populations in the ANI bloc).
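For readers who want to compare K levels on their own runs, here is a minimal sketch of how one such check could be automated. It assumes the check is the cross-validation error that ADMIXTURE prints when run with its `--cv` option; whether that is exactly what Supp. Fig. 4 b and c show is my assumption, not something taken from the paper. The helper names (`parse_cv_errors`, `best_k`) and the log lines are invented for illustration and are not the paper's values.

```python
import re

# Matches the summary line ADMIXTURE prints when run with --cv,
# e.g. "CV error (K=3): 0.52305"
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def parse_cv_errors(log_text):
    """Return {K: cv_error} for every CV summary line found in log_text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    if not cv_errors:
        raise ValueError("no CV error lines found")
    return min(cv_errors, key=cv_errors.get)


# Toy log lines, made up for illustration only (NOT results from the paper).
example_log = """\
CV error (K=2): 0.6100
CV error (K=3): 0.5800
CV error (K=4): 0.5950
"""

errors = parse_cv_errors(example_log)
print(best_k(errors))
```

In practice the lowest CV error is only a guide: nearby K values with almost identical errors (as with K=12 and K=13 here) are worth inspecting side by side.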
Something else that I miss here is a regional, South Asia-specific analysis (maybe with the inclusion of some West Asian and SE Asian controls). It might have offered interesting insights, but it is just outlined, with just four South Asia-specific components at K=18: more than enough for the pan-Eurasian analysis but surely too limited to discern the details of population structure in South Asia alone.
One of the most specific findings of this survey is the detection of a group of alleles (at the genes DOK5 and CLOCK) that have apparently been selected for in South Asians but have become harmful as diet and lifestyles change today, favoring type 2 diabetes.