Monthly Archives: October 2012

Finally some improved knowledge of haplogroup R1a1 (Y-DNA)

Haplogroup R1a, most of which is R1a1, dominant in Northern South Asia and Eastern Europe, as well as in much of Central Asia, has been giving headaches to population geneticists, academic and amateur alike, because key markers were not identified, making most of the haplogroup look like an amorphous goo, the same in India as in Europe. It seems that this may change now:
Horolma Pamjav et al., Brief communication: New Y-chromosome binary markers improve phylogenetic resolution within haplogroup R1a1. AJPA 2012. Pay per view ··> LINK [10.1002/ajpa.22167]


Haplogroup R1a1-M198 is a major clade of Y chromosomal haplogroups which is distributed all across Eurasia. To this date, many efforts have been made to identify large SNP-based subgroups and migration patterns of this haplogroup. The origin and spread of R1a1 chromosomes in Eurasia has, however, remained unknown due to the lack of downstream SNPs within the R1a1 haplogroup. Since the discovery of R1a1-M458, this is the first scientific attempt to divide haplogroup R1a1-M198 into multiple SNP-based sub-haplogroups. We have genotyped 217 R1a1-M198 samples from seven different population groups at M458, as well as the Z280 and Z93 SNPs recently identified from the “1000 Genomes Project”.

The two additional binary markers present an effective tool because now more than 98% of the samples analyzed assign to one of the three sub-haplogroups. R1a1-M458 and R1a1-Z280 were typical for the Hungarian population groups, whereas R1a1-Z93 was typical for Malaysian Indians and the Hungarian Roma. Inner and Central Asia is an overlap zone for the R1a1-Z280 and R1a1-Z93 lineages. This pattern implies that an early differentiation zone of R1a1-M198 conceivably occurred somewhere within the Eurasian Steppes or the Middle East and Caucasus region as they lie between South Asia and Eastern Europe. The detection of the Z93 paternal genetic imprint in the Hungarian Roma gene pool is consistent with South Asian ancestry and amends the view that H1a-M82 is their only discernible paternal lineage of Indian heritage.
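To make the genotyping logic of the abstract a bit more tangible, here is a minimal Python sketch of how a sample already known to be R1a1-M198 could be binned by its calls at the three markers. Only the marker names come from the abstract; the function name `assign_subhaplogroup`, the call format and the four samples are hypothetical and are not the authors' pipeline.

```python
# Marker names are from the abstract; everything else is illustrative.
MARKERS = ("M458", "Z280", "Z93")
UNRESOLVED = "R1a1-M198*"
INCONSISTENT = "inconsistent"


def assign_subhaplogroup(calls):
    """Bin one R1a1-M198 sample by its derived markers.

    `calls` maps marker name -> "derived" or "ancestral" (assumed format).
    """
    derived = [m for m in MARKERS if calls.get(m) == "derived"]
    if not derived:
        # None of the three downstream markers is derived: paragroup.
        return UNRESOLVED
    if len(derived) == 1:
        return "R1a1-" + derived[0]
    # The three markers sit on separate branches, so more than one
    # derived call in a sample would suggest a genotyping problem.
    return INCONSISTENT


# Hypothetical samples, invented only to exercise the function.
samples = [
    {"M458": "derived", "Z280": "ancestral", "Z93": "ancestral"},
    {"M458": "ancestral", "Z280": "derived", "Z93": "ancestral"},
    {"M458": "ancestral", "Z280": "ancestral", "Z93": "derived"},
    {"M458": "ancestral", "Z280": "ancestral", "Z93": "ancestral"},
]
assigned = [assign_subhaplogroup(s) for s in samples]
resolved_share = sum(
    1 for a in assigned if a not in (UNRESOLVED, INCONSISTENT)
) / len(assigned)
print(assigned, resolved_share)
```

With real data, `resolved_share` would be the figure the abstract reports as being above 98%.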

Not having access to the paper right now, I can’t say much more but I believe that the abstract alone is very informative already.

Distribution of R1a per Underhill 2010


Fig. 1 – MJ trees
A reader already sent me a copy of the paper and I think that it has two aspects:
On one side the paper effectively detects these markers and studies them, along with R-M458, in Hungarians and related ethnic groups (Csangos, Szeklers, Hungarian Roma), as well as in Malaysian Indians, Uzbeks and Mongols. This part is informative, even if the selected Asian populations may not be the best choice (Mongols are low in R1a and so are Tamils, who make up the bulk of Malaysian Indians).

On the other side, the authors attempt to read too much, not just into these haplogroups but especially into molecular-clock-o-logic estimates (based on the Zhivotovsky mutation rate, now considered obsolete even by molecular clock enthusiasts). A corrected age estimate would be roughly twice as old[ref 1, ref 2] and that means that neither the Kurgan expansion nor the Neolithic one could account for its arrival in Europe.
Even using Underhill’s age estimates, they’d imply at least LGM dates for the arrival in Europe after the due correction. Their own dates, after the due x2 correction, give Late Upper Paleolithic dates for the haplogroups researched here.
Also the authors insist on arguing against a South Asian origin of R1a1 (Underhill 2010) on what sound like weak and fallacious arguments:

Previous publications have pointed out that regions of highest haplogroup frequencies do not always indicate the territory of origin (Cinnioglu et al., 2004) and high STR diversity may not be exclusively an indicator of in-situ diversification but could also be the consequence of repeated gene flow from different sources (Zerjal et al., 2002; Sharma et al., 2009).

Basically they are nagging: “Underhill could hypothetically be wrong in his conclusions but we have no evidence whatsoever that he is – just saying”. 
The real reason is that they seem to hope to find a more westerly origin for the lineage and attribute it again to Indo-European expansions, in line with classic speculations for which the high South Asian STR diversity levels are a big problem. However it is most unlikely that a bunch of horse-riding nomads could so radically alter the genetic landscape of the whole subcontinent, more so when its agriculture was already fully developed, no doubt sustaining high population densities.
But notwithstanding all those highly questionable opinions, the discovery of new haplogroups adding to our comprehension of this major lineage is a great advance.


It seems that some of the data presented in this paper was already floating around in some circles because ISOGG already includes the “new” haplogroups in its phylogenetic synthesis. Most interestingly the two “European” clades (along with a third one, whose geography I don’t know so far) make up a larger haplogroup (R1a1a1b1a, S198/Z282), which is “brother” of the “Asian” one (R1a1a1b2, S202/Z93).

As I was just commenting elsewhere the key to the origins of R1a is not so much in these low level haplogroups but in the higher “asterisk” paragroup, which (from memory) used to be concentrated in Pakistan and nearby areas of India, etc.

But once it reached the level of R1a1a1b1 (S339/Z283), this lineage seems to have split in two: one which we can describe as “European” and another which we can describe as “Indian”.

The European half is treated in this paper only as two of its subclades, and separately, which may be confusing. Hence I am adding here a synthesis of the current ISOGG phylogeny of R1a, with some annotations, for easier reference:

  • R1a* ··> Iran, Persian Gulf, Turkey
  • R1a1 (L120/M516, L122/M448, M459, Page65.2/SRY1532.2/SRY10831.2)
    • R1a1* ··> Iran, Caucasus, Greece, Scandinavia
    • R1a1a (L168, L449, M17, M198, M512, M514, M515)
      • R1a1a* ··> where? (not clear)
      • R1a1a1 (M417, Page7)
        • R1a1a1* ··> where?
        • R1a1a1a (L664/S298) ··> where?
        • R1a1a1b (S224/Z645, S441/Z647)
          • R1a1a1b* ··> where?
          • R1a1a1b1 (S339/Z283)
            • R1a1a1b1* ··> where?
            • R1a1a1b1a (S198/Z282)
              • R1a1a1b1a*
              • R1a1a1b1a1 (M458) ··> Central & East Europe
              • R1a1a1b1a2 (S204/Z91, S466/Z280) ··> Europe, Central Asia
              • R1a1a1b1a3 (S221/Z284, S443/Z289) ··> where?
          • R1a1a1b2 (S202/Z93) ··> India, Central Asia

All the data on the geography of top level “asterisk” paragroups is from Underhill 2010, already mentioned above. It suggests a West Asian origin for R1a overall and a spread to West and East from the R1a1a level or lower.

I used colors to emphasize the clades discussed here (purple for the larger haplogroup, blue for the European-leaning clade and red for the Indian-leaning one).

Clades in italics are “proposed”, not yet consolidated.
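For those who prefer to poke at the tree rather than read it, here is a small Python sketch of the lower part of the list above. It leans on a property of these ISOGG-style names: in this list each level adds exactly one character, so the closest shared ancestor of two clades is simply their common name prefix. The function names are my own and the trick is only assumed to hold for names built this way.

```python
from os.path import commonprefix

# Defining markers copied from the list above.
MARKERS = {
    "R1a1a1b": "S224/Z645, S441/Z647",
    "R1a1a1b1": "S339/Z283",
    "R1a1a1b1a": "S198/Z282",
    "R1a1a1b1a1": "M458",
    "R1a1a1b1a2": "S204/Z91, S466/Z280",
    "R1a1a1b1a3": "S221/Z284, S443/Z289",
    "R1a1a1b2": "S202/Z93",
}


def lineage(clade, root="R1a"):
    """List a clade and its ancestors up to `root`, one character per level."""
    path = [clade]
    while path[-1] != root and len(path[-1]) > len(root):
        path.append(path[-1][:-1])
    return path


def last_common_ancestor(a, b):
    """Closest clade shared by both lineages (character-wise common prefix)."""
    return commonprefix([a, b])


european_pair = last_common_ancestor("R1a1a1b1a1", "R1a1a1b1a2")
europe_vs_asia = last_common_ancestor("R1a1a1b1a1", "R1a1a1b2")
print(european_pair, MARKERS[european_pair])
print(europe_vs_asia, MARKERS[europe_vs_asia])
```

The first lookup joins the M458 and Z280 clades, the second one joins M458 with the Z93 clade further up the tree.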


Bronze Age city discovered in Northern China

Shimao was protected by a thick wall (source)
The site of Shimao in Northern Shaanxi was already known but new measurements indicate that it covers more than 4 km² and is surrounded by a thick wall, which makes it a city almost beyond debate.
I could only find so much information on the finding but the date provided (4000 years ago) would place it between the Late Neolithic culture of Longshan and the Early Bronze one of Erlitou, which is typically identified with the Xia dynasty. [But see update].
Sources in English: Mole, In Spanish: Spanish People.

Update: Va_Highlander has been looking up articles in Chinese (see comments) and it seems that the site is identified as having a long sequence from mid-Longshan to Erlitou (or Xia Dynasty).

The site is located on the Northern edge of the Loess Plateau, some 20 km away from the Yellow River. 

References in Chinese language: Guancha, Smwhys and

Bronze swords

Posted by on October 30, 2012 in archaeology, Bronze Age, China, East Asia, Neolithic


Emotional care or neglect of young children decisive for brain size, intelligence

An image says more than a thousand words. And in this case the image is a brain scan… or rather two side by side:

Both brains belong to 3-year-old children

… the child with the shrunken brain was neglected and abused by its
mother, and the child with the larger and more fully developed brain was
raised in a loving, supportive home and was looked after by its mother…

This is not just about IQ or head size, which may vary largely because of this type of environmental cause, but about everything in life, including emotional and social intelligence and the general ability to carry on with a normal, well integrated life (or become human waste).
The first years of life are critical for all our development, and parental love, especially that of our mothers, is probably more important than almost anything else.

Source: Medical Daily.


Posted by on October 30, 2012 in Anthropometry, biology, epigenetics, mind


The genetic and phenotype complexity of the Oceanic language area

In this entry, rather than discussing Polynesians alone, who seem to be just the tip of the Eastern Austronesian iceberg, I’ll try to understand the complexity of the speakers of Oceanic languages, the main native language family of Island Oceania.
Oceanic is a branch of Austronesian but for the purposes of this entry we will only mention other Austronesian peoples/languages tangentially. The focus is Oceanic because most probably we can’t understand the parts without the whole here.


Oceanic languages are scattered as follows:

  Admiralties and Yapese
  St Matthias
  Western Oceanic and Meso-Melanesian (two distinct sub-families)
  Southeast Solomons
  Southern Oceanic
Black enclosed zones are pockets of languages from other families.
(CC by kwami)

It is certainly interesting that Micronesian and Fijian-Polynesian seem to be particularly closely related to each other. By contrast, the Western Oceanic and Admiralty subfamilies (both from the islands near Papua) seem to have separated early on or diverged farther for whatever other reasons (stronger substrate influence for example).


Lapita pot from Tonga (source)
As I cited recently, Polynesians seem to have spread from the Society Islands in the 1190-1290 CE window. The genesis of the Micronesian family is not well understood… but the overall genesis of Oceanic languages seems to lie in the Lapita culture, which spread through Island Melanesia (excluding Papua) and some nearby islands (notably Tonga and Samoa, and also the Marquesas c. 300 CE (ref)).
Early Lapita culture is dated to c. 1350-750 BCE, while a Late phase is dated to c. 250 BCE, spreading to the Solomon Islands, which show no indications of the earlier period (Ricaut 2010, fig. 2).
So a simplified chronology for the Oceanic expansion would be:
  1. Lapita culture from near Melanesia to Vanuatu and Kanaky (New Caledonia), then to:
    1. Fiji, Samoa and Tonga since c. 900 BCE
    2. Solomon Is. c. 250 BCE
  2. Arrival to Society Islands (Tahiti, etc.) c. 300-800 CE from maybe Samoa.
  3. Main Polynesian expansion to the farthest islands (Hawaii, Rapa Nui, Aotearoa-NZ) c. 1200 CE from Society Is.

Phenotype (‘race’)

A classical and unavoidable element in the ethnographic division of the region is phenotype, appearance (i.e. ‘race’). Since the first European arrival to the area the division between black Melanesians and white Polynesians (very relative as we will see now) has been part of all our conceptualizations of the region. 
Conscious of that and wanting to get a better impression I collected from the Internet what I estimate may be representative faces from the Oceanic linguistic zone and nearby areas (other Austronesians and Melanesians) and put them on a map:


A relatively homogeneous Polynesian phenotype can be identified and one can imagine that it stems from the area of Samoa-Tonga, considering the previous prehistorical review. But otherwise the diversity, gradations and abundance of local uniqueness seem quite impressive.
Based on other cases, one would imagine also that phenotype differences would be coincidental with genetic ones. However this is not too easy to discern, partly because Polynesians have strong founder effects that blur the matter, partly because there is no obvious strict dividing line between the various phenotypes and partly because of the insistence of some in considering Lapita as a Polynesian phenomenon, when it is obviously an Oceanic one, including and emphasizing the Melanesian side of the diverse Oceanic landscape, of which the Polynesian-Micronesian branch is just one element (famous and extended but not the core). 
The main Y-DNA lineage among Polynesians is C2a1 (P33), not found outside Polynesia sensu stricto but reaching frequencies of 63-90% there (except in Tonga, where it’s only 33%). This is a clear founder effect in this population.

C subclades in SE Asia and Oceania
(from Karafet 2010, annotated with ISOGG nomenclature)
C2a1 is clearly derived from a Melanesian superset C2a (M208), still found as C2a(xC2a1) at low frequencies in Samoa (8%) and Tahiti (4%) but also in Vanuatu (2%) and coastal Papua (13%). C2a establishes a probable genetic link of Polynesians with the Lapita culture and Melanesian peoples in general.
An earlier phylogenetic stage is C2 (M38), which has probably been in the region since the very first colonization process some 50 thousand years ago (or maybe even earlier). C2(xC2a) is most common in Wallacea (East Indonesia, East Timor), where it reaches figures of maybe 33% on average. It is however also found in highland Papua (13%) and Vanuatu (20%) but as it is most doubtful that C2a evolved as recently as Lapita times, we should really focus on C2a as such rather than the wider C2, which only seems to confuse the matter.
The lack of C2(xC2a) in most of the Oceanic languages’ area clearly indicates that the expansion (and subsequent founder effects) did not begin in Wallacea but in Melanesia, at least as regards the C sublineages.
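For readers not used to the “X(xY)” notation: it means samples that belong to clade X but not to its subclade Y. A minimal Python sketch of that bookkeeping follows. The ten-sample list is invented purely for illustration (it is not data from Karafet 2010), and the prefix test assumes the nested naming used in this entry, where a subclade’s name begins with its parent’s name.

```python
def in_clade(haplogroup, clade):
    """True if `haplogroup` is `clade` itself or one of its subclades.

    Assumes nested names (C2 > C2a > C2a1), as used in this entry.
    """
    return haplogroup.startswith(clade)


def paragroup_frequency(haplogroups, clade, excluded=()):
    """Share of samples inside `clade` but outside every clade in `excluded`."""
    hits = [
        h for h in haplogroups
        if in_clade(h, clade) and not any(in_clade(h, e) for e in excluded)
    ]
    return len(hits) / len(haplogroups)


# Hypothetical sample of ten men, invented to show the arithmetic only.
sample = ["C2a1"] * 6 + ["C2a"] * 1 + ["O3a2"] * 3

freq_c2a1 = paragroup_frequency(sample, "C2a1")
freq_c2a_x_c2a1 = paragroup_frequency(sample, "C2a", excluded=["C2a1"])
freq_c2_x_c2a = paragroup_frequency(sample, "C2", excluded=["C2a"])
print(freq_c2a1, freq_c2a_x_c2a1, freq_c2_x_c2a)
```

In this made-up sample C2(xC2a) comes out at zero, which is the kind of pattern the argument above relies on for most of the Oceanic-speaking area.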
The other major Polynesian haplogroup is O3a2 (P201), which would seem to have originated in the Philippines and maybe arrived in Polynesia via Micronesia:

O3 subclades in SE Asia and Oceania
(from Karafet 2010, annotated with ISOGG nomenclature)

Melanesian populations also sport some lineages that are not common among other Oceanic-speaker peoples, notably K, M and S. However they are irregularly shared with Wallacea (Eastern Indonesia, East Timor). Like C2 these lineages coalesced in the region soon after colonization by Homo sapiens.
On the maternal side of things genetic, the absolutely dominant mtDNA lineage among Polynesians (the so-called Polynesian motif) is B4a1a1, which ultimately stems from East or rather SE Asia. However it probably arrived in the region (again) via Melanesia, albeit maybe somewhat tangentially.

From Friedlander 2007 (fig. 4)

Spatial frequency distribution of haplogroup B4a* and B4a1a1 in Island Southeast Asia and the western Pacific, created using the Kriging algorithm of the Surfer package of haplogroups. Figure 4b presents the detailed distribution for Northern Island Melanesia. Data details are provided in table S3.

The matrilineal Polynesian motif does offer a possible pattern of settlement, maybe related specifically to Late Lapita, that could allow us to understand the possible origin of the phenotype differences between Melanesians and Polynesians, as could the Y-DNA lineage O3a2. However there are lots of remnants of the quite strictly Melanesian Early Lapita, as is evident from the (Y-DNA) C2a lineages retained so strongly among Polynesians within their own founder effects, whose importance we cannot afford to dismiss.

Other mtDNA lineages like Q1 or M27 are of relevance in Melanesian populations. Q1 did make its way into some Polynesian populations but as minority lineage only.

Update (Oct 31):

Terry in the comments section grunts a lot but now and then provides useful complementary data, for example this Y-DNA map of the region from Kayser 2006:

Kayser 2006 – fig. 1
Frequency distribution of (A, B) NRY and (C, D) mtDNA haplogroups found in Polynesia with a genetic origin in (A, C) Asia or (B, D) Melanesia.

As has been apparent since Kayser’s publication (if not before), the Melanesian patrilineages are much more common (actually dominant) among Polynesians than the matrilineages from the same origin, which is attributable to a founder effect related to the Lapita culture.
Another interesting reference is this Y-DNA map of Papua (New Guinea) and some nearby islands (from Mona 2007):

Mona 2007 FIG. 2.—Y-chromosome haplogroups and their frequencies in populations from the Bird’s Head region and elsewhere in New Guinea. Data from other populations of New Guinea were used from previous studies (Kayser et al. 2003, 2006). Size of the pie charts is according to sample size of the groups. Abbreviations are as in supplementary table S1, Supplementary Material online.

Both maps and/or the data in the relevant papers provide key information on possible origins for the C2a-M208 patrilineal founder effect, so important in general in the Oceanic peoples and especially the Polynesian branch. The exact origin cannot be pinpointed without further research (or maybe not at all) but it’s clear that C2a-M208 only exists from Papua (New Guinea) eastwards, so it must have a Melanesian origin, be it Papuan or from the nearby islands.


  • François-Xavier Ricaut et al., Ancient Solomon Islands mtDNA: assessing Holocene settlement and the impact of European contact. Journal of Archaeological Science, 2010 ··> LINK (PDF).
  • Jonathan S. Friedlaender et al., Melanesian mtDNA Complexity. PLoS ONE, 2007 ··> LINK (open access).
  • Tatiana Karafet et al., Major East-West Division Underlies Y Chromosome Stratification Across Indonesia. MBE 2010 ··> LINK (free access).
  • Michael Knapp et al., Complete mitochondrial DNA genome sequences from the first New Zealanders. PNAS 2012 ··> LINK (open access).
  • Manfred Kayser et al., Melanesian and Asian Origins of Polynesians: mtDNA and Y Chromosome Gradients Across the Pacific. MBE 2006 ··> LINK (free access).
  • Stephano Mona et al., Patterns of Y-Chromosome Diversity Intersect with the Trans-New Guinea Hypothesis. MBE 2007 ··> LINK (free access).

Note: updates after first posted version in maroon color.


IL-4 genetic combo protects Indian hunter-gatherers from Malaria

Or, more precisely, protects many of those who have it in diverse populations but it is most concentrated among hunter-gatherers of the so-called Ancestral Tribal Populations (ATP).
Aditya Nath Jha et al., IL-4 Haplotype -590T, -34T and Intron-3 VNTR R2 Is Associated with Reduced Malaria Risk among Ancestral Indian Tribal Populations. PLoS ONE 2012. Open access ··> LINK [doi:10.1371/journal.pone.0048136]



Interleukin 4 (IL-4) is an anti-inflammatory cytokine, which regulates balance between TH1 and TH2 immune response, immunoglobulin class switching and humoral immunity. Polymorphisms in this gene have been reported to affect the risk of infectious and autoimmune diseases.


We have analyzed three regulatory IL-4 polymorphisms; -590C>T, -34C>T and 70 bp intron-3 VNTR, in 4216 individuals; including: (1) 430 ethnically matched case-control groups (173 severe malaria, 101 mild malaria and 156 asymptomatic); (2) 3452 individuals from 76 linguistically and geographically distinct endogamous populations of India, and (3) 334 individuals with different ancestry from outside India (84 Brazilian, 104 Syrian, and 146 Vietnamese).


The 590T, 34T and intron-3 VNTR R2 alleles were found to be associated with reduced malaria risk (P<0.001 for 590C>T and 34C>T, and P = 0.003 for VNTR). These three alleles were in strong LD (r²>0.75) and the TTR2 (590T, 34T and intron-3 VNTR R2) haplotype appeared to be a susceptibility factor for malaria (P = 0.009, OR = 0.552, 95% CI = 0.356–0.854). Allele and genotype frequencies differ significantly between caste, nomadic, tribe and ancestral tribal populations (ATP). The distribution of protective haplotype TTR2 was found to be significant (χ²₃ = 182.95, p-value <0.001), which is highest in ATP (40.5%); intermediate in tribes (33%); and lowest in caste (17.8%) and nomadic (21.6%).


Our study suggests that the IL-4 polymorphisms regulate host susceptibility to malaria and disease progression. TTR2 haplotype, which gives protection against malaria, is high among ATPs. Since they inhabited in isolation and mainly practice hunter-gatherer lifestyles and exposed to various parasites, IL-4 TTR2 haplotype might be under positive selection.
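The numbers in the abstract can be cross-checked with a few lines of Python. This is only a consistency sketch, not a reanalysis: it assumes the reported 95% interval is a standard Wald interval (symmetric on the log scale), which the abstract does not state. All figures are copied from the abstract; the variable names are mine.

```python
import math

# Sample sizes quoted in the abstract.
case_control = 173 + 101 + 156   # severe, mild, asymptomatic
indian_populations = 3452
outside_india = 84 + 104 + 146   # Brazilian, Syrian, Vietnamese
total = case_control + indian_populations + outside_india

# Odds ratio and 95% CI quoted for the TTR2 haplotype.
odds_ratio, ci_low, ci_high = 0.552, 0.356, 0.854

# If the interval is symmetric on the log scale (assumption), the
# geometric mean of its bounds should land close to the odds ratio.
geometric_mean = math.sqrt(ci_low * ci_high)

# Back out the standard error of ln(OR) from the interval width,
# then derive a two-sided p-value from the normal approximation.
Z_95 = 1.959964
se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
z_score = math.log(odds_ratio) / se_log_or
p_value = math.erfc(abs(z_score) / math.sqrt(2))

print(total, round(geometric_mean, 3), round(p_value, 4))
```

The group sizes add up to the stated 4216, and the implied p-value falls in the same range as the reported P = 0.009. Note that an odds ratio below 1 means lower odds of malaria among carriers, which matches the protective reading in the title of this entry.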

The protection is not absolute but it holds very strong statistical significance for the R2-R3 heterozygous combo, as shown in fig. 1:
Figure 1. Distribution of IL-4 intron-3 VNTR polymorphism.
A and B: genotype and allelic distribution between malaria case control
groups, respectively; C and D: genotype and allelic distribution among
caste, nomadic, tribe and ancestral tribe, respectively.

This combo is most common (near-optimal distribution) among the ATPs. The correlation holds for the four linguistic families, with variation being more a matter of individual ATP tribes: from 35% among the AoNaga (TB, Nagaland) to 67% among the Baiga (IE, Madhya Pradesh) or 63% among the Onge (Jarawa-Onge, Andaman Is.).


Variation in human (modern and archaic) and chimpanzee lipoprotein APOE

This new study has some interest in understanding some details, of metabolic relevance, of the genetics of humans and our closest relatives:
Annick McIntosh et al., The Apolipoprotein E (APOE) Gene Appears Functionally Monomorphic in Chimpanzees (Pan troglodytes). PLoS ONE 2012. Open access ··> LINK [doi:10.1371/journal.pone.0047760]



The human apolipoprotein E (APOE) gene is polymorphic, with three primary alleles (E2, E3, E4) that differ at two key non-synonymous sites. These alleles are functionally different in how they bind to lipoproteins, and this genetic variation is associated with phenotypic variation for several medical traits, including cholesterol levels, cardiovascular health, Alzheimer’s disease risk, and longevity. The relative frequencies of these alleles vary across human populations, and the evolution and maintenance of this diversity is much debated. Previous studies comparing human and chimpanzee APOE sequences found that the chimpanzee sequence is most similar to the human E4 allele, although the resulting chimpanzee protein might function like the protein coded for by the human E3 allele. However, these studies have used sequence data from a single chimpanzee and do not consider whether chimpanzees, like humans, show intra-specific and subspecific variation at this locus.

Methodology and Principal Findings

To examine potential intraspecific variation, we sequenced the APOE gene of 32 chimpanzees. This sample included 20 captive individuals representing the western subspecies (P. troglodytes verus) and 12 wild individuals representing the eastern subspecies (P. t. schweinfurthii). Variation in our resulting sequences was limited to one non-coding, intronic SNP, which showed fixed differences between the two subspecies. We also compared APOE sequences for all available ape genera and fossil hominins. The bonobo APOE protein is identical to that of the chimpanzee, and the Denisovan APOE exhibits all four human-specific, non-synonymous changes and appears functionally similar to the human E4 allele.


We found no coding variation within and between chimpanzee populations, suggesting that the maintenance of functionally diverse APOE polymorphisms is a unique feature of human evolution.

The relevant details are all in table 1:

Table 1. Variation at key APOE functional sites in Homo and Pan.
There is uncertainty about the correctness of the only known Neanderthal triplet.
Even if E4 seems to be the ancestral type, E3 is the most common allele in our species, ranging from 50% in most populations to as much as 90% among some tribes.

East Asian oaks in the Ice Age

It may sound botanically erudite but it is also of great relevance in order to better understand the ecology and geography of people living in East Asia in the Upper Paleolithic. Hence worth mentioning here.
Dongmei Chen et al., Phylogeography of Quercus variabilis Based on Chloroplast DNA Sequence in East Asia: Multiple Glacial Refugia and Mainland-Migrated Island Populations. PLoS ONE 2012. Open access ··> LINK [doi:10.1371/journal.pone.0047268]


The biogeographical relationships between far-separated populations, in particular, those in the mainland and islands, remain unclear for widespread species in eastern Asia where the current distribution of plants was greatly influenced by the Quaternary climate. Deciduous Oriental oak (Quercus variabilis) is one of the most widely distributed species in eastern Asia. In this study, leaf material of 528 Q. variabilis trees from 50 populations across the whole distribution (Mainland China, Korea Peninsular as well as Japan, Zhoushan and Taiwan Islands) was collected, and three cpDNA intergenic spacer fragments were sequenced using universal primers. A total of 26 haplotypes were detected, and it showed a weak phylogeographical structure in eastern Asia populations at species level, however, in the central-eastern region of Mainland China, the populations had more haplotypes than those in other regions, with a significant phylogeographical structure (NST = 0.751 > GST = 0.690, P < 0.05). Q. variabilis displayed high interpopulation and low
intrapopulation genetic diversity across the distribution range. Both
unimodal mismatch distribution and significant negative Fu’s FS indicated a demographic expansion of Q. variabilis
populations in East Asia. A fossil calibrated phylogenetic tree showed a
rapid speciation during Pleistocene, with a population augment occurred
in Middle Pleistocene. Both diversity patterns and ecological niche
modelling indicated there could be multiple glacial refugia and possible
bottleneck or founder effects occurred in the southern Japan. We dated
major spatial expansion of Q. variabilis population in eastern
Asia to the last glacial cycle(s), a period with sea-level fluctuations
and land bridges in East China Sea as possible dispersal corridors. This
study showed that geographical heterogeneity combined with climate and
sea-level changes have shaped the genetic structure of this wide-ranging
tree species in East Asia.

Maybe most interesting of all is this map: 

Figure 5. Ecological niche modelling.
distribution probability (in logistic value) is shown in each 2.5
arc-min pixel, based on the palaeodistribution modelling at present
(0BP) (a) and at the last glacial maximum (LGM) (21KaBP) (b). The
distribution of river systems on the exposed East China Sea during the
LGM was drawn from Shota et al. (2012). Occurrence records of Q. variabilis at present are also plotted as black points in the maps.

Much more data for those interested in the genetic details can be found in the paper.