Category Archives: molecular clock

Human Y chromosome undergoes purifying selection

A somewhat technical yet interesting study on Y chromosome evolution in humans:

Melissa A. Wilson Sayres et al., Natural Selection Reduced Diversity on Human Y Chromosomes. PLoS Genetics 2014. Open access → LINK [doi:10.1371/journal.pgen.1004064]


The human Y chromosome exhibits surprisingly low levels of genetic diversity. This could result from neutral processes if the effective population size of males is reduced relative to females due to a higher variance in the number of offspring from males than from females. Alternatively, selection acting on new mutations, and affecting linked neutral sites, could reduce variability on the Y chromosome. Here, using genome-wide analyses of X, Y, autosomal and mitochondrial DNA, in combination with extensive population genetic simulations, we show that low observed Y chromosome variability is not consistent with a purely neutral model. Instead, we show that models of purifying selection are consistent with observed Y diversity. Further, the number of sites estimated to be under purifying selection greatly exceeds the number of Y-linked coding sites, suggesting the importance of the highly repetitive ampliconic regions. While we show that purifying selection removing deleterious mutations can explain the low diversity on the Y chromosome, we cannot exclude the possibility that positive selection acting on beneficial mutations could have also reduced diversity in linked neutral regions, and may have contributed to lowering human Y chromosome diversity. Because the functional significance of the ampliconic regions is poorly understood, our findings should motivate future research in this area.

Positive selection (or directional selection) happens when a variant becomes so advantageous that everything else is bad by comparison. This may be simply because an environmental change, possibly caused by migration (or by any other reason), substantially alters the rules of the game. Much more rarely, a novel mutation (or an accumulation of several) may happen to generate a phenotype that is much fitter even under pre-existing conditions. As I understand it, positive selection happens only rarely (but spectacularly). An example in humans is the selection of lighter skin shades at latitudes far from the tropics (because of the “photosynthesis” of vitamin D in the skin, crucial for early brain development); another, more general one is the selection for improved brains (not necessarily just bigger ones), able to face changing conditions more dynamically and to develop more efficient tools and weapons.
Purifying selection (or negative selection) is quite different and surely much more common. Novel mutations arise randomly and in many cases, the vast majority I dare say, they happen to be harmful for a previously well-tuned genotype (and its derived phenotype). As a result, the carriers have fewer opportunities for reproduction, when they don’t just die right away. Natural selection acts mostly this way, and in many cases types can become very stable for this reason, as happens with groups that have been successful on this planet since long before humankind arose, such as sharks or crocodiles.
This last is what seems to be happening to the human Y chromosome: novel mutations are quite often harmful (maybe they cause sterility or other traits that reduce male reproductive efficiency) and they are regularly pruned off the tree by natural selection.

Purifying selection slows down the effective mutation rate

Interestingly the authors mention that:

… if purifying selection is the dominant force on the Y chromosome, the topology of the tree should remain intact, but the coalescent times are expected to be reduced.

That would be, as I understand it, because the observed mutation rate has little relation to the actual accumulated (effective) mutation rate, which is much slower because of the continuous pruning by negative selection.
Purifying selection has also been observed in mitochondrial DNA, where it has the same kind of slowing impact on the “molecular clock”.
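
One common way to formalise the quoted expectation (same tree shape, shorter coalescent times) is the classic background-selection approximation for a non-recombining chromosome. The sketch below works under that assumption and is not necessarily the model used in the paper: at mutation-selection balance only a fraction f0 = exp(-U/s) of chromosomes carries no deleterious mutation, and neutral lineages coalesce roughly as if the population were that much smaller. The function names and the parameter values (10,000 chromosomes, U = 0.01, s = 0.02) are hypothetical, chosen only for illustration.

```python
import math


def mutation_free_fraction(deleterious_rate, selection_coeff):
    """Equilibrium fraction f0 of chromosomes free of deleterious mutations.

    Assumes a non-recombining chromosome with multiplicative fitness effects,
    where the number of deleterious mutations per chromosome is Poisson
    distributed with mean U / s, so that f0 = exp(-U / s).
    """
    if deleterious_rate < 0 or selection_coeff <= 0:
        raise ValueError("need U >= 0 and s > 0")
    return math.exp(-deleterious_rate / selection_coeff)


def expected_pairwise_coalescence(n_chromosomes, deleterious_rate, selection_coeff):
    """Expected generations back to the common ancestor of two chromosomes.

    Under neutrality this is n_chromosomes for a haploid locus; background
    selection is approximated here as rescaling that time by f0.
    """
    f0 = mutation_free_fraction(deleterious_rate, selection_coeff)
    return n_chromosomes * f0


# Hypothetical illustration values, not estimates from the paper.
neutral = expected_pairwise_coalescence(10_000, 0.0, 0.02)
selected = expected_pairwise_coalescence(10_000, 0.01, 0.02)
print(f"neutral: {neutral:.0f} generations, with purifying selection: {selected:.0f}")
```

In this simplified picture the genealogy keeps its shape while all its branches shrink by the same factor, which is why a clock calibrated on the raw mutation rate would misjudge the depth of the tree.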

Posted on January 26, 2014 in evolution, human evolution, molecular clock, Y-DNA


Homo sapiens was in China before 100,000 years ago!

This finding consolidates the recent dating of African-like industries of India to c. 96,000 years ago, as well as other previous discoveries, mostly from China. Jointly, they totally outdate not just the ridiculous “60 Ka ago” mantra for the migration out of Africa (which we know is dated to c. 125,000 years ago in Arabia and Palestine) but also the previous estimates of c. 80,000 years ago for India (Petraglia 2007).
Guanjun Shen et al., Mass spectrometric U-series dating of Huanglong Cave in Hubei Province, central China: Evidence for early presence of modern humans in eastern Asia. Journal of Human Evolution, 2013. Freely accessible at the time of writing this → LINK [doi:10.1016/j.jhevol.2013.05.002]


Most researchers believe that anatomically modern humans (AMH) first appeared in Africa 160-190 ka ago, and would not have reached eastern Asia until ∼50 ka ago. However, the credibility of these scenarios might have been compromised by a largely inaccurate and compressed chronological framework previously established for hominin fossils found in China. Recently there has been a growing body of evidence indicating the possible presence of AMH in eastern Asia ca. 100 ka ago or even earlier. Here we report high-precision mass spectrometric U-series dating of intercalated flowstone samples from Huanglong Cave, a recently discovered Late Pleistocene hominin site in northern Hubei Province, central China. Systematic excavations there have led to the in situ discovery of seven hominin teeth and dozens of stone and bone artifacts. The U-series dates on localized thin flowstone formations bracket the hominin specimens between 81 and 101 ka, currently the most narrow time span for all AMH beyond 45 ka in China, if the assignment of the hominin teeth to modern Homo sapiens holds. Alternatively this study provides further evidence for the early presence of an AMH morphology in China, through either independent evolution of local archaic populations or their assimilation with incoming AMH. Along with recent dating results for hominin samples from Homo erectus to AMH, a new extended and continuous timeline for Chinese hominin fossils is taking shape, which warrants a reconstruction of human evolution, especially the origins of modern humans in eastern Asia.

The range of dates for the teeth is wide, but the oldest one is 102.1 ± 0.9 Ka. Other dates are very close to this one: 99.5 ± 2.2, 99.3 ± 1.6, 96.8 ± 1.0, etc. (see table 1), so there can be little doubt about their accuracy.
The Huanglong teeth (various views)
Now, how solidly can these teeth be considered to belong to the species Homo sapiens? Very solidly it seems:

The seven hominin teeth from Huanglong Cave have been assigned to AMH
mainly because of their generally more advanced morphology than that of H. erectus and other archaic populations (Liu et al., 2010b),
especially in terms of the crown breadth/length index. These teeth also
lack major archaic suprastructural characteristics listed by Bermúdez de Castro (1988)
for eastern Asian mid-Pleistocene hominins, such as “strong tuberculum
linguale (incisors), marked lingual inclination of the buccal face
(incisors and canines), buccal cingulum (canines and molars), wrinkling
(molars), taurodontism (molars), swelling of the buccal faces (molars)”
(Tim Compton, Personal communication). However, in their roots, these
teeth still retain a few archaic features, being more robust and
complicated than those of modern humans (Liu et al., 2010b).

Zhirendong jaw
Let’s not forget that further south in China, at Zhirendong, a “modern” jaw was found and also dated to c. 100,000 years ago.
As for the so-called “molecular clock”:

The new timeline for human evolution in China is in disagreement with
the molecular clock that posits a late appearance for AMH in eastern
Asia (e.g., Chu et al., 1998).

… too bad for the “clock”, because a clock that doesn’t inform us of time with at least some accuracy is totally useless.

The least homogeneous European "populations" are Italians and French

This comes from a recent IBD study on Europe:
Peter Ralph & Graham Coop, The Geography of Recent Genetic Ancestry across Europe. PLoS Biology, 2013. Open access → LINK [doi:10.1371/journal.pbio.1001555]


The recent genealogical history of human populations is a complex mosaic formed by individual migration, large-scale population movements, and other demographic events. Population genomics datasets can provide a window into this recent history, as rare traces of recent shared genetic ancestry are detectable due to long segments of shared genomic material. We make use of genomic data for 2,257 Europeans (in the Population Reference Sample [POPRES] dataset) to conduct one of the first surveys of recent genealogical ancestry over the past 3,000 years at a continental scale. We detected 1.9 million shared long genomic segments, and used the lengths of these to infer the distribution of shared ancestors across time and geography. We find that a pair of modern Europeans living in neighboring populations share around 2–12 genetic common ancestors from the last 1,500 years, and upwards of 100 genetic ancestors from the previous 1,000 years. These numbers drop off exponentially with geographic distance, but since these genetic ancestors are a tiny fraction of common genealogical ancestors, individuals from opposite ends of Europe are still expected to share millions of common genealogical ancestors over the last 1,000 years. There is also substantial regional variation in the number of shared genetic ancestors. For example, there are especially high numbers of common ancestors shared between many eastern populations that date roughly to the migration period (which includes the Slavic and Hunnic expansions into that region). Some of the lowest levels of common ancestry are seen in the Italian and Iberian peninsulas, which may indicate different effects of historical population expansions in these areas and/or more stably structured populations. Population genomic datasets have considerable power to uncover recent demographic history, and will allow a much fuller picture of the close genealogical kinship of individuals across the world.

Most interesting, in my understanding, is table 1 (right), which describes the IBD relation of the sampled populations within themselves and with other Europeans.
From this table it seems very apparent that Italians and French are not homogeneous at all and therefore, in my opinion, should not be treated as single populations in genetic studies but split up at least somewhat by region (with the optimal size of the regions yet to be determined).
The degree of internal homogeneity of the samples (only n=5 or greater) can be simplified as follows:
  • Very low (<1): Italy, France.
  • Quite low (1-1.4): Germany, UK, Belgium, England, Austria, French-Swiss.
  • Somewhat low (1.5-1.9): Spain, German-Swiss, Greece, Portugal, Netherlands, Hungary.
  • Somewhat high (2-2.9): Czech R., Romania, Scotland, Ireland, Serbia, Croatia.
  • Quite high (3-3.9): Sweden, Poland.
  • Very high (4-5): Bosnia, Russia*.
  • Extremely high (>10): Albania.
  • I ignored strangely labeled samples like “Switzerland” and “Yugoslavia”, which actually seem to mean “other” within these labels. I retained the “United Kingdom” category for its large sample size, much larger than that of its obvious parts.
  • (*) The level of relatedness of Russians may be exaggerated by the small sample: n=6, still above my cautionary threshold.
  • I suspect that the extreme disparity of sample sizes may influence the results to some extent.
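
The simplification above can be sketched as a small lookup. This is only an illustration of the banding, under two assumptions of mine: the one-decimal bands in the list are read as half-open intervals (so 1.45 counts as “Quite low”), and values between 5 and 10, for which the list defines no band, are returned as “Unclassified”. The function name and the sample values are hypothetical and are not the figures from table 1.

```python
def homogeneity_category(within_ibd):
    """Map a within-population IBD rate to the verbal bands used in the post."""
    if within_ibd < 0:
        raise ValueError("IBD rate cannot be negative")
    if within_ibd < 1.0:
        return "Very low"
    if within_ibd < 1.5:
        return "Quite low"
    if within_ibd < 2.0:
        return "Somewhat low"
    if within_ibd < 3.0:
        return "Somewhat high"
    if within_ibd < 4.0:
        return "Quite high"
    if within_ibd <= 5.0:
        return "Very high"
    if within_ibd > 10.0:
        return "Extremely high"
    return "Unclassified"  # between 5 and 10: no band is defined in the list


# Hypothetical values, NOT the figures from table 1.
for label, value in [("sample A", 0.6), ("sample B", 2.4), ("sample C", 12.0)]:
    print(label, "->", homogeneity_category(value))
```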
Eastern Europeans seem much more strongly related to others, especially other Eastern Europeans, than Western ones, while NW Europeans are more related to other groups (usually at regional level) than SW ones. In fact the Italian and Iberian peninsulas show very low levels of “recent” relatedness with other populations, which is a bit perplexing, considering their non-negligible roles in Medieval and Modern European history. I guess that this may be partly caused by geographic barriers (mountains) and also by these areas having had large populations since Antiquity or before.

Figure 3. Geographic decay of recent relatedness.
In all figures, colors give categories based on the regional groupings of Table 1. (A–F) The area of the circle located on a particular population is proportional to the mean number of IBD blocks of length at least 1 cM shared between random individuals chosen from that population and the population named in the label (also marked with a star). Both regional variation of overall IBD rates and gradual geographic decay are apparent. (G–I) Mean number of IBD blocks of lengths 1–3 cM (oldest), 3–5 cM, and >5 cM (youngest), respectively, shared by a pair of individuals across all pairs of populations; the area of the point is proportional to sample size (number of distinct pairs), capped at a reasonable value; and lines show an exponential decay fit to each category (using a Poisson GLM weighted by sample size). Comparisons with no shared IBD are used in the fit but not shown in the figure (due to the log scale). “E–E,” “N–N,” and “W–W” denote any two populations both in the E, N, or W grouping, respectively; “TC-any” denotes any population paired with Turkey or Cyprus; “I-(I,E,N,W)” denotes Italy, Spain, or Portugal paired with any population except Turkey or Cyprus; and “between E,N,W” denotes the remaining pairs (when both populations are in E, N, or W, but the two are in different groups). The exponential fit for the N–N points is not shown due to the very small sample size. See Figure S8 for an SVG version of these plots where it is possible to identify individual points.
We can also see in the above figure (bottom) how most of the relatedness, especially over longer distances, belongs to the oldest blocks (1-3 cM).
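
The caption mentions exponential decay fits obtained with a Poisson GLM weighted by sample size. As a rough sketch of what such a fit involves (this is not the authors' code), the function below fits mean = exp(a + b × distance) by iteratively reweighted least squares, the usual algorithm for a log-link Poisson regression. I am assuming that “weighted by sample size” means each point is weighted by its number of pairs; the function name, the distances, and the counts are all hypothetical, and the data are synthetic rather than taken from the paper.

```python
import math


def fit_exponential_decay(distances, counts, weights, iterations=50, tol=1e-12):
    """Fit mean = exp(a + b * distance) by weighted Poisson regression (IRLS).

    Returns (a, b); a negative b means that sharing decays with distance.
    """
    # Start from the data, shifted so that zero counts have a finite log.
    eta = [math.log(y + 0.1) for y in counts]
    a = b = 0.0
    for _ in range(iterations):
        mu = [math.exp(e) for e in eta]
        # Working response and working weights of the log-link Poisson model.
        z = [e + (y - m) / m for e, y, m in zip(eta, counts, mu)]
        w = [wt * m for wt, m in zip(weights, mu)]
        # Closed-form weighted least squares for a straight line.
        sw = sum(w)
        sx = sum(wi * x for wi, x in zip(w, distances))
        sz = sum(wi * zi for wi, zi in zip(w, z))
        sxx = sum(wi * x * x for wi, x in zip(w, distances))
        sxz = sum(wi * x * zi for wi, x, zi in zip(w, distances, z))
        new_b = (sw * sxz - sx * sz) / (sw * sxx - sx * sx)
        new_a = (sz - new_b * sx) / sw
        converged = abs(new_a - a) < tol and abs(new_b - b) < tol
        a, b = new_a, new_b
        eta = [a + b * x for x in distances]
        if converged:
            break
    return a, b


# Synthetic data: exactly exponential decay, so the fit should recover it.
distances = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]            # thousands of km (hypothetical)
mean_blocks = [2.0 * math.exp(-1.2 * d) for d in distances]
pair_counts = [10, 20, 30, 20, 10, 5]                  # hypothetical sample sizes
a, b = fit_exponential_decay(distances, mean_blocks, pair_counts)
print(f"baseline {math.exp(a):.3f} blocks, decay rate {b:.3f} per 1000 km")
```

Unlike a straight-line fit to log counts, this kind of fit can use comparisons with zero shared blocks, which matches the caption's remark that such comparisons enter the fit even though the log scale hides them.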
The authors suggest that the low homogeneity within some of these groupings is influenced by regional variation, which makes good sense to me. They illustrate this with the examples of Italy and Great Britain:

Figure 2. Substructure in (A) Italian and (B) U.K. samples.
The leftmost plots of (A) show histograms of the numbers of IBD blocks that each Italian sample shares with any French-speaking Swiss (top) and anyone from the United Kingdom (bottom), overlaid with the expected distribution (Poisson) if there was no dependence between blocks. Next is shown a scatterplot of numbers of blocks shared with French-speaking Swiss and U.K. samples, for all samples from France, Italy, Greece, Turkey, and Cyprus. We see that the numbers of recent ancestors each Italian shares with the French-speaking Swiss and with the United Kingdom are both bimodal, and that these two are positively correlated, ranging continuously between values typical for Turkey/Cyprus and for France. Figure (B) is similar, showing that the substructure within the United Kingdom is part of a continuous trend ranging from Germany to Ireland. The outliers visible in the scatterplot of Figure 2B are easily explained as individuals with immigrant recent ancestors—the three outlying U.K. individuals in the lower left share many more blocks with Italians than all other U.K. samples, and the individual labeled “SK” is a clear outlier for the number of blocks shared with the Slovakian sample.
In the UK, there is a negative correlation between blocks shared with Ireland and those shared with Germany, which seems to imply a dual origin of Britons.
Age estimates (double them?):
The authors also estimate ages. However, it seems obvious from their own data that the results should be multiplied by 2.2 or so to make good sense:

Figure 4. Estimated average number of most recent genetic common ancestors per generation back through time.
Estimated average number of most recent genetic common ancestors per generation back through time shared by (A) pairs of individuals from “the Balkans” (former Yugoslavia, Bulgaria, Romania, Croatia, Bosnia, Montenegro, Macedonia, Serbia, and Slovenia, excluding Albanian speakers) and shared by one individual from the Balkans with one individual from (B) Albanian-speaking populations, (C) Italy, or (D) France. The black distribution is the maximum likelihood fit; shown in red is smoothest solution that still fits the data, as described in the Materials and Methods. (E) shows the observed IBD length distribution for pairs of individuals from the Balkans (red curve), along with the distribution predicted by the smooth (red) distribution in (A), as a stacked area plot partitioned by time period in which the common ancestor lived. The partitions with significant contribution are labeled on the left vertical axis (in generations ago), and the legend in (J) gives the same partitions, in years ago; the vertical scale is given on the right vertical axis. The second column of figures (F–J) is similar, except that comparisons are relative to samples from the United Kingdom.

I say that mainly because the shared ancestry between the Balkans and both Italy and France is dated here to around 3000 or 3500 years ago, when it would fit much better at c. 7500 years ago (as much as 8000 BP for some parts of Italy), when the Neolithic expansion was ongoing. There is no particular reason why the Balkans would be related to France and Italy c. 3000 years ago specifically, unless one believes in undocumented massive Mycenaean migrations or something like that (and what about Albania then?).
However, I am getting a headache with this issue because no correction, low or high, seems good enough for all pairs, so just take this part with your usual dose of healthy skepticism.
Some (annotated) excerpts:

In most cases, only pairs within the same population are likely to share genetic common ancestors within the last 500 years [i.e.: ~1100 years]. Exceptions are generally neighboring populations (e.g., United Kingdom and Ireland). During the period 500–1,500 ya [i.e. ~1100-3300 years ago: most of the Metal Ages], individuals typically share tens to hundreds of genetic common ancestors with others in the same or nearby populations, although some distant populations have very low rates. Longer ago than 1,500 ya [i.e. before ~3300 years ago: before the Late Bronze Age crisis], pairs of individuals from any part of Europe share hundreds of genetic ancestors in common, and some share significantly more.

On Italy:

There is relatively little common ancestry shared between the Italian peninsula and other locations, and what there is seems to derive mostly from longer ago than 2,500 ya [i.e. ~5500 y.a.: Megalithic era onwards]. An exception is that Italy and the neighboring Balkan populations share small but significant numbers of common ancestors in the last 1,500 years [i.e. the last ~3750 years: since the Mycenaean period].

On Iberia:

Patterns for the Iberian peninsula are similar, with both Spain and Portugal showing very few common ancestors with other populations over the last 2,500 years [i.e. 5500 years: Megalithic era onwards]. However, the rate of IBD sharing within the peninsula is much higher than within Italy… 

The low Iberian relationship with other populations seems to preclude this region as the source for the conjectured re-expansion of mtDNA H and other Western lineages. I would suggest looking to (Western) France for an alternative source, as this state’s heterogeneous population shares more intense relations with other Western peoples around what could be c. 6200 BP, which is at the very beginning of the Megalithic spread in Atlantic Europe, for which Armorica (Brittany and neighboring Western France) could well have been a major source (and definitely was in the case of Britain).
Of course, if you prefer to use the authors’ estimates, it would have no influence on the hypothesis because they simply can’t reach so far back in time, it seems. But I feel more comfortable overall reformulating the hypothesis towards Armorica.
For better reading of each pair of relationships through time, I include here fig. S16:

The maximum likelihood history (grey) and smoothest consistent history (red) for all pairs of population groupings of Figure S12 (including those of Figure 5). Each panel is analogous to a panel of Figure 4; time scale is given by vertical grey lines every 500 years. For these plots on a larger scale, see Figure S17.

As said before, I suggest reading each vertical grey line (counting from the left) as meaning ~1100 years rather than just 500.

Update (Jun 23): on IBD-based molecular-clock-o-logy:

Now and then I have found a strange insistence on IBD-based chronological estimates being almost beyond reasonable doubt. I admittedly don’t know a great deal on the matter, so when Davidski (see comments) insisted on that again, I asked him for a reference, so I could learn something. He kindly suggested that I read Gusev et al. 2011, The Architecture of Long-Range Haplotypes Shared within and across Populations, which is indeed a good paper. However I could not find a clearly explained basis for the chronological estimates in general, which is probably buried deep in the bibliography. What I found instead was a clear example of these falling short of historical reality by a lot.

This example corresponds to one of the best documented populations to have suffered a “recent” bottleneck event: Ashkenazi Jews (AJ). According to Gusev et al., these would have suffered a bottleneck (founder effect of some 400 nuclear families followed by expansion) around 20 generations ago (~600 years = 1400 CE) or, a few lines later more specifically: 23 generations ago (~1320 CE). So here we do have a clear case study.

When we look at historical reality, however, it is just impossible that AJ would have had their founder effect bottleneck so late. Historical records document them often already in the Frankish period, and they were definitely a vibrant expanding community by the time of the founding of Prague and Kraków c. 900 CE. A historically reasonable estimate for the AJ founder effect should instead be c. 700 CE, when they begin to appear in historical records, or maybe even a bit earlier, because of the lack of documentation in the Dark Ages.

That is not at all a mere 20-23 generations ago but almost double (counting generation time = 30 years; if gen-time were 27 years, for example, the difference between estimates and reality would be even greater). Assuming a very reasonable AJ founder effect at 700 CE, then:

  • For gen-time = 30 years → 43 generations till now → 43/23 = 1.9 times for realistic correction
  • For gen-time = 27 years → 48 generations → 48/23 = 2.1 times for realistic correction
  • For gen-time = 25 years → 52 generations → 52/23 = 2.3 times for realistic correction
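
The arithmetic behind the three bullets can be checked with a few lines. This is a minimal sketch under two assumptions of mine: the present is taken as 2013 CE (the year of this update), and generation counts are rounded down to whole generations, which is what the figures in the list appear to do. The function and constant names are hypothetical.

```python
def generations_since(event_year, present_year, generation_years):
    """Whole generations elapsed between two calendar years (rounded down)."""
    if generation_years <= 0:
        raise ValueError("generation time must be positive")
    return (present_year - event_year) // generation_years


def correction_factor(historical_generations, ibd_generations):
    """Ratio between the historically expected and the IBD-estimated generations."""
    return historical_generations / ibd_generations


FOUNDER_YEAR = 700     # CE, the historical estimate argued for above
PRESENT_YEAR = 2013    # CE, assumed: the year this update was written
IBD_GENERATIONS = 23   # the Gusev et al. estimate quoted above

for gen_time in (30, 27, 25):
    gens = generations_since(FOUNDER_YEAR, PRESENT_YEAR, gen_time)
    factor = correction_factor(gens, IBD_GENERATIONS)
    print(f"gen-time {gen_time}: {gens} generations, factor {factor:.1f}")
```

Changing FOUNDER_YEAR or the generation time shows how sensitive the correction is to those two choices.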

While it has nowadays become standard practice to equate generation time to 30 years, this is not an absolute measure, because the actually observed generation time (i.e. the average age difference between parental and child generations) varies in real life depending on cultural factors (such as marriage age), gender (female generation time is almost invariably shorter than male), life expectancy (mothers who die in childbirth at a young age, for example, don’t have any more children), etc. So in the fine detail it is a somewhat blurry issue, with some significant variability among cultures and surely also through time.

Another issue is whether this “short term” estimate correction is stable over time or in fact varies somewhat. I can’t say.

Whatever the case, the approximate x2 correction proposed above seems to stand in general terms.


Mellars challenges the ‘early out of Africa’ model

I do not yet have access to this potentially key paper, so first of all I want to make an appeal here to share a copy with me (→ email address). Thanks in advance. Update: got it (thanks to all who shared, you people are just great!). I will review it again as soon as possible.

Update (Jun 18): complementary review of the full paper now available here.

Paul Mellars et al., Genetic and archaeological perspectives on the initial modern human colonization of southern Asia. PNAS 2013. Pay per view (6-month embargo) → LINK [doi:10.1073/pnas.1306043110]


It has been argued recently that the initial dispersal of anatomically modern humans from Africa to southern Asia occurred before the volcanic “supereruption” of the Mount Toba volcano (Sumatra) at ∼74,000 y before present (B.P.)—possibly as early as 120,000 y B.P. We show here that this “pre-Toba” dispersal model is in serious conflict with both the most recent genetic evidence from both Africa and Asia and the archaeological evidence from South Asian sites. We present an alternative model based on a combination of genetic analyses and recent archaeological evidence from South Asia and Africa. These data support a coastally oriented dispersal of modern humans from eastern Africa to southern Asia ∼60–50 thousand years ago (ka). This was associated with distinctively African microlithic and “backed-segment” technologies analogous to the African “Howiesons Poort” and related technologies, together with a range of distinctively “modern” cultural and symbolic features (highly shaped bone tools, personal ornaments, abstract artistic motifs, microblade technology, etc.), similar to those that accompanied the replacement of “archaic” Neanderthal by anatomically modern human populations in other regions of western Eurasia at a broadly similar date.

A review has been published at Live Science.

South Asian artifacts from ~30-50 Ka BP.

By “genetic evidence” they obviously mean “molecular clock” nonsense, so it is not evidence at all but mere speculation. However I am indeed very interested in knowing in detail what they mean by “archaeological evidence”, because they seem to get into direct confrontation with much accumulated evidence, first and foremost all of Petraglia’s research in both India and Arabia but also with the quite strong evidence for pre-60 Ka human presence in Australia and growing evidence for pre-60 Ka modern humans in SE Asia (in some cases even as old as 100 Ka). 
It must be said that Paul Mellars has been criticized a lot before for several reasons, but most especially for his adherence to the quite speculative “modern human behavior” conjecture and, relatedly, bigoted attitudes against Neanderthal intellectual capabilities, based on nothing too solid. Therefore I’m generally skeptical about what Mellars has to say on this matter because this kind of conclusion is what one would expect from him.
However Mellars is certainly a distinguished academic and, even if prejudiced and stuck to his own old-school and somewhat Eurocentric interpretations, he knows his trade as archaeologist and prehistorian. So he may be onto something, even if it is not exactly what he wants us to believe. 
For example, it is not impossible that this research may have, unbeknown to the authors, found evidence of a secondary OoA wave (maybe related to the spread of Y-DNA D and mtDNA N?) or even a distinctive evolution in Southern Asian technology prior to the expansion of Western Eurasia. 
It is interesting that they suggest that the 80-60/50 Ka toolkits of India would have been made by Neanderthals, when they are not describing them at all as Mousterian, the almost exclusively Neanderthal techno-culture, or Mousterian-related.
I have some difficulties judging before reading the whole study. However the supplemental material (quite extensive) is freely accessible and, from what I can see there:
  1. They dedicate much text to attempting to justify a particular version of the mainstream “molecular clock” hypotheses, which are clearly broken in my understanding. The kind of arguments “rebutted” are more or less what I have been putting forward for many years. Ironically their “molecular clock” estimates make N and R much older than M, which I absolutely oppose (just count mutations downstream of the L3 node).
  2. No real attention is given instead to the geographical structure/distribution of major mtDNA haplogroups, only mentioned in relation to “molecular clock” speculations.
  3. The criticism of the African affinity of the Jwalapuram (Jurreru Valley) cores (Petraglia 2007) focuses on dismissal of any possibility of comparison, rather than on alternative comparisons. 
  4. Another “criticism” is that there is no apparent connection between Jwalapuram and the Nubian Complex (why should there be any? It is not the only East African techno-culture, nor the only group that shows indications of traveling to Arabia in the Abbassia Pluvial).
  5. It is also “criticized” that the most comparable African culture (Howiesons Poort) is not recorded before c. 71 Ka BP (which IMO may indicate late cultural dispersals to Southern Africa from East Africa, for example, but, hey!, Mellars is fencing off balls like crazy at his conservative goal).
  6. They find clear similarities between Indian and African microlithic industries (apparently related to the development of “mode 4” in both areas, as well as in West Eurasia). Indian industries are dated to c. 38-40 Ka BP, while African ones are dated to c. 49 Ka BP (Kenya) or later. However West Eurasian ones have dates as old as 55 Ka BP (not for Mellars, who remains stuck in older date references which he describes as ∼40–45 ka [calibrated (cal.) before present (B.P.)]), which really suggests that we are talking here not of the “out of Africa” but of the West Eurasian colonization process (necessarily from further into Asia, as the genetic phylogeographic structure demands) with offshoots to the nearby regions.
  7. Another element of late Africa-India “similarity” they find is “the remarkable, double bounded criss-cross design incised on ostrich eggshell”, dated in India (Patne) to at least ∼30 ka (cal. B.P.), much earlier in South Africa. For Mellars this is beyond the range of either pure coincidence or entirely independent and remarkably convergent cultural evolutionary processes. Hmmm, really? Or are we facing a clear case of wishful thinking, as happens with the Solutrean-Clovis relationship hypothesis? Isn’t 30 Ka BP anyhow well beyond any reasonable expectations for the OoA time frame, including Mellars’ own conjectures?
  8. Mellars accepts the paradox that the geographical limits of these highly distinctive microblade and geometric microlithic technologies are confined to the Indian subcontinent, with no currently documented traces of these technologies in regions farther to the east. He then makes up excuses for it, such as biological and cultural bottlenecks caused by “founder effects”, mysteriously leading to a loss or simplification of cultural and technological know-how, as well as finding new and contrasting environments (in the same latitudes?!).
  9. Even in the case of Arabian colonization, Mellars takes a very defensive attitude, admitting only to the reality of the Palestinian sites with clearly modern skulls, as well as to the area of Nubian Complex colonization (on whose peculiarities he insists a lot, as if it were the only expression of the wider MSA techno-complex), disdaining all the other MSA colonization areas and their often ill-defined variants.
In brief, from what I could see in the supplemental material, along with some potentially interesting references to the relative cultural community spanning from East Africa to South Asia at the time of emergence of “mode 4” industries, it seems that Mellars and allies are essentially putting the cart (their models) before the horse (the facts), which is bad science.
In 2008, Zilhao and d’Errico angrily accused Mellars of being an obsolete armchair prehistorian (different words maybe, same idea). Back in the day I was tempted to support Mellars but nowadays I must agree that he is clearly stuck in a one-sided interpretation of prehistory whose time is long gone. Whatever the case, I welcome the debate and can only hope that it will help to produce even more evidence to further clarify the actual facts of the Prehistory of Humankind.

Reconstructing human demographic history from IBS segments

Figure 1. An eight base-pair tract of identity by state (IBS).
Identity-by-state (IBS) segments are the stretches located between two consecutive SNPs (polymorphisms, letters that vary among individuals). According to this new paper, they seem to be evolutionarily neutral and therefore their length, modified by recombination events in each new generation, is a good trail for reconstructing human demographic history.
Kelley Harris & Rasmus Nielsen, Inferring Demographic History from a Spectrum of Shared Haplotype Lengths. PLoS Genetics 2013. Open access [doi:10.1371/journal.pgen.1003521]


There has been much recent excitement about the use of genetics to elucidate ancestral history and demography. Whole genome data from humans and other species are revealing complex stories of divergence and admixture that were left undiscovered by previous smaller data sets. A central challenge is to estimate the timing of past admixture and divergence events, for example the time at which Neanderthals exchanged genetic material with humans and the time at which modern humans left Africa. Here, we present a method for using sequence data to jointly estimate the timing and magnitude of past admixture events, along with population divergence times and changes in effective population size. We infer demography from a collection of pairwise sequence alignments by summarizing their length distribution of tracts of identity by state (IBS) and maximizing an analytic composite likelihood derived from a Markovian coalescent approximation. Recent gene flow between populations leaves behind long tracts of identity by descent (IBD), and these tracts give our method power by influencing the distribution of shared IBS tracts. In simulated data, we accurately infer the timing and strength of admixture events, population size changes, and divergence times over a variety of ancient and recent time scales. Using the same technique, we analyze deeply sequenced trio parents from the 1000 Genomes project. The data show evidence of extensive gene flow between Africa and Europe after the time of divergence as well as substructure and gene flow among ancestral hominids. In particular, we infer that recent African-European gene flow and ancient ghost admixture into Europe are both necessary to explain the spectrum of IBS sharing in the trios, rejecting simpler models that contain less population structure.
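
To make the idea of an IBS tract spectrum concrete, here is a minimal Python sketch (mine, not from the paper) that measures the runs of identical bases in one pairwise alignment and tallies their lengths. The function name `ibs_tract_lengths` and the two toy sequences are made up for illustration and, as a simplification, runs touching the ends of the alignment are counted as if they were complete tracts.

```python
from collections import Counter


def ibs_tract_lengths(seq_a, seq_b):
    """Return the lengths of the maximal runs of identical bases
    between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    lengths = []
    run = 0
    for base_a, base_b in zip(seq_a, seq_b):
        if base_a == base_b:
            run += 1              # still inside an IBS tract
        else:
            if run > 0:
                lengths.append(run)
            run = 0               # a difference (SNP) closes the tract
    if run > 0:
        lengths.append(run)       # simplification: keep the trailing run
    return lengths


# Toy alignment (hypothetical data): the sequences differ at two sites.
seq_a = "ACGTACGTTTGACA"
seq_b = "ACGAACGTTTGCCA"

tracts = ibs_tract_lengths(seq_a, seq_b)
spectrum = Counter(tracts)        # tract length -> number of tracts
print(tracts)
print(sorted(spectrum.items()))
```

The paper then fits demographic models to this kind of length spectrum across many alignments; that inference step is not attempted here.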

The most interesting graph, synthesizing the result for standard HapMap European and African proxy samples, is figure 7. However I have major issues with the age estimates, which seem to be half of what is needed to be realistic according to archaeological and other genetic data (unilineal haplogroup history, for example). Therefore I have annotated it with a revised timeline, so it fits better with the objective data:

Figure 7. A history inferred from IBS sharing in Europeans and Yorubans.
This is the simplest history we found to satisfactorily explain IBS tract sharing in the 1000 Genomes trio data. It includes ancient ancestral population size changes, an out-of-African bottleneck in Europeans, ghost admixture into Europe from an ancestral hominid, and a long period of gene flow between the diverging populations.
(Right margin annotations by Maju).

Indeed the simplest revision of the time-scale was to double it. I guess it can be refined a bit more than that, maybe pushing it a bit further into the past, but the alternative time-scale I propose fits closely enough with known archaeological data, like the time of the OoA to Arabia and Palestine or the spread of Acheulean (and therefore H. ergaster, common ancestor of Neanderthals and H. sapiens) out of Africa c. 1 Ma ago. This illustrates that the reconstruction seems pretty much correct overall but fails when estimating the dates (because of the self-referential scholastic biases that are too common in the field of human population genetics).

Update: even Dienekes agrees, on his own well documented reasoning, with a ×2 correction being necessary for the above graph (doubling the time-scale amounts to halving the assumed mutation rate).


Oppenheimer 2012: the scholastic ouroboros of repeating the usual ‘molecular clock’ errors

Last year Stephen Oppenheimer published yet another article on the mitochondrial DNA tree and his vision of the molecular clock applied to the human matrilineages.

Stephen Oppenheimer, Out-of-Africa, the peopling of continents and islands: tracing uniparental gene trees across the map. Philosophical Transactions of the Royal Society B, 2012. Freely accessible [doi:10.1098/rstb.2011.0306]
The centerpiece of the article is fig. 2, a mtDNA tree with his “molecular clock” estimates of the ages of the haplogroups. Sadly it has a major problem: the resulting dates have a horrible fit with all the archaeological and paleoclimatic evidence and even with the most recent estimates for the Pan-Homo split.
Much of the article (all of section 1.b) is devoted to attempting to justify his so-called “calibration” methods, which are in the end based on a self-reference: Soares 2009, of which Oppenheimer was co-author and which was calibrated assuming a Pan-Homo split age of 5-6 Ma.
In annoyingly pointless circular reasoning, Oppenheimer now manages to estimate the Pan-Homo split at 6.5 Ma using the Soares 2009 “molecular clock” rates.
All these Pan-Homo split age guesstimates are horribly wrong, because Sahelanthropus tchadiensis (c. 7 Ma ago) was already in the Homo line (and no longer in the Pan one) and also because several other authors have estimated the Pan-Homo divergence to be at least 8 Ma old, and maybe as ancient as 13 Ma (Langergraber 2012).
Sadly the Academy remains stuck and Oppenheimer is no exception but rather the opposite. This is his fig. 2 with my rough corrections in red after proper recalibration of the Pan-Homo split age:

This does not mean that the red colored dates provided here are necessarily the correct ones, although in many cases they do seem to fit much better with the archaeological and paleoclimatic data, especially at the lower ranges. It is merely a simple “first aid” correction to Oppenheimer’s necessarily incorrect estimates. 
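
As a rough illustration of what such a “first aid” correction amounts to, the Python sketch below rescales an age linearly by the ratio between a revised and an assumed Pan-Homo split age. It assumes a strictly linear clock; the function name `rescale_age` and the 60 ka example age are hypothetical, not taken from Oppenheimer’s figure, and the red dates in the annotated figure were not necessarily obtained exactly this way.

```python
def rescale_age(age_ka, assumed_split_ma, revised_split_ma):
    """Rescale a clock-based age when the calibration point moves.

    Under a strictly linear clock every age scales by the same factor
    as the calibration date (here, the Pan-Homo split).
    """
    if assumed_split_ma <= 0 or revised_split_ma <= 0:
        raise ValueError("split ages must be positive")
    return age_ka * revised_split_ma / assumed_split_ma


assumed_split = 6.5          # Ma, the split age Oppenheimer arrives at
example_age = 60.0           # ka, hypothetical haplogroup age for illustration

# Revised split ages mentioned in the text: at least 8 Ma, maybe 13 Ma.
for revised_split in (8.0, 13.0):
    corrected = rescale_age(example_age, assumed_split, revised_split)
    print(f"split {revised_split} Ma -> {corrected:.1f} ka")
```

Note that moving the split from 6.5 to 13 Ma simply doubles every age, which is why the lower end of the corrected ranges is the more conservative reading.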
Other factors must be taken into account. For example, I do not believe for a second that M is older than the African L3 branches, which show only one or, in one case, two coding region mutations downstream of the L3 node, while M is three mutations downstream and N five. Oppenheimer seems determined to count HVS mutations, for example, and to estimate ages counting from the present forms (which could well be frozen in time for many, many millennia because of “drift out” phenomena: in my modeling tests, a population that is large enough but not too large tends to freeze the hegemonic lineages while removing any novel ones).
I do not propose any alternative “molecular clock” for mtDNA because I feel that it poses way too many issues because of irregular branch length. Maybe in the future some brilliant geneticist (or maybe mathematician?) will be able to posit a reasonably good refurbished “molecular clock” for mtDNA but at the moment I know of none.
I’m just stating the obvious: what Oppenheimer is selling is necessarily wrong.

Posted on May 17, 2013 in bad science, molecular clock, mtDNA


Brotherton 2013: cherry-picking the evidence for mtDNA H

Unlike the conceptually akin paper by Fu 2013 (PPV – discussed here), this one is very neatly explained and leaves no doubt about how they reached their conclusions. Whether the method is good enough to support any conclusions at all is another matter. It is still an interesting study on the evolution of mtDNA lineage H in the specific context of the Elbe-Saale region of Germany.
Paul Brotherton et al., Neolithic mitochondrial haplogroup H genomes and the genetic origins of Europeans. Nature Communications 2013. Pay per view [doi:10.1038/ncomms2656]


Haplogroup H dominates present-day Western European mitochondrial DNA variability (>40%), yet was less common (~19%) among Early Neolithic farmers (~5450 BC) and virtually absent in Mesolithic hunter-gatherers. Here we investigate this major component of the maternal population history of modern Europeans and sequence 39 complete haplogroup H mitochondrial genomes from ancient human remains. We then compare this ‘real-time’ genetic data with cultural changes taking place between the Early Neolithic (~5450 BC) and Bronze Age (~2200 BC) in Central Europe. Our results reveal that the current diversity and distribution of haplogroup H were largely established by the Mid Neolithic (~4000 BC), but with substantial genetic contributions from subsequent pan-European cultures such as the Bell Beakers expanding out of Iberia in the Late Neolithic (~2800 BC). Dated haplogroup H genomes allow us to reconstruct the recent evolutionary history of haplogroup H and reveal a mutation rate 45% higher than current estimates for human mitochondria.
Let’s deal with the interesting part first and then with their impossible molecular clock speculations. 
All the samples used in this study belong to haplogroup H, as you can see in table 1. This does not allow us to consider the overall proportion of H in each population, for which we would need to go to the original studies. For example in the region’s LBK samples, H was just some 20% of the total, which alone speaks of a population that was not at all like the modern one, never mind N1a. On the opposite side of the spectrum are the Bell Beaker (BBC) samples, where H made up 88% of the total (Adler 2012, discussed here), again non-modern but a possible source of the increase of H in frequency.
We must keep in mind all the time that in this study only H is considered, with all the derived pros and cons. 
Maybe the most interesting result is therefore the comparison with modern populations done in fig. 2a:

Figure 2 | Population affinities of select Neolithic cultures. (a) PCA biplot based on the frequencies of 15 hg H sub-haplogroups (component loadings) from 37 present-day Western Eurasian and three ancient populations (light blue: Western Europe; dark blue: Central and Eastern Europe; orange: Near East, Caucasus and Anatolia; and pink: ancient samples). Populations are abbreviated as follows: GAL, Galicia; CNT, Cantabria; CAT, Catalonia; GAS, Galicia/Asturia; CAN, Cantabria2; POT, Potes; PAS, Pasiegos; VIZ, Vizcaya; GUI, Guipuzcoa; BMI, Basques; IPNE, Iberian Peninsula Northeast; TUR, Turkey; ARM, Armenia; GEO, Georgia; NWC, Northwest Caucasus; DAG, Dagestan; OSS, Ossetia; SYR, Syria; LBN, Lebanon; JOR, Jordan; ARB, Arabian Peninsula; ARE, Arabian Peninsula2; KBK, Karachay-Balkaria; MKD, Macedonia; VUR, Volga-Ural region; FIN, Finland; EST, Estonia; ESV, Eastern Slavs; SVK, Slovakia; FRA, France; BLK, Balkans; DEU, Germany; AUT, Austria; ROU, Romania; FRM, France Normandy; WIS, Western Isles; CZE, Czech Republic; LBK, Linear pottery culture; BBC, Bell Beaker culture; MNE, Middle Neolithic.

BBC (Bell Beaker) and LBK (Linear Pottery Culture) are clear-cut cultures in this graph. However MNE (Middle Neolithic) is a pooled agglomeration of several not too related cultures from the Late Neolithic and Early and Middle Chalcolithic. So, using the haplogroup vectors (grey), I remapped its unlikely components:

Fig. 2a annotated by Maju: green “MNE” cultures, grey: other cultures. Dotted circles just for reference.

Suddenly the mirage of modernity and homogeneity in MNE’s H collapses, most especially for Salzmünde (2/2 H3) but really also for the other components of the MNE pool: Rössen (directly derived from LBK) appears here as Balkano-Estonian and similar to Bronze Age Sardinia; Schöningen (derived from Rössen) appears Norman French and close to the original LBK pool; the first Kurgan culture in Central Europe, Baalberge, is the only one really close to the MNE dot, but its closest modern relatives are NE Iberians (IPNE); while its successor Salzmünde is “hyper-Iberian”, much as Bell Beaker after them – however the intermediate Corded Ware, C.W., leans back to the right and appears Catalan.
No conclusions can be inferred from this: for that we’d need to compare whole genetic pools and not just H, which is a minority in most ancient samples. But, for whatever it is worth, I made yet another annotated version of this graph:

Fig. 2a annotated by Maju: changes in Central European mtDNA H composition along time (arrows).

I considered Rössen here as different from Schöningen, as Rössen or Epi-Rössen persisted in much of Germany and nearby Alpine areas for a long time, but feel free to draw or imagine it differently.
Whatever the case, the appearance is of a gradual “modernization” or “Germanization” of haplogroup H culminating in Baalberge, followed by an “Iberization” of the haplogroup pool in the Middle and Late Chalcolithic, roughly coincident with the expansion of Megalithism and Bell Beaker and just mildly countered by Indoeuropean expansion from the East (Corded Ware, Unetice). Here they mention six Unetice H sequences but, judging by Adler 2012, H was very, very rare in this culture, at least in the Elbe-Saale area (1/31).
Beyond this I doubt that the paper can provide us with any more enlightenment.
It does provide for some false leads however.
The authors use this Elbe-Saale limited ancient mtDNA evidence to construct a “molecular clock”:

Another major advantage of the temporal calibration points provided by ancient hg H mt genomes is that the data allow a relatively precise estimate of the evolutionary substitution rate for human mtDNA. The temporal dependency of evolutionary rates predicts that rate estimates measured over short timespans will be considerably higher than those using deep fossil calibrations, such as the human/chimpanzee split at ~6 million years.

6 million years?! Where have you been in the last five years, Paul? Ahem…
It doesn’t really matter but it illustrates the reactionary scholastic inertia that plagues Academia, most especially in the field of population genetics.

What matters is that they continue as follows:

(…) The rate calibrated by the Neolithic and Bronze Age sequences is 2.4×10⁻⁸ substitutions per site per year (1.7–3.2×10⁻⁸; 95% high posterior density) for the entire mt genome, which is 1.45 (44.5%) higher than current estimates based on the traditional human/chimp split (for example, 1.66×10⁻⁸ for the entire mt genome and 1.26×10⁻⁸ for the coding region). Consequently, the calibrated ‘Neolithic’ rate infers a considerably younger coalescence date for hg H (10.9–19.1 kya) than those previously reported (19.2–21.4 kya for HVSI, 15.7–22.5 kya for the mt coding region or 14.7–22.6 kya when corrected for purifying selection).
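
The arithmetic in the quoted passage can be checked with a few lines of Python. This is only a sketch using the rounded whole-genome rates quoted above; the variable names, the helper `rescale_date` and the 20 kya example age are mine, chosen for illustration.

```python
# Rates quoted above, in substitutions per site per year (whole mt genome).
neolithic_rate = 2.4e-8   # calibrated on the ancient hg H genomes
split_rate = 1.66e-8      # calibrated on the human/chimp split

# How much faster is the 'Neolithic' rate?
ratio = neolithic_rate / split_rate
print(f"rate ratio: {ratio:.3f}")


def rescale_date(age_kya, old_rate, new_rate):
    """Under a strict clock, an age is inversely proportional to the rate:
    the same number of mutations accumulates faster with a faster rate."""
    return age_kya * old_rate / new_rate


example_age = 20.0  # kya, hypothetical age obtained with the slower rate
younger = rescale_date(example_age, split_rate, neolithic_rate)
print(f"{example_age} kya under the split rate -> {younger:.1f} kya")
```

With the rounded figures the ratio comes out at about 1.45, matching the quote, so any date obtained with the older rate shrinks to roughly 69% of its value.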

What matters is that by cherry-picking only some sequences of ancient mtDNA H, they are denying themselves (and the rest of us by extension) a realistic calibration of the haplogroup. What happened to the Cantabrian Magdalenian and Epipaleolithic Basque H? What happened to Epipaleolithic Karelian H? Never mind Sunghir’s Gravettian H17’27 or Taforalt’s massive pool of R*-CRS, most likely H1 (Kéfi 2005), which may be more questionable but should never be rejected without direct negative evidence.
In other words: they are cherry-picking the evidence. They could argue that the Elbe-Saale data were the only ones readily available for them to sequence in full or whatever and that therefore the evidence was cherry-picked by Destiny… but that would not justify in any case the arrogance of their conclusions: they should have been much more humble and admitted that this evidence is only part of all the ancient mtDNA H (known or suspected), some of which is clearly much older and therefore much more relevant.
I illustrated this problem using their fig. 1a:

Fig. 1a, annotated by Maju.
(Note: one of the “Magdalenian” H* sequences from North Iberia is actually Epipaleolithic, my error)

In orange color I have marked an alternative minimal “molecular clock” extrapolation using the La Chora H6 sequence (Hervella 2006 open access). This is minimal because I’m assuming this sequence to be underived H6; if it were derived (which I don’t know), the estimate would be even larger.
I have annotated all the confirmed (unquestionable) ancient mtDNA H sequences I am aware of. There are many more that are very likely, and in many cases older (see maps), but not yet confirmed.
So well, molecular-clock-o-logical pseudoscience again. It’s a pity that otherwise respectable scientists pay tribute to this academic fetish.
The molecular clock hypothesis has never been proven, being a mere statistical construct, and it has many problems, particularly in mitochondrial DNA, where branches are dramatically unequal, owing to either: (a) randomness, (b) differential adaptive fitness or (c) ancient population dynamics (variable drift results depending on population size). I discussed some of that here and also here.

I beg population geneticists here to be more serious and careful and not to try to push their ideas against the available evidence. That is not proper for scientists but belongs to the field of ideological propaganda.

Update: La Chora Magdalenian H6 is probably H6a1, with implications for the age estimate of H.

All known H6 of Iberia and all or most of Western Europe is H6a1, while the “famous” Central Asian H6 (very minor overall) is all H6(xH6a), which is also relatively important in Eastern Europe. See Álvarez Iglesias 2009 (open access), especially Supp. Table 3. H6a(xH6a1) has only been detected so far in Austria (oversampled – I miss data from France again).

Brotherton’s only H6 sample (Corded Ware) is H6a1a. Álvarez Iglesias did not test for this phylogenetic level, hence it would show in his data as H6a1, but he did test for H6a1a1, found precisely only in Cantabria.

So the La Chora H6 Magdalenian sequence can be:

  • H6(xH6a): extremely rare in Western Europe today
  • H6a: reported in Austria only (modern sample)
  • H6a1: most common in Western Europe and especially North Iberia
  • H6a1a: like Brotherton’s Corded Ware sequence
  • H6a1a1: found only in Cantabria today, it seems
  • etc. (PhyloTree allows for some other options)

I already discussed the possible age (using molecular clock theory, calibrated) of H if La Chora H6 were H6-root. But, considering that H6b and H6c seem to be Eastern European or Central Asian, it seems more reasonable to think it is H6a or downstream of it. What would be the age range of H for the other possible assignments of La Chora’s H6, if it were tested for coding region mutations? Let’s see:

  • If H6a-root: 47,500 to 24,500 years ago (median: 36,000 BP)
  • If H6a1: 73,200 to 34,800 years ago (median: 54,000 BP)

Of course I do not really think that the molecular clock can be easily applied, if at all, to mtDNA, because the rarity of accumulating mutations poses way too many challenges. But if it had to be applied, as Brotherton, Fu, their teams and some amateurs seem to think, then we’d have to test the La Chora and La Pasiega samples (and Sunghir and others) for coding region mutations in order to have the most valid calibration points.

Otherwise it is like the blind man who touched the trunk of an elephant and imagined it was like a snake.