
The least homogeneous European "populations" are Italians and French

This comes from a recent IBD study on Europe:
Peter Ralph & Graham Coop, The Geography of Recent Genetic Ancestry across Europe. PLoS Biology, 2013. Open access [doi:10.1371/journal.pbio.1001555]


The recent genealogical history of human populations is a complex mosaic formed by individual migration, large-scale population movements, and other demographic events. Population genomics datasets can provide a window into this recent history, as rare traces of recent shared genetic ancestry are detectable due to long segments of shared genomic material. We make use of genomic data for 2,257 Europeans (in the Population Reference Sample [POPRES] dataset) to conduct one of the first surveys of recent genealogical ancestry over the past 3,000 years at a continental scale. We detected 1.9 million shared long genomic segments, and used the lengths of these to infer the distribution of shared ancestors across time and geography. We find that a pair of modern Europeans living in neighboring populations share around 2–12 genetic common ancestors from the last 1,500 years, and upwards of 100 genetic ancestors from the previous 1,000 years. These numbers drop off exponentially with geographic distance, but since these genetic ancestors are a tiny fraction of common genealogical ancestors, individuals from opposite ends of Europe are still expected to share millions of common genealogical ancestors over the last 1,000 years. There is also substantial regional variation in the number of shared genetic ancestors. For example, there are especially high numbers of common ancestors shared between many eastern populations that date roughly to the migration period (which includes the Slavic and Hunnic expansions into that region). Some of the lowest levels of common ancestry are seen in the Italian and Iberian peninsulas, which may indicate different effects of historical population expansions in these areas and/or more stably structured populations. Population genomic datasets have considerable power to uncover recent demographic history, and will allow a much fuller picture of the close genealogical kinship of individuals across the world.

Most interesting, in my view, is Table 1, which describes the IBD sharing of each sampled population within itself and with other Europeans.
From this table it seems very apparent that Italians and French are not homogeneous at all and therefore, in my opinion, should not be treated as single populations in genetic studies but split at least somewhat by region (the optimal size of these regions is yet to be determined).
The degree of internal homogeneity of the samples (only those with n=5 or greater) can be summarized as follows:
  • Very low (<1): Italy, France.
  • Quite low (1-1.4): Germany, UK, Belgium, England, Austria, French-Swiss.
  • Somewhat low (1.5-1.9): Spain, German-Swiss, Greece, Portugal, Netherlands, Hungary.
  • Somewhat high (2-2.9): Czech R., Romania, Scotland, Ireland, Serbia, Croatia.
  • Quite high (3-3.9): Sweden, Poland.
  • Very high (4-5): Bosnia, Russia*.
  • Extremely high (>10): Albania.
  • I ignored strangely labeled samples like “Switzerland” and “Yugoslavia”, which actually seem to mean “other” within these labels. I retained the “United Kingdom” category for its sample size, which is much larger than that of its obvious parts.
  • * The level of relatedness of Russians may be exaggerated by the small sample: n=6, though still above my cautionary threshold.
  • I suspect that the extreme disparity of sample sizes may influence the results to some extent.
Eastern Europeans seem much more strongly related to others, especially other Eastern Europeans, than Western Europeans do, while NW Europeans are more related to other groups (usually at regional level) than SW Europeans are. In fact, the Italian and Iberian peninsulas show very low levels of “recent” relatedness with other populations, which is a bit perplexing considering their non-negligible roles in Medieval and Modern European history. I guess that this may be partly caused by geographic barriers (mountains) and also by these areas having had large populations since Antiquity or before.

Figure 3. Geographic decay of recent relatedness.
In all figures, colors give categories based on the regional groupings of Table 1. (A–F) The area of the circle located on a particular population is proportional to the mean number of IBD blocks of length at least 1 cM shared between random individuals chosen from that population and the population named in the label (also marked with a star). Both regional variation of overall IBD rates and gradual geographic decay are apparent. (G–I) Mean number of IBD blocks of lengths 1–3 cM (oldest), 3–5 cM, and >5 cM (youngest), respectively, shared by a pair of individuals across all pairs of populations; the area of the point is proportional to sample size (number of distinct pairs), capped at a reasonable value; and lines show an exponential decay fit to each category (using a Poisson GLM weighted by sample size). Comparisons with no shared IBD are used in the fit but not shown in the figure (due to the log scale). “E–E,” “N–N,” and “W–W” denote any two populations both in the E, N, or W grouping, respectively; “TC-any” denotes any population paired with Turkey or Cyprus; “I-(I,E,N,W)” denotes Italy, Spain, or Portugal paired with any population except Turkey or Cyprus; and “between E,N,W” denotes the remaining pairs (when both populations are in E, N, or W, but the two are in different groups). The exponential fit for the N–N points is not shown due to the very small sample size. See Figure S8 for an SVG version of these plots where it is possible to identify individual points.
We can also see in the above figure (bottom) that most of the relatedness, especially over longer distances, belongs to the oldest category (1-3 cM blocks).
The authors suggest that the low homogeneity within some of these groupings is influenced by regional variation, which makes good sense to me. They illustrate this with the examples of Italy and the United Kingdom:

Figure 2. Substructure in (A) Italian and (B) U.K. samples.
The leftmost plots of (A) show histograms of the numbers of IBD blocks that each Italian sample shares with any French-speaking Swiss (top) and anyone from the United Kingdom (bottom), overlaid with the expected distribution (Poisson) if there was no dependence between blocks. Next is shown a scatterplot of numbers of blocks shared with French-speaking Swiss and U.K. samples, for all samples from France, Italy, Greece, Turkey, and Cyprus. We see that the numbers of recent ancestors each Italian shares with the French-speaking Swiss and with the United Kingdom are both bimodal, and that these two are positively correlated, ranging continuously between values typical for Turkey/Cyprus and for France. Figure (B) is similar, showing that the substructure within the United Kingdom is part of a continuous trend ranging from Germany to Ireland. The outliers visible in the scatterplot of Figure 2B are easily explained as individuals with immigrant recent ancestors—the three outlying U.K. individuals in the lower left share many more blocks with Italians than all other U.K. samples, and the individual labeled “SK” is a clear outlier for the number of blocks shared with the Slovakian sample.
In the UK, there is a negative correlation between blocks shared with Ireland and those shared with Germany, which seems to imply a dual origin of Britons.
Age estimates (double them?):
The authors also estimate ages; however, it seems obvious from their own data that the results should be multiplied by 2.2 or so to make good sense:

Figure 4. Estimated average number of most recent genetic common ancestors per generation back through time.
Estimated average number of most recent genetic common ancestors per generation back through time shared by (A) pairs of individuals from “the Balkans” (former Yugoslavia, Bulgaria, Romania, Croatia, Bosnia, Montenegro, Macedonia, Serbia, and Slovenia, excluding Albanian speakers) and shared by one individual from the Balkans with one individual from (B) Albanian-speaking populations, (C) Italy, or (D) France. The black distribution is the maximum likelihood fit; shown in red is smoothest solution that still fits the data, as described in the Materials and Methods. (E) shows the observed IBD length distribution for pairs of individuals from the Balkans (red curve), along with the distribution predicted by the smooth (red) distribution in (A), as a stacked area plot partitioned by time period in which the common ancestor lived. The partitions with significant contribution are labeled on the left vertical axis (in generations ago), and the legend in (J) gives the same partitions, in years ago; the vertical scale is given on the right vertical axis. The second column of figures (F–J) is similar, except that comparisons are relative to samples from the United Kingdom.

I say that mainly because the shared ancestry between the Balkans and both Italy and France is dated here to around 3000 or 3500 years ago, when it would fit much better at c. 7500 years ago (as much as 8000 BP for some parts of Italy), when the Neolithic expansion was ongoing. There is no particular reason why the Balkans would be related to France and Italy c. 3000 years ago specifically, unless one believes in undocumented massive Mycenaean migrations or something like that (and what about Albania then?).
However, I am getting a headache with this issue because no correction, low or high, seems good enough for all pairs, so just take this part with your usual dose of healthy skepticism.
Some (annotated) excerpts:

In most cases, only pairs within the same population are likely to share genetic common ancestors within the last 500 years [i.e.: ~1100 years]. Exceptions are generally neighboring populations (e.g., United Kingdom and Ireland). During the period 500–1,500 ya [i.e. ~1100-3300 years ago: most of the Metal Ages], individuals typically share tens to hundreds of genetic common ancestors with others in the same or nearby populations, although some distant populations have very low rates. Longer ago than 1,500 ya [i.e. before ~3300 years ago: before the Late Bronze Age crisis], pairs of individuals from any part of Europe share hundreds of genetic ancestors in common, and some share significantly more.

On Italy:

There is relatively little common ancestry shared between the Italian peninsula and other locations, and what there is seems to derive mostly from longer ago than 2,500 ya [i.e. ~5500 y.a.: Megalithic era onwards]. An exception is that Italy and the neighboring Balkan populations share small but significant numbers of common ancestors in the last 1,500 years [i.e. within the last ~3300 years: since the Mycenaean period].

On Iberia:

Patterns for the Iberian peninsula are similar, with both Spain and Portugal showing very few common ancestors with other populations over the last 2,500 years [i.e. 5500 years: Megalithic era onwards]. However, the rate of IBD sharing within the peninsula is much higher than within Italy… 

The low level of Iberian relatedness with other populations seems to rule out this region as the source of the conjectured re-expansion of mtDNA H and other Western lineages. I would suggest looking to (Western) France as an alternative source, as this state’s heterogeneous population shares more intense relations with other Western peoples around what could be c. 6200 BP, which is at the very beginning of the Megalithic spread in Atlantic Europe, for which Armorica (Brittany and neighboring Western France) could well have been a major source (and definitely was in the case of Britain).
Of course, if you prefer to use the authors’ estimates, it would have no influence on the hypothesis because they simply can’t reach so far back in time, it seems. But I feel more comfortable overall reformulating the hypothesis towards Armorica.
For better reading of each pair of relationships through time, I include here fig. S16:

The maximum likelihood history (grey) and smoothest consistent history (red) for all pairs of population groupings of Figure S12 (including those of Figure 5). Each panel is analogous to a panel of Figure 4; time scale is given by vertical grey lines every 500 years. For these plots on a larger scale, see Figure S17.

As said before, I suggest reading each vertical grey line (counting from the left) as ~1100 years rather than just 500.
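
As a rough illustration of that rescaling, the sketch below multiplies the published dates by 2.2 (that is, 1100/500, the factor suggested earlier). The constant and function names are hypothetical, and the factor itself is only this post's conjecture, not something taken from the paper.

```python
# Conjectured correction factor from this post (1100 / 500); NOT from the paper.
CORRECTION = 2.2


def rescaled_years(published_years_ago):
    """Convert a published date (years ago) into the proposed corrected date."""
    return round(published_years_ago * CORRECTION)


# Each vertical grey line in Fig. S16 marks 500 years on the published scale.
for line in range(1, 6):
    published = 500 * line
    print(f"line {line}: {published} ya published -> ~{rescaled_years(published)} ya")
```

Read this way, the second grey line would sit at about 2200 years ago and the fifth at about 5500, which matches the bracketed annotations in the excerpts above only as well as the single factor holds for all pairs.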

Update (Jun 23): on IBD-based molecular-clock-o-logy:

Now and then I have come across a strange insistence that IBD-based chronological estimates are almost beyond reasonable doubt. I admittedly don’t know a great deal about the matter, so when Davidski (see comments) insisted on that again, I asked him for a reference so I could learn something. He kindly suggested that I read Gusev et al. 2011, The Architecture of Long-Range Haplotypes Shared within and across Populations, which is indeed a good paper. However, I could not find a clearly explained basis for the chronological estimates in general; it is probably buried deep in the bibliography. What I found instead was a clear example of these estimates falling well short of historical reality.

This example corresponds to one of the best documented populations to have suffered a “recent” bottleneck event: Ashkenazi Jews (AJ). According to Gusev et al., these would have suffered a bottleneck (founder effect of some 400 nuclear families followed by expansion) around 20 generations ago (~600 years = 1400 CE) or, a few lines later more specifically: 23 generations ago (~1320 CE). So here we do have a clear case study.

When we look at historical reality, however, it is just impossible that AJ would have had their founder effect bottleneck so late. Historical records often document them already in the Frankish period, and they were definitely a vibrant expanding community by the time of the founding of Prague and Kraków c. 900 CE. A historically reasonable estimate for the AJ founder effect should instead be c. 700 CE, when they begin to appear in historical records, or maybe even a bit earlier, because of the lack of documentation in the Dark Ages.

That is not a mere 20-23 generations ago but almost double that (assuming a generation time of 30 years; with a generation time of 27 years, for example, the gap between estimate and reality would be even greater). Assuming a very reasonable AJ founder effect at 700 CE, then:

  • For gen-time = 30 years → 43 generations till now → 43/23 = 1.9 times for realistic correction
  • For gen-time = 27 years → 48 generations → 48/23 = 2.1 times for realistic correction
  • For gen-time = 25 years → 52 generations → 52/23 = 2.3 times for realistic correction
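
The arithmetic behind these three ratios can be reproduced with a short sketch. It assumes that "now" is 2013 and that generation counts are rounded down; both assumptions are inferred from the figures in the list rather than stated in the text, and the names used are purely illustrative.

```python
# Assumptions (not stated in the text): "now" is 2013, and the number of
# generations is rounded down to a whole number.
PRESENT_YEAR = 2013
FOUNDER_YEAR = 700        # proposed historical AJ founder effect (CE)
IBD_GENERATIONS = 23      # Gusev et al. estimate, in generations


def correction_factor(gen_time_years):
    """Return (generations since the founder event, ratio to the IBD estimate)."""
    generations = (PRESENT_YEAR - FOUNDER_YEAR) // gen_time_years
    return generations, round(generations / IBD_GENERATIONS, 1)


for gen_time in (30, 27, 25):
    generations, factor = correction_factor(gen_time)
    print(f"gen-time = {gen_time} years -> {generations} generations -> x{factor}")
```

Changing `FOUNDER_YEAR` or the generation time shows how sensitive the proposed correction is to those two choices.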

While it has nowadays become standard practice to equate generation time with 30 years, this is not an absolute measure, because the generation time actually observed (i.e. the average age difference between parental and child generations) varies in real life depending on cultural factors (such as marriage age), gender (female generation time is almost invariably shorter than male), life expectancy (mothers who die in childbirth at a young age, for example, don’t have any more children), etc. So in the fine detail it is a somewhat blurry issue, with some significant variability among cultures and surely also through time.

Another issue is whether this “short term” estimate correction is stable over time or in fact varies somewhat. I can’t say.

Whatever the case, the approximate x2 correction proposed above seems to stand in general terms.