The recent genealogical history of human populations is a complex mosaic formed by individual migration, large-scale population movements, and other demographic events. Population genomics datasets can provide a window into this recent history, as rare traces of recent shared genetic ancestry are detectable due to long segments of shared genomic material. We make use of genomic data for 2,257 Europeans (in the Population Reference Sample [POPRES] dataset) to conduct one of the first surveys of recent genealogical ancestry over the past 3,000 years at a continental scale. We detected 1.9 million shared long genomic segments, and used the lengths of these to infer the distribution of shared ancestors across time and geography. We find that a pair of modern Europeans living in neighboring populations share around 2–12 genetic common ancestors from the last 1,500 years, and upwards of 100 genetic ancestors from the previous 1,000 years. These numbers drop off exponentially with geographic distance, but since these genetic ancestors are a tiny fraction of common genealogical ancestors, individuals from opposite ends of Europe are still expected to share millions of common genealogical ancestors over the last 1,000 years. There is also substantial regional variation in the number of shared genetic ancestors. For example, there are especially high numbers of common ancestors shared between many eastern populations that date roughly to the migration period (which includes the Slavic and Hunnic expansions into that region). Some of the lowest levels of common ancestry are seen in the Italian and Iberian peninsulas, which may indicate different effects of historical population expansions in these areas and/or more stably structured populations. Population genomic datasets have considerable power to uncover recent demographic history, and will allow a much fuller picture of the close genealogical kinship of individuals across the world.
- Very low (<1): Italy, France.
- Quite low (1-1.4): Germany, UK, Belgium, England, Austria, French-Swiss.
- Somewhat low (1.5-1.9): Spain, German-Swiss, Greece, Portugal, Netherlands, Hungary.
- Somewhat high (2-2.9): Czech R., Romania, Scotland, Ireland, Serbia, Croatia.
- Quite high (3-3.9): Sweden, Poland.
- Very high (4-5): Bosnia, Russia*.
- Extremely high (>10): Albania.
- I ignored oddly labeled samples such as “Switzerland” and “Yugoslavia”, which actually seem to mean “other” within those categories. I retained the “United Kingdom” category because of its large sample size, much larger than that of its obvious components.
- The level of relatedness of Russians may be exaggerated by the small sample (n=6), although that is still above my cautionary threshold.
- I suspect that the extreme disparity of sample sizes may influence the results to some extent.
Figure 3. Geographic decay of recent relatedness.
In all figures, colors give categories based on the regional groupings of Table 1. (A–F) The area of the circle located on a particular population is proportional to the mean number of IBD blocks of length at least 1 cM shared between random individuals chosen from that population and the population named in the label (also marked with a star). Both regional variation of overall IBD rates and gradual geographic decay are apparent. (G–I) Mean number of IBD blocks of lengths 1–3 cM (oldest), 3–5 cM, and >5 cM (youngest), respectively, shared by a pair of individuals across all pairs of populations; the area of the point is proportional to sample size (number of distinct pairs), capped at a reasonable value; and lines show an exponential decay fit to each category (using a Poisson GLM weighted by sample size). Comparisons with no shared IBD are used in the fit but not shown in the figure (due to the log scale). “E–E,” “N–N,” and “W–W” denote any two populations both in the E, N, or W grouping, respectively; “TC-any” denotes any population paired with Turkey or Cyprus; “I-(I,E,N,W)” denotes Italy, Spain, or Portugal paired with any population except Turkey or Cyprus; and “between E,N,W” denotes the remaining pairs (when both populations are in E, N, or W, but the two are in different groups). The exponential fit for the N–N points is not shown due to the very small sample size. See Figure S8 for an SVG version of these plots where it is possible to identify individual points.
Figure 2. Substructure in (A) Italian and (B) U.K. samples.
The leftmost plots of (A) show histograms of the numbers of IBD blocks that each Italian sample shares with any French-speaking Swiss (top) and anyone from the United Kingdom (bottom), overlaid with the expected distribution (Poisson) if there was no dependence between blocks. Next is shown a scatterplot of numbers of blocks shared with French-speaking Swiss and U.K. samples, for all samples from France, Italy, Greece, Turkey, and Cyprus. We see that the numbers of recent ancestors each Italian shares with the French-speaking Swiss and with the United Kingdom are both bimodal, and that these two are positively correlated, ranging continuously between values typical for Turkey/Cyprus and for France. Figure (B) is similar, showing that the substructure within the United Kingdom is part of a continuous trend ranging from Germany to Ireland. The outliers visible in the scatterplot of Figure 2B are easily explained as individuals with immigrant recent ancestors—the three outlying U.K. individuals in the lower left share many more blocks with Italians than all other U.K. samples, and the individual labeled “SK” is a clear outlier for the number of blocks shared with the Slovakian sample.
Figure 4. Estimated average number of most recent genetic common ancestors per generation back through time.
Estimated average number of most recent genetic common ancestors per generation back through time shared by (A) pairs of individuals from “the Balkans” (former Yugoslavia, Bulgaria, Romania, Croatia, Bosnia, Montenegro, Macedonia, Serbia, and Slovenia, excluding Albanian speakers) and shared by one individual from the Balkans with one individual from (B) Albanian-speaking populations, (C) Italy, or (D) France. The black distribution is the maximum likelihood fit; shown in red is the smoothest solution that still fits the data, as described in the Materials and Methods. (E) shows the observed IBD length distribution for pairs of individuals from the Balkans (red curve), along with the distribution predicted by the smooth (red) distribution in (A), as a stacked area plot partitioned by time period in which the common ancestor lived. The partitions with significant contribution are labeled on the left vertical axis (in generations ago), and the legend in (J) gives the same partitions, in years ago; the vertical scale is given on the right vertical axis. The second column of figures (F–J) is similar, except that comparisons are relative to samples from the United Kingdom.
In most cases, only pairs within the same population are likely to share genetic common ancestors within the last 500 years [i.e. ~1100 years]. Exceptions are generally neighboring populations (e.g., United Kingdom and Ireland). During the period 500–1,500 ya [i.e. ~1100-3300 years ago: most of the Metal Ages], individuals typically share tens to hundreds of genetic common ancestors with others in the same or nearby populations, although some distant populations have very low rates. Longer ago than 1,500 ya [i.e. before ~3300 years ago: before the Late Bronze Age crisis], pairs of individuals from any part of Europe share hundreds of genetic ancestors in common, and some share significantly more.
There is relatively little common ancestry shared between the Italian peninsula and other locations, and what there is seems to derive mostly from longer ago than 2,500 ya [i.e. ~5500 y.a.: Megalithic era onwards]. An exception is that Italy and the neighboring Balkan populations share small but significant numbers of common ancestors in the last 1,500 years [i.e. the last ~3300 years: since the Mycenaean period] …
Patterns for the Iberian peninsula are similar, with both Spain and Portugal showing very few common ancestors with other populations over the last 2,500 years [i.e. 5500 years: Megalithic era onwards]. However, the rate of IBD sharing within the peninsula is much higher than within Italy…
The maximum likelihood history (grey) and smoothest consistent history (red) for all pairs of population groupings of Figure S12 (including those of Figure 5). Each panel is analogous to a panel of Figure 4; time scale is given by vertical grey lines every 500 years. For these plots on a larger scale, see Figure S17.
Update (Jun 23): on IBD-based molecular-clock-o-logy:
Now and then I have come across a strange insistence that IBD-based chronological estimates are almost beyond reasonable doubt. I admittedly don’t know a great deal about the matter, so when Davidski (see comments) insisted on that again, I asked him for a reference, so I could learn something. He kindly suggested that I read Gusev et al. 2011, The Architecture of Long-Range Haplotypes Shared within and across Populations, which is indeed a good paper. However, I could not find a clearly explained basis for the chronological estimates in general; it is probably buried deep in the bibliography. What I found instead was a clear example of these estimates falling short of historical reality by a lot.
This example corresponds to one of the best documented populations to have suffered a “recent” bottleneck event: Ashkenazi Jews (AJ). According to Gusev et al., these would have suffered a bottleneck (a founder effect of some 400 nuclear families followed by expansion) around 20 generations ago (~600 years = 1400 CE) or, more specifically a few lines later, 23 generations ago (~1320 CE). So here we do have a clear case study.
When we look at historical reality, however, it is just impossible that AJ had their founder effect bottleneck so late. Historical records often document them already in the Frankish period, and they were definitely a vibrant, expanding community by the time of the founding of Prague and Kraków c. 900 CE. A historically reasonable estimate for the AJ founder effect should instead be c. 700 CE, when they begin to appear in historical records, or maybe even a bit earlier, because of the lack of documentation in the Dark Ages.
That is not at all a mere 20-23 generations ago but almost double that (counting generation time = 30 years; if gen-time were 27 years, for example, the difference between estimates and reality would be even greater). Assuming a very reasonable AJ founder effect at 700 CE, then:
- For gen-time = 30 years → 43 generations till now → 43/23 = 1.9 times for realistic correction
- For gen-time = 27 years → 48 generations → 48/23 = 2.1 times for realistic correction
- For gen-time = 25 years → 52 generations → 52/23 = 2.3 times for realistic correction
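The arithmetic in the list above can be reproduced with a few lines of Python. This is only a sketch: the function names are made up for illustration, the present year (2013) is my assumption, and the generation counts are rounded down to whole generations, which is what matches the figures in the list.

```python
def generations_since(founding_year_ce, present_year_ce, gen_time_years):
    """Whole generations elapsed between the founding year and the present (rounded down)."""
    return (present_year_ce - founding_year_ce) // gen_time_years


def correction_factor(historical_generations, ibd_generations):
    """Ratio between the historically expected and the IBD-estimated generation counts."""
    return historical_generations / ibd_generations


PRESENT_YEAR = 2013    # assumption: approximate year of writing
FOUNDING_YEAR = 700    # historically reasonable AJ founder effect, as argued above
IBD_GENERATIONS = 23   # the more specific estimate quoted from Gusev et al.

for gen_time in (30, 27, 25):
    gens = generations_since(FOUNDING_YEAR, PRESENT_YEAR, gen_time)
    factor = correction_factor(gens, IBD_GENERATIONS)
    print(f"gen-time = {gen_time} years: {gens} generations, correction x{factor:.1f}")
```

Changing `FOUNDING_YEAR` or the list of generation times shows how sensitive the correction is to those two choices.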
While it has nowadays become standard to equate generation time with 30 years, this is not an absolute measure, because the actually observed generation time (i.e. the average age difference between the parental and child generations) varies in real life depending on cultural factors (such as marriage age), gender (female generation time is almost invariably shorter than male), life expectancy (mothers who die in childbirth at a young age, for example, don’t have any more children), etc. So in the fine detail it is a somewhat blurry issue, with some significant variability among cultures and surely also through time.
Another issue is whether this “short term” correction is stable over time or in fact varies somewhat. I can’t say.
Whatever the case, the approximate x2 correction proposed above seems to stand in general terms.