The less homogeneous European "populations" are Italians and French

22 Jun
This comes from a recent IBD study on Europe:
Peter Ralph & Graham Coop, The Geography of Recent Genetic Ancestry across Europe. PLoS Biology, 2013. Open access [doi:10.1371/journal.pbio.1001555]


The recent genealogical history of human populations is a complex mosaic formed by individual migration, large-scale population movements, and other demographic events. Population genomics datasets can provide a window into this recent history, as rare traces of recent shared genetic ancestry are detectable due to long segments of shared genomic material. We make use of genomic data for 2,257 Europeans (in the Population Reference Sample [POPRES] dataset) to conduct one of the first surveys of recent genealogical ancestry over the past 3,000 years at a continental scale. We detected 1.9 million shared long genomic segments, and used the lengths of these to infer the distribution of shared ancestors across time and geography. We find that a pair of modern Europeans living in neighboring populations share around 2–12 genetic common ancestors from the last 1,500 years, and upwards of 100 genetic ancestors from the previous 1,000 years. These numbers drop off exponentially with geographic distance, but since these genetic ancestors are a tiny fraction of common genealogical ancestors, individuals from opposite ends of Europe are still expected to share millions of common genealogical ancestors over the last 1,000 years. There is also substantial regional variation in the number of shared genetic ancestors. For example, there are especially high numbers of common ancestors shared between many eastern populations that date roughly to the migration period (which includes the Slavic and Hunnic expansions into that region). Some of the lowest levels of common ancestry are seen in the Italian and Iberian peninsulas, which may indicate different effects of historical population expansions in these areas and/or more stably structured populations. Population genomic datasets have considerable power to uncover recent demographic history, and will allow a much fuller picture of the close genealogical kinship of individuals across the world.

Most interesting, in my view, is Table 1, which describes the IBD relations of the sampled populations within themselves and with other Europeans.
From this table it seems very apparent that Italians and French are not homogeneous at all and therefore, in my opinion, should not be treated as single populations in genetic studies but split at least a bit by regions (whose optimal dimensions are yet to be determined).
The degree of internal homogeneity of the samples (only those with n=5 or greater) can be simplified as follows:
  • Very low (<1): Italy, France.
  • Quite low (1-1.4): Germany, UK, Belgium, England, Austria, French-Swiss.
  • Somewhat low (1.5-1.9): Spain, German-Swiss, Greece, Portugal, Netherlands, Hungary.
  • Somewhat high (2-2.9): Czech R., Romania, Scotland, Ireland, Serbia, Croatia.
  • Quite high (3-3.9): Sweden, Poland
  • Very high (4-5): Bosnia, Russia*
  • Extremely high (>10): Albania
  • I ignored strangely labeled samples like “Switzerland” and “Yugoslavia”, which actually seem to mean “other” within these labels. I retained the “United Kingdom” category for its sample size, much larger than that of its obvious parts.
  • (*) The level of relatedness of Russians may be exaggerated by the small sample: n=6, only just above my cautionary threshold of n=5.
  • I suspect that the extreme disparity of sample sizes may influence the results to some extent.
Eastern Europeans seem much more strongly related to others, especially other Eastern Europeans, than Western Europeans are, while NW Europeans are more related to other groups (usually at the regional level) than SW Europeans are. In fact the Italian and Iberian peninsulas show very low levels of “recent” relatedness with other populations, which is a bit perplexing, considering their non-negligible roles in Medieval and Modern European history. I guess that this may be partly caused by geographic barriers (mountains) and also by these areas having had large populations since Antiquity or before.

Figure 3. Geographic decay of recent relatedness.
In all figures, colors give categories based on the regional groupings of Table 1. (A–F) The area of the circle located on a particular population is proportional to the mean number of IBD blocks of length at least 1 cM shared between random individuals chosen from that population and the population named in the label (also marked with a star). Both regional variation of overall IBD rates and gradual geographic decay are apparent. (G–I) Mean number of IBD blocks of lengths 1–3 cM (oldest), 3–5 cM, and >5 cM (youngest), respectively, shared by a pair of individuals across all pairs of populations; the area of the point is proportional to sample size (number of distinct pairs), capped at a reasonable value; and lines show an exponential decay fit to each category (using a Poisson GLM weighted by sample size). Comparisons with no shared IBD are used in the fit but not shown in the figure (due to the log scale). “E–E,” “N–N,” and “W–W” denote any two populations both in the E, N, or W grouping, respectively; “TC-any” denotes any population paired with Turkey or Cyprus; “I-(I,E,N,W)” denotes Italy, Spain, or Portugal paired with any population except Turkey or Cyprus; and “between E,N,W” denotes the remaining pairs (when both populations are in E, N, or W, but the two are in different groups). The exponential fit for the N–N points is not shown due to the very small sample size. See Figure S8 for an SVG version of these plots where it is possible to identify individual points.
We can also see in the bottom panels of the above figure (G–I) how most of the relatedness, especially over longer distances, belongs to the oldest class of blocks (1-3 cM).
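To make the "exponential decay fit" of panels G–I more concrete, here is a minimal sketch of a weighted Poisson log-linear fit, mean blocks ≈ exp(a + b·distance), solved with Newton's method in plain Python. It is only my illustration of the general technique named in the caption: the function name, the data values and the weights are all made up, and I do not know the exact specification the authors used.

```python
import math

def fit_exponential_decay(distances, mean_blocks, weights, iterations=50, tol=1e-12):
    """Fit mean_blocks ~ exp(a + b * distance) by weighted Poisson maximum likelihood.

    Newton's method on the two-parameter log-likelihood; returns (a, b).
    """
    total_w = sum(weights)
    # Start from a flat curve at the weighted mean of the observations.
    a = math.log(sum(w * y for w, y in zip(weights, mean_blocks)) / total_w)
    b = 0.0
    for _ in range(iterations):
        g_a = g_b = 0.0            # gradient of the weighted log-likelihood
        i_aa = i_ab = i_bb = 0.0   # entries of the information matrix
        for d, y, w in zip(distances, mean_blocks, weights):
            mu = math.exp(a + b * d)
            g_a += w * (y - mu)
            g_b += w * (y - mu) * d
            i_aa += w * mu
            i_ab += w * mu * d
            i_bb += w * mu * d * d
        det = i_aa * i_bb - i_ab * i_ab
        step_a = (i_bb * g_a - i_ab * g_b) / det
        step_b = (i_aa * g_b - i_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b

# Made-up illustration data (NOT from the paper): distance in thousands of km,
# mean shared blocks per pair, and number of pairs used as weights.
distances = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
mean_blocks = [2.0 * math.exp(-1.0 * d) for d in distances]
pairs = [400, 900, 1200, 800, 300, 100]

a, b = fit_exponential_decay(distances, mean_blocks, pairs)
print(f"rate at distance 0: {math.exp(a):.3f}, decay per 1000 km: {b:.3f}")
```

A more negative b means sharing falls off faster with distance; fitting each block-length class separately would give one curve per panel.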
The authors suggest that the low homogeneity within some of these groupings is influenced by regional variation, which makes good sense to me. This they illustrate with the examples of Italy and Great Britain:

Figure 2. Substructure in (A) Italian and (B) U.K. samples.
The leftmost plots of (A) show histograms of the numbers of IBD blocks that each Italian sample shares with any French-speaking Swiss (top) and anyone from the United Kingdom (bottom), overlaid with the expected distribution (Poisson) if there was no dependence between blocks. Next is shown a scatterplot of numbers of blocks shared with French-speaking Swiss and U.K. samples, for all samples from France, Italy, Greece, Turkey, and Cyprus. We see that the numbers of recent ancestors each Italian shares with the French-speaking Swiss and with the United Kingdom are both bimodal, and that these two are positively correlated, ranging continuously between values typical for Turkey/Cyprus and for France. Figure (B) is similar, showing that the substructure within the United Kingdom is part of a continuous trend ranging from Germany to Ireland. The outliers visible in the scatterplot of Figure 2B are easily explained as individuals with immigrant recent ancestors—the three outlying U.K. individuals in the lower left share many more blocks with Italians than all other U.K. samples, and the individual labeled “SK” is a clear outlier for the number of blocks shared with the Slovakian sample.
In the UK, there is a negative correlation between blocks shared with Ireland and those shared with Germany, which seems to imply a dual origin of Britons.
Age estimates (double them?):
The authors also estimate ages. However, it seems obvious from their own data that the results should be multiplied by 2.2 or something like that to make good sense:

Figure 4. Estimated average number of most recent genetic common ancestors per generation back through time.
Estimated average number of most recent genetic common ancestors per generation back through time shared by (A) pairs of individuals from “the Balkans” (former Yugoslavia, Bulgaria, Romania, Croatia, Bosnia, Montenegro, Macedonia, Serbia, and Slovenia, excluding Albanian speakers) and shared by one individual from the Balkans with one individual from (B) Albanian-speaking populations, (C) Italy, or (D) France. The black distribution is the maximum likelihood fit; shown in red is smoothest solution that still fits the data, as described in the Materials and Methods. (E) shows the observed IBD length distribution for pairs of individuals from the Balkans (red curve), along with the distribution predicted by the smooth (red) distribution in (A), as a stacked area plot partitioned by time period in which the common ancestor lived. The partitions with significant contribution are labeled on the left vertical axis (in generations ago), and the legend in (J) gives the same partitions, in years ago; the vertical scale is given on the right vertical axis. The second column of figures (F–J) is similar, except that comparisons are relative to samples from the United Kingdom.

I say that mainly because the shared ancestry between the Balkans and both Italy and France is dated here to around 3000 or 3500 years ago, when it would fit much better with c. 7500 years ago (as much as 8000 BP for some parts of Italy), when the Neolithic expansion was ongoing. There is no particular reason why the Balkans would be related to France and Italy c. 3000 years ago specifically, unless one believes in undocumented massive Mycenaean migrations or something like that (and what about Albania then?).
However, I am getting a headache with this issue because no correction, low or high, seems good enough for all pairs, so, well, just take this part with your usual dose of healthy skepticism.
Some (annotated) excerpts:

In most cases, only pairs within the same population are likely to share genetic common ancestors within the last 500 years [i.e.: ~1100 years]. Exceptions are generally neighboring populations (e.g., United Kingdom and Ireland). During the period 500–1,500 ya [i.e. ~1100-3300 years ago: most of the Metal Ages], individuals typically share tens to hundreds of genetic common ancestors with others in the same or nearby populations, although some distant populations have very low rates. Longer ago than 1,500 ya [i.e. before ~3300 years ago: before the Late Bronze Age crisis], pairs of individuals from any part of Europe share hundreds of genetic ancestors in common, and some share significantly more.

On Italy:

There is relatively little common ancestry shared between the Italian peninsula and other locations, and what there is seems to derive mostly from longer ago than 2,500 ya [i.e. ~5500 y.a.: Megalithic era onwards]. An exception is that Italy and the neighboring Balkan populations share small but significant numbers of common ancestors in the last 1,500 years [i.e. the last ~3300 years: since the Mycenaean period].

On Iberia:

Patterns for the Iberian peninsula are similar, with both Spain and Portugal showing very few common ancestors with other populations over the last 2,500 years [i.e. 5500 years: Megalithic era onwards]. However, the rate of IBD sharing within the peninsula is much higher than within Italy… 

The low Iberian relationship with other populations seems to rule out this region as the source for the conjectured re-expansion of mtDNA H and other Western lineages. I would suggest looking to (Western) France for an alternative source, as this state’s heterogeneous population shares more intense relations with other Western peoples around what could be c. 6200 BP, which is at the very beginning of the Megalithic spread in Atlantic Europe, for which Armorica (Brittany and neighboring Western France) could well have been a major source (and definitely was in the case of Britain).
Of course, if you prefer to use the authors’ estimates, it would have no influence on the hypothesis because they simply can’t reach so far back in time, it seems. But I feel more comfortable overall reformulating the hypothesis towards Armorica.
For better reading of each pair of relationships through time, I include here fig. S16:

The maximum likelihood history (grey) and smoothest consistent history (red) for all pairs of population groupings of Figure S12 (including those of Figure 5). Each panel is analogous to a panel of Figure 4; time scale is given by vertical grey lines every 500 years. For these plots on a larger scale, see Figure S17.

As said before, I suggest reading each vertical grey line (counting from the left) as meaning ~1100 years rather than just 500.
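As a minimal sketch of that rereading, the published dates can simply be rescaled by a constant factor. The factor of 2.2 is my own rough guess from above, not something the paper supports, and the helper name is purely illustrative.

```python
CORRECTION = 2.2  # assumed rescaling factor (my rough guess, not from the paper)

def rescale(years_ago, factor=CORRECTION):
    """Rescale a published age estimate, rounded to the nearest 100 years."""
    return int(round(years_ago * factor, -2))

# Grid lines of fig. S16 are drawn every 500 years.
for grid_line in (500, 1000, 1500, 2000, 2500, 3000):
    print(f"{grid_line} ya as published -> ~{rescale(grid_line)} ya rescaled")
```

With this factor the 500, 1,500 and 2,500 ya marks become roughly 1100, 3300 and 5500 years ago, the same figures used in the annotations of the excerpts.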

Update (Jun 23): on IBD-based molecular-clock-o-logy:

Now and then I have found a strange insistence on IBD-based chronological estimates being almost beyond reasonable doubt. I admittedly don’t know a great deal on the matter, so when Davidski (see comments) insisted again on that, I asked him for a reference, so I could learn something. He kindly suggested that I read Gusev et al. 2011, The Architecture of Long-Range Haplotypes Shared within and across Populations, which is indeed a good paper. However, I could not find a clearly explained basis for the chronological estimates in general; it is probably buried deep in the bibliography. What I found instead was a clear example of these estimates falling short of historical reality by a lot.

This example corresponds to one of the best documented populations to have suffered a “recent” bottleneck event: Ashkenazi Jews (AJ). According to Gusev et al., these would have suffered a bottleneck (founder effect of some 400 nuclear families followed by expansion) around 20 generations ago (~600 years = 1400 CE) or, a few lines later more specifically: 23 generations ago (~1320 CE). So here we do have a clear case study.

When we look at historical reality, however, it is just impossible that AJ would have had their founder effect bottleneck so late. Historical records often document them already in the Frankish period and they were definitely a vibrant expanding community by the time of the founding of Prague and Kraków c. 900 CE. A historically reasonable estimate for the AJ founder effect should instead be c. 700 CE, when they begin to appear in historical records, or maybe even a bit earlier, because of the lack of documentation in the Dark Ages.

That is not at all a mere 20-23 generations ago but almost double (counting generation time = 30 years; if gen-time were 27 years, for example, the difference between estimates and reality would be even greater). Assuming a very reasonable AJ founder effect at 700 CE, then:

  • For gen-time = 30 years → 43 generations till now → 43/23 = 1.9 times for realistic correction
  • For gen-time = 27 years → 48 generations → 48/23 = 2.1 times for realistic correction
  • For gen-time = 25 years → 52 generations → 52/23 = 2.3 times for realistic correction
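The arithmetic behind these three figures can be reproduced with a short sketch. It assumes "now" is 2013 (the year of this post), a founder date of 700 CE and the 23 generations from Gusev et al., and it rounds the generation count down, as the list above does; the function name is just illustrative.

```python
def correction_factor(founder_year_ce, gen_time_years,
                      estimated_generations=23, present_year=2013):
    """Return (whole generations elapsed, ratio to the IBD-based estimate)."""
    elapsed_years = present_year - founder_year_ce
    generations = elapsed_years // gen_time_years  # whole generations, rounded down
    return generations, generations / estimated_generations

for gen_time in (30, 27, 25):
    gens, factor = correction_factor(700, gen_time)
    print(f"gen-time = {gen_time} years -> {gens} generations -> x{factor:.1f}")
```

Moving the founder date earlier or shortening the generation time both push the factor up, which is why I take ~2 as a floor rather than a ceiling.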

While it has nowadays become standard practice to equate generation time with 30 years, this is not an absolute measure, because the actually observed generation time (i.e. the average age difference between parental and child generations) varies in real life depending on cultural factors (such as marriage age), gender (female generation time is almost invariably shorter than male), life expectancy (mothers who die in childbirth at a young age, for example, don’t have any more children), etc. So in the fine detail it is a somewhat blurry issue, with some significant variability among cultures and surely also through time.

Another issue is whether this “short term” estimate correction is stable over time or does in fact vary somewhat. I can’t say.

Whatever the case, the approximate x2 correction proposed above seems to stand in general terms.


66 responses to “The less homogeneous European "populations" are Italians and French”

  1. jackson_montgomery_devoni

    June 27, 2013 at 8:46 pm

    I think this may be a good example of how a founder effect or bottleneck in a population can affect IBD results. The bubbles on this map that I am going to post represent different IBD segment sizes from an analysis that Davidski did for members of his Eurogenes Project last year. This map represents my own results from this analysis. I am 25% Italian, 25% Finnish and 50% Irish/British by known ancestry. Even though I am only 25% Finnish, look how big the bubbles over Finland are for my results, especially down at the 1cM level. Bubbles: 1cM+; Red Bubbles: 2cM+; Yellow Bubbles: 3cM+; Green Bubbles: 4cM+; Blue Bubbles: 5cM+; Purple Bubbles: 6cM+

  2. Maju

    June 27, 2013 at 9:05 pm

    Yes, that's very impressive. It really makes me think that IBD is not really working as it should. Maybe it's because of what Anders says above but the results are sometimes very weird.

  3. jackson_montgomery_devoni

    June 27, 2013 at 10:14 pm

    Finns were sampled in this Ralph and Coop study, correct? I see Finland in the one table but they do not seem to mention Finns much throughout the study. It seems rather clear to me based on my own results that Finns went through at least one bottleneck.

  4. Maju

    June 27, 2013 at 10:52 pm

    There was recently another (not IBD but quite comprehensive) study of Finno-Ugric peoples which showed that they have clear and intense endogamy signatures, which really distorts things a lot when comparing autosomal data, because they tend to form clusters fast and group the more cosmopolitan populations around them. Where Germans and Italians have mean ROH (endogamy) scores of ~0.2, and Poles, Latvians and Estonians 0.5 to 0.6, Helsinki Finns have 1.1, while the other Finnic peoples (Komi, Kuusamo Finns) score between 1.7 and 2.7. More than bottlenecks (also possible), what I would consider key here are very low population numbers through most of their history, which resulted in a continuous and unavoidable endogamy.

  5. mikej2

    June 28, 2013 at 1:53 pm

    This study is, however, of low quality, like some other studies. It uses village data representing about 0.3% of the Finnish population. I guess they use similarly poor data for other FU peoples. Both PCA and admix show distorted results due to this poor sample selection. This is definitely a good example of how we can't draw conclusions without knowing the background history.

  6. Maju

    June 28, 2013 at 5:58 pm

    Helsinki represents much more than "0.3% of the Finnish pop." Also Kuusamo was settled by Finns relatively recently (originally it was Saami land), so maybe it represents a wider ancestral pop. (?). The Komi samples are also from two different and geographically separated districts, gathering together some 4.3% of the Komi pop. The fact that Komi and Finns tend to make separate clusters all the time (Komi only up to K=5) means to me that they are fairly representative. "Both PCA and admix show distorted results due to this poor sample selection". In my understanding the distortion is caused by relative oversampling of these populations, together with their intense endogamous drift, which "colors" them more intensely than normal populations. It happens in other cases and I'd say it's "standard issue", a common problem with all statistical tools for nDNA analysis. Either you seek that (because you're focused on studying those particular oversampled peoples) or (in normal cases) you should avoid it by reducing the samples of the anomalous endogamous populations, even down to zero.

  7. mikej2

    June 29, 2013 at 7:30 pm

    You are right about the oversampling, but if you test with carefully prepared test data, as you should, you can reproduce the effect of high internal similarity in the populations under test. I recommend you do it. You will see that PCA, and to some extent also admix analyses, are in many cases affected more by internal similarity than by similarity between populations. In practice this means that a small population with genetic drift has a very strong effect on PCA plots on the mother population it comes from, and the figure becomes strongly distorted. This happens in the case of Finns from Kuusamo. According to the known history and to Finnish researchers who studied genetic diseases (Reijo Norio, a Finnish geneticist), they are mainly descendants of 20 families who moved to Eastern Lapland. The unwanted effect, if you want neutral results, is a strong clustering of the sample data with high internal similarity and, next, the effect on the "mother" population. I really recommend you do these tests. It is quite easy: you only need to check, for example, the internal IBS data of each population, select suitable combinations for testing, and after the test repeat with real data and see the similarity in results.

  8. Anders Pålsen

    June 29, 2013 at 7:33 pm

    IBD works, but one has to take into consideration the population characteristics, e.g. for Native Americans.

  9. Maju

    June 29, 2013 at 8:56 pm

    Other studies in the past have also given a strong personality to Finns (surely from Helsinki), and they tend to pull other populations towards their cluster. For example in Bauchet 2007 (and many other similar studies from even the times of Cavalli-Sforza), West Eurasians as a whole first diverged into Finnic-like and West Asian-like, which does not make any sense once we understand that Finns are a highly anomalous population: it is an artifact caused by their isolation and endogamy. Playing with Ashkenazi and Moroccan Jews in ADMIXTURE I got similar hyper-distorting effects (having to use only Sephardim), never mind Henn's horrible Tunisian Berber sample, which she later declared useless for comparisons precisely because of its extremely high homozygosity, and even the Hadza are problematic in the same way (and I'd dare say that also the Maasai of the HapMap MKK sample). The best way to run ADMIXTURE or other statistical analyses is to remove highly homozygous populations to begin with. If you really need to compare them, you can always include them in a supervised run later on. Also pay careful attention to sample sizes (in general but especially with anomalous populations). When you compare 5 San to 5000 others, the San component takes long to resolve. However, when you compare 20 San with, say, 100 others, then it shows up very early, as it must.

  10. mikej2

    June 30, 2013 at 7:00 am

    Yes, we can see results in studies and draw conclusions. What I stated, or meant to state, is that there are many objects that we can study, the whole is complex, and we can draw our conclusions using what we see. There are many things affecting the results: the history, including livelihood and lifestyle, e.g. religion. There are migrations, mixing, expansions, bottlenecks, etc. The age of each nation state has a big effect on the population structure; I would like to see, for example, studies about white Americans and how they share IBS and IBD segments. I guess they have their own profile with a high amount of shared IBD. There is the homozygosity and heterozygosity of the source population(s) we are testing. And, not least, there is the sampling of the data used, which is seldom fully representative of the population labels used. And there is the question of how well the universal arithmetic method used is suited to this complex situation. All these factors combine in the results. For these reasons I recommended that you make some tests using synthetic data and compare the results with real data, to see possible distortion. Making this kind of test is ordinary work for software developers. When I did this with some known program tools generally used in this field, I was surprised. You cannot expect to do quality studies just by getting academic data in your hands, inputting it and drawing conclusions.

  11. pconroy

    June 30, 2013 at 8:19 am

    @Maju, You say that Oetzi has double the amount of Neanderthal ancestry, compared to Europeans. Did you know that Tunisians have the highest amount of Neanderthal ancestry, at about 5%. So was Oetzi from North Africa??
    Anders, Davidski, When I look at my Family Finder results, I find that my 4th highest match – in terms of Shared cM – is with a Norwegian, who lives in Raufoss, Central Norway, near Lillehammer. We share a total of 19 Shared Segments (each > 1 cM), or 49.32 cM total – the largest single segment is on Chr 22 and is 8.19 cM. What does this mean?
    I know that my father – who is one of the only Irish tested with no known foreign ancestry – has the highest level of Basque-like ancestry and Scandinavian-like ancestry, when compared to other Irish people. Meanwhile on DNATribes, my first population affinity was Scandinavian, not Irish.

  12. Maju

    June 30, 2013 at 12:01 pm

    "You say that Oetzi has double the amount of Neanderthal ancestry, compared to Europeans". That's what Hawks said once at his blog. AFAIK nobody has dismantled that claim so far, but Hawks has been wrong (or debatable) in such issues before, so I'm just limiting myself to stating that result. Other studies on hybridization's fine detail have also been contradictory, so I would suggest not rushing to conclusions unless the purported finding has been confirmed in several independent studies.
    "Did you know that Tunisians have the highest amount of Neanderthal ancestry, at about 5%. So was Oetzi from North Africa??" Two different Tunisian samples in the Sánchez Quinto paper (discussed here) were reported to have values of 100% and 138% relative Neanderthal ancestry compared to CEU (YRI = 0%). Even if correct, that does not make a "5%" but a mere 2.4% and 3.3% in fact. Basques in the same study were reported to have almost that figure: 130% relative to CEU, i.e. 3.1% in "absolute" normalized values. But all other North African samples instead reported significantly less Neanderthal admixture than Europeans: between a max. of 69% in Northern Morocco and a min. of just 18% in Southern Morocco (a likely refuge of "Aterian ancestry") relative to CEU, i.e. 1.7% to 0.4% in "absolute" values.
    The same paper however produced very high results of alleged Neanderthal admixture in East Asians, with figures of almost 200% rel. to CEU, which is in total contradiction with all other studies, which report very similar levels in Europe and East Asia. So again I would not rely on Neanderthal or "Denisovan" admixture estimates to infer anything but would rather consider the rest of the data. In any case North Africa is not a likely origin for any real or alleged excess of Neanderthal ancestry with the data we have as of now; rather it should be the origin of dilution of such exogenous element, as they have overall quite a bit less Neanderthal blood than Europeans.
    …On what you ask David, I think you should clarify whether your mother has any sort of possible Scandinavian (or maybe Orcadian, Manx, English, even old Dublinese – i.e. any cryptically "Viking") ancestry, because if it means anything at all, it means a close relation with that Norwegian and more generally with Scandinavians, and it should not be through your father's side, right?

  13. Maju

    June 30, 2013 at 12:02 pm

    "For these reasons I recommended you to made some tests using synthetic data and compare results with real data, to see possible distortion". I fear that I'm not qualified to do that, sorry. I wish…

  14. Anders Pålsen

    June 30, 2013 at 3:40 pm

    It would be very complex to make such simulations. As I have shown earlier on this thread with a Maya and a Colombian, the assumption that IBD reflects a "cosmopolitan" relatedness does not apply to Native Americans, as the ancient background relatedness is far higher.

  15. pconroy

    July 1, 2013 at 12:57 am

    Well, my father suffers from "Viking Hand", and that's sometimes seen as proof of some Viking ancestry. My mother has Native Irish, Cambro-Norman, Huguenot and Northern English (Lancashire) ancestry. She shows a pull towards France usually – though she also has some sort of Lezgin component at 4%, as do I. My high level of segments shared with a Norwegian may mean some ancient North European connection between these people, not necessarily more recent Viking ancestry – that's what I was wondering about?? On 23andMe's Ancestry Components, I show as 99.9% British/Irish – which is the highest % of that component I've yet seen among any of my almost 1,700 matches. My father is only 88.7% British/Irish and 10.7% Nonspecific Northern Europe. My mother is 97.4% British/Irish and 2.4% Nonspecific Northern Europe.

  16. Anders Pålsen

    July 1, 2013 at 1:19 pm

    pconroy: If you are not in general more related to Scandinavians than to the others you would compare yourself with, is it possible that this Norwegian individual could have ancestry from your side of the North Sea?

