A new paper investigates the genetic structure of Italy:
Cornelia di Gaetano et al., An Overview of the Genetic Structure within the Italian Population from Genome-Wide Data. PLoS ONE, 2012. Open access: doi:10.1371/journal.pone.0043759
The results confirm that Sardinians are a very distinct population and show that Italians essentially seem to cluster with mainland Europeans (NW Europeans in principle, but Iberian or Balkan comparisons are missing), West Asians and Sardinians, in this order.
There is some N-S gradient in the Peninsula and Sicily, but it’s mostly determined by an increasing West Asian affinity in the South. Central Italians stand between North and South, though clearly closer to the North. However, many individuals from NW Italy (Liguria and Piedmont, as well as some Sardinians) actually cluster with Southern Italians as well as with “mixed” Sardinians (those Sardinians who stand between the main insular cluster and the peninsular one).
Figure 1. SNP-Based PC of 1,262 individuals from 10 sub-populations.
The Italian population plotted onto the first two principal components defined by the European HGDP-CEPH populations and CEU HapMap data. Scatter plot of the first two principal components, obtained using R software (prcomp). Analysis based on 125,799 autosomal SNPs. Individuals included belong to Northern Italy (N-IT): black dots, Central Italy (C-IT): red dots, Southern Italy (S-IT): green dots, Sardinian (SAR): blue dots…
[the original legend does not explain the other populations well (there are too many blatant errors in the text), but it’s obvious that the group in the top-right corner is made up of other Europeans (French, CEU), while the group at the center-left is made up of West Asians (Druze, Palestinians, Bedouins) and Mozabites. Larger images can be downloaded from the paper].
In this first characterization we see a primary duality between Europe and West Asia (the Paleo-Neolithic dichotomy probably) and a secondary one between Sardinia and mainland Europe.
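For readers who want to see what "plotted onto principal components defined by other populations" means in practice, here is a minimal sketch. The paper used R's prcomp; this is my own NumPy stand-in, not the authors' code. The genotype matrices, group sizes and allele frequencies are all synthetic and hypothetical, and the function names `fit_pca` and `project` are mine. The idea: compute the principal axes from the reference samples only, then place new samples on those same axes.

```python
import numpy as np

def fit_pca(reference, n_components=2):
    """Fit PCA on a reference genotype matrix (samples x SNPs, coded 0/1/2).

    Centers each SNP on its reference mean (no scaling) and takes the
    principal axes from the SVD of the centered matrix.
    """
    mean = reference.mean(axis=0)
    centered = reference - mean
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(samples, mean, axes):
    """Place samples on axes that were defined by the reference panel."""
    return (samples - mean) @ axes.T

# Synthetic, hypothetical data: two reference groups whose allele
# frequencies differ, plus new samples with intermediate frequencies.
rng = np.random.default_rng(0)
n_snps = 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, n_snps), 0.05, 0.95)
ref_a = rng.binomial(2, freq_a, size=(40, n_snps))
ref_b = rng.binomial(2, freq_b, size=(40, n_snps))
reference = np.vstack([ref_a, ref_b]).astype(float)

mean, axes = fit_pca(reference)
ref_scores = project(reference, mean, axes)

# The new samples play no part in defining the axes; they are only projected.
freq_mid = (freq_a + freq_b) / 2
new_samples = rng.binomial(2, freq_mid, size=(20, n_snps)).astype(float)
new_scores = project(new_samples, mean, axes)
```

Because the axes come from the reference panel alone, the projected samples can only spread along contrasts that already exist in that panel, which is worth keeping in mind when reading Figure 1.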
Figure 2. SNP-Based PC of 1,014 individuals from the Italian dataset.
A. A Scatter Plot of the Italian population of the first two principal components obtained via R software (prcomp). Individuals included belong to Northern Italy: black dots, Central Italy: red dots, Southern Italy: green dots, Sardinian: blue dots.
B. Italian population without the Sardinian-projected scatter plot of the first two principal components obtained via the R software (prcomp).
[larger images can be downloaded from the paper]
Here we see (A) a main dichotomy between Sardinia and Peninsular Italy (with Sicily) and a secondary N-S gradient. However, in (B) it becomes more obvious that, to some extent, there are two distinct clusters, Southern and Central-North Italy, with a fairly clear separation between them.
However, and this is quite interesting, some North Italians strongly cluster with Southern Italians. Razib mentions this fact as a signature of internal Italian migrations, but individual migrations would not look that way, because the genetic distinction would have been diluted in the meantime, appearing at most as intermediate. What we see instead is a preserved genetic identity with Southern Italy, little diluted or not diluted at all, in many Northern Italians.
Who are these Northern Italians, I wondered then. The answer is in the supplements:
Hidden population structure within the Italian dataset. Scatter plot of the first two eigenvectors based on 125,799 autosomal SNPs and 1,012 individuals. Colors represent the four different macro-areas: green, Southern Italy (Apulia, Calabria/Sicily, Campania, Basilicata); red, Central Italy (Tuscany, Lazio, Emilia Romagna and Abruzzo/Marche); black, Northern Italy (Piedmont, Liguria, Aosta Valley and Lombardy); blue, Sardinia (these samples were labeled for the linguistic area). Subjects are symbol-labeled by municipality. Information on municipality was not used for calculations.
In this image we can appreciate how all Northern Italians clustering with Southern Italians are from two specific regions: Liguria and Piedmont (Piemonte), the Northwestern regions of Italy, bordering France. What do these two regions have in common? All I can think of is that in ancient times they were mostly inhabited by the Ligures, a pre-Indo-European people plausibly descended from the first Neolithic colonization (Cardium Pottery, via the Chassey-Cortaillod-La Lagozza cultural complex).
Roman region of Liguria (Regio IX)
We are also provided with a Bayesian cluster analysis, for which K=4 seems the most valid result (K=3 and K=5 also give low cross-validation values but do not seem more informative):
Figure 3. Clustering of the European, Northern African and Middle Eastern individuals by the Structure software.
Ancestry analysis based on a subset of HGDP-CEPH and HapMap CEU data using the merged data of 126K autosomal SNPs. Ancestry for each individual was inferred using ADMIXTURE at K = 4. Abbreviations as in Figure 1.
This confirms four clusters: Main European (green), Sardinian (red), West Asian (blue) and North African (purple).
I tend to consider the West Asian component as the main Neolithic input in Europe, although, of course, other DNA sections may well have traveled around in that period or later on.
I also find it notable that Sardinian affinity exists among Italians, French and North Africans (surely via Iberia) but is almost absent among North American Euro-descendants (CEU) of NW European origin and among West Asians, who instead do sport some notable Main European affinity.
It’s also interesting that CEU are among the most North African related of all European populations.
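The population-level comparisons above boil down to averaging each individual's estimated ancestry fractions within a population. A minimal sketch of that bookkeeping follows, assuming a matrix with one row per individual and one column per cluster (fractions summing to 1 per row), which is the kind of table ADMIXTURE-style analyses produce. The numbers and the labels `POP_A` and `POP_B` are made-up placeholders, not values from the paper, and `mean_ancestry` is a hypothetical helper of mine.

```python
import numpy as np

def mean_ancestry(q, labels):
    """Average per-individual ancestry fractions within each population.

    q      : array (individuals x K), each row sums to 1
    labels : one population label per individual, same order as the rows of q
    """
    labels = np.asarray(labels)
    return {
        pop: q[labels == pop].mean(axis=0)
        for pop in sorted(set(labels.tolist()))
    }

# Placeholder fractions for K = 4; NOT taken from the paper.
q = np.array([
    [0.70, 0.20, 0.10, 0.00],
    [0.60, 0.30, 0.10, 0.00],
    [0.10, 0.90, 0.00, 0.00],
    [0.20, 0.80, 0.00, 0.00],
])
labels = ["POP_A", "POP_A", "POP_B", "POP_B"]

summary = mean_ancestry(q, labels)
for pop, fractions in summary.items():
    print(pop, np.round(fractions, 2))
```

Averages like these hide individual variation, so outliers such as the Ligurians and Piedmontese discussed earlier would only show up in the per-individual bars or in the PCA.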
Some prehistoric and proto-historic speculation
IF, and only IF, the affinity of Ötzi with Sardinians can be considered representative of how most of Italy was in the Chalcolithic (and not a random fluke specific to that man or his mountain community), then we should consider two further waves into Italy: (1) one of West Asian affinity (maybe from the Aegean, since the Bronze Age or even before) and (2) one of mainland European affinity (Indo-Europeans: Italics, Celts).
IF this is correct, then the Ligures would not descend genetically so much from La Lagozza-Chassey, as I said above, as from the “Aegean” wave. This would also be consistent with some individual Tuscans clustering with Southern Italians as well (the historical Etruscans are one of the culminations of these Aegean waves, together with the Greek colonies).
But honestly, I am not aware of any such Aegean flow arriving in proto-historical Liguria. Are you?
So I must consider that there is another possibility: that the Sardinian element represents only one of several Neolithic (or maybe even Paleolithic, but nothing is clear here) elements in Italy, maybe associated with Y-DNA I2a (strong in Croatia, Bosnia, etc.), while the other, the one most akin to West Asia, would be related to Y-DNA E1b-V13 (strong in Greece and Albania) and maybe other patrilineages from the Eastern Mediterranean like J2b, etc. Both E1b-V13 and I2a are known from ancient DNA from the Neolithic of the Western Mediterranean, so they did indeed take part in these migrations.
Then the “Greek” or “Aegean” (or “Albanian” if you wish) component was reinforced by Bronze Age flows, while the “Dalmatian” one was instead diluted by the successive Indo-European (Kurgan) waves.
I’ll leave it this way until more evidence comes forward.