
Autosomal genetics of the Roma People

Some more information on the genetics of the Roma People.
Priya Moorjani et al., Reconstructing Roma History from Genome-Wide Data. PLoS ONE 2013. Open access ··> LINK [doi:10.1371/journal.pone.0058633]


The Roma people, living throughout Europe and West Asia, are a diverse population linked by the Romani language and culture. Previous linguistic and genetic studies have suggested that the Roma migrated into Europe from South Asia about 1,000–1,500 years ago. Genetic inferences about Roma history have mostly focused on the Y chromosome and mitochondrial DNA. To explore what additional information can be learned from genome-wide data, we analyzed data from six Roma groups that we genotyped at hundreds of thousands of single nucleotide polymorphisms (SNPs). We estimate that the Roma harbor about 80% West Eurasian ancestry–derived from a combination of European and South Asian sources–and that the date of admixture of South Asian and European ancestry was about 850 years before present. We provide evidence for Eastern Europe being a major source of European ancestry, and North-west India being a major source of the South Asian ancestry in the Roma. By computing allele sharing as a measure of linkage disequilibrium, we estimate that the migration of Roma out of the Indian subcontinent was accompanied by a severe founder event, which appears to have been followed by a major demographic expansion after the arrival in Europe.
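The abstract dates the South Asian/European admixture to about 850 years before present. As I understand it, LD-based dating works because linkage disequilibrium created by admixture decays roughly as exp(-n·d) with genetic distance d (in Morgans), where n is the number of generations since the mixture. The sketch below is a simplified, hypothetical illustration of that idea and not the authors' actual pipeline. It fits a noise-free synthetic curve by least squares on log(LD) and models no background term. The 30 generations and the 28-year generation time are assumptions chosen only for illustration.

```python
import math


def fit_admixture_decay(distances_morgans, ld_values):
    """Fit ld(d) = amplitude * exp(-n * d) by least squares on log(ld).

    Returns (n, amplitude), where n is the number of generations since admixture.
    Simplification: all ld_values must be positive and no affine/background
    term is modelled, unlike real LD-dating tools.
    """
    xs = list(distances_morgans)
    ys = [math.log(v) for v in ld_values]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic, noise-free curve for a HYPOTHETICAL admixture 30 generations ago.
distances = [0.005 * i for i in range(1, 41)]  # 0.5 cM to 20 cM, in Morgans
curve = [0.01 * math.exp(-30 * d) for d in distances]
n_gen, amplitude = fit_admixture_decay(distances, curve)
years = n_gen * 28  # HYPOTHETICAL 28-year generation time
print(round(n_gen, 3), round(years))
```

The point is only the shape of the reasoning: the steeper the decay, the more recent the admixture, and converting generations to years adds a further assumption (generation time) on top of the fit.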

The claim of “80%” West Eurasian ancestry seems quite exaggerated in light of the ADMIXTURE data, where at least 40% is clearly of South Asian origin (maybe somewhat more, as NW South Asians display some West Eurasian admixture). I guess they are just speculating on the ANI/ASI (Ancestral North Indian / Ancestral South Indian) issue and attributing ANI to a West Eurasian gene pool, which is confusing, to say the least.

Figure 1. Relationship of Roma with other worldwide populations.

It is true, in any case, that the FST distance to Gujarati Indians (GIH) is considerably higher than to Europeans (CEU, TSI): 0.026 for the former versus only 0.016 for the latter.
Either way, I would focus on the ADMIXTURE graph in Fig. 1(a), because in panel (b) the apparent European affinity of many South Asians (in my understanding, a 50,000-year-old affinity that shows up only for lack of sufficient K-depth: K=3 only!) is just a source of confusion. By this criterion, the Roma appear to be some 60% West Eurasian and 40% NW Indian.
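The gap between the paper's ~80% West Eurasian figure and a 60/40 reading of the ADMIXTURE plot can be reconciled with simple arithmetic, assuming (as guessed above) that part of the NW Indian component is itself being counted as West Eurasian (ANI-like). The helper names below are hypothetical, and this is just a consistency check on the two figures, not the method used in the paper.

```python
def implied_source_share(total_we, direct_we, south_asian):
    """Solve total_we = direct_we + south_asian * a for a.

    a is the West Eurasian (ANI-like) share that would have to be hidden
    inside the South Asian component for both readings to agree.
    """
    return (total_we - direct_we) / south_asian


def total_west_eurasian(direct_we, south_asian, source_share):
    """Forward version: total West Eurasian share when the South Asian
    component is itself partly West Eurasian."""
    return direct_we + south_asian * source_share


# 80% (paper) versus 60% West Eurasian + 40% NW Indian (ADMIXTURE reading)
a = implied_source_share(0.80, 0.60, 0.40)
print(round(a, 3))
```

In other words, the two numbers only agree if roughly half of the NW Indian component were classed as West Eurasian.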
Something I really miss in this paper is a more detailed comparison not only with South Asians (more K-depth, please!) but also with West Asians, who are totally absent from the study.
See also: Romani mtDNA

Romani autosomal genetics

French Gitanes (Roma)
CC by Fiore S. Barbato
A few days ago I mentioned the study by Rai et al. on Romani Y-DNA, which locates their origins with great certainty in the NW reaches of the Indian subcontinent, specifically among the lower castes. Now I must echo this other study, still at the pre-publication stage, which deals with the autosomal genetics of the same European minority.
Priya Moorjani et al., Reconstructing Roma history from genome-wide data. arXiv 2012. Freely accessible ··> LINK [ref. arXiv:1212.1696]
The authors studied the nuclear genome of 27 Romani individuals from six populations of four European states: Hungary (three different populations), Romania, Slovakia and Spain. 
A reasonable complaint at this stage is that the sample is small and, above all, too concentrated in one specific area: the Middle and Lower Danube region. But, well, let's assume that is not too important.
The authors appear to confirm the NW Indian ancestral affinities of the Roma, but it seems obvious that they have been heavily admixed with Europeans since their migration some thousand years ago.
The tests performed in this regard find greater affinity to Romanians than to other Europeans, but no other Balkan or West Asian peoples were tested, so some question marks remain. Certainly it is a bit puzzling that, with all the worldwide comparisons performed in this paper, not a single West Asian population was included.
There are hence some shortcomings in the sampling and analysis strategy (why compare with tropical Africans but not with Iranians, Turks, Egyptians or Arabs?), but the study still deserves a mention.
Principal component analysis:

STRUCTURE analysis:


Roma origins and the Y-DNA haplogroup H1a1a-M82

The origins of the Roma people of Europe and West Asia are better understood each day.

Niraj Rai et al., The Phylogeography of Y-Chromosome Haplogroup H1a1a-M82 Reveals the Likely Indian Origin of the European Romani Populations. PLoS ONE 2012. Open access ··> LINK [doi:10.1371/journal.pone.0048477]


Linguistic and genetic studies on Roma populations inhabited in Europe have unequivocally traced these populations to the Indian subcontinent. However, the exact parental population group and time of the out-of-India dispersal have remained disputed. In the absence of archaeological records and with only scanty historical documentation of the Roma, comparative linguistic studies were the first to identify their Indian origin. Recently, molecular studies on the basis of disease-causing mutations and haploid DNA markers (i.e. mtDNA and Y-chromosome) supported the linguistic view. The presence of Indian-specific Y-chromosome haplogroup H1a1a-M82 and mtDNA haplogroups M5a1, M18 and M35b among Roma has corroborated that their South Asian origins and later admixture with Near Eastern and European populations. However, previous studies have left unanswered questions about the exact parental population groups in South Asia. Here we present a detailed phylogeographical study of Y-chromosomal haplogroup H1a1a-M82 in a data set of more than 10,000 global samples to discern a more precise ancestral source of European Romani populations. The phylogeographical patterns and diversity estimates indicate an early origin of this haplogroup in the Indian subcontinent and its further expansion to other regions. Tellingly, the short tandem repeat (STR) based network of H1a1a-M82 lineages displayed the closest connection of Romani haplotypes with the traditional scheduled caste and scheduled tribe population groups of northwestern India.
Figure 1. The most parsimonious route of prehistoric expansion of Y-chromosomal haplogroup H1a1a-M82 and the recent out-of-India migration of European Roma ancestors.
Figure 2. Phylogenetic network relating Y-STR haplotypes within haplogroup H1a1a-M82.

I don't feel I can say much more, except to insist, as usual, on taking the proposed age estimates with caution.


Posted by on November 29, 2012 in European history, Roma people, South Asia, Y-DNA


Finally some improved knowledge of haplogroup R1a1 (Y-DNA)

Haplogroup R1a, most of which is R1a1, dominant in northern South Asia and Eastern Europe as well as in much of Central Asia, has been giving headaches to population geneticists, academic and amateur alike, because key downstream markers had not been identified, which made most of the haplogroup look like an amorphous goo, the same in India as in Europe. That may now change:
Horolma Pamjav et al., Brief communication: New Y-chromosome binary markers improve phylogenetic resolution within haplogroup R1a1. AJPA 2012. Pay per view ··> LINK [doi:10.1002/ajpa.22167]


Haplogroup R1a1-M198 is a major clade of Y chromosomal haplogroups which is distributed all across Eurasia. To this date, many efforts have been made to identify large SNP-based subgroups and migration patterns of this haplogroup. The origin and spread of R1a1 chromosomes in Eurasia has, however, remained unknown due to the lack of downstream SNPs within the R1a1 haplogroup. Since the discovery of R1a1-M458, this is the first scientific attempt to divide haplogroup R1a1-M198 into multiple SNP-based sub-haplogroups. We have genotyped 217 R1a1-M198 samples from seven different population groups at M458, as well as the Z280 and Z93 SNPs recently identified from the “1000 Genomes Project”.

The two additional binary markers present an effective tool because now more than 98% of the samples analyzed assign to one of the three sub-haplogroups. R1a1-M458 and R1a1-Z280 were typical for the Hungarian population groups, whereas R1a1-Z93 was typical for Malaysian Indians and the Hungarian Roma. Inner and Central Asia is an overlap zone for the R1a1-Z280 and R1a1-Z93 lineages. This pattern implies that an early differentiation zone of R1a1-M198 conceivably occurred somewhere within the Eurasian Steppes or the Middle East and Caucasus region as they lie between South Asia and Eastern Europe. The detection of the Z93 paternal genetic imprint in the Hungarian Roma gene pool is consistent with South Asian ancestry and amends the view that H1a-M82 is their only discernible paternal lineage of Indian heritage.

Not having access to the paper right now, I can't say much more, but I believe the abstract alone is already very informative.

Distribution of R1a per Underhill 2010


Fig. 1 – MJ trees
A reader has since sent me a copy of the paper, and I think it has two sides:
On one side, the paper effectively detects and studies these markers, as well as R-M458, in Hungarians and related ethnic groups (Csangos, Szeklers, Hungarian Roma), and also in Malaysian Indians, Uzbeks and Mongols. This part is informative, even if the selected Asian populations may not be the best choice (Mongols are low in R1a, and so are Tamils, who make up the bulk of Malaysian Indians).

On the other side, the authors attempt to read too much into the data, not just into these haplogroups but especially into molecular-clock-o-logic estimates (based on the Zhivotovsky mutation rate, now considered obsolete even by molecular-clock enthusiasts). A corrected age estimate would be roughly twice as old [ref 1, ref 2], which means that neither the Kurgan expansion nor the Neolithic one could account for its arrival in Europe.
Even Underhill's age estimates, once so corrected, would imply at least LGM dates for the arrival in Europe. The authors' own dates, after the same x2 correction, give Late Upper Paleolithic dates for the haplogroups researched here.
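The "x2 correction" works because STR-based ages scale inversely with the assumed mutation rate: for a lineage descended from a single founder under a simple one-step model, the expected average squared distance (ASD) from the founder haplotype grows as μ·T, so T = ASD/μ. Halving the assumed rate therefore doubles every age. The sketch below only shows that scaling; the rates and the 7,000-year age are hypothetical placeholders, not values from the paper or from Zhivotovsky.

```python
def age_from_asd(asd, mu):
    """Generations since a single founder, T = ASD / mu (one-step model)."""
    return asd / mu


def rescale_age(published_age, rate_used, corrected_rate):
    """Re-express an age estimate under a different mutation rate.

    Because T is proportional to 1 / mu, the age scales by rate_used / corrected_rate.
    """
    return published_age * rate_used / corrected_rate


# HYPOTHETICAL numbers: a 7,000-year estimate made with a rate twice
# as fast as the "corrected" one, expressed in relative units.
corrected = rescale_age(7_000, rate_used=2.0, corrected_rate=1.0)
print(corrected)
```

Whether a factor of two is the right correction is exactly the point under dispute; the code only makes explicit that the direction and size of the change follow directly from the ratio of the two rates.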
The authors also insist on arguing against a South Asian origin of R1a1 (Underhill 2010) with what sound like weak and fallacious arguments:

Previous publications have pointed out that regions of highest haplogroup frequencies do not always indicate the territory of origin (Cinnioglu et al., 2004) and high STR diversity may not be exclusively an indicator of in-situ diversification but could also be the consequence of repeated gene flow from different sources (Zerjal et al., 2002; Sharma et al., 2009).

Basically they are nagging: “Underhill could hypothetically be wrong in his conclusions but we have no evidence whatsoever that he is – just saying”. 
The real reason seems to be that they hope to find a more westerly origin for the lineage and to attribute it again to Indo-European expansions, in line with classic speculations for which the high South Asian STR diversity levels are a big problem. However, it is most unlikely that a bunch of horse-riding nomads could so radically alter the genetic landscape of the whole subcontinent, all the more so when its agriculture was already fully developed, no doubt sustaining high population densities.
But notwithstanding all those highly questionable opinions, the discovery of new haplogroups adding to our comprehension of this major lineage is a great advance.


It seems that some of the data presented in this paper was already circulating in some circles, because ISOGG already includes the “new” haplogroups in its phylogenetic synthesis. Most interestingly, the two “European” clades (along with a third one, whose geography I do not know so far) make up a larger haplogroup (R1a1a1b1a, S198/Z282), which sits within R1a1a1b1 (S339/Z283), the “brother” of the “Asian” one (R1a1a1b2, S202/Z93).

As I was just commenting elsewhere, the key to the origins of R1a lies not so much in these low-level haplogroups as in the higher “asterisk” paragroups, which (from memory) used to be concentrated in Pakistan and nearby areas of India, etc.

But once the level of R1a1a1b (S224/Z645) is reached, this lineage seems to have split in two: R1a1a1b1 (S339/Z283), which we can describe as “European”, and R1a1a1b2 (S202/Z93), which we can describe as “Indian”.

The paper treats the European half only through two of its subclades, and separately, which may be confusing. Hence I am adding here a synthesis of the current ISOGG phylogeny of R1a, with some annotations, for easier reference:

  • R1a* ··> Iran, Persian Gulf, Turkey
  • R1a1 (L120/M516, L122/M448, M459, Page65.2/SRY1532.2/SRY10831.2)
    • R1a1* ··> Iran, Caucasus, Greece, Scandinavia
    • R1a1a (L168, L449, M17, M198, M512, M514, M515)
      • R1a1a* ··> where? (not clear)
      • R1a1a1 (M417, Page7)
        • R1a1a1* ··> where?
        • R1a1a1a (L664/S298) ··> where?
        • R1a1a1b (S224/Z645, S441/Z647)
          • R1a1a1b* ··> where?
          • R1a1a1b1 (S339/Z283)
            • R1a1a1b1* ··> where?
            • R1a1a1b1a (S198/Z282)
              • R1a1a1b1a*
              • R1a1a1b1a1 (M458) ··> Central & East Europe
              • R1a1a1b1a2 (S204/Z91, S466/Z280) ··> Europe, Central Asia
              • R1a1a1b1a3 (S221/Z284, S443/Z289) ··> where?
          • R1a1a1b2 (S202/Z93) ··> India, Central Asia
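The nested list above is a tree in which each clade is defined by one or more SNPs (names separated by "/" are synonyms), and assigning a sample means descending as far as its derived alleles allow; if it stops at a clade that has subclades, it belongs to that clade's "asterisk" paragroup. The sketch below transcribes only the markers listed above (no geography), and I place R1a1a1b2 (S202/Z93) under R1a1a1b, as its name implies in the ISOGG nomenclature. The `Clade`/`assign` names are hypothetical, and real haplogroup calling also has to handle missing calls and back-mutations, which this ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Clade:
    name: str
    snps: frozenset                      # defining markers, synonyms listed separately
    children: list = field(default_factory=list)


def clade(name, snps, *children):
    return Clade(name, frozenset(snps), list(children))


# Transcribed from the ISOGG-based list above (markers only).
R1A_TREE = clade("R1a", [],
    clade("R1a1", ["L120", "M516", "L122", "M448", "M459",
                   "Page65.2", "SRY1532.2", "SRY10831.2"],
        clade("R1a1a", ["L168", "L449", "M17", "M198", "M512", "M514", "M515"],
            clade("R1a1a1", ["M417", "Page7"],
                clade("R1a1a1a", ["L664", "S298"]),
                clade("R1a1a1b", ["S224", "Z645", "S441", "Z647"],
                    clade("R1a1a1b1", ["S339", "Z283"],
                        clade("R1a1a1b1a", ["S198", "Z282"],
                            clade("R1a1a1b1a1", ["M458"]),
                            clade("R1a1a1b1a2", ["S204", "Z91", "S466", "Z280"]),
                            clade("R1a1a1b1a3", ["S221", "Z284", "S443", "Z289"]),
                        ),
                    ),
                    clade("R1a1a1b2", ["S202", "Z93"]),
                ),
            ),
        ),
    ),
)


def assign(derived, root=R1A_TREE):
    """Descend while some child clade has a derived marker in `derived`.

    Assumes the sample is already R1a. Stopping at a node with children
    means the paragroup (name + "*"). If two sibling clades both match,
    the first listed wins (inconsistent data is not handled).
    """
    node = root
    while True:
        nxt = next((c for c in node.children if c.snps & derived), None)
        if nxt is None:
            return node.name + "*" if node.children else node.name
        node = nxt


print(assign({"M516", "M198", "M417", "Z645", "Z93"}))
```

Note that descent is stepwise: a sample carrying M458 but lacking calls for the upstream markers stays at a higher paragroup, which mirrors why the upstream "asterisk" levels matter so much for origins.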

All the data on the geography of top-level “asterisk” paragroups is from Underhill 2010, already mentioned above. They suggest a West Asian origin for R1a overall, with spread westward and eastward from the R1a1a level or lower.

I used colors to emphasize the clades discussed here (purple for the larger haplogroup, blue for the European-leaning clade and red for the Indian-leaning one).

Clades in italics are “proposed”, not yet consolidated.


Romani mtDNA

There is a new paper on the genetics of the Roma people (Gypsies), with emphasis on mtDNA:
The lineages of this European people of South Asian origin can be divided into a clearly South Asian component (M5a1 especially, also M18, M25 and M35b), a most likely West Asian component (X2 and J1 clades especially), a possibly Balkan component (H7 and U3) and “others” (clearly European lineages). The exact proportions vary among populations as follows:
Fig. 1
I think it is an interesting demonstration of drift and founder effect how most pre-European “founder” lineages have vanished in most populations, especially in the rather well-known bottleneck leading to non-Balkan Roma: of all the Indian lineages, almost only the largest one, M5a1, survives beyond that bottleneck. The same happens with the West Asian X clades (barely surviving among Polish Roma). Instead, other lineages have been amplified in the destination regions, no doubt by founder effect. This is particularly true for U3, but also for H7 and some of the J1 clades, and even for an Indian lineage barely found in the Balkans (M18, which has thrived in Spain instead).
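Why the rare lineages are the ones that vanish can be seen with a toy model. A lineage at frequency p is missing from a founding group of b mothers with probability (1 − p)^b, so rare lineages are usually lost at the founder event itself, while the common one (here, the M5a1 case) tends to survive, and regrowth then reshuffles the survivors by drift. The sketch below is a minimal Wright-Fisher-style simulation under stated assumptions: the lineage labels A to F, their frequencies, the 10 founders and the doubling growth are all hypothetical, not data from the paper.

```python
import random
from collections import Counter


def prob_lost_at_founding(freq, founders):
    """Chance that a lineage at frequency `freq` is absent among `founders` random founders."""
    return (1 - freq) ** founders


def bottleneck_then_drift(lineage_freqs, founders, final_size, generations, seed=0):
    """Toy Wright-Fisher sketch: sample founders, then regrow with drift.

    HYPOTHETICAL demography: the population doubles each generation until it
    reaches final_size; each generation resamples mothers from the previous one.
    """
    rng = random.Random(seed)
    names = list(lineage_freqs)
    weights = [lineage_freqs[n] for n in names]
    pool = rng.choices(names, weights=weights, k=founders)  # founder event
    size = founders
    for _ in range(generations):
        size = min(final_size, size * 2)
        pool = rng.choices(pool, k=size)  # drift: random resampling of mothers
    return Counter(pool)


# HYPOTHETICAL source population: one common lineage and several rare ones.
source = {"A": 0.50, "B": 0.20, "C": 0.10, "D": 0.08, "E": 0.07, "F": 0.05}
survivors = bottleneck_then_drift(source, founders=10, final_size=1000,
                                  generations=12, seed=1)
print(sorted(survivors.items()))
print(prob_lost_at_founding(0.05, 10))  # about 0.60
```

Rerunning with different seeds shows the pattern described above: which rare lineages survive and get amplified varies from run to run, much as U3 or M18 rose in some destination regions and not in others.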
If you now compare Spanish or Lithuanian Roma with the Bulgaria 1 sample (probably the one best representing the ancestral Roma, at least in their “founder” lineages fraction), it is difficult to recognize much affinity at all. Only M5a1 remains as a clear link. The Bulgaria 3 sample (Wallachian Roma who arrived in Bulgaria in the 19th century) is maybe a clearer ancestral link, but the differences are still notable.
Indian origins
On the basis of statistical methodology (Table 3), the authors argue for a Punjab origin of the Roma, which is consistent with their language. However, as Manju points out on his blog, this seems less consistent with their patrilineages, which are probably from SE India, lacking R1a1, the most common NW Indian lineage (and, I would add, R2, L, J2, etc., all of which were in India some 2000 years ago, when the Roma exodus must have happened at the earliest).
Actually, the state of Orissa also shows a high statistical likelihood as the origin of the Roma people, second only to Punjab by the authors' methodology. Also, the main Roma founder mtDNA lineage, M5a1, is much more common in SE India than in the NW. This is also true for M18.
Fig. 3
So it is surely worth considering, as Manju does, whether the proto-Roma ultimately originated in eastern or SE India, perhaps incorporating lineages from the NW, such as M35b, just as they later did in West Asia and Europe.
In this sense, future studies might usefully include comparisons with the Domba people of South Asia and the Dom people of West Asia, who are generally considered likely relatives of the European Roma.

Posted by on January 12, 2011 in Roma people