Category Archives: Khoisan peoples

SW African Bantu matrilineages

Prolific researcher Chiara Barbieri has put online another interesting study on African genetics, this time about the Bantu populations of Southwestern and Central-Southern Africa (i.e. Namibia, Angola, Botswana and Zambia).
Chiara Barbieri et al., Migration and interaction in a contact zone: mtDNA variation among Bantu-speakers in southern Africa. bioRxiv 2014. Freely accessible (pre-pub) → LINK


Bantu speech communities expanded over large parts of sub-Saharan Africa within the last 4000-5000 years, reaching different parts of southern Africa 1200-2000 years ago. The Bantu languages subdivide in several major branches, with languages belonging to the Eastern and Western Bantu branches spreading over large parts of Central, Eastern, and Southern Africa. There is still debate whether this linguistic divide is correlated with a genetic distinction between Eastern and Western Bantu speakers. During their expansion, Bantu speakers would have come into contact with diverse local populations, such as the Khoisan hunter-gatherers and pastoralists of southern Africa, with whom they may have intermarried. In this study, we analyze complete mtDNA genome sequences from over 900 Bantu-speaking individuals from Angola, Zambia, Namibia and Botswana to investigate the demographic processes at play during the last stages of the Bantu expansion. Our results show that most of these Bantu-speaking populations are genetically very homogenous, with no genetic division between speakers of Eastern and Western Bantu languages. Most of the mtDNA diversity in our dataset is due to different degrees of admixture with autochthonous populations. Only the pastoralist Himba and Herero stand out due to high frequencies of particular L3f and L3d lineages; the latter are also found in the neighboring Damara, who speak a Khoisan language and were foragers and small-stock herders. In contrast, the close cultural and linguistic relatives of the Herero and Himba, the Kuvale, are genetically similar to other Bantu-speakers. Nevertheless, as demonstrated by resampling tests, the genetic divergence of Herero, Himba, and Kuvale is compatible with a common shared ancestry with high levels of drift and differential female admixture with local pre-Bantu populations.

Figure 1: Map showing the rough geographical location of populations, colored by linguistic affiliation. Abbreviations of population labels are as specified in Table 1.

In spite of the Bantu-centric approach of the study, which also has its merits, my greatest interest is rather in the less typically Bantu lineages, which speak of admixture with several pre-Bantu populations.
In this regard, I find the following highlights:

Fig. S2 (annotated in green by Maju): CA plots based on haplogroup frequencies. Left: all the dataset, right: excluding outliers.
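
For readers unfamiliar with the method behind these plots, here is a minimal sketch of how a correspondence analysis (CA) can be computed from a table of haplogroup counts. It is not the authors' code: the population names (`pop_A` to `pop_D`) and all counts are invented for illustration, and the function name `correspondence_analysis` is my own.

```python
import numpy as np

def correspondence_analysis(counts):
    """Simple CA of a populations x haplogroups count table."""
    counts = np.asarray(counts, dtype=float)
    P = counts / counts.sum()            # correspondence matrix (relative frequencies)
    r = P.sum(axis=1)                    # row masses (populations)
    c = P.sum(axis=0)                    # column masses (haplogroups)
    expected = np.outer(r, c)            # frequencies expected under independence
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]     # principal coordinates of populations
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]  # principal coordinates of haplogroups
    inertia = sv ** 2                    # inertia carried by each axis
    return row_coords, col_coords, inertia

# Hypothetical counts (NOT from the paper): rows are populations, columns haplogroups.
haplogroups = ["L0d", "L1c", "L3d", "L3f"]
populations = ["pop_A", "pop_B", "pop_C", "pop_D"]
counts = np.array([
    [ 2,  8,  1,  1],
    [ 1,  9,  2,  0],
    [ 0,  1, 12,  9],
    [10,  3,  1,  1],
])

row_coords, col_coords, inertia = correspondence_analysis(counts)
for name, xy in zip(populations, row_coords[:, :2]):
    print(name, np.round(xy, 2))
print("share of inertia on axes 1-2:", np.round(inertia[:2] / inertia.sum(), 2))
```

In a plot of the first two axes, populations with similar haplogroup profiles fall close together, and a population dominated by unusual lineages (as the invented `pop_C` is by L3d and L3f) is pulled away from the rest, which is what an outlier cluster in such a figure reflects.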

L3d and L3f founder effect:
The Himba and Herero, as well as the non-Bantu pastoralist Damara, form one distinctive cluster defined by high frequencies of haplogroup L3d, as well as L3f (not present among the Damara but found among the Kuvale). As discussed in the paper, the Himba and Herero may be related to the Kuvale of SW Angola but they show notably different levels (or directionality) of aboriginal admixture.
As both L3d and L3f are present in West and East Africa alike, it is interesting to track the specific subhaplogroups implicated in this founder effect, something done in fig. 4. 
The main L3d sublineage is L3d3a1, whose haplotype network shows a largely Khoisan centrality (not Damara), although this node is also shared by some unspecified “other Bantu”. The Southern African specificity of L3d3a was already noticed in the past (see here). So it is very possible that we are looking at an aboriginal Southern African lineage, maybe arrived with the first Khoisan Neolithic (or whatever other ancient flow), rather than a Bantu-specific founder effect.
The main L3f subhaplogroup is L3f1b4a, which seems more specifically Bantu, with a major branch concentrated among the Himba, Herero and Kuvale. This lineage is not found among the Damara in spite of the otherwise strong affinity of this Khoisan population with the Himba and Herero. L3f1b is found in Southern Africa, Kenya and Oman (per Behar 2008), so we are probably looking at a distinctive East African element, not too likely to be genuinely Bantu but possibly just assimilated into Bantu ethnic identity.
Even if both lineages converge in the Himba and Herero, they are almost certainly different inputs, one of Damara (herder Khoisan) origin and the other perhaps of Bantuized East African origin.
L1b founder effect:
L1b is essentially a West African lineage concentrated in the Sahel area from Chad westwards (although L1b1a2 is from the Nile basin). A population with particularly high frequencies is the Fulani pastoralists, originally from the westernmost African plateaus, who ruled many kingdoms in West Africa between the collapse of the colonial rule by Morocco and the consolidation of the European conquest of the continent.
As this study does not dwell on sublineages, we cannot determine its most likely specific origins among several Southern African populations, specifically the pooled NE Zambians (13%) and the Fwe and Shanjo of SW Zambia (24-27%).
In any case it is a notable founder effect, almost absent in other Bantu populations of the area (0-10%).
Typical L0d Khoisan admixture:
This element is concentrated in Botswana (~25%) and with highest frequencies in the SW Kgalagadi (53%). It is also important among the Kuvale of SW Angola (21%). Other Bantu populations in this dataset have frequencies under 10%, some even zero. The Damara have 13%.
We know from previous studies that it is also found at high frequencies among the Xhosa of South Africa (L0d3).
While L3h appears marked in the graph, the lineage is in fact absent in all populations except at very low frequency among the Kuvale (2%), so it does not seem actually of any relevance. 
Less typical L0k around SW Zambia:
While L0k is generally considered an aboriginal Southern African lineage, it has a much more northerly distribution than the more common and surely older L0d. The area where it is most frequent seems to be SW Zambia (see here and here).
This study confirms this distribution:

Supplementary Figure S3[A]: Haplogroup frequencies of important haplogroups in the populations studied here. A: Haplogroups L0d and L0k.(…)

The size of the circles is proportional to the sample size.

High frequencies of L1c (Pygmy admixture marker) among Southern African Bantus:
An interesting element is how common L1c is among almost all Bantu populations in this dataset. This lineage is typical of Western Pygmies and some other populations from Gabon (possibly representative of the wider West-Central African jungle region, not too well studied otherwise).
The exceptions are the Herero, Himba, Kgalagadi and Tswana (0%), as well as the NE Zambians (4%). All the rest have frequencies between 12% and 30%. Even the non-Bantu Damara have 11% of it.
In my understanding this almost certainly implies a notable level of admixture with Western Pygmies among the Bantu, especially those of Angola and West Zambia, a phenomenon that may be widespread in Central-West Africa.
It is notable, however, that at least many of the populations with the highest likely Khoisan admixture (in its various forms, discussed in the previous sections) have the lowest frequencies of L1c (Pygmy admixture). So to a great extent these two aboriginal influences in Bantu mtDNA seem mutually exclusive and were probably produced after settlement rather than “on the march”.
This in turn raises some interesting questions about the ethnic geography of Africa before the Bantu expansion.

Update: I just noticed that Ethiohelix has parsed the haplogroups’ frequency into a very helpful chart → LINK.

Khoesan and Coloured autosomal DNA in context

There have been a number of studies coming out recently on Khoesan genetics, but this one does not seem to be merely redundant: it provides some extra information.

Desiree C. Petersen et al., Complex Patterns of Genomic Admixture within Southern Africa. PLoS Genetics 2013. Open access → LINK [doi:10.1371/journal.pgen.1003309]


Within-population genetic diversity is greatest within Africa, while between-population genetic diversity is directly proportional to geographic distance. The most divergent contemporary human populations include the click-speaking forager peoples of southern Africa, broadly defined as Khoesan. Both intra- (Bantu expansion) and inter-continental migration (European-driven colonization) have resulted in complex patterns of admixture between ancient geographically isolated Khoesan and more recently diverged populations. Using gender-specific analysis and almost 1 million autosomal markers, we determine the significance of estimated ancestral contributions that have shaped five contemporary southern African populations in a cohort of 103 individuals. Limited by lack of available data for homogenous Khoesan representation, we identify the Ju/’hoan (n = 19) as a distinct early diverging human lineage with little to no significant non-Khoesan contribution. In contrast to the Ju/’hoan, we identify ancient signatures of Khoesan and Bantu unions resulting in significant Khoesan- and Bantu-derived contributions to the Southern Bantu amaXhosa (n = 15) and Khoesan !Xun (n = 14), respectively. Our data further suggests that contemporary !Xun represent distinct Khoesan prehistories. Khoesan assimilation with European settlement at the most southern tip of Africa resulted in significant ancestral Khoesan contributions to the Coloured (n = 25) and Baster (n = 30) populations. The latter populations were further impacted by 170 years of East Indian slave trade and intra-continental migrations resulting in a complex pattern of genetic variation (admixture). The populations of southern Africa provide a unique opportunity to investigate the genomic variability from some of the oldest human lineages to the implications of complex admixture patterns including ancient and recently diverged human lineages.

The array of Khoesan populations sensu stricto analyzed in this study is much smaller than that of Schlebusch 2010, but this study has the advantage of including Cape Coloureds and their Baster relatives, partial descendants of the otherwise extinct pastoralist Khoekhoe (Hottentots, now considered a derogatory term) who lived in much of Southern Africa upon the arrival of Bantu and Europeans, as well as the amaXhosa, a Bantu people who clearly display marked Khoesan admixture.

Figure 1. Map of southern Africa
showing distribution of sampling per population identifier and
significant historical events that likely shaped ancestral

There is a brief mention of maternal and paternal DNA. Just to mention that mtDNA is mostly aboriginal (L0d/L0k) among the Khoesan (86-100%), the Coloureds (68%) and even the Xhosa (47%, all L0d), while aboriginal Y-DNA (essentially A2b and A2c2, plus occasional B2) is concentrated among the Ju/’hoan, with the !Xun being instead dominated by E1b1-M275, of putative East African (Nilotic?) origins. This is consistent with the !Xun being historically pastoralists. European patrilineages, notably R1b, are dominant among the Baster (92%) and Cape Coloured (71%).
Coloureds only make up some 9% of the South African population but they dominate the countryside in much of the former Cape Province. Namibian Basters are a subset of them who migrated northwards in 1868.

Figure 2. PCA and STRUCTURE analysis
We can see in the graphics above how the North Cape Coloured and Baster only display minor Bantu admixture, being essentially a variable mix of European and Khoesan ancestry, with probably also some Malay input (apparent in the increase of the blue component relative to the European reference). Instead, East Cape and Cape Town (D6) Coloured appear to have a greater proportion of Bantu ancestry and, especially the latter, a notable increase of the East Asian input.
The STRUCTURE graph, particularly at K=9, is also informative about other African populations but I won’t dwell on that here.
The authors also performed an interesting analysis using Ancestry Informative Markers with the !Xun and Xhosa:

Figure 4. Ju/’hoan-Yoruba ancestry informative markers (AIMs) defined ancestral contributions to the !Xun and amaXhosa, providing evidence for two distinct !Xun lineages with differing ancestral contributions.
It seems evident that much of the !Xun ancestry (up to 70%) does not fall in either category (Ju/’hoan or Yoruba) but is something else, probably specific to this people. The Xhosa Khoesan ancestry also seems closer to the pastoralist !Xun than to the (likely more genuinely ancient) Ju/’hoan.

There is some more info in the paper but I feel that the essentials are sufficiently covered here. 

Khoe-San matrilineages and prehistory

A most interesting study has just been published that reconstructs the prehistory of the Khoe-San peoples of Southern Africa primarily using mitochondrial DNA analysis but with very important reliance on archaeological data as well.
Carina M. Schlebusch et al., MtDNA control region variation affirms diversity and deep sub-structure in populations from Southern Africa. BMC Evolutionary Biology 2013. Open access → LINK [doi:10.1186/1471-2148-13-56]

Abstract (provisional)


The current San and Khoe populations are remnant groups of a much larger and widely dispersed population of hunter-gatherers and pastoralists, who had exclusive occupation of southern Africa before the influx of Bantu-speakers from 2 ka (ka = kilo annum [thousand years] old/ago) and sea-borne immigrants within the last 350 years. Here we use mitochondrial DNA (mtDNA) to examine the population structure of various San and Khoe groups, including seven different Khoe-San groups (Ju/’hoansi, !Xun, /Gui+//Gana, Khwe, =Khomani, Nama and Karretjie People), three different Coloured groups and seven other comparative groups. MtDNA hyper variable segments I and II (HVS I and HVS II) together with selected mtDNA coding region SNPs were used to assign 538 individuals to 18 haplogroups encompassing 245 unique haplotypes. Data were further analyzed to assess haplogroup histories and the genetic affinities of the various San, Khoe and Coloured populations. Where possible, we tentatively contextualize the genetic trends through time against key trends known from the archaeological record.


The most striking observation from this study was the high frequencies of the oldest mtDNA haplogroups (L0d and L0k) that can be traced back in time to ~100 ka, found at high frequencies in Khoe-San and sampled Coloured groups. Furthermore, the L0d/k sub-haplogroups were differentially distributed in the different Khoe-San and Coloured groups and had different signals of expansion, which suggested different associated demographic histories. When populations were compared to each other, San groups from the northern parts of southern Africa (Ju speaking: !Xun, Ju/’hoansi and Khoe-speaking: /Gui+//Gana) grouped together and southern groups (historically Tuu speaking: =Khomani and Karretjie People and some Coloured groups) grouped together. The Khoe group (Nama) clustered with the southern Khoe-San and Coloured groups. The Khwe mtDNA profile was very different from other Khoe-San groups with high proportions of Bantu-speaking admixture but also unique distributions of other mtDNA lineages.


On the whole, the research reported here presented new insights into the multifaceted demographic history that shaped the existing genetic landscape of the Khoe-San and Coloured populations of southern Africa.

From my reading of the paper, I gather the following chronology (which should always be taken with some caution because of the uncertainty of “molecular clock” methods, although in this case the estimates seem reasonably backed by the material/cultural record):
  1. L0d coalescence time estimate may correlate with the arrival of MSA to the region c. 100 Ka ago (I estimated once ~90 Ka, so it is consistent with my thought).
  2. Its sublineage L0d1’2 might have expanded c. 50 Ka ago (I would rather think of a more ancient chronology, soon after the L0d node – they can’t correlate it properly with any obvious archaeological pattern, so it might be, I guess, more related to the apogee of MSA c. 75 Ka ago).
  3. Some L0d1 subclades (notably L0d1a, L0d1b) would have expanded with the transition to LSA (40-20 Ka ago).
  4. L0d2a shows a star-like expansion that they estimate to have happened c. 7-8 Ka ago and would be related to an Epipaleolithic (with microlithic industry) that is also notable for the increase in the density of archaeological findings in South Africa and Lesotho. This lineage also shows a secondary expansion with pastoralism later on.
  5. The introduction of herding c. 2000 years ago may have affected the correlations between the various L0d lineages. However, most lineages show signs of expansion in this period. The main exception is L0d1a (decrease instead) and to some extent L0d1c (first a decrease, later an increase probably related to the late adoption of pastoralism by the !Xun, especially affecting L0d1c1).
    1. L0d3 is too old to have expanded with pastoralism, so the authors reject Tatiana Karafet’s hypothesis that it expanded in this period and that it could be related to a (most unlikely) linguistic relationship between Sandawe and Khoe-San. Instead they suggest (as I did in the past) that L0d3 had an East African distribution, with only minor spread to the Khoe-San in relation to pastoralism.
  6. The recent Iron Age (last millennium) arrival of Bantu-speakers absorbed primarily L0d2a, which is the most common lineage of Khoe-San peoples (including Coloureds, with the partial exception of Cape Coloured, where it is second to L0d2b).
The paper only briefly mentions L0k1, which is most concentrated towards Katanga (D.R. Congo) and may therefore have arrived in Southern Africa only with Bantu or pastoralist flows.
Frequencies and estimated timelines of major Southern African L0d and L0k lineages (from fig. 4):

Khoesan genetics helping to understand the evolutionary history of Humankind as a whole

A reader sent me a copy of this letter or short paper on South African autosomal genetics:
Carina M. Schlebusch et al., Genomic Variation in Seven Khoe-San Groups Reveals Adaptation and Complex African History. Science 2012. Pay per view ··> LINK [doi:10.1126/science.1227721]
[Note-update (Oct 2): the supplemental material is free and very very extensive: a must read for genetic data-miners and all those interested in getting deeper and more extensive info, even if just on the ethno-historical background of the populations considered in the study, something that most people, including myself, only know rather shallowly].

The paper has several points of interest but is especially useful, when complemented by previous studies like Pickrell 2012, for better understanding the aboriginal and modern genetics of Southern Africa, which it analyzes, for example, through principal component (and other) analyses relative to geography.

Fig. 1.

(A) Sampling locations.
(B) Principal components analysis (PCA) of African individuals showing PC1 and PC2 rotated to fit geography.
(C) PCA for Khoe-San populations (∼ 2.3M SNPs).
(D) Pairwise FST for sub-Saharan populations (excluding Hadza, see fig. S24 for comparison)
(E) Prediction of the genetic components from geographic, linguistic and subsistence covariates. The predictive error relative to geography is given for each combination of covariates (values < 1 show improved predictive capacity compared to geography).
There is also an Admixture analysis with an estimated divergence tree that is off in chronology by about 100% or even more. When will geneticists learn to calibrate their “molecular clock” speculations on archaeology? When?!
Here you have it, annotated by me (in red):
Fig. 2.
(A) Rooted population topology from a concordance test approach (14). Nodes with bootstrap support < 50% are collapsed (dashed lines), all other nodes have bootstrap support > 85%.
[Annotations in red by Maju]
(B) Clustering of 403 sub-Saharan African individuals (∼ 270k SNPs), assuming 2 to 11 clusters.
(C) Clustering of 118 southern African individuals (∼ 2.3M SNPs), assuming 2 to 8 clusters. Compare with fig. S16 that include recently admixed individuals.

Additionally, the authors think that they have located a number of key genes that appear to have been selected for among some Khoesan groups and/or diversified around the time of the first human split c. 100-200 millennia ago, such as:
  • MYPN (myopalladin) – associated with muscle growth and function
  • ACTN3 – associated with “fast twitching” muscles and elite athletic performance
  • MHC – major histocompatibility complex
  • PRSS16 and POM121L2 – thought to protect against infectious diseases
  • ERCC4 regulators – related to pigmentation
  • ROR2 – involved in regulating bone and cartilage development
Also, the following regions appear to have undergone intense selective pressures among early Homo sapiens in general, always according to the authors:
  • SPTLC1 – involved in hereditary sensory neuropathy
  • SULF2 – that regulates cartilage development
  • RUNX2 – related to morphological differences with other Homo species, notably Neanderthals (frontal bossing, clavicle morphology, bell-shaped rib cage, and regulating the closure of the fontanel which is crucial for brain expansion)
  • SDCCAG8 – involved in microcephaly
  • LRAT – associated with Alzheimer’s disease 


Thus, three of the top five regions contain genes involved in skeletal development, and syndromes associated with mutations in these genes display similar morphological features.

While also:

Including SULF2, three of the top five candidate regions are thus associated with neuronal function.

All this falls within expectations, I’d say, but it is nevertheless most interesting to know in such detail and precision.

Khoisan autosomal genetics

There is a new major paper at arXiv on Southern African autosomal genetics, with emphasis on pre-Bantu aboriginal peoples (usually known as Khoisan, even if the phylogenetic unity of their languages is no longer accepted).
Joseph K. Pickrell et al., The genetic prehistory of southern Africa. arXiv 2012. Open access ··> LINK.

The hunter-gatherer populations of southern and eastern Africa are known to harbor some of the most ancient human lineages, but their historical relationships are poorly understood. We report data from 22 populations analyzed at over half a million single nucleotide polymorphisms (SNPs), using a genome-wide array designed for studies of history. The southern Africans-here called Khoisan-fall into two groups, loosely corresponding to the northwestern and southeastern Kalahari, which we show separated within the last 30,000 years. All individuals derive at least a few percent of their genomes from admixture with non-Khoisan populations that began 1,200 years ago. In addition, the Hadza, an east African hunter-gatherer population that speaks a language with click consonants, derive about a quarter of their ancestry from admixture with a population related to the Khoisan, implying an ancient genetic link between southern and eastern Africa.  

Unlike in most other African (or global) genetic studies, here the Southern African natives are not undersampled, so a more realistic genetic structure emerges, with Khoisan peoples occupying their distinct position in the overall human structure.

Selected images (from the supplementary material):

Locator map and legend for all PCA graphs:

Color code is as follows:

  • Dark grey: non-Khoisan Africans (incl. Hadza)
  • Blue: Khoe-Kwadi
  • Green: Kx’a
  • Red: Tuu
  • Light grey: Eurasians

Global PCA (SF 2):

PC1 sets the Khoisan (especially the Kx’a and the Tuu) apart from Eurasians, while PC2 defines the non-Khoisan African dimension (including some Khoisan, especially the Damara).
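
As a reminder of what such axes mean, here is a minimal sketch of PCA on a genotype matrix (individuals in rows, SNPs in columns, coded as 0, 1 or 2 copies of an allele). The tiny matrix is invented for illustration and has nothing to do with the paper's data; the function name `pca_scores` is hypothetical, and real analyses typically also scale each SNP and handle missing data.

```python
import numpy as np

def pca_scores(genotypes):
    """PCA of an individuals x SNPs matrix via SVD of the column-centered data."""
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)                 # center each SNP on its mean
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U * sv                        # coordinates of individuals on each PC
    explained = sv ** 2 / np.sum(sv ** 2)  # fraction of variance per PC
    return scores, explained

# Invented toy data: three individuals from one group, three from another.
genotypes = np.array([
    [2, 2, 0, 0, 1],
    [2, 1, 0, 0, 1],
    [2, 2, 0, 1, 1],
    [0, 0, 2, 2, 1],
    [0, 0, 2, 1, 1],
    [0, 1, 2, 2, 1],
])

scores, explained = pca_scores(genotypes)
print(np.round(scores[:, :2], 2))
print("variance explained by PC1, PC2:", np.round(explained[:2], 2))
```

In this toy case the first three individuals land on one side of PC1 and the last three on the other, because PC1 captures the allele frequency differences that best separate the two groups. Which contrast takes PC1 or PC2 in a real dataset depends heavily on which populations are sampled and in what numbers.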

Africa-only PCA (PC1-2) (SF 3):

With Eurasians excluded, PC1 is taken by the contrast between Khoisan and other Africans, while PC2 contrasts especially the Ju|’hoan with the Nama.

Africa-only PCA (PC2-3 with the inclusion of the ǂKhomani) (SF 17):

+ ǂKhomani samples (excluded in previous graphs)

Interestingly enough, here PC2 is taken over by the Damara and the Himba contrasting with other Africans, especially the Dinka. Meanwhile PC3 is monopolized by the contrast between the Ju|’hoan and the ǂHoan.

Admixture clustering (SF 7):

As in most previous PC analyses (except African PCA 2-3), the most neatly distinct Khoisan populations are the Kx’a (green cluster) and Tuu (purple one). At K=6, two other clusters are taken by the two distinct Pygmy populations, while the fifth one describes Eurasians (or West Eurasians). The remaining, indistinct African ancestry is shown as the blue component, but in fact it seems to hide important substructure.

Arguably K=5 may be a better description, or at least one we are more familiar with: differentiating between Khoisans (green), Pygmies (orange), Hadza (purple), other Africans (blue) and Eurasians (red). But something of the order of K=13 or 16 is probably a better description in any case for such a varied bunch.

Special thanks to Millán.