Category Archives: bad science

Basque language: a criticism of Joseba Lakarra

Important correction: I got totally confused about the authorship of this paper. Neither Frank nor Alonso is its author, but they do have another paper [es] that also criticizes Lakarra’s conjectures, which I will have to discuss in another entry to make up for my error.

This study is actually anonymous, being the second of that kind, a clear signal of academic freedom being seriously curtailed in Basque philology in the Basque Country itself by this professor’s power networks. 

The original entry continues below, with the necessary corrections:

A group of anonymous linguists (they speak in the plural “we”) recently published a paper in which they criticize the excessive reliance of Basque language studies on the work of Prof. Joseba Lakarra, whose shadowy control of the Basque Academy on this matter is most worrying, notably since his key defamatory intervention against the extraordinary finds of Iruña-Veleia, which challenge to some extent the foundations of his work.

Sadly for many readers of this blog, the new study is published only in Spanish and Basque. In spite of that I feel the need to briefly discuss it here.
Anonymous authors, Joseba Lakarra a examen. Sobre el Diccionario Histórico Etimológico Vasco. Euskararen Jatorria 2013. Freely accessible: LINK 1 (Spanish), LINK 2 (Basque)
The paper begins with measured praise of Lakarra’s efforts to go beyond Mitxelena’s paradigms. However, the authors feel that he should also be much more self-critical and humble, and ready to back down when he is clearly wrong, which he is not. A key concern is that the Academy of Basque Language (Euskaltzaindia) and the University of the Basque Country are focused on a major work: the creation of an etymological dictionary, which will be founded almost exclusively on Lakarra’s work, and which could well be a total disaster and waste of resources if he is mostly wrong.
Naturally Lakarra is the director of the project himself. While a few other authors (Tovar, Trask) are cited in Lakarra’s magnum opus project, they are almost only mentioned in a negative manner. The result can therefore be foreseen as a monument to Lakarra’s own vanity.
Nothing new in fact, as Lakarra is infamous for citing almost exclusively his own works, often unpublished, which is not accepted as healthy academic practice anywhere… except in his own feudal domain, it seems. This problem of self-citation is discussed in section 4 of this paper.
The criticisms of Lakarra’s work can be synthesized following the structure of the study:
  1. The monosyllabic root theory of Lakarra is too daring. The available evidence does not support this in most cases.
  2. There is no process of critical revision. This makes Lakarra’s models mere hypotheses or conjectures, not proven theories. Larry Trask did not include a single root by Lakarra in his own etymological dictionary. Michel Morvan and J.B. Orpustan flatly rejected Lakarra’s ideas.
  3. All reconstructions are purely theoretical.
  4. Abusive self-citation, often of unpublished materials. Lakarra almost never cites authors other than himself.
  5. No systematization. Lakarra’s model has never been systematically described, something that the professor seems to prefer, as it allows him unlimited freedom in his ramblings.
  6. Frequent changes in the etymologies, revealing extreme insecurity and improvisation in Lakarra’s own thought.
  7. Abusive use of typological comparativism. Even though he systematically criticizes comparativism, because he only believes in internal reconstruction for the case of Basque, he constantly relies on grammatical comparison with other, unrelated languages.
  8. Incoherence with the reality of languages 3000 years ago. For Lakarra, Basque at that time had only the most rudimentary vocabulary and grammar, whereas what we know is that all languages were as complete then as they are today, and therefore (proto-)Basque must have been as well.
  9. Monosyllabic root theory has serious issues. Words like lur (earth, land, soil) are ancestrally monosyllabic for Lakarra, yet they are attested in bisyllabic forms like luur or luhur, suggesting that they are in fact shortenings of longer ancient words. There are many other such cases.
  10. It does not even consider dialectal variation. Lakarra invariably uses only the modern standard form (Euskara Batua), totally ignoring the well attested dialectal variation.
  11. It ignores Aquitanian toponymy. For example eihar for Lakarra derives from Lat. cremare, while it is attested as such []eihar in Aquitaine c. 87 CE.
  12. Some proposed evolutions are absolutely incredible. For example:
    *goi-bar (‘up-down’) > *gwibar > *bi-z-bar > bizkar (anat. back, geog. hill, mountain).
  13. Some etymologies suffer from serious anachronisms. For example, bazter (edge, corner, riverside; secondarily: field, land, place) is derived by Lakarra from Lat. praesaepe via Castilian Spanish pesebre and a claimed intermediate word presepre (actually unattested). Yet Sp. pesebre is first attested 130 years after Basque bazter. [I believe that bazter is actually present in an ancient Iberian text from Mula, Murcia, see note below].
  14. Breaches the principle of regularity when we consider Basque dialects.
  15. Ignores Basque culture. For example hogi (bread) is for Lakarra derived from hor (dog) and -gi (-gi/-ki, common for kinds of meat), meaning in his mind originally something like dog-meat. This is simply absurd… but so are so many things around this peculiar individual in his ivory tower.
  16. Sometimes misinterprets words. For example atseden (to rest, turn off, breathe, satisfy) is mistranslated by Lakarra as to die.
  17. Does not help at all with the reconstruction of Aquitanian onomastics. Nothing at all in Lakarra’s work helps the understanding of this key ancient reference of Basque studies.
  18. Risk of unitary or monolithic thought. Lakarra’s single-handed effective domination of Basque philology in the Western Basque Country has almost stopped independent research altogether. His followers limit themselves to commenting on his theories without daring to think independently, much less critically.
  19. Conclusions. Warning on the use of public funds for the vanity project of this man, who is no doubt fallible.


Note on bazter: in the Ibero-Ionian text on lead from El Cigarralejo (Mula, Murcia – pictured), in line #7 it reads:


Which I tentatively read in modern Basque as follows:
Zabal bazterrak bide denetik bezainelako; i.e. something like: such as the ample margins through the whole path. Uncertain particularly about the last word bezanelaz.

Other fragments of this piece, as well as of other Ibero-Ionian texts, also sound terribly Basque-like, although of course not identical. I once asked a friend from Ondarroa, a native speaker of Basque, for his opinion on this text and, laughing, he replied: not from Ondarru but maybe from Lekitto (Lekeitio: the nearby town, which has a distinct dialect).

‘Eurasian’ language macro-family or just another bluff?

Andrew (at his blog) leads me to this interesting criticism by Sally Thomason of the much fabled study about a supposed new language macro-family including the most unlikely Eurasian languages such as Dravidian, Indoeuropean and “Eskimo” (sic). 
The original paper by Mark Pagel et al. proposes that a small core of 23 words is “ultraconserved”, allowing them to base their hypothesis on those words alone (totally substandard even for the more generous mass-comparison approach).
When Thomason looks at the raw data she finds that of the 23 words, only 2 have consensual proto-words in Altaic, for example, all the rest having several alternatives, of which Pagel and co. cherry-picked this or that one with the sole criterion of the convenience for their speculation. 
Never mind that Altaic, as defined in that database of Starostian inspiration, includes Japonic and Koreanic, something nowadays essentially discarded. 
The attribute of ultraconservation, the foundation of the Pagel hypothesis, is also challenged by Thomason, who finds that only 6 or 7 words of the 23 are conserved from Proto-Indoeuropean into English, a very low rate considering that English vocabulary is overwhelmingly of Indoeuropean origin (be it Germanic, Old French or some other variant).
In other words and in French: rien de rien; nothing at all worth the media hype that the Pagel paper has achieved… in the short run.
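To put Thomason’s count in proportion, the retention rate can be worked out directly. This is only a minimal sketch using the figures quoted above (6 or 7 words out of 23); the function name is purely illustrative.

```python
def retention_rate(conserved: int, total: int) -> float:
    """Fraction of a word list that survives from the proto-language."""
    if total <= 0:
        raise ValueError("total must be positive")
    return conserved / total


TOTAL_WORDS = 23  # size of the "ultraconserved" list in Pagel et al.

for conserved in (6, 7):  # Thomason's count for Proto-Indoeuropean -> English
    rate = retention_rate(conserved, TOTAL_WORDS)
    print(f"{conserved}/{TOTAL_WORDS} = {rate:.0%}")
```

That is roughly a quarter to under a third of the list, even within a single well-studied family.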

Oppenheimer 2012: the scholastic ouroboros of repeating the usual ‘molecular clock’ errors

Last year Stephen Oppenheimer published yet another article on the mitochondrial DNA tree and his vision of the molecular clock applied to the human matrilineages.

Stephen Oppenheimer, Out-of-Africa, the peopling of continents and islands: tracing uniparental gene trees across the map. Philosophical Transactions of the Royal Society B, 2012. Freely accessible: LINK [doi:10.1098/rstb.2011.0306]
The centerpiece of the article is fig.2, a mtDNA tree with his “molecular clock” estimates of the ages of the haplogroups. Sadly it has a major problem: the resulting dates have a horrible fit with all the archaeological and paleoclimatic evidence and even with the most recent estimates for the Pan-Homo split. 
Much of the article (all of section 1.b) is devoted to attempting to justify his so-called “calibration” methods, which are in the end based on a self-reference: Soares 2009, of which Oppenheimer was co-author and which was calibrated assuming a Pan-Homo split age of 5-6 Ma.
In annoyingly pointless circular reasoning, Oppenheimer now manages to estimate the Pan-Homo split at 6.5 Ma using the Soares 2009 “molecular clock” rates.
All these Pan-Homo split age guesstimates are horribly wrong, because Sahelanthropus tchadiensis (c. 7 Ma ago) was already in the Homo line (and no longer in the Pan one) and also because several other authors have estimated the Pan-Homo divergence to be at least 8 Ma old, and maybe as ancient as 13 Ma (Langergraber 2012).
Sadly the Academy remains stuck and Oppenheimer is no exception but rather the opposite. This is his fig. 2 with my rough corrections in red after proper recalibration of the Pan-Homo split age:

This does not mean that the red colored dates provided here are necessarily the correct ones, although in many cases they do seem to fit much better with the archaeological and paleoclimatic data, especially at the lower ranges. It is merely a simple “first aid” correction to Oppenheimer’s necessarily incorrect estimates. 
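A “first aid” correction of this kind can be sketched as a plain proportional rescaling. The sketch assumes a strictly linear clock, so that every node age scales by the ratio between the new and the old Pan-Homo split age; that is a simplifying assumption, not a description of any published method. The split ages (6.5, 8 and 13 Ma) are the ones mentioned above, while the 60 Ka input is a made-up illustrative value, not a date taken from Oppenheimer’s tree.

```python
def rescale_age(age_ka: float, old_split_ma: float, new_split_ma: float) -> float:
    """Rescale a node age in proportion to a new calibration point.

    Assumes a linear clock: doubling the calibration age doubles every date.
    """
    if old_split_ma <= 0 or new_split_ma <= 0:
        raise ValueError("split ages must be positive")
    return age_ka * (new_split_ma / old_split_ma)


OLD_SPLIT_MA = 6.5       # Pan-Homo split estimated by Oppenheimer
EXAMPLE_AGE_KA = 60.0    # hypothetical haplogroup age, for illustration only

for new_split_ma in (8.0, 13.0):  # lower and upper alternatives cited above
    corrected = rescale_age(EXAMPLE_AGE_KA, OLD_SPLIT_MA, new_split_ma)
    print(f"split at {new_split_ma} Ma: {EXAMPLE_AGE_KA} Ka -> {corrected:.1f} Ka")
```

Under this assumption the correction factor ranges from about 1.23 (8 Ma) to exactly 2 (13 Ma), which is why the choice of calibration point matters so much.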
Other factors must be taken into account, for example I do not believe for a second that M is older than African L3 branches, which show only one or, in one case, two coding region mutations downstream of the L3 node, while M is three mutations downstream and N five. Oppenheimer seems determined to count HVS mutations for example and to estimate age counting from the present forms (which could well be frozen in time for many many millennia because of “drift out” phenomena if the population was large enough but not too large, which would tend to freeze the hegemonic lineages in my modeling tests, while removing any novel ones). 
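The “drift out” of novel lineages can be illustrated with a very simple simulation. This is not the modeling mentioned above, only a minimal sketch under textbook assumptions: a constant-size haploid population, neutral Wright-Fisher resampling, and a novel lineage that starts as a single copy. The function names, population size and number of trials are all arbitrary choices for illustration.

```python
import random


def novel_lineage_fate(pop_size: int, rng: random.Random,
                       max_gens: int = 100_000) -> str:
    """Follow one new lineage (a single copy) until it is lost or fixed.

    Each generation, every individual draws its mother at random from the
    previous generation (neutral Wright-Fisher resampling).
    """
    count = 1
    for _ in range(max_gens):
        if count == 0:
            return "lost"
        if count == pop_size:
            return "fixed"
        freq = count / pop_size
        # Number of daughters carrying the novel lineage in the next generation.
        count = sum(1 for _ in range(pop_size) if rng.random() < freq)
    return "unresolved"


def fraction_lost(pop_size: int, trials: int, seed: int = 1) -> float:
    """Share of independent novel lineages that end up drifting out."""
    rng = random.Random(seed)
    lost = sum(novel_lineage_fate(pop_size, rng) == "lost" for _ in range(trials))
    return lost / trials


if __name__ == "__main__":
    print(fraction_lost(pop_size=100, trials=300))
```

In this neutral setting a new lineage is expected to be lost with probability 1 - 1/N, so the vast majority of novelties disappear while the established lineages persist; real populations with changing size or structure may behave differently.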
I do not propose any alternative “molecular clock” for mtDNA because I feel that it poses way too many issues because of irregular branch length. Maybe in the future some brilliant geneticist (or maybe mathematician?) will be able to posit a reasonably good refurbished “molecular clock” for mtDNA but at the moment I know of no one. 
I’m just stating the obvious: what Oppenheimer is selling is necessarily wrong.

Posted on May 17, 2013 in bad science, molecular clock, mtDNA