Restoring “Nile-Nubian”: How to Balance Lexicostatistics and Etymology in Historical Research on Nubian Languages

Abstract: This paper offers a critical analysis of the proposal to dismantle the genetic unity of the so-called Nile-Nubian languages by positioning one of their former constituents, Nobiin, as the earliest offshoot from the Common Nubian stem. Combining straightforward lexicostatistical methodology with more scrupulous etymological analysis of the material, I argue that the evidence adduced for the hypothesis that Nobiin is the earliest offshoot may, and in fact should, instead be interpreted as evidence of a strong lexical substrate in Nobiin, which accounts for its accelerated rate of change compared to the closely related Kenuzi–Dongolawi (Mattokki–Andaandi) cluster.
Keywords: comparative linguistics, Nilo-Saharan, glottochronology, lexicostatistics, Nubian, West Nilotic


Although there has never been any serious disagreement on which languages constitute the Nubian family, its internal classification has been continuously refined and revised, due to such factors as the overall complexity of the processes of linguistic divergence and convergence in the “Sudanic” area of Africa; the constant influx of new data that forces scholars to reevaluate former assumptions; and the lack of scholarly agreement on what types of data provide the best arguments for language classification.

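Traditionally, four main units have been recognized within Nubian1:

  1. Nile-Nubian, comprising Nobiin and Kenuzi–Dongolawi (K/D);
  2. Hill (Kordofan) Nubian;
  3. Midob;
  4. Birgid.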

This is, for instance, the default classification model adopted in Joseph Greenbergʼs general classification of the languages of Africa,2 and for a long time it was accepted in almost every piece of research on the history of Nubian languages.

More recently, however, an important and challenging hypothesis for a reclassification of Nubian was advanced by Marianne Bechhaus-Gerst.3 Having conducted a detailed lexicostatistical study of a representative set of Nubian lects, she made the important observation that, while the percentage of shared matches between the two main components of Nile-Nubian is indeed very high (70%), Kenuzi–Dongolawi consistently shares a much higher percentage with the other three branches of Nubian than Nobiin does (Table 1).

|  | Midob | Birgid | Kadaru | Debri | Dilling | K/D |
|---|---|---|---|---|---|---|
| K/D | 54% | 48% | 58% | 57% | 58% |  |
| Nobiin | 40% | 37% | 43% | 41% | 43% | 70% |

Table 1. Part of the lexicostatistical matrix for Nubian4

In Bechhaus-Gerstʼs view, such a discrepancy could only be interpreted as evidence that Kenuzi–Dongolawi and Nobiin do not share an intermediate common “Nile-Nubian” ancestor (if they did share one, its modern descendants should be expected to have more or less the same percentages of matches with the other Nubian subgroups). Instead, she proposed independent lines of development for the two dialect clusters, positioning Nobiin not just as a separate branch of Nubian, but as the earliest branch to split off from Nubian. Consequently, in her historical scenario, described at length in two monographs, there were not one but two separate migrations into the Nile Valley from the original Nubian homeland (somewhere in South Kordofan/Darfur): one around 1500 BCE (the ancestors of modern Nobiin-speaking people), and one around the beginning of the Common Era (speakers of Kenuzi–Dongolawi). As for the multiple exclusive similarities between Nobiin and Kenuzi–Dongolawi, these were explained away as results of “intensive language contact.”5 The lexicostatistical evidence was further supported by an analysis of certain phonetic and grammatical peculiarities of Nobiin that separate it from Kenuzi–Dongolawi; however, to this day it is the lexical specificity of Nobiin that remains at the core of the argument.
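The expectation that two sister lects should show roughly equal percentages with any outgroup presupposes a broadly constant rate of lexical replacement in both lines. As a minimal illustration, assuming that premise, the following Python sketch subtracts the Nobiin figures from the K/D figures for each outgroup in Table 1; the function and variable names are hypothetical and serve only this example.

```python
# Match percentages from Table 1 (Bechhaus-Gerst's figures), keyed by outgroup.
KD = {"Midob": 54, "Birgid": 48, "Kadaru": 58, "Debri": 57, "Dilling": 58}
NOBIIN = {"Midob": 40, "Birgid": 37, "Kadaru": 43, "Debri": 41, "Dilling": 43}


def outgroup_gaps(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Per-outgroup difference in match percentage between two putative sisters.

    If both lects descend from one intermediate ancestor and replace vocabulary
    at similar rates, every gap should lie close to zero.
    """
    return {lang: a[lang] - b[lang] for lang in a}


gaps = outgroup_gaps(KD, NOBIIN)
mean_gap = sum(gaps.values()) / len(gaps)
print(gaps)      # {'Midob': 14, 'Birgid': 11, 'Kadaru': 15, 'Debri': 16, 'Dilling': 15}
print(mean_gap)  # 14.2
```

Every gap is positive and lies between 11 and 16 points; this systematic skew is the discrepancy that the two competing explanations discussed below account for in different ways.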

Bechhaus-Gerstʼs classificatory model, with its important implications not only for the history of the Nubian peoples but also for the theory and methodology of historical and areal linguistics in general, remains somewhat controversial. While it has been adopted in recent editions of such influential online language catalogs as Ethnologue and Glottolog, and is often cited as an important example of convergent linguistic processes in Africa,6 specialists in the field often remain undecided,7 and the most recent handbook on African linguistics concludes that “the internal classification of Nubian remains unclear.”8 One of the most vocal opponents of the new model is Claude Rilly, whose research on the reconstruction of Proto-Nubian (in conjunction with his work on the historical relations and genetic affiliation of Meroitic) and whose investigation of Bechhaus-Gerstʼs evidence have led him to endorse the Nile-Nubian hypothesis even more strongly than before.9

While in theory there is nothing impossible about the historical scenario suggested by Bechhaus-Gerst, in practice the idea seems rather far-fetched: it requires that language A, rather distantly related to language B, undergo such strong convergence over a period of approximately 1,000 years (from the supposed migration of Kenuzi–Dongolawi into the Nile Valley to the earliest Old Nubian texts, which already share most of the important features of modern Nobiin) that language A can easily be misclassified, even by specialists, as belonging to the same group as language B. At the very least, before adopting this scenario wholeheartedly, it would make perfect sense to look for alternative solutions that might offer a more satisfactory explanation for the odd deviations found in the data.

Let us look more closely again (Table 2) at the lexicostatistical evidence, reducing it, for the sake of clarity, to the percentages of matches observed in a “triangle” consisting of Kenuzi–Dongolawi, Nobiin, and one other Nubian language that is universally recognized as belonging to a very distinct subbranch of the family, namely Midob. The data come from the older study by Bechhaus-Gerst and from my own, more recent examination of the basic lexicon.10

|  | Nobiin | Midob |
|---|---|---|
| K/D | 70% | 54% |
| Nobiin |  | 40% |

Table 2a. Lexicostatistical relations between Nile-Nubian and Midob (Bechhaus-Gerst)11

|  | Nobiin | Midob |
|---|---|---|
| K/D | 66% | 57% |
| Nobiin |  | 51% |

Table 2b. Lexicostatistical relations between Nile-Nubian and Midob (Starostin)12

The significant differences between the two sets of lexicostatistical figures can be explained by a number of factors (slightly divergent Swadesh-type lists; different etymologies for several items on the list; the exclusion of transparent recent loans from Arabic in Starostinʼs model). Nevertheless, the obvious problem persists in the second model: Midob clearly shares a significantly larger number of cognates with K/D than with Nobiin, a fact that directly contradicts the close K/D–Nobiin relationship on the Nubian phylogenetic tree. The situation remains the same if we replace Midob with any other non-Nile-Nubian language, such as Birgid or any of the many Hill Nubian idioms.

The important thing is that there are actually two possible explanations for this discrepancy in the lexicostatistical matrix. One, endorsed by Bechhaus-Gerst, is that the K/D–Nobiin figure is artificially inflated by a large number of items that were not inherited from a common ancestor but actually borrowed from Nobiin into K/D. An alternative scenario, however, is that the active recipient was Nobiin, except that the donor was not K/D: rather, a certain percentage of the Nobiin basic lexicon could have been borrowed from a third, possibly unidentified source over a relatively short period of time, which would have lowered the percentage of Nobiin matches with all other Nubian languages.

Thus, for instance, if we assume (or, better still, somehow manage to prove) that Nobiin borrowed 6% of the Swadesh wordlist (i.e., six words on the 100-item list) from this third source, excluding these words from the lexicostatistical calculation would generally normalize the matrix, raising the percentages for the K/D–Nobiin and Nobiin–Midob pairs, but not for the K/D–Midob pair.
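The arithmetic behind this claim can be sketched in Python. The sketch rests on two simplifying assumptions not spelled out in the text: that the percentages in Table 2b correspond to counts out of a full 100-item list, and that the six hypothetical substrate words were non-cognate in every pair, so that excluding them shrinks only the denominator. The helper name `renormalize` is hypothetical.

```python
def renormalize(matches: int, list_size: int, excluded: int) -> float:
    """Match percentage after dropping `excluded` items from the comparison.

    Assumes the dropped items were non-cognate in the pair being compared,
    so only the denominator shrinks while the number of matches stays the same.
    """
    return 100 * matches / (list_size - excluded)


LIST_SIZE = 100  # 100-item Swadesh list
BORROWED = 6     # hypothetical number of substrate words in Nobiin

# Cognate counts read off Table 2b, assuming one valid entry per list slot.
kd_nobiin = renormalize(66, LIST_SIZE, BORROWED)
nobiin_midob = renormalize(51, LIST_SIZE, BORROWED)
kd_midob = renormalize(57, LIST_SIZE, 0)  # the Nobiin substrate words do not affect this pair

print(round(kd_nobiin, 1), round(nobiin_midob, 1), round(kd_midob, 1))  # 70.2 54.3 57.0
```

Under these assumptions the Nobiin–Midob figure rises from 51% to about 54%, moving closer to the unchanged K/D–Midob figure of 57%, while K/D–Nobiin rises to about 70%; the sketch shows only the direction and rough size of the effect described in the text.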

The tricky part in investigating this situation is determining the status of those Nobiin words on the Swadesh list that it does not share with K/D. If the phylogenetic structure of the entire Nubian group is such that Nobiin represents the very first branch to split off from the main body of the tree, as in Bechhaus-Gerstʼs model (fig. 1), then we would expect a certain portion of the Swadesh wordlist in Nobiin to be represented by the following two groups of words:

  1. retentions: Proto-Nubian words preserved in Nobiin but lost in all the other branches;
  2. innovations: words that replaced the Proto-Nubian equivalents in Nobiin alone, after its separation.


Fig. 1. The revised classification of Nubian according to Bechhaus-Gerst

Indeed, Nobiin has a large share of basic words that set it apart from every other Nubian language (see the more than thirty items in § III of the list below), but how can we distinguish retentions from innovations? If the word in question has no etymological cognates in any other Nubian language, such a distinction is in most cases impossible.13 However, if the retention or innovation in question did not involve the total elimination of the root morpheme but rather a semantic shift, then investigating the situation from an etymological point of view may shed significant light on the matter. In general, the more lexicostatistical discrepancies we find between Nobiin and the rest of Nubian in which the Nobiin item has a Common Nubian etymology, the better the case for the “early separation of Nobiin” hypothesis; the more “strange” words we find in Nobiin whose etymological parallels in the other Nubian languages are highly questionable or non-existent, the stronger the case for the “pre-Nobiin substrate” hypothesis.
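The decision procedure described here, together with the grouping applied to the Swadesh list below, can be expressed as a small Python function. This is a hypothetical illustration, not a tool used in the study: the boolean fields are simplifications of what are, in practice, etymological judgments, and both the split between § II.1 and § II.2 and the distinction between retention and innovation within § III.1 still have to be made by hand.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SwadeshItem:
    """One Nobiin entry on the 100-item list (hypothetical record layout)."""
    meaning: str
    cognate_with_kd: bool      # Nobiin word matches the K/D word for this meaning
    cognate_elsewhere: bool    # matches the word for this meaning in Midob, Birgid or Hill Nubian
    nubian_etymology: bool     # root has cognates elsewhere in Nubian, in any meaning
    recent_loan: bool = False  # transparent recent borrowing, e.g. from Arabic


def classify(item: SwadeshItem) -> str:
    """Assign an item to group I.1, I.2, II, III.1, III.2 or III.3."""
    if item.cognate_with_kd:
        return "I.1" if item.cognate_elsewhere else "I.2"
    if item.cognate_elsewhere:
        # K/D innovation (II.1) vs. protolanguage synonymy (II.2) is decided by hand.
        return "II"
    if item.recent_loan:
        return "III.3"
    return "III.1" if item.nubian_etymology else "III.2"


def tally(items: list[SwadeshItem]) -> Counter:
    """Count items per group; III.1 vs. III.2 is the comparison the argument turns on."""
    return Counter(classify(item) for item in items)
```

For instance, Nobiin “blood,” which the analysis below traces to a Common Nubian word for “fat,” would be encoded as `SwadeshItem("blood", cognate_with_kd=False, cognate_elsewhere=False, nubian_etymology=True)` and fall into § III.1.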

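In order to resolve this issue, I offer below a concise etymological analysis of the entire 100-item Swadesh wordlist for modern Nobiin.14 The lexical items are classified into three groups:

  1. items that Nobiin shares with Kenuzi–Dongolawi (§ I), whether or not they are also found in other branches of Nubian;
  2. items that Nobiin shares with other branches of Nubian but not with K/D (§ II);
  3. items found only in Nobiin (§ III), with or without a Nubian etymology, including recent borrowings.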

100-Item Swadesh List for Nubian: The Data

I. Nobiin/Kenuzi–Dongolawi Isoglosses

I.1. General Nubian Isoglosses

I.2. Exclusive Nile-Nubian Isoglosses

II. Nobiin/Non-K/D Isoglosses

II.1. Potential K/D innovations

II.2. Potential Synonymy in the Protolanguage

III. Nobiin-exclusive Items

III.1. Nobiin-exclusive Items with a Nubian Etymology

III.2. Nobiin-exclusive Items without a Nubian Etymology

III.3. Nobiin-exclusive Recent Borrowings

Analysis of the Data

Based on the presented data and the etymological discussion accompanying (or not accompanying) individual pieces of it, the following observations can be made:

  1. Altogether, § III.2 contains twenty items that are not only lexicostatistically unique to Nobiin, but also appear to have no etymological cognates whatsoever in any other Nubian language. This observation is certainly not conclusive, since it cannot be guaranteed that no parallels were missed in analyzing existing dictionaries and wordlists, or that more extensive lexicographical research on such languages as Midob or Hill Nubian will not turn up additional parallels in the future. At present, however, it is an objective fact that the percentage of such words in the Nobiin basic lexicon significantly exceeds the corresponding percentage for any other Nubian language (even Midob, which, by general consensus, is one of the most divergent branches of Nubian). Most of these words are already attested in Old Nubian (ON), which is hardly surprising, since the majority of recent borrowings into Nobiin have come from Arabic and are quite transparent as to their origin (see § III.3).
  2. Analysis of § III.1 shows that, in the majority of cases where the unique lexicostatistical item in Nobiin does have a Common Nubian etymology, semantic comparison speaks strongly in favor of innovation, i.e., a semantic shift in Nobiin: “blood” ← “fat,” “hear” ← “ear,” “meat” ← “inside,” “say” ← “tell,” “swim” ← “be on the surface,” “tree” ← “jujube”; a few of these cases may be debatable, but the overall tendency is clear. This observation in itself does not contradict the possibility of an early separation of Nobiin, but the near-total absence in this group of words identifiable as reflexes of the Proto-Nubian Swadesh equivalents for the respective meanings clearly speaks against that historical scenario.
  3. It is worth mentioning that the number of isoglosses that Nobiin shares with other branches of Nubian to the exclusion of K/D (§ II.1) is extremely small, especially compared to the number of exclusive Nile-Nubian isoglosses between Nobiin and K/D. However, this observation neither contradicts nor supports the early separation hypothesis (since we are not assuming that Nobiin should be grouped together with Birgid, Midob, or Hill Nubian).


Based on this brief analysis, I suggest that rejecting the Nile-Nubian hypothesis in favor of the alternative historical scenario proposed by Bechhaus-Gerst is inadvisable, since that scenario runs into no fewer than two independent historical anomalies:

  1. assumption of a huge number of basic lexical borrowings from Kenuzi–Dongolawi into Nobiin (even including such elements as demonstrative and interrogative pronouns, typically resistant to borrowing);
  2. assumption of total loss of numerous Proto-Nubian basic lexical roots in all branches of Nubian except for Nobiin (19–21 possible items in § III.2). Such conservatism would be highly suspicious; it is also directly contradicted by a few examples such as “water” (q.v.), which clearly indicate that Nobiin is innovative rather than conservative.

By contrast, the scenario that keeps Nobiin within Nile-Nubian but postulates the existence of a “pre-Nobiin” substrate or adstrate assumes only one historical oddity, similar to (1) above: the (presumably rapid) replacement of a large chunk of the Nobiin basic lexicon by words borrowed from an unknown substrate. However, it must be noted that the majority of words in § III.2 are nouns rather than verbs or pronouns, which makes the idea of massive borrowing more plausible than in the case of the presumed borrowings from K/D into Nobiin.30

This conclusion is in complete agreement with the tentative identification of a “pre-Nile-Nubian substrate” in Nobiin by Claude Rilly,31 who, based on a general distributional analysis of the Nubian lexicon, claims to identify no fewer than fifty-one Nobiin lexical items derived from that substrate, most of them belonging to the basic lexicon. It remains to be ascertained whether all of Rillyʼs fifty-one items are truly unique to Nobiin (as mentioned above, some of these Nobiin isolates might eventually turn out to be retentions from Proto-Nubian if future data on Hill Nubian and Midob happen to contain etymological parallels), but the fact that Rilly and the author of this paper arrived at the same conclusion independently of each other, by somewhat different methods, is reassuring.

If the Nile-Nubian branch is to be reinstated, and the specific features of Nobiin are to be explained by the influence of a substrate that did not affect its closest relative (K/D), this leaves us with two issues to be resolved — (a) chronology (and geography) of linguistic events, and (b) the genetic affiliation of the “pre-Nile-Nubian substrate” in question.

The chronological aspect has previously been discussed in glottochronological terms.32 In both of these sources, applying the glottochronological method introduced by Morris Swadesh and later recalibrated by Sergei Starostin yielded the following classification and datings (fig. 2):


Fig. 2. Phylogenetic tree for the Nubian languages with glottochronological datings (generated by the StarlingNJ method)33

If we take the glottochronological figures at face value, they imply that Proto-Nile-Nubian separated from the rest of Nubian around three to three and a half thousand years ago, followed by a split between the ancestors of modern Nobiin and K/D around two to two and a half thousand years ago. Interestingly enough, these events correlate chronologically with the two main events in the history of the Nile-Nubian languages according to Bechhaus-Gerst, though not quite in the way that she envisions them: her “early separation of Nobiin” becomes the early separation of the common ancestor of Nobiin and K/D, and her “later separation of K/D” becomes the final split between Nobiin and K/D. The interaction between Nobiin and the mysterious “pre-Nile-Nubian substrate” must therefore have taken place sometime in the 1st millennium CE (after the split from K/D but before the appearance of the first written texts in Old Nubian). Nevertheless, at this point I would like to refrain from drawing any definitive conclusions about probable dates and migration routes, given the possibility of alternative glottochronological models.
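For readers unfamiliar with the method, the basic logic of glottochronology can be illustrated with the classic Swadesh–Lees formula, t = ln C / (2 ln r), where C is the proportion of shared cognates and r is the retention rate per millennium (conventionally 0.86 for the 100-item list). This is only an illustration of the general principle, not the recalibrated Starostin formula behind fig. 2, which differs from it in several respects (among them the exclusion of borrowings, as in Table 2b); the function name below is hypothetical.

```python
import math

RETENTION_100 = 0.86  # conventional Swadesh retention rate per millennium, 100-item list


def swadesh_lees_depth(shared: float, retention: float = RETENTION_100) -> float:
    """Classic glottochronological time depth, in millennia, since two lects split.

    shared: proportion of cognates on the list (0 < shared <= 1).
    Both lineages are assumed to lose vocabulary at the same constant rate,
    hence the factor 2 in the denominator.
    """
    if not 0 < shared <= 1:
        raise ValueError("shared must be a proportion in (0, 1]")
    return math.log(shared) / (2 * math.log(retention))


print(round(swadesh_lees_depth(0.66), 2))  # 1.38, for the K/D-Nobiin figure in Table 2b
```

Applied to the 66% K/D–Nobiin figure, the classic formula gives about 1.4 millennia, considerably less than the two to two and a half millennia shown in fig. 2; the difference is a reminder of how strongly such datings depend on the model chosen.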

The other issue, the linguistic identification of the “pre-Nile-Nubian substrate,” is even more interesting, since its importance goes far beyond Nubian history and its successful resolution may have direct implications for reconstructing the linguistic history of Africa in general. Unfortunately, at present one can only speculate about what that substrate might have been, or even about whether it is reasonable to speak of a single substrate rather than a variety of idioms that may have influenced the early independent development of Nobiin.

Thus, Rilly, having analyzed lexical (sound + meaning) similarities between his fifty-one “pre-Nile-Nubian substrate” elements and other languages spoken in the region today or in antiquity, concluded that the substrate in question may have contained two layers: one related to ancient Meroitic, and another coming from the same Northern branch of Eastern Sudanic to which Nubian itself is claimed to belong.34 An interesting example of the former is the resemblance between ON mašal “sun” and Meroitic ms “sun, sun god,” while the latter may be illustrated by Nobiin šìgír-tí “hair” = Tama sìgít id. However, few of Rillyʼs other parallels are equally convincing: most of them show either significant phonetic (e.g., Nobiin súː vs. Nara sàː “milk”) or significant semantic (e.g., Nobiin nóːg “house” vs. Nara lòg “earth”) discrepancies, which one would hardly expect from contacts that took place no earlier than two thousand years ago. Subsequent research has not managed to alleviate this problem: cf., e.g., the attempt to derive Nobiin nùlù “white” from Proto-Northeast Sudanic *ŋesil “tooth,”35 which is unconvincing because of multiple phonetic and semantic issues at once.

In Языки Африки (“Languages of Africa”), an alternative hypothesis was put forward, expanding upon an earlier observation by Robin Thelwall,36 who, while conducting his own lexicostatistical comparison of Nubian with other potential branches of East Sudanic, was the first to notice some specific correlations between Nobiin and Dinka (West Nilotic). Going through the Nobiin data in § III.2 yields several phonetically and semantically close matches with West Nilotic, such as:

Additionally, Nobiin múg “dog” is similar to East Nilotic *-ŋɔk-37 and Kalenjin *ŋoːk,38 assuming the possibility of assimilation (*ŋ- > m- before a following labial vowel in Nobiin). These parallels, although still sparse, constitute by far the largest single group of matches between the “pre-Nile-Nubian substrate” and a single linguistic family (Nilotic), which makes this a promising line of future research, although they neither conclusively prove the Nilotic nature of this substrate nor rule out the possibility of several substrate layers of different affiliation.

In any case, the main point of this paper is not so much to shed light on the origin of substrate elements in Nobiin as to show that pure lexicostatistics, when applied to complex cases of language relationship, may reveal anomalies that can only be resolved through careful etymological analysis of the accumulated evidence. Advanced character-based phylogenetic methods may well offer additional insight into this problem, but ultimately it has to be resolved by a manual search for cognates, without neglecting the statistical grounding of the conclusions.

In this particular case, I believe that the evidence speaks strongly in favor of reinstating the Nile-Nubian clade comprising both Nobiin and Kenuzi–Dongolawi, although it must be kept in mind that a common linguistic ancestor and a common ethnic ancestor are not necessarily the same thing (e.g., the linguistic conclusion does not at all exclude the possibility that early speakers of Kenuzi–Dongolawi did shift to Proto-Nile-Nubian from some other language — not necessarily Nubian in origin itself).



