These are questions about Central Eurasian linguistics that come up from time to time (hence their inclusion in my FAQ section).

Why Kazakh ‹у› should not be transcribed as ‹u›

The character ‹u› usually represents a vowel in the Latin alphabet. In Kazakh, ‹у› may sound like a vowel, but in fact represents three separate sounds / sound combinations: /əw/, /ɘw/, and /w/. Consider the following forms, where /w/ is added as a suffix, and the letter ‹у› is used to represent various sound combinations:

  • Verbs with stems ending in a non-high vowel:
    • Verb stem "қала-" (қала-ған) → "қалау" (/qɑlɑ+w/ = qalaw)
    • Verb stem "сөйле-" (сөйле-ді-м) → "сөйлеу" (/sy̯ʉjli̯ɘ+w/ = söylew)
  • Verbs with stems ending in a high vowel:
    • Verb stem "оқы-" (оқы-ды-м) → "оқу" (оқыу /u̯ʊkə+w/ = oqıw)
    • Verb stem "ері-" (ері-ген) → "еру" (еріу /i̯ɘrɘ+w/ = eriw)
  • Verbs with stems ending in a consonant:
    • Verb stem "қал-" (қал-ды-м) → "қалу" (/qɑl+w/ → [qɑləw] = qalıw)
    • Verb stem "кел-" (кел-ді-м) → "келу" (/ki̯ɘl+w/ → [ki̯ɘlɘw] = keliw)

In all cases, the /w/ sound is the last part of the sound represented by ‹у›. For stems ending in non-high vowels, ‹у› represents just /w/. For stems ending in high vowels (/ə/ ‹ы› and /ɘ/ ‹і›), the ‹у› represents the combination of those and /w/. The same is true of verbs whose stems end in consonants, except here an epenthetic high vowel is inserted.
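
The three cases above can be sketched as a small Python function. This is only an illustration under my own assumptions: the function name `infinitive` is made up for this example, the vowel classes cover only the Kazakh vowel letters, and a stem with no vowel at all falls back to the back-vowel value.

```python
# Kazakh vowel letters, grouped by harmony class (Cyrillic characters).
FRONT = set("әөүіе")
BACK = set("аоұы")
HIGH = set("ыі")  # the two high vowels that merge into ‹у›


def infinitive(stem):
    """Return (Cyrillic spelling, Latin value of the final ‹у›) for a verb stem."""
    last = stem[-1]
    if last in HIGH:
        # High-vowel stem: the vowel and /w/ are written together as one ‹у›.
        return stem[:-1] + "у", ("ıw" if last == "ы" else "iw")
    if last in FRONT or last in BACK:
        # Non-high-vowel stem: ‹у› is just /w/.
        return stem + "у", "w"
    # Consonant stem: an epenthetic high vowel is inserted before /w/,
    # agreeing in frontness with the nearest vowel of the stem.
    for ch in reversed(stem):
        if ch in FRONT:
            return stem + "у", "iw"
        if ch in BACK:
            return stem + "у", "ıw"
    return stem + "у", "ıw"  # assumption: default to back if no vowel found


for stem in ["қала", "сөйле", "оқы", "ері", "қал", "кел"]:
    print(stem, infinitive(stem))
```

The point of returning the Latin value alongside the spelling is that the same written ‹у› comes out as three different things depending on the stem.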

Now consider what happens when the "-ны" -/NI/ suffix is added to noun stems ending in ‹у›:

  • "тау"+"ны" → "тауды" /tɑw+NI/ → [tɑwdə] = tawdı
  • "бу"+"ны" → "буды" /bəw+NI/ → [bəwdə] = bıwdı
  • (and the above examples: "қалауды", "сөйлеуді", "оқуды", "еруді")

As opposed to what happens when added to noun stems actually ending in vowels:

  • "алма"+"ны" → "алманы" /ɑlmɑ+NI/ → [ɑlmɑnə] = almanı
  • "малшы"+"ны" → "малшыны" /mɑlʃə+NI/ → [mɑlʃənə] = malşını

In all the cases of nouns ending in ‹у›, the original ‹n›-initial suffix starts with ‹d›. This happens only after consonants in Kazakh—after vowels it retains the /n/. This shows that ‹у› (or, more accurately, the last sound in the various sounds represented by ‹у›) is in fact a consonant and not a vowel.
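
The diagnostic used above can be written down as a tiny Python check. It is a sketch, not a morphological analyser: the function name `final_segment_class` is hypothetical, and it only knows the two suffix shapes discussed here (initial ‹н› and initial ‹д›), reporting anything else as "unknown".

```python
def final_segment_class(stem, suffixed):
    """Classify the last sound of a noun stem from the shape of the -/NI/ suffix.

    Only the two shapes discussed in the text are handled: a suffix
    starting with ‹н› (stem ends in a vowel) and one starting with ‹д›
    (stem ends in a consonant).
    """
    if not suffixed.startswith(stem) or len(suffixed) == len(stem):
        raise ValueError("suffixed form must be the stem plus a suffix")
    first = suffixed[len(stem)]
    if first == "н":
        return "vowel"
    if first == "д":
        return "consonant"
    return "unknown"


pairs = [
    ("тау", "тауды"),
    ("бу", "буды"),
    ("қалау", "қалауды"),
    ("алма", "алманы"),
    ("малшы", "малшыны"),
]
for stem, form in pairs:
    print(stem, "->", final_segment_class(stem, form))
```

Every stem written with final ‹у› comes out as "consonant", while the stems in ‹а› and ‹ы› come out as "vowel".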

Since the sound represented by ‹у› in Kazakh always is or ends with a consonant, ‹u› is a poor choice for representing it, and ‹w› is preferable. However, ‹w› alone should not be used as a blanket conversion of Kazakh ‹у›, since ‹у› also represents the sounds /ɘw/ and /əw/; these should be transliterated as something like ‹iw› and ‹ıw›. In any case, the sound represented by ‹ұ› (/ʊ/) in Kazakh is generally represented by ‹u› in the Roman orthography, so using ‹w› for [the last sound in] ‹у› additionally avoids potential ambiguity and confusion.

Note: ‹ю› is just a combination of ‹й› and ‹у›, so any word with ‹ю› should be transcribed accordingly; e.g. тию ‹tiyiw›, аю ‹ayıw›, etc.

Note: When ‹у› is used word-initially, it represents the consonant /w/, and so should be transcribed as just ‹w›; e.g. уақыт ‹waqıt›, уәде ‹wäde›, etc.
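
For anyone who wants to automate this, here is a minimal Python sketch of the rules for ‹у› described above. It rests on several assumptions of mine: the name `transcribe_u` is made up, the letter table contains only the letters that occur in the examples on this page (anything else passes through unchanged), ‹ю› and ‹и› are deliberately left out, and a word with no other vowel (like бу) defaults to the back value ‹ıw›.

```python
# Letter values as used in the examples in the text; other characters pass through.
SIMPLE = {
    "а": "a", "ә": "ä", "б": "b", "д": "d", "е": "e", "й": "y",
    "к": "k", "қ": "q", "л": "l", "м": "m", "н": "n", "о": "o",
    "ө": "ö", "р": "r", "с": "s", "т": "t", "ш": "ş", "ы": "ı",
    "і": "i", "ұ": "u",
}
FRONT = set("әөүіе")
BACK = set("аоұы")
VOWELS = FRONT | BACK


def harmony_vowel(word, pos):
    """Pick ‹i› or ‹ı› from the nearest vowel letter before pos, else after it."""
    for ch in list(reversed(word[:pos])) + list(word[pos + 1:]):
        if ch in FRONT:
            return "i"
        if ch in BACK:
            return "ı"
    return "ı"  # assumption: back by default when the word has no other vowel


def transcribe_u(word):
    """Transcribe a Kazakh word, resolving ‹у› as ‹w›, ‹iw›, or ‹ıw› by context."""
    word = word.lower()
    out = []
    for i, ch in enumerate(word):
        if ch != "у":
            out.append(SIMPLE.get(ch, ch))
        elif i == 0 or word[i - 1] in VOWELS:
            # Word-initial or after a vowel letter: plain /w/.
            out.append("w")
        else:
            # After a consonant letter: high vowel plus /w/.
            out.append(harmony_vowel(word, i) + "w")
    return "".join(out)


for w in ["қалау", "сөйлеу", "оқу", "еру", "қалу", "келу", "тау", "бу", "уақыт", "уәде"]:
    print(w, "->", transcribe_u(w))
```

Working from letters alone is a shortcut. A serious converter would want to know the morphology, which is the whole point of the argument above.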

A further argument could be made that if the intention is to avoid using the Cyrillic alphabet for Kazakh because it was created by imperial decree as part of the colonisation process, then a straight one-to-one conversion of the Cyrillic letters used for Kazakh should be avoided. The suggestions above rely on knowledge of the phonology and morphology of Kazakh, and do not presuppose the Cyrillic orthography. This way, the somewhat arbitrary conventions of the Cyrillic orthography (at least as used for Kazakh) are not carried over into the Latin orthography for Kazakh.

The transcription of Kazakh ‹и›

In Kazakh, ‹и›, like ‹у›, represents a couple of combinations of sounds, namely /ɘj/ and /əj/, both ending in a consonant. Because of this, ‹и› should be transcribed into English as ‹iy› or ‹ıy›, depending on which combination it represents.

Compare the following verbs:

  • ти- "touch" = ‹tiy›, inf. тию ‹tiyiw›
  • тый- "prohibit" = ‹tıy›, inf. тыю ‹tıyıw›

In the first syllable of verb stems, ‹и› can usually only represent a front vowel (/ɘ/ = ‹i›) followed by /j/ (= ‹y›). However, compare the following verbal adverbs:

  • Verbs with stems ending in a non-high vowel:
    • Verb stem "қала-" (қала-ған) → "қалай" (/qɑlɑ+j/ = qalay)
    • Verb stem "сөйле-" (сөйле-ді-м) → "сөйлей" (/sy̯ʉjli̯ɘ+j/ = söyley)
  • Verbs with stems ending in a high vowel:
    • Verb stem "оқы-" (оқы-ды-м) → "оқи" (оқый /u̯ʊkə+j/ = oqıy)
    • Verb stem "ері-" (ері-ген) → "ери" (ерій /i̯ɘrɘ+j/ = eriy)

Here it's clear that ‹и› is representing the combination of ‹ы› (= ‹ı›) or ‹і› (= ‹i›) followed by ‹й› (= ‹y›). Because of this, Kazakh ‹и› should be transcribed as ‹iy› or ‹ıy›. The same arguments as for ‹у› also apply here: this transcription avoids carrying over the arbitrary conventions of the Cyrillic orthography as used for Kazakh, and avoids confusion with ‹і› (= ‹i›), ‹й› (= ‹y›) and ‹ы› (= ‹ı›).
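
A rough Python sketch of this rule, again under my own assumptions: the name `transcribe_iy` is hypothetical, the letter table covers only the letters in the examples above, and an ‹и› with no preceding vowel is treated as front (‹iy›), following the observation about first syllables of verb stems. The sequence ‹ию› (as in тию) is not handled.

```python
# Letter values as used in the examples in the text; other characters pass through.
SIMPLE = {
    "а": "a", "е": "e", "й": "y", "қ": "q", "л": "l", "о": "o",
    "ө": "ö", "р": "r", "с": "s", "т": "t", "ы": "ı", "і": "i",
}
FRONT = set("әөүіе")
BACK = set("аоұы")


def transcribe_iy(word):
    """Transcribe a Kazakh word, resolving ‹и› as ‹iy› or ‹ıy› by vowel harmony."""
    word = word.lower()
    out = []
    for i, ch in enumerate(word):
        if ch != "и":
            out.append(SIMPLE.get(ch, ch))
            continue
        value = "iy"  # assumption: front when no vowel precedes (first syllable)
        for prev in reversed(word[:i]):
            if prev in FRONT:
                break
            if prev in BACK:
                value = "ıy"
                break
        out.append(value)
    return "".join(out)


for w in ["оқи", "ери", "ти", "тый", "қалай", "сөйлей"]:
    print(w, "->", transcribe_iy(w))
```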

Transcription of Kyrgyz palataly stuff

The following addresses a problem in transcribing Kyrgyz, but the basic issue is valid for Kazakh (and a number of other languages) as well.

The following four phonemes in Kyrgyz together pose a problem for transcription. Presented below are the most common systems.

Cyrillic  IPA   Russian-style  Turkologist  English-style A  English-style B  Turkish-style
и         /i/   i              i            i                i                i
ы         /ɯ/   y              ɨ, ï         y                y                ı
й         /j/   j              y            i                y                y
ж         /ʤ/   dzh/dž         ǰ            j                j                c/j

The major flaw of the two English-style systems is that they both merge one of the unrounded high vowels (/i/, /ɯ/) with the consonant /j/. The other systems don't merge any phonemes, but have their own problems. For example, the Russian-style system uses the trigraph ‹dzh›—or at best, the digraph ‹dž›—for a single Kyrgyz phoneme. This is due to certain traditions of transcribing Russian, along with the fact that this sound is approximated in Russian by the combination of two phonemes (/d/ and /ʐ/); this is completely unnecessary in Kyrgyz. The remaining two systems (Turkologist/Turkicist and Turkish-style) each pose the problem of containing characters not found in English.
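
The merger claim is easy to check mechanically. The following Python sketch encodes the table of the five systems as data and reports any symbol that a system uses for more than one phoneme. The names `SYSTEMS` and `merged_phonemes` are mine, and where a system offers alternative symbols for a phoneme, all of them are included.

```python
# Phonemes in the order: и, ы, й, ж
PHONEMES = ["i", "ɯ", "j", "ʤ"]

# For each system, the set of symbols used for each phoneme (same order).
SYSTEMS = {
    "Russian-style":   [{"i"}, {"y"}, {"j"}, {"dzh", "dž"}],
    "Turkologist":     [{"i"}, {"ɨ", "ï"}, {"y"}, {"ǰ"}],
    "English-style A": [{"i"}, {"y"}, {"i"}, {"j"}],
    "English-style B": [{"i"}, {"y"}, {"y"}, {"j"}],
    "Turkish-style":   [{"i"}, {"ı"}, {"y"}, {"c", "j"}],
}


def merged_phonemes(symbols):
    """Return {symbol: [phonemes]} for every symbol shared by two or more phonemes."""
    by_symbol = {}
    for phoneme, alternatives in zip(PHONEMES, symbols):
        for symbol in alternatives:
            by_symbol.setdefault(symbol, []).append(phoneme)
    return {s: ps for s, ps in by_symbol.items() if len(ps) > 1}


for name, symbols in SYSTEMS.items():
    print(name, merged_phonemes(symbols))
```

Only the two English-style systems return a non-empty result: A uses ‹i› for both /i/ and /j/, and B uses ‹y› for both /ɯ/ and /j/.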

What do I recommend? Any consistent way of keeping the phonemes distinct wins my vote, and everything else is just æsthetic. I personally prefer to use the Turkish-style system (with ‹j› instead of ‹c›, since Kyrgyz has no phonemic difference between [ʤ] and [ʒ])¹ for general-purpose broad/phonemic transcriptions and IPA for more narrow transcriptions or when writing for a linguistics-oriented audience.


  1. Kyrgyz does, however, have a phonemic contrast when recent loans are included. In Russian loans, many speakers now have both /ʒ/ and /ʦ/ (traditionally, these were nativised as /ʤ/ and /s/ or /ʧ/, respectively). This leaves the problem of how to transcribe them: using ‹j› for /ʒ/ would be normal in the Turkish-style system, leaving ‹c› for /ʤ/, which means ‹c› is no longer available for /ʦ/. Unlike for Russian, though, a two-phoneme analysis (/ts/) may be best, in which case ‹ts› might be the best way to go.


There are two prevailing views of "Altaic".

The first view (Altaic) is that Altaic represents a language family consisting of several subfamilies that are different because of divergence. Usually this includes Turkic, Mongolic, and Tungusic, and sometimes also Japanese and Korean. This is not easily supportable, primarily because of the lack of consistent sound changes in cognates, and the fact that lexical and morpho-syntactic cognates aren't pervasive.

The second view (anti-Altaic) is that Altaic represents an areal grouping of originally unrelated languages that have become more similar due to convergence. This explains the main problems with the divergence view. However, this view generally goes along with the idea that there was / has been a single period of contact among all these groups—in more or less a single geographic area—before they went their own ways. This isn't supportable because there's no evidence for any single "~Altaic homeland".

Instead, there seems to be evidence for many different periods of contact between different groups: e.g., a distant period of Indo-European/Turkic contact; a period of heavy Turkic/Mongolic contact (in the form of Turkic speakers being absorbed as Mongolic speakers?) about 2000-2500 years ago; at least one period of Mongolic/Tungusic contact some time later; a period of Turkic/Hungarian contact about 2000 years ago; several periods of contact between the Sogdians, the Tokharians, and Turkic groups, resulting in most Sogdian and Tokharian populations eventually shifting to Turkic; a period of Mongolic/Turkic contact within the last 1000 years, most heavily affecting South Siberian Turkic languages, where the contact continues to this day in some groups, but also heavily affecting Kypchak languages (some Turkic varieties have been completely abandoned due to shifts to Mongolic); a period of Turkic/Persian contact, particularly in the south, where it continues to affect a certain number of Turkic languages to this day; and even a number of periods of Turkic/Turkic contact.

These many periods of contact between certain groups have resulted in a number of languages with structural, lexical, and morphological similarities. This is more or less the definition of a Sprachbund, though it's hard to define the geographical or linguistic edges of this particular one, whose core is the middle of Central Eurasia. Does it include Hungarian? Definitely. Does it include Tibetan? Probably. Did it once include the ancestors of Japanese and Korean? Probably.

So a single Sprachbund would include almost all of Central Eurasia; however, since the shape and languages involved have changed so much over time, it's hard to give it a single name (such as "Altaic") or provide a single list of languages which have been involved in it (or even features we expect to find everywhere). A fitting name might be the Central Eurasian Sprachbund/Sprachbünde. This is the third view of Altaic, which is hardly represented in the literature at all, yet is a view increasingly shared by people who think seriously about the problem.