From Citizendium
This article is developing and not approved.
This editable Main Article is under development and subject to a disclaimer.

Pali is an ancient Indic language originating in the Indian subcontinent. It is the language of Theravada Buddhism, and in particular of the Pali Canon. Pali is closely related to the Sanskrit family of languages[1]. Its grammar is simpler than that of Sanskrit, and its vocabulary is similar, in many cases differing only by a set of common phonological transformations, as in these examples:

Sanskrit | Pali | Definition
arhat | arahant | "deserving": one who sees the true nature of existence and has conquered their own negative tendencies so that they no longer take karmic actions; similar to "saint" in Christianity
Dharma | Dhamma | "upholding": refers, among other things, to the teachings of the Buddha about how the universe works and how a person can minimize or avoid suffering for themselves and others
karma | kamma | "action": a thought, speech or deed which results in immediate or future (negative) consequences; or, the part of one's fate which is a consequence of past karmic actions by oneself and/or others
nirvāṇa | nibbāna or nibbāṇa | "quenching (as of fire)": a state of peacefulness; a complete lack of suffering
śānti | santi | "peace"
sūtra | sutta | "discourse" in Buddhist literature
saṅgha | saṅgha or saṃgha | either the monastic order or the totality of those with certain spiritual attainments


Three spellings of the name are found in Pali:

  • pāli: this is the preferred spelling of most Western scholars
  • pāḷi: this spelling appears on the title pages of the Burmese, Khmer and Sinhalese editions of the Canon
  • pāḷī: this is the spelling used in Aggavaṃsa's Pali grammar, the Saddanīti

The word's fundamental meaning is "row" or "sequence", but it is used more specifically in the sense of "text". Strictly speaking, it does not appear in the text of the Canon itself, only in a summary verse.[2] Thus pāḷibhāsā means "the language of the texts". According to Kate Crosby,[3] it originally referred specifically to the idiom of the texts, as distinct from that (or those) of the commentaries and other writings, but from about the 12th century it broadened to refer to the language as a whole. The shift from the descriptive meaning "language of the texts" to the proper name "Pali language" is not attested before the 17th century.

The traditional name for the language was Māgadhī, i.e. the language of Magadha. Pali is likely derived from some form or forms of language used, if not in Magadha proper, then at least in the Magadhan Empire. However, traditional Indian grammarians use the name Māgadhī for a quite different dialect, spoken by "low" characters in Sanskrit dramas.


A number of scholars (at least Professors Cousins,[4] von Hinüber[5] and Oberlies[6]) give roughly the following narrative. The Buddhist teachings were originally in an "Eastern" dialect of Early Middle Indo-Aryan. The label "Eastern" is misleading, as the area it covers is entirely surrounded by areas of "Western" dialects. Scholars call this dialect Old Ardha-Māgadhī, Ardha-Māgadhī being the language in which the oldest Jain literature is preserved. At a later date the teachings were translated into a Western dialect, though some Eastern features were retained. Cousins considers this dialect too different from the Pali we know today to count as Pali in the strict sense, and, following some earlier scholars, calls it Old Pali. He takes the change to reflect a change in the official language of the empire ruling most of North India at the time, and dates it to the late 3rd or early 2nd century BC. Oberlies locates this Western dialect somewhere in the vicinity of Quetta. The dialect was later subject to a process of Sanskritization, which Cousins dates to the early centuries of the Christian era, calling this phase of the language Buddhist Hybrid Pali. (Norman, similarly, says Pali may be regarded as a form of Buddhist Hybrid Sanskrit, the name given by Western scholars to the Sanskritized forms of Middle Indo-Aryan in which most Mahayana Buddhist scriptures were composed.[7]) Cousins considers the unqualified name Pali to apply from shortly before the commentaries of the 4th or 5th century. Further Sanskritization occurred later, perhaps as late as the 12th century (scholars disagree on this date).

The late Professor Norman seems to disagree with the above only in holding that the teachings were originally in a variety of dialects, not a single, Eastern one.

Professor Gombrich has an apparently quite different theory:[8] the Buddha himself created something sufficiently similar to the Pali we have now to be called by that name, by mixing together various dialects in use in the areas in which he preached.


There are different approaches to the language in East and West (the latter including Japan for this purpose). The Theravada tradition holds that Pali is the root language, the language of reality, the language of gods, demons, ghosts, talking animals, and wolf-children. It therefore tends not to regard Pali as having a history at all; the account above is how Western scholars see Pali's history. In recent times, of course, some people combine the two approaches.

Both approaches are heavily influenced by Sanskrit, but in different ways. Traditional grammarians hardly ever mention Sanskrit explicitly, but silently follow it, even where this is inappropriate. For example, when Sanskrit grammarians decided the order in which the cases of nouns should be listed, they carefully arranged it so that identical forms appear next to each other. Pali grammarians kept this order, even though in Pali it has exactly the opposite effect, separating identical forms. Also, when there is more than one grammatical form, they may treat as the default the form closest to Sanskrit, rather than the one common in actual usage.

Western scholars, on the other hand, tend to spend a lot of time explaining the evolution of everything from Sanskrit, in addition to describing the language as it actually exists.

Traditional grammar, following Sanskrit, is a generative grammar: the language is built up out of verbal roots. Aggavaṃsa lists about 1700 of them in his grammar, but Kaccāyana and Moggallāna leave them to other works. The methods of construction are sometimes artificial: some of the derivations are completely fictitious, and in other cases the analysis is shaped by ease of exposition. Pali grammarians fill gaps in the forms attested in earlier literature by analogy with Sanskrit, and some such forms were then introduced into literature written afterwards.


Pali has no writing system of its own; instead, each community tends to write it in its own script. Thus Pali manuscripts have for centuries been written in Burmese, Khmer, Sinhalese and other local scripts. These scripts are not alphabets in the strict sense but abugidas: a vowel following a consonant is written not as a separate letter but as a diacritical mark attached to the consonant (except a, which is the default). Since the 19th century, Western scholars, following this practice, have transliterated Pali into variants of the Latin alphabet. The Pali Text Society has largely standardized the transliteration. This Western Pali alphabet (given by Warder and used in Western reference books) is as follows:

  • a ā i ī u ū e o ṃ k kh g gh ṅ c ch j jh ñ ṭ ṭh ḍ ḍh ṇ t th d dh n p ph b bh m y r l ḷ ḷh v s h


  1. When ṃ occurs immediately before one of the letters from k to m, it is alphabetized as if it were the corresponding nasal: ṅ, ñ, ṇ, n or m.
  2. This is the Western Pali alphabet; in the East, ḷ and ṃ are at the end of the alphabetical order, and ḷh is not recognized as a letter in its own right, being written like other combinations of consonants (this Eastern Pali alphabet is given by Clough, d'Alwis[9] and Mason, and used in, for example, Buddhadatta's Concise Pali-English Dictionary).
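The alphabetization rule in note 1 can be expressed as a sort key. The Python sketch below is a hypothetical illustration, not taken from any of the sources cited. It makes three assumptions. Words are in lowercase Western (PTS-style) transliteration. Each aspirate, such as kh or ḷh, counts as a single letter. ṃ is read as the matching nasal whenever the next letter falls in the range k to m. The Eastern order described in note 2 is not handled.

```python
import unicodedata


def nfc(text):
    """Normalize to precomposed characters so that ā, ṃ, ṭ etc. are single code points."""
    return unicodedata.normalize("NFC", text)


# Western Pali alphabet order as listed above (Warder / PTS).
ALPHABET = nfc("a ā i ī u ū e o ṃ k kh g gh ṅ c ch j jh ñ ṭ ṭh ḍ ḍh ṇ "
               "t th d dh n p ph b bh m y r l ḷ ḷh v s h").split()
RANK = {letter: position for position, letter in enumerate(ALPHABET)}
NIGGAHITA = nfc("ṃ")

# Letters from k to m, grouped by series, with the nasal that ṃ is read as.
NASAL_FOR = {}
for series, nasal in (("k kh g gh ṅ", "ṅ"), ("c ch j jh ñ", "ñ"),
                      ("ṭ ṭh ḍ ḍh ṇ", "ṇ"), ("t th d dh n", "n"),
                      ("p ph b bh m", "m")):
    for letter in nfc(series).split():
        NASAL_FOR[letter] = nfc(nasal)


def tokenize(word):
    """Split a word into alphabet letters, preferring two-character aspirates (kh, ḷh, ...)."""
    word = nfc(word.lower())
    letters, i = [], 0
    while i < len(word):
        if word[i:i + 2] in RANK:
            letters.append(word[i:i + 2])
            i += 2
        elif word[i] in RANK:
            letters.append(word[i])
            i += 1
        else:
            raise ValueError(f"not a Western Pali letter: {word[i]!r}")
    return letters


def sort_key(word):
    """Hypothetical sort key: alphabet positions, with ṃ before k..m read as the matching nasal."""
    letters = tokenize(word)
    key = []
    for i, letter in enumerate(letters):
        following = letters[i + 1] if i + 1 < len(letters) else None
        if letter == NIGGAHITA and following in NASAL_FOR:
            letter = NASAL_FOR[following]
        key.append(RANK[letter])
    return key


print(sorted(["sacca", "saṃgha", "saka", "saṅgha"], key=sort_key))
# prints ['saka', 'saṃgha', 'saṅgha', 'sacca']
```

With this key, saṃgha and saṅgha sort identically. Both come after saka and before sacca, because ṅ precedes c in the alphabet. Spaces, hyphens and other characters outside the alphabet raise an error rather than being silently ignored.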

Tables of scripts can be found at [2] and are appended to Elizarenkova & Toporov and to more recent editions of Warder.


Oberlies[10] lists 42 to 44 phonemes: a ā i ī u ū e o ä y v k kh g gh c ch j jh ṭ ṭh ḍ ḍh t th d dh p ph b bh r l ñ ṇ n m nh mh s h ṃ and possibly ü yh. Elizarenkova & Toporov (page 46) give 33: a i/y u/v e o k kh g gh c ch j jh ñ ṭ ṭh ḍ/ḷ ḍh/ḷh ṇ t th d dh n p ph b bh m r l s h. Oberlies does not explain their pronunciation. He says there are extensive discrepancies between standard Pali spelling and ancient pronunciation,[11] and that the latter has not yet been properly studied by scholars.[12] The late Professor Warder, however, says the ancient pronunciation is approximately known.[13] The pronunciations listed first below are those Warder explains immediately after that statement, which might suggest they are intended as his account of ancient pronunciation. In his explanation of v, however, he says "many speakers of Pali pronounce ..." in the present tense, which casts doubt on this reading. The other sources used here also do not say explicitly that they are giving the ancient pronunciation.

  • most as English
  • a as u in butter (Duroiselle gives a different explanation, but it is not clear what difference he is making between a and ā)
  • ā as in father
  • i as in pit
  • ī as in machine
  • u ū as English oo, short and long, as in foot, fool
  • e as English ay, but without the final glide to an i sound; when followed by two consonants, as in bed
  • o as in bone but without the final glide to a u sound (Yorkshire o); when followed by two consonants, as in pot
  • ṃ as m but without release (this must mean you stop breathing while opening the mouth after this sound; Johansson says instead that this letter represents nasalization of the preceding vowel)
  • the h as second element in some letters represents what is known to phoneticians as aspiration, described as a puff of breath. For most English-speakers, k, say, is aspirated at the beginning of a word but not at the end or after s. The distinction has no meaning in English, so most English-speakers have difficulty with it. The Pali aspiration is stronger.
  • c as English ch, but with the middle of the tongue, not just the tip, touching the roof of the mouth (presumably the same applies to j)
  • in ṭ/t etc., the distinction is that the tip of the tongue is against the roof of the mouth in the dotted letters, the tips of the teeth in the undotted ones. In English, the former pronunciation is normal in India, the latter in Ireland. In England, most speakers place the tongue in between these positions, against the top of the teeth. Again, the distinction is not clearly audible to the untrained native English-speaker.
  • v as v or w

Warder doesn't explain two letters not used in English:

  • ṅ as English ng[14]
  • ñ might be supposed from the symbol to be pronounced as in Spanish (French and Italian gn, Portuguese nh). To English-speakers this sounds rather like ny (hence canyon from Spanish cañon), though the n and y sounds are blended rather than successive; Frankfurter, Müller and Johansson give this pronunciation. The late Professor Norman, however, pronounced it as an ng sound as above, but with the tongue in the position described above for c.

Modern pronunciation is much more varied: an attempt at a detailed account for Southeast Asia (but not Sri Lanka) can be found at [3]. In Burma, for example, the following major variations are widespread:

  • a before two consonants is pronounced as i
  • c as s
  • j as z
  • r as y
  • s as English th (i.e. lisped)

Sound interactions

Neighbouring sounds often interact with each other. Here are some examples of combinations of prefixes:

  • anu + ā > anvā > annā
  • paṭi + anu > paccanu
  • pari + upa > *pariyupa > payirupa (metathesis; the asterisk indicates a hypothetical form not actually attested)
  • vi + ati > vīti

Such effects are also sometimes found between separate words, though always optionally. This is a common linguistic phenomenon. For example, in England final r is not usually pronounced, but it is restored when closely followed by a word beginning with a vowel. Sometimes it is "restored" even where it never existed in the first place, so "law and order" often sounds like "Laura Nauder". Most written languages ignore such changes, but Pali, like Sanskrit (and Welsh), indicates them in writing. Some examples:

  • -ṃ + ca > -ñ ca
  • -ti + iti > -tīti, often written -tī ti
  • tena + upa- > ten' upa-
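The first example above is one instance of a rule stated in Pali grammars. A final ṃ before a word beginning with one of the letters from k to m may be written as the nasal of that series (ṅ, ñ, ṇ, n or m), mirroring the alphabetization rule. The Python sketch below is a hypothetical illustration of that single rule, assuming precomposed Western transliteration; the function name nasal_sandhi is invented for this example. Because the change is optional, a real text may equally keep the ṃ. Other sandhi rules, such as those before vowels, are not modelled.

```python
import unicodedata


def nfc(text):
    """Normalize to precomposed characters (ṃ, ñ, ṭ ... as single code points)."""
    return unicodedata.normalize("NFC", text)


NIGGAHITA = nfc("ṃ")

# First letter of each series from k to m, mapped to that series' nasal.
# Aspirates (kh, gh, ...) start with the plain stop, so one character suffices.
NASAL_BEFORE = {}
for initials, nasal in (("k g ṅ", "ṅ"), ("c j ñ", "ñ"), ("ṭ ḍ ṇ", "ṇ"),
                        ("t d n", "n"), ("p b m", "m")):
    for letter in nfc(initials).split():
        NASAL_BEFORE[letter] = nfc(nasal)


def nasal_sandhi(first, second):
    """Hypothetical helper: write a final ṃ of `first` as the nasal matching `second`'s initial.

    The change is optional in Pali, so this returns one permitted spelling,
    not the only correct one. Any other pair is returned unchanged.
    """
    first, second = nfc(first), nfc(second)
    if first.endswith(NIGGAHITA) and second[:1] in NASAL_BEFORE:
        return first[:-1] + NASAL_BEFORE[second[:1]], second
    return first, second


for pair in [("taṃ", "ca"), ("evaṃ", "me"), ("yaṃ", "nūna"), ("kiṃ", "su")]:
    print(" ".join(nasal_sandhi(*pair)))
# prints: tañ ca / evam me / yan nūna / kiṃ su (one per line)
```

For example, evaṃ me becomes evam me, as in the familiar opening evam me sutaṃ. By contrast, kiṃ su is left unchanged because s lies outside the k to m range.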

Parts of speech

The grammatical tradition classifies words into four parts of speech:

  • nouns, in the broader sense, comprising substantives (nouns proper), adjectives and pronouns
  • verbs
  • particles or indeclinables: adverbs, pre-/postpositions, conjunctions and interjections
  • prefixes or preverbs

Classification into nouns, verbs and particles is quite common, used in traditional Greek and Arabic grammar, for example. In treating verbal prefixes as separate words, Pali grammarians follow Pāṇini's Sanskrit grammar. That grammar covered two forms of Sanskrit together: the archaic language of the Veda and the language of his own day. In the former, those prefixes were separable, but in the latter they had become inseparable. In Pali, there are only rare cases of separation (tmesis), e.g. the Buddhavaṃsa uses the phrase ajjha so vasi, which might be translated "in- he -habited".


Compounds, usually of nouns, are very common. Some major types are illustrated here:

  • mātar (mother) + pitar (father) > mātāpitaro (parents); this type of compound can have more than two components
  • rājan (king) + purisa (man, or more generally male) > rājapurisa (king's man, i.e. state employee)
  • seta (white) + chatta (sunshade) > setacchatta (either white sunshade or having a white sunshade)

As the last example illustrates, there may be more than one way of construing a compound. Hypothetically, *kamaladevī might mean goddess like a lotus, goddess with a lotus, woman whose god(dess) is like a lotus, or various other possibilities.

The compounding process can be iterated, producing some very long compounds: cīvarapiṇḍapātasenāsanagilānapaccayabhesajjaparikkhāra appears dozens of times in the Canon, and kakudha-kuṭaja-aṅkola-kaccikāra-kaṇikāra-kaṇṇikāra-kanavera-koraṇḍaka-koviḷāra-kiṃsuka-yodhika-vanamallika-m-anaṅgaṇa-m-anavajja-bhaṇḍi-surucira-bhaginimālā-malya-dhare seems to be the longest word in the Canon.


In Sanskrit (as in Latin and Greek) there is usually only one grammatically correct form, but in Middle Indo-Aryan alternative forms are common.[15]

There are two numbers, singular and plural, and three genders, masculine, feminine and neuter. The number of cases is a matter of disagreement among grammarians. Traditional sources give 7: nominative, accusative, instrumental, dative, ablative, genitive, locative (these are Western names, of course). Some Western grammarians, e.g. Johansson, make 8 by adding the vocative as a separate case, following Latin and Greek (grammarians within the tradition, following Sanskrit, regard it as a variant of the nominative). Conversely, Elizarenkova & Toporov make only 6, treating the dative as a variant of the genitive (page 76).

Traditional grammarians start from the following default endings, proceeding to deal with modifications and replacements for particular groups of nouns, or even individual nouns.

Case | Singular | Plural
Nominative | [no ending] | -yo
Accusative | -aṃ | -yo
Genitive | -sa | -naṃ
Dative | -sa | -naṃ
Instrumental | -nā | -hi
Ablative | -smā | -hi
Locative | -smiṃ | -su


  1. The empty ending for nominative singular is represented by the non-existent "ending" *-si (the asterisk is a standard scholarly indication of non-existence, or at least non-attestation), acting as a code symbol, a place-holder, to indicate that, by default, no ending is added to the stem.
  2. The cases are rearranged here into a more sensible order (see above).
  3. Genitive and dative are often, perhaps usually, the same, but not always.
  4. The above forms are all common in the Canon, though not all necessarily the commonest.
  5. Although the above endings are all individually treated as defaults, they do not constitute a default declension collectively. There is in fact no single noun that declines exactly as above, even optionally.

That is the traditional approach. Western grammarians, in contrast, ignore this process of building declensions by modifying or replacing defaults. Instead they concentrate on the end product, classifying similarly declined nouns into a number of declensions, following the tradition of Latin (five declensions) and Greek (three) grammar. Unlike Latin and Greek, Pali seems to have no standard classification: Mason gives three declensions, Oberlies five,[16] Geiger six, Clough fifteen. Oberlies' table includes alternative forms, in one instance as many as seven. He also mentions a few irregular nouns in the text.

Adjectives usually follow different noun declensions for different genders. Comparatives and superlatives usually add endings -tara and -tama, respectively.

Pronouns tend to be more or less irregular.


There are two numbers as above and three persons, known in Western grammatical terminology as 1st, 2nd and 3rd, but traditionally as last, middle and first respectively. Traditional grammars, followed by Clough and d'Alwis, give eight tenses: present, imperative, optative, perfect, imperfect, aorist, future and conditional. Mason, following Western grammatical concepts, classifies imperative, optative and conditional as moods, not tenses, with the others counted in the indicative mood. Elizarenkova & Toporov do likewise, but also amalgamate aorist, imperfect and perfect into just one past or preterite tense. Oberlies does this latter too. Frankfurter and Geiger combine imperfect and aorist into a single tense, but leave the perfect distinct. Kaccāyana's table of endings below has what Müller calls transitive and intransitive voices, while Duroiselle calls them causative and reflective voices. Mason calls them active and middle voices, but adds a passive voice. Frankfurter includes passive in "derivative conjugation", along with intensive/frequentative, desiderative, causative and denominative, rather than counting them as voices. Clough and d'Alwis speak of active, passive and substantive voices. Elizarenkova & Toporov say there are active and middle voices from one point of view, active and passive from another.

Kaccāyana gives the following default endings for verbs (he does not try to give complete coverage of all alternative forms, noting near the end of the chapter on verbs that there are other alternative forms he does not cover):

Tense and voice | 1st sing. | 2nd sing. | 3rd sing. | 1st pl. | 2nd pl. | 3rd pl.
Present active | -mi | -si | -ti | -ma | -tha | -anti
Perfect active | (*)-aṃ | *-e | -a | (*)-mha | (*)-ttha | -u
Imperfect active | -aṃ | (*)-o |  | *-mhā | -ttha | (*)-ū
Aorist active | -iṃ | -o |  | -mhā | -ttha | -uṃ
Future active | -ssāmi | -ssasi | -ssati | -ssāma | -ssatha | -ssanti
Optative active | -eyyāmi | -eyyāsi | -eyya | -eyyāma | -eyyātha | -eyyuṃ
Conditional active | -ssaṃ | *-sse | *-ssā | *-ssāmhā | *-ssatha | -ssaṃsu
Imperative active | -mi | -hi | -tu | -ma | -tha | -antu
Present middle | -e | -se | -te | *-mhe | *-vhe | -ante
Perfect middle | (*)-iṃ | (*)-ttho | (*)-ttha | *-mhe | (*)-vho | *-re
Imperfect middle | (*)-iṃ | *-se | (*)-ttha | -mhase | *-vhaṃ | *-tthuṃ
Aorist middle | (*)-aṃ | *-se | (*)-ā | *-mhe | *-vhaṃ | (*)-ū
Future middle | (*)-ssaṃ | -ssase | -ssate | *-ssāmhe | *-ssavhe | *-ssante
Optative middle | (*)-eyyaṃ | -etho | -etha | *-eyyāmhe | *-eyyāvho | *-eraṃ
Conditional middle | *-ssiṃ | *-ssase | -ssatha | *-ssāmhase | *-ssavhe | *-ssisu
Imperative middle | *-e | -ssu | -taṃ | -āmase | -vho | [*]-antaṃ


  1. Moggallāna gives different defaults at a few points.
  2. The forms above are not necessarily the only ones, nor the commonest in the Canon, nor even found there at all. Here, * indicates endings Oberlies doesn't find in the Canon, (*) those he interprets differently, and [*] one he doesn't find but suggests as an emendation in one place. These endings may or may not be found in non-canonical literature either before or after their listing by grammarians.
  3. In addition to the endings, the augment a- is by default added before the verb stem but after any prefix(es), in the imperfect, aorist and conditional.

As with the declension of nouns, Western grammarians concentrate on the end product, classifying verbs into different conjugations, like the four in Latin. Geiger and Oberlies[17] give two main conjugations of verbs, d'Alwis,[18] Duroiselle and Warder seven, Clough and Mason eight (but Mason, while following his source in giving 8, adds his own view that most are really just irregular verbs). Oberlies also gives detailed accounts of six irregular verbs, and there are also defective verbs, with incomplete conjugations. Whereas the differences between noun declensions are a matter of endings, those between verb conjugations are mainly about stems. A regular verb has three different stems: perfect, aorist, and one for all other forms. The relation among these varies from one conjugation to another.

The passive voice is usually formed with the suffix -ya-, -iya- or -īya-. Unlike Sanskrit, Pali in practice usually gives the passive active endings, not the middle endings given by default in traditional grammars.

The causative is usually formed with the suffix -aya- or -e-, often preceded by -p- or -āp-, and with the stem vowel strengthened: a > ā, i > e, u > o.

Such formations can be iterated, giving causative passives and so on. Occasionally there are even double causatives: ruhati, [e.g. a plant] grows; ropeti, [someone] causes [it] to grow, i.e. plants [it]; ropāpeti, [someone] gets [someone else] to plant [it].

The infinitive most often ends in -(i)tuṃ (cognate with Latin supine), and the absolutive (sometimes called gerund, though it has little in common with the Latin gerund) in -(i)tvā.

There are present, past and future participles active and passive, though the future participle passive is perhaps more often called gerundive.


The main uses of the cases are as follows:

  • Nominative is used for the subject of a verb, and for the complement of the verb "to be", which is often omitted.
  • Vocative is used as a form of address.
  • Accusative is used for the object of a transitive verb.
  • Genitive means "of".
  • Dative means "to/for".
  • Instrumental means "by/with".
  • Ablative means "from".
  • Locative means "in/on/at".

The "plural of majesty" is sometimes used, not just by kings but also by religious figures, including the Buddha.

As in other gendered languages, grammatical gender sometimes disagrees with "natural" gender.

Comparative and superlative are sometimes used in place of each other.

The verb usually comes at the end of a sentence. The main uses of verb forms are as follows:

  • Present, in addition to its natural meaning, is often used as "historical present" to indicate continuous past as distinct from momentary past.
  • The theoretical meanings of the three past tenses are indicated by their Pali names:
    • aorist is ajjatanī, from ajja, "today";
    • imperfect is hiyyattanī, from hiyyo, "yesterday";
    • perfect is parokkha, "remote" (more literally, "beyond sight");
    • in practice they are used pretty indiscriminately.
  • Optative means "should/would/might".
  • Conditional is used for counterfactuals.
  • Negative commands are indicated by the particle mā, usually with the aorist tense rather than the imperative.
  • The theoretical meanings of the active and middle voices are indicated by their Pali names:
    • active is parassapada, from parassa, "for another";
    • middle is attanopada, from attano, "for oneself";
    • in practice they are used pretty indiscriminately.
  • The absolutive means "having done".
  • The gerundive is usually used in that sense, "should be done", rather than the literal future participle passive sense, "will be done".


Warder distinguishes between the canonical language and the later language. Geiger distinguishes four phases:

  1. canonical verse (note that Geiger means this is the oldest in terms of language, though he recognizes some verse books as late; in other words the verse is archaic)
  2. canonical prose
  3. post-canonical prose, including Milindapañha; he does not mention Netti and Peṭakopadesa either way, but Oberlies considers their language sufficiently close to the canonical to include them in his study
  4. post-canonical verse

He says 2 is closer to 3 than to 1, which appears to clash with Oberlies' and Warder's treatment of canonical Pali as a single entity. Possibly the two views emphasize different aspects of the language.

As in English, there are spelling variations between countries.


  1. Walshe, Maurice (1996). The Long Discourses of the Buddha: A Translation of the Dīgha Nikāya, 1st edition. Wisdom Publications. ISBN 0-86171-103-3. p. 17.
  2. Aṅguttara Nikāya iv 144 (PTS edition)
  3. Crosby, Kate. "The Origin of Pāli as a Language Name in Medieval Theravāda Literature", Journal of the Centre for Buddhist Studies, Sri Lanka, II (January), 70–116.
  4. [1], 118-128
  5. in Brill's Encyclopedia of Buddhism, volume I, 2015
  6. Pali Grammar, Pali Text Society, volume 1, 2019
  7. Many, if not most, survive now only in Chinese and/or Tibetan translations.
  8. Buddhism and Pali, Mud Pie Slices, 2018
  9. page cxxxv
  10. volume I, pages 53f
  11. op. cit. page 55
  12. page 2
  13. Introduction to Pali, 1963, Pali Text Society, pages 1-4
  14. Müller, page 3
  15. Oberlies, page 209
  16. volume I, pages 211-3
  17. volume I, page 322
  18. page 23