Desktop Publishing: The Complexities of Typesetting Languages

Typography takes texts to a whole new level, providing visual meaning on top of the literal meaning of the words. Through formatting and design, we can show nuances such as the tone of the message, emphasize certain points, or even project the character of the speaker.

Formatting languages written in the Latin alphabet is relatively easy and manageable. Many other writing systems, however, follow different typesetting rules. Those whose rendering depends on context qualify as complex text layout languages.

Complex text layout (CTL), or complex text rendering, refers to the typesetting of writing systems in which the shape or position of a character changes depending on its neighboring characters or on the addition of special symbols. Software therefore has to store the text correctly and apply these rules every time it displays the characters.

Below are some of the most notoriously challenging writing systems to format.


The Arabic language is written with an abjad, a writing system in which each letter primarily represents a consonant and vowels are not necessarily written. Arabic text runs from right to left, but numbers and embedded Latin words run from left to right, which makes Arabic text bidirectional. When typing, the general flow of text is from right to left. Even as you type Roman characters, the cursor will appear on the left of the letter, but the characters will still be read from left to right. Because the Arabic script is cursive, a letter looks different in isolation than it does when joined to other letters: its shape depends on whether it stands at the beginning, in the middle, or at the end of a connected group. The script relies heavily on this context-sensitive shaping and on the formation of ligatures.
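To make the two ideas above concrete, here is a minimal Python sketch that uses only the standard `unicodedata` module. It looks up the bidirectional class Unicode assigns to a few characters, then lists the four contextual shapes of the letter beh as recorded in the Arabic Presentation Forms-B compatibility block. The sample characters and variable names are our own choices for illustration. Note that normal text stores only the base letter (U+0628), and the rendering engine picks the right shape at display time.

```python
import unicodedata

# Bidirectional classes from the Unicode Character Database.
# 'AL' = Arabic letter (right-to-left), 'L' = left-to-right,
# 'EN' = European number, 'AN' = Arabic number.
samples = {
    "Arabic letter beh": "\u0628",
    "Latin letter a": "a",
    "European digit 1": "1",
    "Arabic-Indic digit 1": "\u0661",
}
bidi_classes = {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}

# U+FE8F..U+FE92 are compatibility code points for the isolated, final,
# initial, and medial shapes of beh. Each one decomposes to the same base
# letter, tagged with the name of the contextual form.
beh_forms = {}
for cp in range(0xFE8F, 0xFE93):
    tag, base = unicodedata.decomposition(chr(cp)).split()
    beh_forms[tag.strip("<>")] = (unicodedata.name(chr(cp)), base)

for label, cls in bidi_classes.items():
    print(f"{label}: {cls}")
for form, (name, base) in beh_forms.items():
    print(f"{form}: {name} -> U+{base}")
```

A single stored letter therefore maps to four possible glyphs, and a line that mixes the classes above has to be reordered for display. This is why a word processor without proper shaping and bidirectional support tends to show disconnected or reversed Arabic text.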

N’Ko is a writing system used for the Manding languages of West Africa. It is an alphabet with a one-to-one correspondence between a sound and a symbol. It’s also written from right to left, and the letters in a word are connected at the bottom. One of the notable features of N’Ko is its number system: the digits are also written and read from right to left, so the most significant digit sits on the right. For example, “one hundred” appears as the N’Ko equivalent of “001”. The Manding languages are tonal, so diacritic marks are used to indicate the tone of a vowel.
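The right-to-left number system can be illustrated with a short Python sketch using the standard `unicodedata` module. The helper names `to_nko_digits` and `visual_order_ascii` are hypothetical, and reversing the string is only a rough stand-in for what a real bidirectional layout engine does. The sketch assumes the usual Unicode convention that text is stored in logical order (most significant digit first) and reordered only for display.

```python
import unicodedata

NKO_ZERO = 0x07C0  # NKO DIGIT ZERO; the ten digits occupy U+07C0..U+07C9


def to_nko_digits(number: int) -> str:
    """Return the N'Ko digits for a non-negative integer, in stored (logical) order."""
    return "".join(chr(NKO_ZERO + int(d)) for d in str(number))


def visual_order_ascii(nko_number: str) -> str:
    """Approximate the right-to-left display order, transliterated to ASCII digits."""
    return "".join(str(unicodedata.digit(ch)) for ch in reversed(nko_number))


hundred = to_nko_digits(100)

# N'Ko digits carry the strong right-to-left class 'R', unlike ASCII digits ('EN').
digit_classes = {unicodedata.bidirectional(ch) for ch in hundred}

# One of the combining tone marks (a nonspacing mark, category 'Mn').
tone_mark = "\u07EB"

print(digit_classes)
print(int(hundred))                 # the stored value is still one hundred
print(visual_order_ascii(hundred))  # what the reader sees, left to right on screen
print(unicodedata.name(tone_mark), unicodedata.category(tone_mark))
```

The stored value never changes. Only the display order does, which is why software that treats these digits like European ones can end up showing the number backwards.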

The Thai script is an abugida, or alphasyllabary, which is defined as a system where “consonants each have an inherent vowel that can be changed to another vowel or muted through diacritics or other modifications.” Diacritics are heavily used in writing to serve as a pronunciation guide: vowel signs and tone marks are placed above or below the consonant they modify, forming a cluster. The complexity lies in the strict rules governing how these symbols interact. Text is written from left to right, with no spaces between words. The spaces that do appear serve as punctuation or indicate the end of a sentence. As a result, software cannot rely on spaces to find line breaks, and it must never break a line in the middle of a cluster.
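The rule about never cutting a cluster can be sketched in Python with the standard `unicodedata` module. The function `split_clusters` is a hypothetical helper that simply attaches every combining mark to the character before it. This is a deliberate simplification: full Unicode grapheme segmentation has more rules, and finding Thai word boundaries additionally requires a dictionary, which this sketch does not attempt.

```python
import unicodedata


def split_clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Simplified on purpose: any character whose general category starts
    with 'M' (a mark) is attached to the preceding character.
    """
    clusters: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and clusters:
            clusters[-1] += ch  # the mark stays with its base consonant
        else:
            clusters.append(ch)  # a new cluster starts here
    return clusters


# The Thai word for "here": two consonants, each carrying a vowel sign
# (U+0E35) and a tone mark (U+0E48) stacked above it.
word = "\u0e17\u0e35\u0e48\u0e19\u0e35\u0e48"
clusters = split_clusters(word)

print(len(word), "code points")
print(len(clusters), "clusters:", clusters)
```

The word has six code points but only two positions a reader would perceive as characters, so the only place the text could be cut without tearing a mark off its consonant is between the two clusters.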

With the complexities of different writing systems, a lot can easily go wrong if you use a word processor that does not support the technical requirements of a given language.