Unicode display problems?

As of Jan 1, 2013, this page is no longer being maintained.

Non-Latin script problems?
Multi-lingual web pages and Unicode

Non Latin Characters

Definitions * Assistance: Introduction * Assistance: Steps * Acknowledgements

Are you getting ???? or blank rectangles or question marks in black diamonds or Yíäýñíèé or other mojibake instead of the correct text for some languages? It's probably because your computer system can't display all our Unicode correctly. The good news is that most Unicode display problems can be fixed.
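As a rough illustration of where these symptoms come from, here is a minimal Python sketch. It is not taken from this site's pages; the sample word and the choice of encodings (Windows-1251, ISO-8859-1, ASCII, UTF-8) are assumptions picked only to reproduce the three kinds of garbage described above.

```python
# Minimal sketch: one piece of text, three ways it can go wrong on screen.
# The sample word and the encodings chosen are illustrative assumptions.
original = "Русский"  # "Russian", written in Cyrillic

# The bytes a page might be stored in (Windows-1251, a Cyrillic encoding).
cp1251_bytes = original.encode("cp1251")

# 1. Reading those bytes with the wrong encoding gives accented gibberish.
mojibake = cp1251_bytes.decode("latin-1")
print(mojibake)  # Ðóññêèé

# 2. Forcing the text into an encoding that lacks Cyrillic gives "????".
question_marks = original.encode("ascii", errors="replace").decode("ascii")
print(question_marks)  # ???????

# 3. Reading the bytes as UTF-8 gives U+FFFD, the replacement character
#    that many browsers draw as a question mark in a black diamond.
diamonds = cp1251_bytes.decode("utf-8", errors="replace")
print(diamonds)

# Reading the bytes with the encoding they were written in works fine.
print(cp1251_bytes.decode("cp1251"))
```

Blank rectangles are a different problem: the text was decoded correctly, but no installed font has a glyph for the character, so that case cannot be reproduced in a few lines of code.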


How do I fix Unicode display problems on my computer?

To display text in many different alphabets on one web page (e.g. Languages A-Z), we use Unicode, even though Unicode can create display problems for some computer systems. This web page offers solutions for those problems.

It may be that to "see" everything correctly on our Unicode pages, you only need to upgrade your browser and install, at most, one font, Code2000. Basically, you need:

  1. a Unicode-compatible operating system (see Assistance: Introduction);
  2. a Unicode-enabled browser (Assistance: Step 1); and
  3. Unicode-compatible font(s) (Assistance: Step 2);

    and then (depending on which languages you want to display) you may need to:

  4. configure your browser (Assistance: Step 3 and Assistance: Step 4).

See also Display Problems? on the Unicode site and Help: Multilingual support on Wikipedia.





Definitions: What is Unicode? * Encoding * Code * Language Script * Font

The Unicode (R) Consortium is a registered trademark, and Unicode (TM) is a trademark of Unicode, Inc.
  • What is Unicode? It is one of several systems (called encodings) that have been developed to manage the display of characters on-screen, but it is the first system that can assign a unique number (code) to every character in each of the world's major languages. (Other systems don't allow for enough characters and they also conflict with one another. That is, two encodings might use the same number for two different characters, or use different numbers for the same character.) Not all computer systems in current use are fully Unicode compatible.

    Windows 7 comes with full support for Unicode (whether you're using Firefox, Opera, Chrome, Safari or Internet Explorer), and Mac OS X 10.7 (Lion) is not far behind.

  • Encoding: a system of assigning numbers to characters (i.e. letters, punctuation, and mathematical notations) so a computer knows which character to display. Hundreds of different systems (encodings) have been developed and used. Unicode is one of them. Here are examples of how encodings are specified in the head of an HTML page:

    • charset=iso-8859-1 (for Western No.1),
    • charset=BIG5 (for Traditional Chinese), and
    • charset=utf-8 (for Unicode).

  • Code: the number assigned to the character. Problems happen when different encodings use the same code for two different characters, or use different codes for the same character. Synonyms for "code" that are also in use: code position, code number, code value, code element, code set value.

  • Language Script: the group of characters used to express a language in writing. Also called the "character set" or "character repertoire" or "alphabet" or "writing system" of a language.

  • Font: the font determines the way a character will actually look on the screen (or on a printed page). For instance, this "A" in a sans-serif font looks different than this "A" in a serif font, but it is still the same character. (The "A" and the "A" are known as different "glyphs" of the same character. A font is basically a collection of glyphs. Also note that "A" and "a" are two different characters.)

    Most fonts don't come close to containing all possible characters in the world. Instead they contain ranges (also called "blocks") of characters (e.g. in Unicode, the codes (i.e. numbers) for Arabic characters are found in the range of 0600 to 06FF). Unicode currently defines over 100 ranges, and for example, the newest, Unicode-compatible versions of:

    • Arial (with 2792 characters and 3381 glyphs) and
    • Times New Roman (with 2790 characters and 3380 glyphs),

    contain only 39 ranges, while the:

    • Akaash font (409 characters; 642 glyphs), specifically for Bengali,

    is also Unicode-compatible, yet contains only 4 ranges: Basic Latin; Latin-1 Supplement; Latin Extended-A; and Bengali.

  • NOTE: "language script" and "range" are sometimes synonymous, but some languages require characters from more than one range and even non-contiguous ranges (e.g. Vietnamese, and especially CJK (Chinese-Japanese-Korean)). CJK ideographs now encompass at least three ranges in two separate "planes" of Unicode.

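To make the definitions above a little more concrete, here is a small Python sketch using only the standard library. The helper names `describe` and `in_arabic_block` are made up for this illustration, and the Arabic block boundaries (U+0600 to U+06FF) are the ones given in the Unicode code charts.

```python
import unicodedata

# Hypothetical helper: report a character's code, official name and plane.
def describe(ch):
    code = ord(ch)        # the number ("code") assigned to the character
    plane = code >> 16    # each Unicode plane holds 65,536 codes
    return f"U+{code:04X}", unicodedata.name(ch, "<unnamed>"), plane

# The Arabic range ("block") runs from U+0600 to U+06FF inclusive.
ARABIC_BLOCK = range(0x0600, 0x0700)

def in_arabic_block(ch):
    return ord(ch) in ARABIC_BLOCK

# "A" and "a" are two different characters, so they have different codes.
print(describe("A"))
print(describe("a"))

# An Arabic letter falls inside the Arabic range; a Latin letter does not.
print(describe("\u0628"), in_arabic_block("\u0628"))
print(in_arabic_block("A"))

# CJK ideographs live in more than one plane.
print(describe("\u4e2d"))        # plane 0
print(describe("\U00020000"))    # plane 2
```

Note that none of this says anything about fonts: the code and the name belong to the character, while the glyph you actually see depends on whichever font your system picks.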


Assistance: Introduction

Because Unicode is a relatively recent development that wasn't consistently utilized within the wide range of operating systems that surfers have used over the years, i.e.:

      • Windows 95/98/ME/NT/2000/XP/Vista/Windows 7,
      • Mac OS 8/OS 9/OS X,
      • Linux (various versions), etc.,

or within the wide range of browsers (and browser versions):

      • Firefox,
      • Internet Explorer,
      • Opera,
      • Mozilla,
      • Netscape,
      • Chrome,
      • Safari, etc.,

not all computer systems are currently fully Unicode compatible.

  • Windows 7 comes with full support for Unicode, including fonts, whether you're using Firefox, Opera, Chrome, Safari or Internet Explorer, and Mac OS X 10.7 (Lion) is not far behind.

  • Some Unicode support has been included in Mac OS since Mac OS 8.5, but prior to Mac OS X (10) only limited use was made of it by applications.

  • Windows NT/2000/XP/Vista are based on Unicode, and some Unicode support has been included in Microsoft Windows since Windows 95.

  • I've never used Linux (well, except for that one time).

If you have display problems with some of the links and/or text on our pages, you can try the steps set out below. My intention is to bring together, in one place, useful information I found when I was trying to figure out how to fix my own display problems, and to make that information as easy to understand as possible. Do keep in mind, though, that you don't have to understand everything here in order to get the hoped-for results from carrying out the steps. Again, it may turn out that to "see" everything on our pages correctly, you only need to upgrade your browser and install, at most, one font.

The suggestions I offer come from my experience using the following browsers and operating systems:

  • with a Windows 7 operating system, I've used:
    • Firefox 11 & 12
    • Opera 11.6
    • Chrome 18
    • Safari 5.1, and
    • Internet Explorer (IE) 9.

  • with a Windows XP operating system, I've used:
    • Firefox 2 to 11
    • Netscape 7 & 8
    • Mozilla 1.5 & 1.7
    • Opera 7 to 11.6
    • Chrome 5 & 18
    • Safari 3, 4 & 5, and
    • Internet Explorer (IE) 6 to 9.

  • with a Windows 98 operating system, I've used:
    • Netscape 4.79 & 7
    • Mozilla 1.2.1 and 1.3b, and
    • Internet Explorer (IE) 5.5.

(I think some of my suggestions could be useful for those with Windows 2000, NT 4 and Vista, and maybe even Windows 95.)

Because I only do Windows, the best I can offer those with other operating systems is to send you off-site to:

although some of what I say below may be applicable.



Assistance: Step-by-step


Step 0: You need a Unicode-compatible operating system (see Introduction above for information)
Step 1: Selecting a browser
Step 2: Obtaining Unicode compatible fonts
Step 3: Configuring your browser by selecting fonts
Step 4: Configuring your browser by selecting encodings

NOTE: Most encodings are still used somewhere on the web, and these steps can be applied to all encodings, not just Unicode. However, if you are interested in viewing pages in a different encoding, such as Big5 (for Traditional Chinese) for example, in Step 2 you would need to make sure you had Big5-compatible fonts, rather than Unicode-compatible fonts.


Step 1: Selecting a browser


NOTE: For the greatest success, upgrade your browser to the latest version: IE, Firefox, Opera, Safari, Chrome.

After I went through everything in Steps 1 to 4 above, and then browsed the Unicoded HotPeachPages and EarthWords pages:

  • Actually, with Windows 7, I didn't have to go through any of the steps above, because Windows 7 comes with full support for Unicode, including fonts!! (except maybe Myanmar languages). All I had to do was turn on my new Windows 7 computer, use the built-in Internet Explorer to download the other browsers I'm reporting on, and then with:

    • Firefox 11
    • Opera 11.6
    • Chrome 18
    • Safari 5.1.4 and
    • Internet Explorer 9,

    I did not have any character display problems at all (and you shouldn't either, except maybe Myanmar languages).

  • On Windows XP, Firefox 2 to 11, Opera 8, 9 & 10 and Netscape 7 & 8 displayed everything pretty much correctly. (Conjuncts & re-ordering for Khmer didn't work properly until I installed KhmerUnicode2 (for Windows XP) on April 22/10.)

  • IE 5.5 (Win 98), and 6, 7 & 8 (Win XP) displayed everything pretty much correctly. (Again, conjuncts & re-ordering for Khmer didn't work properly until I installed KhmerUnicode2 (for Windows XP) on April 22/10.)
  • Caveat: On Win XP, for the HTML title attribute, IE 8 displays empty rectangles for Amharic, Sinhala and Tigrigna (even though the text for the link itself displays fine), whereas Moz-based browsers and Opera display the title text correctly. To see if you have the same issue, go to Domestic violence is more than just physical abuse using IE, and hover over the Amharic and Tigrigna language links at the top of the page to make the title boxes pop up. Let me know by email if you know how to fix it, or if you don't even have the issue in IE 8.
  • On Windows XP, Chrome 5 to 18 didn't display Sinhala and had the same issue described above for IE in the Caveat. Everything else was pretty much fine.

  • Netscape 7 and Mozilla 1.2 & 1.3 (Win 98), and Mozilla 1.5 & 1.7 and Opera 7.2 & 7.5 (Win XP) all displayed Arabic and Hebrew correctly right-to-left, but didn't produce conjuncts or re-ordering for Indic scripts.

  • Netscape 4.79 (Win 98) and Opera 7.1 (Win XP) displayed Arabic and Hebrew left-to-right, i.e. incorrectly, and didn't produce conjuncts or re-ordering for Indic scripts.

  • Safari:

For more information about these and other browsers, go to:


Step 2: Obtaining Unicode compatible fonts

Make sure you have a Unicode-compatible font for either all the Unicode ranges, or for each of the language scripts you want to be able to display.
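If you aren't sure which language scripts a piece of text actually uses (and so which fonts you might need), the following Python sketch gives a rough answer. It is only a heuristic I'm assuming for illustration: it takes the first word of each letter's official Unicode name as a stand-in for its script, which works for common cases but is not a real script lookup. The function name `rough_scripts` is made up.

```python
import unicodedata
from collections import Counter

# Hypothetical helper: count letters per "script", where the script is
# guessed from the first word of the character's Unicode name
# (e.g. "CYRILLIC CAPITAL LETTER PE" -> "CYRILLIC"). This is a heuristic,
# not an official script property.
def rough_scripts(text):
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits, punctuation and combining marks
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return dict(counts)

sample = "Hello Привет 中文"
print(rough_scripts(sample))
```

For the sample above the tally comes out as 5 Latin, 6 Cyrillic and 2 CJK letters, so you would want fonts covering those three scripts.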

NOTE: To see what fonts you already have on your system, look in your Control Panel under Fonts. This will also show you the location of your Fonts folder for when you want to install a new font.


  • Easiest: If you have either of the two currently available universal fonts:

    • Arial Unicode MS (with almost 39,000 characters and over 50,000 glyphs in 65 ranges) was originally supplied through Microsoft Office 2000 and later, FrontPage 2000 and later, and Publisher 2002 and later, and was bundled with Mac OS X v10.5 and later. Now it is supplied with Windows 7. If you don't have any of these products, Arial Unicode MS can be purchased from Ascender Corporation, which licenses it from Microsoft,

      OR

    • Code2000* (over 50,000 characters and 60,000 glyphs in 105 ranges) is a free download, $5 honour-system registration,

    you should be OK for most languages on our pages. In other words, to "see" everything on our pages, as I've said, you only need to upgrade your browser and install, at most, Code2000. Easy. (And the reason it's so easy, and inexpensive, is because James Kass worked on Code2000 for years as a labour of love and then basically gifted it to the world. James, you rock!)

    *Note: Code2000 is OK in a pinch but not recommended for Chinese Simplified or Traditional, or for Japanese, and Arial Unicode MS is not OK for Lao (as of Office XP), but anyone who can read them probably already has appropriate fonts on their computer.


  • Extra work: Because fonts designed for just one particular language script often present that script better than fonts that display several scripts, you may want to download further specific Unicode-compatible fonts for certain languages. On our EarthWords pages, for instance, we code a preference for the following fonts:


    and we leave the rest up to the user's choices in Step 3, for which you need at least:

    • Arial Unicode MS (again, originally supplied through Microsoft Office 2000 and later, FrontPage 2000 and later, and Publisher 2002 and later, and bundled with Mac OS X v10.5 and later. Now it is supplied with Windows 7. If you don't have any of these products, Arial Unicode MS can be purchased from Ascender Corporation, which licenses it from Microsoft), OR

    • Code2000 (again, free download, $5 honour-system registration).

    In other words, to "see" everything on our pages almost exactly the way we intended, you only need to upgrade your browser and install, at most, five or six fonts. No big deal.


  • Maximum effort: Because sites other than ours will prompt for fonts other than those mentioned above, you may want to download a whole whack of fonts. I suggest starting at Alan Wood's Unicode Resources*.

    *NOTE: even though this page of Alan's is entitled "Unicode Fonts for Windows computers", it also has links for Mac and Unix.

    *ALSO: Raghindi (listed on Alan's page under Devanagari Fonts) has been known to cause a conflict with other fonts on Windows 9x, including Code2000. It seems that many fonts produced for Windows 2000-and-up lack the ASCII characters required for backwards compatibility on earlier versions of Windows. Installing such fonts on Win 9x is not recommended, as they have a tendency to "take over" the system. Raghindi is the only one I know about, but apparently there are others.


Step 3: Configuring your browser by selecting fonts

This is where you can choose a font for each language (aka writing system aka language script), but most languages are displayed fine with the default font your browser has chosen, so really, you only need to go in there if you don't like the default font for a particular language, or if a particular language is not displaying correctly with the default font. Here's where you go to select fonts for various browsers:

  • IE: Tools > Internet Options > Fonts > Language script

  • Opera: Tools/Settings > Preferences > Advanced > Fonts > International Fonts > Writing system

  • Firefox: Tools > Options > Content > Advanced (Fonts & Colors) > Fonts for

  • Netscape 8: Tools > Options > Browser Options, General > Fonts & Colors > Fonts for

  • Safari: Edit > Preferences > ?

This step reveals a significant difference between Mozilla-based browsers (Firefox, Netscape and Mozilla) on the one hand, and IE (& Opera) on the other:

  1. for any particular language, IE and Opera have you choose only from fonts that will work with that language (usually no more than 10 will be on the list on my system).

  2. for every language, Mozilla browsers give you every font on your system to choose from (hundreds on mine), and if you have no idea what you are looking for, you'll be lost.

So what I do is, I use IE to see which fonts work with a particular language, and then I know what to look for in Firefox. If there's no font listed for a particular language, and it isn't displaying correctly, you have to go back to Step 2: Obtaining Unicode compatible fonts.

Alan Wood offers directions for configuring various browsers (not the latest versions, but probably still helpful) at Unicode and Multilingual Web Browsers. To help with the decisions about which fonts to choose for what, the following chart sets out font options for Netscape encodings and for IE language scripts that should work (it's very outdated, but I just can't bring myself to delete it.)

Chart adapted from Yale University Library Workstation Support Group
(all fonts listed below should be available at Alan Wood's Unicode Resources)

Netscape (4.x and up) Font Options

| Encoding | Variable width font | Fixed width font |
|---|---|---|
| Western (ISO-8859-1) | (any number of options) | (any number of options) |
| Central European (ISO-8859-2) (Windows-1250) | Bitstream Cyberbit, Times New Roman | Courier New |
| Japanese (Auto-Detect) (Shift-JIS) (EUC-JP) | Arial Unicode MS, MS Gothic | Arial Unicode MS, MS Gothic |
| Traditional Chinese (Big5) (EUC-TW) | Arial Unicode MS, MingLiU | Arial Unicode MS, MingLiU |
| Simplified Chinese (GB2312) | Arial Unicode MS, MS Song | Arial Unicode MS, MS Hei |
| Korean (Auto-Detect) | Arial Unicode MS, Code2000, GulimChe | Arial Unicode MS, Code2000, GulimChe |
| Cyrillic (KOI8-R) (ISO-8859-5) (Windows-1251) (CP866) | Arial Unicode MS, Code2000, Times New Roman | Courier New |
| Baltic (ISO-8859-4) (Windows-1257) | Arial Unicode MS, Code2000, Times New Roman | Courier New |
| Greek (ISO-8859-7) (Windows-1253) | Arial Unicode MS, Code2000, Times New Roman | Courier New |
| Turkish (ISO-8859-9) | Arial Unicode MS, Bitstream Cyberbit, Code2000, Times New Roman | Courier New |
| Unicode (UTF-8) (UTF-7) | Arial Unicode MS, Code2000 | Arial Unicode MS, Code2000 |
| UserDefined | Arial Unicode MS, Code2000 | Courier New, Courier New Baltic |

IE (5.5/6) Font Options

| Language script | Web page font | Plain text font |
|---|---|---|
| Arabic | Arabic Transparent, Arial Unicode MS, Bitstream Cyberbit, Tahoma, Traditional Arabic & ... | |
| Armenian | Arial Unicode MS, Code2000 | |
| Bengali | Akaash, Arial Unicode MS, Code2000 | |
| Braille | Code2000 | |
| Burmese | | |
| CanSyllabic | Aboriginal Serif, Aboriginal Serif Unicode, Ballymun RO, Code2000 | |
| Cherokee | Aboriginal Serif, Code2000 | |
| Chinese Simplified | Arial Unicode MS, Bitstream Cyberbit, MS Hei, MS Song, SimSun-18030 | MS Hei, MS Song |
| Chinese Traditional | Arial Unicode MS, Bitstream Cyberbit, MingLiU | MingLiU |
| Cyrillic | Times New Roman & ... | Courier New, Andale Mono, Lucida Console |
| Devanagari | Alpha-demo, Arial Unicode MS, Code2000, shiDeva | |
| Ethiopic | Code2000, Ethiopia Jiret, GF Zemen Unicode, TITUS Cyberbit Basic | Ethiopia Jiret |
| Georgian | Arial Unicode MS, Code2000, TITUS Cyberbit Basic | |
| Greek | Times New Roman & ... | Courier New, Andale Mono, Lucida Console |
| Gujarati | Arial Unicode MS, Code2000, Shruti | |
| Gurmukhi | Arial Unicode MS, Code2000, Raavi | |
| Hebrew | David, Miriam & ... | Miriam Fixed, Fixed Miriam Transparent, Rod |
| Japanese | Arial Unicode MS, Bitstream Cyberbit, MS Gothic, MS Mincho | MS Gothic, MS Mincho |
| Kannada | Arial Unicode MS, Code2000, Tunga | |
| Khmer | Code2000, Khmer OS | |
| Korean | Arial Unicode MS, Batang, Bitstream Cyberbit, Code2000, GulimChe | GulimChe |
| Lao | Saysettha Unicode, Saysettha OT, VangVieng Unicode, XiengThong Unicode, Alice5 Unicode, Alice3 Unicode, Alice4 Unicode, Alice0 Unicode, Alice1 Unicode, Alice2 Unicode | |
| Latin based | (any number of options) | Courier New .... |
| Malayalam | Arial Unicode MS, Code2000, Kartika | |
| Mongolian | Code2000 (?) | |
| Ogham | Code2000, TITUS Cyberbit Basic | |
| Oriya | Arial Unicode MS, Code2000 | |
| Runic | Aboriginal Serif Unicode, Code2000, TITUS Cyberbit Basic | |
| Sinhala | Dinamina, Potha | |
| Syriac | Code2000, Estrangelo Edessa, TITUS Cyberbit Basic | |
| Tamil | Arial Unicode MS, Code2000, Latha, TabAvarangal2 | |
| Telugu | Arial Unicode MS, Code2000, Gautami | |
| Thaana | Code2000, Mv Boli, TITUS Cyberbit Basic | |
| Thai | Cordia New, Angsana New, Arial Unicode MS, Bitstream Cyberbit, Code2000, IrisUPC, Microsoft Sans Serif, Saysettha OT, Tahoma | Courier Mono Thai |
| Tibetan | Arial Unicode MS, NSimSun-18030, SimSun-18030 | |
| UserDefined | Arial Unicode MS & all | Courier New ALA ... |
| Yi | Code2000, NSimSun-18030, SimSun-18030 | |


Step 4: Configuring your browser by selecting the right encoding

You really only need to do this if the text on a page is gibberish. When that happens, the first thing you want to check is what encoding your browser is using. It may need changing. It's quite easy to check and even change encoding. Just click on 'View' on the top menu bar of any browser and then click:

  • 'Character Encoding' for the Mozilla browsers (Firefox, Netscape and Mozilla);
  • 'Encoding' for IE and Opera; and
  • 'Text Encoding' for Safari

The encoding with the dot or check mark is the one being used. You can take an educated guess as to what you should change to, depending on what language the gibberish is supposed to be. For example, choose one of the Japanese encodings if the gibberish is supposed to be Japanese. Then just keep choosing till it works.
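The same "keep choosing till it works" approach can be sketched in Python for a file you have saved to disk. This is only an illustration under my own assumptions: the function name `try_encodings` and the list of Japanese candidates are made up for the example, and the sample bytes are generated in the script rather than read from a real page.

```python
# Hypothetical helper: try each candidate encoding in turn and keep the
# ones that decode without errors.
def try_encodings(raw, candidates):
    results = []
    for encoding in candidates:
        try:
            results.append((encoding, raw.decode(encoding)))
        except UnicodeDecodeError:
            continue  # this encoding cannot read the bytes; try the next
    return results

# Assumed candidate list for text that is supposed to be Japanese.
JAPANESE_CANDIDATES = ["utf-8", "shift_jis", "euc_jp", "iso2022_jp"]

# Stand-in for the bytes of a saved page: "Japanese language" in Shift-JIS.
raw = "日本語".encode("shift_jis")

for encoding, text in try_encodings(raw, JAPANESE_CANDIDATES):
    print(encoding, text)
```

A clean decode is not proof that the encoding is right, since some encodings will accept almost any bytes. Just as in the browser, you still have to look at the result and judge whether it reads as real text.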

More detailed directions on how to select encodings for various versions of different browsers (again, not the latest versions, but probably still helpful) can be found on the same pages where the directions for Step 3 are located (i.e. go to Alan Wood's Unicode and Multilingual Web Browsers, click on a browser, then scroll down to the end of the instructions for selecting Fonts till you see the instructions for Encodings).



Acknowledgements

Thank you especially to James Kass (last archive copy of James Kass' website on the Wayback Machine), Jukka "Yucca" Korpela and Alan Wood. Were it not for their work and excellent material freely available on the web (and James Kass' generous help and suggestions), I would understand very little about encoding systems, or about Unicode and how to use it, and the above would not exist.

1. We are not responsible for the quality of the resources listed. They are provided as a reference only.




every country · every shelter · every tongue