With the exception of the Thaana script of the Maldives, all scripts of India are derived from the Brahmi script. These scripts are syllabic alphabets in that they consist of consonant symbols with attached vowel signs. Encoding these scripts has been a challenge for several reasons:
These facts combine to make font development technically difficult, and many companies have not been motivated to work on them until recent years.
Fortunately, progress has been made in recent years, especially for the most common scripts: Devanagari, Gujarati and Gurmukhi. Many freeware fonts and utilities are also available. See the list of scripts below to find more details.
Because of the complex placement of vowel signs for these languages, Unicode fonts are not interchangeable between platforms. OTF fonts work in Windows, but not perfectly in OS X. Apple fonts use ATSUI technology instead. It is better to use South Asian fonts from Microsoft and Apple whenever possible.
Windows Support
Windows XP Supports
Windows XP Service Pack Two Adds
Windows Vista Adds
Macintosh Support
Macintosh Supports
System 10.4 (Tiger) Adds
Freeware Utilities are Available for
X11 Unix Environment
Linux/Unix
Encoding: utf-8 (Unicode), ISCII (older), ITRANS (older)
Use Unicode to develop new pages.
One option is to use FrontPage, Netscape/Mozilla Composer or Dreamweaver and change the keyboard to the correct script. Make sure you specify the encoding in the Web page header.
Another option is to compose the basic text in an international or foreign language text editor or word processor and export the content as an HTML or text file with the appropriate encoding. This file can then be opened in another HTML editor such as FrontPage or Dreamweaver and edited for formatting.
For short texts, such as the yoga om sign (ॐ = &#2384;), it may be simpler to enter numeric HTML entity codes, which are based on the Unicode code point of each character.
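As a rough illustration of how a numeric entity code relates to a character, the following Python sketch converts a short string into decimal HTML entities and back again. The function name to_numeric_entities is made up for this example; html.unescape comes from the Python standard library.

```python
import html


def to_numeric_entities(text):
    """Return text with every character written as a decimal HTML entity.

    Hypothetical helper for illustration: the number inside the entity
    is simply the decimal value of the Unicode code point.
    """
    return "".join(f"&#{ord(ch)};" for ch in text)


om_entity = to_numeric_entities("ॐ")
print(om_entity)                 # &#2384;
print(html.unescape(om_entity))  # ॐ
```

This approach is only practical for a few characters at a time; longer passages are easier to maintain as real Unicode text in a utf-8 file.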
Before the development of Unicode encoding, the government of India had developed a standard called ISCII (Indian Script Code for Information Interchange). In this standard, similar characters in different scripts were assigned the same character number. For instance, Devanagari क (ka) and Gujarati ક (ka) were assigned the same code point. However, most modern development is in Unicode.
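The shared-position idea can still be seen in Unicode, whose blocks for these scripts are generally described as following the ISCII arrangement. The Python sketch below does not use ISCII byte values at all; it only compares where each "ka" sits inside its own Unicode block. The block start values are the published Unicode block starts; the helper names are made up for this example, and shifting between blocks is an assumption that holds only where both scripts define a character at the same offset.

```python
import unicodedata

# Published starting code points of the two Unicode blocks.
DEVANAGARI_START = 0x0900
GUJARATI_START = 0x0A80


def block_offset(ch, block_start):
    """Position of a character relative to the start of its block."""
    return ord(ch) - block_start


def devanagari_to_gujarati(ch):
    """Hypothetical helper: move a character to the same offset in the
    Gujarati block. Not every offset is assigned in both scripts."""
    return chr(ord(ch) - DEVANAGARI_START + GUJARATI_START)


deva_ka = "क"
gujr_ka = "ક"
print(unicodedata.name(deva_ka), hex(block_offset(deva_ka, DEVANAGARI_START)))
print(unicodedata.name(gujr_ka), hex(block_offset(gujr_ka, GUJARATI_START)))
print(devanagari_to_gujarati(deva_ka) == gujr_ka)
```

Both letters sit at the same offset within their blocks, which mirrors the ISCII design of one shared character number per sound across scripts.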
Computers process text by assuming a certain encoding or a system of matching electronic data with visual text characters. Whenever you develop a Web site you need to make sure the proper encoding is specified in the header tags; otherwise the browser may default to U.S. settings and not display the text properly.
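One way to check whether an existing page already declares an encoding is to scan its meta tags. The Python sketch below uses the standard library's html.parser module; the class name CharsetFinder and the function declared_charset are made up for this example, and the sketch only looks at meta tags, not at HTTP headers sent by the server.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Records the first charset declared in a <meta> tag, if any."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        # Short form: <meta charset="utf-8">
        if attributes.get("charset"):
            self.charset = attributes["charset"].strip().lower()
            return
        # Long form: <meta http-equiv="Content-Type" content="...; charset=...">
        if (attributes.get("http-equiv") or "").lower() == "content-type":
            for part in (attributes.get("content") or "").split(";"):
                name, _, value = part.strip().partition("=")
                if name.strip().lower() == "charset" and value.strip():
                    self.charset = value.strip().lower()


def declared_charset(html_text):
    """Return the declared charset in lowercase, or None if none is found."""
    finder = CharsetFinder()
    finder.feed(html_text)
    return finder.charset


page = '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
print(declared_charset(page))                       # utf-8
print(declared_charset("<head><title>x</title></head>"))  # None
```

A result of None means the browser is left to guess the encoding from its defaults.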
To declare an encoding, insert or inspect the following meta-tag at the top of your HTML file, then replace "???" with one of the encoding codes listed above. If you are not sure, use utf-8 as the encoding.
Generic Encoding Template
<head>
<meta http-equiv="Content-Type" content="text/html; charset=???">
...
</head>
Declare Unicode
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
...
</head>
If you are using XHTML, the final close slash must be included after the final quote mark in the encoding header tag.
Declare Unicode in XHTML
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
...
</head>
If no encoding is declared, then the browser uses the default setting, which in the U.S. is typically Latin-1. In that case many Unicode characters could be displayed incorrectly. Also, older browsers such as Netscape 4.7 may not be able to process the entity codes correctly without the "utf-8" declaration.
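The effect of a wrong default can be reproduced outside the browser. The Python sketch below saves a short Devanagari string as utf-8 bytes and then reads those bytes as Latin-1, which is roughly what happens when a utf-8 page has no declaration. The function name misread_as_latin1 and the sample text are made up for this example; real browsers may apply a slightly different default such as Windows-1252.

```python
def misread_as_latin1(text):
    """Encode text as utf-8, then decode the bytes as Latin-1.

    Hypothetical helper: each utf-8 byte becomes one unrelated
    Latin-1 character, so the original script is lost on screen.
    """
    return text.encode("utf-8").decode("latin-1")


original = "ॐ नमस्ते"
garbled = misread_as_latin1(original)

print(len(original), "characters in the original")
print(len(garbled), "characters after the misreading")

# The bytes themselves are intact, so reversing the mistake restores the text.
recovered = garbled.encode("latin-1").decode("utf-8")
print(recovered == original)  # True
```

Because the underlying bytes are unchanged, adding the correct utf-8 declaration is enough to fix the display; the file does not need to be retyped.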
Language tags are also suggested so that search engines and screen readers can identify the language of a page. These are metadata tags which indicate the language of a page, not devices to trigger translation. Visit the Language Tag page to view information on where to insert them.
In some cases, your best option may be to use PDF files or image files. See the Web Development Tips section for more details.
These pages cover internationalization of South Asian scripts in general.
