Teaching and Learning with Technology
Computing With Accents and Foreign Scripts

Declare the Encoding

If you create a Web site, it is good practice to declare the encoding. Properly encoded Web pages declare the encoding to a browser through a meta tag in the header. Without this tag, a browser may not know to switch to the proper encoding and characters may be displayed as gibberish.

Some example declarations for common encodings are given below. If you are not sure which encoding to declare, refer to the individual By Language Page or check which encoding is declared on other Web sites written in the language.

Sample Encoding Declarations

Unicode (Any Language)

The encoding meta tag is placed in the header. The encoding name (e.g. utf-8 for Unicode) is declared after the charset= specification at the end of the tag.

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
...
</head>

XHTML

There are two declarations: the encoding attribute in the initial XML declaration and the charset meta tag (with a final slash). Both should be included for cross-browser compatibility.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=UTF-8" />
...
</head>

Note: These tags should be included even though XML is theoretically Unicode by default. Not all browsers will parse a page as Unicode unless the meta tag is present.

Latin 1 (English, Spanish, French, German, etc.)

<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
...
</head>

NOTE: It IS good practice to declare the encoding even for an English Web site. One function of the tag is to "reset" the user's browser back to Latin-1 and ensure proper font settings. The Unicode "utf-8" encoding also ensures that any special characters inserted, such as "Smart quotes", currency symbols, em-dashes and so forth, will be properly displayed in most browsers.
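The point about special characters can be checked outside a browser. The following is a minimal Python sketch, not part of the original page: the helper name can_encode and the three sample characters are assumptions chosen for illustration. It tests whether each character can be stored at all in Latin-1 versus utf-8.

```python
# Hypothetical sample characters of the kind a word processor inserts.
samples = {
    "left smart quote": "\u201c",
    "euro sign": "\u20ac",
    "em dash": "\u2014",
}


def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for name, char in samples.items():
    # Each line shows: name, fits in Latin-1?, fits in utf-8?
    print(name, can_encode(char, "iso-8859-1"), can_encode(char, "utf-8"))
```

None of the three samples fits in Latin-1, while all of them fit in utf-8, which is one reason a utf-8 declaration can be the safer choice even for an English page.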

Other Scripts (e.g. Windows-1251 for Cyrillic)

<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">
...
</head>

See the individual By Language Page for other encodings, or open pages written in your script and use the View Source window to see which encodings are generally used.
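Checking the source of several pages by hand can be tedious, so the lookup can also be scripted. The sketch below uses Python's standard html.parser module; the class CharsetFinder, the function declared_charset, and the sample string are hypothetical names introduced here, and the code only recognizes the http-equiv style of declaration shown on this page.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Record the charset from the first Content-Type meta tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "content-type":
            return
        # The content value looks like "text/html; charset=utf-8".
        for part in (attributes.get("content") or "").split(";"):
            name, _, value = part.strip().partition("=")
            if name.strip().lower() == "charset" and value.strip():
                self.charset = value.strip().lower()


def declared_charset(html_source):
    """Return the declared charset in lowercase, or None if none is found."""
    finder = CharsetFinder()
    finder.feed(html_source)
    return finder.charset


# Hypothetical page source, reusing the Cyrillic declaration from above.
sample = ('<head><meta http-equiv="Content-Type" '
          'content="text/html; charset=windows-1251"></head>')
print(declared_charset(sample))
```

A page with no such meta tag yields None, which corresponds to the case where the browser falls back to its default encoding.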

If no encoding is declared, then the browser uses the default setting, which in the U.S. is typically Latin-1. If the page is actually in some other script, but no encoding is specified, the browser will use a Roman alphabet font and display gibberish.
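The gibberish comes from reading the right bytes with the wrong table. As a rough illustration in Python (the sample word and the variable names are assumptions, not taken from any real page), the same Cyrillic bytes are decoded once with a Latin-1 default and once with the declared Windows-1251 encoding.

```python
# Hypothetical sample: the Russian word for "hello".
text = "Привет"

# The bytes a Windows-1251 page would actually contain.
page_bytes = text.encode("windows-1251")

# No declaration: the bytes are read with the Latin-1 default.
as_default = page_bytes.decode("iso-8859-1")

# With the declaration: the bytes are read as intended.
as_declared = page_bytes.decode("windows-1251")

print(as_default)
print(as_declared)
```

The first line printed is a run of accented Roman letters, and the second is the original Cyrillic word. The bytes are identical in both cases; only the declared encoding differs.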

©Penn State University, 2000-2007.
This Web page maintained by Teaching and Learning with Technology, a unit of Information Technology Services. For questions or comments on this Web page, please contact Elizabeth J. Pyatt (ejp10@psu.edu).
Unicode character names and hexadecimal entity codes are taken from the public Unicode Character Charts.

Last Modified: Thursday, 26-Jul-2007 16:12:26 EDT