

# HTML Unicode (UTF-8) Reference Manual

* * *

## Unicode Consortium

The Unicode Consortium develops the Unicode Standard. Its goal is to replace the existing character sets with a single universal one, encoded with the standard Unicode Transformation Formats (UTF).

The Unicode Standard has been widely adopted and is implemented in HTML, XML, Java, JavaScript, e-mail, ASP, PHP, and more. It is also supported by many operating systems and all modern browsers.

The Unicode Consortium collaborates with leading standards development organizations, including ISO, W3C, and ECMA.

* * *

## Unicode Character Sets

The Unicode character set can be stored using different encodings. The most common are UTF-8 and UTF-16:

| Encoding | Description |
| --- | --- |
| UTF-8 | A character in UTF-8 is 1 to 4 bytes long. UTF-8 can represent any character in the Unicode Standard and is backward compatible with ASCII. It is the preferred encoding for e-mail and web pages. |
| UTF-16 | The 16-bit Unicode Transformation Format is a variable-length encoding (2 or 4 bytes per character) capable of encoding the entire Unicode repertoire. UTF-16 is primarily used in operating systems and environments such as Microsoft Windows, Java, and .NET. |

**Note:** The first 128 characters of Unicode (which correspond one-to-one with ASCII) are encoded in UTF-8 as a single byte with the same binary value as in ASCII, so valid ASCII text is also valid UTF-8 text.

**Note:** All HTML 4 processors support UTF-8, and all HTML5 and XML processors support both UTF-8 and UTF-16.

* * *

## HTML5 Standard: Unicode UTF-8

Because the ISO-8859 character sets are limited in size and incompatible with one another in multilingual environments, the Unicode Consortium developed the Unicode Standard.

The Unicode Standard covers (almost) all characters, punctuation, and symbols.

Unicode makes text processing, storage, and transport independent of platform and language.
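The byte lengths given for UTF-8 and UTF-16 in the encoding table above, and the ASCII compatibility mentioned in the first note, can be checked directly. The following is a minimal Python sketch rather than part of the original reference; the sample characters and the helper names `utf8_length` and `utf16_length` are assumptions chosen for illustration.

```python
# Sample characters chosen for illustration (an assumption, not from the
# reference): U+0041 "A", U+00E9, U+20AC (euro sign), U+1F600 (an emoji).
samples = ["\u0041", "\u00e9", "\u20ac", "\U0001F600"]


def utf8_length(ch: str) -> int:
    """Return the number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def utf16_length(ch: str) -> int:
    """Return the number of bytes one character occupies in UTF-16 (no BOM)."""
    return len(ch.encode("utf-16-le"))


for ch in samples:
    print(
        f"U+{ord(ch):04X}  "
        f"UTF-8: {utf8_length(ch)} byte(s)  "
        f"UTF-16: {utf16_length(ch)} byte(s)"
    )

# Pure ASCII text yields identical bytes whether encoded as ASCII or UTF-8.
ascii_text = "Hello"
print(ascii_text.encode("ascii") == ascii_text.encode("utf-8"))
```

For these four samples the UTF-8 lengths are 1, 2, 3, and 4 bytes, while UTF-16 uses 2 bytes for the first three and 4 bytes for the last, which lies outside the Basic Multilingual Plane.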
**The default character encoding in HTML5 is UTF-8.**

Below are some of the Unicode character ranges that HTML5 supports through UTF-8:

| Character Set | Decimal | Hexadecimal |
| --- | --- | --- |
| Basic Latin | 0-127 | 0000-007F |
| Latin-1 Supplement | 128-255 | 0080-00FF |
| Latin Extended-A | 256-383 | 0100-017F |
| Latin Extended-B | 384-591 | 0180-024F |

If an HTML5 page uses a character set other than UTF-8, it must be specified in the `<meta>` tag in the document head, for example `<meta charset="ISO-8859-1">`.
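Why the declared character set has to match the bytes actually served can be illustrated outside the browser. The sketch below is a simplified Python illustration, not a model of real browser behavior (browsers apply additional detection and fallback rules); the helper name `decode_as` and the sample text are hypothetical.

```python
def decode_as(data: bytes, charset: str) -> str:
    """Decode bytes the way a reader trusting a declared charset would."""
    return data.decode(charset)


text = "caf\u00e9"  # sample word with one non-ASCII character (U+00E9)

utf8_bytes = text.encode("utf-8")         # U+00E9 becomes two bytes: C3 A9
latin1_bytes = text.encode("iso-8859-1")  # U+00E9 becomes one byte: E9

# Declaration matches the bytes: the text round-trips unchanged.
print(decode_as(utf8_bytes, "utf-8") == text)
print(decode_as(latin1_bytes, "iso-8859-1") == text)

# Declaration does not match: the two UTF-8 bytes C3 A9 are read as two
# separate ISO-8859-1 characters (U+00C3 and U+00A9), so the text is garbled.
garbled = decode_as(utf8_bytes, "iso-8859-1")
print(garbled)
```

The last line prints five characters instead of four, which is the kind of garbling a page shows when its declared character set and its actual encoding disagree.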
← Ref Html SymbolsRef Html 8859 β†’