MySQL 5.5 Reference Manual

9.1.10. Unicode Support

9.1.10.1. The ucs2 Character Set (UCS-2 Unicode Encoding)
9.1.10.2. The utf16 Character Set (UTF-16 Unicode Encoding)
9.1.10.3. The utf32 Character Set (UTF-32 Unicode Encoding)
9.1.10.4. The utf8 Character Set (Three-Byte UTF-8 Unicode Encoding)
9.1.10.5. The utf8mb3 “Character Set” (Alias for utf8)
9.1.10.6. The utf8mb4 Character Set (Four-Byte UTF-8 Unicode Encoding)

The initial implementation of Unicode support (in MySQL 4.1) included two character sets for storing Unicode data:

ucs2, the UCS-2 encoding of the Unicode character set using 16 bits per character
utf8, a UTF-8 encoding of the Unicode character set using one to three bytes per character

These two character sets support the characters from the Basic Multilingual Plane (BMP) of Unicode Version 3.0. BMP characters have these characteristics:

Their code values are between 0 and 65535 (or U+0000 .. U+FFFF)
They can be encoded with a fixed 16-bit word, as in ucs2
They can be encoded with 8, 16, or 24 bits, as in utf8
They are sufficient for almost all characters in major languages
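
The size claims above can be checked with a short Python sketch. It is illustrative only and does not involve MySQL. Python has no `ucs2` codec, so `utf-16-be` stands in for it, on the assumption that the two encodings coincide for BMP characters outside the surrogate range. The helper name `bmp_sizes` is hypothetical.

```python
def bmp_sizes(ch: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for one BMP character."""
    if len(ch) != 1 or ord(ch) > 0xFFFF:
        raise ValueError("expected a single BMP character")
    # utf-16-be stands in for ucs2 here (assumption: identical for BMP characters).
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


for ch in ("A", "\u00e9", "\u20ac"):  # U+0041, U+00E9, U+20AC
    utf8_len, ucs2_len = bmp_sizes(ch)
    print(f"U+{ord(ch):04X}: utf8 = {utf8_len} byte(s), ucs2 = {ucs2_len} bytes")
```

The three sample characters need one, two, and three bytes in UTF-8 respectively, and always two bytes in the fixed-width encoding.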

Characters not supported by the aforementioned character sets include supplementary characters that lie outside the BMP. As of MySQL 5.5.3, Unicode support includes supplementary characters, which requires new character sets that have a broader range and therefore take more space. The following table shows a brief feature comparison of previous and current Unicode support.

Before MySQL 5.5	MySQL 5.5 and up
All Unicode 3.0 characters	All Unicode 5.0 characters
No supplementary characters	With supplementary characters
`ucs2` character set, BMP only	No change
`utf8` character set for up to three bytes, BMP only	No change
	New `utf8mb4` character set for up to four bytes, BMP or supplementary
	New `utf16` character set, BMP or supplementary
	New `utf32` character set, BMP or supplementary

These changes are upward compatible. However, if you want to use the new character sets, be aware of potential incompatibility issues for your applications; see Section 9.1.11, “Upgrading from Previous to Current Unicode Support”. That section also describes how to convert tables from utf8 to the (four-byte) utf8mb4 character set, and what constraints may apply in doing so.

MySQL 5.5 supports these Unicode character sets:

ucs2, the UCS-2 encoding of the Unicode character set using 16 bits per character
utf16, the UTF-16 encoding for the Unicode character set; like ucs2 but with an extension for supplementary characters
utf32, the UTF-32 encoding for the Unicode character set using 32 bits per character
utf8, a UTF-8 encoding of the Unicode character set using one to three bytes per character
utf8mb4, a UTF-8 encoding of the Unicode character set using one to four bytes per character

ucs2 and utf8 support BMP characters. utf8mb4, utf16, and utf32 support BMP and supplementary characters.
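
To make the distinction between BMP and supplementary characters concrete, here is a minimal Python sketch (not MySQL code). The helper `needs_utf8mb4` is a hypothetical name. It assumes that a string needs one of the newer character sets exactly when it contains a code point above U+FFFF.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if text contains a supplementary (non-BMP) character."""
    return any(ord(ch) > 0xFFFF for ch in text)


bmp_text = "caf\u00e9"         # BMP characters only
emoji_text = "ok \U0001F600"   # U+1F600 lies outside the BMP

print(needs_utf8mb4(bmp_text))
print(needs_utf8mb4(emoji_text))

# Byte lengths of one supplementary character in UTF-8, UTF-16, and UTF-32.
supp = "\U0001F600"
sizes = (
    len(supp.encode("utf-8")),
    len(supp.encode("utf-16-be")),  # encoded as a surrogate pair
    len(supp.encode("utf-32-be")),
)
print(sizes)
```

A check like this could run in an application before inserting into a column that uses ucs2 or the three-byte utf8 character set.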

A similar set of collations is available for each Unicode character set. For example, each has a Danish collation, the names of which are ucs2_danish_ci, utf16_danish_ci, utf32_danish_ci, utf8_danish_ci, and utf8mb4_danish_ci. All Unicode collations are listed in Section 9.1.14.1, “Unicode Character Sets”, which also describes collation properties for supplementary characters.

Note that although many of the supplementary characters come from East Asian languages, what MySQL 5.5 adds is support for more Japanese and Chinese characters in Unicode character sets, not support for new Japanese and Chinese character sets.

The MySQL implementation of UCS-2, UTF-16, and UTF-32 stores characters in big-endian byte order and does not use a byte order mark (BOM) at the beginning of values. Other database systems might use little-endian byte order or a BOM. In such cases, values must be converted when transferring data between those systems and MySQL.
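
As a rough illustration of such a conversion, the following Python sketch re-encodes UTF-16 bytes as big-endian without a BOM. The function name `to_big_endian_utf16` and its `assume` parameter are hypothetical. The fallback to little-endian for input without a BOM is an assumption about the source system, not something MySQL prescribes.

```python
import codecs


def to_big_endian_utf16(raw: bytes, assume: str = "utf-16-le") -> bytes:
    """Re-encode UTF-16 bytes as big-endian UTF-16 with no BOM."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        text = raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif raw.startswith(codecs.BOM_UTF16_BE):
        text = raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    else:
        # No BOM: fall back to the caller-supplied byte order (assumption).
        text = raw.decode(assume)
    return text.encode("utf-16-be")


little_endian_with_bom = codecs.BOM_UTF16_LE + "A".encode("utf-16-le")
print(to_big_endian_utf16(little_endian_with_bom))
```

The sketch handles UTF-16 only. UTF-32 input would need the analogous `codecs.BOM_UTF32_LE` and `codecs.BOM_UTF32_BE` checks, made before the UTF-16 ones, because the UTF-32 little-endian BOM begins with the same two bytes as the UTF-16 little-endian BOM.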

MySQL uses no BOM for UTF-8 values.
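
Data exported by other tools sometimes carries a UTF-8 BOM. A minimal Python sketch for dropping it before the bytes are sent to the server might look like this. The helper name `decode_utf8_without_bom` is hypothetical, and whether stripping is appropriate depends on the application.

```python
import codecs


def decode_utf8_without_bom(raw: bytes) -> str:
    """Decode UTF-8 bytes, discarding a leading BOM if one is present."""
    # "utf-8-sig" drops a leading BOM and otherwise behaves like "utf-8".
    return raw.decode("utf-8-sig")


with_bom = codecs.BOM_UTF8 + "abc".encode("utf-8")
clean = decode_utf8_without_bom(with_bom).encode("utf-8")
print(clean)
```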

Client applications that need to communicate with the server using Unicode should set the client character set accordingly; for example, by issuing a SET NAMES 'utf8' statement. ucs2, utf16, and utf32 cannot be used as a client character set, which means that they do not work for SET NAMES or SET CHARACTER SET. (See Section 9.1.4, “Connection Character Sets and Collations”.)

The following sections provide additional detail on the Unicode character sets in MySQL.
