This section describes issues pertaining to Unicode support that you may face when upgrading to MySQL 5.5 from an older MySQL release. It also provides guidelines for downgrading from MySQL 5.5 to an older release.
In most respects, upgrading to MySQL 5.5 should present few problems with regard to Unicode usage, although there are some potential areas of incompatibility. These are the primary areas of concern:
For the variable-length character data types (
VARCHARand theTEXTtypes), the maximum length in characters is less forutf8mb4columns than forutf8columns.For all character data types (
CHAR,VARCHAR, and theTEXTtypes), the maximum number of characters that can be indexed is less forutf8mb4columns than forutf8columns.
Consequently, if you want to upgrade tables from
utf8 to utf8mb4 to take
advantage of supplementary-character support, it may be
necessary to change some column or index definitions.
Tables can be converted from utf8 to
utf8mb4 by using ALTER
TABLE. Suppose that a table was originally defined as
follows:
CREATE TABLE t1 ( col1 CHAR(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL, col2 CHAR(10) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL ) CHARACTER SET utf8;
The following statement converts t1 to use
utf8mb4:
ALTER TABLE t1
DEFAULT CHARACTER SET utf8mb4,
MODIFY col1 CHAR(10)
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
MODIFY col2 CHAR(10)
CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL;
In terms of table content, conversion from
utf8 to utf8mb4 presents
no problems:
For a BMP character,
utf8andutf8mb4have identical storage characteristics: same code values, same encoding, same length.For a supplementary character,
utf8cannot store the character at all, whileutf8mb4requires four bytes to store it. Sinceutf8cannot store the character at all, you do not have any supplementary characters inutf8columns and you need not worry about converting characters or losing data when upgradingutf8data from older versions of MySQL.
In terms of table structure, the catch when converting from
utf8 to utf8mb4 is that
the maximum length of a column or index key is unchanged in
terms of bytes. Therefore, it is smaller in
terms of characters because the maximum
length of a character is four bytes instead of three. For the
CHAR,
VARCHAR, and
TEXT data types, watch for these
things when converting your MySQL tables:
Check all definitions of
utf8columns and make sure they will not exceed the maximum length for the storage engine.Check all indexes on
utf8columns and make sure they will not exceed the maximum length for the storage engine. Sometimes the maximum can change due to storage engine enhancements.
If the preceding conditions apply, you must either reduce the
defined length of columns or indexes, or continue to use
utf8 rather than utf8mb4.
Here are some examples where structural changes may be needed:
A
TINYTEXTcolumn can hold up to 255 bytes, so it can hold up to 85 three-byte or 63 four-byte characters. Suppose that you have aTINYTEXTcolumn that usesutf8but must be able to contain more than 63 characters. You cannot convert it toutf8mb4unless you also change the data type to a longer type such asTEXT.Similarly, a very long
VARCHARcolumn may need to be changed to one of the longerTEXTtypes if you want to convert it fromutf8toutf8mb4.InnoDBhas a maximum index length of 767 bytes, so forutf8orutf8mb4columns, you can index a maximum of 255 or 191 characters, respectively. If you currently haveutf8columns with indexes longer than 191 characters, you will need to index a smaller number of characters. In anInnoDBtable, these column and index definitions are legal:col1 VARCHAR(500) CHARACTER SET utf8, INDEX (col1(255))
To use
utf8mb4instead, the index must be smaller:col1 VARCHAR(500) CHARACTER SET utf8mb4, INDEX (col1(191))
The preceding types of changes are most likely to be required
only if you have very long columns or indexes. Otherwise, you
should be able to convert your tables from
utf8 to utf8mb4 without
problems. You can do this by using ALTER
TABLE as described earlier in this section after
upgrading in place to 5.5.
The following items summarize other potential areas of incompatibility:
Performance of four-byte UTF-8 (
utf8mb4) is slower than for three-byte UTF-8 (utf8). If you do not want to incur this penalty, continue to useutf8.SET NAMES 'utf8mb4'causes use of the four-byte character set for connection character sets. As long as no four-byte characters are sent from the server, there should be no problems. Otherwise, applications that expect to receive a maximum of three bytes per character may have problems. Conversely, applications that expect to send four-byte characters must ensure that the server understands them.Applications cannot send
utf16orutf32character data to an older server that does not understand them.For replication, if the character sets that support supplementary characters are going to be used on the master, all slaves must understand them as well. If you attempt to replicate from a MySQL 5.5 master to an older slave,
utf8data will be seen asutf8by the slave and should replicate correctly. But you cannot sendutf8mb4,utf16, orutf32data.Also, keep in mind the general principle that if a table has different definitions on the master and slave, this can lead to unexpected results. For example, the differences in limitations on index key length makes it risky to use
utf8on the master andutf8mb4on the slave.
If you have upgraded to MySQL 5.5, and then decide to downgrade back to an older release, these considerations apply:
ucs2andutf8data should present no problems.Any definitions that refer to the
utf8mb4,utf16, orutf32character sets will not be recognized by the older server.For object definitions that refer to the
utf8mb4character set, you can dump them with mysqldump in MySQL 5.5, edit the dump file to change instances ofutf8mb4toutf8, and reload the file in the older server, as long as there are no four-byte characters in the data. The older server will seeutf8in the dump file object definitions and create new objects that use the (three-byte)utf8character set.