Every character string literal has a character set and a collation.
A character string literal may have an optional character set
introducer and COLLATE
clause:
[_charset_name
]'string
' [COLLATEcollation_name
]
Examples:
SELECT 'string
'; SELECT _latin1'string
'; SELECT _latin1'string
' COLLATE latin1_danish_ci;
For the simple statement SELECT
'
, the string has
the character set and collation defined by the
string
'character_set_connection
and
collation_connection
system
variables.
The
_
expression is formally called an
introducer. It tells the parser,
“the string that is about to follow uses character set
charset_name
X
.” Because this has
confused people in the past, we emphasize that an introducer
does not change the string to the introducer character set
like CONVERT()
would do. It
does not change the string's value, although padding may
occur. The introducer is just a signal. An introducer is also
legal before standard hex literal and numeric hex literal
notation
(x'
and
literal
'0x
), or
before bit-field literal notation
(nnnn
b'
and
literal
'0b
).
nnnn
Examples:
SELECT _latin1 x'AABBCC'; SELECT _latin1 0xAABBCC; SELECT _latin1 b'1100011'; SELECT _latin1 0b1100011;
MySQL determines a literal's character set and collation in the following manner:
If both
_X
andCOLLATE
are specified, character setY
X
and collationY
are used.If
_X
is specified butCOLLATE
is not specified, character setX
and its default collation are used. To see the default collation for each character set, use theSHOW COLLATION
statement.Otherwise, the character set and collation given by the
character_set_connection
andcollation_connection
system variables are used.
Examples:
A string with
latin1
character set andlatin1_german1_ci
collation:SELECT _latin1'Müller' COLLATE latin1_german1_ci;
A string with
latin1
character set and its default collation (that is,latin1_swedish_ci
):SELECT _latin1'Müller';
A string with the connection default character set and collation:
SELECT 'Müller';
Character set introducers and the COLLATE
clause are implemented according to standard SQL
specifications.
An introducer indicates the character set for the following
string, but does not change now how the parser performs escape
processing within the string. Escapes are always interpreted
by the parser according to the character set given by
character_set_connection
.
The following examples show that escape processing occurs
using
character_set_connection
even
in the presence of an introducer. The examples use
SET NAMES
(which changes
character_set_connection
, as
discussed in Section 9.1.4, “Connection Character Sets and Collations”), and
display the resulting strings using the
HEX()
function so that the
exact string contents can be seen.
Example 1:
mysql>SET NAMES latin1;
Query OK, 0 rows affected (0.01 sec) mysql>SELECT HEX('à\n'), HEX(_sjis'à\n');
+------------+-----------------+ | HEX('à\n') | HEX(_sjis'à\n') | +------------+-----------------+ | E00A | E00A | +------------+-----------------+ 1 row in set (0.00 sec)
Here, “à
” (hex value
E0
) is followed by
“\n
”, the escape sequence for
newline. The escape sequence is interpreted using the
character_set_connection
value of latin1
to produce a literal
newline (hex value 0A
). This happens even
for the second string. That is, the introducer of
_sjis
does not affect the parser's escape
processing.
Example 2:
mysql>SET NAMES sjis;
Query OK, 0 rows affected (0.00 sec) mysql>SELECT HEX('à\n'), HEX(_latin1'à\n');
+------------+-------------------+ | HEX('à\n') | HEX(_latin1'à\n') | +------------+-------------------+ | E05C6E | E05C6E | +------------+-------------------+ 1 row in set (0.04 sec)
Here,
character_set_connection
is
sjis
, a character set in which the sequence
of “à
” followed by
“\
” (hex values
05
and 5C
) is a valid
multi-byte character. Hence, the first two bytes of the string
are interpreted as a single sjis
character,
and the “\
” is not interpreted
as an escape character. The following
“n
” (hex value
6E
) is not interpreted as part of an escape
sequence. This is true even for the second string; the
introducer of _latin1
does not affect
escape processing.