MySQL supports the following JOIN syntaxes
for the table_references part of
SELECT statements and
multiple-table DELETE and
UPDATE statements:
table_references:table_reference[,table_reference] ...table_reference:table_factor|join_tabletable_factor:tbl_name[[AS]alias] [index_hint_list] |table_subquery[AS]alias| (table_references) | { OJtable_referenceLEFT OUTER JOINtable_referenceONconditional_expr}join_table:table_reference[INNER | CROSS] JOINtable_factor[join_condition] |table_referenceSTRAIGHT_JOINtable_factor|table_referenceSTRAIGHT_JOINtable_factorONconditional_expr|table_reference{LEFT|RIGHT} [OUTER] JOINtable_referencejoin_condition|table_referenceNATURAL [{LEFT|RIGHT} [OUTER]] JOINtable_factorjoin_condition: ONconditional_expr| USING (column_list)index_hint_list:index_hint[,index_hint] ...index_hint: USE {INDEX|KEY} [{FOR {JOIN|ORDER BY|GROUP BY}] ([index_list]) | IGNORE {INDEX|KEY} [{FOR {JOIN|ORDER BY|GROUP BY}] (index_list) | FORCE {INDEX|KEY} [{FOR {JOIN|ORDER BY|GROUP BY}] (index_list)index_list:index_name[,index_name] ...
A table reference is also known as a join expression.
The syntax of table_factor is
extended in comparison with the SQL Standard. The latter accepts
only table_reference, not a list of
them inside a pair of parentheses.
This is a conservative extension if we consider each comma in a
list of table_reference items as
equivalent to an inner join. For example:
SELECT * FROM t1 LEFT JOIN (t2, t3, t4)
ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)
is equivalent to:
SELECT * FROM t1 LEFT JOIN (t2 CROSS JOIN t3 CROSS JOIN t4)
ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)
In MySQL, CROSS JOIN is a syntactic
equivalent to INNER JOIN (they can replace
each other). In standard SQL, they are not equivalent.
INNER JOIN is used with an
ON clause, CROSS JOIN is
used otherwise.
In general, parentheses can be ignored in join expressions containing only inner join operations. MySQL also supports nested joins (see Section 7.13.7, “Nested Join Optimization”).
Index hints can be specified to affect how the MySQL optimizer makes use of indexes. For more information, see Section 12.2.9.2, “Index Hint Syntax”.
The following list describes general factors to take into account when writing joins.
A table reference can be aliased using
ortbl_nameASalias_nametbl_name alias_name:SELECT t1.name, t2.salary FROM employee AS t1 INNER JOIN info AS t2 ON t1.name = t2.name; SELECT t1.name, t2.salary FROM employee t1 INNER JOIN info t2 ON t1.name = t2.name;
A
table_subqueryis also known as a subquery in theFROMclause. Such subqueries must include an alias to give the subquery result a table name. A trivial example follows; see also Section 12.2.10.8, “Subqueries in theFROMClause”.SELECT * FROM (SELECT 1, 2, 3) AS t1;
INNER JOINand,(comma) are semantically equivalent in the absence of a join condition: both produce a Cartesian product between the specified tables (that is, each and every row in the first table is joined to each and every row in the second table).However, the precedence of the comma operator is less than of
INNER JOIN,CROSS JOIN,LEFT JOIN, and so on. If you mix comma joins with the other join types when there is a join condition, an error of the formUnknown column 'may occur. Information about dealing with this problem is given later in this section.col_name' in 'on clause'The
conditional_exprused withONis any conditional expression of the form that can be used in aWHEREclause. Generally, you should use theONclause for conditions that specify how to join tables, and theWHEREclause to restrict which rows you want in the result set.If there is no matching row for the right table in the
ONorUSINGpart in aLEFT JOIN, a row with all columns set toNULLis used for the right table. You can use this fact to find rows in a table that have no counterpart in another table:SELECT left_tbl.* FROM left_tbl LEFT JOIN right_tbl ON left_tbl.id = right_tbl.id WHERE right_tbl.id IS NULL;
This example finds all rows in
left_tblwith anidvalue that is not present inright_tbl(that is, all rows inleft_tblwith no corresponding row inright_tbl). This assumes thatright_tbl.idis declaredNOT NULL. See Section 7.13.5, “LEFT JOINandRIGHT JOINOptimization”.The
USING(clause names a list of columns that must exist in both tables. If tablescolumn_list)aandbboth contain columnsc1,c2, andc3, the following join compares corresponding columns from the two tables:a LEFT JOIN b USING (c1,c2,c3)
The
NATURAL [LEFT] JOINof two tables is defined to be semantically equivalent to anINNER JOINor aLEFT JOINwith aUSINGclause that names all columns that exist in both tables.RIGHT JOINworks analogously toLEFT JOIN. To keep code portable across databases, it is recommended that you useLEFT JOINinstead ofRIGHT JOIN.The
{ OJ ... LEFT OUTER JOIN ...}syntax shown in the join syntax description exists only for compatibility with ODBC. The curly braces in the syntax should be written literally; they are not metasyntax as used elsewhere in syntax descriptions.SELECT left_tbl.* FROM { OJ left_tbl LEFT OUTER JOIN right_tbl ON left_tbl.id = right_tbl.id } WHERE right_tbl.id IS NULL;You can use other types of joins within
{ OJ ... }, such asINNER JOINorRIGHT OUTER JOIN. This helps with compatibility with some third-party applications, but is not official ODBC syntax.STRAIGHT_JOINis similar toJOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order.
Some join examples:
SELECT * FROM table1, table2; SELECT * FROM table1 INNER JOIN table2 ON table1.id=table2.id; SELECT * FROM table1 LEFT JOIN table2 ON table1.id=table2.id; SELECT * FROM table1 LEFT JOIN table2 USING (id); SELECT * FROM table1 LEFT JOIN table2 ON table1.id=table2.id LEFT JOIN table3 ON table2.id=table3.id;
Join Processing Changes in MySQL 5.0.12
Natural joins and joins with USING,
including outer join variants, are processed according to the
SQL:2003 standard. The goal was to align the syntax and
semantics of MySQL with respect to NATURAL
JOIN and JOIN ... USING according
to SQL:2003. However, these changes in join processing can
result in different output columns for some joins. Also, some
queries that appeared to work correctly in older versions
(prior to 5.0.12) must be rewritten to comply with the
standard.
These changes have five main aspects:
The way that MySQL determines the result columns of
NATURALorUSINGjoin operations (and thus the result of the entireFROMclause).Expansion of
SELECT *andSELECTinto a list of selected columns.tbl_name.*Resolution of column names in
NATURALorUSINGjoins.Transformation of
NATURALorUSINGjoins intoJOIN ... ON.Resolution of column names in the
ONcondition of aJOIN ... ON.
The following list provides more detail about several effects of current join processing versus join processing in older versions. The term “previously” means “prior to MySQL 5.0.12.”
The columns of a
NATURALjoin or aUSINGjoin may be different from previously. Specifically, redundant output columns no longer appear, and the order of columns forSELECT *expansion may be different from before.Consider this set of statements:
CREATE TABLE t1 (i INT, j INT); CREATE TABLE t2 (k INT, j INT); INSERT INTO t1 VALUES(1,1); INSERT INTO t2 VALUES(1,1); SELECT * FROM t1 NATURAL JOIN t2; SELECT * FROM t1 JOIN t2 USING (j);
Previously, the statements produced this output:
+------+------+------+------+ | i | j | k | j | +------+------+------+------+ | 1 | 1 | 1 | 1 | +------+------+------+------+ +------+------+------+------+ | i | j | k | j | +------+------+------+------+ | 1 | 1 | 1 | 1 | +------+------+------+------+
In the first
SELECTstatement, columnjappears in both tables and thus becomes a join column, so, according to standard SQL, it should appear only once in the output, not twice. Similarly, in the second SELECT statement, columnjis named in theUSINGclause and should appear only once in the output, not twice. But in both cases, the redundant column is not eliminated. Also, the order of the columns is not correct according to standard SQL.Now the statements produce this output:
+------+------+------+ | j | i | k | +------+------+------+ | 1 | 1 | 1 | +------+------+------+ +------+------+------+ | j | i | k | +------+------+------+ | 1 | 1 | 1 | +------+------+------+
The redundant column is eliminated and the column order is correct according to standard SQL:
First, coalesced common columns of the two joined tables, in the order in which they occur in the first table
Second, columns unique to the first table, in order in which they occur in that table
Third, columns unique to the second table, in order in which they occur in that table
The single result column that replaces two common columns is defined using the coalesce operation. That is, for two
t1.aandt2.athe resulting single join columnais defined asa = COALESCE(t1.a, t2.a), where:COALESCE(x, y) = (CASE WHEN V1 IS NOT NULL THEN V1 ELSE V2 END)
If the join operation is any other join, the result columns of the join consists of the concatenation of all columns of the joined tables. This is the same as previously.
A consequence of the definition of coalesced columns is that, for outer joins, the coalesced column contains the value of the non-
NULLcolumn if one of the two columns is alwaysNULL. If neither or both columns areNULL, both common columns have the same value, so it doesn't matter which one is chosen as the value of the coalesced column. A simple way to interpret this is to consider that a coalesced column of an outer join is represented by the common column of the inner table of aJOIN. Suppose that the tablest1(a,b)andt2(a,c)have the following contents:t1 t2 ---- ---- 1 x 2 z 2 y 3 w
Then:
mysql>
SELECT * FROM t1 NATURAL LEFT JOIN t2;+------+------+------+ | a | b | c | +------+------+------+ | 1 | x | NULL | | 2 | y | z | +------+------+------+Here column
acontains the values oft1.a.mysql>
SELECT * FROM t1 NATURAL RIGHT JOIN t2;+------+------+------+ | a | c | b | +------+------+------+ | 2 | z | y | | 3 | w | NULL | +------+------+------+Here column
acontains the values oft2.a.Compare these results to the otherwise equivalent queries with
JOIN ... ON:mysql>
SELECT * FROM t1 LEFT JOIN t2 ON (t1.a = t2.a);+------+------+------+------+ | a | b | a | c | +------+------+------+------+ | 1 | x | NULL | NULL | | 2 | y | 2 | z | +------+------+------+------+mysql>
SELECT * FROM t1 RIGHT JOIN t2 ON (t1.a = t2.a);+------+------+------+------+ | a | b | a | c | +------+------+------+------+ | 2 | y | 2 | z | | NULL | NULL | 3 | w | +------+------+------+------+Previously, a
USINGclause could be rewritten as anONclause that compares corresponding columns. For example, the following two clauses were semantically identical:a LEFT JOIN b USING (c1,c2,c3) a LEFT JOIN b ON a.c1=b.c1 AND a.c2=b.c2 AND a.c3=b.c3
Now the two clauses no longer are quite the same:
With respect to determining which rows satisfy the join condition, both joins remain semantically identical.
With respect to determining which columns to display for
SELECT *expansion, the two joins are not semantically identical. TheUSINGjoin selects the coalesced value of corresponding columns, whereas theONjoin selects all columns from all tables. For the precedingUSINGjoin,SELECT *selects these values:COALESCE(a.c1,b.c1), COALESCE(a.c2,b.c2), COALESCE(a.c3,b.c3)
For the
ONjoin,SELECT *selects these values:a.c1, a.c2, a.c3, b.c1, b.c2, b.c3
With an inner join,
COALESCE(a.c1,b.c1)is the same as eithera.c1orb.c1because both columns will have the same value. With an outer join (such asLEFT JOIN), one of the two columns can beNULL. That column will be omitted from the result.
The evaluation of multi-way natural joins differs in a very important way that affects the result of
NATURALorUSINGjoins and that can require query rewriting. Suppose that you have three tablest1(a,b),t2(c,b), andt3(a,c)that each have one row:t1(1,2),t2(10,2), andt3(7,10). Suppose also that you have thisNATURAL JOINon the three tables:SELECT ... FROM t1 NATURAL JOIN t2 NATURAL JOIN t3;
Previously, the left operand of the second join was considered to be
t2, whereas it should be the nested join(t1 NATURAL JOIN t2). As a result, the columns oft3are checked for common columns only int2, and, ift3has common columns witht1, these columns are not used as equi-join columns. Thus, previously, the preceding query was transformed to the following equi-join:SELECT ... FROM t1, t2, t3 WHERE t1.b = t2.b AND t2.c = t3.c;
That join is missing one more equi-join predicate
(t1.a = t3.a). As a result, it produces one row, not the empty result that it should. The correct equivalent query is this:SELECT ... FROM t1, t2, t3 WHERE t1.b = t2.b AND t2.c = t3.c AND t1.a = t3.a;
If you require the same query result in current versions of MySQL as in older versions, rewrite the natural join as the first equi-join.
Previously, the comma operator (
,) andJOINboth had the same precedence, so the join expressiont1, t2 JOIN t3was interpreted as((t1, t2) JOIN t3). NowJOINhas higher precedence, so the expression is interpreted as(t1, (t2 JOIN t3)). This change affects statements that use anONclause, because that clause can refer only to columns in the operands of the join, and the change in precedence changes interpretation of what those operands are.Example:
CREATE TABLE t1 (i1 INT, j1 INT); CREATE TABLE t2 (i2 INT, j2 INT); CREATE TABLE t3 (i3 INT, j3 INT); INSERT INTO t1 VALUES(1,1); INSERT INTO t2 VALUES(1,1); INSERT INTO t3 VALUES(1,1); SELECT * FROM t1, t2 JOIN t3 ON (t1.i1 = t3.i3);
Previously, the
SELECTwas legal due to the implicit grouping oft1,t2as(t1,t2). Now theJOINtakes precedence, so the operands for theONclause aret2andt3. Becauset1.i1is not a column in either of the operands, the result is anUnknown column 't1.i1' in 'on clause'error. To allow the join to be processed, group the first two tables explicitly with parentheses so that the operands for theONclause are(t1,t2)andt3:SELECT * FROM (t1, t2) JOIN t3 ON (t1.i1 = t3.i3);
Alternatively, avoid the use of the comma operator and use
JOINinstead:SELECT * FROM t1 JOIN t2 JOIN t3 ON (t1.i1 = t3.i3);
This change also applies to statements that mix the comma operator with
INNER JOIN,CROSS JOIN,LEFT JOIN, andRIGHT JOIN, all of which now have higher precedence than the comma operator.Previously, the
ONclause could refer to columns in tables named to its right. Now anONclause can refer only to its operands.Example:
CREATE TABLE t1 (i1 INT); CREATE TABLE t2 (i2 INT); CREATE TABLE t3 (i3 INT); SELECT * FROM t1 JOIN t2 ON (i1 = i3) JOIN t3;
Previously, the
SELECTstatement was legal. Now the statement fails with anUnknown column 'i3' in 'on clause'error becausei3is a column int3, which is not an operand of theONclause. The statement should be rewritten as follows:SELECT * FROM t1 JOIN t2 JOIN t3 ON (i1 = i3);
Resolution of column names in
NATURALorUSINGjoins is different than previously. For column names that are outside theFROMclause, MySQL now handles a superset of the queries compared to previously. That is, in cases when MySQL formerly issued an error that some column is ambiguous, the query now is handled correctly. This is due to the fact that MySQL now treats the common columns ofNATURALorUSINGjoins as a single column, so when a query refers to such columns, the query compiler does not consider them as ambiguous.Example:
SELECT * FROM t1 NATURAL JOIN t2 WHERE b > 1;
Previously, this query would produce an error
ERROR 1052 (23000): Column 'b' in where clause is ambiguous. Now the query produces the correct result:+------+------+------+ | b | c | y | +------+------+------+ | 4 | 2 | 3 | +------+------+------+
One extension of MySQL compared to the SQL:2003 standard is that MySQL enables you to qualify the common (coalesced) columns of
NATURALorUSINGjoins (just as previously), while the standard disallows that.