Controlling Optimizer Statistics Estimation

The MySQL query optimizer uses estimated statistics about key distributions to select or avoid using an index in an execution plan, based on the relative selectivity of the index. Previously, InnoDB sampled 8 random pages from an index to get an estimate of the cardinality of (i.e., the number of distinct values in) the index. (This page sampling technique is frequently described as “index dives”.) This small number of page samples frequently was insufficient, and could give inaccurate estimates of an index's selectivity and thus lead to poor choices by the query optimizer.

To give users control over the quality of the statistics estimate (and thus better information for the query optimizer), the number of sampled pages now can be changed using the parameter innodb_stats_sample_pages.

This feature addresses user requests such as that as expressed in MySQL Bug#25640: InnoDB Analyze Table Should Allow User Selection of Index Dives.

You can change the number of sampled pages using the global parameter innodb_stats_sample_pages, which can be set at runtime. The default value for this parameter is 8, preserving the same behavior as in past releases.

Note that the value of innodb_stats_sample_pages affects the index sampling for all tables and indexes. Note that there are the following potentially significant impacts when you change the index sample size:

  • Small values like 1 or 2 can result in very inaccurate estimates of cardinality.

  • Values much larger than 8 (say, 100), can cause a big slowdown in the time it takes to open a table or execute SHOW TABLE STATUS.

  • The optimizer might choose very different query plans based on different estimates of index selectivity.

The cardinality estimation can be disabled for metadata statements such as SHOW TABLE STATUS by executing the statement SET GLOBAL innodb_stats_on_metadata=OFF (or 0). Before InnoDB 1.0.2, this variable could only be set in the MySQL option file (my.cnf or my.ini), and changing it required shutting down and restarting the server.

The cardinality (the number of different key values) in every index of a table is calculated when a table is opened, at SHOW TABLE STATUS and ANALYZE TABLE and in other circumstances such as when the table has changed significantly. Note that all tables are opened, and the statistics are re-estimated, when the mysql client starts if the auto-rehash setting is set on (the default). The auto-rehash feature enables automatic name completion of database, table, and column names for interactive users. You can turn auto-rehash off to improve the start up time of the mysql client.

Because the statistics are automatically recalculated at various times other than on execution of ANALYZE TABLE, it does not make sense to increase the index sample size, then run ANALYZE TABLE and decrease sample size to attempt to obtain better statistics. The “better” statistics calculated by ANALYZE running with a high value of innodb_stats_sample_pages will be wiped away.

The estimated cardinality for an index is more accurate with a larger number of samples, but each sample might require a disk read, so you do not want to make the sample size too large. Choose a value for innodb_stats_sample_pages that results in reasonably accurate estimates for all tables in your database without requiring excessive I/O.

Although it is not possible to specify the sample size on a per-table basis, smaller tables generally require fewer index samples than larger tables do. If your database has many large tables, consider using a higher value for innodb_stats_sample_pages than if you have mostly smaller tables.

Copyright © 2010-2023 Platon Technologies, s.r.o.           Home | Man pages | tLDP | Documents | Utilities | About
Design by styleshout