Design your tables to minimize their space on the disk. This can result in huge improvements by reducing the amount of data written to and read from disk. Smaller tables normally require less main memory while their contents are being actively processed during query execution. Any space reduction for table data also results in smaller indexes that can be processed faster.
MySQL supports many different storage engines (table types) and row formats. For each table, you can decide which storage and indexing method to use. Choosing the proper table format for your application can give you a big performance gain. See Chapter 13, Storage Engines.
You can get better performance for a table and minimize storage space by using the techniques listed here:
Table Columns
Use the most efficient (smallest) data types possible. MySQL has many specialized types that save disk space and memory. For example, use the smaller integer types if possible to get smaller tables.
MEDIUMINT
is often a better choice thanINT
because aMEDIUMINT
column uses 25% less space.Declare columns to be
NOT NULL
if possible. It makes SQL operations faster, by enabling better use of indexes and eliminating overhead for testing whether each value isNULL
. You also save some storage space, one bit per column. If you really needNULL
values in your tables, use them. Just avoid the default setting that allowsNULL
values in every column.
Row Format
InnoDB
tables use a compact storage format. In versions of MySQL earlier than 5.0.3,InnoDB
rows contain some redundant information, such as the number of columns and the length of each column, even for fixed-size columns. By default, tables are created in the compact format (ROW_FORMAT=COMPACT
). If you wish to downgrade to older versions of MySQL, you can request the old format withROW_FORMAT=REDUNDANT
.The presence of the compact row format decreases row storage space by about 20% at the cost of increasing CPU use for some operations. If your workload is a typical one that is limited by cache hit rates and disk speed it is likely to be faster. If it is a rare case that is limited by CPU speed, it might be slower.
The compact
InnoDB
format also changes howCHAR
columns containing UTF-8 data are stored. WithROW_FORMAT=REDUNDANT
, a UTF-8CHAR(
occupies 3 ×N
)N
bytes, given that the maximum length of a UTF-8 encoded character is three bytes. Many languages can be written primarily using single-byte UTF-8 characters, so a fixed storage length often wastes space. WithROW_FORMAT=COMPACT
format,InnoDB
allocates a variable amount of storage in the range fromN
to 3 ×N
bytes for these columns by stripping trailing spaces if necessary. The minimum storage length is kept asN
bytes to facilitate in-place updates in typical cases.To minimize space even further by storing table data in compressed form, specify
ROW_FORMAT=COMPRESSED
when creatingInnoDB
tables, or run the myisampack command on an existingMyISAM
table. (InnoDB
tables compressed tables are readable and writeable, whileMyISAM
compressed tables are read-only.)For
MyISAM
tables, if you do not have any variable-length columns (VARCHAR
,TEXT
, orBLOB
columns), a fixed-size row format is used. This is faster but unfortunately may waste some space. See Section 13.5.3, “MyISAM
Table Storage Formats”. You can hint that you want to have fixed length rows even if you haveVARCHAR
columns with theCREATE TABLE
optionROW_FORMAT=FIXED
.
Indexes
The primary index of a table should be as short as possible. This makes identification of each row easy and efficient. For
InnoDB
tables, the primary key columns are duplicated in each secondary index entry, so a short primary key saves considerable space if you have many secondary indexes.Create only the indexes that you need to improve query performance. Indexes are good for retrieval, but slow down insert and update operations. If you access a table mostly by searching on a combination of columns, create a single composite index on them rather than a separate index for each column. The first part of the index should be the column most used. If you always use many columns when selecting from the table, the first column in the index should be the one with the most duplicates, to obtain better compression of the index.
If it is very likely that a long string column has a unique prefix on the first number of characters, it is better to index only this prefix, using MySQL's support for creating an index on the leftmost part of the column (see Section 12.1.11, “
CREATE INDEX
Syntax”). Shorter indexes are faster, not only because they require less disk space, but because they also give you more hits in the index cache, and thus fewer disk seeks. See Section 7.11.2, “Tuning Server Parameters”.
Joins
In some circumstances, it can be beneficial to split into two a table that is scanned very often. This is especially true if it is a dynamic-format table and it is possible to use a smaller static format table that can be used to find the relevant rows when scanning the table.
Declare columns with identical information in different tables with identical data types, to speed up joins based on the corresponding columns.
Keep column names simple, so that you can use the same name across different tables and simplify join queries. For example, in a table named
customer
, use a column name ofname
instead ofcustomer_name
. To make your names portable to other SQL servers, consider keeping them shorter than 18 characters.
Normalization
Normally, try to keep all data nonredundant (observing what is referred to in database theory as third normal form). Instead of repeating lengthy values such as names and addresses, assign them unique IDs, repeat these IDs as needed across multiple smaller tables, and join the tables in queries by referencing the IDs in the join clause.
If speed is more important than disk space and the maintenance costs of keeping multiple copies of data, for example in a business intelligence scenario where you analyze all the data from large tables, you can relax the normalization rules, duplicating information or creating summary tables to gain more speed.