Events are collected by means of instrumentation added to the server source code. Instruments time events, which is how Performance Schema provides an idea of how long events take. It is also possible to configure instruments not to collect timing information. This section discusses the available timers and their characteristics, and how timing values are represented in events.
Timers vary in precision and the amount of overhead they involve.
To see what timers are available and their characteristics, check the performance_timers table:
mysql> SELECT * FROM performance_timers;
+-------------+-----------------+------------------+----------------+
| TIMER_NAME  | TIMER_FREQUENCY | TIMER_RESOLUTION | TIMER_OVERHEAD |
+-------------+-----------------+------------------+----------------+
| CYCLE       |      2389029850 |                1 |             72 |
| NANOSECOND  |            NULL |             NULL |           NULL |
| MICROSECOND |         1000000 |                1 |            585 |
| MILLISECOND |            1035 |                1 |            738 |
| TICK        |             101 |                1 |            630 |
+-------------+-----------------+------------------+----------------+
The TIMER_NAME column shows the names of the available timers. CYCLE refers to the timer that is based on the CPU (processor) cycle counter. If the values associated with a given timer name are NULL, that timer is not supported on your platform. The rows that do not have NULL indicate which timers you can use.
TIMER_FREQUENCY indicates the number of timer units per second. For a cycle timer, the frequency is generally related to the CPU speed. The value shown was obtained on a system with a 2.4 GHz processor. The other timers are based on fixed fractions of seconds. For TICK, the frequency may vary by platform (for example, some use 100 ticks/second, others 1000 ticks/second).
TIMER_RESOLUTION indicates the number of timer units by which timer values increase at a time. If a timer has a resolution of 10, its value increases by 10 each time.
TIMER_OVERHEAD is the minimal number of cycles of overhead to obtain one timing with the given timer. The overhead per event is twice the value displayed because the timer is invoked at the beginning and end of the event.
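As a rough worked example of the doubling rule, the following Python sketch applies it to the sample performance_timers output shown earlier. The figures are copied from that output. The conversion to nanoseconds assumes that overhead cycles can simply be divided by the CYCLE timer frequency, which is an illustrative assumption and not something the server reports. The function name overhead_per_event is hypothetical.

```python
# Values copied from the sample performance_timers output in this section.
CYCLE_FREQUENCY_HZ = 2_389_029_850  # TIMER_FREQUENCY of the CYCLE timer
TIMER_OVERHEAD_CYCLES = {
    "CYCLE": 72,
    "MICROSECOND": 585,
    "MILLISECOND": 738,
    "TICK": 630,
}


def overhead_per_event(timer_name):
    """Return (cycles, nanoseconds) of timing overhead for one event.

    The timer is read at the start and at the end of an event, so the
    per-event overhead is twice the TIMER_OVERHEAD value.
    """
    cycles = 2 * TIMER_OVERHEAD_CYCLES[timer_name]
    # Assumption: convert cycles to time using the CYCLE frequency.
    nanoseconds = cycles / CYCLE_FREQUENCY_HZ * 1e9
    return cycles, nanoseconds


for name in TIMER_OVERHEAD_CYCLES:
    cycles, ns = overhead_per_event(name)
    print(f"{name:<12} {cycles:>5} cycles  ~{ns:.0f} ns")
```

On the sample system, timing one event with CYCLE costs 144 cycles, whereas MICROSECOND costs 1170 cycles.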
To see which timer is in effect or to change the timer, access the setup_timers table, which has a single row:
mysql> SELECT * FROM setup_timers;
+------+------------+
| NAME | TIMER_NAME |
+------+------------+
| wait | CYCLE      |
+------+------------+
mysql> UPDATE setup_timers SET TIMER_NAME = 'MICROSECOND';
mysql> SELECT * FROM setup_timers;
+------+-------------+
| NAME | TIMER_NAME  |
+------+-------------+
| wait | MICROSECOND |
+------+-------------+
Performance Schema uses the best timer available by default, but you can select a different one. Generally the best timer is CYCLE, which uses the CPU cycle counter whenever possible to provide high precision and low overhead.
The precision offered by the cycle counter depends on processor speed. If the processor runs at 1 GHz (one billion cycles/second) or higher, the cycle counter resolves intervals of one nanosecond or less. Using the cycle counter is much cheaper than getting the actual time of day. For example, the standard gettimeofday() function can take hundreds of cycles, which is an unacceptable overhead if data gathering occurs thousands or millions of times per second.
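The relationship between processor speed and cycle-counter precision is simple arithmetic, sketched below in Python. The helper name cycle_period_ps is hypothetical, and the second frequency is the CYCLE value from the sample performance_timers output in this section.

```python
PICOSECONDS_PER_SECOND = 10**12


def cycle_period_ps(frequency_hz):
    """Return the duration of one CPU cycle in picoseconds."""
    return PICOSECONDS_PER_SECOND / frequency_hz


print(cycle_period_ps(1_000_000_000))  # 1 GHz processor
print(cycle_period_ps(2_389_029_850))  # CYCLE frequency from the sample output
```

At 1 GHz one cycle lasts 1000 picoseconds (one nanosecond). At the sample frequency it lasts a little under 419 picoseconds.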
Cycle counters also have disadvantages:
End users expect to see timings in wall-clock units, such as fractions of a second. Converting from cycles to fractions of seconds can be expensive. For this reason, the conversion is a quick and fairly rough multiplication operation.
Processor cycle rate might change, such as when a laptop goes into power-saving mode or when a CPU slows down to reduce heat generation. If a processor's cycle rate fluctuates, conversion from cycles to real-time units is subject to error.
Cycle counters might be unreliable or unavailable depending on the processor or the operating system. For example, on Pentiums, the instruction is RDTSC (an assembly-language rather than a C instruction) and it is theoretically possible for the operating system to prevent user-mode programs from using it.
Some processor details related to out-of-order execution or multiprocessor synchronization might cause the counter to seem fast or slow by up to 1000 cycles.
Currently, MySQL works with cycle counters on x86 (Windows, Mac OS X, Linux, Solaris, and other Unix flavors), PowerPC, and IA-64.
Within events, times are stored in picoseconds (trillionths of a second) so that they all use a standard unit, regardless of which timer is selected. The timer used for an event is the one in effect when the event is timed. This timer is used to convert start and end values to picoseconds for storage in the event. If a different timer is selected, that affects only events that start afterward, not those already in progress.
The timer baseline (“time zero”) occurs at Performance Schema initialization during server startup. TIMER_START and TIMER_END values in events represent picoseconds since the baseline. TIMER_WAIT values are durations in picoseconds.
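To make the units concrete, the following Python sketch converts picosecond timer values into microseconds. The event values are invented for illustration, and the sketch assumes that the duration of a completed event equals the difference between its end and start values. The function name wait_microseconds is hypothetical.

```python
PICOSECONDS_PER_MICROSECOND = 10**6


def wait_microseconds(timer_start, timer_end):
    """Return an event's duration in microseconds.

    Both arguments are picoseconds since the timer baseline.
    Assumption: duration = end - start for a completed event.
    """
    timer_wait = timer_end - timer_start
    return timer_wait / PICOSECONDS_PER_MICROSECOND


# Hypothetical event that starts about 3.524 seconds after the baseline.
timer_start = 3_524_000_000_000
timer_end = 3_524_001_250_000
print(wait_microseconds(timer_start, timer_end))
```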
Picosecond values in events are approximate. Their accuracy is subject to the usual forms of error associated with conversion from one unit to another. If the CYCLE timer is used and the processor rate varies, there might be drift. For these reasons, it is not reasonable to look at the TIMER_START value for an event as an accurate measure of time elapsed since server startup. On the other hand, it is reasonable to use TIMER_START or TIMER_WAIT values in ORDER BY clauses to order events by start time or duration.
The choice of picoseconds in events rather than a value such as microseconds has a performance basis. One implementation goal was to show results in a uniform time unit, regardless of the timer. In an ideal world this time unit would look like a wall-clock unit and be reasonably precise; in other words, microseconds. But to convert cycles or nanoseconds to microseconds, it would be necessary to perform a division for every instrumentation. Division is expensive on many platforms. Multiplication is not expensive, so that is what is used. Therefore, the time unit is an integer multiple of the highest possible TIMER_FREQUENCY value, using a multiplier large enough to ensure that there is no major precision loss. The result is that the time unit is “picoseconds.” This precision is spurious, but the decision enables overhead to be minimized. If this decision turns out to be impractical in some way, we will revisit it.
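The trade-off can be illustrated with a minimal Python sketch. It is not the server's implementation. It assumes a single integer multiplier (picoseconds per timer unit) computed once from the timer frequency, so that each event needs only a multiplication. The names make_converter and to_picoseconds are hypothetical, and the frequency is the CYCLE value from the sample performance_timers output in this section.

```python
PICOSECONDS_PER_SECOND = 10**12


def make_converter(timer_frequency):
    """Build a timer-units-to-picoseconds converter.

    The one division happens here, once. Each later conversion is a
    single integer multiplication, at the cost of a rounding error.
    """
    multiplier = round(PICOSECONDS_PER_SECOND / timer_frequency)

    def to_picoseconds(timer_units):
        return timer_units * multiplier

    return multiplier, to_picoseconds


multiplier, cycles_to_ps = make_converter(2_389_029_850)

cycles = 1_000_000
approx = cycles_to_ps(cycles)
exact = cycles * PICOSECONDS_PER_SECOND / 2_389_029_850
relative_error = abs(approx - exact) / exact

print(multiplier, approx, f"{relative_error:.4%}")
```

With these numbers the multiplier rounds to 419 picoseconds per cycle, and the converted value differs from the exact one by roughly 0.1 percent. An error of that size is one reason to treat picosecond values as approximate.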
The setup_instruments table has an ENABLED column to indicate the instruments for which to collect events. The table also has a TIMED column to indicate which instruments are timed. If an instrument is not enabled, it produces no events. If an enabled instrument is not timed, events produced by the instrument have NULL for the TIMER_START, TIMER_END, and TIMER_WAIT timer values. This in turn causes those values to be ignored when calculating the sum, minimum, maximum, and average time values in summary tables.
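The effect of NULL timer values on summary calculations can be sketched in Python, with None standing in for NULL. The TIMER_WAIT values are invented for illustration and the function name summarize is hypothetical. The sketch mirrors only the rule described above, not the server's code.

```python
def summarize(timer_waits):
    """Aggregate TIMER_WAIT values, ignoring NULL (None) entries."""
    timed = [wait for wait in timer_waits if wait is not None]
    if not timed:
        # No timed events: there is nothing to aggregate.
        return {"sum": None, "min": None, "max": None, "avg": None}
    return {
        "sum": sum(timed),
        "min": min(timed),
        "max": max(timed),
        "avg": sum(timed) / len(timed),
    }


# Hypothetical TIMER_WAIT values in picoseconds. The None entries
# represent events from an enabled instrument that is not timed.
waits = [400_000, None, 1_200_000, None, 800_000]
stats = summarize(waits)
print(stats)
```

The two untimed events contribute nothing to the totals, so the average is taken over the three timed events only.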