Thursday, July 30, 2015

sub-select to find max values within a group

Several times recently I've needed to get the max or min value within a group of values. For example, say I want the max id for each user each day, or the max id for each user who has several entries in some kind of history table.

At first I struggled with how to get that data, but the solution turned out to be fairly simple: a sub-select with GROUP BY. Since realizing how easy this is, I've used the same approach at work quite often. It's a nice reminder of how useful GROUP BY can be. Below is a query similar to one I wrote recently using a GROUP BY in a sub-select.

This trick only works when the table has an auto-increment id. You also have to assume that, within each group, the row with the max auto-increment id is the row you are looking for.

It might result in full index scans, so it may not be the most optimal solution, but it worked for what I needed:
    
    SET @day_number = 1;
    -- latest (max id) row per user, restricted to rows created in the last @day_number days
    SELECT id
    FROM table_a
    WHERE date_created BETWEEN DATE_SUB(NOW(), INTERVAL @day_number DAY) AND NOW()
    AND value1 < (@day_number * 86400)
    AND id IN (SELECT MAX(id) FROM table_a GROUP BY user_id);

Friday, July 24, 2015

Adding new MySQL users to an AWS MySQL instance and using an SSH tunnel

This may sound really stupid, but I was creating users today on a MySQL instance hosted in AWS and couldn't figure out what hostname value to use for the new users to get them to work.

Whoever at my work set up the instance completely forgot the root password. They were managing it themselves, and eventually it got thrown over the fence to my team.

This site has a great little tutorial on resetting the root password for MySQL when you have shell access to the box:
http://www.rackspace.com/knowledge_center/article/mysql-resetting-a-lost-mysql-root-password

Normally I work with MySQL instances in our own local data center. This MySQL instance was in AWS and couldn't be accessed without tunneling in via SSH. I wasn't sure what the host should be for the MySQL users. At first I was limiting them to the IP range of my office, but after several attempts to log on to the MySQL instance over the SSH tunnel it still wasn't working. Then it dawned on me: after connecting to the system via SSH, you are localhost. So the MySQL users have to be user_name@localhost. It worked fine after I realized that!
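
For reference, here's a minimal sketch of the whole flow; the host, database, and user names are hypothetical:

    # SSH to the AWS box (hypothetical host)
    ssh ec2-user@mysql-box.example.com

    # on the box you are localhost, so connect with the local client
    mysql -u root -p

    -- create the user with host 'localhost' to match how you connect
    CREATE USER 'user_name'@'localhost' IDENTIFIED BY 'some_password';
    GRANT ALL PRIVILEGES ON my_db.* TO 'user_name'@'localhost';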

Tuesday, July 21, 2015

12 Steps to MySQL Query Tuning

I've been a fan of SolarWinds' 12 Steps to Query Performance Tuning infographic, which they published for Oracle and SQL Server. However, they didn't publish one for MySQL. Using the same 12-step model as their infographic, I've written my own:

12 Steps to MySQL Query Tuning


Step 1 - Examine the Execution Plan

Make sure you are working with tables, not views
Make certain you know the row count
Review the access predicates (the key_len and ref columns in the EXPLAIN output)
- Is all or part of a composite index being used?
Are there full table scans? Are there full index scans?
Review the Extra column
- Do you see "Using temporary;" or "Using filesort;" (BAD)
- Do you see "Using Index" (GOOD)
Is the Possible keys column NULL?
Is the value in the Rows column high? (BAD)
Does the type column show RANGE?
- Is it having traverse large number of rows in the Index leaf nodes?
- You want to keep the scanned index range as small as possible
Examine EXPLAIN EXTENDED
SHOW WARNINGS <- run this right after EXPLAIN EXTENDED to review the SQL as it will actually be executed
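
For example, against a hypothetical table_a (EXPLAIN EXTENDED is the MySQL 5.x syntax):

    EXPLAIN EXTENDED
    SELECT id FROM table_a WHERE user_id = 42;

    -- run immediately afterwards to see the query as the optimizer rewrote it
    SHOW WARNINGS;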

Step 2 - Review the filters: JOINs, WHERE predicates, ORDER BY, GROUP BY, HAVING, LIMIT

What is the filtered row count?
If there is a LIMIT clause, is it applied before or after the entire table is accessed?
Are there functions in the ORDER BY?
Review the data types to avoid implicit conversion
- truthy/falsey filtering in the Predicates (converting columns to Boolean)
- WHERE !TIMESTAMP (ignores index)
Don't wrap indexed columns in expressions in the WHERE predicates (see the sketch after this list)
- WHERE DATE(TIMESTAMP) = '2010-10-10' (ignores index)
Should the query be using an INNER JOIN vs. LEFT JOIN?
Is the query unnecessarily filtering out duplicates? (UNION vs. UNION ALL)
Are there bind variables? Using prepared statements?
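
A sketch of the wrapped-index pitfall from above, assuming an index exists on date_created:

    -- ignores the index: the function hides the column from the optimizer
    SELECT id FROM table_a WHERE DATE(date_created) = '2010-10-10';

    -- uses the index: a plain range scan over the raw column
    SELECT id
    FROM table_a
    WHERE date_created >= '2010-10-10'
      AND date_created < '2010-10-11';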

Step 3 - Know the Selectivity of the table data

Is the data skewed resulting in queries that cannot effectively use existing indexes?
Make sure to work with the smallest possible logical set
Can additional WHERE predicates be added?
When dealing with multiple tables, know when each predicate is applied; you want the most selective predicate applied as early as possible (a quick selectivity check is sketched below)
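
A quick way to eyeball selectivity, using a hypothetical user_id column; values near 1 are highly selective, values near 0 are not:

    SELECT COUNT(DISTINCT user_id) / COUNT(*) AS selectivity
    FROM table_a;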

Step 4 - Gather Table Information

Review Table definitions
Have the stats been updated? Get current stats.
Are there triggers on the table? If so, how often do they fire? What are they doing?
What is the storage engine? Does it allow for row level locking vs. table level locking?
Make sure the data types match for the columns being JOINed between two tables, including the CHARSET and COLLATION
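
The commands I usually reach for here (table name is hypothetical):

    SHOW CREATE TABLE table_a\G         -- definition, storage engine, charset/collation
    SHOW TABLE STATUS LIKE 'table_a'\G  -- approximate row count and other stats
    SHOW TRIGGERS LIKE 'table_a';       -- any triggers firing on this table
    ANALYZE TABLE table_a;              -- refresh the table statistics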

Step 5 - Evaluate Existing Indexes

Look at Index Definitions
Are there redundant indexes?
Are there composite indexes?
Is there a primary key on the table?
Is the primary key an integer or a char?
Is the primary key unnecessarily large, as in the case of a large VARCHAR primary key?
Are there far too many secondary indexes?
Know the Selectivity of the indexes
Is the cardinality for the indexes high/low?
Could a fulltext index be useful?
Is each column that could be JOINED indexed?
Check to see if covering indexes can be created (avoid duplication and overlapping of indexes)
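
SHOW INDEX covers most of these questions at a glance, again using the hypothetical table_a:

    SHOW INDEX FROM table_a;  -- the Cardinality column is a rough guide to index selectivity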

Step 6 - Analyze Columns in the WHERE clause and SELECT

Look for SELECT * or scalar functions (the more data brought back, the less optimal certain functions may be)
Look for CASE, CAST, CONVERT
Are there sub queries? Can the sub query be converted to a JOIN? (see the sketch below)
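
A sketch of the sub query-to-JOIN rewrite, reusing the hypothetical latest-row-per-user pattern:

    -- correlated sub query: potentially re-executed for every row of table_a
    SELECT a.id
    FROM table_a a
    WHERE a.id = (SELECT MAX(b.id) FROM table_a b WHERE b.user_id = a.user_id);

    -- same result as a JOIN against a derived table
    SELECT a.id
    FROM table_a a
    JOIN (SELECT user_id, MAX(id) AS max_id
          FROM table_a
          GROUP BY user_id) m
      ON m.user_id = a.user_id AND m.max_id = a.id;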

Step 7 - Run the Query and record baseline metrics

Gather average execution times
If you are using Master-Master, Master-Slave, or some other type of cluster where each instance has the same data, test the query on each replica to see if the execution time and/or execution plan changes
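
One simple way to capture a baseline on MySQL 5.x; SQL_NO_CACHE keeps the query cache from skewing the numbers:

    SET profiling = 1;
    SELECT SQL_NO_CACHE id FROM table_a WHERE user_id = 42;
    SHOW PROFILES;  -- lists each statement above with its execution time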

Step 8 - Adjust/Tune the query

Focus on the most expensive operations first (full table scans, index scans, file sort, temporary table)
Look for high cost steps to eliminate
Consider a covering index, which includes every column that satisfies the query
In the case of ORDER BY and GROUP BY, check to see if you can eliminate "Using temporary;" or "Using filesort;" by properly indexing the table
Try to eliminate the frequent counting of rows by utilizing summary tables
Rewrite sub queries as a join
Seeks vs. scans: which is more expensive?

NOTE: Out-of-date statistics can impact performance. Issue "ANALYZE TABLE"
NOTE: Fragmented tables can impact performance. During a planned maintenance window, issue "OPTIMIZE TABLE" (both shown below)
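
A sketch of the covering index and both maintenance statements, with a hypothetical index on table_a:

    -- covering index: every column the query touches lives in the index,
    -- so EXPLAIN shows "Using index" and the table rows are never read
    ALTER TABLE table_a ADD INDEX idx_user_created_id (user_id, date_created, id);

    ANALYZE TABLE table_a;   -- refresh out-of-date statistics
    OPTIMIZE TABLE table_a;  -- defragment during a planned maintenance window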

Step 9 - Re-run the query

Record the results and compare
Only make one change at a time

Step 10 - Consider adjusting indexes

Test using index hints to see if the optimizer is choosing poorly (see below)
Look to reduce logical I/O
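
MySQL's hint syntax for that test, with a hypothetical index name:

    -- suggest an index to the optimizer
    SELECT id FROM table_a USE INDEX (idx_user_created_id) WHERE user_id = 42;

    -- or insist on it: a table scan becomes very expensive to the optimizer
    SELECT id FROM table_a FORCE INDEX (idx_user_created_id) WHERE user_id = 42;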

Step 11 - Incremental changes

Continue to re-run the query and record results after each incremental change
Have a peer review the query with you to see if you are missing something

Step 12 - Engineer out the Stupid

Abuse of wildcards?
Use of views?
Scalar Functions in the query?
Can the query be cached? Does it make sense to use the MySQL query cache or a dedicated caching layer? Consider turning off the MySQL query cache, as the effort to check the cache can severely slow down your queries.
Is the query using a cursor in a stored procedure?
Does the server have enough RAM to keep indexes in memory?
Are the data files stored on fast drives?
Are all the tables in your query using the same storage engine?
Do all your tables have Primary Keys, proper secondary indexes?
Is your table normalized/de-normalized as appropriate?
Is your MySQL server so heavily overloaded that queries are performing poorly?
Has the query been tested on large and small data sets?
Is the query being used on a sharded database where different databases have the same tables but possibly very different data sets?
Is the query doing silly things like overusing OR/UNION when an IN clause may be more appropriate?
Are you forcing a full table scan by putting a wildcard character (%) at the start of a LIKE pattern on an indexed column? (see the sketch after this list)
Is the table definition using the right data types? Abuse of CHAR(255), VARCHAR(255), BIGINT?
Can a complex query with dependent sub queries be split into multiple queries and/or use transactions?
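
Two of those in miniature, against the hypothetical table_a (the name column is assumed):

    -- leading wildcard: an index on name cannot be used, full scan
    SELECT id FROM table_a WHERE name LIKE '%smith';

    -- trailing wildcard: an index on name can satisfy this as a range
    SELECT id FROM table_a WHERE name LIKE 'smith%';

    -- IN is usually cleaner (and easier to optimize) than a chain of ORs
    SELECT id FROM table_a WHERE user_id IN (1, 2, 3);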

UPDATE April 2016:
SolarWinds finally came out with an infographic for MySQL:
http://www.solarwinds.com/assets/infographics/dpa-mysql-12-steps.aspx



Tuesday, July 14, 2015

time zone errors when replaying logs


I kept getting this annoying error on my test instances of MySQL when running pt-upgrade to replay logs I pulled from production:

    DBD::mysql::st execute failed: Unknown or incorrect time zone: 'America/Los_Angeles' [for Statement "SET time_zone = 'America/Los_Angeles'"]

    SET time_zone = 'America/Los_Angeles'


The solution was to install time zones onto the testing instance like this:

    mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql


Some of the time zones didn't install, but America/Los_Angeles did, which made my error go away.
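
A quick sanity check that the zone actually loaded; CONVERT_TZ returns NULL when the named zone is missing from the time zone tables:

    SELECT CONVERT_TZ(NOW(), 'UTC', 'America/Los_Angeles');
    -- a non-NULL result means the zone is installed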