A few weeks ago, I identified half a dozen indexes that were missing from database tables for an important application at work. There was no rush to get them added because we typically require all changes, including database indexes, to go through dev, qa, alpha, and beta before prod. But performance for this application got so bad that management approved adding the indexes directly to production.
Several of the queries I had identified were frequently doing full table scans, and it was clear they would benefit from the indexes. The application was seeing slowness and sometimes hit max connections because these queries were not clearing quickly enough.
I got permission to have the indexes added. That night I added the indexes, one of which I added with Percona's pt-online-schema-change.
The next day, throughput on the server had increased so much that the application was severely taxing the database system's resources, with high CPU and 200 to 600 threads running all of these now well-tuned queries. It was interesting to see what a huge difference those indexes made, but also how the bottleneck moved from slow queries to more queries than that version of MySQL could realistically handle with the number of CPUs/cores on the system.
Friday, October 30, 2015
Thursday, October 29, 2015
pt-query-digest
I've used pt-query-digest ever since I became a MySQL database guy. It is an extremely useful tool, especially when your company refuses to buy profiling tools for you. One drawback is that it is heavy on processing: several hours' worth of logs can take over an hour to digest and puts a real load on the system, so I wouldn't ever recommend running it on the same server as a production database.
Today was the first time I actually used it against binary logs. Previously I've only done it on slow query logs. After requesting a copy of the binary logs (still cannot get them for myself), I converted the binary logs to text using mysqlbinlog.
Here is an example from their documentation:
https://www.percona.com/doc/percona-toolkit/2.2/pt-query-digest.html
mysqlbinlog mysql-bin.000441 > mysql-bin.000441.txt
pt-query-digest --type binlog mysql-bin.000441.txt
Here is how I ran it to get specific time ranges and to save the report to a file:
pt-query-digest --type=binlog --since='2015-10-27 11:22:36' --until='2015-10-27 16:40:49' --limit=100% mysqllog_all_day.out > mysqllog_tuesday_all_day_digest.txt
There are a lot of options to change how the report is created. Seeing how many times each query fingerprint is executed, along with a histogram of its execution times, is super useful.
If you want to see more, here are a couple blog posts:
https://www.percona.com/blog/2011/12/29/identifying-the-load-with-the-help-of-pt-query-digest-and-percona-server/
https://www.percona.com/blog/2014/03/14/tools-and-tips-for-analysis-of-mysqls-slow-query-log/
Wednesday, October 28, 2015
Grouping by granular timestamp ranges - by day or by minute
I was asked to run some counts and make them granular to the minute. At first I thought, sure, I'll just do a GROUP BY on the MINUTE. That didn't exactly work: several hours appeared to be missing. Then I realized that since every hour contains the same minute values, I had to add the HOUR to the GROUP BY in addition to the MINUTE.
I want to get a count up to the minute for the entire day:
SELECT timestamp, count(*) as count
FROM table_b
WHERE timestamp BETWEEN '2015-10-26 00:00:00' AND '2015-10-26 23:59:59'
GROUP BY HOUR(timestamp), MINUTE(timestamp);
If you don't add the HOUR as part of the GROUP BY, it will only give you one hour out of the day.
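The pitfall is easy to reproduce. Here is a small sketch using SQLite from Python (SQLite has no MINUTE() or HOUR() functions, so strftime() stands in for them; the table and timestamps are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE table_b (ts TEXT)")
cur.executemany("INSERT INTO table_b VALUES (?)", [
    ("2015-10-26 09:15:02",), ("2015-10-26 09:15:40",),
    ("2015-10-26 10:15:11",),  # same minute value as above, different hour
    ("2015-10-26 10:30:05",),
])

# GROUP BY the minute alone: 09:15 and 10:15 collapse into one bucket.
by_minute = cur.execute(
    "SELECT strftime('%M', ts) AS m, COUNT(*) FROM table_b "
    "GROUP BY m ORDER BY m").fetchall()
print(by_minute)        # [('15', 3), ('30', 1)]

# GROUP BY hour AND minute: each clock minute stays distinct.
by_hour_minute = cur.execute(
    "SELECT strftime('%H:%M', ts) AS hm, COUNT(*) FROM table_b "
    "GROUP BY hm ORDER BY hm").fetchall()
print(by_hour_minute)   # [('09:15', 2), ('10:15', 1), ('10:30', 1)]
```

Grouping by the minute alone merges rows from 09:15 and 10:15 into a single "minute 15" bucket, which is exactly the "missing hours" symptom above.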
I want to get a count up to the hour for the entire year:
SELECT timestamp, count(*) as count
FROM table_a
WHERE start BETWEEN '2015-01-01 00:00:00' AND '2015-12-31 23:59:59'
GROUP BY MONTH(timestamp), DAY(timestamp), HOUR(timestamp)
If you don't add the MONTH as part of the GROUP BY, it will only give you the results for a single month.
I want to get a count up to the day that spans multiple years:
SELECT COUNT(*) as count, SUM(duration), YEAR(`timestamp`) as year, MONTH(`timestamp`) as month, DAY(`timestamp`) as day
FROM table_a
WHERE
`start` BETWEEN '2013-11-20 00:00:00' AND '2015-12-31 23:59:59'
GROUP BY YEAR(`timestamp`), MONTH(`timestamp`), DAY(`timestamp`)
ORDER BY year, month, day
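For the per-day case, grouping on the full calendar date (MySQL's DATE() function) should get the same effect with one expression. A quick sketch using SQLite's date() from Python, with made-up rows, shows why the year matters:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE table_a (ts TEXT)")
cur.executemany("INSERT INTO table_a VALUES (?)", [
    ("2014-12-31 23:10:00",),
    ("2014-12-31 23:55:00",),
    ("2015-12-31 08:00:00",),  # same month and day, different year
])

# Grouping by the full date keeps the two years apart, which
# GROUP BY MONTH(...), DAY(...) alone would merge into one bucket.
rows = cur.execute(
    "SELECT date(ts), COUNT(*) FROM table_a GROUP BY date(ts) ORDER BY 1"
).fetchall()
print(rows)  # [('2014-12-31', 2), ('2015-12-31', 1)]
```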
Initially it doesn't seem intuitive, but once you think about it, it makes sense.
For all the purists out there, this is an illegal GROUP BY. However, MySQL extends the GROUP BY. Lifted from the manual (https://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html):
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
Tuesday, October 20, 2015
Counting things that do not exist with a subquery
Today I had a problem where I needed to find, per FK value from a parent table, the difference between the total row count and the inactive row count. A developer explained that she wanted to do a count for values that don't exist. She gave me a query like this:
SELECT leader_id, count(*) as count
FROM lists
WHERE active = 1
GROUP BY leader_id
HAVING count = 0
I told her that you cannot count something that doesn't exist. It took me a little while to understand what was needed, but I was able to do it with a subquery in the FROM clause (a derived table). Take this example table and example data:
CREATE TABLE `lists` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`leader_id` int(10) unsigned NOT NULL,
`active` tinyint(4) NOT NULL DEFAULT '1',
`name` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_fk_leader_id` (`leader_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `lists` (`id`, `leader_id`, `active`, `name`) VALUES (NULL, '1', '1', 'one');
INSERT INTO `lists` (`id`, `leader_id`, `active`, `name`) VALUES (NULL, '2', '1', 'two');
INSERT INTO `lists` (`id`, `leader_id`, `active`, `name`) VALUES (NULL, '1', '0', 'One old');
INSERT INTO `lists` (`id`, `leader_id`, `active`, `name`) VALUES (NULL, '1', '1', 'One also old');
INSERT INTO `lists` (`id`, `leader_id`, `active`, `name`) VALUES (NULL, '3', '1', 'three');
INSERT INTO `lists` (`id`, `leader_id`, `active`, `name`) VALUES (NULL, '3', '0', 'three old');
INSERT INTO `lists` (`id`, `leader_id`, `active`, `name`) VALUES (NULL, '4', '0', 'four old');
I wrote this query to get the difference:
SELECT l1.leader_id,
count(*) as total_count,
l2.inactive_count,
count(*) - l2.inactive_count as difference
FROM lists l1
INNER JOIN
(SELECT leader_id, count(*) as inactive_count FROM lists l3 WHERE active = 0
GROUP BY leader_id) l2 ON l1.leader_id = l2.leader_id
GROUP BY l1.leader_id
Results look like this:
+-----------+-------------+----------------+------------+
| leader_id | total_count | inactive_count | difference |
+-----------+-------------+----------------+------------+
|         1 |           3 |              1 |          2 |
|         3 |           2 |              1 |          1 |
|         4 |           1 |              1 |          0 |
+-----------+-------------+----------------+------------+
What we really wanted was just the value where total count was equal to the inactive count (difference = 0) so I added a HAVING clause also:
SELECT l1.leader_id,
count(*) as total_count,
l2.inactive_count,
count(*) - l2.inactive_count as difference
FROM lists l1
INNER JOIN
(SELECT leader_id, count(*) as inactive_count FROM lists l3 WHERE active = 0
GROUP BY leader_id) l2 ON l1.leader_id = l2.leader_id
GROUP BY l1.leader_id
HAVING difference = 0
Results look like this:
+-----------+-------------+----------------+------------+
| leader_id | total_count | inactive_count | difference |
+-----------+-------------+----------------+------------+
|         4 |           1 |              1 |          0 |
+-----------+-------------+----------------+------------+
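The whole example can be reproduced as a sanity check with SQLite from Python (the DDL is adapted to SQLite, which has no unsigned/tinyint types, but the query shape is the same; the HAVING is written out as an expression rather than an alias):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""CREATE TABLE lists (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    leader_id INTEGER NOT NULL,
    active INTEGER NOT NULL DEFAULT 1,
    name TEXT)""")
cur.executemany(
    "INSERT INTO lists (leader_id, active, name) VALUES (?, ?, ?)",
    [(1, 1, 'one'), (2, 1, 'two'), (1, 0, 'One old'), (1, 1, 'One also old'),
     (3, 1, 'three'), (3, 0, 'three old'), (4, 0, 'four old')])

# Join each leader's total rows against a derived table of inactive
# counts, and keep only leaders whose totals equal their inactive counts.
rows = cur.execute("""
    SELECT l1.leader_id,
           COUNT(*) AS total_count,
           l2.inactive_count
    FROM lists l1
    INNER JOIN (SELECT leader_id, COUNT(*) AS inactive_count
                FROM lists WHERE active = 0
                GROUP BY leader_id) l2
        ON l1.leader_id = l2.leader_id
    GROUP BY l1.leader_id
    HAVING COUNT(*) - l2.inactive_count = 0
""").fetchall()
print(rows)  # [(4, 1, 1)] -- only leader 4 has no active rows at all
```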