Monday, December 28, 2015

Finding international phone numbers mixed in with other numbers

Today I had a developer ask me to help write a "fun" query. He is new to MySQL and wasn't sure how to do the same things he is used to doing in PHP. He wanted a count of all the rows in a table where we store phone numbers, to see which entries would meet the criteria for an international number. The steps were: strip out all the extra characters people enter in a phone field (commas, dashes, spaces, parentheses); filter out rows where the client entered words instead of numbers, blank values, and NULLs; exclude numbers that start with +1 (because those are US numbers); trim a leading 1 off numbers that don't already start with a +; and finally don't count anything that is 10 digits long after the cleanup (because those should also be considered US numbers).

Here is an example table and sample data:

CREATE TABLE `contacts` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `phone` varchar(40) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_phone` (`phone`)
) ENGINE=InnoDB;

INSERT INTO `contacts` (`id`, `phone`)
VALUES
(1, NULL),
(2, '(123)'),
(3, '+222'),
(4, '+44 20 7895 1232'),
(5, '+448717923642'),
(6, ',123'),
(7, '1 (818) 555-1211'),
(8, '119999999999'),
(9, '123numbers'),
(10, '1555555555555'),
(11, '55555555'),
(12, '5555555555'),
(13, 'numbers'),
(14, 'numbers123');

Here is the query that gets my count: 

SELECT database() as `database`, COUNT(*) as count
FROM
(    SELECT database() as `database`,
    phone,
    TRIM(
        TRIM(LEADING '1' FROM
            REPLACE(
                REPLACE(
                    REPLACE(
                        REPLACE(
                            REPLACE(
                                phone,
                            '(',''),
                        ')',''),
                    '-',''),
                ' ',''),
            ',','')
        )
    ) as phone_changed,
    CHAR_LENGTH(
        TRIM(
            TRIM(LEADING '1' FROM
                REPLACE(
                    REPLACE(
                        REPLACE(
                            REPLACE(
                                REPLACE(
                                    phone,
                                '(',''),
                            ')',''),
                        '-',''),
                    ' ',''),
                ',','')
            )
        )
    ) as phone_changed_length
    FROM contacts
    WHERE phone NOT LIKE '+1%'
    AND phone <> ''
    AND phone IS NOT NULL
    AND phone REGEXP '[0-9]+'
    AND phone REGEXP '^[+]?[-(), 0-9]+$'
    HAVING phone_changed_length <> 10
) as t
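
Against the sample data above, this should return a count of 7: rows 2, 3, 4, 5, 6, 10, and 11 survive, while row 1 is dropped for being NULL, rows 9, 13, and 14 fail the REGEXP checks, and rows 7, 8, and 12 come out to exactly 10 digits after the cleanup, so they are treated as US numbers.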

Trimming the 1 off the front of the number may not be the perfect way to handle people who include the US country code, but it worked fine for our situation. There are no US area codes that start with 1, and it is mainly the US that uses a leading 1 when dialing a US phone number, so we were not concerned that this count would be wrong. Examples 2 and 6 in the sample data above still get trimmed even though they are obviously neither valid US numbers nor international numbers, and they still get counted.

The inner query is longer than it needs to be just to get this count; however, running the inner query by itself lets you see how the TRIM and REPLACE calls strip out some of the data so you can compare the result to the original.

And to give credit where credit is due (needed to reference a few examples):

http://stackoverflow.com/questions/12652234/sql-query-to-remove-special-characters
http://stackoverflow.com/questions/5064977/detect-if-value-is-number-in-mysql
https://techjourney.net/remove-or-trim-first-or-last-few-characters-in-mysql-database-with-sql/
http://stackoverflow.com/questions/8844880/mysql-regexp-removing-white-space-and-non-numeric-characters
http://dev.mysql.com/doc/refman/5.7/en/regexp.html
http://stackoverflow.com/questions/15005916/get-number-of-rows-returned-in-mysql-query
http://www.w3resource.com/mysql/string-functions/mysql-trim-function.php
http://www.w3resource.com/mysql/string-functions/mysql-char_length-function.php

Tuesday, December 8, 2015

mysql dump triggers and stored procedures

I've been working on a little project to fix situations where we get incorrect definers on certain databases. The incorrect definer could be on an event, view, stored procedure, trigger, or function. Fixing these takes some effort because you cannot simply edit the definer of a trigger (even though you can for events and procedures); the trigger must be fully dropped and re-created with the new definer. To automate this task I've been writing a bash script to do the work.
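
For reference, re-creating a single trigger by hand with the correct definer looks something like this (the trigger, table, and definer names here are made up):

DROP TRIGGER IF EXISTS my_test_db.trg_contacts_bi;
CREATE DEFINER='app_user'@'10.%' TRIGGER my_test_db.trg_contacts_bi
BEFORE INSERT ON my_test_db.contacts
FOR EACH ROW
  SET NEW.phone = TRIM(NEW.phone);

Note that creating a trigger with a definer other than your own account requires the SUPER privilege.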

Initially I had written a little bash function to DROP each function/procedure/trigger before restoring it. After writing it I discovered that mysqldump was already including a DROP ... IF EXISTS in the dump file for each stored procedure and function, so my bash function was redundant. However, I wasn't seeing an equivalent DROP TRIGGER IF EXISTS statement when I took a mysqldump of the triggers. After reading up on this, a lot of people were requesting it many years ago, before I became a DBA: https://bugs.mysql.com/bug.php?id=34325

Lo and behold this feature was already in the version of MySQL I was running. I just needed to tell MySQL I wanted those statements included.

Here is the mysqldump command I've created for this task:

mysqldump -u user_name -p -h test_server.awesome.com --add-drop-trigger --databases my_test_db --no-create-info --no-data -R  > backup_triggers_routines.sql

Explanation of each option:
--add-drop-trigger (adds the DROP TRIGGER IF EXISTS syntax)
--databases (lets me specify more than one database; I'm only doing one at a time)
--no-create-info (tells mysqldump not to include CREATE TABLE statements)
--no-data (tells mysqldump not to include data, because I only want the triggers/procs/functions)
-R (same as --routines; tells it to also dump stored procedures and functions)

Note: Triggers are included by default but stored procedures and functions are not. If you wanted to you could include --triggers but it is not necessary.

If you wanted to dump only the routines (stored procedures/functions) you can use the --skip-triggers option to exclude triggers.

It seems silly that mysqldump doesn't include routines by default, and sillier still that it automatically includes the DROP syntax for routines but not for triggers.

Monday, December 7, 2015

pt-online-schema-change

Percona has a tool called pt-online-schema-change, which is part of the Percona Toolkit; I've heard it referred to as OSC. It is super useful for adding indexes, changing the storage engine of a table, adding a column, etc., when you want to do the operation "online" without locking the table and essentially making it unavailable for the duration. On large tables this can take a long time; I've done it on tables with 500 million rows and it has taken 6 hours to complete.

The tool works by creating a new table, applying the change to that empty table, and then moving the data from the old table to the new table in chunks. Once the data is all moved over, the tool syncs the two tables one last time so that rows written to the old table during the copy also get moved, renames the new table to the old table's name, and drops the old table. One added bonus of all this work is that the table gets optimized at the same time. A large table that sees a lot of DELETEs and UPDATEs is likely fragmented; I've seen a 100 GB table shrink to 50 GB after adding an index because it was so highly fragmented. However, within a few weeks the fragmentation was back.

The syncing of data works by means of triggers, so the tool cannot be used on tables that already have triggers. There are a number of options you can use when running it, such as a dry run, and telling the tool to pause if the server gets too busy. You can also set the chunk size as small or large as you see fit. I typically use 10000 as the chunk size; I've tried a chunk size of 1000 and it is much slower on large tables. You could get away with 50000, but I'd be careful with anything too large.

I set a threshold of 400 Threads_running in my examples, which is really high. A Threads_running of 50 might be high enough on your server that you would want the tool to pause and wait until the server is less busy before it continues syncing data. The tool won't actually perform the operation unless you specify "--execute" (see below).

The tool also has a number of safety features: it checks replication lag on all the slaves and won't continue until the lag catches up. Example output from using Online Schema Change is below.

Here are two examples I've done frequently:


Example adding an index to a table:

# non prod
## dry run
./pt-online-schema-change --host=non_prod.awesome.com D=my_non_prod_db,t=my_table --ask-pass -u'!me' --critical-load Threads_running=400 --max-load Threads_running=375 --chunk-size=10000 --dry-run --alter 'ADD INDEX idx_column_1_column_2 (column_1, column_2)' > np_add_index_dry_results.log

## actual
./pt-online-schema-change --host=non_prod.awesome.com D=my_non_prod_db,t=my_table --ask-pass -u'!me' --critical-load Threads_running=400 --max-load Threads_running=375 --chunk-size=10000 --execute --alter 'ADD INDEX idx_column_1_column_2 (column_1, column_2)' > np_add_index_actual_results.log


# prod
## dry run
./pt-online-schema-change --host=prod.awesome.com D=my_prod_db,t=my_table --ask-pass -u'!me' --critical-load Threads_running=400 --max-load Threads_running=375 --chunk-size=10000 --dry-run --alter 'ADD INDEX idx_column_1_column_2 (column_1, column_2)' > add_index_dry_results.log

## actual
./pt-online-schema-change --host=prod.awesome.com D=my_prod_db,t=my_table --ask-pass -u'!me' --critical-load Threads_running=400 --max-load Threads_running=375 --chunk-size=10000 --execute --alter 'ADD INDEX idx_column_1_column_2 (column_1, column_2)' > add_index_actual_results.log


Example changing storage engine from MyISAM to InnoDB:

# non prod
## dry run
./pt-online-schema-change --host=non_prod.awesome.com D=my_database,t=my_table -u'!me' --ask-pass --critical-load Threads_running=400 --max-load Threads_running=375 --chunk-size=10000 --dry-run --alter "ENGINE = InnoDB" > non_prod_dry_my_table_change_table_engine.log
## actual
./pt-online-schema-change --host=non_prod.awesome.com D=my_database,t=my_table -u'!me' --ask-pass --critical-load Threads_running=400 --max-load Threads_running=375 --chunk-size=10000 --execute --alter "ENGINE = InnoDB" > non_prod_actual_my_table_change_table_engine.log

# prod
## dry run
./pt-online-schema-change --host=prod.awesome.com D=my_database,t=my_table -u'!me' --ask-pass --critical-load Threads_running=400 --max-load Threads_running=375 --chunk-size=10000 --dry-run --alter "ENGINE = InnoDB" > prod_dry_my_table_change_table_engine.log
## actual
./pt-online-schema-change --host=prod.awesome.com D=my_database,t=my_table -u'!me' --ask-pass --critical-load Threads_running=400 --max-load Threads_running=375 --chunk-size=10000 --execute --alter "ENGINE = InnoDB" > prod_actual_my_table_change_table_engine.log

Found 5 slaves:
  db_server_2
  db_server_3
  db_server_4
  db_server_5
  db_server_6
Will check slave lag on:
  db_server_2
  db_server_3
  db_server_4
  db_server_5
  db_server_6

Example output when changing table from MyISAM to InnoDB:

Operation, tries, wait:
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `my_database`.`my_table`...
Creating new table...
Created new table my_database._my_table_new OK.
Altering new table...
Altered `my_database`.`_my_table_new` OK.
2015-12-07T17:53:43 Creating triggers...
2015-12-07T17:53:43 Created triggers OK.
2015-12-07T17:53:43 Copying approximately 188366 rows...
2015-12-07T17:54:07 Copied rows OK.
2015-12-07T17:54:07 Swapping tables...
2015-12-07T17:54:07 Swapped original and new tables OK.
2015-12-07T17:54:07 Dropping old table...
2015-12-07T17:54:08 Dropped old table `my_database`.`_my_table_old` OK.
2015-12-07T17:54:08 Dropping triggers...
2015-12-07T17:54:08 Dropped triggers OK.
Successfully altered `my_database`.`my_table`.

Tuesday, December 1, 2015

Why do developers sometimes forget to add a primary key to tables?

I work a lot with developers and I've discovered that their experience with data modeling, or database work in general, ranges from knowing nearly nothing to being experts beyond my own experience. Some have taken courses in college, some are self-trained, and some have just picked things up on the job.

One of the first things I did two years ago when I started working at my company was to create a list of standards that all new database development in MySQL would adhere to. For example: all tables would use InnoDB as the default storage engine, utf8 as the default character set, no more enums, all tables must have primary keys, and lots of other standards.

The item about developers not putting primary keys on tables has been a mystery to me. It just seems very basic to me. We've got a legacy app that largely uses MyISAM tables and the developers that created these tables long before I ever got hired never put primary keys on some of them. There was no database code review process at that time and so they just got away with it.

I've been preaching ever since I got here that all the tables need to be converted to InnoDB and those tables need primary keys. But 24 months later, it still hasn't changed. If it ain't broke, don't fix it right?

I don't have any benchmarks, but this is Percona's take on the scalability problems you can hit if your InnoDB tables don't have primary keys:

https://www.percona.com/blog/2013/10/18/innodb-scalability-issues-tables-without-primary-keys/
https://dzone.com/articles/innodb-scalability-issues-due

Here are some others:
http://www.psce.com/blog/2012/04/04/how-important-a-primary-key-can-be-for-mysql-performance/
http://blog.jcole.us/2013/05/02/how-does-innodb-behave-without-a-primary-key/


Finding these tables is pretty easy. I copied some code from the Data Charmer blog to get this:

SELECT table_schema, table_name
FROM information_schema.tables
WHERE (table_catalog, table_schema, table_name) NOT IN
(SELECT table_catalog, table_schema, table_name
FROM information_schema.table_constraints
WHERE constraint_type in ('PRIMARY KEY'))
AND table_schema NOT IN ('information_schema', 'mysql');

Here I want to find all tables that do not have a PRIMARY key. Even if it has a UNIQUE key, I still want to get the table name.

One of the big use cases for why the tables need to be converted is that several development managers want to try Percona XtraDB Cluster, and having all these tables without primary keys isn't going to fly.
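
When one of these tables finally does get converted, the change itself can be a single ALTER along these lines (the table and column names are just an example):

ALTER TABLE legacy_db.order_history
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ADD PRIMARY KEY (id),
  ENGINE = InnoDB;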

Thursday, November 26, 2015

Happy Thanksgiving

Thankful to all those that do blog about MySQL. I don't have any mentors at work because there aren't any Senior or Principal DBAs at my company. I learn from Percona Consultants, Percona Live seminars, webinars and lots of blogs.

Saturday, November 21, 2015

Big Data Conference and SQL saturday

I attended several great lectures on Big Data topics at SQL Saturday. One of the presenters said that everyone will soon be expected to be a data scientist to some degree. Just as typing was once a rare skill, the coming generation will be expected to program, use databases, mine data, perform statistical analysis on data, and present the data in meaningful ways.

Monday, November 16, 2015

Before and after upgrade to Percona 5.6

Last week we had a production server that wasn't doing so well on Oracle MySQL Community 5.5. We upgraded it to Percona Server 5.6 with the thread pool enabled. Here is the before and after, looking at threads:

BEFORE:



AFTER:

Huge difference, eh? The after graph has looked the same, with no spikes in threads_running or slow queries, for three days now. CPU has also dropped a lot. We did double the number of connections allowed, from 1200 to 2400; the next day we never got close to that, but RAM usage did increase.

The server had only 32 GB of RAM before and after the upgrade. Even though it was doing much better on Percona Server 5.6, memory started to swap, so we doubled it to 64 GB.
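
For reference, the thread pool on Percona Server 5.6 is enabled with a single my.cnf setting (only the relevant line is shown here):

[mysqld]
thread_handling = pool-of-threads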



Here is CPU before and after:


Friday, November 6, 2015

New Musical called Kill the Query starring DBAs, NOC, Systems

Slightly bored at work, I wrote a little musical to the tune of Disney's Kill the Beast...

[DBAs:] The query will make off with your data.
[NOC:] {gasp}
[DBAs:] It'll come before the backups are done in the night.
[Developer:] No!
[DBAs:] We're not safe till its execution plan is mounted on my wall! I Say we kill the query!
[NOC:] Kill it!

[NOC I:] We're not safe until it's dead
[NOC ii:] It'll come stalking us in the slow query logs
[Manager:] Set to sacrifice our performance to its monstrous appetite
[NOC iii:] It'll wreak havoc on our servers if we let it wander free
[DBAs:] So it's time to take some action, boys
It's time to follow me

Through the red tape
Through the office floor plan
Through the open collaborators
It's a nightmare but it's one exciting ride
Say a prayer
Then we're there
At the command line or a GUI
And there's something truly terrible inside
It's a slow query
It's blocking all the updates
It's locking all the tables
Massive joins
It is creating so much i/o
See it is doing full table scans
See it bring down our app
But we're not coming home
'Til it's dead
Good and dead
Kill the query!

[Developer:] No! I won't let you do this!
[DBAs:] If you're not with us, you're against us!
Bring the developer team lead!
[Team lead:] Get your hands off me!
[DBAs:] We can't have them running off to spawn more instances of the query.
[Developer:] Let us out!
[DBAs:] We'll rid the servers of this query. Who's with me?
[NOC:] I am! I am! I am!

Turn on your device
Boot up your computer
[DBAs:] Write your documentation to the JIRA board
[NOC:] We're counting on the DBAs to lead the way
Through the red tape
Through an open office floor plan
Where within a stalling database
Something's lurking that you shouldn't see ev'ry day
It's a slow query
One as slow as a slug
We won't rest
'Til it's good and deceased
Sally forth
Tally ho
Grab your keyboard
Grab your mouse
Praise the monitoring system and here we type!

[DBAs:] We'll lay siege to the database and bring back the query execution plan!
[Developer:] I have to warn the other developers! This is all my fault! Oh, What are we going to do?
[Team Lead:] Now, now, we'll think of something.

[NOC:] We don't like
What we don't understand
In fact it scares us
And this monster is mysterious at least
Bring your query profilers
Bring your log files
Save your workers and their jobs
We'll save our customers from slow performance.
We'll kill the query!

[Systems:] I knew it! I knew it was foolish to get our hopes up.
[Systems:] Maybe it would have been better if we had never hired DBAs at all.
Could it be?
[Systems:] Is it them?
[Systems:] Sacre Bleu! DBAs!
[Systems:] Encroachers!
[Systems:] And they have access to the databases!
[Systems:] Warn the Managers! If it's a fight they want, we'll be
Ready for them! Who's with me?
[DBAs:] Take whatever monitoring you can find. But remember, the
Slow Query is mine!

[Systems:] Hearts ablaze
Banners high
We go marching into battle
Unafraid although the danger just increased
[NOC:] Raise the flag
Sing the song
Here we come, we're three strong
And three DBAs can't be wrong
Let's kill the Query!

[Systems:] Pardon me, Management.
[Management:] Leave me in peace.
[Systems:] But sir! The database is under attack!

[NOC:] Kill the Query!
Kill the Query

[Systems:] This isn't working!
[Systems:] Oh no, we must do something!
[Systems:] Wait, I know!

[NOC:] Kill the Query!
Kill the Query!

[Systems] What shall we do, Management?
[Management:] It doesn't matter now. Just let the DBAs come.

[NOC:] Kill the Query!
Kill the Query!
Kill the Query!

Friday, October 30, 2015

Moving the bottleneck after adding indexes

A few weeks ago, I identified half a dozen indexes that were missing from database tables for an important application at work. There was no rush to get them added because we typically have all changes, including database indexes, go through dev, qa, alpha, beta, and then prod. Performance got so bad for this application that management approved adding the indexes directly to production.

Several of the queries I identified were doing full table scans frequently, and it was clear these queries would benefit from the indexes. The application was seeing some slowness and sometimes hit max connections because these queries were not clearing quickly enough.

I got permission to have the indexes added. That night I added them; for one of them I used Percona Online Schema Change.

The next day, the throughput on the server had increased so much that the application was severely taxing the resources of the database system, with high CPU and 200-600 threads running all of these well-tuned queries. It was interesting to see what a huge change those indexes made, but also how the bottleneck moved from slow queries to more queries than that version of MySQL could realistically handle with the number of CPUs/cores on the system.

Thursday, October 29, 2015

pt-query-digest

I've used pt-query-digest ever since I became a MySQL database guy. It is an extremely useful tool, especially when your company refuses to buy profiling tools for you. One drawback is that it is heavy on processing: if you want to process several hours' worth of logs, it can take over an hour and taxes the CPU of the system it runs on. I wouldn't recommend running it on the same server as a production database because of this.

Today was the first time I actually used it against binary logs; previously I had only used it on slow query logs. After requesting a copy of the binary logs (I still cannot get them for myself), I converted them to text using mysqlbinlog.


Here is an example from their documentation:

https://www.percona.com/doc/percona-toolkit/2.2/pt-query-digest.html

mysqlbinlog mysql-bin.000441 > mysql-bin.000441.txt

pt-query-digest --type binlog mysql-bin.000441.txt

Here is how I ran it to get specific time ranges and to save the report to a file:

pt-query-digest --type=binlog --since='2015-10-27 11:22:36' --until='2015-10-27 16:40:49' --limit=100%  mysqllog_all_day.out > mysqllog_tuesday_all_day_digest.txt

There are a lot of options to change how the report is created. Seeing the number of times a query with a specific fingerprint is executed, along with a distribution graph of its timings, is super useful.

If you want to see more, here are a couple blog posts:
https://www.percona.com/blog/2011/12/29/identifying-the-load-with-the-help-of-pt-query-digest-and-percona-server/
https://www.percona.com/blog/2014/03/14/tools-and-tips-for-analysis-of-mysqls-slow-query-log/




Wednesday, October 28, 2015

Grouping by granular timestamp ranges - by day or by minute

I was asked to run some counts and make them granular to the minute. At first I thought, sure, I will just GROUP BY the MINUTE. That didn't exactly work; it looked like several hours were missing, and I realized that since every hour follows the same minute pattern, I had to also group by the hour of the day in addition to the minute...

I want to get a count up to the minute for the entire day:

SELECT timestamp, count(*) as count
FROM table_b
WHERE timestamp BETWEEN '2015-10-26 00:00:00' AND '2015-10-26 23:59:59'
GROUP BY HOUR(timestamp), MINUTE(timestamp);

If you don't add the HOUR to the GROUP BY, everything collapses into 60 minute-buckets, so it looks like you only get one hour out of the day.


I want to get a count up to the hour for the entire year:

SELECT timestamp, count(*) as count
FROM table_a
WHERE start BETWEEN '2015-01-01 00:00:00' AND '2015-12-31 23:59:59'
GROUP BY MONTH(timestamp), DAY(timestamp), HOUR(timestamp)

If you don't add the MONTH as part of the GROUP BY, it will only give you the results for a single month. 

I want to get a count up to the day that spans multiple years:

SELECT COUNT(*) as count, SUM(duration), YEAR(`timestamp`) as year, MONTH(`timestamp`) as month, DAY(`timestamp`) as day
FROM table_a
WHERE
`start` BETWEEN '2013-11-20 00:00:00' AND '2015-12-31 23:59:59'
GROUP BY YEAR(`timestamp`), MONTH(`timestamp`), DAY(`timestamp`)
ORDER BY year, month, day


Initially it doesn't seem intuitive, but once you think about it, it makes sense.

For all the purists out there, this is an illegal GROUP BY. However, MySQL extends the GROUP BY. Lifted from the manual (https://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html):

MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. 
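
If the indeterminate timestamp column bothers you (or you run with ONLY_FULL_GROUP_BY), a variant like this should return the same per-minute counts without relying on that extension:

SELECT DATE_FORMAT(`timestamp`, '%Y-%m-%d %H:%i') AS minute_bucket, COUNT(*) AS count
FROM table_b
WHERE `timestamp` BETWEEN '2015-10-26 00:00:00' AND '2015-10-26 23:59:59'
GROUP BY minute_bucket;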

Tuesday, October 20, 2015

Counting things that do not exist with a correlated subquery

Today I had a problem where I needed to find the difference between total and inactive row counts based on an FK value from a parent table. A developer explained that she wanted to do a count for values that don't exist. She gave me a query like this:

SELECT leader_id, count(*) as count
FROM lists
WHERE active = 1
GROUP BY leader_id
HAVING count = 0

I told her that you cannot count something that doesn't exist. It took me a little bit to understand what was needed but I was able to do it with a correlated subquery. Take this example table and example data:


CREATE TABLE `lists` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `leader_id` int(10) unsigned NOT NULL,
  `active` tinyint(4) NOT NULL DEFAULT '1',
  `name` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_fk_leader_id` (`leader_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `lists` (`id`, `leader_id`, `active`, `name`) VALUES (NULL, '1', '1', 'one');
INSERT INTO `lists` (`id`, `leader_id`, `active`, `name`) VALUES (NULL, '2', '1', 'two');
INSERT INTO `lists` (`id`, `leader_id`, `active`, `name`) VALUES (NULL, '1', '0', 'One old');
INSERT INTO `lists` (`id`, `leader_id`, `active`, `name`) VALUES (NULL, '1', '1', 'One also old');
INSERT INTO `lists` (`id`, `leader_id`, `active`, `name`) VALUES (NULL, '3', '1', 'three');
INSERT INTO `lists` (`id`, `leader_id`, `active`, `name`) VALUES (NULL, '3', '0', 'three old');
INSERT INTO `lists` (`id`, `leader_id`, `active`, `name`) VALUES (NULL, '4', '0', 'four old');

I wrote this query to get the difference:


SELECT l1.leader_id,
count(*) as total_count,
l2.inactive_count,
count(*) - l2.inactive_count as difference
FROM lists l1
INNER JOIN
(SELECT leader_id, count(*) as inactive_count FROM lists l3 WHERE active = 0
GROUP BY leader_id) l2 ON l1.leader_id = l2.leader_id
GROUP BY l1.leader_id

Results look like this:
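
+-----------+-------------+----------------+------------+
| leader_id | total_count | inactive_count | difference |
+-----------+-------------+----------------+------------+
|         1 |           3 |              1 |          2 |
|         3 |           2 |              1 |          1 |
|         4 |           1 |              1 |          0 |
+-----------+-------------+----------------+------------+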



What we really wanted was just the rows where the total count was equal to the inactive count (difference = 0), so I added a HAVING clause:

SELECT l1.leader_id,
count(*) as total_count,
l2.inactive_count,
count(*) - l2.inactive_count as difference
FROM lists l1
INNER JOIN
(SELECT leader_id, count(*) as inactive_count FROM lists l3 WHERE active = 0
GROUP BY leader_id) l2 ON l1.leader_id = l2.leader_id
GROUP BY l1.leader_id
HAVING difference = 0

Results look like this:
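
+-----------+-------------+----------------+------------+
| leader_id | total_count | inactive_count | difference |
+-----------+-------------+----------------+------------+
|         4 |           1 |              1 |          0 |
+-----------+-------------+----------------+------------+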





Wednesday, September 23, 2015

How the MySQL query cache can kill performance on your database instance

A while back Percona did an audit for my company and told us we probably ought to turn off the MySQL query cache on most of our servers because it was causing more overhead than it was providing value. Management had no interest in pursuing this recommendation, so it never happened.

Months later, one of our database servers had become so overloaded with queries that it was performing poorly. The queries themselves were pretty fast and tuned as much as I could get them. Management came to me asking how to make the server perform better. I told them more queries were running on the server than it could reasonably handle. The server has 24 CPUs, but the threads_running graph below, taken from a profiling session with Jet Profiler, shows that at certain times we had 100 threads running (the dark blue color in the graph). I told them that without command-level access to delve deeper, the only recommendations I could make were to get more powerful database servers and to do better load balancing by spreading some of the read-only queries onto slave servers.


Meanwhile, a co-worker on my team had noticed in his benchmarking that if the query cache was turned on and he ran the perf utility (http://www.brendangregg.com/perf.html), he would see the following as the number 1 or number 2 wait state from the mysqld daemon:

QUERY_CACHE:INSERT_INTO_FREE_MEMORY_SORTED_LIST

So he tested turning off the MySQL query cache and that wait state went away and MySQL performed much better on a test system.

With this evidence we took it to management to see if the System Engineers would run PERF on the database server (they won't give my team SSH access) and check to see if the above wait state was at the top of the list. When they checked, it sure was!

We got permission to change the query cache to type 2 ("on demand", so that a query only uses the query cache if it is explicitly instructed to). We ran these commands:

SET GLOBAL query_cache_type=2;
RESET QUERY CACHE;
FLUSH QUERY CACHE;
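
With the cache in DEMAND mode, a query only uses it when it explicitly asks to, for example:

SELECT SQL_CACHE id, phone FROM contacts WHERE id = 42;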

After doing this, the results were pretty good. In benchmarking we saw a greater than 15% increase in transactions per second served by the DB server. Threads_running went down to fewer than 10, the slowness went away, and everyone was happy. Here is the graph after making the change:




Here are two graphs from a totally different MySQL server with a different load and similar results:


BEFORE (with query cache turned on):


AFTER (one day later with query cache turned off):


Tuesday, September 15, 2015

Using screen on Linux

One of the tools I frequently use on Linux is called "screen". If you are any kind of administrator (DBA/DevOps/Systems Engineer) working on Linux systems, screen is a tool you should be familiar with.

Once you've got it installed, screen allows you to keep a session running even after you disconnect from SSH. Why is this helpful? If you are running any command that could take longer than a few seconds, you should run it inside a screen session. For example, here are a few cases where you would want to use screen:
  1. Installing new programs on your linux system
  2. Dumping data from your MySQL instance
  3. Restoring data into a MySQL instance
  4. Adding an index to a table that will take a while
  5. Running a bash script that requires a lot of processing and time to run
Typically I run "screen -R my_screen_name" to attach to a new screen session before performing the work. 

You can run "screen -ls" to see all the screen sessions that are currently running and also to see which one you are attached to. 

You can also use "screen -R <name>" to re-attach to a screen session that you have disconnected from. 

If you are currently attached to a screen session and you type "exit", it will close the session. Unless that is what you want, don't type exit.

I usually just close the terminal window and let the session keep running. If I lose VPN access, it also keeps running and I just log on with a new SSH session and re-attach. You can also detach from a session (without ending it) by running "screen -d <name>".

Frequently you want to log what you did during the screen session. Using "-L" will log what is done during that session; screen will save a file like "screenlog.0" in the directory where you ran the command that created the session.


Here is a good quick reference: http://aperiodic.net/screen/quick_reference

Also, if your screen session dies, you sometimes get a funny error like this:
"Suddenly the Dungeon collapses!! – You die…"

Tuesday, September 8, 2015

Orphaned MySQL privileges

I have discovered that some of the MySQL instances I administer have orphaned privileges for users that do not exist. The user no longer has an entry in the mysql.user table, but for whatever reason there are privileges in the other mysql.* tables for these non-existent users. It is not clear to me how the privileges got into this state; maybe someone deleted the user directly from the mysql.user table instead of using the DROP USER command.

Here is the query I wrote (partly based on my MySQL audit query) to find these orphaned privileges.

(SELECT @@hostname as 'hostname', CONCAT("'",`user`,"'",'@',"'",`host`,"'") as 'credentials', 'database' as 'priv_level', db as 'object', TRIM(TRAILING ',' FROM(RTRIM(CONCAT(
IF(md.Select_priv = 'Y', 'Select, ', ''),
IF(md.Insert_priv = 'Y', 'Insert, ', ''),
IF(md.Update_priv = 'Y', 'Update, ', ''),
IF(md.Delete_priv = 'Y', 'Delete, ', ''),
IF(md.Create_priv = 'Y', 'Create, ', ''),
IF(md.Drop_priv = 'Y', 'Drop, ', ''),
IF(md.Grant_priv = 'Y', 'Grant, ', ''),
IF(md.References_priv = 'Y', 'References, ', ''),
IF(md.Index_priv = 'Y', 'Index, ', ''),
IF(md.Alter_priv = 'Y', 'Alter, ', ''),
IF(md.Create_tmp_table_priv = 'Y', 'CREATE TEMPORARY TABLES, ', ''),
IF(md.Lock_tables_priv = 'Y', 'LOCK TABLES, ', ''),
IF(md.Create_view_priv = 'Y', 'CREATE VIEW, ', ''),
IF(md.Show_view_priv = 'Y', 'SHOW VIEW, ', ''),
IF(md.Create_routine_priv = 'Y', 'CREATE ROUTINE, ', ''),
IF(md.Alter_routine_priv = 'Y', 'ALTER ROUTINE, ', ''),
IF(md.Execute_priv = 'Y', 'Execute, ', ''),
IF(md.Event_priv = 'Y', 'Event, ', ''),
IF(md.Trigger_priv = 'Y', 'Trigger, ', '')
)))) as 'Privileges'
FROM mysql.db md
WHERE CONCAT("'",`user`,"'",'@',"'",`host`,"'") NOT IN (SELECT DISTINCT CONCAT("'",`user`,"'",'@',"'",`host`,"'") FROM mysql.user)
)
UNION ALL
(SELECT @@hostname as 'hostname', CONCAT("'",`user`,"'",'@',"'",`host`,"'") as 'credentials', 'table' as 'priv_level', table_name as 'object', table_priv as 'Privileges'
FROM mysql.tables_priv mt
WHERE CONCAT("'",`user`,"'",'@',"'",`host`,"'") NOT IN (SELECT DISTINCT CONCAT("'",`user`,"'",'@',"'",`host`,"'") FROM mysql.user)
)
UNION ALL
(SELECT @@hostname as 'hostname', CONCAT("'",`user`,"'",'@',"'",`host`,"'") as 'credentials', 'column' as 'priv_level', column_name as 'object', column_priv as 'Privileges'
FROM mysql.columns_priv mvc
WHERE CONCAT("'",`user`,"'",'@',"'",`host`,"'") NOT IN (SELECT DISTINCT CONCAT("'",`user`,"'",'@',"'",`host`,"'") FROM mysql.user)
)
UNION ALL
(SELECT @@hostname as 'hostname', CONCAT("'",`user`,"'",'@',"'",`host`,"'") as 'credentials', 'procs' as 'priv_level', routine_name as 'object', proc_priv as 'Privileges'
FROM mysql.procs_priv mf
WHERE CONCAT("'",`user`,"'",'@',"'",`host`,"'") NOT IN (SELECT DISTINCT CONCAT("'",`user`,"'",'@',"'",`host`,"'") FROM mysql.user)
);
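
Once an orphaned entry turns up, one way to clean it up is to re-create the missing account and then DROP USER it, which makes MySQL remove the leftover rows from all of the grant tables, or to delete the stray rows directly and flush privileges (the user and host below are made up):

CREATE USER 'old_app_user'@'10.%';
DROP USER 'old_app_user'@'10.%';

-- or remove the leftover rows by hand
-- (repeat for tables_priv, columns_priv, and procs_priv as needed)
DELETE FROM mysql.db WHERE user = 'old_app_user' AND host = '10.%';
FLUSH PRIVILEGES;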

I've seen on other blogs queries like this:

SELECT host, db, user
FROM
     information_schema.SCHEMATA right join
     mysql.db ON (SCHEMATA.SCHEMA_NAME=db.Db)
WHERE SCHEMA_NAME is null;

That query helps find privileges for databases that do not exist anymore (or never existed), but it doesn't work so well in my case. We use wildcards in our GRANT statements a lot because we have thousands of databases on each server that share the same prefix. When I run the above query it returns results for the wildcard entries, but those are not actually orphaned privileges.


Friday, September 4, 2015

Setting up replication on vagrant systems with Fatal error on UUID

My work uses Vagrant VMs to quickly create the same developer environment for everyone in the engineering department. I was setting up master-slave replication with a developer today on two Percona Server 5.6 Vagrant VMs, and I kept getting an error when starting the slave. This was the first time I had tried to set up replication with one of these Vagrant VMs. The error could be seen in the Last_IO_Error column of "SHOW SLAVE STATUS":

Fatal error: The slave I/O thread stops because master and slave have equal MySQL server UUIDs; these UUIDs must be different for replication to work.

I didn't know until today that MySQL 5.6 keeps a file called auto.cnf in the data directory containing the UUID for that server. Since our Vagrant VMs are all copied from the same image, of course they all have the same auto.cnf file. Removing the file and restarting MySQL fixed the issue; MySQL automatically created a new auto.cnf on restart.

This UUID is totally separate from the server-id in the my.cnf file.
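
To confirm the VMs now report different values, the UUID can also be checked from SQL without touching the file:

SHOW GLOBAL VARIABLES LIKE 'server_uuid';
-- or
SELECT @@GLOBAL.server_uuid;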

Thursday, August 20, 2015

Query to Audit the Permissions/Privileges for a MySQL instances

I found some nice queries on this blog (http://blog.devart.com/how-to-get-a-list-of-permissions-of-mysql-users.html) and I've been using them to do user audits on the MySQL databases I administer. I decided to UNION all the queries together and also make a few additions (checking whether the password is blank, the hostname, and the database version). The result is an easy-to-read list of all permissions for all MySQL users on an instance. Additionally, I changed the syntax of the queries so that the results can be copied and pasted into GRANT statements.

Because this super long query is UNIONing the results of the global, database, table, column, and routine privileges, it might be a little confusing to read at first. The entries that show the password hash are the global entries, followed by a "-" wherever a privilege level is not applicable. Next you will see the database privileges (if there are any), with a "-" for each non-applicable column, and so on through tables, columns, views, and routines.

If you want to generate a list for all the users on the instance, set @username to %. If you only want to look at one user, change @username to that username. When I create users on production that have elevated privileges, such as for a DBA, I create them a "bang" account and a separate read-only account. The DBAs get a bang account with all privileges that looks like '!username'@'10.%' and a second account that can only browse the database and run SELECT queries, which looks like 'username'@'10.%'. When I run this audit query, if I put 'username' into the variable below it will also show me '!username'.


SET @username = '<user name>';
SELECT -- the list of global privileges
@@hostname as server_host,
SUBSTRING(@@version , 1, CHAR_LENGTH(@@version) - 4) as 'database_version',
mu.host `host`,
mu.user `user`,
(case when `mu`.`password` ='' then "**Yes**" ELSE 'No' end) as `Empty Password?`,
mu.password as 'password',
'-' as `Database`,
'-' as `Tables`,
'-' as `Views`,
'-' as `Tables Columns`,
'-' as `Views Columns`,
'-' as `Procedures`,
'-' as `Functions`,
TRIM(TRAILING ',' FROM(RTRIM(CONCAT(
IF(mu.Select_priv = 'Y', 'Select, ', ''),
IF(mu.Insert_priv = 'Y', 'Insert, ', ''),
IF(mu.Update_priv = 'Y', 'Update, ', ''),
IF(mu.Delete_priv = 'Y', 'Delete, ', ''),
IF(mu.Create_priv = 'Y', 'Create, ', ''),
IF(mu.Drop_priv = 'Y', 'Drop, ', ''),
IF(mu.Reload_priv = 'Y', 'Reload, ', ''),
IF(mu.Shutdown_priv = 'Y', 'Shutdown, ', ''),
IF(mu.Process_priv = 'Y', 'Process, ', ''),
IF(mu.File_priv = 'Y', 'File, ', ''),
IF(mu.Grant_priv = 'Y', 'Grant, ', ''),
IF(mu.References_priv = 'Y', 'References, ', ''),
IF(mu.Index_priv = 'Y', 'Index, ', ''),
IF(mu.Alter_priv = 'Y', 'Alter, ', ''),
IF(mu.Show_db_priv = 'Y', 'SHOW DATABASES, ', ''),
IF(mu.Super_priv = 'Y', 'Super, ', ''),
IF(mu.Create_tmp_table_priv = 'Y', 'CREATE TEMPORARY TABLES, ', ''),
IF(mu.Lock_tables_priv = 'Y', 'LOCK TABLES, ', ''),
IF(mu.Execute_priv = 'Y', 'Execute, ', ''),
IF(mu.Repl_slave_priv = 'Y', 'REPLICATION SLAVE, ', ''),
IF(mu.Repl_client_priv = 'Y', 'REPLICATION CLIENT, ', ''),
IF(mu.Create_view_priv = 'Y', 'CREATE VIEW, ', ''),
IF(mu.Show_view_priv = 'Y', 'SHOW VIEW, ', ''),
IF(mu.Create_routine_priv = 'Y', 'CREATE ROUTINE, ', ''),
IF(mu.Alter_routine_priv = 'Y', 'ALTER ROUTINE, ', ''),
IF(mu.Create_user_priv = 'Y', 'CREATE USER, ', ''),
IF(mu.Event_priv = 'Y', 'Event, ', ''),
IF(mu.Trigger_priv = 'Y', 'Trigger, ', '')
)))) AS `Privileges`
FROM
mysql.user mu
WHERE user LIKE CONCAT('%', @username)
UNION
SELECT -- the list of privileges for a database
@@hostname as server_host,
SUBSTRING(@@version , 1, CHAR_LENGTH(@@version) - 4) as 'database_version',
md.host `host`,
md.user `user`,
'-' as `Empty Password?`,
'-' as 'password',
md.db `Database`,
'-' as `Tables`,
'-' as `Views`,
'-' as `Tables Columns`,
'-' as `Views Columns`,
'-' as `Procedures`,
'-' as `Functions`,
TRIM(TRAILING ',' FROM(RTRIM(CONCAT(
IF(md.Select_priv = 'Y', 'Select, ', ''),
IF(md.Insert_priv = 'Y', 'Insert, ', ''),
IF(md.Update_priv = 'Y', 'Update, ', ''),
IF(md.Delete_priv = 'Y', 'Delete, ', ''),
IF(md.Create_priv = 'Y', 'Create, ', ''),
IF(md.Drop_priv = 'Y', 'Drop, ', ''),
IF(md.Grant_priv = 'Y', 'Grant, ', ''),
IF(md.References_priv = 'Y', 'References, ', ''),
IF(md.Index_priv = 'Y', 'Index, ', ''),
IF(md.Alter_priv = 'Y', 'Alter, ', ''),
IF(md.Create_tmp_table_priv = 'Y', 'CREATE TEMPORARY TABLES, ', ''),
IF(md.Lock_tables_priv = 'Y', 'LOCK TABLES, ', ''),
IF(md.Create_view_priv = 'Y', 'CREATE VIEW, ', ''),
IF(md.Show_view_priv = 'Y', 'SHOW VIEW, ', ''),
IF(md.Create_routine_priv = 'Y', 'CREATE ROUTINE, ', ''),
IF(md.Alter_routine_priv = 'Y', 'ALTER ROUTINE, ', ''),
IF(md.Execute_priv = 'Y', 'Execute, ', ''),
IF(md.Event_priv = 'Y', 'Event, ', ''),
IF(md.Trigger_priv = 'Y', 'Trigger, ', '')
))))  AS `Privileges`
FROM
mysql.db md
WHERE user LIKE CONCAT('%', @username)
UNION
SELECT -- the list of privileges for tables
@@hostname as server_host,
SUBSTRING(@@version , 1, CHAR_LENGTH(@@version) - 4) as 'database_version',
mt.host `host`,
mt.user `user`,
'-' as `Empty Password?`,
'-' as 'password',
'-' as `Database`,
CONCAT(mt.Db, '.', mt.Table_name) `Tables`,
'-' as `Views`,
'-' as `Tables Columns`,
'-' as `Views Columns`,
'-' as `Procedures`,
'-' as `Functions`,
REPLACE(mt.Table_priv, ',', ', ') AS `Privileges`
FROM
mysql.tables_priv mt
WHERE
user LIKE CONCAT('%', @username)
AND
mt.Table_name IN
(SELECT
DISTINCT
t.table_name `tables`
FROM
information_schema.tables AS t
WHERE
t.table_type IN
('BASE TABLE', 'SYSTEM VIEW', 'TEMPORARY', '') OR
t.table_type <> 'VIEW' AND
t.create_options IS NOT NULL
)
UNION
SELECT -- the list of privileges for views
@@hostname as server_host,
SUBSTRING(@@version , 1, CHAR_LENGTH(@@version) - 4) as 'database_version',
mv.host `host`,
mv.user `user`,
'-' as `Empty Password?`,
'-' as 'password',
'-' as `Database`,
'-' as `Tables`,
CONCAT(mv.Db, '.', mv.Table_name) `Views`,
'-' as `Tables Columns`,
'-' as `Views Columns`,
'-' as `Procedures`,
'-' as `Functions`,
REPLACE(mv.Table_priv, ',', ', ') AS `Privileges`
FROM
mysql.tables_priv mv
WHERE
user LIKE CONCAT('%', @username)
AND
mv.Table_name IN
(SELECT
DISTINCT
v.table_name `views`
FROM
information_schema.views AS v
)
UNION
SELECT -- the list of privileges for table columns
@@hostname as server_host,
SUBSTRING(@@version , 1, CHAR_LENGTH(@@version) - 4) as 'database_version',
mtc.host `host`,
mtc.user `user`,
'-' as `Empty Password?`,
'-' as 'password',
'-' as `Database`,
'-' as `Tables`,
'-' as `Views`,
CONCAT(mtc.Db, '.', mtc.Table_name, '.', mtc.Column_name) `Tables Columns`,
'-' as `Views Columns`,
'-' as `Procedures`,
'-' as `Functions`,
REPLACE(mtc.Column_priv, ',', ', ') AS `Privileges`
FROM
mysql.columns_priv mtc
WHERE
user LIKE CONCAT('%', @username)
AND mtc.Table_name IN
(SELECT
DISTINCT
t.table_name `tables`
FROM
information_schema.tables AS t
WHERE
t.table_type IN
('BASE TABLE', 'SYSTEM VIEW', 'TEMPORARY', '') OR
t.table_type <> 'VIEW' AND
t.create_options IS NOT NULL
)
UNION
SELECT -- the list of privileges for view columns
@@hostname as server_host,
SUBSTRING(@@version , 1, CHAR_LENGTH(@@version) - 4) as 'database_version',
mvc.host `host`,
mvc.user `user`,
'-' as `Empty Password?`,
'-' as 'password',
'-' as `Database`,
'-' as `Tables`,
'-' as `Views`,
'-' as `Tables Columns`,
CONCAT(mvc.Db, '.', mvc.Table_name, '.', mvc.Column_name) `Views Columns`,
'-' as `Procedures`,
'-' as `Functions`,
REPLACE(mvc.Column_priv, ',', ', ') AS `Privileges`
FROM
mysql.columns_priv mvc
WHERE
user LIKE CONCAT('%', @username)
AND mvc.Table_name IN
(SELECT
DISTINCT
v.table_name `views`
FROM
information_schema.views AS v
)
UNION
SELECT -- the list of privileges for procedures
@@hostname as server_host,
SUBSTRING(@@version , 1, CHAR_LENGTH(@@version) - 4) as 'database_version',
mp.host `host`,
mp.user `user`,
'-' as `Empty Password?`,
'-' as 'password',
'-' as `Database`,
'-' as `Tables`,
'-' as `Views`,
'-' as `Tables Columns`,
'-' as `Views Columns`,
CONCAT(mp.Db, '.', mp.Routine_name) `Procedures`,
'-' as `Functions`,
REPLACE(mp.Proc_priv, ',', ', ') AS `Privileges`
FROM
mysql.procs_priv mp
WHERE
mp.Routine_type = 'PROCEDURE'
AND user LIKE CONCAT('%', @username)
UNION
SELECT -- the list of privileges for functions
@@hostname as server_host,
SUBSTRING(@@version , 1, CHAR_LENGTH(@@version) - 4) as 'database_version',
mf.host `host`,
mf.user `user`,
'-' as `Empty Password?`,
'-' as 'password',
'-' as `Database`,
'-' as `Tables`,
'-' as `Views`,
'-' as `Tables Columns`,
'-' as `Views Columns`,
'-' as `Procedures`,
CONCAT(mf.Db, '.', mf.Routine_name) `Functions`,
REPLACE(mf.Proc_priv, ',', ', ') AS `Privileges`
FROM
mysql.procs_priv mf
WHERE
mf.Routine_type = 'FUNCTION'
AND user LIKE CONCAT('%', @username);
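
A row of the output maps onto a GRANT statement pretty directly; for example, a database-level row might translate to something like this (names made up):

GRANT SELECT, INSERT, UPDATE, DELETE ON `my_app_db`.* TO 'app_user'@'10.%';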

Wednesday, August 12, 2015

Bang Account for MySQL

When I was a Windows admin, my team referred to the special user we had for doing admin work as a bang account. This was a separate account from your normal Active Directory account. My manager had a bang-bang account, which allowed him to administer the admins.

When I became a DBA I carried this concept into the creation of users on database systems. I do a lot of database engineering work and I'm constantly troubleshooting queries, writing queries, dropping databases, deleting data, etc. Our database accounts are not yet tied into LDAP, PAM, Active Directory, roles, or anything like that; they are managed independently of any other systems.

Even with the precautions of colored screens and warnings, it is easy to mix up being on a prod system versus a dev system and accidentally drop a database that wasn't supposed to be dropped. Because of this I create two accounts on production for DBAs: a bang account that has all privileges and a read-only account that only has Select, Process, Execute, Replication Client, and Trigger. The read-only account lets me browse the database without the risk of accidentally making changes. If I do need to do admin work or make a change, I log on with the bang account and immediately close the connection when I'm done. I never save the bang account credentials, so I don't accidentally log on unaware. There have been many times when I thought I was on a dev or test system and issued a DDL change but was actually on production; since I was logged on with my read-only account, no changes were made.
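
In GRANT form, the pair of accounts looks something like this (the name, host range, and password are placeholders):

-- bang account with full privileges
GRANT ALL PRIVILEGES ON *.* TO '!jsmith'@'10.%' IDENTIFIED BY 'placeholder_password';
-- read-only account for day-to-day browsing
GRANT SELECT, PROCESS, EXECUTE, REPLICATION CLIENT, TRIGGER ON *.* TO 'jsmith'@'10.%' IDENTIFIED BY 'placeholder_password';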

To differentiate a bang account from a read-only account, I simply prefix it with an exclamation point. Some people have told me this is silliness and creates additional user management overhead; I feel it is worth it. Does anyone else do something similar?

Tuesday, August 11, 2015

I found a MySQL 5.6 bug

I set up a MySQL 5.6 instance and immediately granted table-level permissions to a user. I then ran my audit query to confirm that the permissions were added. The query returned zero results. I thought maybe I had done something wrong, but I confirmed that the permissions were actually there.

Here is an example of the table level portion of the audit query:

SELECT -- the list of privileges for tables
mt.host `host`,
mt.user `user`,
CONCAT(mt.Db, '.', mt.Table_name) `Tables`,
REPLACE(mt.Table_priv, ',', ', ') AS `Privileges`
FROM
mysql.tables_priv mt
WHERE
mt.Table_name IN
(SELECT
DISTINCT
t.table_name `tables`
FROM
information_schema.tables AS t
WHERE
t.table_type IN
('BASE TABLE', 'SYSTEM VIEW', 'TEMPORARY', '') OR
t.table_type <> 'VIEW' AND
t.create_options IS NOT NULL
)

The query has a sub-select that checks to make sure the tables named in the grants actually exist. For some reason a sub-select against the information schema like this does not work on MySQL 5.6. There is a bug already reported for this:

http://bugs.mysql.com/bug.php?id=77191

The workaround is to run this:
set optimizer_switch='semijoin=off';

Friday, August 7, 2015

MySQL Tuning methodology?

I really want to have a solid tuning methodology. That is partly why I wrote the post on the 12 Steps to MySQL Query Tuning. Whenever I do internet searches on this topic, I get so many random slide presentations.

For several years I've thought of database tuning as more of an art than a science. I say this because each database installation can be so different. Some databases are meant for OLTP, others for OLAP, and some are treated like queues. Every application has different usage patterns for how it may or may not stress a database; every database has different data structures, different hardware resources available, different network constraints, different hosting (virtual, SAN, on-premise), and so on. You cannot say that what works for one system is going to work for another. Not all systems fit easily into a cookie-cutter approach where you can use the same settings on each database instance and expect the same results.

While chatting with a Principal MySQL consultant at Pythian, he said he definitely had a tuning methodology, and that my belief that database tuning is more of an art, with trial and error, was not good enough for him.

My tuning approach has been this:

Step 1:
Review the database parameters and compare them against my "baseline" of recommended parameters. For MySQL this means reviewing the my.cnf file; if I don't have file system access, I compare against "SHOW GLOBAL VARIABLES". If I find anything obviously out of place, I will see if I can make incremental changes. Sometimes I think a variable needs to be increased and waste time incrementally increasing it when it actually worked better being decreased.
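
Without file system access, the same settings can be pulled from the server itself, for example:

SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW GLOBAL VARIABLES LIKE 'max_connections';
SHOW GLOBAL VARIABLES LIKE 'query_cache_type';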

Step 2:
Next I'll review the results of real-time profiling, looking for low-hanging fruit. Are there more threads running on the instance than CPUs available? Are too many connections being created for this system to handle? If there is replication, is it lagging? Are there specific queries with poor execution plans? Full table scans, full index scans, poorly written queries? Were the queries written by humans or generated by an ORM tool? Are tables lacking indexes? Are there specific DELETEs or UPDATEs causing excessive locking or blocking?
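
Most of the quick checks in this step come from the built-in counters and the process list, for example:

SHOW GLOBAL STATUS LIKE 'Threads_running';
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW FULL PROCESSLIST;
-- Seconds_Behind_Master in the output below shows replication lag
SHOW SLAVE STATUS\G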

Step 3:
Then I'll review what my trend history tool shows me; that could be Cacti, Zabbix, Ganglia, or whatever you use. Is this server trending higher than it was previously? Did I/O significantly increase recently? Was there any change or spike that stands out? If I see something concerning, I'll dig further into that.

Step 4:
If I have access to the system, I'll look at MySQL wait states, memory usage, events happening, error logs, and slow query logs.

Step 5:
If that hasn't proven useful, I'll look into the table structures to see if the data model has obvious points of contention. Since I'm usually not an expert in the application using the database, I'll ask the developers what the purpose of such-and-such query is, why it runs so frequently, and whether it is really necessary. Sometimes the answer is, "We actually don't need to run that query 50,000 times per minute, once every minute would be fine," and then the code or scheduler or whatever is adjusted to fit that new discovery.

I don't have a ton of experience with MySQL command-level tools for in-depth trace analysis and I haven't had the opportunity or time to go deeper. Up to now, with decent data modeling, good my.cnf parameters, proper memory settings, and load balancing with master-slave setups, I've been able to achieve "good enough" performance to meet SLAs. If I can get the queries to run in less than 100 ms, I'm pretty happy and the business is happy.

Since my contact at Pythian wasn't kind enough to educate me on his own tuning methodology, I started poking around for one and came upon this blog entry: http://www.pythian.com/blog/optimizing-your-mysql-tuning-methodology/

It still seems "artsy", with plenty of trial and error, to me.

I like having a list of tips like this, but frequently all of these tips have already been applied and I still need a tuning methodology:
http://www.monitis.com/blog/2011/07/12/101-tips-to-mysql-tuning-and-optimization/

I also like this long list from FromDual; it is probably the most "procedural" tuning list I've seen:
http://www.fromdual.com/mysql-performance-tuning-key

What about you? Do you have methodology to share?


Thursday, July 30, 2015

sub-select to find max values with a group

Several times recently I've needed to get the max or min value within a group of values. For example, say I want the max id for each user per day, or the max id for each user who has several entries in some kind of history table.

At first I struggled with how I was going to get that data, but the solution was actually fairly simple with a sub-select and GROUP BY. Since realizing how easy this is, I've had to do a very similar thing at work quite often. It is nice to remember how useful GROUP BY can be. This is similar to a query I wrote recently with a GROUP BY and a sub-select.

This trick only works when the table has an auto-increment id. You also have to assume that, for your data, the max auto-increment id corresponds to the highest value you are looking for in the group.

It might result in full index scans, so it may not be the most optimal solution, but it worked for what I needed:
    
    SET @day_number = 1;
    SELECT id
    FROM table_a
    WHERE date_created BETWEEN DATE_SUB(NOW(), INTERVAL @day_number DAY) AND NOW()
    AND value1 < (@day_number*86400)
    AND id IN (SELECT MAX(id) FROM table_a GROUP BY user_id);
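
If the max-id assumption doesn't hold for your data, the usual groupwise-max pattern joins back on the grouped value instead (column names mirror the query above):

    SELECT a.*
    FROM table_a a
    INNER JOIN (
        SELECT user_id, MAX(date_created) AS max_created
        FROM table_a
        GROUP BY user_id
    ) m ON m.user_id = a.user_id AND m.max_created = a.date_created;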

Friday, July 24, 2015

Adding new MySQL users to an AWS MySQL instance and using an SSH tunnel

This may sound really stupid, but I was creating users today on a MySQL instance hosted in AWS and couldn't figure out what hostname value to use for the new users in order to get them to work.

Whoever at my work set up the instance completely forgot the root password. They were managing it themselves and eventually it got thrown over the fence to my team.

This site has a great little tutorial on resetting the root password for MySQL when you have shell access to the box:
http://www.rackspace.com/knowledge_center/article/mysql-resetting-a-lost-mysql-root-password

Normally I work with MySQL instances in our own local data center. This MySQL instance was in AWS and couldn't be accessed without tunneling in via SSH. I wasn't sure what the host should be for the MySQL users. At first I was limiting them to the IP range of my office. After several attempts to log on to the MySQL instance over the SSH tunnel, it still wasn't working. Then it dawned on me that after connecting to the system via SSH you are localhost, so the MySQL users have to be user_name@localhost. It worked fine after I realized that!

Tuesday, July 21, 2015

12 Steps to MySQL Query Tuning

I've been a fan of SolarWinds' 12 Steps to Query Performance Tuning infographic, which they published for Oracle and SQL Server. However, they didn't publish one for MySQL. Using the same 12-step model as their infographic, I've written my own:

 12 Steps to MySQL Query Tuning


Step 1 - Examine the Execution Plan

Make sure you are working with tables, not views
Make Certain you know the row count
Review the Access Predicates (key_len, ref columns in EXPLAIN PLAN)
- Is all or part of composite index being used?
Are there full table scans? Are there full index scans?
Review the Extra column
- Do you see "Using temporary;" or "Using filesort;" (BAD)
- Do you see "Using Index" (GOOD)
Is the Possible keys column NULL?
Is the value in the Rows column high? (BAD)
Does the type column show RANGE?
- Is it having to traverse a large number of rows in the index leaf nodes?
- You want to keep the scanned index range as small as possible
Examine the EXPLAIN EXTENDED
SHOW WARNINGS <- Run this right after EXPLAIN EXTENDED to review how the actual SQL will be executed (see the sketch below)
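
A minimal sketch of that last step; the table name is made up, and EXPLAIN EXTENDED applies to MySQL 5.6 and earlier (newer versions include this information in plain EXPLAIN):

    EXPLAIN EXTENDED
    SELECT id, email
    FROM customers
    WHERE email LIKE 'john%';

    SHOW WARNINGS; -- shows the rewritten query the optimizer will actually execute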

Step 2 - Review the filters, JOINs, WHERE predicates, ORDER BY, GROUP BY, HAVING, LIMIT

What is the filtered row count?
If there is a LIMIT clause, is it applied before or after the entire table is accessed?
Are there functions in the ORDER BY?
Review the data types to avoid implicit conversion
- truthy/falsey filtering in the Predicates (converting columns to Boolean)
- WHERE !TIMESTAMP (ignores index)
Don't wrap your indexes in expressions in the WHERE predicates
- WHERE DATE(TIMESTAMP) = '2010-10-10' (ignores index; see the sketch after this list)
Should the query be using an INNER JOIN vs. LEFT JOIN?
Is the query unnecessarily filtering out duplicates? (UNION vs. UNION ALL)
Are there bind variables? Using prepared statements?
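
A sketch of the DATE() case above, using made-up table and column names; the same filter can usually be rewritten as a range so an index on the timestamp column can still be used:

    -- ignores an index on created_at
    SELECT id FROM orders WHERE DATE(created_at) = '2010-10-10';

    -- same rows, but an index on created_at can be used
    SELECT id FROM orders
    WHERE created_at >= '2010-10-10 00:00:00'
      AND created_at <  '2010-10-11 00:00:00';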

Step 3 - Know the Selectivity of the table data

Is the data skewed resulting in queries that cannot effectively use existing indexes?
Make Sure to work with the smallest possible logical set
Can additional WHERE predicates be added?
Know when each predicate is applied when dealing with multiple tables; you want the most selective predicate applied as early as possible (a quick selectivity check is sketched below)
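
One rough way to gauge selectivity, with made-up table and column names; the closer the ratio is to 1, the more selective the column:

    SELECT COUNT(DISTINCT status) / COUNT(*) AS selectivity
    FROM orders;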

Step 4 - Gather Table Information

Review Table definitions
Have the stats been updated? Get current stats.
Are there triggers on the table? If so, how often do they fire? What are they doing?
What is the storage engine? Does it allow for row level locking vs. table level locking?
Make sure that datatypes match for the columns that are being JOINED between two tables, including the CHARSET and COLLATION
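
The commands below are one way to gather most of that information; the table name is made up (the \G just gives vertical output in the mysql client):

    SHOW CREATE TABLE orders\G
    SHOW TABLE STATUS LIKE 'orders'\G
    SHOW TRIGGERS LIKE 'orders';
    ANALYZE TABLE orders; -- refresh the index statistics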

Step 5 - Evaluate Existing Indexes

Look at Index Definitions
Are there redundant indexes?
Are there composite indexes?
Is there a primary key on the table?
Is the primary key an integer, char?
Is the Primary key unnecessarily large, as in the case of a large varchar primary key?
Are there far too many secondary indexes?
Know the Selectivity of the indexes
Is the cardinality for the indexes high/low? (see the sketch below)
Could a fulltext index be useful?
Is each column that could be JOINED indexed?
Check to see if covering indexes can be created (avoid duplication and overlapping of indexes)
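
SHOW INDEX is a quick way to review most of this for a single table (the table name is made up); pt-duplicate-key-checker from the Percona Toolkit can also flag redundant and duplicate indexes:

    SHOW INDEX FROM orders; -- one row per indexed column, including its cardinality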

Step 6 - Analyze Columns in the WHERE clause and SELECT

Look for SELECT * or scalar functions (the more data brought back, the more expensive certain functions become)
Look for CASE, CAST, CONVERT
Are there sub queries? Can the sub query be converted to a JOIN? (see the sketch below)
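
A sketch of the sub query to JOIN rewrite with made-up table names; the two forms return the same rows as long as customers.id is the primary key:

    -- sub query version
    SELECT o.id
    FROM orders o
    WHERE o.customer_id IN (SELECT c.id FROM customers c WHERE c.country = 'GB');

    -- JOIN version, which older optimizers often handle better
    SELECT o.id
    FROM orders o
    INNER JOIN customers c ON c.id = o.customer_id
    WHERE c.country = 'GB';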

Step 7 - Run the Query and record baseline metrics

Gather average execution times
If you are using Master-Master, Master-Slave or some other type of cluster where each instance has the same data, test running the query on each replica to see if the execution time and/or execution plan changes

Step 8 - Adjust/Tune the query

Focus on the most expensive operations first (full table scans, index scans, file sort, temporary table)
Look for high cost steps to eliminate
Consider a covering index, which includes every column the query needs (see the sketch below)
In the case of ORDER BY and GROUP BY, check to see if you can eliminate "Using temporary;" or "Using filesort;" by properly indexing the table
Try to eliminate the frequent counting of rows by utilizing summary tables
Rewrite sub queries as a join
Seek vs. Scans, which is more expensive?

NOTE: Out of date statistics can impact performance. Issue "ANALYZE TABLE"
NOTE: Fragmented tables can impact performance - during a planned maintenance Issue "OPTIMIZE TABLE"
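
A sketch of the covering index idea, with made-up names: if the query filters on customer_id and orders by created_at, a composite index on both columns can remove "Using filesort;" and, because it also contains every selected column, EXPLAIN should show "Using index":

    ALTER TABLE orders ADD INDEX idx_customer_created (customer_id, created_at);

    SELECT customer_id, created_at
    FROM orders
    WHERE customer_id = 42
    ORDER BY created_at;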

Step 9 - Re-run the query

Record the results and compare
Only make one change at a time

Step 10 - Consider adjusting indexes

Test using index hints to see if the optimizer is choosing poorly (see the sketch below)
Look to reduce logical I/O
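
A sketch of an index hint used only as a test, assuming the made-up index from the Step 8 sketch exists; compare the plan and timing against the optimizer's own choice before deciding to keep the hint:

    SELECT customer_id, created_at
    FROM orders FORCE INDEX (idx_customer_created)
    WHERE customer_id = 42
    ORDER BY created_at;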

Step 11 - Incremental changes

Continue to re-run query and record results after each incremental change
Have a peer review the query with you to see if you are missing something

Step 12 - Engineer out the Stupid

Abuse of wildcards?
Use of views?
Scalar Functions in the query?
Can the query be cached? Does it make sense to be using MySQL query cache or a dedicated caching layer? Consider turning off MySQL query cache as the effort to check the cache can severely slow down your queries.
Is the query using a cursor in a stored procedure?
Does the server have enough RAM to keep indexes in memory?
Are the data files stored on fast drives?
Are all the tables in your query using the same storage engine?
Do all your tables have Primary Keys, proper secondary indexes?
Is your table normalized/de-normalized as appropriate?
Is your MySQL server so heavily overloaded that queries are performing poorly?
Has the query been tested on large and small data sets?
Is the query being used on a sharded database where different databases have the same tables but possibly very different data sets?
Is the query doing silly things like overusing OR/UNION when an IN clause may be more appropriate?
Are you forcing the query to do a full table scan by putting a wildcard character (%) at the start of a LIKE pattern on an indexed column? (see the sketch below)
Is the table definition using the right data types? Abuse of CHAR(255), VARCHAR(255), BIGINT?
Can a complex query with dependent sub queries be split into multiple queries and/or use transactions?
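
A sketch of the leading wildcard problem with made-up names; only the second form can use an index on the email column:

    -- the leading % forces a scan even if email is indexed
    SELECT id FROM customers WHERE email LIKE '%@example.com';

    -- a left-anchored pattern can use the index
    SELECT id FROM customers WHERE email LIKE 'john%';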

UPDATE April 2016:
SolarWinds finally came out with an infographic for MySQL:
http://www.solarwinds.com/assets/infographics/dpa-mysql-12-steps.aspx



Tuesday, July 14, 2015

time zone errors when replaying logs


I kept getting this annoying error on my test instances of MySQL when running pt-upgrade using logs I pulled from production:

DBD::mysql::st execute failed: Unknown or incorrect time zone: 'America/Los_Angeles' [for Statement "SET time_zone = 'America/Los_Angeles'"]

SET time_zone = 'America/Los_Angeles'


The solution was to load the time zone tables onto the testing instance like this:

mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql


Some of the time zones didn't install, but it did install America/Los_Angeles, which made my error go away.
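
A quick sanity check after loading, assuming you can read the mysql system tables:

    SELECT COUNT(*) FROM mysql.time_zone_name;
    SET time_zone = 'America/Los_Angeles'; -- should no longer error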

Tuesday, June 23, 2015

pt-upgrade

I love the idea of pt-upgrade in the Percona Toolkit but have struggled to effectively use the tool.

I've taken a backup of all the databases of a database server and restored that backup onto two different hosts, one running MySQL 5.1 and one running MySQL 5.6. I want to find failing queries that work on MySQL 5.1 but don't work on MySQL 5.6.

For initial testing, I just want to find SELECT statements that return different results or fail.

I've read over the documentation and this is how I'm executing the tool:

pt-upgrade h=SERVER1 -uUSERNAME -pPASSWORD h=SERVER2 -uUSERNAME -pPASSWORD --type='genlog' --max-class-size=2 --max-examples=2 --run-time=10m 'general_log_mysqld.log' 1>report.txt 2>err.txt &

I've chosen to use the general log in this example instead of the slow query log, but I've tested with both. The slow query log is more bloated with useless info while the general log has more bang for your buck in terms of size. The big problem I have with this tool is that I have hundreds of databases on my server. Inside the general log are lots of "use <database>" statements before each query. However, when I run the tool it complains about not being able to run the query because the tables are missing:


On both hosts:

DBD::mysql::st execute failed: No database selected [for Statement "....

I can't just select one database when I run the command; the tool needs to read the USE <database> from the log file.

I tried again and found a database server with only a few databases. Sometimes it was working and sometimes it was failing to use the correct databases. As a workaround I created the same tables inside each database. This only worked if all the tables had unique names. This way it would not matter which database the tool was running against. At this point, at least I got a report with useful results.

It isn't always feasible to do what I did by putting the same tables in every database...what am I doing wrong with this tool?

Issues I've had with Percona Upgrade:
  1. Queries pulled from production can be too intensive for a test machine running with limited resources. The queries I was testing with were taken from systems with 30+ cores and 64 GB of RAM; queries would time out after a few minutes and the tool would stop working. Queries taken from prod should be run on prod-like database servers. Also, queries like "SELECT SQL_NO_CACHE * FROM table_name", which came from a backup job, seemed to break the tool
  2. Some queries broke the tool; this included not only the backup queries but also weird custom SQL written internally by developers
  3. The same query would occur so frequently in the logs (hundreds of thousands of times) that the reports became useless
  4. There were a lot of failed queries when running the SELECT statements because of missing tables. These were not true temporary tables but tables that are created for a short time and then dropped; they were not part of my dump and restore because I didn't know the code was creating these "transitory" tables
  5. The tool would not switch database context for me. I don't know why, but when the logs issued a "use database_name" it would ignore it and try to run a query that was meant for a different database. The tool would report a huge number of failed queries because of this. My workaround was to create the same tables in every schema so I would not get those failed queries anymore
After engaging Percona consulting, they told me to only use the slow log files. For my system they said I would need to massage the files with pt-query-digest before running pt-upgrade, like this:

Step 1: Massage a slow query log for SELECT statements using pt-query-digest
pt-query-digest --filter '$event->{arg} =~ m/^select/i' --sample 5 --no-report --output slowlog my_slow.log > my_slow_massaged_for_select.log
Step 2: Next run pt-upgrade
pt-upgrade h=my_source_host.com -uUSER -pPASSWORD h=my_target_host.com -uUSER -pPASSWORD --type=slowlog --max-class-size=1 --max-examples=1 --run-time=1m 'my_slow_massaged_for_select.log' 1> report_1.txt 2> error_1.txt &
The above massaging worked well for SELECT statements, but when testing DDL/DML I ran into more problems. I would still see a lot of errors for tables that don't exist because they are tmp tables that come and go during a session.

Step 1: Massage a slow query log for SELECT/DDL/DML statements using pt-query-digest
pt-query-digest --filter '$event->{arg} =~ m/^(select|alter|create|drop|insert|replace|update|delete)/i' --sample 5 --no-report --output slowlog alpha_slow.log > my_slow_massaged_for_dml_ddl.log
Step 2: Clean up the log file to remove LOCKS
Because my slow log file had a number of LOCK statements, I used sed to remove all rows that had any references to LOCKS.
Step 3: Next run pt-upgrade
pt-upgrade h=my_source_host.com -uUSER -pPASSWORD h=my_target_host.com -uUSER -pPASSWORD --type=slowlog --max-class-size=1 --max-examples=1 --run-time=1m --no-read-only 'my_slow_massaged_for_dml_ddl.log' 1> report_2.txt 2> error_2.txt &
Even running all the DDL, I would still see a lot of errors for tables that don't exist because they are tmp tables that come and go during a session. At this point, though, I was able to get more confidence that the upgrade was going to work.


Friday, June 19, 2015

Why is the triggers table in the information_schema so slow?

We have a sharded MySQL infrastructure at work where we sometimes create new shards from a .sql file. Each shard has all the same tables/triggers/functions, etc., but the data is unique to the customer to which that shard is assigned. This file is created from an alpha environment which has different users than production. This started to result in a situation where triggers/functions on production have definers for users that do not exist. I wrote a script to send alerts for these, but manually fixing them was getting annoying. The permanent fix so that doesn't happen anymore is in the works, but the bureaucracy at work is taking too long, so I wrote a bash script to fix them automatically on production. A portion of my bash script was inspired by this site:

http://codersresource.com/news/dzone-snippets/change-ownership-of-definer-and-triggers

Here is the script from the above site:

#!/bin/sh 
host='localhost' 
user='root' 
port='3306' 
# following should be the root@localhost password 
password='root@123' 

# triggers backup 
mysqldump -h$host -u$user -p$password -P$port --all-databases -d --no-create-info > triggers.sql 
if [[ $? -ne 0 ]]; then exit 81; fi 

# stored procedure backup 
mysqldump -h$host -u$user -p$password -P$port --all-databases --no-create-info --no-data -R --skip-triggers > procedures.sql 
if [[ $? -ne 0 ]]; then exit 91; fi 

# triggers backup 
mysqldump -h$host -u$user -p$password -P$port --all-databases -d --no-create-info | sed -e 's/DEFINER[ ]*=[ ]*[^*]*\*/\*/' > triggers_backup.sql 
if [[ $? -ne 0 ]]; then exit 101; fi 

# drop current triggers 
mysql -h$host -u$user -p$password -P$port -Bse"select CONCAT('drop trigger ', TRIGGER_SCHEMA, '.', TRIGGER_NAME, ';') from information_schema.triggers" | mysql -h$host -u$user -p$password -P$port 
if [[ $? -ne 0 ]]; then exit 111; fi 

# Restore from file, use root@localhost credentials 
mysql -h$host -u$user -p$password -P$port < triggers_backup.sql 
if [[ $? -ne 0 ]]; then exit 121; fi 

# change all the definers of stored procedures to root@localhost 
mysqldump -h$host -u$user -p$password -P$port --all-databases --no-create-info --no-data -R --skip-triggers | sed -e 's/DEFINER=[^*]*\*/\*/' | mysql -h$host -u$user -p$password -P$port 
if [[ $? -ne 0 ]]; then exit 131; fi 

My script was different but the idea was the same. However, it blew up, and this is a bad idea. I tested it several times on a non-prod server and it was working pretty well. My testing environment only had about a dozen databases. The particular production MySQL server has over 1500 databases. For whatever reason that I haven't been able to pinpoint, running a query on the information_schema triggers table there takes 20 minutes. That little portion in the above script that builds DROP TRIGGER statements with SELECT CONCAT and then pipes the results back into another mysql session is not a good idea! It will cause queries to get locked up and hold up replication. It created slave lag in our clustered environment, which got really far behind. There are plenty of other problems with our MySQL infrastructure which are out of my control and which also contributed to this situation.

Long story short, I'll be re-writing my version of the script to use SHOW TRIGGERS/SHOW FUNCTION STATUS/SHOW PROCEDURE STATUS because those commands run much faster. However, I'll have to loop over every single database and limit each query to only that database.
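
This is the general shape of what the rewrite will run for each database; the database name here is a placeholder:

    SHOW TRIGGERS FROM some_database;
    SHOW FUNCTION STATUS WHERE Db = 'some_database';
    SHOW PROCEDURE STATUS WHERE Db = 'some_database';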

Tuesday, June 16, 2015

/bin/rm: Argument list too long

I was doing some replication testing with master-master on Percona 5.6 today and I kept having problems with my test instances not starting or shutting down when I changed the my.cnf settings file to use the new relay log files. I had previously used these boxes for master-slave testing and had left them unattended for weeks; replication had broken and relay log files were building up like crazy.

I ran this:
STOP SLAVE;
RESET SLAVE;

cd into this directory:


/var/log/mysql/

Lots of relay files like this:

relay.137105

I thought that RESET SLAVE was supposed to delete those. Maybe it does and it is just taking a long time.

So I try to delete them:

rm /var/log/mysql/relay.*
-bash: /bin/rm: Argument list too long

rm /var/log/mysql/relay.1*

bash: /bin/rm: Argument list too long

I was able to run this:

rm /var/log/mysql/relay.12*
rm /var/log/mysql/relay.13*
rm /var/log/mysql/relay.14*
and on and on

I didn't want to spend all night doing this.

I tried this:

cd /var/log/mysql/

find . -name "relay.*" -print | xargs rm
Copied from here: http://itigloo.com/how-do-i/binrm-argument-list-too-long-error/

Still a no go. It just hangs. Seems to be too much for my underpowered VM to handle.

I tried it again with only one file:

find . -name "relay.097436" -print | xargs rm

It worked. 

How about a few more files:

find . -name "relay.3*" -print | xargs rm

That worked.

I tried this again:


find . -name "relay.*" -print | xargs rm

At last it worked!!