Tuesday, December 26, 2017

Setting up consumers - events_statements_history for Aurora instances with performance schema

I am wanting to setup Percona Monitoring Manager (PMM) for a client which uses Aurora. To do so, the documentation says you need to turn on the consumer in the performance schema so that events_statements_history os enabled. There isn't an option in the AWS console to do this and there isn't a parameter group setting to modify this so it must be done directly on the instance.

Here I will show you from the command line:

MySQL [(none)]> show global variables like 'performance_schema';

+--------------------+-------+

| Variable_name | Value |

+--------------------+-------+

| performance_schema | ON |

+--------------------+-------+

1 row in set (0.00 sec)

MySQL [(none)]> use performance_schema

Reading table information for completion of table and column names

You can turn off this feature to get a quicker startup with -A

Database changed

MySQL [performance_schema]>

MySQL [performance_schema]> select * from setup_consumers WHERE name = 'events_statements_history';

+---------------------------+---------+

| NAME | ENABLED |

+---------------------------+---------+

| events_statements_history | NO |

+---------------------------+---------+

1 row in set (0.00 sec)

MySQL [performance_schema]> update setup_consumers set enabled='yes' WHERE name = 'events_statements_history';

Query OK, 1 row affected (0.00 sec)

Rows matched: 1 Changed: 1 Warnings: 0

MySQL [performance_schema]> select * from setup_consumers WHERE name = 'events_statements_history';

+---------------------------+---------+

| NAME | ENABLED |

+---------------------------+---------+

| events_statements_history | YES |

+---------------------------+---------+

1 row in set (0.00 sec)

However, after an instance restart, the changes performed to setup_consumers table will be reversed.

MySQL [performance_schema]> show global variables like 'performance_schema';

+--------------------+-------+

| Variable_name | Value |

+--------------------+-------+

| performance_schema | ON |

+--------------------+-------+

1 row in set (0.00 sec)

MySQL [performance_schema]> select * from setup_consumers WHERE name = 'events_statements_history';

+---------------------------+---------+

| NAME | ENABLED |

+---------------------------+---------+

| events_statements_history | NO |

+---------------------------+---------+

1 row in set (0.01 sec)

MySQL [performance_schema]> update setup_consumers set enabled='yes' WHERE name = 'events_statements_history';

Query OK, 1 row affected (0.01 sec)

Rows matched: 1 Changed: 1 Warnings: 0

MySQL [performance_schema]> select * from setup_consumers WHERE name = 'events_statements_history';

+---------------------------+---------+

| NAME | ENABLED |

+---------------------------+---------+

| events_statements_history | YES |

+---------------------------+---------+

1 row in set (0.00 sec)

I could setup a Zabbix trigger that turns it back whenever it detects a server restart or some other script. I think the easier route it to create an event that keeps turning it back on. Here is a simple example:

DROP EVENT IF EXISTS enable_statement_history;

CREATE

DEFINER=`root`@`localhost`

EVENT IF NOT EXISTS enable_statement_history

ON SCHEDULE EVERY 60 SECOND

STARTS NOW()

update performance_schema.setup_consumers set enabled='yes' WHERE name = 'events_statements_history';

Monday, December 18, 2017

Encrypting the defaults file for logging into mysql

MySQL has an option to store credentials in a file so that you don't have to enter them at the command line when connecting to MySQL.

For example the "normal" way of connecting to MySQL from the command line would be like this:

# mysql -u<my_user> -p -h<server name>
Enter password:

OR

# mysql -u<my_user> -p<MYPassword> -h<server name>

In the first example you have to enter in your password which won't work for scripts. In the second example you would have to hard code the password into your script or pull it out of a variable but it would get stored in the command line making it visible by anyone who can see what is running on the system. This is bad from a security perspective.

Instead you can use a defaults file and reference the file to logon to MySQL like this:

mysql --defaults-file=location_of_my_default_file.cnf -h<server name>

The defaults file only needs to contain these three lines:

[client]
user=my_user
password='123#_BLABLA'

You can also add a line for host if you want to limit the file to only be used by one server.

If your password is going to have special characters like # then make sure it is surrounded by single quotes like the above example.

The problem with this is now the password is stored in plain text and the security team at your company is not going to like it. This is better than having it in the command line history and visible in the process list but still too easy to discover. You could lock the permissions down so that only the root user can view it and only people with root access should theoretically ever be able to see it but that may still give several teams the possibility to view it and other indexing applications to easily discover it.

In MySQL 5.6, a new feature was added to encrypt this file with mysql_config_editor.

From the manual (https://dev.mysql.com/doc/refman/5.7/en/mysql-config-editor.html):

The encryption used by mysql_config_editor prevents passwords from appearing in .mylogin.cnf as cleartext and provides a measure of security by preventing inadvertent password exposure. For example, if you display a regular unencrypted my.cnf option file on the screen, any passwords it contains are visible for anyone to see. With .mylogin.cnf, that is not true. But the encryption used will not deter a determined attacker and you should not consider it unbreakable. A user who can gain system administration privileges on your machine to access your files could decrypt the .mylogin.cnf file with some effort.

Here is how you would create the encrypted file:

mysql_config_editor set --login-path=my_encrypted_defaults_file.cnf --host=localhost --user=root --password

And then to use it:

mysql --login-path=my_encrypted_defaults_file.cnf

OR

mysql --login-path=my_encrypted_defaults_file.cnf -h<server name>

You won't be able to view the file at my_encrypted_defaults_file.cnf. This may be "good enough" to satisfy your security team but we can do even better by using GPG. I do something similar to what is described in this Percona blog post using GPG on my laptop. I will leave that for another blog post:

https://www.percona.com/blog/2016/10/12/encrypt-defaults-file/

Wednesday, December 13, 2017

How to identify AWS Aurora instances with aurora_server_id

Many DBAs like to use "SELECT @@hostname" to identify a MySQL server. This can be problematic for a couple reasons. I've seen servers where the actual host name the result from "SELECT @@hostname" do not match. Someone had configured the MySQL hostname incorrectly and the result was the hostname of a different server. In my scripts I have to check for this and make sure the resolved hostname matches the value from "SELECT @@hostname" and if it does not then do additional checking.

Furthermore, when using RDS/Aurora, you cannot rely on the value of "SELECT @@hostname" because it will give some other value, such as a private IP which can be the same on multiple systems and is not the hostname you are looking for.

In your scripts you can run:

show global variables like 'aurora_version';

This will let you know that you are on an Aurora instance. Then you can use this query to get the aurora_server_id which is the closest thing to hostname.

show global variables like 'aurora_server_id';

The aurora_server_id is not going to be resolvable unless you add an alias to it in your DNS but you can get the full resolvable server address from the AWS RDS CLI.

If you are using Python you can use the boto3 RDS docs.

Monday, December 4, 2017

Rewriting sub queries to use joins for better performance

For almost every client I have worked with, I have done some amount of profiling on their databases and recommend minor changes to improve performance. Almost every single time I see queries that use sub-selects when it is not necessary. When the data sets are small (in the thousands or less), sub selects performance is typically not an issue. However, when the number of rows in the tables get into the hundreds of thousands to millions, sub-select performance usually tanks. I've seen developers that treat MySQL tables like a queue (not a best practice) and use sub-selects. Performance will appear to be fine until the queue fills up and all of sudden the entire application is broken because they were not expecting 500,000 to a million rows in the table. Sometimes these poorly written sub-select queries will take 2~3 minutes and pile up on each other causing the MySQL server to be overwhelmed and grind to a halt.

A performance gain can almost always be gained by re-writing a query to use a join instead of a sub-select. Additionally, you need to make sure that the columns which are being used for the join have an index. If you read the "High Performance MySQL" book by the experts at Percona you will see this is one of their recommendations.

Here is a simple example I wrote to query a table that contains a list of database servers.

This query uses a sub-select. In the Explain plan's extra column, notice Using temporary; Using filesort and the the type column shows ALL meaning it is looking at all rows in the table:

This query uses a JOIN without the sub-select. If you know how to read MySQL explain plans, this one looks much better! The type columns changes to range meaning the query is no longer doing a full table scan and the filesort and temporary went away.

The data size on this table is pretty small (less than 2,000 rows). The query with a sub-select took about 65 ms while the query with the join took 50 ms.

Your mileage with re-writing queries to use a join will vary based on the data distribution, primary key, aggregation, indexes and such but the larger the table gets, a join will usually give you better performance than a sub-select. I've seen queries go from minutes to seconds or from 5 sec to 2 sec after re-writing them use a join.

Thursday, November 23, 2017

GNU Parallel for speeding up scripts

One of my favorite tools for speeding up work is GNU Parallel.

https://www.gnu.org/software/parallel/

When I write automation scripts, I try to write them to do "one simple task". For example, it will logon to a single database or a single server and so something.

In reality, I usually want the task to be repeated thousands of times and to be done in parallel to finish quickly.

Alternatively, I some times write the script to only print out the statement required to complete the task on the command line.

With GNU parallel I can use it to control my script and have it run in parallel and throttle the process. GNU parallel can limit the script to a certain number of instances of the script running at the same time.

In this example, my "createSomething.py" script only print out the commands to do the work. It will print out hundreds of commands which need to be run.

By piping it into parallel, it will automatically run 10 of those processes at a time and keep running them until they are all done.

./createSomething.py <input values> | parallel

In this example the file server_list.txt has multiple columns. Each column in the text file is separated by a tab. This is specified in --colsep '\t' as seen below. The values from each line will appear where is {}.

cat server_list.txt | parallel -j10 --colsep '\t' "my_shell_script.sh {}"

In this example the value which is read from server_list.txt is put into the input values after "my_python_script.py" where {.} is:

cat server_list.txt | parallel -j10 python my_python_script.py {}

Here is another example running a query on a long list of servers:

contents of long_list_of_servers.txt would look like:
server1.com
server2.com
server3.com

Execute whatever code is in the "execute_this.sql" on every server in long_list_of_servers.txt. I use the -vvv option to log all the commands so they can be reviewed.

cat long_list_of_servers.txt | parallel -j10 "mysql --defaults-file='my_defaults_file' -h '{}' -vvv < execute_this.sql >> outputfile.log"

If the code in "execute_this.sql" was a grant statement to create a new user or add more permissions on a list of Aurora instances, you could verify that it worked by running this next:

cat long_list_of_servers.txt | parallel -j10 "mysql --defaults-file='my_defaults_file.cnf' -h '{}' -e \"show global variables like 'aurora_server_id'; show grants for myUSER; '\" >> outputfile.log"

Now the log file will have the aurora_server_id for each servers and you can verify that each instance has the correct grants.

Monday, November 13, 2017

REPLACE function

The replace function is another one of my often used data analysis MySQL functions. However, it can create some ugly queries when it is nested many times. Here is an example I how I used it remove some text from server name. I have a client that creates replicas of their servers and names them <servername>_0, _1, _2. etc. I wrote a script to collect database sizes, table sizes, row counts and then each day run some reports on the changes. I only collect the data from the replicas and not the primary server but wanted to remove the replica names and only refer to the "cluster" name.

MySQL REPLACE() replaces all the occurrences of a substring within a string.
https://www.w3resource.com/mysql/string-functions/mysql-replace-function.php

Here is my example query that will give me the last 30 days of data:

SELECT
ss.server_name as server_name,
REPLACE(REPLACE(REPLACE(REPLACE(ss.server_name, '_2', ''),'_1',''),'_0',''),'_3','') as server_name,
ss.total_size_mb as total_size_mb,
ss.date_created,
ss.date as date
FROM growth_stats_lmp.schema_stats ss
WHERE ss.minutes_since_last_timestamp IS NOT NULL -- Removes the entries from a first run of growth_stats collection
AND ss.date BETWEEN CURDATE() - INTERVAL 30 DAY AND CURDATE()
GROUP BY server_name,schema_name,date;

The replace function will remove the _2, _1, _0, _3 out of the servername column (if it finds those values) and display only the remaining text that hasn't been "replaced out".

Monday, November 6, 2017

Setting up sys schema for Aurora

Here is a good article on how to setup the sys schema for AWS Aurora:

https://www.datadoghq.com/blog/how-to-collect-aurora-metrics/

These are the steps I followed for my Aurora instances:

git clone https://github.com/mysql/mysql-sys.git

cd mysql-sys

./generate_sql_file.sh -v 56 -b -u root

gsed -i '10486d' gen/sys_1.5.1_56_inline.sql

mysql -u root -p -h <Aurora Server> -P 3306 < gen/sys_1.5.1_56_inline.sql

After setting up sys schema, you will need a user that can view the reports.

If you plan to use MySQL workbench to view the reports, some won't be available unless the user has EXECUTE on the SYS schema and PROCESS globally in addition to SELECT access to databases.

GRANT SELECT, SHOW VIEW, EXECUTE ON sys.* TO 'UserName'@'%';

GRANT SELECT ON peformance_schema.* TO 'UserName'@'%';

GRANT SELECT, SHOW DATABASES, PROCESS ON *.* TO 'UserName'@'%';

If you don't give the user SELECT access to *.* because you want to limit permissions to not include the mysql schema then you would need to GRANT SELECT to the non system databases. Also make sure the user has a password hash statement after running the GRANT statements so the user doesn't have an empty password.

Thursday, November 2, 2017

Barracuda format causes Aurora read only instances to restart randomly

As of writing this, have found a problem with using Barracuda table format in Aurora.

Per the MySQL manual, to use Barracuda, you can set ROW_FORMAT=COMPRESSED or ROW_FORMAT=DYNAMIC.

From: https://dev.mysql.com/doc/refman/5.6/en/innodb-compression-usage.html

Aurora doesn’t actually support compressed tables and will automatically change the format if you try to use ROW_FORMAT=COMPRESSED.

Per their documentation:

Amazon Aurora doesn't support compressed tables (that is, tables created with ROW_FORMAT=COMPRESSED).

Copied from: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/AuroraMySQL.Migrating.RDSMySQL.Import.html

So, if you want to use Barracuda, you are left with ROW_FORMAT=DYNAMIC. I have a client who wanted to use Barracuda and was using it on a physical machine at their data center. After doing the migration to Aurora, everything automatically changed to Antelope because the table create statements were using ROW_FORMAT=COMPRESSED. We went through and changed everything to ROW_FORMAT=DYNAMIC in order to force the tables to use Barracuda. This client has a sharded application spread out over about 30 instances. Each cluster has a writer end point instance and a replica for read only traffic. After we converted the tables to Barracuda, the read only replicas would randomly restart. The error on the application side was "org.mariadb.jdbc.internal.util.dao.QueryException: unexpected end of stream". Then they would get a number of errors related to the instance not being available. Then it would work fine for a while.

It was hard to figure it out but eventually we traced it back to the read only replicas using Barracuda. For a while we transferred all the read only traffic to the writer end points and all the problems stopped.

If I log onto the writer end point, you can see that I've set the innodb_file_format and innodb_file_format_max to Barracuda (same settings for read only replica also).

mysql> show global variables like 'innodb_file_%';

+--------------------------+-----------+

| Variable_name | Value |

+--------------------------+-----------+

| innodb_file_format | Barracuda |

| innodb_file_format_check | ON |

| innodb_file_format_max | Barracuda |

| innodb_file_per_table | ON |

+--------------------------+-----------+

4 rows in set (0.09 sec)

This query shows I'm on the writer end point:

mysql> show global variables like 'innodb_read_only';

+------------------+-------+

| Variable_name | Value |

+------------------+-------+

| innodb_read_only | OFF |

+------------------+-------+

1 row in set (0.12 sec)

mysql> show global variables like 'aurora_version';

+----------------+--------+

| Variable_name | Value |

+----------------+--------+

| aurora_version | 1.15.1 |

+----------------+--------+

1 row in set (0.12 sec)

mysql>

mysql> use tmp;

Database changed

mysql>

mysql> DROP TABLE IF EXISTS test_table;

Query OK, 0 rows affected (0.12 sec)

mysql> CREATE TABLE `test_table` (

-> `id` int(11) NOT NULL AUTO_INCREMENT,

-> `message` varchar(255) NOT NULL,

-> `created_at` datetime NOT NULL,

-> PRIMARY KEY (`id`)

-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 ROW_FORMAT=COMPRESSED;

Query OK, 0 rows affected, 2 warnings (0.13 sec)

mysql> SELECT * FROM information_schema.INNODB_SYS_TABLES WHERE NAME = 'tmp/test_table';

+----------+----------------+------+--------+-------+-------------+------------+---------------+

+----------+----------------+------+--------+-------+-------------+------------+---------------+

| 196 | tmp/test_table | 1 | 6 | 138 | Antelope | Compact | 0 |

+----------+----------------+------+--------+-------+-------------+------------+---------------+

1 row in set (0.09 sec)

Table became Antelope (Aurora silently changes the format for you).

mysql> DROP TABLE IF EXISTS test_table;

Query OK, 0 rows affected (0.11 sec)

mysql> CREATE TABLE `test_table` (

-> `id` int(11) NOT NULL AUTO_INCREMENT,

-> `message` varchar(255) NOT NULL,

-> `created_at` datetime NOT NULL,

-> PRIMARY KEY (`id`)

-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 ROW_FORMAT=DYNAMIC ;

Query OK, 0 rows affected (0.11 sec)

mysql> SELECT * FROM information_schema.INNODB_SYS_TABLES WHERE NAME = 'tmp/test_table';

+----------+----------------+------+--------+-------+-------------+------------+---------------+

+----------+----------------+------+--------+-------+-------------+------------+---------------+

| 197 | tmp/test_table | 33 | 6 | 139 | Barracuda | Dynamic | 0 |

+----------+----------------+------+--------+-------+-------------+------------+---------------+

1 row in set (0.10 sec)

Now the table is Barracuda.

The issue we seem to have is something to with Aurora starting up with the setting in the parameter group for innodb_file_format_max as Barracuda. The system tables are using Antelope but when it reads a table using Barracuda, it tries to set the innodb_file_format_max to Barracuda but since it is read only it crashes.

After the read only replica crashes, it seems to fix itself for a while. And then eventually crashes again. I'm not sure how it gets back into a bad state which allows it to crash again. Reverting all the tables back to Antelope solved the issue.

Monday, October 23, 2017

Slave skip counter terminates the entire transaction

If you have been an admin for MySQL with replication, at some point you are going to need to skip statements on a replica that passed on the master but are failing on the slave. There are too many reason this can happen and depending on the situation it might be safe to skip or might lead to data inconsistencies on the replica.

When you skip the error, it skips the entire transaction. Even if the first part of the transaction would have been successful because there are multiple statements in the transaction and one of them fails, then all of them are skipped on the slave.

This has been accurately described by Jervin Real in this blog post:

https://www.percona.com/blog/2013/07/23/another-reason-why-sql_slave_skip_counter-is-bad-in-mysql/

Something to note is that skipping errors on Aurora is different than on normal MySQL server:

On a normal MySQL slave you would run this;

STOP SLAVE;

SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;

START SLAVE;

show slave status;

If you have thousands of errors and want to skip them all then you can put a large number in the SQL_SLAVE_SKIP_COUNTER. If you have that many error then you probably have larger issues and might want to consider rebuilding the slave to prevent data inconsistency problems.

However on Aurora you can only skip one error at a time:

CALL mysql.rds_skip_repl_error;

show slave status;

You don't have to stop and start replication but do need to use the stored procedure provided by the Aurora folks.

I have a client that uses Aurora as a disaster recovery "site". The client has thousands of databases that are being replicated from their local data center into Aurora. If their local data center were to go down, they have all their data in Aurora. However, periodically some statement breaks replication and I have to skip replication if it is safe to do so.

One of the semi frustrating things I have to deal with is the widespread use of the BLACKHOLE storage engine in Aurora by my client. When replicating from the local data center to Aurora, the client does not always want to replicate every single database into Aurora. There might be hundreds of databases on a single server and the client only want to replicate one of them into Aurora. What I do is export and import all the tables for all the database into Aurora. Then I change all the tables for the databases I don't want to replicate to BLACKHOLE. However, eventually people start creating new tables, modifying existing tables on the master server which is located in the local data center and those DDL changes are replicated to the database for which I have set all the tables to BLACKHOLE. Eventually this causes replication to break. Because I don't care about the data in the BLACKHOLE tables I can skip all the errors but sometimes I have to skip hundreds of errors.

Thursday, October 19, 2017

How to break replication to Aurora instance slaves

I have a client which is migrating database to the AWS cloud. We are setting up Aurora slaves from the physical data to replicate all data. Today I was updating my permissions for my own user and ran a query like this on all the physical machines:

GRANT ALL PRIVILEGES ON *.* TO 'me'@'%' IDENTIFIED BY PASSWORD '*PASSWORDHASH' WITH GRANT OPTION;

That query ran fine on all the physical machine in the data center but it broke replication on every single Aurora instance that was replicating. This is because there are many permissions which are not allowed in Aurora because only the "SUPER" user can have them and that user is controlled by Amazon. When granting permissions or changing password hash, we have to be careful to only run grant statements which will not break replication on Aurora.

Here is an interesting list of other things you can learn from doing migrations to amazon rds:

https://www.percona.com/blog/2014/07/28/what-i-learned-while-migrating-a-customer-mysql-installation-to-amazon-rds/

Tuesday, October 10, 2017

if else / case statements in MySQL queries

Being able to change what is displayed in the query or to display the results of a different column based on criteria in a WHERE clause is super helpful. Because this is so useful and something I use so frequently I writing a blog post on it.

Here is am example query I wrote. I want to display the Aurora "EndPoint" alias when I provide an IP Address. I've created two tables, one which has RDS/Aurora Cluster details, and one with RDS/Aurora Instance details. I've pulled this information from the AWS API and stored it locally to easily query it in a relational database. If the IP Address turns out to be the writer end point then I want the c.EndPoint column value to be displayed. If the IP Address is for reader end point then I want the c.ReaderEndpoint to be displayed. If there is only one instance in the cluster then this query will always return c.EndPoint (writer end point).

SELECT
IF(i.IsClusterWriter = 1, c.EndPoint, c.ReaderEndpoint ) AS alias
FROM RDSCluster c
INNER JOIN RDSInstance i ON (i.DBClusterIdentifier = c.DBClusterIdentifier)
WHERE i.IPAddress = 'xx.xx.xxx.xx';

With a case statement, I could also write it like this:

SELECT
(case when (i.IsClusterWriter = 1)
THEN
c.EndPoint
ELSE
c.ReaderEndpoint
END)
as alias
FROM RDSCluster c
INNER JOIN RDSInstance i ON (i.DBClusterIdentifier = c.DBClusterIdentifier)
WHERE i.IPAddress = 'xx.xx.xxx.xx'

Other examples:

https://stackoverflow.com/questions/8763310/how-do-write-if-else-statement-in-a-mysql-query

Tuesday, October 3, 2017

Clearing the buffer cache before starting MySQL - Flush out the file system cache

Here was as good question I found on stackexchange:

How do you empty the buffers and cache on a Linux system?

http://unix.stackexchange.com/questions/87908/how-do-you-empty-the-buffers-and-cache-on-a-linux-system

If I don't flush out the file system cache when restarting MySQL on prod severs, I frequently get timeouts when restarting. The servers I typically work with have about 250GB of RAM, 40 CPU. Something like this will happen where I have to kill the start command or it just doesn't start.

# service mysql status
MySQL (Percona Server) running (182654) [ OK ]
# service mysql restart
Shutting down MySQL (Percona Server).......................[ OK ]......................................
Starting MySQL (Percona Server).......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................^C

However, if I run this:

sync && echo 3 > /proc/sys/vm/drop_caches

MySQL starts up quick.

Starting MySQL (Percona Server)............................[ OK ]....................

Friday, September 29, 2017

Error installing sys schema because performance schema structure is wrong

When attempting to install the sys schema which I downloaded from here, I sometimes I get this error:

mysql -u root -p -h MyServerName.com < ./sys_56.sql

ERROR 1033 (HY000) at line 47 in file: './views/p_s/processlist.sql': Incorrect information in file: './performance_schema/threads.frm'

This is related to a bug after upgrading MySQL. The structure of the performance schema is wrong because it wasn't fixed at the time of upgrade.

The way I typically fix this is by dropping performance schema and re-installing it.

First ssh into the system.

Logon to MySQL:
DROP DATABASE performance_schema;
Exit MySQL and run:

mysql_upgrade -u root -p

mysql_upgrade will re-create the performance schema with the correct structure.

Now install sys schema.

You will need to restart MySQL service to actually get the performance schema and sys schema to start working.

Thursday, September 21, 2017

Migrating users to RDS or Aurora

I've mentioned this before what a pain it can be migrating databases that have DEFINERS in the triggers/stored procs/events to AWS. I'm doing migrations to Aurora now and the user permissions are also very annoying.

There are some privileges which are not allowed at all on Aurora and RDS:
Super, Shutdown, File

The above three privileges will give this error:
Access denied for user '<user name>'@'%' (using password: YES)

If you are scripting out user migrations from MySQL to RDS and have users with "ALL PRIVILEGES ON *.*" then you need to remove that from the grant statements and replace it with allowed permissions.

These are allowed when granting global privileges on *.* with RDS:

Select, Insert, Update, Delete, Create, Drop, Reload, Process, References, Index, Alter, SHOW DATABASES, CREATE TEMPORARY TABLES, LOCK TABLES, Execute, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, Event, Trigger, WITH GRANT OPTION;

For example:

GRANT Select, Insert, Update, Delete, Create, Drop, Reload, Process, References, Index, Alter, SHOW DATABASES, CREATE TEMPORARY TABLES, LOCK TABLES, Execute, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, Event, Trigger ON *.* TO 'test_user'@'%' IDENTIFIED BY '12345678' WITH GRANT OPTION;

Note: Using "WITH GRANT OPTION" should only be give to administrators.

These are allowed when granting to <database>.* on RDS:

Select, Insert, Update, Delete, Create, Drop, References, Index, Alter, CREATE TEMPORARY TABLES, LOCK TABLES, Execute, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, Event, Trigger, WITH GRANT OPTION

For example:

GRANT Select, Insert, Update, Delete, Create, Drop, References, Index, Alter, CREATE TEMPORARY TABLES, LOCK TABLES, Execute, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, Event, Trigger ON test.* TO 'test_user'@'%' IDENTIFIED BY '12345678' WITH GRANT OPTION;

Note: Using "WITH GRANT OPTION" should only be give to administrators.

All of the following GLOBAL PRIVILEGES will fail regardless of using RDS or normal MySQL if you try to grant them to a specific database (they can only be granted to *.*):

Process, SHOW DATABASES, CREATE USER, REPLICATION SLAVE, REPLICATION CLIENT, Reload

For example:

GRANT Process, SHOW DATABASES, CREATE USER, REPLICATION SLAVE, REPLICATION CLIENT, Reload ON test.* TO 'test_user'@'%' IDENTIFIED BY '12345678';

The error you will get is:

Incorrect usage of DB GRANT and GLOBAL PRIVILEGES

There are also more permissions which you can grant to a user for AWS that I have not referenced here such as:

GRANT SELECT INTO S3 ON *.* TO 'test_user'@'%';

I am not covering those AWS specific permissions because those won't appear in your non RDS MySQL Servers because they are specific to RDS. But you would get an error if you try running that on a normal MySQL server.

Thursday, September 14, 2017

Querying processlist and grouping by IPAddress

Wanted to group all the sleeping connections on a MySQL instance by IP Address. Here is the simple query:

SELECT user, LEFT( host, ( LOCATE( ':', host ) - 1 ) ) as host, count(*) as count
FROM `information_schema`.processlist where Command = 'Sleep'
GROUP BY user, LEFT( host, ( LOCATE( ':', host ) - 1 ) )
ORDER BY count DESC

Explanation of the functions used:
MySQL LOCATE() returns the position of the first occurrence of a string within a string.

MySQL LEFT() returns a specified number of characters from the left of the string. Both the number and the string are supplied as arguments of the function.

Found this query from Ben Nadel:

https://www.bennadel.com/blog/3054-grouping-the-mysql-processlist-by-ip-address-to-view-connection-counts.htm

SELECT
 tmp.ipAddress,

 -- Calculate how many connections are being held by this IP address.
 COUNT( * ) AS ipAddressCount,

 -- For each connection, the TIME column represent how many SECONDS it has been in
 -- its current state. Running some aggregates will give us a fuzzy picture of what
 -- the connections from this IP address is doing.
 FLOOR( AVG( tmp.time ) ) AS timeAVG,
 MAX( tmp.time ) AS timeMAX
FROM
 -- Let's create an intermediary table that includes an additional column representing
 -- the client IP address without the port.
 (

  SELECT
   -- We don't actually need all of these columns for the demo. But, I'm
   -- including them here to demonstrate what fields COULD be used in the
   -- processlist system.
   pl.id,
   pl.user,
   pl.host,
   pl.db,
   pl.command,
   pl.time,
   pl.state,
   pl.info,

   -- The host column is in the format of "IP:PORT". We want to strip off
   -- the port number so that we can group the results by the IP alone.
   LEFT( pl.host, ( LOCATE( ':', pl.host ) - 1 ) ) AS ipAddress
  FROM
   INFORMATION_SCHEMA.PROCESSLIST pl

 ) AS tmp
GROUP BY
 tmp.ipAddress
ORDER BY
 ipAddressCount DESC

Tuesday, September 5, 2017

MySQL DBA interview questions

I've done a fair share of DBA interviews, both when I was the one being interviewed and interviewing others for my team. Some managers have only asked me to describe projects I worked on while others asked me a variety of database and linux questions. Many times the questions were so far "out there" and had nothing to do with being a DBA or anything I have ever experiences as a DBA and couldn't answer them. For those interviews, apparently I didn't have the "right stuff" and didn't get the job. I think those questions are stupid. I was once asked computer science questions related to unbalanced b trees. As a production DBA, I have never worried about unbalanced b trees or anything like that. I'm not a software developer but I'd never be any good at my job if I couldn't code and work with developers. I've learned my coding on the job and from doing projects. I'm never going to be any good at computer science theory questions.

At one company, they trained us to ask Situation, Task, Action, Results questions during interviews. aka STAR. Not everyone likes these behavioral questions but I think these types of questions are better to ask during an interview than technical questions.

Here are some examples, you could use to see how a candidate does troubleshooting and has worked through the problems in the past.

Tell me about a time you used scripting (bash, python, power shell, PHP, perl, etc) to automate a task (could be a DBA task or something else)? STAR method (Situation, Task, Action, Results)
Tell me about your experience researching or escalating database (or other technical problem) problems you were unable to solve on your own? STAR method (Situation, Task, Action, Results)
Tell me about your experience managing changes in a production environment (database related or not)? How do you seek approvals to impalement changes? How do you do peer reviews? Do you use a change management tracking system? How do you minimize downtime?
Tell me about your experience being on-call (as a DBA or other technical role)?

I'd expect a MySQL DBA to have an understanding and experience using Linux and Windows. They would need at a minimum basic Linux experience. Most people run MySQL on Linux. Here are some basic questions to check that they have used Linux.

Basic Linux questions

What is a Linux command you use frequently? (to see if candidate uses Linux enough to have a command available in their mind)
What distros have you used? (Red Hat, CentOS, Ubuntu, Gentoo, etc)
How do you list files in a directory? (ls)
How do you see space available on a file system? (df)
How do you see space use on current directory? (du)
What is one way to find a file on a linux file system (use the find command or locate command)
Do you have a particular command line editor you like to use on Linux? (vi, vim, nano, etc)
What command would you use to search through a file or a output from another command? (grep)
What is the crontab used for? (scheduling tasks, scripts)

The following questions are going to be a hit and miss. Some people will know some of them but I think many candidates will not have an answer for a many of them. I've been asked many of these questions during interviews and didn't have an answer at the time. In no way does not being able to answer some of the questions make a candidate not a good person to hire. All of the information below can be learned on the job, you only need a smart person who can troubleshoot and learn. If a candidate has 5+ years working with MySQL then I would expect them to be able to answer many of the following questions. Many candidates are going to have more experience in certain areas (prod support/application support/database development/scripting) and will probably be lacking in other areas. Lacking areas can be learned on the job so don't expect someone to be awesome at all areas.

General databases questions

What is a database? How would you describe the difference between using a spreadsheet and using a database?
What is an index? How would you describe it?
Who is the most important user of a database? (Does the candidate think of only certain users are more important? All connections are important, root user isn't more important than the application user)
Which is faster, inserting one million rows of data or update one million rows of data? (Looking to see how candidate tries to answer question, what justification is used for the answer, understands set based updates? Would they use a cursor to insert data? What could make an insert faster, or what could make an update faster, etc.)
https://www.quora.com/Which-is-faster-Inserting-200-000-rows-of-data-or-updating-200-000-rows-of-data-SQL
Suppose you are creating tables and you have a situation with a many to many relationship. How do you design tables to handle this relationship (Create a mapping table, or a intermediary table)
Table joints. Inner join, left outer join, right outer join, cross join, self join and cartesian product. Describe table join concepts and each type of join.
What affect on a relational DBMS performance do joins have?
What is the difference between an EXPLICIT join and an IMPLICIT join?
https://stackoverflow.com/questions/44917/explicit-vs-implicit-sql-joins
What does ACID stand for?
Atomicity - guarantee that either all of the tasks of a transaction are performed or none of them are
Consistency - database remains in a consistent state before the start of the transaction and after the transaction is over (whether successful or not)
Isolation - other operations cannot access or see the data in an intermediate state during a transaction
Durability - guarantee that once the user has been notified of success, the transaction will persist, and not be undone
Why can allowing developers to use SELECT * frequently in code cause problems? (return more data than usually needed, problems with backwards compatibility for future changes)
What is cardinality? (number of unique values in a particular column)
What is selectivity? (measure of how much variety there is in the values of a given table column in relation to the total number of rows in a given table )
Normalization - describe the concept of normalization and then the differences of 3rd normal form and 4th normal form. Why would someone denormalize MySQL tables.
Tell me about a database or tables you have designed. For what purpose was it created?
Did you normalize the tables? De-normalize? What considerations did you take into account for performance?

MySQL specific questions

Can you name a few different types of indexes used in MySQL (B+ tree index, FULL TEXT index, Log Structure Merge Tree (like TokuDB's fractal tree)
RDBMS locking and concurrency control, describe the concept of MVCC (multiversion concurrency control) and several isolation levels - serializable, repeatable reads, read committed and read uncommitted. Describe the advantages and disadvantages of each method. Where are isolation levels controlled in MySQL and how to set them.
In MySQL can you give me an example of a statement that would be unsafe for statement-based replication? http://dev.mysql.com/doc/refman/5.7/en/replication-rbr-safe-unsafe.html
How could you intentionally break replication between a master and slave? Some examples...

delete data directly on a slave as a super user and then try to update the data on the master, replication will break when it tries to update the missing row on the slave
add data to the master but prevent it from writing to binary log and then update that data but allow it to be written to binary log, replication will break when it tries to update the missing row on the slave
cause slave lag on the slave by doing some long DDL change on the master but purge the binary logs on the master before the slave has enough time to catch up
Use limits to randomly delete rows with statement based replication which will probably eventually cause a conflict
Create a user or add data on a slave that is part of a primary key or unique index and then re-create the same data on the master, replication will break because of primary key conflicts

In MySQL, how could you perform an ALTER TABLE without downtime? (Percona online schema change, Use MySQL 5.6, perform the changes on a slave first and then promote it to master and then make the change on each node)
In what order does MySQL evaluate a multi column or composite index? (Left to right)
Tell me about how you would implement a backup strategy in MySQL. LVM snapshots - describe the concepts and tools, Percona XtraBackup concepts and tools.
How can you perform a backup or datadump in MySQL? (mysqldump utility, mydumper, save a CSV file, Percona XtraBackup)
How can you import data into MySQL? (Load data infile for CSV, at the command line re-direct a dump file into MySQL, myloader (parallel restores))
What experience do you have with Percona Toolkit? (or Maakit?)
In a MySQL query, how can you get a unique list of values from a column? (use a group by or distinct key word)
What are some ways to optimize a sub query in MySQL? (re-write as a join, separate into two queries)
What do you know about sharding MySQL databases? Have you ever set sharding? What is the difference between partitioning and sharding?
Explain the difference between horizontal scaling and vertical scaling.
What monitoring tools have you used in MySQL? (Trend analysis tools, real time profiling tools, others)
What is your experience with MySQL load balancing? (Proxy servers, splitting, reads and writes, etc)
What is your experience with High Availability in MySQL? (Setting up replication, Master-Master, Master-Slave, Tungsten replicator, Galera Cluster, Percona Cluster, etc)
What privilege in MySQL is needed to see TRIGGERS? (TRIGGER)
What privilege in MySQL is needed to see decrypt data when you do not have a decryption key? (SUPER)
What privilege in MySQL is needed to see all the currently running queries? (PROCESS)
What privilege in MySQL is needed to see the results of SHOW SLAVE STATUS, view replication lag, or seconds behind master? (REPLICATION CLIENT)
In MySQL, what happens if you grant a privilege to a user that doesn't exist and the NO_AUTO_CREATE_USER mode is not enabled? (automatically creates a user with blank password)
If you a have a system that is experience deadlock errors, what are some ways to troubleshoot and possibly resolve deadlock scenarios?
Is a distributed MySQL cluster such as two nodes using Master-Master replication, is this considered ACID compliant? (It is not because it is asynchronous, there are no guarantees that queries executed on one server will be committed on the other. This results in eventual consistency)
What are some clustering options available in the MySQL eco-system? What are some of the pros and cons of these systems?

MySQL Cluster (NDB Cluster)
Galera Cluster/Percona Cluster
Master-Master
Master-Slave
MySQL Fabric
Google's Vitess and other specialized sharding frameworks

How does Master-Master replication in MySQL avoid primary key collisions?
Can you explain to me how indexes work in MySQL? (B+ tree, similar to a phone book lookup)
Suppose management complains of the application being slow and thinks the database might be a bottleneck. What actions would you take to prove that it is or is not the database?
Suppose you have identified a query that is taking several seconds to run. How would view the execution plan?
In an EXPLAIN plan in MySQL what column shows you if the query will be doing a full table scan? (type column)
In an EXPLAIN plan in MySQL what column will show the index length in bytes? (key_len column)
In an EXPLAIN plan, you see "Using Index" in the EXTRA column, what does that mean? (covering index)
How do you identify a redundant index in MySQL? Or can you give me an example of a redundant index in MySQL?
What steps could you perform to improve the performance of slow running query?
MySQL - describe the key difference of MySQL vs other databases (pluggable storage engines). How can MySQL support multiple storage engines? Compare MySQL to MariaDB to Percona.
Data types. Describe MySQL datatypes that are available, what is the difference between timestamp and datetime - they store identical data. What are ENUM and SET used for, how to store json and xml data in MySQL. Using text datatype vs varchar. BLOB and CLOB data and what it can be used for. Describe numeric datatypes.
MySQL storage engines - MYISAM, INNODB, BLACKHOLE (why would someone use it), CVS, ARCHIVE, TOKUDB.
INNODB. Locking mechanism, concurrency control, transactions and isolation, Btree index concepts, fragmentation, storage allocation and de allocation, statistics, monitoring, explain "show engine INNODB status output", critical INNODB parameters, INNODB tablespaces, logs and WAL (write ahead logs).
Indexes and primary keys. INNODB indexing concepts, covering index, clustered index, index creation, optimization and MySQL explain plan. Why would someone force an index usage, table partitioning, table compression - advantages and disadvantages.
Security. How to reset forgotten MySQL root password, how to secure MySQL, what is SQL injection and how to protect against it. MySQL files permissions. MySQL grants to objects. SSL encryption, data obfuscation and related tools.
Logging. MySQL logs describe the types - WAL, error, SQL, binary etc...Describe logrotate facility and how to use it.
Character sets. UTF8, Latin1. How MySQL implements UTF8, time zones and why is it important to use NTP.
Replication. Two types asynchronous and synchronous. Async - MySQL replication, Tungsten. Synchronous - NDB, Galera, DRBD. What are the differences, latency, advantages and disadvantages of each. Describe the concept of MySQL log server and why use BLACKHOLE Engine.
High Availability. Describe tools and differences. MySQL MHA, Continuent Tungsten, DRBD, Galera, MySQL Cluster. Alarms and monitoring.

Monday, August 7, 2017

Unknown or incorrect time zone error

Sometimes I get this error on replication slaves. I had created an index on the master and the command replicated to the slave. However, replication broken on the slave and the error from "show slave status" was the following:

Error 'Unknown or incorrect time zone: 'America/Denver'' on query. Default database: ''. Query: 'CREATE INDEX `my_index` ON `myDatabase`.`my_table` (column1, column2) COMMENT '' ALGORITHM DEFAULT LOCK DEFAULT'

For whatever reason this MySQL server didn't have the time zone information. You can add the time zone information like this:

mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root -p mysql

Here is a reference to this on stack exchange:

https://dba.stackexchange.com/questions/120945/how-do-i-resolve-this-error-error-1298-hy000-unknown-or-incorrect-time-zone

Tuesday, July 18, 2017

Using alias in WHERE clause in MySQL - understanding how queries are parsed

I had written a very long query where I was using an alias and needed to be able to filter the query on the alias. Turns out you cannot use an alias in a WHERE clause. BUT you can use a HAVING clause which can be made to mimic certain aspects of a WHERE clause.

https://stackoverflow.com/questions/200200/can-you-use-an-alias-in-the-where-clause-in-mysql

In relation to this, it can be important to understand how queries are parsed:

https://stackoverflow.com/questions/24127932/mysql-query-clause-execution-order

From the above post on stack overflow:

Order that queries are executed, although I think HAVING and GROUP BY could come after SELECT:

FROM clause
WHERE clause
SELECT clause
GROUP BY clause
HAVING clause
ORDER BY clause

This is important for understanding how queries are parsed. You cannot use a column alias defined in a SELECT in the WHERE clause, for instance, because the WHERE is parsed before the SELECT. On the other hand, such an alias can be in the ORDER BY clause.

And another blog post on SQL Query Order of Operations:

https://www.bennadel.com/blog/70-sql-query-order-of-operations.htm

Tuesday, July 11, 2017

Query to find which users have priv on a specific database

SET @db_name = 'db_name';
SELECT -- the list of global privileges
@@hostname as server_host,
SUBSTRING(@@version , 1, CHAR_LENGTH(@@version) - 4) as 'database_version',
mu.host `host`,
mu.user `user`,
(case when `mu`.`password` ='' then "**Yes**" ELSE 'No' end) as `Empty Password?`,
mu.password as 'password',
'ALL' as `Database`,
'-' as `Tables`,
'-' as `Views`,
'-' as `Tables Columns`,
'-' as `Views Columns`,
'-' as `Procedures`,
'-' as `Functions`,
TRIM(TRAILING ',' FROM(RTRIM(CONCAT(
IF(mu.Select_priv = 'Y', 'Select, ', ''),
IF(mu.Insert_priv = 'Y', 'Insert, ', ''),
IF(mu.Update_priv = 'Y', 'Update, ', ''),
IF(mu.Delete_priv = 'Y', 'Delete, ', ''),
IF(mu.Create_priv = 'Y', 'Create, ', ''),
IF(mu.Drop_priv = 'Y', 'Drop, ', ''),
IF(mu.Reload_priv = 'Y', 'Reload, ', ''),
IF(mu.Shutdown_priv = 'Y', 'Shutdown, ', ''),
IF(mu.Process_priv = 'Y', 'Process, ', ''),
IF(mu.File_priv = 'Y', 'File, ', ''),
IF(mu.Grant_priv = 'Y', 'Grant, ', ''),
IF(mu.References_priv = 'Y', 'References, ', ''),
IF(mu.Index_priv = 'Y', 'Index, ', ''),
IF(mu.Alter_priv = 'Y', 'Alter, ', ''),
IF(mu.Show_db_priv = 'Y', 'SHOW DATABASES, ', ''),
IF(mu.Super_priv = 'Y', 'Super, ', ''),
IF(mu.Create_tmp_table_priv = 'Y', 'CREATE TEMPORARY TABLES, ', ''),
IF(mu.Lock_tables_priv = 'Y', 'LOCK TABLES, ', ''),
IF(mu.Execute_priv = 'Y', 'Execute, ', ''),
IF(mu.Repl_slave_priv = 'Y', 'REPLICATION SLAVE, ', ''),
IF(mu.Repl_client_priv = 'Y', 'REPLICATION CLIENT, ', ''),
IF(mu.Create_view_priv = 'Y', 'CREATE VIEW, ', ''),
IF(mu.Show_view_priv = 'Y', 'SHOW VIEW, ', ''),
IF(mu.Create_routine_priv = 'Y', 'CREATE ROUTINE, ', ''),
IF(mu.Alter_routine_priv = 'Y', 'ALTER ROUTINE, ', ''),
IF(mu.Create_user_priv = 'Y', 'CREATE USER, ', ''),
IF(mu.Event_priv = 'Y', 'Event, ', ''),
IF(mu.Trigger_priv = 'Y', 'Trigger, ', '')
)))) AS `Privileges`
FROM
mysql.user mu
HAVING Privileges <> ''
UNION
SELECT -- the list of privileges for a database
@@hostname as server_host,
SUBSTRING(@@version , 1, CHAR_LENGTH(@@version) - 4) as 'database_version',
md.host `host`,
md.user `user`,
'-' as `Empty Password?`,
'-' as 'password',
md.db `Database`,
'-' as `Tables`,
'-' as `Views`,
'-' as `Tables Columns`,
'-' as `Views Columns`,
'-' as `Procedures`,
'-' as `Functions`,
TRIM(TRAILING ',' FROM(RTRIM(CONCAT(
IF(md.Select_priv = 'Y', 'Select, ', ''),
IF(md.Insert_priv = 'Y', 'Insert, ', ''),
IF(md.Update_priv = 'Y', 'Update, ', ''),
IF(md.Delete_priv = 'Y', 'Delete, ', ''),
IF(md.Create_priv = 'Y', 'Create, ', ''),
IF(md.Drop_priv = 'Y', 'Drop, ', ''),
IF(md.Grant_priv = 'Y', 'Grant, ', ''),
IF(md.References_priv = 'Y', 'References, ', ''),
IF(md.Index_priv = 'Y', 'Index, ', ''),
IF(md.Alter_priv = 'Y', 'Alter, ', ''),
IF(md.Create_tmp_table_priv = 'Y', 'CREATE TEMPORARY TABLES, ', ''),
IF(md.Lock_tables_priv = 'Y', 'LOCK TABLES, ', ''),
IF(md.Create_view_priv = 'Y', 'CREATE VIEW, ', ''),
IF(md.Show_view_priv = 'Y', 'SHOW VIEW, ', ''),
IF(md.Create_routine_priv = 'Y', 'CREATE ROUTINE, ', ''),
IF(md.Alter_routine_priv = 'Y', 'ALTER ROUTINE, ', ''),
IF(md.Execute_priv = 'Y', 'Execute, ', ''),
IF(md.Event_priv = 'Y', 'Event, ', ''),
IF(md.Trigger_priv = 'Y', 'Trigger, ', '')
)))) AS `Privileges`
FROM
mysql.db md
WHERE md.Db = @db_name
UNION
SELECT -- the list of privileges for tables
@@hostname as server_host,
SUBSTRING(@@version , 1, CHAR_LENGTH(@@version) - 4) as 'database_version',
mt.host `host`,
mt.user `user`,
'-' as `Empty Password?`,
'-' as 'password',
'-' as `Database`,
CONCAT(mt.Db, '.', mt.Table_name) `Tables`,
'-' as `Views`,
'-' as `Tables Columns`,
'-' as `Views Columns`,
'-' as `Procedures`,
'-' as `Functions`,
REPLACE(mt.Table_priv, ',', ', ') AS `Privileges`
FROM
mysql.tables_priv mt
WHERE mt.Db = @db_name
AND
mt.Table_name IN
(SELECT
DISTINCT
t.table_name `tables`
FROM
information_schema.tables AS t
WHERE
t.table_type IN
('BASE TABLE', 'SYSTEM VIEW', 'TEMPORARY', '') OR
t.table_type <> 'VIEW' AND
t.create_options IS NOT NULL
)
UNION
SELECT -- the list of privileges for views
@@hostname as server_host,
SUBSTRING(@@version , 1, CHAR_LENGTH(@@version) - 4) as 'database_version',
mv.host `host`,
mv.user `user`,
'-' as `Empty Password?`,
'-' as 'password',
'-' as `Database`,
'-' as `Tables`,
CONCAT(mv.Db, '.', mv.Table_name) `Views`,
'-' as `Tables Columns`,
'-' as `Views Columns`,
'-' as `Procedures`,
'-' as `Functions`,
REPLACE(mv.Table_priv, ',', ', ') AS `Privileges`
FROM
mysql.tables_priv mv
WHERE
mv.Db = @db_name
AND
mv.Table_name IN
(SELECT
DISTINCT
v.table_name `views`
FROM
information_schema.views AS v
)
UNION
SELECT -- the list of privileges for table columns
@@hostname as server_host,
SUBSTRING(@@version , 1, CHAR_LENGTH(@@version) - 4) as 'database_version',
mtc.host `host`,
mtc.user `user`,
'-' as `Empty Password?`,
'-' as 'password',
'-' as `Database`,
'-' as `Tables`,
'-' as `Views`,
CONCAT(mtc.Db, '.', mtc.Table_name, '.', mtc.Column_name) `Tables Columns`,
'-' as `Views Columns`,
'-' as `Procedures`,
'-' as `Functions`,
REPLACE(mtc.Column_priv, ',', ', ') AS `Privileges`
FROM
mysql.columns_priv mtc
WHERE
mtc.Db = @db_name
AND mtc.Table_name IN
(SELECT
DISTINCT
t.table_name `tables`
FROM
information_schema.tables AS t
WHERE
t.table_type IN
('BASE TABLE', 'SYSTEM VIEW', 'TEMPORARY', '') OR
t.table_type <> 'VIEW' AND
t.create_options IS NOT NULL
)
UNION
SELECT -- the list of privileges for view columns
@@hostname as server_host,
SUBSTRING(@@version , 1, CHAR_LENGTH(@@version) - 4) as 'database_version',
mvc.host `host`,
mvc.user `user`,
'-' as `Empty Password?`,
'-' as 'password',
'-' as `Database`,
'-' as `Tables`,
'-' as `Views`,
'-' as `Tables Columns`,
CONCAT(mvc.Db, '.', mvc.Table_name, '.', mvc.Column_name) `Views Columns`,
'-' as `Procedures`,
'-' as `Functions`,
REPLACE(mvc.Column_priv, ',', ', ') AS `Privileges`
FROM
mysql.columns_priv mvc
WHERE
mvc.Db = @db_name
AND mvc.Table_name IN
(SELECT
DISTINCT
v.table_name `views`
FROM
information_schema.views AS v
)
UNION
SELECT -- the list of privileges for procedures
@@hostname as server_host,
SUBSTRING(@@version , 1, CHAR_LENGTH(@@version) - 4) as 'database_version',
mp.host `host`,
mp.user `user`,
'-' as `Empty Password?`,
'-' as 'password',
'-' as `Database`,
'-' as `Tables`,
'-' as `Views`,
'-' as `Tables Columns`,
'-' as `Views Columns`,
CONCAT(mp.Db, '.', mp.Routine_name) `Procedures`,
'-' as `Functions`,
REPLACE(mp.Proc_priv, ',', ', ') AS `Privileges`
FROM
mysql.procs_priv mp
WHERE
mp.Routine_type = 'PROCEDURE'
AND mp.Db = @db_name
UNION
SELECT -- the list of privileges for functions
@@hostname as server_host,
SUBSTRING(@@version , 1, CHAR_LENGTH(@@version) - 4) as 'database_version',
mf.host `host`,
mf.user `user`,
'-' as `Empty Password?`,
'-' as 'password',
'-' as `Database`,
'-' as `Tables`,
'-' as `Views`,
'-' as `Tables Columns`,
'-' as `Views Columns`,
'-' as `Procedures`,
CONCAT(mf.Db, '.', mf.Routine_name) `Functions`,
REPLACE(mf.Proc_priv, ',', ', ') AS `Privileges`
FROM
mysql.procs_priv mf
WHERE
mf.Routine_type = 'FUNCTION'
AND mf.Db = @db_name

Thursday, July 6, 2017

Use if exists to run a different query on different version of MySQL

I wrote a simple query for checking daily if there are any users with blank passwords for a client which uses the IF EXISTS syntax.

Client has a mix of MySQL 5.1, 5.5, 5.6, 5.7, Aurora, and RDS.

The need for this query was because the mysql.user table is different in MySQL 5.7. The password column was removed and the hash was changed to be stored in the authentication_String column. I needed to be able to run a different query depending on the version. This query gets run on thousands of servers and is rolled up as part of a report.

SELECT IF (EXISTS(
SELECT @@version
FROM DUAL
WHERE @@version LIKE '5.1%'
OR @@version LIKE '5.5%'
OR @@version LIKE '5.6%'
)
,concat('SELECT user, host, \'users\' FROM mysql.user WHERE password = \'\';')
,concat('SELECT user, host, \'users\' FROM mysql.user WHERE authentication_String = \'\';')) into @a;
SELECT @a;
PREPARE stmt1 FROM @a;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;

Tuesday, May 30, 2017

Moving data from MySQL into Aurora/RDS

Migrating databases from your data center to Amazon RDS or Amazon Aurora can be fun. Here are some tips to help.

If you data set is larger than a few GB, use mydumper and myloader instead of a normal MySQL dump. It will go a lot faster because mydumper and myloader is multi-threaded. Using these tools does introduce some caveats though.

Aurora will not let the user doing the restore create definers for other users. If you try this you will get this misleading error:

ERROR 1227 (42000) at line 1902: Access denied; you need (at least one of) the SUPER privilege(s) for this

If you are migrating a schema with routines (functions or stored procedures), events, triggers, or views then you are going to have definers for the code in each of those. You can either change all the definers in your schemas before you dump so they are standardized or change them after the dump but before importing into AWS.

This is the process I am using to dump and restore from a client's data center to Aurora. My client has a long running query killer on their master servers so I generally perform the dumps on a read-only replica (slave). If I must perform the dump on the master, then I must disable the event that performs the query killing during the duration of the dump.

# 1. Create a dump of objects with no data excluding stored procedures, functions, events of views.
# 2. Restore dump on new server.
# 3. Re-create the users on new server but not the grants for any stored procedures, functions, or events. Remove/replace any grants which are not compatible with RDS/Aurora.
# 4. Create a dump of the the objects without data (a second time) but this time include stored procedures, functions, events and views.
# 5. Get a list of the unique definers on the source system. Do a find and replace of all the unique definer users and replace with standard definer.
# 6. Restore the dump on the new server as the same user which is the definer.
# 7. Dump the grants from the source.
# 8. Remove/replace any grants which are not compatible with RDS/Aurora.
# 9. Apply all the grants including stored procedures, functions, events.
# 10. Do a full dump and restore of all data and all objects

The reason for this multi stop process is you cannot create objects which have a definer if the user has not been created yet. You cannot grant permissions to objects if the object has not been created yet. This is why you cannot always do a straight dump on a new machine. Restoring to RDS adds in more steps because the definers must be the same user which is doing the restore.

Here is a great white paper written by the Aurora team lead at Amazon:

https://d0.awsstatic.com/whitepapers/Migration/amazon-aurora-migration-handbook.pdf