Tuesday, December 26, 2017

Setting up consumers - events_statements_history for Aurora instances with performance schema

I want to set up Percona Monitoring and Management (PMM) for a client that uses Aurora. To do so, the documentation says you need to turn on the consumer in the performance schema so that events_statements_history is enabled. There isn't an option in the AWS console to do this, and there isn't a parameter group setting to modify it, so it must be done directly on the instance.


Here is how to do it from the MySQL command line:


MySQL [(none)]> show global variables like 'performance_schema';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| performance_schema | ON    |
+--------------------+-------+
1 row in set (0.00 sec)

MySQL [(none)]> use performance_schema
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MySQL [performance_schema]>
MySQL [performance_schema]> select * from setup_consumers WHERE name = 'events_statements_history';
+---------------------------+---------+
| NAME                      | ENABLED |
+---------------------------+---------+
| events_statements_history | NO      |
+---------------------------+---------+
1 row in set (0.00 sec)

MySQL [performance_schema]> update setup_consumers set enabled='yes' WHERE name = 'events_statements_history';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

MySQL [performance_schema]> select * from setup_consumers WHERE name = 'events_statements_history';
+---------------------------+---------+
| NAME                      | ENABLED |
+---------------------------+---------+
| events_statements_history | YES     |
+---------------------------+---------+
1 row in set (0.00 sec)


However, this setting does not survive a restart; after the instance restarts, the change to the setup_consumers table is reverted:


MySQL [performance_schema]> show global variables like 'performance_schema';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| performance_schema | ON    |
+--------------------+-------+
1 row in set (0.00 sec)

MySQL [performance_schema]> select * from setup_consumers WHERE name = 'events_statements_history';
+---------------------------+---------+
| NAME                      | ENABLED |
+---------------------------+---------+
| events_statements_history | NO      |
+---------------------------+---------+
1 row in set (0.01 sec)

MySQL [performance_schema]> update setup_consumers set enabled='yes' WHERE name = 'events_statements_history';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

MySQL [performance_schema]> select * from setup_consumers WHERE name = 'events_statements_history';
+---------------------------+---------+
| NAME                      | ENABLED |
+---------------------------+---------+
| events_statements_history | YES     |
+---------------------------+---------+
1 row in set (0.00 sec)
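On a self-managed MySQL server you could make the consumer setting persistent across restarts with a my.cnf option like the one below, but since neither the AWS console nor the parameter groups expose it for Aurora, that route isn't available here:

[mysqld]
performance_schema_consumer_events_statements_history = ON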


I could set up a Zabbix trigger or some other script that turns it back on whenever it detects a server restart, but I think the easier route is to create an event that keeps turning it back on. Here is a simple example:


DROP EVENT IF EXISTS enable_statement_history;

CREATE
DEFINER=`root`@`localhost`
EVENT IF NOT EXISTS enable_statement_history
ON SCHEDULE EVERY 60 SECOND
    STARTS NOW()
DO
   update performance_schema.setup_consumers set enabled='yes' WHERE name = 'events_statements_history';
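One thing to keep in mind is that the event scheduler must be running for this event to fire. You can check it with the query below; if it shows OFF on Aurora, it is typically enabled through the event_scheduler parameter in the parameter group rather than with SET GLOBAL.

show global variables like 'event_scheduler';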



Monday, December 18, 2017

Encrypting the defaults file for logging into MySQL

MySQL has an option to store credentials in a file so that you don't have to enter them at the command line when connecting to MySQL.

For example the "normal" way of connecting to MySQL from the command line would be like this:

# mysql -u<my_user> -p -h<server name>
Enter password:

OR

# mysql -u<my_user> -p<MYPassword>  -h<server name>

In the first example you have to enter your password, which won't work for scripts. In the second example you would have to hard-code the password into your script or pull it from a variable, and it ends up on the command line, making it visible to anyone who can see what is running on the system. This is bad from a security perspective.

Instead you can use a defaults file and reference the file to logon to MySQL like this:

mysql --defaults-file=location_of_my_default_file.cnf -h<server name>

The defaults file only needs to contain these three lines:

[client]
user=my_user
password='123#_BLABLA'

You can also add a line for host if you want to limit the file to a single server.
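For example, with a hypothetical hostname added:

[client]
user=my_user
password='123#_BLABLA'
host=myserver.example.com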

If your password contains special characters such as #, make sure it is surrounded by single quotes as in the example above.

The problem with this is that the password is now stored in plain text, and the security team at your company is not going to like that. It is better than having the password in the command line history and visible in the process list, but it is still too easy to discover. You could lock the permissions down so that only the root user can view the file, and in theory only people with root access would ever see it, but that may still leave several teams able to view it and lets indexing applications discover it easily.
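At a minimum, lock the permissions down so that only the file's owner can read it, for example:

# chown root:root location_of_my_default_file.cnf
# chmod 600 location_of_my_default_file.cnf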

In MySQL 5.6, a new utility called mysql_config_editor was added that lets you store these credentials in an encrypted file.

From the manual (https://dev.mysql.com/doc/refman/5.7/en/mysql-config-editor.html):

The encryption used by mysql_config_editor prevents passwords from appearing in .mylogin.cnf as cleartext and provides a measure of security by preventing inadvertent password exposure. For example, if you display a regular unencrypted my.cnf option file on the screen, any passwords it contains are visible for anyone to see. With .mylogin.cnf, that is not true. But the encryption used will not deter a determined attacker and you should not consider it unbreakable. A user who can gain system administration privileges on your machine to access your files could decrypt the .mylogin.cnf file with some effort.

Here is how you would create the encrypted file:

mysql_config_editor set --login-path=my_encrypted_defaults_file.cnf --host=localhost --user=root --password

And then to use it:

mysql --login-path=my_encrypted_defaults_file.cnf

OR

mysql --login-path=my_encrypted_defaults_file.cnf -h<server name>
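To verify what was stored, you can list the configured login paths; mysql_config_editor prints the password obfuscated:

mysql_config_editor print --all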

The credentials end up in an encrypted .mylogin.cnf file in your home directory, so you won't be able to read them in plain text. This may be "good enough" to satisfy your security team, but we can do even better by using GPG. I do something similar to what is described in this Percona blog post, using GPG on my laptop. I will leave that for another blog post:

https://www.percona.com/blog/2016/10/12/encrypt-defaults-file/


Wednesday, December 13, 2017

How to identify AWS Aurora instances with aurora_server_id

Many DBAs like to use "SELECT @@hostname" to identify a MySQL server. This can be problematic for a couple of reasons. I've seen servers where the actual host name and the result of "SELECT @@hostname" do not match: someone had configured the MySQL hostname incorrectly, and the result was the hostname of a different server. In my scripts I have to check for this, make sure the resolved hostname matches the value from "SELECT @@hostname", and do additional checking if it does not.

Furthermore, when using RDS/Aurora, you cannot rely on the value of "SELECT @@hostname" at all, because it returns some other value, such as a private IP address that can be the same on multiple systems and is not the hostname you are looking for.

In your scripts you can run:

show global variables like 'aurora_version';

This will let you know that you are on an Aurora instance. Then you can use this query to get the aurora_server_id, which is the closest thing to a hostname:

show global variables like 'aurora_server_id';
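If you want both values in a single round trip from a script, you can combine them; on a plain (non-Aurora) MySQL server the Aurora variables simply won't appear in the result:

show global variables where variable_name in ('aurora_version', 'aurora_server_id');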

The aurora_server_id is not going to be resolvable unless you add an alias for it in your DNS, but you can get the full resolvable endpoint for the instance from the AWS RDS CLI.
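For example, something along these lines with the AWS CLI, assuming the aurora_server_id matches your DB instance identifier:

aws rds describe-db-instances --db-instance-identifier <aurora_server_id> --query 'DBInstances[0].Endpoint.Address' --output text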

If you are using Python, you can do the same lookup with the boto3 RDS client; see the boto3 RDS docs.

Monday, December 4, 2017

Rewriting subqueries to use joins for better performance

For almost every client I have worked with, I have done some amount of profiling on their databases and recommended minor changes to improve performance. Almost every single time I see queries that use sub-selects when it is not necessary. When the data sets are small (in the thousands of rows or less), sub-select performance is typically not an issue. However, when the number of rows in the tables gets into the hundreds of thousands to millions, sub-select performance usually tanks. I've seen developers treat MySQL tables like a queue (not a best practice) and use sub-selects. Performance appears to be fine until the queue fills up, and all of a sudden the entire application is broken because they were not expecting 500,000 to a million rows in the table. Sometimes these poorly written sub-select queries take 2 to 3 minutes and pile up on each other, overwhelming the MySQL server and grinding it to a halt.

A performance gain can almost always be had by rewriting a query to use a join instead of a sub-select. Additionally, you need to make sure that the columns being used for the join have an index. If you read the "High Performance MySQL" book by the experts at Percona, you will see this is one of their recommendations.

Here is a simple example I wrote to query a table that contains a list of database servers.

This query uses a sub-select. In the Explain plan's Extra column, notice Using temporary; Using filesort, and the type column shows ALL, meaning it is looking at all rows in the table.
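The table names below are hypothetical stand-ins for the real schema, and the exact plan you get will depend on your data, indexes and MySQL version, but the shape of the query is the same:

-- Hypothetical tables: servers(id, name, environment) and server_tags(server_id, tag)
EXPLAIN
SELECT name, environment
FROM servers
WHERE id IN (SELECT server_id FROM server_tags WHERE tag = 'production')
ORDER BY name;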


This query uses a JOIN without the sub-select. If you know how to read MySQL explain plans, this one looks much better! The type column changes to range, meaning the query is no longer doing a full table scan, and the filesort and temporary operations go away.
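The same hypothetical query rewritten as a join; make sure the join columns (servers.id and server_tags.server_id) are indexed:

EXPLAIN
SELECT s.name, s.environment
FROM servers s
JOIN server_tags t ON t.server_id = s.id
WHERE t.tag = 'production'
ORDER BY s.name;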


The data size on this table is pretty small (less than 2,000 rows). The query with a sub-select took about 65 ms while the query with the join took 50 ms.

Your mileage with rewriting queries to use a join will vary based on the data distribution, primary key, aggregation, indexes and so on, but the larger the table gets, the more likely a join will give you better performance than a sub-select. I've seen queries go from minutes to seconds, or from 5 seconds to 2 seconds, after rewriting them to use a join.