Tuesday, March 24, 2015

View contents of a file that is compressed

This blog post shows how to use the Linux "cat" and "head" commands on a file that has been compressed with gzip. The examples below use a mysqldump file.

Create a MySQL dump and compress it with gzip:
mysqldump -u USERNAME -p -h MY_MASTER_SERVER --all-databases --master-data --add-drop-table --add-drop-trigger --single-transaction --events --triggers --routines | gzip > dump.sql.gz

Restore a MySQL dump that was compressed:
gunzip < dump.sql.gz | mysql -u USERNAME -p -h MY_SLAVE_SERVER

View contents of a compressed file:
gzip -cd dump.sql.gz | head -n25
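
Since zcat (which ships with gzip) is equivalent to gzip -cd, the same thing can also be written as:

zcat dump.sql.gz | head -n25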

I used the above trick in a bash script to grab the master log file and log position from a compressed dump file:

local loc_active_master_log_file=`gzip -cd dump.sql.gz | head -n25 | grep -oh '\w*mysqld-bin.\w*'`

local loc_active_master_log_pos=`gzip -cd dump.sql.gz | head -n25 | grep -o -P '(?<=MASTER_LOG_POS=)[0-9]+'`

Inside my bash script, after the import has finished, I then use these variables to add a slave server to the cluster:

STOP SLAVE;

RESET SLAVE;

CHANGE MASTER TO MASTER_HOST='MY_MASTER_SERVER',
          MASTER_USER='${replication_user}',
          MASTER_PASSWORD='${replication_password}',
          MASTER_LOG_FILE='${loc_active_master_log_file}',
          MASTER_LOG_POS=${loc_active_master_log_pos};

START SLAVE;
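
Putting it all together, here is a minimal sketch of how the relevant piece of such a bash script could look. The server names, USERNAME, and dump filename are placeholders, and it assumes replication_user and replication_password are defined elsewhere in the script:

#!/bin/bash
# Sketch only: restore a compressed dump onto a new slave and start replication.
setup_slave() {
    local dump_file="dump.sql.gz"

    # Grab the binlog coordinates written near the top of the dump by --master-data.
    local loc_active_master_log_file=$(gzip -cd "$dump_file" | head -n25 | grep -oh '\w*mysqld-bin.\w*')
    local loc_active_master_log_pos=$(gzip -cd "$dump_file" | head -n25 | grep -o -P '(?<=MASTER_LOG_POS=)[0-9]+')

    # Import the dump into the new slave.
    gunzip < "$dump_file" | mysql -u USERNAME -p -h MY_SLAVE_SERVER

    # Point the slave at the master using the extracted coordinates.
    mysql -u USERNAME -p -h MY_SLAVE_SERVER -e "
        STOP SLAVE;
        RESET SLAVE;
        CHANGE MASTER TO MASTER_HOST='MY_MASTER_SERVER',
            MASTER_USER='${replication_user}',
            MASTER_PASSWORD='${replication_password}',
            MASTER_LOG_FILE='${loc_active_master_log_file}',
            MASTER_LOG_POS=${loc_active_master_log_pos};
        START SLAVE;"
}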

Wednesday, March 18, 2015

ERROR 2013 (HY000): Lost connection to MySQL server during query

I was trying to run a MySQL import on a brand new VM running Percona Server 5.6 that had been built with Ansible, and I kept getting this error over and over again:

ERROR 2013 (HY000) at line 1525: Lost connection to MySQL server during query

I was confused, so I did some searching and came upon this thread:

http://stackoverflow.com/questions/10563619/error-code-2013-lost-connection-to-mysql-server-during-query

One of the answers was to check the InnoDB buffer pool settings, which I did. The Ansible script had incorrectly set innodb_buffer_pool_size in the megabyte range instead of the gigabyte range. The person who set up the Ansible automation thought the values were in KB when they were in fact bytes. My system had only about 2 MB for innodb_buffer_pool_size when it should have been about 2 GB. After increasing the size allocated to innodb_buffer_pool_size, the problem went away.
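
For reference, innodb_buffer_pool_size is specified in bytes in the config file, though MySQL option files also accept K/M/G suffixes, which makes this mistake harder to make. A sketch of the corrected setting (2 GB is just the value from this incident):

# /etc/my.cnf (excerpt)
[mysqld]
# The value is in bytes; a bare "2097152" would be only 2 MB.
# Using a suffix avoids the bytes-vs-KB confusion.
innodb_buffer_pool_size = 2G

After a restart (the variable is not dynamic in 5.6), SELECT @@innodb_buffer_pool_size; shows what the server actually picked up.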

Tuesday, March 10, 2015

Optimization of InnoDB Tables for fragmentation

I have a client running MySQL 5.1 with a SaaS application. Every one of their customers gets their own database schema, and each database has about 400 tables. With thousands of customers, the databases get spread out over many servers, which allows my client to scale horizontally.

In addition to all the performance problems in MySQL 5.1, certain customers experience "slowness," and it is very difficult to pinpoint an exact cause. Whenever I transfer one of my client's customer databases from one server to another for load-balancing purposes, they mention how much faster everything becomes for about a day, and then it becomes slow again. The tables in each database typically have only thousands of rows, though some grow to about 10 million. It is fairly easy to move a customer to a new server with minimal downtime on their part.

Instead of moving the database to a different server each day, I have found that optimizing every single one of their tables during the night seems to do the trick. If we let it go a few days without the optimization, the client starts noticing slowness again. Since I'm not yet able to upgrade their systems to a newer version of MySQL, which would likely solve their performance woes, I'm stuck with semi-hacks like this. Why does this work? It could be fixing fragmentation (which reappears after a day or two), warming up the buffer pool, updating statistics (the ANALYZE that runs after the optimize), or a combination of all of these.

In general, InnoDB tables shouldn't need to be optimized very often, but since this seems to make the client feel like performance is better, I continue to do it.

The tricky part has been writing a script that goes and optimizes all the tables for every database on the server. At first I wrote a script that would only optimize the tables that showed fragmentation, using a query like this:

SELECT t.table_schema,
       t.engine,
       t.table_name,
       ROUND(t.data_length / 1024 / 1024) AS data_length_mb,
       ROUND(t.index_length / 1024 / 1024) AS index_length_mb,
       ROUND(t.data_free / 1024 / 1024) AS data_free_mb,
       t.data_free / (t.index_length + t.data_length) AS frag_ratio
FROM information_schema.tables t
WHERE t.data_free > 0
  AND t.table_schema = 'db_name'
ORDER BY frag_ratio DESC;
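
A variation on that query can emit the OPTIMIZE TABLE statements directly. This is a sketch, not the exact script I used, and 'db_name' is a placeholder:

SELECT CONCAT('OPTIMIZE TABLE `', table_schema, '`.`', table_name, '`;') AS stmt
FROM information_schema.tables
WHERE engine = 'InnoDB'
  AND data_free > 0
  AND table_schema = 'db_name'
ORDER BY data_free DESC;

Running that with mysql -N -B -e and piping the output back into the mysql client executes the statements; on InnoDB, OPTIMIZE TABLE maps to a recreate plus analyze.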

That list would be fairly small, so optimizing those tables was easy enough. However, these database servers have hundreds of schemas and each schema has hundreds of tables, and I want almost every table to get rebuilt at least every few days (some have become too big to realistically run "optimize table" on). I've seen some open source scripts out there that will optimize everything on your server every day, but because they only do one table at a time, they were too slow and would take more than 24 hours to finish on some of my servers.

So I wrote a script that builds a list of all the tables (and schemas) on each server to be optimized, and then pipes it into a multi-threaded app that controls how many operations can run simultaneously on each server.
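
A rough sketch of that idea using nothing but standard tools, with GNU xargs -P standing in for the multi-threaded app (the concurrency of 4 is arbitrary, and credentials are assumed to come from ~/.my.cnf):

#!/bin/bash
# Sketch: rebuild every InnoDB table on the server, a few at a time.
mysql -N -B -e "
    SELECT CONCAT(table_schema, '.', table_name)
    FROM information_schema.tables
    WHERE engine = 'InnoDB'
      AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');" \
| xargs -P 4 -I {} mysql -e "OPTIMIZE TABLE {};"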

I've read a number of resources on table fragmentation. Here are a few...

https://www.percona.com/blog/2010/12/09/mysql-optimize-tables-innodb-stop/

http://serverfault.com/questions/202000/how-find-and-fix-fragmented-mysql-tables

https://www.percona.com/blog/2008/05/01/learning-about-mysql-table-fragmentation/

http://www.databasejournal.com/features/mysql/article.php/3927871/MySQL-Data-Fragmentation---What-When-and-How.htm

http://meinit.nl/optimize-only-fragmented-tables-mysql



Wednesday, March 4, 2015

to CHAR or VARCHAR?

I have a client with an application that has been using MySQL for about 10 years. When designing their columns, they used CHAR(255) all over the place. When I looked at their database, I asked them why they used CHAR(255) so frequently. The architect said that with older versions of MySQL (before MySQL 5.1), having fixed-width columns apparently gave you better performance than variable-width columns. My client was also bragging about how much "data" they had, saying they had about 8 TB of such and such data in their database.

After a number of benchmarks, I convinced the client that CHAR columns were no longer going to help with performance and that it would actually help to change all those CHAR columns to VARCHAR. I wrote some migrations and we changed all their large CHAR columns to VARCHAR. That 8 TB of data went down to about 3 TB; their databases were bloated with huge amounts of white space. In addition to saving a huge amount of disk space, we noticed improvements in query performance. It wasn't a huge improvement, but after doing pre- and post-profiling with Percona Toolkit (pt-query-digest), several queries performed better. I'm guessing it was because joins didn't have to deal with as much data and the buffers could now hold more "actual" data. This is hard to prove, but either way the customer was happy to see the improvement, and also a little "sad" that they didn't have as much data as they had thought.
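
For illustration, the migrations boiled down to statements like the following. The table and column names here are made up, not the client's actual schema, and on big tables each ALTER is a full table rebuild that needs to be scheduled carefully:

-- Hypothetical example: convert a padded CHAR column to VARCHAR
ALTER TABLE customers
    MODIFY COLUMN notes VARCHAR(255) NOT NULL DEFAULT '';

-- Compare table sizes before and after the change
SELECT table_name,
       ROUND(data_length / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.tables
WHERE table_schema = 'db_name';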

My recommendations:

- Only use CHAR when the values in the column are all the same length or are very short. When the length is consistent (like a 10-digit phone number), performance is slightly better than VARCHAR because MySQL does not have to store a length byte for each value. The same goes for very short columns like CHAR(3). See the example after this list.

- Whenever the string length varies, use VARCHAR, as this will make the table smaller, require less data to be read by queries, and therefore improve performance. The idea of using static row lengths in MySQL to improve performance is no longer applicable to recent versions of MySQL and InnoDB.
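
A small made-up table definition showing where each type fits:

CREATE TABLE contact_example (
    country_code CHAR(2)      NOT NULL,              -- always exactly 2 characters
    phone        CHAR(10)     NOT NULL,              -- fixed 10-digit number
    full_name    VARCHAR(255) NOT NULL,              -- length varies widely
    notes        VARCHAR(255) NOT NULL DEFAULT ''    -- length varies widely
) ENGINE=InnoDB;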