Tuesday, March 10, 2015

Optimizing InnoDB Tables for Fragmentation

I have a client running MySQL 5.1 with a SaaS application. Every one of their customers gets their own database schema, and each database has about 400 tables. With thousands of customers, the databases get spread out over many servers. This allows my client to scale horizontally.

In addition to all the performance problems in MySQL 5.1, certain customers experience "slowness" and it is very difficult to pinpoint an exact cause. Whenever I transfer one of my client's customer databases from one server to another for load-balancing purposes, they mention how much faster everything becomes for about a day, and then it becomes slow again. The tables in each database typically have only thousands of rows, though some get up to about 10 million. It is fairly easy to move a customer to a new server with minimal downtime on their part.

Instead of moving the database to a different server each day, I have found that optimizing every single one of their tables during the night seems to do the trick. If we let it go a few days without the optimization, the client starts noticing slowness again. Since I'm not yet able to upgrade their systems to a newer version of MySQL, which would likely solve their performance woes, I'm stuck with semi-hacks like this. Why does this work? It could be fixing fragmentation (which reappears after a day or two), warming up the buffer pool, updating statistics via the ANALYZE that runs after the optimize, or a combination of all of these.
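For what it's worth, the nightly pass boils down to running OPTIMIZE TABLE against each table (the schema and table names below are just placeholders). On InnoDB, OPTIMIZE TABLE doesn't defragment in place; MySQL maps it to a recreate plus analyze, which is why it plausibly touches all three of the causes above:

OPTIMIZE TABLE customer_db.orders;
-- InnoDB reports "Table does not support optimize, doing recreate + analyze
-- instead" and effectively runs the equivalent of:
ALTER TABLE customer_db.orders ENGINE=InnoDB;  -- rebuild the table, removing fragmentation
ANALYZE TABLE customer_db.orders;              -- refresh the index statistics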

In general, InnoDB tables shouldn't need to be optimized very often, but since this seems to make the client feel like performance is better, I continue to do it.

The tricky part has been writing a script that goes and optimizes all the tables for every database on the server. I first wrote a script that would only optimize the tables that showed fragmentation, using a query like this:

SELECT t.table_schema,
       t.engine,
       t.table_name,
       ROUND(t.data_length / 1024 / 1024)  AS data_length_mb,
       ROUND(t.index_length / 1024 / 1024) AS index_length_mb,
       ROUND(t.data_free / 1024 / 1024)    AS data_free_mb,
       t.data_free / (t.data_length + t.index_length) AS frag_ratio
FROM information_schema.tables t
WHERE t.data_free > 0
  AND t.table_schema = 'db_name'
ORDER BY frag_ratio DESC;
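One caveat with this query: for InnoDB, DATA_FREE reports the free space of the tablespace the table lives in, so the ratio is only meaningful per table when innodb_file_per_table is enabled. With the shared system tablespace, every table reports the same free-space number.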

That list would be kinda small, so optimizing it was easy enough. However, these database servers have hundreds of schemas, and each schema has hundreds of tables. I want almost every table to get rebuilt at least every few days (some have become too big to realistically run OPTIMIZE TABLE on). I've seen some open-source scripts out there that will optimize everything on your server every day, but because they only do one table at a time, they were too slow and would take more than 24 hours to finish on some of my servers.

I had to write a script to get a list of all the tables (and schemas) on each server to be optimized, and then pipe that into a multi-threading app which controls how many operations run simultaneously on each server. A sketch of the list-generation step is below.
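The list generation can be done in SQL itself by having the query emit the OPTIMIZE TABLE statements. This is only a sketch: the 5 GB size cap is an illustrative number, not the exact threshold my script uses.

-- Emit one OPTIMIZE TABLE statement per user table, skipping tables whose
-- data plus indexes exceed ~5 GB (too big to realistically rebuild nightly).
SELECT CONCAT('OPTIMIZE TABLE `', table_schema, '`.`', table_name, '`;')
FROM information_schema.tables
WHERE engine = 'InnoDB'
  AND table_schema NOT IN ('mysql', 'information_schema')
  AND (data_length + index_length) < 5 * 1024 * 1024 * 1024;

Saved with mysql -N -B, that output becomes the input to the parallel runner; the important part is capping how many rebuilds run at once on each server.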

I've read a number of resources on table fragmentation. Here are a few...

https://www.percona.com/blog/2010/12/09/mysql-optimize-tables-innodb-stop/

http://serverfault.com/questions/202000/how-find-and-fix-fragmented-mysql-tables

https://www.percona.com/blog/2008/05/01/learning-about-mysql-table-fragmentation/

http://www.databasejournal.com/features/mysql/article.php/3927871/MySQL-Data-Fragmentation---What-When-and-How.htm

http://meinit.nl/optimize-only-fragmented-tables-mysql


