Wednesday, March 4, 2015

to CHAR or VARCHAR?

I have a client with an application that has been using MySQL for about 10 years. When designing their columns they used CHAR(255) all over the place. When I looked at their database, I asked them why they used CHAR(255) so frequently, the architect said because with older versions of MySQL (before MySQL 5.1), apparently, having fixed width columns would give you better performance than variable width columns. My client was bragging about how much "data" they had. Client was saying, we have about 8 TB of such and such data in our database. After a number of benchmarks, I convinced the client that CHAR columns are no longer going to help you with performance. It would actually help to change all those CHAR columns to VARCHAR. I wrote some migrations and we changed all their large CHAR columns to VARCHAR. That 8 TB of data went down to about 3 TB of data. Their database were bloated with huge amounts of white space. In addition to saving a huge amount of disk space, we noticed improvements in query performance. It wasn't a huge improvement but after doing pre and post profiling with percona toolkit (pt-query-digest), several queries improved performance. I'm guessing it was because the join didn't have to deal with as much data and the buffers could now hold more "actual" data. This is hard to prove but either way customer was happy to see this improvement but also "sad" that they didn't have as much data as they had thought.

My recommendations:

- Only use CHAR when the string length in the CHAR column is the same length in all records or it is very short. When the string length is consistent (like a 10 digit phone number) performance is a slightly better than VARCHAR because MySQL does not have to store the length of the VARCHAR column. Also when the string length is very short like CHAR(3), performance is better. 

- Whenever the string length is variable, use VARCHAR as this will make the table smaller and require less data to be pulled into the query and therefore improve performance. The idea of using static row lengths in MySQL in order to improve performance is no longer applicable for the latest versions of MySQL and InnoDB.

No comments:

Post a Comment