I've taken a backup of all the databases of a database server and restored that backup onto two different hosts, one running MySQL 5.1 and one running MySQL 5.6. I want to find failing queries that work on MySQL 5.1 but don't work on MySQL 5.6.
For initial testing, I just want to find SELECT statements that return different results or fail.
I've read over the documentation and this is how I'm executing the tool:
pt-upgrade h=SERVER1 -uUSERNAME -pPASSWORD h=SERVER2 -uUSERNAME -pPASSWORD --type=genlog --max-class-size=2 --max-examples=2 --run-time=10m 'general_log_mysqld.log' 1>report.txt 2>err.txt &
I've chosen to use the general log in this example instead of the slow query log, but I've tested with both. The slow query log is bloated with information I don't need, while the general log gives more bang for your buck in terms of size. The big problem I have with this tool is that my server has hundreds of databases. The general log contains lots of "use <database>" statements before the queries, yet when I run the tool it complains that it can't run a query because the tables are missing:
On both hosts:
DBD::mysql::st execute failed: No database selected [for Statement "....
I can't just select one database when I run the command; the tool needs to read the USE <database> statements from the log file.
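To gauge how much database switching the tool would need to honor, a rough check like this counts the distinct schemas the log switches into. This is only a sketch: the filename is the one from my command above, and the exact general-log layout varies by MySQL version, so the patterns may need adjusting.

```shell
# Heuristic: list the distinct databases a general log switches into.
# Matches both "Init DB" entries and literal "use <db>" statements.
distinct_databases() {
  grep -iE '(Init DB|[[:space:]]use[[:space:]])' "$1" \
    | awk '{ print tolower($NF) }' \
    | tr -d ';' \
    | sort -u
}

# Usage: distinct_databases general_log_mysqld.log | wc -l
```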
I tried again on a database server with only a few databases. Sometimes it worked and sometimes it failed to switch to the correct database. As a workaround, I created the same tables inside each database; this only worked because all the tables had unique names, so it didn't matter which database the tool ran against. At that point I at least got a report with useful results.
It isn't always feasible to put the same tables in every database... what am I doing wrong with this tool?
Issues I've had with Percona Upgrade:
- Queries pulled from production can be too intensive for a test machine with limited resources. The queries I was testing with came from systems with 30+ cores and 64 GB of RAM; they would time out after a few minutes and the tool would stop working. Queries taken from prod should be replayed on prod-like database servers. Also, queries like "SELECT SQL_NO_CACHE * FROM table_name", which came from a backup job, seemed to break the tool
- Some queries broke the tool, including not only the backup queries but also odd custom SQL written by developers internally
- The same query would appear so frequently in the logs (hundreds of thousands of times) that the reports became useless
- Many SELECT statements failed because of missing tables. These were not true temporary tables but tables created for a short time and then dropped; they were not part of my dump and restore because I didn't know the code was creating these "transitory" tables
- The tool would not switch database context for me. I don't know why, but when the log issued a "use database_name" the tool would ignore it and try to run a query that was meant for a different database, so it reported a huge number of failed queries. My workaround was to create the same tables in every schema so I would stop getting those failures
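The table-cloning workaround from the last bullet can be scripted. This is only a sketch under the assumption that one schema (called reference_db here) holds all the table definitions; the host, user, and password are placeholders.

```shell
clone_schema_everywhere() {
  # All connection details and the reference schema name are placeholders.
  host=$1; ref=$2
  # Dump table definitions only (no data) from the reference schema...
  mysqldump -h "$host" -uUSER -pPASSWORD --no-data "$ref" > schema.sql
  # ...then load them into every non-system schema, so queries resolve
  # regardless of which database pt-upgrade happens to run them in.
  for db in $(mysql -h "$host" -uUSER -pPASSWORD -N -e 'SHOW DATABASES' \
              | grep -vE '^(mysql|information_schema|performance_schema)$'); do
    mysql -h "$host" -uUSER -pPASSWORD "$db" < schema.sql
  done
}

# Usage: clone_schema_everywhere test_host reference_db
```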
After engaging Percona consulting, they told me to use only the slow log files. For my system, they said I would need to massage the files with pt-query-digest before running pt-upgrade, like this:
Step 1: Massage a slow query log for SELECT statements using pt-query-digest
pt-query-digest --filter '$event->{arg} =~ m/^select/i' --sample 5 --no-report --output slowlog my_slow.log > my_slow_massaged_for_select.log
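Before replaying, it is worth a sanity check that the massaged file really contains only SELECTs. A small sketch (the filename is the one assumed above; pt-query-digest's slowlog output wraps each sample in comment, "use <db>;", and "SET timestamp=...;" lines, which the filter skips):

```shell
# Print any statement line that is not a SELECT and not one of the
# "use <db>;" / "SET timestamp=...;" lines pt-query-digest emits around
# each sample. No output means the file is clean. Multi-line statements
# would show continuation lines, so treat this as a heuristic.
non_select_statements() {
  grep -vE '^(#|$)' "$1" | grep -ivE '^(select|use |set )'
}

# Usage: non_select_statements my_slow_massaged_for_select.log
```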
Step 2: Next run pt-upgrade
pt-upgrade h=my_source_host.com -uUSER -pPASSWORD h=my_target_host.com -uUSER -pPASSWORD --type=slowlog --max-class-size=1 --max-examples=1 --run-time=1m 'my_slow_massaged_for_select.log' 1> report_1.txt 2> error_1.txt &
The above massaging worked well for SELECT statements, but when testing DDL/DML I ran into more problems: I would still see a lot of errors for tables that don't exist because they are temporary tables that come and go during a session.
Step 1: Massage a slow query log for SELECT/DDL/DML statements using pt-query-digest
pt-query-digest --filter '$event->{arg} =~ m/^(select|alter|create|drop|insert|replace|update|delete)/i' --sample 5 --no-report --output slowlog alpha_slow.log > my_slow_massaged_for_dml_ddl.log
Step 2: Clean up the log file to remove LOCKS
Because my slow log file contained a number of LOCK statements, I used sed to remove every line that referenced a LOCK.
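The cleanup in Step 2 can be done with a one-line sed filter. A sketch only: the pattern is case-sensitive and the filenames are assumptions, so adjust both to match your log.

```shell
# Delete every line mentioning LOCK; this also drops UNLOCK TABLES lines,
# since they contain the same substring. Adjust the pattern if your log
# uses a different case (e.g. "Lock").
strip_lock_lines() {
  sed '/LOCK/d' "$1"
}

# Usage: strip_lock_lines my_slow_massaged_for_dml_ddl.log > my_slow_no_locks.log
```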
Step 3: Next run pt-upgrade
pt-upgrade h=my_source_host.com -uUSER -pPASSWORD h=my_target_host.com -uUSER -pPASSWORD --type=slowlog --max-class-size=1 --max-examples=1 --run-time=1m --no-read-only 'my_slow_massaged_for_dml_ddl.log' 1> report_2.txt 2> error_2.txt &
Even when running all the DDL, I would still see a lot of errors for tables that don't exist because they are temporary tables that come and go during a session. Still, at this point I was able to gain more confidence that the upgrade was going to work.