Monday, December 7, 2015

pt-online-schema-change

Percona has a tool called pt-online-schema-change, part of the Percona Toolkit; I've heard it referred to as OSC. It is super useful for adding indexes, changing the storage engine of a table, adding a column, etc., when you want to perform the operation "online" without locking up the table and essentially making it unavailable for the duration. On large tables this can take a long time; I've run it on tables with 500 million rows and it has taken 6 hours to complete.

The tool works by creating a new, empty copy of the table, applying the changes to it, and then moving the data from the old table to the new table in chunks. Once all the data is moved over, the tool syncs the two tables one last time, so that any rows written to the old table during the copy are carried over as well, then renames the new table to the old table's name and drops the old table. One added bonus of all this work is that the table also gets optimized in the process. A large table that sees a lot of DELETEs and UPDATEs is likely fragmented; I've seen a 100 GB table shrink down to 50 GB after adding an index because it was so highly fragmented. However, within a few weeks the fragmentation was back.
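Conceptually, the tool issues statements along these lines. This is a simplified sketch (the table and column names follow the tool's `_new`/`_old` convention, but the exact SQL the tool generates differs):

```
-- Simplified sketch of the shadow-table approach, not the tool's exact SQL.
-- Column names (id, column_1, column_2) are placeholders.
CREATE TABLE my_database._my_table_new LIKE my_database.my_table;
ALTER TABLE my_database._my_table_new
  ADD INDEX idx_column_1_column_2 (column_1, column_2);

-- Copy rows in chunks so the original table is never locked for long;
-- this statement is repeated, walking the primary key range.
INSERT LOW_PRIORITY IGNORE INTO my_database._my_table_new
SELECT * FROM my_database.my_table
WHERE id BETWEEN 1 AND 10000;

-- Atomically swap the tables, then drop the original.
RENAME TABLE my_database.my_table     TO my_database._my_table_old,
             my_database._my_table_new TO my_database.my_table;
DROP TABLE my_database._my_table_old;
```

The RENAME TABLE swap is atomic in MySQL, which is why clients never see a missing table during the changeover.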

The syncing of data works by means of triggers, so the tool cannot be used on tables that already have triggers. There are a number of options you can use when running the tool, such as a dry run (--dry-run), and telling the tool to pause if the server gets too busy. You can also set the chunk size as small or large as you see fit. I typically use 10000 as the chunk size; I've tried 1000 and it is much slower on large tables. You could get away with 50000, but I'd be careful with anything too large.
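The triggers it creates on the old table look roughly like this (simplified stand-ins; the real triggers are generated per-column from the table definition, and id, column_1, column_2 are placeholders):

```
-- Writes that land on the old table during the copy are replayed on the
-- new table, so the two stay in sync until the swap.
CREATE TRIGGER pt_osc_my_table_ins AFTER INSERT ON my_table
FOR EACH ROW REPLACE INTO _my_table_new (id, column_1, column_2)
             VALUES (NEW.id, NEW.column_1, NEW.column_2);

CREATE TRIGGER pt_osc_my_table_upd AFTER UPDATE ON my_table
FOR EACH ROW REPLACE INTO _my_table_new (id, column_1, column_2)
             VALUES (NEW.id, NEW.column_1, NEW.column_2);

CREATE TRIGGER pt_osc_my_table_del AFTER DELETE ON my_table
FOR EACH ROW DELETE IGNORE FROM _my_table_new WHERE id = OLD.id;
```

This is also why the tool refuses to run on a table that already has triggers: MySQL (at the time) allowed only one trigger per event per table.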

I set a threshold of 400 Threads_running in my examples, which is really high. On your server, a Threads_running of 50 might be high enough that you would want the tool to pause and wait until the server is less busy before it continues copying data. The tool won't actually perform the operation unless you specify --execute (see below).

The tool also has a number of safety features: it checks replication lag on all the slaves and pauses until the slaves catch up. Example output from running pt-online-schema-change is below.
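The lag threshold is tunable with --max-lag (how many seconds of slave lag to tolerate) and --check-interval (how often to re-check while paused). For example, a command like this (host, database, and table names are placeholders, same as in the examples below):

```
./pt-online-schema-change --host=prod.awesome.com D=my_prod_db,t=my_table \
  --max-lag=5 --check-interval=10 \
  --execute --alter 'ADD INDEX idx_column_1_column_2 (column_1, column_2)'
```

would pause the copy whenever any slave falls more than 5 seconds behind and re-check every 10 seconds until it catches up.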

Here are two examples I've done frequently:


Example adding an index to a table:

# non prod
## dry run
./pt-online-schema-change --host=non_prod.awesome.com D=my_non_prod_db,t=my_table --ask-pass -u'!me' --critical-load Threads_running=400 --max-load Threads_running=375 --chunk-size=10000 --dry-run --alter 'ADD INDEX idx_column_1_column_2 (column_1, column_2)' > np_add_index_dry_results.log

## actual
./pt-online-schema-change --host=non_prod.awesome.com D=my_non_prod_db,t=my_table --ask-pass -u'!me' --critical-load Threads_running=400 --max-load Threads_running=375 --chunk-size=10000 --execute --alter 'ADD INDEX idx_column_1_column_2 (column_1, column_2)' > np_add_index_actual_results.log


# prod
## dry run
./pt-online-schema-change --host=prod.awesome.com D=my_prod_db,t=my_table --ask-pass -u'!me' --critical-load Threads_running=400 --max-load Threads_running=375 --chunk-size=10000 --dry-run --alter 'ADD INDEX idx_column_1_column_2 (column_1, column_2)' > add_index_dry_results.log

## actual
./pt-online-schema-change --host=prod.awesome.com D=my_prod_db,t=my_table --ask-pass -u'!me' --critical-load Threads_running=400 --max-load Threads_running=375 --chunk-size=10000 --execute --alter 'ADD INDEX idx_column_1_column_2 (column_1, column_2)' > add_index_actual_results.log


Example changing storage engine from MyISAM to InnoDB:

# non prod
## dry run
./pt-online-schema-change --host=non_prod.awesome.com D=my_database,t=my_table -u'!me' --ask-pass --critical-load Threads_running=400 --max-load Threads_running=375 --chunk-size=10000 --dry-run --alter "ENGINE = InnoDB" > non_prod_dry_my_table_change_table_engine.log
## actual
./pt-online-schema-change --host=non_prod.awesome.com D=my_database,t=my_table -u'!me' --ask-pass --critical-load Threads_running=400 --max-load Threads_running=375 --chunk-size=10000 --execute --alter "ENGINE = InnoDB" > non_prod_actual_my_table_change_table_engine.log

# prod
## dry run
./pt-online-schema-change --host=prod.awesome.com D=my_database,t=my_table -u'!me' --ask-pass --critical-load Threads_running=400 --max-load Threads_running=375 --chunk-size=10000 --dry-run --alter "ENGINE = InnoDB" > prod_dry_my_table_change_table_engine.log
## actual
./pt-online-schema-change --host=prod.awesome.com D=my_database,t=my_table -u'!me' --ask-pass --critical-load Threads_running=400 --max-load Threads_running=375 --chunk-size=10000 --execute --alter "ENGINE = InnoDB" > prod_actual_my_table_change_table_engine.log

Example output of the slave check:

Found 5 slaves:
  db_server_2
  db_server_3
  db_server_4
  db_server_5
  db_server_6
Will check slave lag on:
  db_server_2
  db_server_3
  db_server_4
  db_server_5
  db_server_6

Example output when changing table from MyISAM to InnoDB:

Operation, tries, wait:
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `my_database`.`my_table`...
Creating new table...
Created new table my_database._my_table_new OK.
Altering new table...
Altered `my_database`.`_my_table_new` OK.
2015-12-07T17:53:43 Creating triggers...
2015-12-07T17:53:43 Created triggers OK.
2015-12-07T17:53:43 Copying approximately 188366 rows...
2015-12-07T17:54:07 Copied rows OK.
2015-12-07T17:54:07 Swapping tables...
2015-12-07T17:54:07 Swapped original and new tables OK.
2015-12-07T17:54:07 Dropping old table...
2015-12-07T17:54:08 Dropped old table `my_database`.`_my_table_old` OK.
2015-12-07T17:54:08 Dropping triggers...
2015-12-07T17:54:08 Dropped triggers OK.
Successfully altered `my_database`.`my_table`.
