Friday, February 19, 2016

Tracking down where file system space has gone

I have to do this so frequently that I wanted to write up my process for tracking down where all your file system space has gone.

Your alerting system (Nagios, Zabbix, PagerDuty or whatever) informs you that your file system is 80% full. On a MySQL server I usually check space in this order:

  1. Binary logs using up a lot of space? Binary logs are usually set to expire after a couple of days. Extremely high activity (a lot of importing, updating, deleting and re-importing) can produce large binary log files (see the quick checks after this list).
  2. Slow query log or general log turned on to collect EVERYTHING? The slow query log and general log can be set to collect all SELECTs, DDL and DML.
    1. The slow query log can be set to log all queries that take longer than 0 seconds, and it will include query execution times. This is useful for profiling, but on a busy system it can collect GBs of data within minutes.
    2. General log turned on? When the general log is turned on it will log everything except query execution times. On a busy system, in just a few minutes it can log GBs of data.
  3. Custom logging or auditing turned on? Your company may have other ways to collect data about what the databases are doing. If this data is stored in a separate data store on the same server, for auditing, debugging, etc., check here.
    1. This can add several GB per day to the server.
  4. Is customer growth the cause of the disk space usage? This falls under regular capacity planning. If you are not already checking and storing the size of each database on each server daily, you should be. This will help you identify trends and do proper capacity planning (the checks below include a query for this).
  5. Still need more space? Start using the du -h command to find other large files which could be eating up space.
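
Here is a minimal set of quick checks for items 1, 2 and 4, assuming the mysql client is on your path and can log in via your .my.cnf (adjust credentials and paths for your install):

# 1. How big are the binary logs, and when do they expire?
mysql -e "SHOW BINARY LOGS;"
mysql -e "SHOW VARIABLES LIKE 'expire_logs_days';"
# 2. Are the slow query log or general log turned on, and where do they write?
mysql -e "SHOW VARIABLES LIKE 'slow_query_log%';"
mysql -e "SHOW VARIABLES LIKE 'long_query_time';"
mysql -e "SHOW VARIABLES LIKE 'general_log%';"
# 4. Size of each database, biggest first, for daily trending
mysql -e "SELECT table_schema, ROUND(SUM(data_length + index_length)/1024/1024/1024, 2) AS size_gb FROM information_schema.tables GROUP BY table_schema ORDER BY size_gb DESC;"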
Using du -h and df -h at the command line will help you figure out where all the big files are.


1. Run a df -h command to see which file systems are nearly full. Now you know which one is 80% full, but you have no clue which files are contributing to that. Is there some log file that has unexpectedly grown huge and isn't being cleared out, or isn't being cleared out frequently enough? How are you going to find this file?

Here is a good example:
http://unix.stackexchange.com/questions/125429/tracking-down-where-disk-space-has-gone-on-linux

du -h <dir> | grep '[0-9\.]\+G'
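
A variation I like sorts directories by size so the biggest offenders bubble to the top (this assumes GNU du and sort, which support the human-readable -h and -rh flags):

# 15 largest top-level directories, staying on this file system only (-x)
du -xh --max-depth=1 / | sort -rh | head -15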
Sometimes you have filled up an entire file system. When this happens you get lots of problems and cannot create new files or edit existing ones.

Even after identifying runaway log files and deleting them, it is possible your space may not come back. What I've seen sometimes is that whatever script, application or process caused the file system to fill up is still running. There might even be multiple copies of it running, each holding the deleted files open, so the space is not given back until they exit.

Check your running processes for the problem. Suppose it was a script called "log_generator.bash".

You could check for all the processes running the script like this:
ps aux | grep log_generator.bash | grep -v grep

If you see several of them, you can kill them all with a command like this (try a plain kill first; reach for -9 only if they won't die):

ps aux | grep log_generator.bash | grep -v grep | awk '{print $2}' | xargs kill -9
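
If the space still doesn't come back, you can confirm which processes are holding deleted files open with lsof; the +L1 flag lists open files with a link count of zero, i.e. files that have been deleted but are still held open (their space isn't freed until the owning process closes them or exits):

# Deleted-but-still-open files, with the PID of the process holding each one
lsof +L1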

Thursday, February 11, 2016

Ordering by specific field

Today I had a developer ask me how he could get his values to return in a specific order with a LIMIT clause. The developer had a query like:

SELECT name, type
FROM foo
WHERE type_id IN (10,2,5,4)
LIMIT 5

I told him he needed to add an ORDER BY clause before the LIMIT to get the rows ordered the way he wanted. Then the developer said he wanted to give certain type_id values priority so they would be at the top of the list. I hadn't had a need to do this before, so after some research I discovered the FIELD() function. Adding it to an ORDER BY like this accomplishes exactly that:


SELECT name, type
FROM foo
WHERE type_id IN (10,2,5,4)
ORDER BY FIELD(type_id, 10, 2, 5, 4) ASC
LIMIT 5

Note that FIELD() returns 0 for values not in its list, so with ASC any unlisted type_id would sort to the top; here the WHERE clause guarantees every row matches one of the listed values. The developer was happy he could do what he wanted in MySQL and not have to script it out.

Here are some good examples:

http://stackoverflow.com/questions/958627/mysql-order-by-values-within-in


http://www.electrictoolbox.com/mysql-order-specific-field-values/

Friday, February 5, 2016

SQL Style from Joe Celko

As a DBA at a small company, one of the things I've had to do is establish naming standards for all things SQL and enforce them. The standard I chose was snake_case for all object names (lower case with an underscore between words). The standards get more detailed about how to name foreign key constraints, indexes, tables, triggers and so on. Besides MySQL, I'm also a SQL Server DBA, and I've found that some developers do not like my standards. My company has a dozen or so applications in active development, and some of these are SQL Server projects. The manager of one of the SQL Server projects decided not to use my naming standards and instead to use his own. So now I am in the business of enforcing PascalCase for one project and the "company standards" for everything else. PascalCase is similar to camelCase, except the first letter is also capitalized.

I'd like to reference a little book called "Joe Celko's SQL Programming Style". I've seen copies in the public domain, but it can also be purchased online.

I agree with "Avoid the Use of CamelCase" from section 2.1.5. The book has lots of other great advice for naming things. It should probably be required reading for all DBAs.

Thursday, February 4, 2016

mydumper and myloader

I've been doing a lot of testing with datasets and was getting very frustrated by how slow dumping with mysqldump and then restoring the data is. I started searching for a better way and discovered mydumper. I tested it and it is much, much faster! I created a couple of very simple bash scripts (below) and have been using them with much success for my testing. Restoring a backup on MySQL 5.6 was much slower than restoring the same backup on MySQL 5.1; thankfully with mydumper/myloader I can achieve much better speeds.


1. Download mydumper from https://launchpad.net/mydumper
2. Compile, see here: http://www.serveradminblog.com/2014/11/mydumper-centos-howto/


Example from percona to dump multiple databases or tables:

https://www.percona.com/blog/2014/09/26/logical-mysql-backup-tool-mydumper-0-6-2-now-available/

There isn't a whole lot of documentation, as this isn't supported by a large company like Oracle. For example, this page has some decent documentation: http://centminmod.com/mydumper.html. However, it isn't entirely correct. I wanted to dump only a specific list of tables, and that site's documentation gives the option as "--table-list" but it is actually "--tables-list".
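
For example, a dump of just two tables might look like this (hypothetical credentials and table names; the list is comma-separated database.table values):

mydumper --user=backup_user --password='...' --host=db1.wherever.com --outputdir=./backup/tables --tables-list=mydb1.customers,mydb1.orders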

myloader also won't write to the binary log unless you specify it (--enable-binlog, -e), which can bite you if you are restoring to a master-master or master-slave server and you "assume" the data is being replicated.
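
One quick sanity check after a restore is to compare a row count between the master and a slave (hypothetical host and table names; adjust to your topology):

mysql -h db2.wherever.com -e "SELECT COUNT(*) FROM mydb1.mytable;"
mysql -h db2-replica.wherever.com -e "SELECT COUNT(*) FROM mydb1.mytable;"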

Another annoying thing I've seen with mydumper is that it checks the table status of every table on the server. I'm not sure why this is needed, and it really slows down a backup if the server has a lot of tables (hundreds of thousands).

Simple bash script to dump and restore a single database using mydumper:


#!/bin/bash
# Source credentials
source_user="root"
source_password='$uperS3krit'  # single quotes so the $ is not expanded as a variable
source_host="db1.wherever.com"
# Target credentials
target_user="root"
target_password='$uperS3krit'  # single quotes so the $ is not expanded as a variable
target_host="db2.wherever.com"
# database to dump/restore
db_name='mydb1'
restore_db_name='mydb1'
# Env settings
DUMPER_HOME="/opt/mydumper-0.9.1/"
# mkdir backup dir
mkdir -p ./backup/${db_name}
# Other options
backup_path="./backup/${db_name}"
# Set default file permissions (dump files are created as 0600)
umask 177
echo "${source_host}"
echo "START BACKUP TIME" `date`
# Dump database
time ${DUMPER_HOME}/mydumper --no-locks --user=${source_user} --password=${source_password} --host=${source_host} --outputdir=${backup_path}  --compress --database=${db_name} --threads=12
echo "END BACKUP TIME" `date`
echo ""
# Load database
echo "${target_host}"
echo "START IMPORT TIME" `date`
time ${DUMPER_HOME}/myloader --user=${target_user} --password=${target_password} --host=${target_host} --enable-binlog --directory=${backup_path} --overwrite-tables --database=${restore_db_name} --threads=12
echo "END IMPORT TIME" `date`


Simple bash script to dump and restore several databases using mydumper. This script dumps the entire server, using a negative-lookahead regex to exclude the databases you don't want (here the system schemas).


#!/bin/bash
# Source credentials
source_user="root"
source_password='$uperS3krit'  # single quotes so the $ is not expanded as a variable
source_host="db1.wherever.com"
# Target credentials
target_user="root"
target_password='$uperS3krit'  # single quotes so the $ is not expanded as a variable
target_host="db2.wherever.com"
# folder to dump/restore
# You could change this value from source_host to whatever you want
backup_dir_name=${source_host}
# Env settings
DUMPER_HOME="/opt/mydumper-0.9.1/"
# mkdir backup dir
mkdir -p ./backup/${backup_dir_name}
# Other options
backup_path="./backup/${backup_dir_name}"
# Set default file permissions
umask 177
echo "${source_host}"
echo "START BACKUP TIME" `date`
# Dump databases
time ${DUMPER_HOME}/mydumper --user=${source_user} --password=${source_password} --host=${source_host} --outputdir=${backup_path}  --compress  --regex '^(?!(mysql|test|performance_schema|information_schema))' --threads=4
echo "END BACKUP TIME" `date`
echo ""
# Load databases
echo "${target_host}"
echo "START IMPORT TIME" `date`
time ${DUMPER_HOME}/myloader --user=${target_user} --password=${target_password} --host=${target_host} --enable-binlog --directory=${backup_path} --overwrite-tables --threads=4
echo "END IMPORT TIME" `date`