Friday, September 29, 2017

Error installing sys schema because performance schema structure is wrong

When attempting to install the sys schema which I downloaded from here, I sometimes I get this error:

mysql -u root -p -h MyServerName.com < ./sys_56.sql

ERROR 1033 (HY000) at line 47 in file: './views/p_s/processlist.sql': Incorrect information in file: './performance_schema/threads.frm'

This is related to a bug after upgrading MySQL. The structure of the performance schema is wrong because it wasn't fixed at the time of upgrade.

The way I typically fix this is by dropping performance schema and re-installing it.

First ssh into the system.

Logon to MySQL:
DROP DATABASE performance_schema;
Exit MySQL and run:
mysql_upgrade -u root -p
mysql_upgrade will re-create the performance schema with the correct structure. 
Now install sys schema. 

You will need to restart MySQL service to actually get the performance schema and sys schema to start working. 

Thursday, September 21, 2017

Migrating users to RDS or Aurora

I've mentioned this before what a pain it can be migrating databases that have DEFINERS in the triggers/stored procs/events to AWS. I'm doing migrations to Aurora now and the user permissions are also very annoying.

There are some privileges which are not allowed at all on Aurora and RDS:
Super,  Shutdown, File

The above three privileges will give this error:
Access denied for user '<user name>'@'%' (using password: YES)

If you are scripting out user migrations from MySQL to RDS and have users with "ALL PRIVILEGES ON *.*" then you need to remove that from the grant statements and replace it with allowed permissions.

These are allowed when granting global privileges on *.* with RDS:

Select, Insert, Update, Delete, Create, Drop, Reload, Process, References,  Index, Alter, SHOW DATABASES, CREATE TEMPORARY TABLES, LOCK TABLES, Execute, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, Event, Trigger, WITH GRANT OPTION;

For example:

GRANT Select, Insert, Update, Delete, Create, Drop, Reload, Process, References,  Index, Alter, SHOW DATABASES, CREATE TEMPORARY TABLES, LOCK TABLES, Execute, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, Event, Trigger  ON *.* TO 'test_user'@'%' IDENTIFIED BY '12345678' WITH GRANT OPTION;

Note: Using "WITH GRANT OPTION" should only be give to administrators.

These are allowed when granting to <database>.* on RDS:

Select, Insert, Update, Delete, Create, Drop, References,  Index, Alter, CREATE TEMPORARY TABLES, LOCK TABLES, Execute,  CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, Event, Trigger, WITH GRANT OPTION

For example:

GRANT Select, Insert, Update, Delete, Create, Drop, References,  Index, Alter, CREATE TEMPORARY TABLES, LOCK TABLES, Execute,  CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, Event, Trigger  ON test.* TO 'test_user'@'%' IDENTIFIED BY '12345678' WITH GRANT OPTION;

Note: Using "WITH GRANT OPTION" should only be give to administrators.

All of the following  GLOBAL PRIVILEGES will fail regardless of using RDS or normal MySQL if you try to grant them to a specific database (they can only be granted to *.*):

Process, SHOW DATABASES, CREATE USER, REPLICATION SLAVE, REPLICATION CLIENT, Reload

For example:

GRANT Process, SHOW DATABASES, CREATE USER, REPLICATION SLAVE, REPLICATION CLIENT, Reload ON test.* TO 'test_user'@'%' IDENTIFIED BY '12345678';

The error you will get is:
Incorrect usage of DB GRANT and GLOBAL PRIVILEGES

There are also more permissions which you can grant to a user for AWS that I have not referenced here such as:

GRANT SELECT INTO S3 ON *.* TO 'test_user'@'%';

I am not covering those AWS specific permissions because those won't appear in your non RDS MySQL Servers because they are specific to RDS. But you would get an error if you try running that on a normal MySQL server.

Thursday, September 14, 2017

Querying processlist and grouping by IPAddress

Wanted to group all the sleeping connections on a MySQL instance by IP Address. Here is the simple query:

SELECT user, LEFT( host, ( LOCATE( ':', host ) - 1 ) ) as host, count(*) as count
FROM `information_schema`.processlist where Command = 'Sleep'
GROUP BY user, LEFT( host, ( LOCATE( ':', host ) - 1 ) )
ORDER BY count DESC

Explanation of the functions used:
MySQL LOCATE() returns the position of the first occurrence of a string within a string.

MySQL LEFT() returns a specified number of characters from the left of the string. Both the number and the string are supplied as arguments of the function.


Found this query from Ben Nadel:

https://www.bennadel.com/blog/3054-grouping-the-mysql-processlist-by-ip-address-to-view-connection-counts.htm


SELECT
 tmp.ipAddress,

 -- Calculate how many connections are being held by this IP address.
 COUNT( * ) AS ipAddressCount,

 -- For each connection, the TIME column represent how many SECONDS it has been in
 -- its current state. Running some aggregates will give us a fuzzy picture of what
 -- the connections from this IP address is doing.
 FLOOR( AVG( tmp.time ) ) AS timeAVG,
 MAX( tmp.time ) AS timeMAX
FROM
 -- Let's create an intermediary table that includes an additional column representing
 -- the client IP address without the port.
 (

  SELECT
   -- We don't actually need all of these columns for the demo. But, I'm
   -- including them here to demonstrate what fields COULD be used in the
   -- processlist system.
   pl.id,
   pl.user,
   pl.host,
   pl.db,
   pl.command,
   pl.time,
   pl.state,
   pl.info,

   -- The host column is in the format of "IP:PORT". We want to strip off
   -- the port number so that we can group the results by the IP alone.
   LEFT( pl.host, ( LOCATE( ':', pl.host ) - 1 ) ) AS ipAddress
  FROM
   INFORMATION_SCHEMA.PROCESSLIST pl

 ) AS tmp
GROUP BY
 tmp.ipAddress
ORDER BY
 ipAddressCount DESC

Tuesday, September 5, 2017

MySQL DBA interview questions

I've done a fair share of DBA interviews, both when I was the one being interviewed and interviewing others for my team. Some managers have only asked me to describe projects I worked on while others asked me a variety of database and linux questions. Many times the questions were so far "out there" and had nothing to do with being a DBA or anything I have ever experiences as a DBA and couldn't answer them. For those interviews, apparently I didn't have the "right stuff" and didn't get the job. I think those questions are stupid. I was once asked computer science questions related to unbalanced b trees. As a production DBA, I have never worried about unbalanced b trees or anything like that. I'm not a software developer but I'd never be any good at my job if I couldn't code and work with developers. I've learned my coding on the job and from doing projects. I'm never going to be any good at computer science theory questions. 


At one company, they trained us to ask Situation, Task, Action, Results questions during interviews. aka STAR. Not everyone likes these behavioral questions but I think these types of questions are better to ask during an interview than technical questions.

Here are some examples, you could use to see how a candidate does troubleshooting and has worked through the problems in the past.
  • Tell me about a time you used scripting (bash, python, power shell, PHP, perl, etc) to automate a task (could be a DBA task or something else)? STAR method (Situation, Task, Action, Results)
  • Tell me about your experience researching or escalating database (or other technical problem) problems you were unable to solve on your own?  STAR method (Situation, Task, Action, Results)
  • Tell me about your experience managing changes in a production environment (database related or not)? How do you seek approvals to impalement changes? How do you do peer reviews? Do you use a change management tracking system? How do you minimize downtime?
  • Tell me about your experience being on-call (as a DBA or other technical role)?
I'd expect a MySQL DBA to have an understanding and experience using Linux and Windows. They would need at a minimum basic Linux experience. Most people run MySQL on Linux. Here are some basic questions to check that they have used Linux.

Basic Linux questions


  • What is a Linux command you use frequently? (to see if candidate uses Linux enough to have a command available in their mind)
  • What distros have you used? (Red Hat, CentOS, Ubuntu, Gentoo, etc)
  • How do you list files in a directory? (ls)
  • How do you see space available on a file system? (df)
  • How do you see space use on current directory? (du)
  • What is one way to find a file on a linux file system (use the find command or locate command)
  • Do you have a particular command line editor you like to use on Linux? (vi, vim, nano, etc)
  • What command would you use to search through a file or a output from another command? (grep)
  • What is the crontab used for? (scheduling tasks, scripts)

The following questions are going to be a hit and miss. Some people will know some of them but I think many candidates will not have an answer for a many of them. I've been asked many of these questions during interviews and didn't have an answer at the time. In no way does not being able to answer some of the questions make a candidate not a good person to hire. All of the information below can be learned on the job, you only need a smart person who can troubleshoot and learn. If a candidate has 5+ years working with MySQL then I would expect them to be able to answer many of the following questions. Many candidates are going to have more experience in certain areas (prod support/application support/database development/scripting) and will probably be lacking in other areas. Lacking areas can be learned on the job so don't expect someone to be awesome at all areas.  

General databases questions

  • What is a database? How would you describe the difference between using a spreadsheet and using a database?
  • What is an index? How would you describe it?
  • Who is the most important user of a database? (Does the candidate think of only certain users are more important? All connections are important, root user isn't more important than the application user)
  • Which is faster, inserting one million rows of data or update one million rows of data? (Looking to see how candidate tries to answer question, what justification is used for the answer, understands set based updates? Would they use a cursor to insert data? What could make an insert faster, or what could make an update faster, etc.)
  • https://www.quora.com/Which-is-faster-Inserting-200-000-rows-of-data-or-updating-200-000-rows-of-data-SQL
  • Suppose you are creating tables and you have a situation with a many to many relationship. How do you design tables to handle this relationship (Create a mapping table, or a intermediary table)
  • Table joints. Inner join, left outer join, right outer join, cross join, self join and cartesian product. Describe table join concepts and each type of join. 
  • What affect on a relational DBMS performance do joins have?  
  • What is the difference between an EXPLICIT join and an IMPLICIT join?
  • https://stackoverflow.com/questions/44917/explicit-vs-implicit-sql-joins
  • What does ACID stand for?
  • Atomicity - guarantee that either all of the tasks of a transaction are performed or none of them are
  • Consistency - database remains in a consistent state before the start of the transaction and after the transaction is over (whether successful or not)
  • Isolation - other operations cannot access or see the data in an intermediate state during a transaction
  • Durability - guarantee that once the user has been notified of success, the transaction will persist, and not be undone
  • Why can allowing developers to use SELECT * frequently in code cause problems? (return more data than usually needed, problems with backwards compatibility for future changes)
  • What is cardinality? (number of unique values in a particular column)
  • What is selectivity? (measure of how much variety there is in the values of a given table column in relation to the total number of rows in a given table )
  • Normalization - describe the concept of normalization and then the differences of 3rd normal form and 4th normal form. Why would someone denormalize MySQL tables. 
  • Tell me about a database or tables you have designed. For what purpose was it created?
  • Did you normalize the tables? De-normalize? What considerations did you take into account for performance?

MySQL specific questions


  • Can you name a few different types of indexes used in MySQL (B+ tree index, FULL TEXT index, Log Structure Merge Tree (like TokuDB's fractal tree)
  • RDBMS locking and concurrency control, describe the concept of MVCC (multiversion concurrency control) and several isolation levels -  serializable, repeatable reads, read committed and read uncommitted. Describe the advantages and disadvantages of each method. Where are isolation levels controlled in MySQL and how to set them.
  • In MySQL can you give me an example of a statement that would be unsafe for statement-based replication? http://dev.mysql.com/doc/refman/5.7/en/replication-rbr-safe-unsafe.html
  • How could you intentionally break replication between a master and slave? Some examples...
    • delete data directly on a slave as a super user and then try to update the data on the master, replication will break when it tries to update the missing row on the slave
    • add data to the master but prevent it from writing to binary log and then update that data but allow it to be written to binary log, replication will break when it tries to update the missing row on the slave
    • cause slave lag on the slave by doing some long DDL change on the master but purge the binary logs on the master before the slave has enough time to catch up
    • Use limits to randomly delete rows with statement based replication which will probably eventually cause a conflict
    • Create a user or add data on a slave that is part of a primary key or unique index and then re-create the same data on the master, replication will break because of primary key conflicts
  • In MySQL, how could you perform an ALTER TABLE without downtime? (Percona online schema change, Use MySQL 5.6, perform the changes on a slave first and then promote it to master and then make the change on each node)
  • In what order does MySQL evaluate a multi column or composite index? (Left to right)
  • Tell me about how you would implement a backup strategy in MySQL.  LVM snapshots - describe the concepts and tools, Percona XtraBackup concepts and tools.  
  • How can you perform a backup or datadump in MySQL? (mysqldump utility, mydumper, save a CSV file, Percona XtraBackup)
  • How can you import data into MySQL? (Load data infile for CSV, at the command line re-direct a dump file into MySQL, myloader (parallel restores))
  • What experience do you have with Percona Toolkit? (or Maakit?)
  • In a MySQL query, how can you get a unique list of values from a column? (use a group by or distinct key word)
  • What are some ways to optimize a sub query in MySQL? (re-write as a join, separate into two queries)
  • What do you know about sharding MySQL databases? Have you ever set sharding? What is the difference between partitioning and sharding?
  • Explain the difference between horizontal scaling and vertical scaling. 
  • What monitoring tools have you used in MySQL? (Trend analysis tools, real time profiling tools, others)
  • What is your experience with MySQL load balancing? (Proxy servers, splitting, reads and writes, etc)
  • What is your experience with High Availability in MySQL? (Setting up replication, Master-Master, Master-Slave, Tungsten replicator, Galera Cluster, Percona Cluster, etc)
  • What privilege in MySQL is needed to see TRIGGERS? (TRIGGER)
  • What privilege in MySQL is needed to see decrypt data when you do not have a decryption key? (SUPER)
  • What privilege in MySQL is needed to see all the currently running queries? (PROCESS)
  • What privilege in MySQL is needed to see the results of SHOW SLAVE STATUS, view replication lag, or seconds behind master? (REPLICATION CLIENT)
  • In MySQL, what happens if you grant a privilege to a user that doesn't exist and the NO_AUTO_CREATE_USER mode is not enabled? (automatically creates a user with blank password)
  • If you a have a system that is experience deadlock errors, what are some ways to troubleshoot and possibly resolve deadlock scenarios?
  • Is a distributed MySQL cluster such as two nodes using Master-Master replication, is this considered ACID compliant? (It is not because it is asynchronous, there are no guarantees that queries executed on one server will be committed on the other. This results in eventual consistency)
  • What are some clustering options available in the MySQL eco-system? What are some of the pros and cons of these systems?
    • MySQL Cluster (NDB Cluster)
    • Galera Cluster/Percona Cluster
    • Master-Master
    • Master-Slave
    • MySQL Fabric
    • Google's Vitess and other specialized sharding frameworks
  • How does Master-Master replication in MySQL avoid primary key collisions?
  • Can you explain to me how indexes work in MySQL? (B+ tree, similar to a phone book lookup)
  • Suppose management complains of the application being slow and thinks the database might be a bottleneck. What actions would you take to prove that it is or is not the database?
  • Suppose you have identified a query that is taking several seconds to run. How would view the execution plan?
  • In an EXPLAIN plan in MySQL what column shows you if the query will be doing a full table scan? (type column)
  • In an EXPLAIN plan in MySQL what column will show the index length in bytes? (key_len column)
  • In an EXPLAIN plan, you see "Using Index" in the EXTRA column, what does that mean? (covering index)
  • How do you identify a redundant index in MySQL? Or can you give me an example of a redundant index in MySQL?
  • What steps could you perform to improve the performance of slow running query?
  • MySQL - describe the key difference of MySQL vs other databases (pluggable storage engines).  How can MySQL support multiple storage engines? Compare MySQL to MariaDB to Percona.
  • Data types. Describe MySQL datatypes that are available, what is the difference between timestamp  and datetime - they store identical data. What are ENUM and SET used for, how to store json and xml data in MySQL. Using text datatype vs varchar. BLOB and CLOB data and what it can be used for. Describe numeric datatypes. 
  •  MySQL storage engines - MYISAM, INNODB, BLACKHOLE (why would someone use it), CVS, ARCHIVE, TOKUDB.
  • INNODB. Locking mechanism, concurrency control, transactions and isolation, Btree index concepts, fragmentation, storage allocation and de allocation, statistics, monitoring, explain "show engine INNODB status output", critical INNODB parameters, INNODB tablespaces, logs and WAL (write ahead logs). 
  • Indexes and primary keys. INNODB indexing concepts, covering index, clustered index, index creation, optimization and MySQL explain plan. Why would someone force an index usage, table partitioning, table compression - advantages and disadvantages.
  • Security. How to reset forgotten MySQL root password, how to secure MySQL, what is SQL injection and how to protect against it. MySQL files permissions. MySQL grants to objects. SSL encryption, data obfuscation and related tools.
  • Logging. MySQL logs describe the types - WAL, error, SQL, binary etc...Describe logrotate facility and how to use it.
  • Character sets. UTF8, Latin1. How MySQL implements UTF8, time zones and why is it important to use NTP.
  • Replication. Two types asynchronous and synchronous. Async - MySQL replication, Tungsten. Synchronous - NDB, Galera, DRBD. What are the differences, latency, advantages and disadvantages of each. Describe the concept of MySQL log server and why use BLACKHOLE Engine. 
  • High Availability. Describe tools and differences. MySQL MHA, Continuent Tungsten, DRBD, Galera, MySQL Cluster. Alarms and monitoring.