Thursday, March 31, 2016

UNAUTHENTICATED USER causing mysql 5.6 to hang

I upgraded one of our alpha servers from MySQL 5.1 to 5.5 and then to Percona Server 5.6. This server has a few hundred databases and each databases has a few hundred tables. The next day the server kept hanging and would eventually become so sluggish that it had to be restarted. When attempting to establish a new connection it would indicate max connections had been reached. If I was able to log on before the max connections were all used up, I would see a few hundred "unauthenticated user" which would continue to build up without closing.

From searching online, I first saw some blogs telling you to make sure that skip-name-resolve is in the my.cnf file and that you are only using IP address for limiting user permissions.

Here is another site that talks about this problem:
http://codeinthehole.com/writing/solving-mysql-connection-problems-caused-by-a-dead-name-server/

I had already done that so skip-name-resolve wasn't the problem.

This problem continued to happen so I started pt-stalk and demonized it so that if anything started to build up, it would capture forensic data.

I was finally able to capture this happening by using pt-stalk and a threshold of threads_running at 10. At first I tried setting the threshold with threads_connected but the "unauthenticated user" did not count as a connection and I wasn't able to catch it happening.

pt-stalk generates a lot of files...I started reviewing each one of them.

I checked the error log and nothing looked like a problem.

I looked in processlist and saw a number of entries like this:

*************************** 110. row ***************************
           Id: 1021
         User: unauthenticated user
         Host: connecting host
           db: NULL
      Command: Connect
         Time: NULL
        State: login
         Info: NULL
    Rows_sent: 0
Rows_examined: 0

That was expected.

One of the files was called "2016_03_31_18_24_11-opentables2" and all it contained was "Too many open tables:".

That was interesting.

The next useful file was the "Output" file which was full of errors like this:
ERROR 2013 (HY000): Lost connection to MySQL server at 'sending authentication information', system error: 32

Those two errors were the key. I read up on some forums, such as this one:
http://stackoverflow.com/questions/17813630/mysql-5-6-headaches-on-mac-osx

Developers noted that they had this problem on their local testing MySQL instance because table_open_cache was too low.

I checked my system like this:

show global variables like '%table_open_cache%';


Table_open_cache was set at 64.

I increased it like this:

SET GLOBAL table_open_cache = 32768;

I also updated it in the my.cnf file.

Then I asked our quality engineering team to run some tests and I watched the open tables increase by running a command like this:

SHOW STATUS LIKE 'Open_tables'

Within a few minutes it was past 17,000 open tables and all the queries were finishing too quickly to even catch with SHOW PROCESSLIST. It didn't crash.

MySQL 5.6 can handle a much larger table_open_cache than MySQL 5.1 could.

I'm not convinced that is the only problem with this instance but it seems to be doing better.

We also found a number of app servers that had outdated MySQL client tools running on the app side (MySQL 5.1). Those various client tools were being used to connect to MySQL 5.6 and appeared to be contributing to the un-authenticated users hanging around and eventual hanging of MySQL. Since updating all the client tools to MySQL 5.6, haven't seen this issue anymore.

No comments:

Post a Comment