Monday, December 28, 2015

Finding international phone numbers mixed in with other numbers

Today I had a developer ask me to help write a "fun" query. He is new to MySQL wasn't sure how to do the same things he is familiar doing in PHP. The developer said he wanted to get a count for all the rows in a table where we store phone numbers to see which entries would meet the criteria for an international number. He wanted to first strip out all the extra stuff people might enter in a phone field such as commas, dashes, spaces, parenthesis, then filter out any places where the client had entered in words instead of numbers, blank numbers, null values, exclude numbers that start with +1 (because that is USA number),  then trim off any numbers that start with 1 but don't already have a + in the beginning and don't count numbers that are 10 digits after stripping out the extra characters (because those should be considered USA numbers).

Here is an example table and sample data:

CREATE TABLE `contacts` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `phone` varchar(40) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_phone` (`phone`)
) ENGINE=InnoDB;

INSERT INTO `contacts` (`id`, `phone`)
VALUES
(1, NULL),
(2, '(123)'),
(3, '+222'),
(4, '+44 20 7895 1232'),
(5, '+448717923642'),
(6, ',123'),
(7, '1 (818) 555-1211'),
(8, '119999999999'),
(9, '123numbers'),
(10, '1555555555555'),
(11, '55555555'),
(12, '5555555555'),
(13, 'numbers'),
(14, 'numbers123');

Here is the query that gets my count: 

SELECT database() as `database`, COUNT(*) as count
FROM
(    SELECT database() as `database`,
    phone,
    TRIM(
        TRIM(LEADING '1' FROM
            REPLACE(
                REPLACE(
                    REPLACE(
                        REPLACE(
                            REPLACE(
                                phone,
                            '(',''),
                        ')',''),
                    '-',''),
                ' ',''),
            ',','')
        )
    ) as phone_changed,
    CHAR_LENGTH(
        TRIM(
            TRIM(LEADING '1' FROM
                REPLACE(
                    REPLACE(
                        REPLACE(
                            REPLACE(
                                REPLACE(
                                    phone,
                                '(',''),
                            ')',''),
                        '-',''),
                    ' ',''),
                ',','')
            )
        )
    ) as phone_changed_length
    FROM contacts
    WHERE phone NOT LIKE '+1%'
    AND phone <> ''
    AND phone IS NOT NULL
    AND phone REGEXP '[0-9]+'
    AND phone REGEXP '^[+]?[-(), 0-9]+$'
    HAVING phone_changed_length <> 10
) as t

Trimming 1 off of the numbers may not be the perfect solution for getting rid of the 1 when someone puts it at the  beginning of a US number but it worked fine for our situation. There are not any US area codes that start with 1 and it is mainly the US that uses the 1 for dialing a US phone number so we were not concerned that this count would be wrong. Example 2 and 6 in the above sample data will still get trimmed even though they are obviously not a valid US number and also not a international number and will still get counted.

The inner query is longer than it actually needs to be just to get this count, however, running the inner query by itself will allow you to see how the TRIM and REPLACE is stripping out some of the data and compare it to the original.

And to give credit where credit is due (needed to reference a few examples):

http://stackoverflow.com/questions/12652234/sql-query-to-remove-special-characters
http://stackoverflow.com/questions/5064977/detect-if-value-is-number-in-mysql
https://techjourney.net/remove-or-trim-first-or-last-few-characters-in-mysql-database-with-sql/
http://stackoverflow.com/questions/8844880/mysql-regexp-removing-white-space-and-non-numeric-characters
http://dev.mysql.com/doc/refman/5.7/en/regexp.html
http://stackoverflow.com/questions/15005916/get-number-of-rows-returned-in-mysql-query
http://www.w3resource.com/mysql/string-functions/mysql-trim-function.php
http://www.w3resource.com/mysql/string-functions/mysql-char_length-function.php

No comments:

Post a Comment