Wednesday, October 28, 2015

Grouping by granular timestamp ranges - by day or by minute

I was asked to run some counts and make it granular to the minute. At first I was like, sure I will just do a group by on the MINUTE. That didn't exactly work as it was missing several hours, so I realized that since every hour follows the same pattern I'd have to add a group by for each hour of the day in addition to the minute...

I want to get a count up to the minute for the entire day:

SELECT timestamp, count(*) as count
FROM table_b
WHERE timestamp BETWEEN '2015-10-26 00:00:00' AND '2015-10-26 23:59:59'
GROUP BY HOUR(timestamp), MINUTE(timestamp);

If you don't add the HOUR as part of the GROUP BY, it will only give you one hour out of the day. 


I want to get a count up to the hour for the entire year:

SELECT timestamp, count(*) as count
FROM table_a
WHERE start BETWEEN '2015-01-01 00:00:00' AND '2015-12-31 23:59:59'
GROUP BY MONTH(timestamp), DAY(timestamp), HOUR(timestamp)

If you don't add the MONTH as part of the GROUP BY, it will only give you the results for a single month. 

I want to get a count up to the day that spans multiple years:

SELECT COUNT(*) as count, SUM(duration),  YEAR(`timestamp `) as year, MONTH(`timestamp `) as month, DAY(`timestamp `) as day
FROM table_a
WHERE
`start` BETWEEN '2013-11-20 00:00:00' AND '2015-12-31 23:59:59'
GROUP BY YEAR(`timestamp `) , MONTH(`start`), DAY(`timestamp `)
ORDER BY year, month, day


Initially it doesn't seem intuitive but once you think about it makes sense.

For all the purists out there, this is an illegal GROUP BY. However, MySQL extends the GROUP BY. Lifted from the manual (https://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html):

MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. 

No comments:

Post a Comment