7-day rolling average within SQL
In my opinion, the most straightforward way of doing data analysis is to pull the data from SQL, export into Excel, and do the analysis from there. However, Excel 2003 has a limit of roughly 65,000 rows while Excel 2007 and later have limits of about 1 million rows. This is far from enough relative to the billions of rows of data that many companies work with today.
Therefore, it is important to know how to manipulate and aggregate data within SQL before exporting it into Excel.
For example, suppose you have a table named "user_sessions" structured like the following. Each row represents a distinct user session on a given day. There can only be one user session per customer per day. But a given customer can have more than one user session over multiple days.
Now suppose you wanted to use this raw data to create a 7 day rolling average of active users per day. How would you do it in SQL?
Here is my solution using double joins.
Here is my solution using nested queries.
Therefore, it is important to know how to manipulate and aggregate data within SQL before exporting it into Excel.
For example, suppose you have a table named "user_sessions" structured like the following. Each row represents a distinct user session on a given day. There can only be one user session per customer per day. But a given customer can have more than one user session over multiple days.
user_id
|
date
|
6581
|
1/17/2012
|
3933
|
2/10/2012
|
10124
|
2/23/2012
|
1721
|
3/1/2012
|
11608
|
2/27/2012
|
11927
|
3/11/2012
|
10291
|
2/28/2012
|
1888
|
3/6/2012
|
5938
|
1/24/2012
|
Now suppose you wanted to use this raw data to create a 7 day rolling average of active users per day. How would you do it in SQL?
Here is my solution using double joins.
DROP TABLE IF EXISTS users_per_day; SELECT -- count all the users per given day by grouping by date count(user_id)as users ,date INTO TEMP users_per_day FROM user_sessions GROUP BY date ; SELECT -- double join with the past 7 days and aggregate sum(u2.users) ,u2.date FROM users_per_day u1 join users_per_day u2 on u1.date in (u2.date, u2.date - 1, u2.date - 2, u2.date - 3, u2.date - 4, u2.date - 5, u2.date - 6) GROUP BY u2.date ORDER BY 2 asc
Here is my solution using nested queries.
DROP TABLE IF EXISTS users_per_day; SELECT -- count all the users per given day by grouping by date count(user_id)as users ,date INTO TEMP users_per_day FROM user_sessions GROUP BY date ; SELECT (SELECT sum(u9.users) FROM users_per_day u9 WHERE date in (u.date, u.date - 1, u.date - 2, u.date - 3, u.date - 4, u.date - 5, u.date - 6)) ,u.date FROM users_per_day u ORDER BY 2 asc
Comments
Post a Comment