7-day rolling average within SQL

In my opinion, the most straightforward way of doing data analysis is to pull the data from SQL, export into Excel, and do the analysis from there.  However, Excel 2003 has a limit of roughly 65,000 rows while Excel 2007 and later have limits of about 1 million rows.  This is far from enough relative to the billions of rows of data that many companies work with today.

Therefore, it is important to know how to manipulate and aggregate data within SQL before exporting it into Excel.

For example, suppose you have a table named "user_sessions" structured like the following.  Each row represents a distinct user session on a given day.  There can only be one user session per customer per day.  But a given customer can have more than one user session over multiple days.


user_id
date
6581
1/17/2012
3933
2/10/2012
10124
2/23/2012
1721
3/1/2012
11608
2/27/2012
11927
3/11/2012
10291
2/28/2012
1888
3/6/2012
5938
1/24/2012


Now suppose you wanted to use this raw data to create a 7 day rolling average of active users per day.  How would you do it in SQL?


Here is my solution using double joins.

DROP TABLE IF EXISTS users_per_day;

SELECT                         -- count all the users per given day by grouping by date
   count(user_id)as users
  ,date
INTO TEMP users_per_day
FROM user_sessions
GROUP BY date
;

SELECT                         -- double join with the past 7 days and aggregate
   sum(u2.users)
  ,u2.date
FROM users_per_day u1 join users_per_day u2 on u1.date in (u2.date, u2.date - 1, u2.date - 2, u2.date - 3, u2.date - 4, u2.date - 5, u2.date - 6)
GROUP BY
   u2.date
ORDER BY 2 asc



Here is my solution using nested queries.

DROP TABLE IF EXISTS users_per_day;

SELECT                         -- count all the users per given day by grouping by date
   count(user_id)as users
  ,date
INTO TEMP users_per_day
FROM user_sessions
GROUP BY date
;

SELECT 
   (SELECT sum(u9.users) FROM users_per_day u9 WHERE date in (u.date, u.date - 1, u.date - 2, u.date - 3, u.date - 4, u.date - 5, u.date - 6))
  ,u.date
FROM users_per_day u
ORDER BY 2 asc

Comments

Popular posts from this blog

grandmaster level chess AI using python - Part 2 (the code)

building a chess ai - part 4: learning an evaluation function using deep learning (keras)

Brief intro to recurrent neural networks