~고군분투 인생살이~

[1321#_MEDIUM] Restaurant Growth 본문

SQL/LEETCODE

[1321#_MEDIUM] Restaurant Growth

소금깨 2022. 8. 5. 01:01

Table: Customer

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| customer_id   | int     |
| name          | varchar |
| visited_on    | date    |
| amount        | int     |
+---------------+---------+
(customer_id, visited_on) is the primary key for this table.
This table contains data about customer transactions in a restaurant.
visited_on is the date on which the customer with ID (customer_id) has visited the restaurant.
amount is the total paid by a customer.

 

You are the restaurant owner and you want to analyze a possible expansion (there will be at least one customer every day).

Write an SQL query to compute the moving average of how much the customer paid in a seven days window (i.e., current day + 6 days before). average_amount should be rounded to two decimal places.

Return result table ordered by visited_on in ascending order.

The query result format is in the following example.

 

Example 1:

Input: 
Customer table:
+-------------+--------------+--------------+-------------+
| customer_id | name         | visited_on   | amount      |
+-------------+--------------+--------------+-------------+
| 1           | Jhon         | 2019-01-01   | 100         |
| 2           | Daniel       | 2019-01-02   | 110         |
| 3           | Jade         | 2019-01-03   | 120         |
| 4           | Khaled       | 2019-01-04   | 130         |
| 5           | Winston      | 2019-01-05   | 110         | 
| 6           | Elvis        | 2019-01-06   | 140         | 
| 7           | Anna         | 2019-01-07   | 150         |
| 8           | Maria        | 2019-01-08   | 80          |
| 9           | Jaze         | 2019-01-09   | 110         | 
| 1           | Jhon         | 2019-01-10   | 130         | 
| 3           | Jade         | 2019-01-10   | 150         | 
+-------------+--------------+--------------+-------------+
Output: 
+--------------+--------------+----------------+
| visited_on   | amount       | average_amount |
+--------------+--------------+----------------+
| 2019-01-07   | 860          | 122.86         |
| 2019-01-08   | 840          | 120            |
| 2019-01-09   | 840          | 120            |
| 2019-01-10   | 1000         | 142.86         |
+--------------+--------------+----------------+
Explanation: 
1st moving average from 2019-01-01 to 2019-01-07 has an average_amount of (100 + 110 + 120 + 130 + 110 + 140 + 150)/7 = 122.86
2nd moving average from 2019-01-02 to 2019-01-08 has an average_amount of (110 + 120 + 130 + 110 + 140 + 150 + 80)/7 = 120
3rd moving average from 2019-01-03 to 2019-01-09 has an average_amount of (120 + 130 + 110 + 140 + 150 + 80 + 110)/7 = 120
4th moving average from 2019-01-04 to 2019-01-10 has an average_amount of (130 + 110 + 140 + 150 + 80 + 110 + 130 + 150)/7 = 142.86

 

문제 조건 

식당 사업 확대를 위해 데이터를 분석하기를 원하는 상황. (매일 최소 한 명의 고객이 있을 예정) 

고객이 7일 전 (즉, 현재 + 6일 전)에 지불한 금액의 이동 평균값을 계산하려 한다.

average_mount는 소수점 두 자리까지 반올림.

visite_on을 기준으로 오름차순 정렬

 

이동 평균이란?

데이터의 트렌드(시계열 분석에서)를 분석하는데 자주 사용됨.

시계열의 각 항에 대하여 그것을 중심으로 하는 전후 일정 항 수의 평균값을 연결하여 경향선을 구하는 방법.

 

이동 평균을 구하면 단순 평균 값을 보는 것보다 장기적인 트렌드를 읽어내는데 용이하다. 

하지만, 첫 항과 끝 항의 추세값을 계산할 수 없으므로 최근의 추세선을 구하기는 어려운 단점이 발생함.

 

문제 풀이

 

- 이동 평균

AVG(컬럼명) OVER(ORDER BY 컬럼명 ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)

 

- 이동 합계? 

SUM(컬럼명) OVER(ORDER BY 컬럼명 ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)

 

처음에 temp_1만 가지고 결과를 구하려고 했는데, 그렇게 하니까 이동 합을 이용해서 amount를 구하다 보니 

["visited_on", "amount", "average_amount", "cnt_day"]

["2019-01-01", 100, 100.00, 1]

["2019-01-02", 210, 105.00, 2]

["2019-01-03", 330, 110.00, 3]

["2019-01-04", 460, 115.00, 4]

["2019-01-05", 570, 114.00, 5]

["2019-01-06", 710, 118.33, 6]

["2019-01-07", 860, 122.86, 7]

["2019-01-08", 840, 120.00, 8]

["2019-01-09", 840, 120.00, 9]

["2019-01-10", 850, 121.43, 10] <- 여기서 문제가 발생함 (이유: 10일만 유일하게 값이 2개인데 합이 되지 않고 가장 가까운 값인 130만 더해짐, 그래서 temp를 하나 더 생성해서 날짜별 amount를 묶어주는 과정이 추가로 필요했음) 

 

WITH temp AS (
        SELECT visited_on
                ,SUM(amount) as amount
        FROM Customer
        GROUP BY visited_on)
, temp_1 AS (
        SELECT visited_on 
            ,SUM(amount) OVER(ORDER BY visited_on ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS amount 
            ,ROUND(AVG(amount) OVER(ORDER BY visited_on ROWS BETWEEN 6 PRECEDING AND CURRENT ROW),2) AS average_amount
            ,DENSE_RANK() OVER(order by visited_on) AS cnt_day
        FROM temp
        GROUP BY 1)

SELECT visited_on, 
        amount,
        average_amount
FROM temp_1
WHERE cnt_day >= 7
Comments