Data scientists deal with time series data on a daily basis, and being able to manipulate and analyze this data is a necessary part of the job. SQL window functions allow you to do just this, and they come up often in data science interviews. So let’s talk about what time series data is, when to use it, and how to implement functions that help manage it.
What is time series data?
Time series data are variables within your data that have a time component. This means that each value of the attribute is tied to a date, a time, or both. Here are some examples of time series data:

• The daily price of company shares because each share price is associated with a specific day
• The average daily value of the stock index for the last few years because each value is assigned to a specific day
• Unique visits to a website during a month
• Daily platform logs
• Monthly sales and income
• Daily logins for an application
LAG and LEAD window functions
When dealing with time series data, a common task is to calculate growth or averages over time. To do this, you need to compare each row’s value with the value from an earlier or later date.

Two window functions that allow you to accomplish this are LAG and LEAD, which are extremely useful for handling time-related data. The main difference between them is direction: LAG gets data from previous rows, while LEAD gets data from following rows.
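As a minimal sketch of that difference, the query below applies both functions to a small inline table. The monthly_sales name, its columns, and the numbers are made up for illustration, and the syntax assumes a PostgreSQL-style database.

```sql
-- Hypothetical monthly revenue figures, defined inline so the query is self-contained.
WITH monthly_sales (month, revenue) AS (
    VALUES (1, 100),
           (2, 120),
           (3, 90)
)
SELECT month,
       revenue,
       LAG(revenue, 1)  OVER (ORDER BY month) AS previous_revenue, -- value from the row before
       LEAD(revenue, 1) OVER (ORDER BY month) AS next_revenue      -- value from the row after
FROM monthly_sales
ORDER BY month;
```

For month 2, previous_revenue is 100 (taken from month 1) and next_revenue is 90 (taken from month 3), so both neighbors sit in the same row as the current value.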

We can use either function to compare month-to-month growth, for example. As a data analytics professional, you will most likely work with time-related data, so being comfortable with LAG and LEAD will save you a lot of effort.

A data science interview question that requires a window function
Let’s walk through an advanced SQL data science interview question that relies on the LAG window function. Window functions are often part of interview questions, but you’ll also see them a lot in your day-to-day work, so it’s important to know how to use them.

Let’s discuss an Airbnb question called Airbnb growth. If you want to follow it interactively, you can do so here.

The question is to estimate Airbnb’s growth each year using the number of registered hosts as the growth metric. The growth rate is calculated by taking ((number of registered hosts in the current year – number of registered hosts in the previous year) / number of registered hosts in the previous year) * 100.
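To make the formula concrete, here is a worked example with made-up numbers: 100 hosts in the previous year and 120 in the current year. These figures are assumptions for illustration only and do not come from the Airbnb dataset.

```sql
-- Hypothetical counts: 100 hosts in the previous year, 120 in the current year.
-- Casting the denominator to numeric keeps the division from being done in integers.
SELECT round(((120 - 100) / cast(100 AS numeric)) * 100) AS estimated_growth;
-- (120 - 100) / 100 = 0.2, and 0.2 * 100 = 20, so the growth rate is 20 percent.
```

Without the cast, PostgreSQL would divide two integers, truncate 20 / 100 to 0, and report a growth rate of 0.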

Output the year, the number of hosts in the current year, the number of hosts in the previous year, and the growth rate. Round the growth rate to the nearest percent and sort the result in ascending order by year.
Approach Step 1: Count the hosts for the current year
The first step is to count the hosts by year, so we’ll need to extract the year from the date values.

-- Count the hosts that registered in each year
SELECT extract(year
               FROM host_since::date) AS year,
       count(id) host_current_year
FROM airbnb_search_details
WHERE host_since IS NOT NULL
GROUP BY extract(year
                 FROM host_since::date)
ORDER BY year
Approach Step 2: Count the hosts from the previous year
This is where you will use the LAG window function. The goal is a result set with three columns: the year, the number of hosts in that year, and the number of hosts in the previous year. LAG takes the previous year’s count and places it in the same row as the current year’s count. Having both values in one row makes it easy to implement a metric such as a growth rate, because SQL can calculate it from the columns of a single row. Here is the code for it:

SELECT year,
       host_current_year,
       -- Pull the previous year's count into the current row
       LAG(host_current_year, 1) OVER (ORDER BY year) AS host_previous_year
FROM
  (SELECT extract(year
                  FROM host_since::date) AS year,
          count(id) host_current_year
   FROM airbnb_search_details
   WHERE host_since IS NOT NULL
   GROUP BY extract(year
                    FROM host_since::date)
   ORDER BY year) t1
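One detail worth keeping in mind is that the earliest year has no previous row, so LAG returns NULL there. The sketch below uses made-up yearly counts (the host_counts name and its numbers are hypothetical) and also shows the optional third argument that PostgreSQL's LAG accepts as a default value.

```sql
-- Hypothetical yearly host counts, defined inline so the query is self-contained.
WITH host_counts (year, host_current_year) AS (
    VALUES (2015, 10),
           (2016, 15),
           (2017, 30)
)
SELECT year,
       host_current_year,
       LAG(host_current_year, 1)    OVER (ORDER BY year) AS host_previous_year,     -- NULL for 2015
       LAG(host_current_year, 1, 0) OVER (ORDER BY year) AS host_previous_year_or_0 -- 0 for 2015
FROM host_counts
ORDER BY year;
```

For a growth rate, leaving the NULL in place is usually the safer choice: arithmetic on NULL simply yields NULL, whereas a default of 0 would cause a division-by-zero error once it is used as the denominator.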
Approach Step 3: Implement the growth metric
As mentioned above, it’s much easier to implement a metric like this when all the values are in one row, which is why we used the LAG function. Implement the growth rate calculation as round(((host_current_year - host_previous_year)/(cast(host_previous_year AS numeric)))*100) estimated_growth. The cast to numeric matters because both counts are integers, and dividing one integer by another would truncate the result.

SELECT year,
       host_current_year,
       host_previous_year,
       round(((host_current_year - host_previous_year)/(cast(host_previous_year AS numeric)))*100) estimated_growth
FROM
  (SELECT year,
          host_current_year,
          LAG(host_current_year, 1) OVER (ORDER BY year) AS host_previous_year
   FROM
     (SELECT extract(year
                     FROM host_since::date) AS year,
             count(id) host_current_year
      FROM airbnb_search_details
      WHERE host_since IS NOT NULL
      GROUP BY extract(year
                       FROM host_since::date)
      ORDER BY year) t1) t2
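To check the logic without access to the airbnb_search_details table, the same three steps can be run against a few made-up rows. This is only a sketch: the sample_hosts rows are hypothetical, and CTEs replace the nested subqueries purely for readability.

```sql
-- Hypothetical sample data standing in for airbnb_search_details.
WITH sample_hosts (id, host_since) AS (
    VALUES (1, '2015-03-01'),
           (2, '2015-07-15'),
           (3, '2016-01-10'),
           (4, '2016-05-22'),
           (5, '2016-11-30')
),
-- Step 1: count hosts per registration year.
yearly AS (
    SELECT extract(year FROM host_since::date) AS year,
           count(id) AS host_current_year
    FROM sample_hosts
    WHERE host_since IS NOT NULL
    GROUP BY extract(year FROM host_since::date)
),
-- Step 2: bring the previous year's count into the same row.
with_previous AS (
    SELECT year,
           host_current_year,
           LAG(host_current_year, 1) OVER (ORDER BY year) AS host_previous_year
    FROM yearly
)
-- Step 3: compute the growth rate, rounded to the nearest percent.
SELECT year,
       host_current_year,
       host_previous_year,
       round(((host_current_year - host_previous_year)
              / cast(host_previous_year AS numeric)) * 100) AS estimated_growth
FROM with_previous
ORDER BY year;
```

With these rows, 2015 has 2 hosts and no previous year, so its growth rate is NULL. For 2016 there are 3 hosts against 2 in the previous year, which gives (3 - 2) / 2 * 100 = 50.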