Helping you Solve Dynamic Calculation of Sums with Looker’s “Date_Filter” 

Data Analyst at the Data Projects team at Mews. You can find me always trying out new hobbies (currently upcycling clothes). One thing that never changes: my love of data and storytelling.

Using Looker’s templated filters to create dynamic sums based on date ranges can be a challenge. And if that’s already gibberish to you, there’s no need to worry. The first step: what’s the use case? 

Use cases

Why do you need this?

There are many different methods native to Looker that allow you to create aggregate statistics – the main one being measures (sum, avg, count, etc.).  

But wait – what if a user wants to see an aggregate measure calculated dynamically for different time periods? Well, turns out Looker can easily accomplish that too using built-in time filters. 

What Looker can’t do with traditional methods is this scenario: say you have data on user subscriptions per month and customers can sign up on any day of the month. This means if you want to answer a question like: “How many days in January did Customer A have an active subscription?” – you can’t answer that in Looker with built in measures/date filters. 

Why would you want to answer a question like that? If Customer A disputes the bill for their subscription, claiming to have only paid for half the month instead of the entire month, you would be able to have the data to either support or refute their claim. 

What we can accomplish with a bit of LookML is to get a dynamic look at activity between user-specified date intervals and get aggregates based on that activity. 

The main problem 

While there is some documentation out there already about using dynamic aggregation with date filters, the issue I kept running into (and spent some bonding time with Looker’s support on) was using these dynamic measures in a table visualization. Specifically, the sub-totals and totals kept calculating incorrectly.  

Because of this, I’ll be using a table calculation with totals and sub-totals as my main example. 

Anywho, here’s some data: 

The data 

You will need at least one of each to start: 

  • Nominal variable (which are variables that describe a name, label, etc. without a natural order)
    • This should be a category variable
  • Numeric variable 
  • Date variable 

I’ll be using Marvel Movie data I found on Kaggle for this example. Here’s what the first five records look like:

Here’s what I want to know: 

How many movies have been released in a specific time range per distributor? In addition, have each of these movies been out for the entire time range? 

The solution 

Step #1 

Create a filter parameter as a templated filter for the date. In combination with a liquid parameter, this lets the SQL query use a user input instead of just your standard dimension or measure.  

In this example, we want our calculation (our query) to take on any date that the user selects as the input for the query. Here are some example date values that a user may be interested in, which can be specific dates or date ranges: 

  • 1 complete month
  • 1/1/2022 – 15/1/2022

Looker Code Block 1:

What shows up in the explore is a date filter for the user: 

Step #2 

Liquid is a ‘templating language’ that helps Looker be more dynamic. If you’ve worked with R or Python, liquid can usually achieve what functions do. If you’re an SQL person, it is similar to declaring variables or using case statements. 

Liquid variables can be used in two ways, one of which is important for us: 

"Tag Syntax: This type of usage usually doesn’t insert text; instead, it is for logical comparisons and other Liquid operations. In this method, you enclose the Liquid variable in one curly brace and a single percent sign. For example: {% if value > 10000 %}" – Source 

The linked reference has a lot of helpful examples if you’re new to LVs or need a refresher. Onwards. 

Looker Code Block 2: 

date_start works with the templated filter by grabbing the beginning of whatever date interval the user specifies. 

Let’s look at an example of what this is doing: say a user filters for a date on or after 1-1-2022. Regardless of the movie release date, it is returning the first day of the range that the user is filtering for.

When null (the user doesn’t specify a date filter), the LookML for </span><span class="NormalTextRun SCXW99862723 BCX0">filter_start_date</span><span class="NormalTextRun SCXW99862723 BCX0"> says to take the release date, which we can see happen:

Step #3 

Now, let’s do the same but with the end date of the filter interval. 

Looker Code Block 3: 

The date_end parameter takes the last day of the time filter specified by the user.  

So, this is the same idea as </span><span class="NormalTextRun SCXW98721338 BCX0">date_start</span><span class="NormalTextRun SCXW98721338 BCX0">. When the user selects a range or date from the date filter, this </span><span class="NormalTextRun SCXW98721338 BCX0">filter_end_date</span><span class="NormalTextRun SCXW98721338 BCX0"> dimension will pluck out the last date in that range regardless of the movie. 

When the user doesn’t use the date filter, it takes today’s date for all movies:

Step #4  

We need to calculate the number of days the movie has been out given the lower and upper bounds of the interval specified by the user in the date filter. Here is an example of the first scenario:

Looker Code Block 4a: 

The code that results is the following:

When: 

  • The release date of the movie is on or before the start of the filter; and  
  • The last day of the filter is after the release date of the move 

Then: 

  • Calculate the difference as the days between the first day of the filter range and the last day of the filter range 

Take a look at what’s happening when no date is selected by the user:

When there is no filter, the code returns the number of days after the release of the film. This is equal to the difference between the current date of today and the release date (which is what we specified should happen in our case statements when the filter is null). However, there are some more scenarios we should consider.

Now, let’s add these cases to our case statement. 

Looker Code Block 4b:

Let’s give this a go. 

Here cases #1 and #2 are on display. The dimension is calculating the time inside the interval specified, which is 11/01/2021 – Today (6/6/2022).  

Since all movies are released before the first day of the filter besides two, Eternals and Spider-Man: No Way Home, the dimension will simply calculate the days between this interval as the number of “active” days.  

Interpretation: all movies, besides two, have been active during the entire interval specified.  

The two movies that came out after the first day of the specified interval will start the calculation from the day they were released.  

Interpretation: Eternals has been active 213 days in the specified interval. Spider-Man: No Way Home, 171 days.  

The errors 

This is that issue I mentioned in the beginning. Say we want to get, per distributor, the list of movies with their number of active days. However, if we try to plug in our dimension, it will not work because taking subtotals or totals is only possible with measures and you will get an error saying so: 

 We can try to make a measure out of our active days dimension instead. 

Looker Code Block 4c:

However, we get an error (query copy pasted into Databricks has the following error message):

This is because the active_days measure is using fields (the release date) that are not included among the dimensions we’ve selected (which are distributor, title, and active days).  

We are always looking for engineers.

Do you want to work with Kirill? 👀

Whenever Looker builds a query with a measure, it will always group by all the dimensions you select. Because we don’t want to see the release date (and therefore it’s not part of the group by dimensions x,y,z statement), we get this error. 

Step #5 

The simple solution is to make a measure using our active_days dimension that aggregates, or sums up, the number of days. 

Looker Code Block 5:

Not only that, but we can also get the "count" of movies per distributor that are "active" during the specified interval by building the following measure: 

Looker Code Block 6:

Let’s plug this into our table and take a look! (Make sure you have "subtotals" checked) 

Interpretation: all of 20th Century Fox’s Marvel movies were "active", or had already been released on the market for the entire interval (90 days). This means that all 17 of the distributor’s movies were active during that period. 

You can see that there are actually two Fantastic Four movies that 20th Century Fox distributed, which is reflected in the sum of active days and count of active movies. 

If we scroll further, we can see that one movie, ‘Spider-Man: No Way Home’, hadn’t been released yet. Therefore, the measures are both zero. This means the movie had no active days on the market in the specified interval of 90 days.

Additional analysis 

This is quite interesting, but beyond this we can use or modify our active_days measure to build quite meaningful analysis. Let’s say we’re interested in knowing the Marvel movie revenue per distributor between 2000-2010 (not including 2010). 

In this case, we would have to make a couple of new measures since we don’t want to include movies before the start date of the filter in our calculations. 

Looker Code Block 7:

This results in the following:

Interpretation: Between 2000-2010, 19 Marvel movies made a total of about 6.6 billion USD. You can see Sony, despite distributing only four Marvel movies, brought in the most revenue.  

Let’s compare this to Marvel movies released after 2010 (including 2010):

It’s quite interesting to see that the latter decade raked in 33 billion USD in worldwide revenue and released about twice as much Marvel related content as the former.  

You can easily build on this! One idea could be to create a histogram in Looker using the tiers parameter. Or better yet, a blog post exploring how to make tiers using liquid variables. Let us know what you think!

 

Data Analyst at the Data Projects team at Mews. You can find me always trying out new hobbies (currently upcycling clothes). One thing that never changes: my love of data and storytelling.
Share:

More About