Transform Your Table with a Materialized View
Part 3 of x of Sporting Stylish SQL with Stephen
Note: this was published originally on Medium in August 2019
In our previous projects, we created a table and an upload process for our Premier League results data, but now our manager tells us she is looking at the data we have and wants to know the answer to the operative question: Who does the Premier League table look like now?
Now, you might be looking at me wondering, “I’m new here, what’s she talking about? What in tarnation does a Premier League table even look like?” Never fear, we are going to get through this together:
Behold, the Premier League table for the 2017/2018 season.
The Premier League rules state that a team’s rank is based on the following factors in descending order of importance:
Points: Three points for a win plus one point for a draw
Goal Difference: Goals For minus Goals Against
Goals For
How do we take the data we have already in our xpert.england_view and use it to build this League Table? Let’s start by taking a look at a specific game. I’ll close my eyes and pick a game out of the hat…Wolverhampton vs Leicester City:
By my calculations, the columns in the above query are what we need to get the points, goal difference and goals scored for each team. When you really think about it, this game by itself actually represents two results in which we are interested: one, a loss for Leicester and the other, a win for Wolves — one game, two results. To represent them properly, we’re going to have to turn this one-row-per-game data, into two rows, one row per team result.
Let’s use the Wolves Leicester game as our working set, we’ll turn it into a CTE called ws. When you are exploring the data and building your queries, I find it useful to start with a small(er) working set, enough for you to see the changes you’re trying to make as you shape your query. It makes it easier to see what’s happening at a granular level, it taxes the database much less and when you’re done building whatever it is you’re building you can just unceremoniously get rid of it.
Now that we have a working set, what transformations do we want to perform upon it? Well, like I said, we’re trying to convert one row into two, so it sounds like we could use a UNION to stack our teams (home and away) on top of each other, along with some other information that we can use to join back to the working set:
a UNION where two columns become one
Unified Teams Query Notes
Don’t forget to indent the inside of your CTE.
I used an alias on the column name for the first query of the UNION, but not the second. As you can see from the results below, the resulting table retains the column names from the first query, and one alias is not only enough, but it’s simpler and better looking (IMO, of course)
I used id as the column that we are going to use to join back onto the original working set. We cannot have any ambiguity in the join that we are building towards which means we cannot simply join on the date of the game because there are often more than one game per day. We would have to join on date AND another column, but what single column would we join on, since we’re unifying the teams on one side? The id, by virtue of the fact that every row has a distinct id and every row is itself a distinct game, is in essence a game ID, and therefore is all we need to build our future join upon.
Let’s use this unified team table we have built to try and gather the relevant numbers to calculate the ranking factors. Let’s go up the list of those ranking factors and begin with goals scored.
After we turn the unified teams query into a CTE, what happens when we join it back into our working set?
As you can see here, we can’t just pull a column out of the working set to get our goals scored measure, one column is home goals and the other is away goals. What do we want to read off of the working set columns as we go down our team column in the unified teams CTE (the blue box above)? Ideally, in the CASE WHEN the team is the same as the home team, THEN we count the home goals, ELSE, we count the away goals:
Goals Scored Query Notes
Don’t forget to indent the insides of your CTEs!
Give the CTEs (that aren’t the working set) a proper name, save the abbreviation for later on in the query, when you can turn it into an alias (unified_teams -> ut)
The CASE & the WHEN-THENs & the ELSE & the END AS of the statement all get their own lines, it looks good and it helps in the event you need to comment conditionals out as you’re exploring the data.
To calculate goal difference, we’ll have to add goals conceded to our query, but that is pretty straightforward, we just have to tweak what we already wrote ever so slightly so that when the team is home, we count away goals and vice versa:
Which gives us:
To calculate the points, we need to add three more columns: wins, draws and losses. We have the full_time_result column on the table, so we can read that to figure out what happened to the team on the day. (QUICK REMINDER OF HOW full_time_result works: H means home win, A means away win and D means dtraw)
Let’s concern ourselves with winning. An A or an H can be a victory, so we’re going to need to reflect that in our CASE statement:
Wins Query Notes
Indent the OR statement so it lines up with the overarching WHEN.
When your CASE statement runs more than one statement, it looks better to put the THEN on its own line as well.
I made the wins column an integer count of the wins (1 or 0) so we will be able to aggregate it as a SUM more easily later.
Using this template, it’s easy to build out draws and losses using the same logic and format we just used for wins with a couple of small tweaks:
Which finally gives us…
Let’s quickly discuss the difference between the data we are using before and after the transform we do with this query we have built. The raw results data in our working set ws and then in xpert.england_view is broken out by game and is unique between one set of teams (home) and their opponents (away). The data from the transform query we have built is essentially ignorant of who the opponent is, it’s just aggregated results (soon-to-be aggregated results over time).
I would think it would be valuable to keep both of those tables in the database, but instead of a run of the mill view, I say we use a materialized view. The materialized view is essentially a table in the database, built using a query that is defined upon creation and is subsequently refreshed periodically. It is great in cases where you have a somewhat complex transform (check) that you think you will be querying often does not need to be updated that frequently (we’ll want to see the table every day, but it’ll only change every game day).
Let’s clean up our query a little bit, remove some stuff, add some things and get it ready to build our materialized view:
Materialized View Query Notes
Everything gets indented one last time, to be behind the CREATE statement.
I removed the working set CTE since we are no longer (only) concerned with the Wolves Leicester game.
I replaced references to the working set in two ways: first by replacing ws in the FROM statements to the actual view we are going to use, xpert.england_view and then I establish an alias for that view, xev, which I use to replace the references to ws in the SELECT statement.
I added game_date to our materialized view query, in case we want to build the table at any given date for some historical context, and then I ORDERed the table by the game_date and team. If we are actually going to use this table, ordering it properly will make it easier for the database to execute the query (not as effective as putting in an index, but we’ll save that for later).
QA!
This whole exercise was an attempt to figure out an easier way to build the Premier League table. Logically, it follows that for this whole endeavor to be viewed as successful, we should have a pretty straightforward way to query what we have built to conjure the standings. So that should be our QA, do we accurately build the table as it stands today, (reminder: today is February 12th, 2019).
Goals Scored, Goal Difference and Points
AND NOW THE MOMENT OF TRUTH:
By jove, I think we have got it! We took our results data and used it to accurately construct the Premier League standings. We can now feel comfortable to hand our QA query to whomever wishes to know how the sports are going, and we can get outta here and enjoy our well-earned beer!
Structured Beer-y Language
Getting these standings was the first thing I really wanted to do when I was a fledgling data-dude, and, as I mentioned at the end of our first edition of the Sporting Stylish SQL series, it took me an unfortunately long time to make the leap from what I had in front of me (12xpert’s data) to what I wanted given to me, unable to fathom building it for myself.
Only a couple of notes on the table at this point, a couple of good stories happening on opposite sides of the table:
This was the game week that Manchester City took the lead from Liverpool, a lead they would not relinquish. Still kind of amazing the Reds could lose only one game for a whole season and still not win the league, but sovereign wealth funds are hard things to contend with, even for the likes FSG and Lebron James.
Newcastle claws above Cardiff and out of the relegation zone, (or as Rog of the Men in Blazers like to call it, “the Premier League moon door”), managing to stay in the Prem at the end of the season, despite having maybe the worst owner in all of professional sports (nsfw).