Introduction to BigQuery
We currently use Google's BigQuery to host our Data Warehouse. BigQuery allows you to use SQL to explore the tables and answer questions about your games without worrying about infrastructure management.
If you're new to BigQuery and want to have a feel about it before testing Data Warehouse, you can follow their Get started with BigQuery and explore a sandbox environment with public datasets.
To follow along on this introduction, it's recommended to be familiar with SQL.
The main concepts you’ll need to understand while using BigQuery are:
-
Project: your organisation's base-level GCP entity
-
Dataset: top-level container used to organise and control access to your tables and views. By default, you’ll get access to a GameAnalytics dataset named “checkpoints” with your players' data. You are free to create other datasets if you wish to store the results of common queries.
-
Table: each table contains individual records organised in rows; each record is composed of columns (also called fields). By default, within the “checkpoints” dataset, you’ll find tables as per the above documentation. You may not create new tables within this dataset, but you are welcome to create tables in a new dataset.
-
View: a view is nothing but a virtual table defined by a SQL query. Instead of storing the data, it queries other tables and presents the data as per a chosen SQL query, avoiding unnecessary data duplication. In fact, the provided checkpoint tables are actually views.
Getting started with BigQuery (in Data Warehouse)
Once you get access Data Warehouse, you can access it by the link provided or in your projects in BigQuery's project selector. From there, ensure you're on the BigQuery section of Google Cloud by clicking on the menu on the top left and selecting it.
You should be able to see the project and datasets listed:
Data Warehouse consists of multiple tables, split by event type.
The only exception is the player_checkpoint
table, which has all the players' data like country_code
, install_ts
, build
and much more.
Check out the schemas for a full description.
Writing queries
To write a query in BigQuery, we can use the console or access programmatically. In this introduction, we'll focus on the console side of things.
On the same page, there is an open area to write our query.
If it's not showing for you, click on the +
sign.
First query - Daily Active Users
Let's start with a simple example to find the DAU.
SELECT
COUNT(DISTINCT player_id)
FROM
`<project_id>_checkpoints.player_checkpoint`
WHERE
checkpoint = "2022-01-01"
Remember to change <project_id>
by your project_id!
If you press the run button at the top, this will compute and show you the result at the bottom of the screen. Something like this:
Let's break down what's going on here:
- Calculate the count of unique player_ids that exist by using
SELECT COUNT(DISTINCT player_id)
. - Picking the table
player_checkpoint
by usingFROM <project_id>_checkpoints.player_checkpoint
.- Note that we're using
<project_id>_checkpoints
dataset. Queries to the dataset with no postfix will fail.
- Note that we're using
- Specify the date by adding the
WHERE checkpoint = "2022-01-01
.
The tables in Data Warehouse are partitioned by checkpoint, so it's crucial to add the WHERE
clause for optimisation purposes.
Learn more about partitioned tables here.
Breakdown by game
If you have just one game, the first query might work fine, but you might have more than one. For example, maybe you have a test/beta game and a live game. Or perhaps you have dozens or hundreds of games and don't want to mix them up!
Let's adapt the previous query to allow us to see the DAU per game.
SELECT
game_id,
COUNT(DISTINCT player_id) as dau
FROM
`<project_id>_checkpoints.player_checkpoint`
WHERE
checkpoint = "2022-01-01"
GROUP BY
1
Now the results show one game per row (in my case one android and one ios) and the DAU for each.
We can also add a name to the dau column, as f0
is not very descriptive. So what changed?
- Added a
game_id
to theSELECT
clause. - Named the second argument with
as dau
. - Added a clause
GROUP BY 1
to group the results based on thegame_id
.
Joining tables - adding the game name
No one will care for our tables or charts if we just list game ids. After all, people recognise the game's titles, not some number.
To find the title, we just need to join our previous query with the game_metadata
table and pick the game's title.
SELECT
m.title,
COUNT(DISTINCT c.player_id) as dau
FROM
`<project_id>_checkpoints.player_checkpoint` c
LEFT JOIN `<project_id>_checkpoints.game_metadata` m ON c.game_id = m.id
WHERE
c.checkpoint = "2022-01-01"
GROUP BY
1
The results are the same as the previous query, with the title instead of the id.
In this section:
- Add a
LEFT JOIN
with thegame_metadata
table on the game's id. - Replace the
game_id
with thetitle
of thegame_metadata
table. - Provided an alias for each of the tables.
Conclusion
Over the previous sections we covered the basics of Data Warehouse on BigQuery. There is a lot more to explore, and we have some more advanced examples in our Example Queries section.
Before moving on to the next page, we recommend reading about optimising your queries.