Optimization & Quotas

Query Optimization

The Data Warehouse contains extensive datasets that, if queried inefficiently, can result in high costs due to scanning large volumes of data.

The following suggestions can help you improve your queries when using the Data Warehouse. Refer to BigQuery's best practices documentation for more general cost-reduction strategies.

Utilize Partition Fields

Using a WHERE clause to filter queries by the partition field significantly reduces the data scanned and helps lower costs.

Partitions in Data Warehouse:

  • <table>_checkpoint are partitioned by the field checkpoint
  • <table>_event are partitioned by event_date

Example:

SELECT ... FROM <table> WHERE checkpoint = "2024-01-01"
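
Event tables can be filtered the same way on their partition field, event_date. A minimal sketch, using an illustrative one-week range:

SELECT ... FROM <table>_event WHERE event_date BETWEEN "2024-01-01" AND "2024-01-07"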

Clustering

Clustering can improve performance and reduce costs, but its effect might not be reflected in the query estimation step.

All Data Warehouse tables are clustered by game_id, so include this field in your queries whenever possible.

Example:

SELECT ... FROM <table> WHERE game_id = 1234
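
Partition and cluster filters can be combined so that both the partitions and the clustered blocks read are restricted. A sketch using an event table's partition field and an illustrative game_id:

SELECT ... FROM <table>_event WHERE event_date = "2024-01-01" AND game_id = 1234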

Specify Required Columns

BigQuery's columnar storage format means that only the columns you scan incur costs.

Avoid SELECT * to prevent unnecessary data scanning.
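
For example, instead of SELECT *, list only the columns the query actually needs; the column names below are hypothetical:

SELECT game_id, event_date, event_name FROM <table>_event WHERE event_date = "2024-01-01"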

Rethink LIMIT for Cost Control

In BigQuery, LIMIT does not reduce the amount of data scanned, so it does not lower costs; it only truncates the result set after the scan has happened.
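
If you only need a sample of rows, combine LIMIT with a partition filter so the scan itself stays small. A sketch with an illustrative date:

SELECT ... FROM <table>_event WHERE event_date = "2024-01-01" LIMIT 100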

Optimize for Data Visualization Tools

Connecting data visualization tools directly to large datasets can quickly increase costs.

Use scheduled queries to create and update smaller aggregated tables that contain the data required for your dashboards.
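
A minimal sketch of the kind of query a scheduled query could run, assuming a hypothetical destination dataset my_dataset and a simple daily event count per game:

CREATE OR REPLACE TABLE my_dataset.daily_event_counts AS
SELECT event_date, game_id, COUNT(*) AS event_count
FROM <table>_event
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY event_date, game_id

Point your dashboard at the aggregated table and refresh it on a schedule instead of querying the raw event table directly.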

Quotas

Quotas in BigQuery are used to keep query costs below a stipulated limit. If a query's estimated scan would push the project over the quota, the query fails with an error indicating that the quota limit has been reached.

Because of the table sizes, queries that scan a large time range can produce an estimate that reaches this threshold. The only workaround at the moment is to reduce the time range or increase the quota.

By default, the Data Warehouse project is set up with a quota of 6TB per day, which can be adjusted by contacting our support team.

info

Filtering on an explicit game_id in the WHERE clause should reduce the estimated scan size and help avoid hitting the quota limit.
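
For example, narrowing the time range on a checkpoint table and filtering on game_id both lower the estimate; the values below are illustrative:

SELECT ... FROM <table>_checkpoint WHERE checkpoint BETWEEN "2024-01-01" AND "2024-01-07" AND game_id = 1234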