BigQuery for SQL Analytics and ML: Hands-On Guide | Meteora Web

Your PostgreSQL database starts groaning above 100 GB. Analytical queries that span months of sales take minutes and lock the system. Or you run an e-commerce site that generates millions of log rows a day, and you can no longer extract sensible reports. At Meteora Web, we've been there. We manage ERP systems and high-traffic sites, and when data outgrows what a traditional relational database handles comfortably, the solution we use is BigQuery. It's not a regular database: it's a serverless analytics engine that can process terabytes in seconds. You pay only for the queries you run, and you can do machine learning with plain SQL.

Why Is BigQuery Different from Traditional Databases?

If you've worked with MySQL or PostgreSQL, you know you have to size CPU, RAM, and storage for each project. When load grows, you upgrade, which costs time and money. BigQuery removes infrastructure management: no servers to configure, no indexes to optimize manually. Data is stored in a compressed columnar format, and queries run on a distributed cluster that Google scales automatically.

Real example: one of our retail clients had a 50 GB MySQL database holding 3 years of sales. A query grouping by category and month took 4 minutes and overloaded the server. After loading the same data into BigQuery, the same query ran in 1.8 seconds. Cost? A few cents per scan, with no hardware upgrades to worry about.

Pricing Works Differently Too

You don't pay for a server running 24/7. You pay either for the amount of data each query reads (on-demand) or for reserved slots, which suit predictable loads. Storage is very cheap: ~$0.02/GB/month for active data, less for older data. The key is to avoid SELECT * on huge tables, because every column read is billed. We always recommend partitioning and clustering tables on date or other commonly filtered fields to minimize costs.

How Does Pay-As-You-Go Work?

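BigQuery charges either by bytes processed per query (on-demand pricing) or by slots. For most SMBs, on-demand is best: you only pay for what you use. The first 1 TB of data processed each month is free. Beyond that the rate was about $5/TB for years; the US list price rose to $6.25 per TiB in 2023, so check the current pricing page for your region.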
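To make the arithmetic concrete, here is a minimal sketch of a monthly on-demand estimate. The function name and the decimal-terabyte simplification are our own assumptions for illustration, and the price is passed in explicitly because the list price changes over time and by region.

```python
# Illustrative on-demand cost estimate (not an official billing formula).
# Assumption: a decimal terabyte is close enough for a rough estimate.
TB = 10**12


def on_demand_cost(bytes_processed_this_month: int,
                   price_per_tb: float,
                   free_tb: float = 1.0) -> float:
    """Return the estimated monthly query bill in dollars.

    Only the volume above the monthly free tier is billable.
    """
    billable_tb = max(0.0, bytes_processed_this_month / TB - free_tb)
    return round(billable_tb * price_per_tb, 2)


if __name__ == "__main__":
    # 0.8 TB scanned in the month: still inside the free tier.
    print(on_demand_cost(int(0.8 * TB), price_per_tb=5.0))
    # 4 TB scanned: 3 TB are billable.
    print(on_demand_cost(4 * TB, price_per_tb=5.0))
```

A team that scans well under 1 TB a month pays nothing for queries under this model, which is why date filters matter more than any other optimization.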

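Practical strategy: before running a query, do a dry run (the --dry_run flag in the bq CLI, or the query validator in the console) to see how many bytes it would scan. A SELECT COUNT(*) does not help here: it returns a row count, not the bytes your real query will read. At Meteora Web, we automated a check in our pipelines: if estimated bytes exceed a threshold, the query is blocked and flagged. One client cut their monthly bill from $120 to $15 simply by filtering dates better.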
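A pipeline check of this kind could look roughly like the sketch below. It is not our production code: the names check_query, QueryTooExpensive, and make_dry_run_estimator are hypothetical. The estimator relies on the dry-run option of the google-cloud-bigquery client, which reports the bytes a query would process without running it.

```python
from typing import Callable


def make_dry_run_estimator(client) -> Callable[[str], int]:
    """Build an estimator backed by a BigQuery dry run.

    `client` is expected to be a google.cloud.bigquery.Client. The import is
    done lazily so the guard below can be used and tested without the library.
    """
    from google.cloud import bigquery

    def estimate(sql: str) -> int:
        # dry_run=True validates the query and reports bytes without running it.
        config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
        job = client.query(sql, job_config=config)
        return job.total_bytes_processed

    return estimate


class QueryTooExpensive(Exception):
    """Raised when a query's estimated scan exceeds the allowed budget."""


def check_query(sql: str,
                estimate_bytes: Callable[[str], int],
                max_bytes: int) -> int:
    """Return the estimated bytes, or raise if they exceed max_bytes."""
    estimated = estimate_bytes(sql)
    if estimated > max_bytes:
        raise QueryTooExpensive(
            f"Query would scan {estimated} bytes (limit {max_bytes})"
        )
    return estimated


if __name__ == "__main__":
    # Stand-in estimator with a made-up value, so the demo needs no credentials.
    def fake_estimator(sql: str) -> int:
        return 2 * 10**9

    print(check_query("SELECT 1", fake_estimator, max_bytes=10 * 10**9))
```

In a real pipeline you would pass make_dry_run_estimator(bigquery.Client()) as the estimator. As a second line of defense, QueryJobConfig also accepts maximum_bytes_billed, which makes BigQuery itself reject a query that would bill more than the limit.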

Partitioning and Clustering to Save

Split the table into partitions (e.g., by day) and cluster on columns often used in filters (e.g., product category). Example partitioned table creation:

CREATE TABLE `my_project.sales.invoices`
PARTITION BY DATE(invoice_date)
CLUSTER BY category AS
SELECT * FROM `my_project.raw.invoices`;

Now any query that filters on invoice_date for a single day scans only that day's partition, not the whole table, and the cost drops accordingly.
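The saving is easy to reason about with a toy model. The sketch below is not BigQuery code: it assumes a hypothetical table of 30 daily partitions of 2 GB each and adds up what a date filter would touch compared with a full scan.

```python
from datetime import date
from typing import Dict

GB = 10**9


def bytes_scanned(partition_sizes: Dict[date, int], start: date, end: date) -> int:
    """Sum the sizes of the daily partitions whose date falls in [start, end]."""
    return sum(size for day, size in partition_sizes.items() if start <= day <= end)


# Hypothetical table: one 2 GB partition for each day of June 2024.
partitions = {date(2024, 6, d): 2 * GB for d in range(1, 31)}

full_scan = sum(partitions.values())
one_day = bytes_scanned(partitions, date(2024, 6, 15), date(2024, 6, 15))

print(full_scan // GB, one_day // GB)  # prints: 60 2
```

Clustering adds a further reduction inside each partition, but how much depends on the data layout, so it is left out of this model.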

What Are the Best Practices for Writing SQL on BigQuery?

BigQuery uses standard SQL (GoogleSQL) with some extensions. Key differences include handling nested fields (arrays and structs) and syntax for public datasets. Here are rules we apply every day:

Never SELECT *: always specify the columns you need. Drastically reduces bytes processed.
Filter first: use WHERE on the partitioning column (e.g., DATE(invoice_date) BETWEEN '2024-01-01' AND '2024-12-31') to limit the scan.
Use window functions: for ranking, running totals, year-over-year comparisons. BigQuery supports them natively.
Use temporary tables: if you need multiple queries on the same data, use CREATE TEMP TABLE to avoid re-reading sources.

Example: Query on Public Dataset

BigQuery offers public datasets (weather, Google Trends, GitHub). Here's how to calculate average temperature in Italy for 2024:

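-- Assumes the ghcn_d layout: one table per year, station IDs prefixed with
-- the country code, and TAVG stored in tenths of a degree Celsius.
SELECT
  'IT' AS country,
  ROUND(AVG(d.value / 10), 1) AS avg_temperature_c
FROM `bigquery-public-data.ghcn_d.ghcnd_2024` AS d
JOIN `bigquery-public-data.ghcn_d.ghcnd_stations` AS s
  ON s.id = d.id
WHERE STARTS_WITH(s.id, 'IT')
  AND d.element = 'TAVG'
  AND d.qflag IS NULL  -- keep only readings that passed quality checks
  AND d.date BETWEEN '2024-01-01' AND '2024-12-31';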

In our runs, the query completed in under 2 seconds against roughly 1.2 TB of global data. Try it yourself from a free Google Cloud account.

How to Integrate Machine Learning with BigQuery ML?

Here's the real gem: you don't need to export data to a Python notebook to build predictions. BigQuery ML lets you create, train, and evaluate machine learning models using only SQL. It supports linear and logistic regression, k-means clustering, time series, matrix factorization, and even pre-trained Vertex AI models.

Practical example: predict the daily revenue of an e-commerce store from historical sales and holidays.

Create the model

CREATE MODEL `my_project.models.revenue_forecast`
OPTIONS(model_type='ARIMA_PLUS',
        time_series_timestamp_col='date',
        time_series_data_col='revenue',
        holiday_region='IT') AS
SELECT date, revenue
FROM `my_project.sales.daily`;

Forecast next 7 days

SELECT *
FROM ML.FORECAST(MODEL `my_project.models.revenue_forecast`,
                 STRUCT(7 AS horizon, 0.95 AS confidence_level));

Result: a table with dates, predicted value, and confidence interval. No export, no infrastructure. At Meteora Web, we used this approach for a fashion client: 10 lines of SQL gave them a seasonal forecasting model directly integrated into their Looker Studio report.
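Once the forecast is available, a natural next step is to compare it with what actually happened. The sketch below assumes the forecast rows have already been fetched as plain dictionaries using the ML.FORECAST output column names (forecast_timestamp, forecast_value, prediction_interval_lower_bound, prediction_interval_upper_bound). The function name and all sample numbers are hypothetical, and dates stand in for the timestamps BigQuery returns.

```python
from datetime import date
from typing import Dict, Iterable, List, Mapping


def days_outside_interval(forecast_rows: Iterable[Mapping],
                          actuals: Dict[date, float]) -> List[date]:
    """Return the days whose actual revenue fell outside the prediction interval."""
    flagged = []
    for row in forecast_rows:
        day = row["forecast_timestamp"]
        actual = actuals.get(day)
        if actual is None:
            continue  # no actual value recorded yet for this day
        low = row["prediction_interval_lower_bound"]
        high = row["prediction_interval_upper_bound"]
        if not (low <= actual <= high):
            flagged.append(day)
    return flagged


# Made-up sample data for illustration only.
sample_forecast = [
    {"forecast_timestamp": date(2025, 3, 1), "forecast_value": 1000.0,
     "prediction_interval_lower_bound": 800.0,
     "prediction_interval_upper_bound": 1200.0},
    {"forecast_timestamp": date(2025, 3, 2), "forecast_value": 1100.0,
     "prediction_interval_lower_bound": 900.0,
     "prediction_interval_upper_bound": 1300.0},
]
sample_actuals = {date(2025, 3, 1): 950.0, date(2025, 3, 2): 1450.0}

flagged = days_outside_interval(sample_forecast, sample_actuals)
print(flagged)  # only 2025-03-02 falls outside its interval
```

Days that land outside the interval are worth a look: they may point to a promotion, an outage, or a holiday the model did not know about.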

How to Connect BigQuery to BI and Reporting Tools?

Data in BigQuery is ready for visualization. Common tools:

Looker Studio (formerly Data Studio): native connection, auto-refresh, interactive filters.
Python / pandas: using the google-cloud-bigquery library, run queries and get a DataFrame in two lines.
Excel: via ODBC connector or direct connection (limited to 10k rows).

Python example

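from google.cloud import bigquery

# Uses Application Default Credentials and the default project.
client = bigquery.Client()

# Column names follow the partitioned invoices table created earlier;
# `amount` is assumed to be the invoice total.
sql = """
SELECT DATE(invoice_date) AS day, ROUND(SUM(amount), 2) AS total
FROM `my_project.sales.invoices`
WHERE invoice_date >= '2025-01-01'  -- filter on the partitioning column
GROUP BY day
ORDER BY day
"""

# to_dataframe() requires pandas (and the db-dtypes package) to be installed.
df = client.query(sql).to_dataframe()
print(df.head())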

With this code you can integrate BigQuery into any custom analysis pipeline or dashboard.

When Is BigQuery Right and When Not?

BigQuery is not a silver bullet. If you have datasets under 10 GB and transactional queries (many writes, point reads by key), a traditional relational database is more suitable and cheaper. But if you need to aggregate millions of rows, do time-series analysis, or train ML models on large volumes, BigQuery is the right tool.

A case where we advise against it: a management system requiring real-time updates with many small transactions. BigQuery is optimized for batch and analytical queries, not OLTP. For that we use PostgreSQL or MySQL.

What to Do Now

Activate a Google Cloud project (the first 1 TB of queries per month is free).
Upload a test dataset (e.g., a CSV with your sales) via console or bq load.
Run a simple query with filters and GROUP BY to see the speed.
Try BigQuery ML with the public Google Trends dataset or your own data for a forecast.
Connect BigQuery to Looker Studio and create a live dashboard.

To dive deeper into the entire GCP ecosystem for developers, read our pillar guide on Google Cloud Platform.

Official docs: Google BigQuery Documentation.