jason blahovec
open source · v0.2.2

nhl-bigquery

nhl-bigquery ingests NHL play-by-play into BigQuery, writing six related tables in lockstep per chunk. Its standout feature: each play carries the on-ice player IDs for both teams, reconstructed by merging the play-by-play feed against shift-charts that live on a different (legacy) NHL API host.

PythonBigQueryNHL APICLIrepo ↗PyPI ↗
architecture
nhl_playsone row per event, with home_on_ice_ids / away_on_ice_ids ARRAY<INT64>
gamesschedule dimension
game_officialsreferees + linesmen per game
boxscore_statsper-player per-game stats
shiftsper-shift per-player intervals
standingsdaily team-standings snapshots
per game: pull play-by-play + shift-charts (two API hosts) → merge on-ice arrays + derive strength_state → write all six tables in lockstep → verify against the NHL public API.
install & usage
pip install nhl-bigquery

gcloud auth application-default login
nhl-bigquery sync \
    --start 2024-10-01 --end 2025-06-30 \
    --plays-table myproject.mydataset.nhl_plays

# resumable backfill in monthly chunks
nhl-bigquery sync --start 2010-10-01 --end 2026-05-11 \
    --plays-table myproject.mydataset.nhl_plays
design decisions
On-ice arrays merged across two API hosts
`home_on_ice_ids`/`away_on_ice_ids` are reconstructed by reconciling the play-by-play feed (api-web) against shift-charts that only exist on the legacy `api.nhle.com/stats` host — a cross-source join most public NHL datasets don't attempt.
Six tables written in lockstep
Each chunk writes plays, games, officials, boxscores, shifts, and standings together, so the warehouse is never half-updated for a game; resumable monthly chunks make a 15-season backfill restartable.
Derived strength_state + ColumnSpec docs
Even-strength / power-play / empty-net / 4v4 / 3v3 is derived from skater counts + goalie flags; the same ColumnSpec → multi-renderer docs pattern as the sibling libraries ships LLM/markdown/dictionary docs.
interview talking points