jason blahovec
open source · v0.1.1

nhl-hut-bigquery

nhl-hut-bigquery snapshots EA NHL Ultimate Team player ratings into BigQuery, unifying HUT's separate skater and goalie endpoints into one snapshot-typed table, and ships the same SQL/LLM-agent docs as its sibling libraries. Its distinguishing feature is a cross-source resolver: every card carries an accent-folded normalized name and a match ladder that links HUT cards to NHL player IDs — so EA's subjective ratings can sit alongside real NHL play-by-play from nhl-bigquery.

PythonBigQueryEA NHL HUTCLIrepo ↗PyPI ↗
architecture
hut_player_ratingsone row per card per snapshot_date — a 57-column unified schema where goalie-only ratings are null for skaters and skater-only ratings are null for goalies
hut_player_xrefresolver output — each HUT card mapped to a resolved_nhl_player_id with match_method + confidence
_nhl_hut_ingest_runsappend-only log of every sync (cards seen, rows written, endpoint, status)
scrape both HUT endpoints (player_stats.php + goalie_stats.php) → parse, accent-fold names, tag goalies → snapshot-typed DELETE-then-INSERT into BigQuery → resolve cards to NHL player IDs via a 4-rung match ladder → verify what share of NHL ice-time the snapshot covers.
install & usage
pip install nhl-hut-bigquery

gcloud auth application-default login
# 1. snapshot today's HUT ratings (skaters + goalies)
nhl-hut-bigquery sync \
    --table myproject.mydataset.hut_player_ratings

# 2. resolve HUT cards to NHL player IDs
nhl-hut-bigquery resolve-ids \
    --xref-table myproject.mydataset.hut_player_xref \
    --hut-table myproject.mydataset.hut_player_ratings \
    --nhl-boxscore-table myproject.mydataset.boxscore_stats --season 2024

# 3. how much NHL ice-time does the snapshot cover?
nhl-hut-bigquery verify --aggregation hut-coverage \
    --xref-table myproject.mydataset.hut_player_xref \
    --nhl-boxscore-table myproject.mydataset.boxscore_stats --season 2024
design decisions
Two endpoints, one unified schema
Skaters (player_stats.php) and goalies (goalie_stats.php) expose different attribute sets; both collapse into one 57-column table — the 11 goalie-only ratings are null for skaters and the skater-only ratings are null for goalies, while six shared attributes are populated from whichever endpoint produced the row.
Snapshot-typed, not chunked
Each sync appends a full dated snapshot partitioned on snapshot_date; re-running the same date MERGE-replaces it atomically. A daily cron therefore accumulates a diffable rating-history, and a cross-snapshot-drift check flags suspicious overall-rating jumps between snapshots.
Cross-source name resolver
HUT exposes no NHL player_id, so every card carries an accent-folded player_full_name_normalized join key plus a 4-rung match ladder (direct id → name+team+season → name+position+season → name) to link cards to NHL player IDs against nhl-bigquery's player data — the bridge between EA's ratings and real NHL stats.
interview talking points