Building

Sports AI

Multi-sport prediction research system for game outcomes, scoring, odds, and betting-market edge.

Python, Polars, scikit-learn, TensorFlow/Keras, Sports Data APIs, Odds Data, Parquet

Overview

I explored a sports prediction and betting-research system for modeling games across major sports. The project focuses on turning historical team, player, schedule, event, and odds data into leak-safe prediction datasets, then comparing model probabilities against betting markets to study outcomes, scoring, closing-line value, and potential edge.
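
To make the comparison between model probabilities and betting markets concrete, here is a minimal sketch of one common way to do it: convert moneyline odds to implied probabilities, remove the bookmaker margin, and measure the gap to a model estimate. The function names, the American odds format, the proportional margin removal, and the example numbers are all assumptions for illustration, not the project's actual code.

```python
def american_to_implied(odds: int) -> float:
    """Convert American moneyline odds to the raw implied probability."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


def remove_vig(p_home: float, p_away: float) -> tuple[float, float]:
    """Rescale two raw implied probabilities so they sum to 1.

    Proportional normalization is only one of several de-vig methods.
    """
    total = p_home + p_away
    return p_home / total, p_away / total


def edge(model_prob: float, fair_market_prob: float) -> float:
    """Positive when the model rates the outcome as more likely than the market."""
    return model_prob - fair_market_prob


# Hypothetical prices: home -150, away +130.
raw_home = american_to_implied(-150)
raw_away = american_to_implied(130)
fair_home, fair_away = remove_vig(raw_home, raw_away)

# Hypothetical model estimate of 0.62 for the home side.
home_edge = edge(0.62, fair_home)
```

The raw implied probabilities sum to more than 1 because of the bookmaker margin, which is why the sketch normalizes them before comparing against the model.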

Data pipeline

The project started with scraping and normalizing team, player, schedule, and odds data across seasons. From there, my focus was turning raw, inconsistent records into a reliable game-level modeling dataset: aligning historical inputs correctly, engineering signal-bearing features, consolidating overlapping metrics, and removing noisy, redundant, or leakage-prone variables so the training examples reflected realistic prediction conditions.
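
As an illustration of what aligning historical inputs can look like, the sketch below computes a rolling scoring average that only uses games played before the one being predicted. It uses plain Python rather than Polars to stay self-contained, and the record schema (`date`, `team`, `points`), the function name, and the window size are hypothetical.

```python
from collections import defaultdict, deque


def add_prior_scoring_avg(games, window=3):
    """Attach each team's mean points over its previous `window` games.

    The feature for a game is built from earlier games only, so the
    game's own score never leaks into its own training example.
    """
    history = defaultdict(lambda: deque(maxlen=window))
    rows = []
    # ISO date strings sort chronologically; same-day games for one team
    # would need a finer ordering key.
    for game in sorted(games, key=lambda g: g["date"]):
        past = history[game["team"]]
        prior_avg = sum(past) / len(past) if past else None
        rows.append({**game, "prior_pts_avg": prior_avg})
        past.append(game["points"])  # update only after the feature is computed
    return rows


# Made-up records for illustration.
games = [
    {"date": "2024-01-01", "team": "A", "points": 100},
    {"date": "2024-01-03", "team": "A", "points": 110},
    {"date": "2024-01-05", "team": "A", "points": 90},
    {"date": "2024-01-02", "team": "B", "points": 95},
]
features = add_prior_scoring_avg(games)
```

A team's first game has no history, so the feature is `None` there instead of a value borrowed from the future.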

Modeling approach

I compared simple baselines and tree-based models against neural network experiments. One experiment represented rosters and lineups as matrices for a CNN-style architecture, but the broader lesson was that cleaner features, careful validation, and strong baselines often mattered more than model complexity.
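
One way to keep such comparisons honest is to score every model and a trivial baseline with the same probability metrics. The sketch below implements the Brier score and log loss in plain Python. The outcomes and probabilities are made-up placeholders, not results from the project, and the constant baseline is an assumed stand-in for something like a historical home-win rate.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(outcomes)


# Placeholder data for illustration only.
outcomes = [1, 0, 1, 1, 0]
baseline_probs = [0.55] * len(outcomes)       # constant baseline
model_probs = [0.70, 0.40, 0.60, 0.65, 0.45]  # hypothetical model output

baseline_brier = brier_score(baseline_probs, outcomes)
model_brier = brier_score(model_probs, outcomes)
baseline_ll = log_loss(baseline_probs, outcomes)
model_ll = log_loss(model_probs, outcomes)
```

Lower is better for both metrics. A complex model that cannot beat the constant baseline on held-out games is a sign that features or validation need attention before architecture does.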

What I built

Scrapers for team, player, schedule, and odds data
Feature engineering pipeline for game-level predictions
Model training and comparison across baselines, tree models, and neural experiments
Leakage-aware validation and backtesting workflows
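
A leakage-aware validation workflow can be sketched as a walk-forward split, where each test season is evaluated by a model trained only on earlier seasons. The function name, the season-level granularity, and the `min_train` parameter below are assumptions for illustration.

```python
def walk_forward_splits(seasons, min_train=2):
    """Yield (train_seasons, test_season) pairs with an expanding training window.

    Every training season precedes its test season, so no future
    information reaches the model being evaluated.
    """
    ordered = sorted(set(seasons))
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]


# Hypothetical season labels.
splits = list(walk_forward_splits([2019, 2020, 2021, 2022, 2023]))
for train, test in splits:
    print(f"train on {train}, test on {test}")
```

The same idea applies at finer granularity, such as splitting by week or game date, as long as the training data always ends before the test data begins.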

Related projects

Pulse

Runway

Rooutes

Soler London