
Sports AI
Sports prediction system built around scraped data, feature engineering, and model comparison.
Overview
I explored sports prediction using scraped multi-season team, player, schedule, and odds data. I built the data pipeline, engineered game-level features, and compared simpler baselines with tree-based and neural approaches.
Data pipeline
The project started with scraping and normalizing team, player, schedule, and odds data across seasons. From there, my focus was turning raw, inconsistent records into a reliable game-level modeling dataset. That meant aligning historical inputs so each game used only information available before it was played, engineering signal-bearing features, consolidating overlapping metrics, and removing noisy, redundant, or leakage-prone variables so the training examples reflected realistic prediction conditions.
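As an illustration of the alignment step, here is a minimal sketch of a leakage-safe form feature. It is not the project's actual code: the function name `add_prior_form`, the record keys (`date`, `team`, `points`), and the window size are all assumptions made for the example. The idea it shows is that each game's feature is computed only from that team's earlier games.

```python
from collections import defaultdict, deque


def add_prior_form(games, window=3):
    """Attach each team's mean points over its previous `window` games.

    `games` is a list of dicts with hypothetical keys: date, team, points.
    The feature is read before the current game's points are stored,
    so a row never sees the outcome it is meant to help predict.
    """
    history = defaultdict(lambda: deque(maxlen=window))
    rows = []
    # ISO date strings sort chronologically, so a plain sort is enough here.
    for game in sorted(games, key=lambda g: g["date"]):
        past = history[game["team"]]
        prior_avg = sum(past) / len(past) if past else None
        rows.append({**game, "prior_avg_points": prior_avg})
        # Only now does the current result become part of the history.
        past.append(game["points"])
    return rows


games = [
    {"date": "2023-01-01", "team": "A", "points": 100},
    {"date": "2023-01-03", "team": "A", "points": 110},
    {"date": "2023-01-05", "team": "A", "points": 90},
    {"date": "2023-01-02", "team": "B", "points": 95},
]
featured = add_prior_form(games)
```

A team's first game gets `None` rather than a guess, which leaves the choice of how to handle cold-start rows to the modeling step.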
Modeling approach
I compared simpler baselines and tree-based models against neural experiments. One experiment represented rosters and lineups as matrices for a CNN-style architecture. The broader lesson was that cleaner features, careful validation, and solid baselines often mattered more than model complexity.
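The write-up does not specify how rosters were laid out as matrices, so the following is only one plausible sketch. It assumes a fixed-size players-by-stats grid, ordered by minutes played and zero-padded; the function name `roster_matrix` and the stat keys are hypothetical.

```python
def roster_matrix(players, stat_keys, max_players=5):
    """Build a fixed-size players x stats matrix for one team.

    Rows are ordered by minutes played (descending), truncated to
    `max_players`, and zero-padded so every roster has the same shape,
    which is what a convolutional input layer would expect.
    """
    ordered = sorted(players, key=lambda p: p["minutes"], reverse=True)
    ordered = ordered[:max_players]
    matrix = [[float(p[key]) for key in stat_keys] for p in ordered]
    # Pad short rosters with independent zero rows.
    while len(matrix) < max_players:
        matrix.append([0.0] * len(stat_keys))
    return matrix


players = [
    {"name": "p1", "minutes": 20, "points": 8, "rebounds": 3},
    {"name": "p2", "minutes": 35, "points": 24, "rebounds": 7},
    {"name": "p3", "minutes": 28, "points": 12, "rebounds": 10},
]
matrix = roster_matrix(players, ["minutes", "points", "rebounds"], max_players=4)
```

A consistent row ordering matters here: without it, the same roster could produce different matrices, and a convolution would be learning from an arbitrary arrangement.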
What I built
- Scrapers for team, player, schedule, and odds data
- Feature engineering pipeline for game-level predictions
- Model training and comparison across baselines, tree models, and neural experiments
- Leakage-aware validation and backtesting workflows
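
One common way to make backtesting leakage-aware is a walk-forward split by season, where each test season is evaluated using only earlier seasons for training. The sketch below assumes that approach; the function name `walk_forward_splits` and the `min_train` parameter are illustrative and not taken from the project.

```python
def walk_forward_splits(seasons, min_train=2):
    """Yield (train_seasons, test_season) pairs in chronological order.

    Each test season is preceded by at least `min_train` training
    seasons, and no later season ever appears in a training set.
    """
    ordered = sorted(set(seasons))
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]


splits = list(walk_forward_splits([2019, 2020, 2021, 2022]))
for train, test in splits:
    print(f"train on {train}, test on {test}")
```

Compared with a random train/test shuffle, this mirrors how predictions would be made in practice: the model is always scored on games that come after everything it was trained on.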



