Prabuddha Tamhane | Data Scientist & Machine Learning Engineer

01

About Me

I'm a Data Scientist and Machine Learning Engineer specializing in building, validating, and deploying machine learning models, serverless MLOps pipelines, and financial optimization systems.

My background sits at the intersection of computer engineering and financial analytics. I hold a dual B.Tech in Computer Engineering and MBA in Finance from NMIMS, and I am completing my Master of Data Science at the University of British Columbia (UBC).

I build production-grade AI systems, MLOps solutions, and quantitative tools, such as my Paper Trader AI platform running 3 live strategies and the automated property pre-fill pipeline I built with foundation models at Square One Insurance.

Master of Data Science, UBC (2025-2026)

MBA Finance + B.Tech Computer Engineering

02

Featured Projects

Featured Project

Satellite Imagery - Property Attribute Prediction Pipeline

An automated prediction pipeline developed for Square One Insurance to pre-fill property attributes on insurance quote forms, reducing customer abandonment. The pipeline masks building footprints using Microsoft Building Footprints, falls back to custom U-Net models when footprints are missing, and extracts features with Clay/DINOv2 that feed an XGBoost classification head.

Performance: Achieved 96.83% precision at a 95% confidence threshold with 35.46% spatial coverage. Feature extraction runs 26x faster than with Swin/Clay backbones.

96.83% Precision

35.46% Coverage

26x Speedup

95% Confidence

Python
PyTorch
DINOv2
U-Net
SAM
Clay
Docker

Enterprise Capstone

Technical Details & Risk Metrics

Model Pipeline

Feature Extractor: Meta DINOv2 self-supervised model
Fallback Decoder: Custom U-Net for boundary refinement
Building Footprints: Microsoft Building Footprints integration
Output Calibration: Isotonic regression & Platt scaling
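
To illustrate the isotonic step of the calibration stage listed above, here is a minimal plain-Python sketch of the pool-adjacent-violators idea. It is not the production code: the function names `fit_isotonic` and `calibrate` are hypothetical, the inputs are assumed to be raw classifier scores with 0/1 correctness labels, and tied scores are not pooled.

```python
from bisect import bisect_left


def fit_isotonic(scores, labels):
    """Fit a non-decreasing step function from raw scores to probabilities."""
    pairs = sorted(zip(scores, labels))
    # Each block stores [sum of labels, number of points, largest score].
    blocks = []
    for score, label in pairs:
        blocks.append([float(label), 1, score])
        # Pool adjacent blocks while their means violate monotonicity.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    uppers = [block[2] for block in blocks]
    means = [block[0] / block[1] for block in blocks]
    return uppers, means


def calibrate(score, uppers, means):
    """Return the calibrated probability for a single raw score."""
    # First block whose upper score bound is >= score; clamp past the end.
    index = min(bisect_left(uppers, score), len(means) - 1)
    return means[index]


uppers, means = fit_isotonic(
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    [0, 1, 0, 1, 1, 1],
)
print(calibrate(0.25, uppers, means))
```

In practice a library implementation (for example scikit-learn's calibration utilities) would usually be preferable; the sketch only shows why the calibrated output is monotone in the raw score.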

Performance & Scale

Precision: 96.83% at 95% confidence threshold
Quote Pre-fill: 35.46% automated coverage
Throughput: 26x faster feature extraction vs Swin/Clay
Inference: Optimized batch sizes for large-scale raster tiles
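
The precision and coverage figures above can be read as a selective-prediction trade-off. The sketch below shows one way such numbers could be computed, assuming precision is measured only over predictions whose calibrated confidence clears the threshold and coverage is the share of properties that clear it. Those definitions and the helper name `precision_and_coverage` are assumptions for illustration, not the project's evaluation code.

```python
def precision_and_coverage(confidences, predictions, truths, threshold=0.95):
    """Return (precision, coverage) for predictions at or above the threshold."""
    kept = [
        (pred, truth)
        for conf, pred, truth in zip(confidences, predictions, truths)
        if conf >= threshold
    ]
    coverage = len(kept) / len(confidences) if confidences else 0.0
    if not kept:
        # Nothing cleared the threshold, so precision is undefined.
        return None, coverage
    correct = sum(1 for pred, truth in kept if pred == truth)
    return correct / len(kept), coverage


precision, coverage = precision_and_coverage(
    confidences=[0.99, 0.97, 0.60, 0.96, 0.50],
    predictions=["a", "b", "a", "b", "a"],
    truths=["a", "b", "b", "a", "a"],
)
print(precision, coverage)
```

Raising the threshold typically raises precision and lowers coverage, which is why the two numbers are reported together.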

Engineering Quality

Containerization: Dockerized inference container for AWS deployment
Pipelines: End-to-end data pipelines from TIFF raster processing
Testing: Cross-validated calibration curves to verify reliability
Artifacts: Standardized JSON metadata output for database ingestion

property-attribute-pipeline

# Running self-supervised feature extraction
$ python predict.py --input satellite_img.tif

Feature Extraction:  26x Speedup vs Clay
Building Footprints: MS Footprints + U-Net
Calibration:         96.83% Precision (35.46% Coverage)

ACTUARIAL THRESHOLD MET: Pre-fill payloads generated.

Featured Project

Paper Trader AI

I built a trading system that runs itself. Three strategies compete head-to-head on the S&P 500: momentum, XGBoost, and LSTM. Backtested on 9 years of data, now running live with automated daily trades.

Live results (Oct 2025 – May 2026): Momentum leads with a +18.8% return versus +3.94% for SPY (+14.86% excess) and a 1.54 Sharpe ratio. Paper trading only.

+18.8% Return (SPY +3.94%)

1.54 Sharpe Ratio

3 Models Live

4.3M+ Data Points

Python
XGBoost
TensorFlow
Pandas
GitHub Actions
Streamlit

Live Dashboard GitHub

Technical Details & Risk Metrics

Risk Management

Max Drawdown: -9.1% (vs SPY -5.1%)
Stop-Loss: 15% per position
Position Limits: 15% per stock, 30% per sector
Transaction Costs: 5 bps slippage modeled
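
The risk controls listed above could be expressed roughly as follows. This is a hedged sketch, not the platform's code: the helper names (`max_drawdown`, `cap_weights`, `fill_price`) are hypothetical, weights are assumed to be fractions of portfolio value, and any weight removed by a cap is assumed to stay in cash.

```python
def max_drawdown(equity_curve):
    """Largest peak-to-trough decline as a negative fraction, e.g. -0.091."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def cap_weights(weights, sectors, max_stock=0.15, max_sector=0.30):
    """Clip each stock to max_stock, then scale down sectors above max_sector."""
    capped = {ticker: min(w, max_stock) for ticker, w in weights.items()}
    sector_totals = {}
    for ticker, w in capped.items():
        sector = sectors[ticker]
        sector_totals[sector] = sector_totals.get(sector, 0.0) + w
    for ticker in capped:
        total = sector_totals[sectors[ticker]]
        if total > max_sector:
            capped[ticker] *= max_sector / total
    return capped


def fill_price(mid_price, side, slippage_bps=5):
    """Apply slippage against the trader: buys fill higher, sells lower."""
    adjustment = mid_price * slippage_bps / 10_000
    return mid_price + adjustment if side == "buy" else mid_price - adjustment


print(max_drawdown([100, 110, 99, 120]))
print(fill_price(100.0, "buy"))
```

Applying the stock cap before the sector cap keeps both limits satisfied, because scaling a sector down can only reduce individual weights further.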

Strategy Breakdown

Momentum: 12-1 month price factor (Fama-French)
ML: XGBoost + 15 features (RSI, MACD, etc.)
DL: LSTM with 60-day sequences
Universe: S&P 500 (503 stocks)
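
As an illustration of the 12-1 momentum signal named above, the sketch below computes the return from 12 months ago to 1 month ago, skipping the most recent month. It assumes month-end closing prices ordered oldest to newest; the function names, the `top_n` parameter, and the tickers are hypothetical and not taken from the project.

```python
def momentum_12_1(monthly_closes):
    """Return from 12 months ago to 1 month ago, skipping the latest month."""
    if len(monthly_closes) < 13:
        return None  # not enough history to score this stock
    return monthly_closes[-2] / monthly_closes[-13] - 1.0


def rank_by_momentum(price_history, top_n=10):
    """Rank tickers by 12-1 momentum, highest first."""
    scores = {}
    for ticker, closes in price_history.items():
        score = momentum_12_1(closes)
        if score is not None:
            scores[ticker] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


history = {
    "AAA": [100.0] * 11 + [120.0, 90.0],   # strong up to last month
    "BBB": [100.0] * 11 + [105.0, 140.0],  # the latest spike is ignored
    "CCC": [100.0] * 5,                    # too little history
}
print(rank_by_momentum(history, top_n=2))
```

Skipping the latest month is the usual way to keep short-term reversal out of the signal, which is why "BBB" ranks below "AAA" here despite its recent jump.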

Engineering

Tests: 75 unit tests with CI/CD
Automation: 6 GitHub Actions workflows
Infrastructure: Docker-ready, SQLite cache
Data Quality: Point-in-time universe (no look-ahead bias)
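
A point-in-time universe means a backtest on a given date only sees the stocks that were index members on that date. The sketch below shows the idea under simple assumptions: membership is stored as (ticker, date added, date removed or None) records, and the tickers, dates, and function name `universe_as_of` are made up for illustration.

```python
from datetime import date


def universe_as_of(membership, as_of):
    """Tickers that were index members on the given date."""
    return sorted(
        ticker
        for ticker, added, removed in membership
        if added <= as_of and (removed is None or as_of < removed)
    )


membership = [
    ("AAA", date(2015, 1, 1), None),
    ("BBB", date(2015, 1, 1), date(2020, 6, 1)),  # later dropped
    ("CCC", date(2021, 3, 1), None),              # added later
]
print(universe_as_of(membership, date(2019, 1, 1)))
print(universe_as_of(membership, date(2022, 1, 1)))
```

Using today's constituent list for historical dates would quietly include later additions such as "CCC" and exclude dropped names such as "BBB", which is the look-ahead bias this check avoids.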

paper-trader

# Paper Trader AI - Live Performance
$ python main.py --status

System:  LIVE since Oct 2025
Leader:  Momentum +18.8% Return (vs SPY +3.94%)
Sharpe:  1.54 | Excess: +14.86%

All 3 models running autonomously.
View live dashboard for real-time updates →

Featured Project

Amazon Hybrid RAG Assistant

A high-performance product search assistant: an end-to-end Retrieval-Augmented Generation system using hybrid lexical-semantic retrieval (BM25 + FAISS) and Reciprocal Rank Fusion. Data storage and indexing structures are optimized to minimize server footprint.

Performance: Indexed 94K+ products, reducing startup latency by 96% and lowering peak RAM from 6 GB to <2 GB.

-96% Latency Cut

< 2 GB Peak RAM

94K+ Products

BM25+FAISS Hybrid

Python
DuckDB
Parquet
FAISS
Ollama
Hugging Face
MLflow

GitHub

Technical Details & System Metrics

Hybrid Retrieval

Lexical Search: BM25 keyword matching for exact product IDs
Semantic Search: FAISS dense embeddings for conceptual queries
Rank Fusion: Reciprocal Rank Fusion (RRF) to merge the lexical and semantic ranked lists
Generator: Local Llama-3 models via Ollama server
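
The fusion step above can be sketched in a few lines. This is a minimal, unweighted version of Reciprocal Rank Fusion, assuming each retriever returns product IDs ordered best first; the constant `k=60` is a commonly used default rather than a value taken from this project, and the product IDs are placeholders.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for the items it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["p1", "p2", "p3"]   # lexical ranking (placeholder IDs)
dense_hits = ["p3", "p1", "p4"]  # semantic ranking (placeholder IDs)
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
```

Because RRF uses only rank positions, BM25 scores and vector similarities never need to be put on a common scale before merging.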

Optimization

Startup Latency: Decreased by 96% through lazy-loaded vector files
RAM Footprint: Cut from ~6.1 GB to <2 GB via chunked loading
Storage: Structured data stored in optimized Parquet formats
Ingestion: In-memory DuckDB ETL pipelines for fast batching
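
To show the general idea behind lazy and chunked loading, the sketch below reads embedding vectors from disk only when they are requested and never holds more than one chunk in memory. It is an illustration under stated assumptions, not the project's loader: the file layout (raw little-endian float32 rows), the class name `LazyVectorStore`, and the chunk size are all hypothetical.

```python
import struct


class LazyVectorStore:
    """Reads fixed-width float32 vectors from a binary file on demand."""

    def __init__(self, path, dim):
        # Nothing is read here, so construction cost is independent of file size.
        self.path = path
        self.dim = dim

    def iter_chunks(self, chunk_size=1024):
        """Yield lists of vectors, at most chunk_size vectors per list."""
        row_format = f"<{self.dim}f"
        row_bytes = struct.calcsize(row_format)
        with open(self.path, "rb") as handle:
            while True:
                buffer = handle.read(row_bytes * chunk_size)
                if not buffer:
                    break
                yield [list(row) for row in struct.iter_unpack(row_format, buffer)]


def count_vectors(store, chunk_size=1024):
    """Example consumer: count vectors while holding one chunk at a time."""
    return sum(len(chunk) for chunk in store.iter_chunks(chunk_size))
```

A consumer such as an index builder can iterate over `iter_chunks` and add each chunk to the index in turn, so peak memory scales with the chunk size rather than with the full file.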

MLOps & Reliability

Tracking: Logged hyperparameter runs and RRF alpha weights in MLflow
Validation: Ground-truth evaluation dataset for retrieval metrics
Caching: Redis-based caching of semantic embeddings
APIs: FastAPI server with async connection pooling

amazon-rag-assistant

# Optimizing hybrid search index
$ python index.py --products 94k

Retrieval:      BM25 + FAISS Hybrid
RAM Usage:      < 1.84 GB (was 6.1 GB)
Startup Time:   -96% Latency Cut

RRF ranking fusion initialized for 94K+ products.

03

Experience

Data Scientist (Capstone)

Square One Insurance Services | Vancouver, BC

Apr 2026 - Jun 2026

Developed an automated prediction pipeline using 0.5m x 0.5m resolution satellite imagery to pre-fill property attributes on insurance quote forms, reducing customer abandonment.
Masked building footprints using Microsoft Building Footprints, with a fallback to custom U-Net segmentation models when footprints are missing.
Extracted fixed features using Clay and DINOv2 foundation models, mapping them to an XGBoost classification head with data augmentation, fine-tuning, and regularization.
Performed prediction confidence calibration (Platt scaling, isotonic regression) to deliver high-certainty inputs necessary for actuarial pricing, tracking experiments via MLflow on AWS.

Python PyTorch DINOv2 U-Net Docker

Business Analyst Intern

Prudent Corporate Advisory Services

May 2024 - Sep 2024

Analyzed 5+ years of historical performance data for 50+ mutual funds to develop a data-driven model, improving product basket construction by 15%.
Built a business expansion model using Python and SQL for geospatial and revenue forecast analysis across 10 markets.
Conducted deep-dive revenue analysis across 15 branches and 4 sectors, identifying key performance drivers.

Python SQL Financial Analysis

Software Development Intern

Datamatics Global Services

May 2023 - Jun 2023

Authored and optimized complex SQL stored procedures handling 10,000+ daily transactions, reducing query latency by 25%.
Implemented dynamic data filtering and visualization using JavaScript and ASP.NET MVC, reducing report generation time by 40%.

SQL JavaScript ASP.NET

04

Technical Skills

Languages

Python SQL R C++ C Bash JavaScript HTML/CSS

Machine Learning & Deep Learning

PyTorch
TensorFlow
XGBoost
Scikit-learn
DINOv2
U-Net
SAM
Clay
Calibration (Platt, Isotonic)
Time-Series Forecasting
Computer Vision

Cloud, MLOps & Tools

AWS (Lambda, S3, ECR, EventBridge)
Docker
MLflow
GitHub Actions (CI/CD)
DuckDB
Parquet
FAISS
Ollama
Hugging Face
Streamlit
Git
PostgreSQL
SQLite
Excel (VBA)

Financial Analytics & Quant

Algorithmic Trading
Backtesting
Factor Investing
Risk Management (VaR, Drawdown)
Mean-Variance Optimization

05

Get In Touch

I'm actively seeking opportunities in Data Science and AI/ML Engineering. Whether you have a role in mind, a project to discuss, or just want to connect, I'd love to hear from you.

prabuddha.tamhane@gmail.com