W19-W20: AutoML + Reinforcement Learning

CartPole + Q-Learning + PPO + Optuna HPO + A/B Experimentation

Master automated machine learning pipelines with reinforcement learning, hyperparameter optimization, and rigorous experimentation

Learning Path

W19D1: Baseline + Measurement
W19D2: OSS Workflow
W19D4: HPO with Optuna
W20D1: Implementation
W20D2: PR Process
W20D4: A/B Testing

🛠 Your Workflow

1. Learn CartPole
2. Build Config
3. Submit to Instructor
4. Analyze Results
5. Create PR

🧰 Interactive Tools

🤖 PPO Algorithm Explainer

W19D4 - Advanced

Interactive deep-dive into PPO. Covers RL basics, policy gradients, actor-critic, the clipping mechanism, and hyperparameters.

Animated Diagrams · Interactive Sliders · Quiz
🏆 CartPole Solutions

Instructor Reference

15 different algorithms that solve CartPole! Compare classic control, value-based RL, policy gradients, and evolutionary methods.

15 Algorithms · Live Animation · Training Charts
👥 Team Role Assigner

W19D1

Assign and understand team roles: Runner, Maintainer, Analyst, Reviewer. Get responsibility checklists for each role.

Role Assignment · Checklists
📊 Eval Protocol Builder

W19D1

Build your evaluation protocol. Configure metrics, seeds, and timestep budgets, then export eval_protocol.md.

Metric Selection · Seed Config · Export MD
📖 Runbook Generator

W19D2

Generate docs/runbook.md for your project. Document runner assignment, environment setup, run steps, and artifacts.

Documentation · Reproducibility
🔬 HPO Builder (Optuna)

W19D4

Configure Optuna hyperparameter optimization: define the search space and trial budget, then generate Python code.

Search Space · Trial Planning · Code Gen
🚀 PR Workflow Guide

W20D1-D2

Interactive guide to Git branching, PRs, and code review. Learn best practices for OSS contribution.

Git Flow · Review Tips
🧪 Experiment Designer

W20D4

Design A/B experiments comparing a baseline against a candidate. Configure sample sizes and decision rules, and learn how bootstrap confidence intervals (CIs) work.

A/B Design · Bootstrap CI · Export Brief
📚 Concepts Reference

Team Learning

5-team jigsaw activity: each team learns key concepts, then teaches them to the others. Includes quizzes and progress tracking.

Jigsaw Activity · Quizzes
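The Eval Protocol Builder exports an eval_protocol.md file. As a rough idea of what such a protocol pins down (environment, seeds, timestep budget, metrics), here is a minimal sketch. The EvalProtocol class, its field names, and its defaults are hypothetical and are not the builder's actual output format.

```python
from dataclasses import dataclass, field


@dataclass
class EvalProtocol:
    """Hypothetical evaluation-protocol schema; field names are illustrative only."""
    env_id: str = "CartPole-v1"
    algorithm: str = "Q-Learning"
    seeds: list[int] = field(default_factory=lambda: [0, 1, 2, 3, 4])
    timestep_budget: int = 50_000          # training steps allowed per seed
    eval_episodes: int = 20                # evaluation episodes per trained seed
    metrics: list[str] = field(
        default_factory=lambda: ["mean_return", "std_return", "median_return"]
    )

    def to_markdown(self) -> str:
        """Render the protocol as a small Markdown document."""
        lines = [
            "# Evaluation Protocol",
            "",
            f"- Environment: {self.env_id}",
            f"- Algorithm: {self.algorithm}",
            f"- Seeds: {', '.join(str(s) for s in self.seeds)}",
            f"- Timestep budget per seed: {self.timestep_budget:,}",
            f"- Evaluation episodes per seed: {self.eval_episodes}",
            "- Metrics:",
        ]
        lines += [f"  - {m}" for m in self.metrics]
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print to stdout; redirect into eval_protocol.md if you want the file.
    print(EvalProtocol().to_markdown())
```

Fixing the seed list and budget before any run is what makes baseline and candidate results comparable later.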

🐍 Scripts & Starters

Starter templates and instructor tools for hands-on training

vanilla_starter.py

W19D1: Start here! Random → Hand-coded → Q-Learning progression.

w19d2_starter.py

W19D2: Python Q-Learning with auto-venv, live graphs, HTML report. Teams fork this.

baseline_starter.py

W19D4: PPO training with Stable-Baselines3 (advanced).

hpo_starter.py

W19D4: Optuna hyperparameter optimization for PPO.

instructor_runner_qlearning.py

Instructor: Run student Q-Learning configs, generate leaderboard.

demo_player.py

Instructor: Visual CartPole with live training graphs.

💡 Key Concepts Overview

🎯 CartPole Environment

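  • Observation space (4D: cart position, cart velocity, pole angle, pole angular velocity)
  • Action space (discrete: 0 = push left, 1 = push right)
  • Reward structure (+1 for every step, including the final one)
  • Episode termination (pole angle beyond ±12°, cart position beyond ±2.4, or truncation at 500 steps in CartPole-v1)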
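To make these four pieces concrete without installing anything, the sketch below re-implements the classic cart-pole equations (Euler integration, with the same constants Gymnasium's CartPole-v1 uses). It then compares a random policy against a simple hand-coded one, mirroring the first two stages of the vanilla_starter.py progression. The function names are illustrative and do not come from any starter script. For real experiments, use gymnasium.make("CartPole-v1").

```python
import math
import random

# Classic cart-pole constants (same values as Gymnasium's CartPole-v1)
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_LENGTH = 0.5                       # half the pole's length
POLE_MASS_LENGTH = MASS_POLE * HALF_LENGTH
FORCE_MAG = 10.0                        # magnitude of each push
TAU = 0.02                              # seconds between updates
THETA_LIMIT = 12 * 2 * math.pi / 360    # terminate past ±12 degrees
X_LIMIT = 2.4                           # terminate past ±2.4 from centre
MAX_STEPS = 500                         # CartPole-v1 truncation limit


def step(state, action):
    """One Euler step. action 0 = push left, 1 = push right."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLE_MASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS)
    )
    x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS
    new_state = (
        x + TAU * x_dot,
        x_dot + TAU * x_acc,
        theta + TAU * theta_dot,
        theta_dot + TAU * theta_acc,
    )
    terminated = abs(new_state[0]) > X_LIMIT or abs(new_state[2]) > THETA_LIMIT
    return new_state, 1.0, terminated   # +1 reward for every step taken


def run_episode(policy, rng):
    """Return the episode return, which equals the number of steps survived."""
    state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
    total = 0.0
    for _ in range(MAX_STEPS):
        state, reward, terminated = step(state, policy(state, rng))
        total += reward
        if terminated:
            break
    return total


def random_policy(state, rng):
    return rng.randint(0, 1)


def lean_policy(state, rng):
    # Hand-coded: push toward the side the pole is falling (angle + angular velocity)
    _, _, theta, theta_dot = state
    return 1 if theta + theta_dot > 0 else 0


if __name__ == "__main__":
    rng = random.Random(0)
    for name, policy in [("random", random_policy), ("hand-coded", lean_policy)]:
        returns = [run_episode(policy, rng) for _ in range(20)]
        print(f"{name:>10}: mean return {sum(returns) / len(returns):.1f}")
```

The hand-coded rule already outperforms random actions by a wide margin. That gap is the baseline a learned policy has to beat.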

📚 Q-Learning (W19D1)

  • Learning rate (alpha)
  • Discount factor (gamma)
  • Epsilon-greedy exploration
  • State discretization (bins)
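The four Q-Learning ingredients above fit into a few small functions. This is a sketch, not the w19d2_starter.py implementation. The clipping bounds and bin count are hypothetical choices: CartPole's velocity dimensions are unbounded, so some range must be picked.

```python
import random
from collections import defaultdict

# Hypothetical clipping bounds per observation dimension:
# cart position, cart velocity, pole angle (rad), pole angular velocity
BOUNDS = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.5, 3.5)]
N_BINS = 6


def discretize(obs, bounds=BOUNDS, n_bins=N_BINS):
    """Map a continuous observation to a tuple of bin indices (the Q-table key)."""
    key = []
    for value, (low, high) in zip(obs, bounds):
        value = min(max(value, low), high)                  # clip to the range
        idx = int((value - low) / (high - low) * n_bins)
        key.append(min(idx, n_bins - 1))                    # value == high -> last bin
    return tuple(key)


def epsilon_greedy(q, state, epsilon, rng, n_actions=2):
    """Explore with probability epsilon, otherwise pick the best known action."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q[(state, a)] for a in range(n_actions)]
    return values.index(max(values))


def q_update(q, state, action, reward, next_state, terminated, alpha, gamma, n_actions=2):
    """Tabular Q-learning:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    The bootstrap term is dropped when the episode terminated."""
    best_next = 0.0 if terminated else max(q[(next_state, a)] for a in range(n_actions))
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]
```

Use a `defaultdict(float)` as the table so unseen (state, action) pairs start at 0. Alpha controls how far each update moves, and gamma controls how much future reward counts. Epsilon is usually decayed over training so the agent explores early and exploits later.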

🤖 PPO Algorithm (W19D4)

  • Policy gradient method
  • Clipped objective
  • Actor-critic architecture
  • Advantage estimation
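A dependency-free sketch of two of these bullets: advantage estimation (using Generalized Advantage Estimation, the scheme Stable-Baselines3's PPO uses) and the clipped objective. In SB3 these map to the `gamma`, `gae_lambda`, and `clip_range` arguments of `PPO`, with defaults of 0.99, 0.95, and 0.2. The function names below are illustrative. The full PPO loss also adds value-function and entropy terms, which are omitted here.

```python
def compute_gae(rewards, values, last_value, terminated, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    values[t] is the critic's estimate V(s_t); last_value is V(s_T) for the
    state after the final step (ignored if that step terminated).
    """
    advantages = [0.0] * len(rewards)
    next_adv = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if terminated[t] else 1.0
        # TD error: how much better this step went than the critic predicted
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        next_adv = delta + gamma * lam * not_done * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    return advantages


def clip(x, low, high):
    return max(low, min(x, high))


def clipped_surrogate(ratios, advantages, clip_eps=0.2):
    """PPO clipped objective (maximized):
    mean_t min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t),
    where r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t)."""
    terms = [
        min(r * a, clip(r, 1 - clip_eps, 1 + clip_eps) * a)
        for r, a in zip(ratios, advantages)
    ]
    return sum(terms) / len(terms)
```

Taking the minimum means the policy gains nothing from pushing the probability ratio beyond 1 ± eps in the direction the advantage favors. This is what keeps each update close to the old policy. For example, with ratios [1.5, 0.5] and advantages [1, -1], the per-step terms are 1.2 and -0.8.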

🔬 Optuna HPO

  • Trials and studies
  • Objective functions
  • Search space definition
  • Pruning strategies
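The following stdlib-only sketch shows how a study, its trials, an objective, a search space, and median pruning fit together. It deliberately swaps in plain random search and a toy objective instead of Optuna and real PPO training. In Optuna, the same pieces are `optuna.create_study(direction="maximize", pruner=optuna.pruners.MedianPruner(n_startup_trials=...))`, `study.optimize(objective, n_trials=...)`, `trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)`, and `trial.report(value, step)` followed by `if trial.should_prune(): raise optuna.TrialPruned()`. Optuna's default sampler is TPE, not random search. The pruning rule here is simplified: it only compares against completed trials.

```python
import math
import random
import statistics


def sample_params(rng):
    """Search space: log-uniform learning rate, uniform discount factor."""
    return {
        "learning_rate": 10 ** rng.uniform(-5, -2),
        "gamma": rng.uniform(0.90, 0.9999),
    }


def toy_training_curve(params, n_checkpoints):
    """HYPOTHETICAL stand-in for a training run: yields an evaluation score at
    each checkpoint. By construction it peaks near lr=3e-4 and gamma=0.99."""
    quality = math.exp(
        -abs(math.log10(params["learning_rate"]) - math.log10(3e-4))
        - 10 * abs(params["gamma"] - 0.99)
    )
    for k in range(1, n_checkpoints + 1):
        yield 500 * quality * (1 - math.exp(-k / 2))


def run_study(n_trials=20, n_checkpoints=5, n_startup_trials=3, seed=0):
    """Random search with a simplified median-pruning rule."""
    rng = random.Random(seed)
    completed_curves = []   # intermediate scores of trials that ran to the end
    trials = []
    for trial_id in range(n_trials):
        params = sample_params(rng)
        curve, pruned = [], False
        for k, value in enumerate(toy_training_curve(params, n_checkpoints)):
            curve.append(value)
            # Prune if below the median of finished trials at the same checkpoint
            if len(completed_curves) >= n_startup_trials:
                if value < statistics.median(c[k] for c in completed_curves):
                    pruned = True
                    break
        if not pruned:
            completed_curves.append(curve)
        trials.append({"id": trial_id, "params": params,
                       "value": curve[-1], "pruned": pruned})
    best = max((t for t in trials if not t["pruned"]), key=lambda t: t["value"])
    return best, trials


if __name__ == "__main__":
    best, trials = run_study()
    n_pruned = sum(t["pruned"] for t in trials)
    print(f"best trial {best['id']}: {best['params']} "
          f"-> {best['value']:.1f} ({n_pruned} pruned)")
```

Pruning saves budget by stopping runs that are already trailing at an early checkpoint. The first `n_startup_trials` always run to completion so there is a median to compare against.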

🧪 A/B Experimentation

  • Baseline vs candidate
  • Statistical significance
  • Bootstrap confidence intervals
  • Decision rules
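One way to turn per-seed scores into a decision is a percentile bootstrap CI on the difference in means, as sketched below. The per-seed returns are made-up illustrative numbers, not real results. The decision rule (adopt only when the whole CI lies above zero) is one example rule, not the course's prescribed one.

```python
import random
import statistics


def bootstrap_diff_ci(baseline, candidate, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for mean(candidate) - mean(baseline).

    Each group is resampled with replacement independently, matching
    independent runs (e.g. one per seed) for each configuration.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        b = [rng.choice(baseline) for _ in baseline]
        c = [rng.choice(candidate) for _ in candidate]
        diffs.append(statistics.mean(c) - statistics.mean(b))
    diffs.sort()
    lower = diffs[int(alpha / 2 * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    point = statistics.mean(candidate) - statistics.mean(baseline)
    return point, lower, upper


def decide(lower, upper):
    """Example decision rule based on where the CI sits relative to zero."""
    if lower > 0:
        return "adopt candidate"
    if upper < 0:
        return "keep baseline"
    return "inconclusive: collect more seeds"


# Illustrative per-seed mean returns (made-up numbers, not real results)
baseline_returns = [200.0, 210.0, 190.0, 205.0, 195.0]
candidate_returns = [300.0, 310.0, 290.0, 305.0, 295.0]
point, lower, upper = bootstrap_diff_ci(baseline_returns, candidate_returns)
print(f"diff = {point:.1f}, 95% CI [{lower:.1f}, {upper:.1f}] -> {decide(lower, upper)}")
```

With only a handful of seeds per arm, the CI can be wide. An "inconclusive" outcome usually means the sample size (number of seeds) should be increased rather than the result being declared either way.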

🚀 OSS Workflow

  • Feature branches
  • Pull request process
  • Code review norms
  • Merge strategies

👥 Team Roles

  • Runner: Execute experiments
  • Maintainer: Manage repo
  • Analyst: Interpret results
  • Reviewer: Quality control

📦 Setup Requirements

# W19D1: Q-Learning (minimal)
pip install gymnasium numpy

# W19D4: PPO + HPO (advanced)
pip install gymnasium stable-baselines3 optuna tensorboard torch

# Optional: For visualization
pip install pygame matplotlib
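After running the installs above, a quick check can confirm which packages are importable without actually importing heavy ones like torch. This is a small convenience sketch, not one of the course scripts. Note that the pip name `stable-baselines3` is imported as `stable_baselines3`.

```python
import importlib.util

# pip package name -> importable module name
PACKAGES = {
    "gymnasium": "gymnasium",
    "numpy": "numpy",
    "stable-baselines3": "stable_baselines3",
    "optuna": "optuna",
    "tensorboard": "tensorboard",
    "torch": "torch",
    "pygame": "pygame",
    "matplotlib": "matplotlib",
}


def check_installed(packages=PACKAGES):
    """Return {pip_name: True/False} using find_spec, which locates but does not import."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in packages.items()
    }


if __name__ == "__main__":
    for name, ok in check_installed().items():
        print(f"{'ok     ' if ok else 'MISSING'} {name}")
```

For W19D1, only gymnasium and numpy need to show "ok". The remaining packages matter from W19D4 onward.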