CartPole + Q-Learning + PPO + Optuna HPO + A/B Experimentation
Master automated machine learning pipelines with reinforcement learning, hyperparameter optimization, and rigorous experimentation
Visual guide to the CartPole environment. Understand actions, observations, rewards, and episode termination conditions.
Deep dive into Q-Learning hyperparameters. Interactive demos for learning rate, discount factor, epsilon, episodes, and bins.
Interactive deep-dive into PPO. Covers RL basics, policy gradients, actor-critic, clipping mechanism, and hyperparameters.
15 different algorithms to solve CartPole! Compare classic control, value-based RL, policy gradients, and evolutionary methods.
Assign and understand team roles: Runner, Maintainer, Analyst, Reviewer. Get responsibility checklists for each role.
Build your evaluation protocol. Configure metrics, seeds, timestep budgets, and export eval_protocol.md.
Generate docs/runbook.md for your project. Document runner assignment, environment setup, run steps, and artifacts.
Python Q-Learning starter with live matplotlib graphs! It manages its own venv and generates an HTML report. Teams fork it and modify specific functions.
Step-by-step guide to Git/GitHub workflow. Learn branching, commits, PRs, and code review with interactive exercises.
4 improvement areas for Q-Learning. Each team member picks one: Learning Rate, Exploration, State Bins, or Reward Shaping.
Build your Q-Learning configuration with presets. Export JSON or copy Python code directly into vanilla_starter.py.
Configure Optuna hyperparameter optimization. Define search space, trial budgets, and generate Python code.
Interactive guide to Git branching, PRs, and code review. Learn best practices for OSS contribution.
Design A/B experiments that compare a baseline against a candidate. Configure sample sizes and decision rules, and understand bootstrap confidence intervals (CIs).
5-team jigsaw activity. Each team learns and teaches key concepts. Includes quizzes and progress tracking.
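As a rough illustration of the bootstrap CIs mentioned in the A/B experiment card, the sketch below computes a percentile bootstrap interval for the difference in mean return between a candidate and a baseline. The function name `bootstrap_diff_ci`, the sample numbers, and the "interval entirely above zero" decision rule are all assumptions made for this example, not part of the course tooling.

```python
import random
import statistics


def bootstrap_diff_ci(baseline, candidate, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(candidate) - mean(baseline)."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping the original sizes.
        b = rng.choices(baseline, k=len(baseline))
        c = rng.choices(candidate, k=len(candidate))
        diffs.append(statistics.fmean(c) - statistics.fmean(b))
    diffs.sort()
    low_idx = int(round((alpha / 2) * n_resamples))
    high_idx = int(round((1 - alpha / 2) * n_resamples)) - 1
    return diffs[low_idx], diffs[high_idx]


# Hypothetical mean episode returns, one value per seed (illustrative only).
baseline_returns = [110.0, 95.0, 130.0, 102.0, 121.0, 99.0, 115.0, 108.0]
candidate_returns = [160.0, 142.0, 175.0, 150.0, 168.0, 139.0, 171.0, 155.0]

ci_low, ci_high = bootstrap_diff_ci(baseline_returns, candidate_returns)

# Assumed decision rule: adopt the candidate only if the whole CI is above zero.
adopt_candidate = ci_low > 0
print(f"95% CI for improvement: [{ci_low:.1f}, {ci_high:.1f}], adopt: {adopt_candidate}")
```

Using one return per seed as the unit of resampling keeps the interval honest about seed-to-seed variance, which is usually the dominant source of noise in small RL comparisons.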
Starter templates and instructor tools for hands-on training
W19D1: Start here! Random → Hand-coded → Q-Learning progression.
W19D2: Python Q-Learning with auto-venv, live graphs, HTML report. Teams fork this.
W19D4: PPO training with Stable-Baselines3 (advanced).
W19D4: Optuna hyperparameter optimization for PPO.
Instructor: Run student Q-Learning configs, generate leaderboard.
Instructor: Visual CartPole with live training graphs.
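For orientation before opening the starters, here is a minimal sketch of the three pieces a tabular Q-Learning agent for CartPole typically needs: state binning, epsilon-greedy action selection, and the Q-value update. It uses only the standard library and is not taken from the course files; the function names, the observation bounds, and the velocity clipping range of ±3.0 are assumptions chosen for illustration.

```python
import random

N_ACTIONS = 2  # CartPole has two actions: 0 = push left, 1 = push right


def discretize(obs, lows, highs, n_bins):
    """Map each continuous observation component to a bin index in [0, n_bins - 1]."""
    state = []
    for x, lo, hi in zip(obs, lows, highs):
        x = min(max(x, lo), hi)  # clip to the assumed range
        frac = (x - lo) / (hi - lo)
        state.append(min(int(frac * n_bins), n_bins - 1))
    return tuple(state)


def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    values = q_table.get(state, [0.0] * N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: values[a])


def q_update(q_table, state, action, reward, next_state, done, learning_rate, discount):
    """One Q-Learning step: Q(s,a) += lr * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    values = q_table.setdefault(state, [0.0] * N_ACTIONS)
    next_best = 0.0 if done else max(q_table.get(next_state, [0.0] * N_ACTIONS))
    target = reward + discount * next_best
    values[action] += learning_rate * (target - values[action])


# Assumed bounds: cart position, cart velocity, pole angle (rad), pole angular velocity.
lows = [-2.4, -3.0, -0.21, -3.0]
highs = [2.4, 3.0, 0.21, 3.0]

rng = random.Random(0)
q_table = {}
state = discretize([0.0, 0.0, 0.0, 0.0], lows, highs, n_bins=10)
next_state = discretize([0.1, 0.5, 0.02, -0.4], lows, highs, n_bins=10)

# Apply a single hand-written transition: action 1, reward 1.0, episode not over.
q_update(q_table, state, 1, 1.0, next_state, False, learning_rate=0.1, discount=0.99)
greedy_action = choose_action(q_table, state, epsilon=0.0, rng=rng)
print(state, q_table[state], greedy_action)
```

The learning rate, discount factor, epsilon, and bin count appear here as plain arguments, which are the same knobs the hyperparameter cards above refer to.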
# W19D1: Q-Learning (minimal)
pip install gymnasium numpy
# W19D4: PPO + HPO (advanced)
pip install gymnasium stable-baselines3 optuna tensorboard torch
# Optional: For visualization
pip install pygame matplotlib
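After running the install commands, a quick way to confirm what is available is to check whether each package can be located without importing it. The sketch below is one way to do that with the standard library; the grouping labels and the helper name `missing_modules` are assumptions for this example. Note that the `stable-baselines3` distribution is imported as `stable_baselines3`.

```python
import importlib.util

# Import names corresponding to the pip packages listed above.
REQUIRED = {
    "W19D1 Q-Learning": ["gymnasium", "numpy"],
    "W19D4 PPO + HPO": ["gymnasium", "stable_baselines3", "optuna", "tensorboard", "torch"],
    "Optional visualization": ["pygame", "matplotlib"],
}


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


report = {group: missing_modules(names) for group, names in REQUIRED.items()}

for group, missing in report.items():
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"{group}: {status}")
```

Run it inside the same virtual environment used for the install so the result reflects the interpreter the starters will actually use.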