W19-W20: AutoML + Reinforcement Learning

🎯

CartPole Explainer

W19D1 - Start Here

Visual guide to the CartPole environment. Understand actions, observations, rewards, and episode termination conditions.

Interactive Simulation · Visual Learning

🔧

Hyperparameters Explained

W19D1 - Must Read

Deep dive into Q-Learning hyperparameters. Interactive demos for learning rate, discount factor, epsilon, episodes, and bins.

Interactive Demos · Visual Explanations · Analogies

🤖

PPO Algorithm Explainer

W19D4 - Advanced

Interactive deep-dive into PPO. Covers RL basics, policy gradients, actor-critic, clipping mechanism, and hyperparameters.

Animated Diagrams · Interactive Sliders · Quiz

🏆

CartPole Solutions

Instructor Reference

15 different algorithms to solve CartPole! Compare classic control, value-based RL, policy gradients, and evolutionary methods.

15 Algorithms · Live Animation · Training Charts

👥

Team Role Assigner

W19D1

Assign and understand team roles: Runner, Maintainer, Analyst, Reviewer. Get responsibility checklists for each role.

Role Assignment · Checklists

📊

Eval Protocol Builder

W19D1

Build your evaluation protocol. Configure metrics, seeds, timestep budgets, and export eval_protocol.md.

Metric Selection · Seed Config · Export MD

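An evaluation protocol mostly comes down to a fixed config plus an aggregation rule. Below is a minimal sketch, assuming hypothetical field names (`seeds`, `timestep_budget`, `eval_episodes_per_seed`, ...) and a simple markdown layout; the builder's real eval_protocol.md may be structured differently. Reporting the spread across seeds and the worst seed alongside the mean guards against one lucky seed carrying the result.

```python
import statistics

# Hypothetical protocol fields; the builder's actual export format may differ.
PROTOCOL = {
    "env_id": "CartPole-v1",
    "seeds": [0, 1, 2, 3, 4],
    "timestep_budget": 50_000,
    "eval_episodes_per_seed": 10,
    "primary_metric": "mean_return",
}


def summarize(returns_by_seed):
    """Reduce {seed: [episode returns]} to per-seed means, then aggregate across seeds."""
    per_seed = [statistics.fmean(r) for r in returns_by_seed.values()]
    return {
        "mean_return": statistics.fmean(per_seed),
        "std_across_seeds": statistics.stdev(per_seed) if len(per_seed) > 1 else 0.0,
        "worst_seed": min(per_seed),
    }


def to_markdown(protocol, summary):
    """Render the config as a bullet list and the summary as a small table."""
    lines = ["# Evaluation Protocol", ""]
    lines += [f"- **{key}**: {value}" for key, value in protocol.items()]
    lines += ["", "| metric | value |", "| --- | --- |"]
    lines += [f"| {key} | {value:.2f} |" for key, value in summary.items()]
    return "\n".join(lines) + "\n"
```

Keeping the seeds and budget in the config, rather than in ad hoc command lines, is what lets another team rerun the same evaluation.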
📖

Runbook Generator

W19D2

Generate docs/runbook.md for your project. Document runner assignment, environment setup, run steps, and artifacts.

Documentation · Reproducibility

🐍

W19D2 Python Starter

W19D2 - Team Activity

Python Q-Learning with live matplotlib graphs! Auto-manages a venv and generates an HTML report. Teams fork this and modify specific functions.

Auto Venv · Live Graphs · HTML Report

🛠

GitHub Workflow Tutorial

W19D2 - Must Read

Step-by-step guide to the Git/GitHub workflow. Learn branching, commits, PRs, and code review with interactive exercises.

7 Steps · Commands · Quizzes

🔍

Research Topics

W19D2 - Pick One!

4 improvement areas for Q-Learning. Each team member picks one: Learning Rate, Exploration, State Bins, or Reward Shaping.

4 Topics · Code Examples · Resources

📦

Q-Learning Config Builder

W19D1 - Build & Run

Build your Q-Learning configuration with presets. Export JSON or copy Python code directly into vanilla_starter.py.

5 Presets · Python Code · JSON Export

🔬

HPO Builder (Optuna)

W19D4

Configure Optuna hyperparameter optimization. Define search space, trial budgets, and generate Python code.

Search Space · Trial Planning · Code Gen

🚀

PR Workflow Guide

W20D1-D2

Interactive guide to Git branching, PRs, and code review. Learn best practices for OSS contribution.

Git Flow · Review Tips

🧪

Experiment Designer

W20D4

Design A/B experiments comparing baseline vs candidate. Configure sample sizes, decision rules, and understand bootstrap CIs.

A/B Design · Bootstrap CI · Export Brief

📚

Concepts Reference

Team Learning

5-team jigsaw activity. Each team learns and teaches key concepts. Includes quizzes and progress tracking.

Jigsaw Activity · Quizzes

🐍 Scripts & Starters

Starter templates and instructor tools for hands-on training

vanilla_starter.py

W19D1: Start here! Random → Hand-coded → Q-Learning progression.

w19d2_starter.py

W19D2: Python Q-Learning with auto-venv, live graphs, and an HTML report. Teams fork this.

baseline_starter.py

W19D4: PPO training with Stable-Baselines3 (advanced).

hpo_starter.py

W19D4: Optuna hyperparameter optimization for PPO.

instructor_runner_qlearning.py

Instructor: Run student Q-Learning configs and generate a leaderboard.

demo_player.py

Instructor: Visual CartPole with live training graphs.

💡 Key Concepts Overview

🎯 CartPole Environment

Observation space (4D)
Action space (left/right)
Reward structure (+1 per step)
Episode termination
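To make the four observation values, the two actions, the +1 reward, and the termination rules concrete, here is a minimal stdlib sketch of the cart-pole dynamics. The constants and thresholds (12° pole angle, ±2.4 cart position, 500-step time limit) are our reproduction of what Gymnasium's CartPole-v1 uses; treat them as assumptions and check them against your installed version. `step` and `run_episode` are illustrative helpers, not functions from the course scripts.

```python
import math
import random

# Constants mirroring Gymnasium's CartPole-v1 (assumption: verify against your version).
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_POLE_LENGTH = 0.5
POLE_MASS_LENGTH = MASS_POLE * HALF_POLE_LENGTH
FORCE_MAG = 10.0
TAU = 0.02                            # seconds per step
THETA_LIMIT = 12 * 2 * math.pi / 360  # 12 degrees, about 0.2095 rad
X_LIMIT = 2.4                         # cart position bound
MAX_STEPS = 500                       # time limit (truncation, not failure)


def step(state, action):
    """One Euler step. state = (x, x_dot, theta, theta_dot); action 0 = push left, 1 = push right."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLE_MASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_POLE_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS)
    )
    x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS
    # Euler update: positions advance with the old velocities.
    new_state = (
        x + TAU * x_dot,
        x_dot + TAU * x_acc,
        theta + TAU * theta_dot,
        theta_dot + TAU * theta_acc,
    )
    terminated = abs(new_state[0]) > X_LIMIT or abs(new_state[2]) > THETA_LIMIT
    return new_state, 1.0, terminated  # +1 reward for every step taken


def run_episode(policy, seed=0):
    """Total reward of one episode = steps survived before failure or the time limit."""
    rng = random.Random(seed)
    state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))  # small random start
    total = 0.0
    for _ in range(MAX_STEPS):
        state, reward, terminated = step(state, policy(state))
        total += reward
        if terminated:
            break
    return total
```

`run_episode(lambda s: 1)` (always push right) fails quickly, while `run_episode(lambda s: 1 if s[2] > 0 else 0)` pushes toward the side the pole leans, the kind of hand-coded rule the Random → Hand-coded → Q-Learning progression starts from. In real runs use Gymnasium directly: `env = gymnasium.make("CartPole-v1")`, `obs, info = env.reset(seed=0)`, then `obs, reward, terminated, truncated, info = env.step(action)`, where `terminated` flags the failure conditions and `truncated` the time limit.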

📚 Q-Learning (W19D1)

Learning rate (alpha)
Discount factor (gamma)
Epsilon-greedy exploration
State discretization (bins)
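The four Q-Learning knobs above fit in a few lines of tabular code. This is a minimal sketch, assuming a dict-backed Q-table and hypothetical helper names (`discretize`, `choose_action`, `q_update`); vanilla_starter.py and w19d2_starter.py may organize the same ideas differently. The update is the standard Q-learning rule Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') − Q(s,a)].

```python
import random


def discretize(obs, bounds, n_bins):
    """Map a continuous observation to a tuple of bin indices (one per dimension).

    bounds: list of (low, high) per dimension; values outside are clipped first.
    """
    idx = []
    for value, (low, high) in zip(obs, bounds):
        clipped = min(max(value, low), high)
        frac = (clipped - low) / (high - low)
        idx.append(min(int(frac * n_bins), n_bins - 1))  # value == high lands in the last bin
    return tuple(idx)


def choose_action(q, state, epsilon, n_actions, rng):
    """Epsilon-greedy: random action with probability epsilon, else the best known one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q.get((state, a), 0.0) for a in range(n_actions)]
    return values.index(max(values))


def q_update(q, state, action, reward, next_state, done, alpha, gamma, n_actions):
    """One Q-learning step; no bootstrapping from the next state after a terminal step."""
    best_next = 0.0 if done else max(q.get((next_state, a), 0.0) for a in range(n_actions))
    target = reward + gamma * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]
```

Clipping before binning matters for CartPole because the two velocity components have no fixed bounds; the `bounds` you pass in are your design choice, and together with `n_bins` they decide how coarse the state table is.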

🤖 PPO Algorithm (W19D4)

Policy gradient method
Clipped objective
Actor-critic architecture
Advantage estimation
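The clipped objective and advantage estimation can be traced on plain lists. A minimal sketch, assuming the probability ratios r = π_new(a|s) / π_old(a|s), the rewards, and the critic's value estimates are already computed; `gae` and `clipped_surrogate` are illustrative names, not Stable-Baselines3 internals. Advantages here use Generalized Advantage Estimation (GAE), the estimator PPO is commonly paired with; the defaults gamma=0.99, lam=0.95, clip_eps=0.2 match Stable-Baselines3's PPO defaults for `gamma`, `gae_lambda`, and `clip_range`.

```python
def gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    values[t] is the critic's V(s_t); last_value is V(s_T) for bootstrapping the final step.
    dones[t] is True if the episode ended at step t (no bootstrapping across it).
    """
    advantages = [0.0] * len(rewards)
    next_value, next_adv = last_value, 0.0
    for t in reversed(range(len(rewards))):
        mask = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * mask - values[t]  # TD error
        next_adv = delta + gamma * lam * mask * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]  # regression targets for the critic
    return advantages, returns


def clipped_surrogate(ratios, advantages, clip_eps=0.2):
    """PPO-Clip objective (to maximize): mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(r * a, clipped_r * a)
    return total / len(ratios)
```

Taking the min means the actor earns no extra credit for pushing the ratio past 1 ± clip_eps in the direction the advantage favors, which keeps each policy update small; the critic's values feed the advantages, which is where the actor-critic split shows up.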

🔬 Optuna HPO

Trials and studies
Objective functions
Search space definition
Pruning strategies
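Trials, objectives, search spaces, and pruning are easiest to see in a toy loop. The sketch below is plain Python, not Optuna: it samples a hypothetical search space (`alpha` on a log scale, `gamma`, `n_bins`), scores each trial with a made-up `toy_objective` standing in for mean episode return, and applies a simplified median-pruning rule. In Optuna the same roles are played by `optuna.create_study(direction="maximize", pruner=optuna.pruners.MedianPruner())`, `trial.suggest_float(..., log=True)` and `trial.suggest_int(...)`, `trial.report(value, step)` with `trial.should_prune()`, and `study.optimize(objective, n_trials=...)`.

```python
import math
import random
import statistics


def sample_params(rng):
    """Hypothetical search space: log-uniform alpha, uniform gamma, integer bin count."""
    return {
        "alpha": 10 ** rng.uniform(-3, 0),  # 0.001 .. 1 on a log scale
        "gamma": rng.uniform(0.90, 0.999),
        "n_bins": rng.randint(4, 12),       # inclusive on both ends
    }


def toy_objective(params, step):
    """Made-up stand-in for 'mean return at checkpoint `step`' (higher is better)."""
    quality = (
        -abs(math.log10(params["alpha"]) + 1.0)  # best near alpha = 0.1
        - abs(params["gamma"] - 0.99) * 10
        - abs(params["n_bins"] - 8) / 8
    )
    return (step + 1) * (1.0 + quality)


def run_study(n_trials=30, n_steps=5, min_peers=5, seed=0):
    """Random search with a simplified median-pruning rule; returns (best, n_pruned)."""
    rng = random.Random(seed)
    completed = []  # intermediate curves of trials that ran to the end
    best, n_pruned = None, 0
    for _ in range(n_trials):
        params = sample_params(rng)
        curve, pruned = [], False
        for step in range(n_steps):
            value = toy_objective(params, step)
            curve.append(value)
            peers = [c[step] for c in completed]
            # Prune once enough trials have finished and this one is below their median.
            if len(peers) >= min_peers and value < statistics.median(peers):
                pruned = True
                break
        if pruned:
            n_pruned += 1
            continue
        completed.append(curve)
        if best is None or curve[-1] > best[0]:
            best = (curve[-1], params)
    return best, n_pruned
```

Pruning pays off when each trial is an expensive training run: a trial that is already behind the median at an early checkpoint stops there instead of consuming its full timestep budget.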

🧪 A/B Experimentation

Baseline vs candidate
Statistical significance
Bootstrap confidence intervals
Decision rules
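A percentile bootstrap for the baseline-vs-candidate difference needs only the standard library. A minimal sketch, assuming each list holds one score per independent run (for example mean return per seed); the 95% level and the "interval must exclude zero" rule in `decide` are example choices, not the Experiment Designer's defaults.

```python
import random
import statistics


def bootstrap_diff_ci(baseline, candidate, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean(candidate) - mean(baseline).

    Returns (point_estimate, ci_low, ci_high).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        b = rng.choices(baseline, k=len(baseline))  # resample each arm with replacement
        c = rng.choices(candidate, k=len(candidate))
        diffs.append(statistics.fmean(c) - statistics.fmean(b))
    diffs.sort()
    k = round((1 - level) / 2 * n_resamples)  # e.g. 50 of 2000 for a 95% interval
    point = statistics.fmean(candidate) - statistics.fmean(baseline)
    return point, diffs[k], diffs[n_resamples - 1 - k]


def decide(lo, hi):
    """Example decision rule: act only when the whole interval sits on one side of zero."""
    if lo > 0:
        return "adopt candidate"
    if hi < 0:
        return "keep baseline"
    return "inconclusive"
```

With only a handful of seeds per arm the interval is wide, so "inconclusive" is a common and honest outcome; sample size is the lever that narrows it.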

🚀 OSS Workflow

Feature branches
Pull request process
Code review norms
Merge strategies

👥 Team Roles

Runner: Execute experiments
Maintainer: Manage repo
Analyst: Interpret results
Reviewer: Quality control