The Big Idea
W19: Q-Table
Store Q(s,a) for every state-action pair
q_table[state][action] = value
W20: Neural Network
LEARN to predict Q(s,a) for any state
q_network(state) → [Q(a0), Q(a1)]
Same goal (find optimal actions), different method (function approximation)
1 Why Q-Tables Break
In W19, we discretized CartPole's continuous states into bins. This works for small problems, but...
CartPole (4 variables)
10 bins each = 10,000 states
Q-table works fine!
Atari Game (84x84 pixels)
256 values per pixel = 256^7056 states (84 x 84 = 7,056 pixels)
More states than atoms in universe!
The Problem: Q-tables require visiting every state. For complex environments,
we need generalization — making good decisions for states we've never seen before.
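To make the scale concrete, the two state counts above can be checked in a few lines of Python. This is only a quick sketch: the variable names are illustrative, and the figure of roughly 10^80 atoms in the observable universe is a commonly quoted estimate, not a number taken from this lesson.

```python
import math

# CartPole: 4 state variables, 10 bins each
cartpole_states = 10 ** 4

# Atari: 84 x 84 pixels, 256 possible values per pixel
num_pixels = 84 * 84

# 256 ** num_pixels has thousands of digits, so report its power of ten instead
atari_exponent = num_pixels * math.log10(256)

# Assumption: ~10^80 atoms in the observable universe (common rough estimate)
ATOMS_EXPONENT = 80

print(f"CartPole table: {cartpole_states:,} states")
print(f"Atari: 256^{num_pixels} is about 10^{atari_exponent:.0f} states")
print(f"Larger than 10^{ATOMS_EXPONENT}? {atari_exponent > ATOMS_EXPONENT}")
```

A table with one row per state is fine at ten thousand entries and hopeless at the second scale, which is why the rest of the session replaces the table with a learned function.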
2 What is a Neural Network?
A neural network is a function that learns patterns from data. Think of it as:
[Diagram: Input Layer (4 state values: pos, vel, ang, a.v) → Hidden Layer (h1, h2, h3, ...; learns patterns) → Output Layer (Q-values: Q(L), Q(R))]
How It Works:
1. Forward Pass: Input flows through layers, each neuron computes
   output = activation(weights · inputs + bias)
2. Prediction: Final layer outputs a Q-value for each action
3. Learning: Compare prediction to target, adjust weights to reduce error
   (backpropagation + gradient descent)
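The forward pass and the learning step can be sketched for a single neuron in plain Python, without PyTorch. Everything here is illustrative: the weights, bias, target, and learning rate are made-up numbers, and `neuron` and `train_step` are hypothetical helper names, not part of the workbook.

```python
def relu(x):
    """ReLU activation: pass positive values through, clamp negatives to 0."""
    return max(0.0, x)


def neuron(inputs, weights, bias):
    """Forward pass: output = activation(weights . inputs + bias)."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return relu(weighted_sum + bias)


def train_step(inputs, weights, bias, target, lr=0.1):
    """One gradient-descent step on the squared error (target - output)^2."""
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    error = target - relu(pre_activation)
    # ReLU lets the gradient through only where the pre-activation is positive
    slope = 1.0 if pre_activation > 0 else 0.0
    new_weights = [w + lr * 2 * error * slope * x
                   for w, x in zip(weights, inputs)]
    new_bias = bias + lr * 2 * error * slope
    return new_weights, new_bias


# Made-up CartPole-like state: position, velocity, angle, angular velocity
state = [0.1, -0.5, 0.02, 0.3]
weights = [0.4, -0.2, 1.0, 0.5]
bias = 0.1
target = 1.0

before = neuron(state, weights, bias)
weights, bias = train_step(state, weights, bias, target)
after = neuron(state, weights, bias)
print(f"prediction before: {before:.3f}, after one step: {after:.3f}")
```

A real network repeats this for every neuron in every layer, and backpropagation is the bookkeeping that computes all of those gradients at once.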
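import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    def __init__(self):
        super().__init__()  # required so nn.Module can register the layers
        self.fc1 = nn.Linear(4, 128)    # 4 state values -> 128 hidden units
        self.fc2 = nn.Linear(128, 128)
        self.fc3 = nn.Linear(128, 2)    # one Q-value per action

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # raw Q-values, no activation on the output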
3 From Q-Table to Q-Network
Q-Learning (W19)
s = discretize(state)
q_value = q_table[s][a]
q_table[s][a] += lr * (target - q_table[s][a])
Deep Q-Learning (W20)
state_tensor = torch.tensor(state, dtype=torch.float32)
q_values = q_network(state_tensor)
loss = (target - q_values[action]) ** 2
loss.backward()
Key Insight: The neural network generalizes! After seeing some states,
it can predict Q-values for similar states it's never encountered.
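Both snippets use a `target` without showing where it comes from. The sketch below assumes the standard Q-learning target (reward plus the discounted best Q-value of the next state) and puts the table update and the network loss side by side. The discount factor, learning rate, and all Q-values are made-up numbers, and the function names are hypothetical.

```python
GAMMA = 0.99  # assumed discount factor


def td_target(reward, next_q_values, done, gamma=GAMMA):
    """Standard Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward  # no future reward after a terminal state
    return reward + gamma * max(next_q_values)


def tabular_update(q_table, s, a, target, lr=0.1):
    """W19 style: nudge one table entry toward the target."""
    q_table[s][a] += lr * (target - q_table[s][a])


def squared_loss(q_values, action, target):
    """W20 style: the error the network's weights are trained to reduce."""
    return (target - q_values[action]) ** 2


# Made-up table with two states and two actions
q_table = {0: [0.0, 0.0], 1: [1.0, 2.0]}

target = td_target(reward=1.0, next_q_values=q_table[1], done=False)
loss_before = squared_loss(q_table[0], action=1, target=target)
tabular_update(q_table, s=0, a=1, target=target)
loss_after = squared_loss(q_table[0], action=1, target=target)

print(f"target = {target:.2f}")
print(f"Q(0, 1) after update = {q_table[0][1]:.3f}")
print(f"squared error: {loss_before:.3f} -> {loss_after:.3f}")
```

The target is the same in both weeks. The difference is what gets adjusted: one table cell in W19, every weight of the network in W20, which is what lets nearby states benefit from the same update.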
4 DQN Improvements (Toggle Switches!)
Just like W19, we have improvement switches to explore:
USE_EXPERIENCE_REPLAY = False   # Store past experiences, sample random batches
USE_TARGET_NETWORK = False      # Separate network for stable Q-targets
USE_DOUBLE_DQN = False          # Reduce Q-value overestimation
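As a rough picture of what the first switch turns on, here is a minimal replay buffer built only from the standard library. It is a sketch under assumptions: the class name, the capacity, and the transition layout `(state, action, reward, next_state, done)` are illustrative choices, not necessarily what the workbook uses.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions (illustrative sketch)."""

    def __init__(self, capacity=10_000):
        # A deque with maxlen silently drops the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement mixes distant time steps
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Usage with made-up transitions: the "state" is just the step index here
buffer = ReplayBuffer(capacity=100)
for step in range(150):
    buffer.push(state=step, action=step % 2, reward=1.0,
                next_state=step + 1, done=False)

batch = buffer.sample(batch_size=8)
print(len(buffer))            # capped at the capacity
print([t[0] for t in batch])  # states drawn from scattered time steps
```

Training on `batch` instead of the most recent transition is the whole technique: the network sees a shuffled mix of old and new experience rather than a run of near-identical consecutive steps.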
Why These Matter:
1. Experience Replay: Without it, we train on correlated, sequential data.
   Replay breaks correlations by sampling random past experiences.
2. Target Network: Without it, we're chasing a moving target (the same
   network predicts AND provides targets). A frozen copy, refreshed only periodically, provides stable targets.
3. Double DQN: Standard DQN overestimates Q-values because the same max both
   selects and evaluates the next action. Use the online network to select the action and the target network to evaluate it.
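The difference between the standard and Double DQN targets, plus the periodic target-network refresh, can be sketched in plain Python on lists of Q-values. This assumes the target network is enabled. The discount factor, sync interval, and Q-values are made-up, and the function names are hypothetical.

```python
import copy

GAMMA = 0.99      # assumed discount factor
SYNC_EVERY = 100  # assumed number of steps between target-network refreshes


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


def dqn_target(reward, next_q_target, done, gamma=GAMMA):
    """Standard DQN: the target network both picks and scores the action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=GAMMA):
    """Double DQN: online network picks the action, target network scores it."""
    if done:
        return reward
    best_action = argmax(next_q_online)
    return reward + gamma * next_q_target[best_action]


def maybe_sync(step, online_weights, target_weights, every=SYNC_EVERY):
    """Return a fresh frozen copy every `every` steps, else keep the old one."""
    if step % every == 0:
        return copy.deepcopy(online_weights)
    return target_weights


# Made-up Q-values for the next state, one entry per action (left, right)
next_q_online = [1.0, 3.0]
next_q_target = [2.5, 2.0]

standard = dqn_target(1.0, next_q_target, done=False)
double = double_dqn_target(1.0, next_q_online, next_q_target, done=False)
print(f"standard DQN target: {standard:.3f}")
print(f"Double DQN target:   {double:.3f}")
```

Because Double DQN scores an action chosen by a different network, its target can never exceed the standard one for the same inputs, which is how it damps overestimation. With a PyTorch model, the refresh step is usually written as `target_net.load_state_dict(online_net.state_dict())` rather than a deep copy of a list.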
5 Tonight's Lab
$ python workbook.py
The workbook provides a guided interface: configure techniques, run experiments,
and compare results — all in one place. Your progress is saved automatically.
Your Mission:
1. Run Baseline: Start with all techniques OFF and observe unstable training (loss explodes)
2. Enable Core Techniques: Turn on Experience Replay + Target Network TOGETHER, since they're synergistic!
3. View Results: Compare experiments side-by-side in the results view
4. Explore Advanced: Try Double DQN, Prioritized Replay, and other techniques
6 Resources
RL Fundamentals
State, Action, Reward - the three pillars of reinforcement learning
Start Here →
Q-Table Explained
Interactive demo of Q-learning with visual Q-table updates
Explore Q-Tables →
How a Neuron Works
Step-by-step animation of a single neuron's computation
Learn the Basics →
Neural Network Explainer
Interactive visualization of how neural networks learn
Open Visualization →
DQN Techniques Reference
Complete guide to all DQN improvements with interactive visuals
Explore Techniques →
Reading Learning Graphs
How to interpret training curves and diagnose performance
Interpret Results →
DQN Paper (2015)
The original DeepMind paper that started it all
Nature Article →
Session Agenda
6:30 - 6:50  The Scaling Problem: Why Q-Tables Break
6:50 - 7:20  Neural Networks 101: Forward Pass & Learning
7:20 - 7:50  From Q-Table to Q-Network: The Translation
7:50 - 8:30  Build Block: Implement DQN Agent
8:30 - 9:00  DQN Improvements: Replay & Target Network
9:00 - 9:30  Compare Results + Wrap-up