From Q-Tables to Neural Networks

The leap that enabled AI to play Atari games

W20D1 - Deep Q-Learning

The Big Idea

W19: Q-Table

Store Q(s,a) for every state-action pair

q_table[state][action] = value

W20: Neural Network

LEARN to predict Q(s,a) for any state

q_network(state) → [Q(a0), Q(a1)]

Same goal (find optimal actions), different method (function approximation)

1 Why Q-Tables Break

In W19, we discretized CartPole's continuous states into bins. This works for small problems, but...

CartPole (4 variables)

10 bins each = 10,000 states

Q-table works fine!

Atari Game (84x84 pixels)

256 values per pixel = 256^7056 states (84x84 = 7,056 pixels)

More states than atoms in the observable universe!

The Problem: A Q-table only learns a value for a state-action pair it has actually visited, and visiting every state is impossible in complex environments. We need generalization: making good decisions for states we've never seen before.
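To get a feel for those numbers, here is a small sketch of the arithmetic. It counts decimal digits instead of printing the Atari number itself. The 10^80 figure for atoms in the observable universe is a commonly quoted rough estimate, not something from this course, and the variable names are made up for the example.

```python
import math

# CartPole: 4 state variables, 10 bins each (the W19 discretization).
cartpole_states = 10 ** 4

# Atari: 84 x 84 pixels, 256 possible values per pixel.
pixels = 84 * 84

# 256 ** pixels is far too large to read, so count its decimal digits instead.
atari_digits = math.floor(pixels * math.log10(256)) + 1

print(f"CartPole table rows: {cartpole_states:,}")
print(f"Atari pixels per frame: {pixels:,}")
print(f"256^{pixels} has {atari_digits:,} decimal digits")
print("Atoms in the observable universe (rough estimate): 10^80, i.e. 81 digits")
```

A table with a row for every Atari frame could never be stored, let alone filled in by experience.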

2 What is a Neural Network?

A neural network is a function that learns patterns from data. Think of it as:

[Diagram: Input Layer (4 state values: pos, vel, ang, a.v) → Hidden Layer (h1, h2, h3, ...; learns patterns) → Output Layer (Q-values: Q(L), Q(R))]

How It Works:

1. Forward Pass: Input flows through layers, each neuron computes output = activation(weights · inputs + bias)
2. Prediction: Final layer outputs Q-value for each action
3. Learning: Compare prediction to target, adjust weights to reduce error (backpropagation + gradient descent)
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    def __init__(self):
        super().__init__()  # Required so nn.Module can register the layers
        # 4 inputs (state) → 128 hidden → 128 hidden → 2 outputs (Q-values)
        self.fc1 = nn.Linear(4, 128)    # First layer
        self.fc2 = nn.Linear(128, 128)  # Hidden layer
        self.fc3 = nn.Linear(128, 2)    # Output: Q(left), Q(right)

    def forward(self, state):
        x = F.relu(self.fc1(state))     # ReLU activation
        x = F.relu(self.fc2(x))
        return self.fc3(x)              # Q-values for all actions
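The Forward Pass formula, output = activation(weights · inputs + bias), can also be sketched without PyTorch. This is a minimal illustration, not the lab's code: the tiny layer sizes (4 → 3 → 2), the random weights, the helper names `dense` and `random_layer`, and the sample state are all made up for the example.

```python
import random

def relu(x):
    """ReLU activation: pass positives through, clamp negatives to zero."""
    return max(0.0, x)

def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: each neuron computes activation(weights . inputs + bias)."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs

def random_layer(n_in, n_out):
    """Hypothetical helper: small random weights, zero biases."""
    weights = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

random.seed(0)
w1, b1 = random_layer(4, 3)  # input layer -> hidden layer
w2, b2 = random_layer(3, 2)  # hidden layer -> output layer

state = [0.02, -0.01, 0.03, 0.04]               # made-up values for pos, vel, ang, a.v
hidden = dense(state, w1, b1, activation=relu)  # hidden layer with ReLU
q_values = dense(hidden, w2, b2)                # no activation: Q-values may be negative
best_action = q_values.index(max(q_values))     # greedy action = largest Q-value

print("Q-values:", q_values)
print("Greedy action:", best_action)
```

With untrained random weights the Q-values are meaningless. Step 3 (learning) is what turns them into useful predictions.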

3 From Q-Table to Q-Network

Q-Learning (W19)

# Discretize state
discrete_state = discretize(state)

# Look up Q-value
q_value = q_table[discrete_state][action]

# Update rule
q_table[s][a] += lr * (target - q_table[s][a])

Deep Q-Learning (W20)

# Use raw continuous state
state_tensor = torch.tensor(state, dtype=torch.float32)

# Network predicts Q-values
q_values = q_network(state_tensor)

# Update via backprop
loss = (target - q_values[action]) ** 2
loss.backward()  # Compute gradients; the optimizer step then adjusts weights

Key Insight: The neural network generalizes! After seeing some states, it can predict Q-values for similar states it's never encountered.
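Both snippets above use a `target` without showing where it comes from. Here is a minimal sketch, assuming the standard one-step Q-learning target (reward plus the discounted best Q-value of the next state). `GAMMA`, `lr`, the function names, and all the numbers are illustrative assumptions, not values from the lab.

```python
GAMMA = 0.99  # assumed discount factor

def td_target(reward, next_q_values, done, gamma=GAMMA):
    """One-step Q-learning target: r + gamma * max over a' of Q(s', a')."""
    if done:
        return reward  # episode ended, so there is no future value to add
    return reward + gamma * max(next_q_values)

def squared_error(target, predicted_q):
    """The loss used in the Deep Q-Learning snippet: (target - prediction) squared."""
    return (target - predicted_q) ** 2

q_values = [1.0, 1.5]       # current predictions: Q(s, left), Q(s, right)
next_q_values = [2.0, 3.0]  # predictions for the next state
action, reward = 0, 1.0

target = td_target(reward, next_q_values, done=False)

# Q-table style: nudge the stored value toward the target.
lr = 0.1
new_table_value = q_values[action] + lr * (target - q_values[action])

# Q-network style: turn the same gap into a loss for backprop.
loss = squared_error(target, q_values[action])

print("target:", target)
print("table update:", new_table_value)
print("network loss:", loss)
```

The target is the same in both worlds. Only the way the gap gets closed changes: a direct table write versus a gradient step on the weights.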

4 DQN Improvements (Toggle Switches!)

Just like W19, we have improvement switches to explore:

USE_EXPERIENCE_REPLAY    False    Store past experiences, sample random batches
USE_TARGET_NETWORK       False    Separate network for stable Q-targets
USE_DOUBLE_DQN           False    Reduce Q-value overestimation

Why These Matter:

1. Experience Replay: Without it, we train on correlated, sequential data. Replay breaks correlations by sampling random past experiences.
2. Target Network: Without it, we're chasing a moving target (the same network predicts AND provides targets). A frozen copy provides stable targets.
3. Double DQN: Standard DQN overestimates Q-values because the same network both selects and evaluates the best next action. Double DQN uses the online network to select the action and the target network to evaluate it.
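The three ideas above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the workbook's implementation: `ReplayBuffer`, `dqn_target`, `double_dqn_target`, the dict standing in for network weights, the capacity, the discount of 0.99, and every number are hypothetical.

```python
import copy
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size memory of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest experiences drop off automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random sampling breaks the correlation between consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def dqn_target(reward, next_q_target, done, gamma=0.99):
    """Standard DQN: the target network both picks and scores the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)

def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN: the online network picks the action, the target network scores it."""
    if done:
        return reward
    best_action = next_q_online.index(max(next_q_online))
    return reward + gamma * next_q_target[best_action]

# 1. Experience replay: store transitions, then train on a random batch.
buffer = ReplayBuffer(capacity=1000)
for step in range(50):
    buffer.push(state=step, action=step % 2, reward=1.0, next_state=step + 1, done=False)
batch = buffer.sample(batch_size=8)

# 2. Target network: a frozen copy that only changes when we re-sync it.
online_weights = {"fc1": [0.1, 0.2]}            # stand-in for real network weights
target_weights = copy.deepcopy(online_weights)  # sync: copy online -> target
online_weights["fc1"][0] = 0.9                  # training changes only the online copy

# 3. Double DQN: compare the two targets on made-up next-state Q-values.
next_q_online = [1.0, 2.0]  # online network prefers action 1
next_q_target = [5.0, 3.0]  # target network has an inflated value for action 0
standard = dqn_target(1.0, next_q_target, done=False)
double = double_dqn_target(1.0, next_q_online, next_q_target, done=False)

print("batch size:", len(batch))
print("target copy still frozen:", target_weights["fc1"][0])
print("standard DQN target:", standard)
print("double DQN target:", double)
```

In this example the Double DQN target comes out lower because the inflated estimate for action 0 is never selected. With real PyTorch networks, the sync step is usually written as `target_net.load_state_dict(online_net.state_dict())`, where `target_net` and `online_net` are whatever you named your two networks.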

5 Tonight's Lab

$ python workbook.py

The workbook provides a guided interface: configure techniques, run experiments, and compare results — all in one place. Your progress is saved automatically.

Your Mission:

1. Run Baseline: Start with all techniques OFF and observe unstable training (loss explodes)
2. Enable Core Techniques: Turn on Experience Replay + Target Network TOGETHER; they're synergistic!
3. View Results: Compare experiments side-by-side in the results view
4. Explore Advanced: Try Double DQN, Prioritized Replay, and other techniques

6 Resources

RL Fundamentals: State, Action, Reward - the three pillars of reinforcement learning

Q-Table Explained: Interactive demo of Q-learning with visual Q-table updates

How a Neuron Works: Step-by-step animation of a single neuron's computation

Neural Network Explainer: Interactive visualization of how neural networks learn

DQN Techniques Reference: Complete guide to all DQN improvements with interactive visuals

Reading Learning Graphs: How to interpret training curves and diagnose performance

Experiment Results: View all your training graphs and data for analysis

DQN Paper (2015): The original DeepMind paper that started it all (Nature article)

Session Agenda

6:30 - 6:50  The Scaling Problem: Why Q-Tables Break
6:50 - 7:20  Neural Networks 101: Forward Pass & Learning
7:20 - 7:50  From Q-Table to Q-Network: The Translation
7:50 - 8:30  Build Block: Implement DQN Agent
8:30 - 9:00  DQN Improvements: Replay & Target Network
9:00 - 9:30  Compare Results + Wrap-up