From Q-Tables to Neural Networks

The leap that enabled AI to play Atari games

W20D1 - Deep Q-Learning

The Big Idea

W19: Q-Table

Store Q(s,a) for every state-action pair

q_table[state][action] = value

W20: Neural Network

LEARN to predict Q(s,a) for any state

q_network(state) → [Q(a0), Q(a1)]

Same goal (find optimal actions), different method (function approximation)
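Either way, the agent ends up with one Q-value per action and picks the largest. Below is a minimal plain-Python sketch of that shared interface using toy numbers. `greedy_action` and the hand-written `q_network` stand-in are hypothetical illustrations, not the lab's code.

```python
def greedy_action(q_values):
    """Index of the largest Q-value (here 0 = left, 1 = right)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

# W19 style: look up the stored row for an already-discretized state
q_table = {(5, 5, 5, 5): [0.4, 1.2]}
row_from_table = q_table[(5, 5, 5, 5)]

# W20 style: compute the row from the raw continuous state.
# Hypothetical stand-in for a trained network: an arbitrary hand-written formula.
def q_network(state):
    pole_angle = state[2]
    return [1.0 - 10.0 * pole_angle, 1.0 + 10.0 * pole_angle]

row_from_network = q_network([0.01, -0.2, 0.05, 0.3])  # accepts any real-valued state

print(greedy_action(row_from_table))    # 1
print(greedy_action(row_from_network))  # 1
```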

1 Why Q-Tables Break

In W19, we discretized CartPole's continuous states into bins. This works for small problems, but...

CartPole (4 variables)

10 bins each = 10,000 states

Q-table works fine!

Atari Game (84×84 grayscale pixels)

256 values per pixel = 256⁷⁰⁵⁶ states

Far more states than atoms in the observable universe (~10⁸⁰)!
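You can check the arithmetic behind these counts directly. The sketch below assumes 2 actions for CartPole and a single grayscale frame for Atari. It works with the base-10 exponent because 256⁷⁰⁵⁶ is too large to print usefully.

```python
import math

# CartPole (W19): 4 state variables, 10 bins each, 2 actions
cartpole_states = 10 ** 4
cartpole_table_entries = cartpole_states * 2

# Atari: one 84x84 grayscale frame, 256 intensity values per pixel
num_pixels = 84 * 84
# Work with log10(256 ** num_pixels) instead of the huge integer itself
atari_log10_states = num_pixels * math.log10(256)

print(cartpole_states, cartpole_table_entries)  # 10000 20000
print(num_pixels)                               # 7056
print(f"Atari: about 10^{int(atari_log10_states)} possible frames")  # about 10^16992
print(atari_log10_states > 80)                  # True: atoms in the observable universe ~ 10^80
```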

The Problem: A Q-table stores every state independently, so it learns nothing about a state until the agent actually visits it. For complex environments, we need generalization: making good decisions for states we've never seen before.

2 What is a Neural Network?

A neural network is a function that learns patterns from data. Think of it as:

Input layer (4 state values: cart position, cart velocity, pole angle, pole angular velocity) → Hidden layer (learns patterns) → Output layer (Q-values: Q(L), Q(R))

How It Works:

Forward Pass: Input flows through the layers; each neuron computes output = activation(weights · inputs + bias)

Prediction: The final layer outputs one Q-value per action

Learning: Compare the prediction to a target, then adjust the weights to reduce the error (backpropagation + gradient descent)
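You can trace these steps by hand on a single neuron. The plain-Python sketch below uses made-up numbers. Real networks use a library such as PyTorch, which computes the gradients automatically. `relu`, `identity`, and `neuron` are illustrative helpers, not part of the lab code.

```python
def relu(z):
    """ReLU activation: keeps positive values, replaces negatives with 0."""
    return max(0.0, z)

def identity(z):
    """No activation (as used for output neurons that predict Q-values)."""
    return z

def neuron(weights, inputs, bias, activation=relu):
    """Forward pass of one neuron: activation(weights · inputs + bias)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

inputs = [1.0, 2.0]
weights = [0.5, -0.25]
bias = 0.1
print(neuron(weights, inputs, bias))  # 0.1

# Learning: one gradient-descent step on the squared error of a linear output
target, lr = 1.0, 0.05
pred = neuron(weights, inputs, bias, activation=identity)
loss_before = (target - pred) ** 2
error = target - pred
# d(loss)/d(w_i) = -2 * error * x_i and d(loss)/d(bias) = -2 * error,
# so stepping against the gradient adds lr * 2 * error * x_i
weights = [w + lr * 2 * error * x for w, x in zip(weights, inputs)]
bias += lr * 2 * error
loss_after = (target - neuron(weights, inputs, bias, activation=identity)) ** 2
print(loss_before, loss_after)  # loss drops (about 0.81 -> 0.13)
```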

import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    def __init__(self):
        super().__init__()  # required so PyTorch can register the layers
        # 4 inputs (state) → 128 hidden → 128 hidden → 2 outputs (Q-values)
        self.fc1 = nn.Linear(4, 128)   # Input → first hidden layer
        self.fc2 = nn.Linear(128, 128) # Second hidden layer
        self.fc3 = nn.Linear(128, 2)   # Output: Q(left), Q(right)

    def forward(self, state):
        x = F.relu(self.fc1(state))  # ReLU activation
        x = F.relu(self.fc2(x))
        return self.fc3(x)           # Raw Q-values for all actions (no activation)
            
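One quick way to see how compact the network is: count its learnable parameters. Each `nn.Linear(n_in, n_out)` layer holds `n_in * n_out` weights plus `n_out` biases. The 4 → 128 → 128 → 2 architecture can therefore be tallied in plain Python:

```python
# Layer shapes of the 4 -> 128 -> 128 -> 2 network
layers = [(4, 128), (128, 128), (128, 2)]

def linear_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of one fully connected layer."""
    return n_in * n_out + n_out

per_layer = [linear_params(n_in, n_out) for n_in, n_out in layers]
total = sum(per_layer)
print(per_layer)  # [640, 16512, 258]
print(total)      # 17410
```

That is about 17,400 numbers. This is fewer than the 20,000 entries of the 10-bin CartPole Q-table (10,000 states × 2 actions), yet the network accepts any continuous state. For a working `QNetwork` instance `net`, PyTorch's `sum(p.numel() for p in net.parameters())` should report the same total.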

3 From Q-Table to Q-Network

Q-Learning (W19)

# Discretize state
discrete_state = discretize(state)

# Look up Q-value
q_value = q_table[discrete_state][action]

# Update rule (target = reward + gamma * max Q-value of the next state)
q_table[discrete_state][action] += lr * (target - q_table[discrete_state][action])

Deep Q-Learning (W20)

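# Use raw continuous state (no discretization)
state_tensor = torch.tensor(state, dtype=torch.float32)

# Network predicts Q-values for all actions at once
q_values = q_network(state_tensor)

# Update via backprop (target = reward + gamma * max Q-value of the next state, computed without gradients)
loss = (target - q_values[action]) ** 2
optimizer.zero_grad()  # optimizer: e.g. torch.optim.Adam(q_network.parameters())
loss.backward()        # Compute gradients
optimizer.step()       # Adjust weights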
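Both columns depend on the same Q-learning target: target = reward + gamma · max over a' of Q(next_state, a'), or just the reward when the episode has ended. Below is a minimal plain-Python sketch of the W19 column with that target written out. The state tuples, gamma = 0.99, and lr = 0.1 are illustrative assumptions, not the workbook's settings.

```python
from collections import defaultdict

gamma = 0.99  # discount factor (illustrative value)
lr = 0.1      # learning rate (illustrative value)
# Discretized state (tuple of bin indices) -> [Q(left), Q(right)]; unseen states start at 0
q_table = defaultdict(lambda: [0.0, 0.0])

def q_update(s, a, reward, next_s, done):
    """One Q-learning step; returns the TD target that was used."""
    # TD target: reward + gamma * best next Q-value, or just the reward at episode end
    target = reward if done else reward + gamma * max(q_table[next_s])
    q_table[s][a] += lr * (target - q_table[s][a])
    return target

q_table[(5, 5, 4, 5)] = [0.5, 2.0]  # pretend we already have estimates for the next state
t = q_update((5, 5, 5, 5), 1, 1.0, (5, 5, 4, 5), done=False)
print(round(t, 4))                          # 2.98
print(round(q_table[(5, 5, 5, 5)][1], 4))   # 0.298
```

In the DQN column, `max(q_table[next_s])` becomes the maximum of the network's predicted Q-values for the next state. The `+= lr * (...)` line is replaced by the loss and optimizer step.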

Key Insight: The neural network generalizes! After seeing some states, it can predict Q-values for similar states it's never encountered.
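A toy illustration of that difference follows. It assumes a 1-D state and uses a linear model Q(s) ≈ w·s + b as a simple stand-in for the neural network. The data points are made up, and the linear model is a deliberate simplification.

```python
# Toy data: Q-values recorded for the (1-D, discretized) states we happened to visit
visited = {0: 1.0, 1: 3.0, 3: 7.0, 4: 9.0}

# Q-table: each entry is independent, so an unvisited state has no estimate at all
print(visited.get(2))  # None

# Function approximator: Q(s) ≈ w * s + b, fitted to the same samples by
# gradient descent on the mean squared error
w, b, lr = 0.0, 0.0, 0.05
n = len(visited)
for _ in range(2000):
    grad_w = grad_b = 0.0
    for s, q in visited.items():
        err = (w * s + b) - q       # prediction minus recorded value
        grad_w += 2 * err * s / n
        grad_b += 2 * err / n
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w * 2 + b, 3))  # 5.0: a sensible estimate for the never-visited state 2
```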

4 DQN Improvements (Toggle Switches!)

Just like W19, we have improvement switches to explore:

USE_EXPERIENCE_REPLAY (default: False): Store past experiences, sample random batches

USE_TARGET_NETWORK (default: False): Separate network for stable Q-targets

USE_DOUBLE_DQN (default: False): Reduce Q-value overestimation

Why These Matter:

Experience Replay: Without it, we train on correlated, sequential data. Replay breaks correlations by sampling random past experiences.

Target Network: Without it, we're chasing a moving target (the same network predicts AND provides targets). A frozen copy provides stable targets.

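Double DQN: Standard DQN tends to overestimate Q-values, because the same max operation both selects and evaluates the next action. Double DQN uses the online network to select the next action and the target network to evaluate it.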
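Each of the three switches can be sketched in a few lines of plain Python. This is a simplified illustration, not the workbook's implementation. `ReplayBuffer`, `sync_target`, the `sync_every=1000` interval, and the list-of-floats Q-values are all assumptions made for the example. In PyTorch, the hard update in `sync_target` would typically be `target_net.load_state_dict(online_net.state_dict())`.

```python
import copy
import random
from collections import deque

# --- Experience replay: keep recent transitions, train on random mini-batches ---
class ReplayBuffer:
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped when full

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up runs of consecutive, correlated steps
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# --- Target network: a frozen copy of the online parameters, refreshed periodically ---
def sync_target(online_params, target_params, step, sync_every=1000):
    """Hard update: return a fresh copy of the online parameters every `sync_every` steps."""
    if step % sync_every == 0:
        return copy.deepcopy(online_params)
    return target_params

# --- Next-state part of the target: standard DQN vs. Double DQN ---
def dqn_target(reward, next_q_target, gamma=0.99):
    # The same Q-values both choose and score the next action (prone to overestimation)
    return reward + gamma * max(next_q_target)

def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99):
    # Online network chooses the action, target network scores it
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]

# Usage with toy numbers
buffer = ReplayBuffer(capacity=10_000)
for t in range(100):
    buffer.push(t, t % 2, 1.0, t + 1, False)
batch = buffer.sample(32)  # 32 transitions drawn uniformly at random

online_params = {"w": [0.1, 0.2]}           # stand-in for network parameters
target_params = copy.deepcopy(online_params)
online_params["w"][0] = 0.5                 # the online network keeps learning...
target_params = sync_target(online_params, target_params, step=500)   # not synced: w[0] is still 0.1
target_params = sync_target(online_params, target_params, step=1000)  # synced: w[0] is now 0.5

print(round(dqn_target(1.0, [2.5, 1.5]), 3))                     # 3.475
print(round(double_dqn_target(1.0, [1.0, 3.0], [2.5, 1.5]), 3))  # 2.485
```

With these toy values, the online network prefers action 1, but the target network rates that action lower. So the Double DQN target comes out smaller than the standard DQN max.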

5 Tonight's Lab

$ python workbook.py

The workbook provides a guided interface: configure techniques, run experiments, and compare results — all in one place. Your progress is saved automatically.

Your Mission:

Run Baseline: Start with all techniques OFF — observe unstable training (loss explodes)

Enable Core Techniques: Turn on Experience Replay + Target Network TOGETHER — they're synergistic!

View Results: Compare experiments side-by-side in the results view

Explore Advanced: Try Double DQN, Prioritized Replay, and other techniques

6 Resources

RL Fundamentals

State, Action, Reward - the three pillars of reinforcement learning

Q-Table Explained

Interactive demo of Q-learning with visual Q-table updates

How a Neuron Works

Step-by-step animation of a single neuron's computation

Neural Network Explainer

Interactive visualization of how neural networks learn

DQN Techniques Reference

Complete guide to all DQN improvements with interactive visuals

Reading Learning Graphs

How to interpret training curves and diagnose performance

Experiment Results

View all your training graphs and data for analysis

DQN Paper (2015)

DeepMind's landmark DQN paper: Mnih et al., "Human-level control through deep reinforcement learning," Nature, 2015

Session Agenda

6:30 - 6:50  The Scaling Problem: Why Q-Tables Break
6:50 - 7:20  Neural Networks 101: Forward Pass & Learning
7:20 - 7:50  From Q-Table to Q-Network: The Translation
7:50 - 8:30  Build Block: Implement DQN Agent
8:30 - 9:00  DQN Improvements: Replay & Target Network
9:00 - 9:30  Compare Results + Wrap-up