
Federated Averaging (FedAvg) -- Round Visualization

An interactive walkthrough of how FedAvg coordinates distributed model training across multiple clients without sharing raw data.

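Animated FedAvg Round

[Interactive animation of FedAvg over 10 rounds. Legend: server, selected client, inactive client, weight transfer.]

Global Model AUC Over Rounds

[Interactive chart: global model AUC plotted against the round number.]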

Weighted Averaging Demo

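w_global = Σ_k (n_k / n_total) × w_k, where n_total = Σ_k n_k
The global model is a weighted average of the client models: n_k is the number of training samples on client k and w_k is that client's model weights, so each client's contribution is proportional to its dataset size.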

Client Sample Counts

Client A 500
Client B 1200
Client C 300
Client D 800

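Proportional Weights (n_k / n_total, with n_total = 2800)

Client A 500 / 2800 ≈ 0.179
Client B 1200 / 2800 ≈ 0.429
Client C 300 / 2800 ≈ 0.107
Client D 800 / 2800 ≈ 0.286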

Numerical Example

Given client model weights: w_A=0.3, w_B=-0.1, w_C=0.5, w_D=0.2
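
The numerical example can be worked through in a few lines of Python. This is a minimal sketch that treats each client's model as a single scalar weight, exactly as in the values given above; the variable names are illustrative and not part of any library.

```python
# Sample counts and scalar client weights taken from the example above.
sample_counts = {"A": 500, "B": 1200, "C": 300, "D": 800}
client_weights = {"A": 0.3, "B": -0.1, "C": 0.5, "D": 0.2}

n_total = sum(sample_counts.values())  # 500 + 1200 + 300 + 800 = 2800

# Proportional weight n_k / n_total for each client.
proportions = {k: n / n_total for k, n in sample_counts.items()}

# w_global = sum_k (n_k / n_total) * w_k
w_global = sum(proportions[k] * client_weights[k] for k in sample_counts)

for k in sample_counts:
    print(f"Client {k}: n_k / n_total = {proportions[k]:.4f}")
print(f"w_global = {w_global:.4f}")  # (150 - 120 + 150 + 160) / 2800 = 340 / 2800
```

The result, 340 / 2800 ≈ 0.1214, is well below the unweighted mean of the four weights (0.225) because Client B holds the most data and its negative weight therefore counts the most.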

Algorithm Reference

FedAvg Pseudocode

// Server executes:
function ServerUpdate():
    initialize w_0
    for each round t = 0, 1, 2, ... do
        S_t ← random subset of C × K clients   // select fraction C of K clients
        for each client k ∈ S_t in parallel do
            w_t^k ← ClientUpdate(k, w_t)
        w_{t+1} ← Σ_{k ∈ S_t} (n_k / n) × w_t^k   // weighted average, n = Σ_{k ∈ S_t} n_k

// Client k executes:
function ClientUpdate(k, w):
    batches ← split local data into batches of size B
    for each local epoch e = 1 to E do
        for each batch b ∈ batches do
            w ← w - η × ∇L(w; b)   // SGD step
    return w
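
The pseudocode above can be turned into a small runnable sketch. The version below assumes a toy linear-regression problem with a least-squares loss and uses NumPy only; the synthetic data, the function names, and the hyperparameter values are illustrative assumptions, not part of the original walkthrough. The weighted average is normalized over the clients selected in each round.

```python
import numpy as np


def client_update(w, X, y, epochs, batch_size, lr, rng):
    """ClientUpdate: E local epochs of mini-batch SGD on a least-squares loss."""
    w = w.copy()
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the local data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # Gradient of 0.5 * mean((X w - y)^2) over the mini-batch.
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w -= lr * grad
    return w


def server_update(clients, dim, rounds, frac, epochs, batch_size, lr, seed=0):
    """ServerUpdate: sample clients, train locally, average weighted by n_k."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    m = max(int(round(frac * len(clients))), 1)  # clients per round
    for _ in range(rounds):
        selected = rng.choice(len(clients), size=m, replace=False)
        local_models, counts = [], []
        for k in selected:
            X, y = clients[k]
            local_models.append(
                client_update(w, X, y, epochs, batch_size, lr, rng))
            counts.append(len(X))
        # Weighted average with weights n_k / n over the selected clients.
        w = np.average(np.stack(local_models), axis=0, weights=counts)
    return w


# Synthetic clients (assumption): shared true weights, different sample counts.
data_rng = np.random.default_rng(42)
true_w = np.array([2.0, -1.0])
clients = []
for n_k in (500, 1200, 300, 800):
    X = data_rng.normal(size=(n_k, 2))
    y = X @ true_w + 0.01 * data_rng.normal(size=n_k)
    clients.append((X, y))

w_final = server_update(clients, dim=2, rounds=10, frac=0.5,
                        epochs=2, batch_size=32, lr=0.05)
print("estimated weights:", w_final)
```

Because every synthetic client draws from the same distribution here, the data is IID and the global model should land close to `true_w`; the failure modes discussed later arise when that assumption is dropped.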

Key Parameters

K (Total Clients)
Total number of clients available in the federation. Typically ranges from tens to millions.
C (Client Fraction)
Fraction of clients selected per round. C=1 means all clients participate; C=0.1 means 10% are sampled.
E (Local Epochs)
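Number of local training epochs each client performs before sending its update. Higher E means more local computation per round, which can reduce the number of communication rounds needed but increases the risk of client drift.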
B (Batch Size)
Mini-batch size used for local SGD. Smaller B adds more stochasticity and can improve generalization.
η (Learning Rate)
Step size for local SGD updates. It may need to be tuned differently than in centralized training.
T (Rounds)
Total number of communication rounds. Each round involves one broadcast-train-aggregate cycle.
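
To make the interaction between these parameters concrete, the sketch below groups them in a small configuration object and derives two quantities from them: the number of clients sampled per round and the number of local SGD steps a client with n_k samples performs. The class name, field names, and example values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FedAvgConfig:
    num_clients: int        # K
    client_fraction: float  # C
    local_epochs: int       # E
    batch_size: int         # B
    learning_rate: float    # eta
    rounds: int             # T

    def clients_per_round(self) -> int:
        # At least one client is always selected.
        return max(round(self.client_fraction * self.num_clients), 1)

    def local_steps(self, n_k: int) -> int:
        # SGD steps per round for a client holding n_k samples.
        return self.local_epochs * math.ceil(n_k / self.batch_size)


# Example values are assumptions chosen for illustration only.
cfg = FedAvgConfig(num_clients=100, client_fraction=0.1, local_epochs=5,
                   batch_size=10, learning_rate=0.01, rounds=10)

print(cfg.clients_per_round())  # clients sampled each round
print(cfg.local_steps(500))     # local steps for a client with 500 samples
```

With these values, 10 clients are sampled per round, and a client holding 500 samples takes 5 × 50 = 250 local steps before reporting back, which shows how quickly E and B multiply the amount of local computation.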

Common Failure Modes

Non-IID Data Distribution

When client data is non-IID (not independent and identically distributed; e.g., each client has a different label distribution), the local models diverge from one another, causing slow convergence or even divergence of the global model.
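
One common way to reproduce this failure mode in experiments is to split a labeled dataset across clients with label skew. The sketch below uses a Dirichlet-based split, which is an assumption of this example rather than something the walkthrough prescribes; the function name and the `alpha` parameter are illustrative. A smaller `alpha` gives more skewed label distributions per client.

```python
import numpy as np


def dirichlet_label_partition(labels, num_clients, alpha, seed=0):
    """Assign sample indices to clients with Dirichlet-distributed label skew."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        # Fraction of class c that each client receives.
        shares = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices


# Toy labels (assumption): 4 classes with 250 samples each.
labels = np.repeat(np.arange(4), 250)
parts = dirichlet_label_partition(labels, num_clients=4, alpha=0.1)

for k, idx in enumerate(parts):
    counts = np.bincount(labels[np.array(idx, dtype=int)], minlength=4)
    print(f"client {k}: label counts = {counts}")
```

Printing the per-client label counts makes the skew visible; raising `alpha` (for example to 100) moves the split back toward a roughly uniform, IID-like partition.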

Client Drift

Too many local epochs (high E) cause client models to drift toward their own local optima and away from the global optimum, especially with heterogeneous data. This is sometimes called "weight divergence."
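
Client drift can be seen in a deliberately tiny setting. The sketch below assumes two clients with equal dataset sizes and one-dimensional quadratic losses f_k(w) = 0.5 × a_k × (w - c_k)^2 that differ in both curvature a_k and optimum c_k; all values are made up for illustration. It runs FedAvg with full-batch gradient steps and reports where the global weight settles for a given number of local steps.

```python
def fedavg_limit(local_steps, lr=0.1, rounds=500):
    """Run FedAvg on two 1-D quadratic clients and return the final weight."""
    curvatures = (1.0, 4.0)  # a_k (hypothetical)
    optima = (0.0, 1.0)      # c_k, each client's local optimum (hypothetical)
    w = 0.0
    for _round in range(rounds):
        local_models = []
        for a, c in zip(curvatures, optima):
            w_k = w
            for _step in range(local_steps):
                w_k -= lr * a * (w_k - c)  # gradient step on f_k
            local_models.append(w_k)
        # Equal dataset sizes, so the weighted average is a plain mean.
        w = sum(local_models) / len(local_models)
    return w


# Minimizer of the average loss: sum(a_k * c_k) / sum(a_k).
global_optimum = (1.0 * 0.0 + 4.0 * 1.0) / (1.0 + 4.0)

for steps in (1, 5, 50):
    print(f"local steps = {steps:2d}: FedAvg settles at {fedavg_limit(steps):.4f}")
print(f"global optimum: {global_optimum:.4f}")
```

With one local step, FedAvg is ordinary gradient descent on the average loss and reaches the global optimum of 0.8. As the number of local steps grows, each client converges to its own optimum before averaging, and the result moves toward the plain mean of the local optima (0.5) instead.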

Straggler Effect

Synchronous FedAvg waits for the slowest client each round, so clients with limited compute or bandwidth become bottlenecks. Mitigations include asynchronous aggregation and dropping clients that miss a per-round timeout.
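
Timeout-based dropping can be sketched as an aggregation step that ignores clients whose round took longer than a deadline and renormalizes the weights over the rest. The function name, the durations, and the deadline below are hypothetical; the sample counts and scalar weights reuse the earlier four-client example.

```python
import numpy as np


def aggregate_with_deadline(updates, sample_counts, durations, deadline):
    """Weighted average over clients that finished within `deadline` seconds."""
    updates = np.asarray(updates, dtype=float)
    counts = np.asarray(sample_counts, dtype=float)
    on_time = np.asarray(durations, dtype=float) <= deadline
    if not on_time.any():
        return None, on_time  # nothing arrived; the caller keeps the old model
    w = np.average(updates[on_time], axis=0, weights=counts[on_time])
    return w, on_time


updates = [[0.3], [-0.1], [0.5], [0.2]]  # clients A, B, C, D
sample_counts = [500, 1200, 300, 800]
durations = [2.0, 3.5, 30.0, 4.1]        # seconds (made-up); C is the straggler

w_round, on_time = aggregate_with_deadline(
    updates, sample_counts, durations, deadline=10.0)
print(on_time, w_round)
```

Dropping Client C changes the result from 340 / 2800 to 190 / 2500 = 0.076, which illustrates the trade-off: the round finishes sooner, but slow clients' data is systematically underrepresented if they are dropped often.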

Communication Bottleneck

Sending full model weights each round is expensive for large models. Compression, gradient quantization, or federated distillation can help reduce communication costs.
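
As a rough illustration of quantization, the sketch below maps a float32 update vector to 8-bit integers with a single offset and scale, then reconstructs it on the receiving side. This is a simple uniform scheme chosen for brevity, not a specific published method, and the function names are illustrative.

```python
import numpy as np


def quantize_uint8(x):
    """Uniformly quantize a float array to uint8 plus an offset and a scale."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, lo, scale


def dequantize(q, lo, scale):
    """Reconstruct approximate float32 values from the quantized form."""
    return q.astype(np.float32) * scale + lo


rng = np.random.default_rng(0)
update = rng.normal(size=10_000).astype(np.float32)  # stand-in for a model update

q, lo, scale = quantize_uint8(update)
restored = dequantize(q, lo, scale)
max_error = float(np.abs(restored - update).max())

print(f"bytes before: {update.nbytes}, after: {q.nbytes}")
print(f"max reconstruction error: {max_error:.5f} (step size {scale:.5f})")
```

The payload shrinks by a factor of four (plus two floats of metadata), at the cost of a reconstruction error of at most about half a quantization step per coordinate.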

Byzantine / Poisoning Attacks

Malicious clients can send corrupted updates to manipulate the global model. Robust aggregation methods (e.g., trimmed mean, Krum) mitigate this but add overhead.
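
A coordinate-wise trimmed mean is one of the simpler robust aggregators mentioned above. The sketch below sorts each coordinate across clients, discards the `trim_k` smallest and largest values, and averages the rest. It is unweighted (it ignores dataset sizes), and the update values are invented for illustration.

```python
import numpy as np


def trimmed_mean(updates, trim_k):
    """Coordinate-wise mean after dropping the trim_k lowest and highest values."""
    updates = np.asarray(updates, dtype=float)
    if 2 * trim_k >= len(updates):
        raise ValueError("trim_k is too large for the number of clients")
    ordered = np.sort(updates, axis=0)  # sort each coordinate independently
    return ordered[trim_k:len(updates) - trim_k].mean(axis=0)


honest = [[0.30, -0.10], [0.28, -0.12], [0.32, -0.08], [0.29, -0.11]]
poisoned = [[100.0, -100.0]]  # one malicious client (made-up values)
all_updates = honest + poisoned

plain = np.mean(all_updates, axis=0)
robust = trimmed_mean(all_updates, trim_k=1)

print("plain mean:  ", plain)
print("trimmed mean:", robust)
```

The plain mean is dragged far from the honest updates by the single poisoned client, while the trimmed mean stays near them. The overhead is a sort per coordinate, and `trim_k` has to be at least the number of malicious clients for the guarantee to hold.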