FedAvg Round Visualization Explainer

⚙ Animated FedAvg Round

[Animation: a FedAvg run over 10 rounds, showing the server, selected clients, inactive clients, and weight transfers between them, next to a chart of global model AUC over rounds.]

⚖ Weighted Averaging Demo

w_global = Σ_k (n_k / n_total) × w_k,  where n_total = Σ_k n_k

The global model is a weighted average of the client models. Each client's contribution is proportional to n_k, the number of training samples it holds, so the weights n_k / n_total sum to 1.

Client Sample Counts

Client A 500

Client B 1200

Client C 300

Client D 800

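Proportional Weights (n_total = 2800)

Client A 500 / 2800 ≈ 0.179

Client B 1200 / 2800 ≈ 0.429

Client C 300 / 2800 ≈ 0.107

Client D 800 / 2800 ≈ 0.286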

Numerical Example

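Given client model weights w_A = 0.3, w_B = -0.1, w_C = 0.5, w_D = 0.2 (one scalar parameter per client), the global weight is:

w_global = (500 × 0.3 + 1200 × (-0.1) + 300 × 0.5 + 800 × 0.2) / 2800 = (150 - 120 + 150 + 160) / 2800 = 340 / 2800 ≈ 0.121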
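The same aggregation can be written as a small pure-Python function. This is a minimal sketch assuming each client model is flattened into a list of floats; the name `fedavg_aggregate` is hypothetical and chosen only for illustration.

```python
def fedavg_aggregate(client_weights, sample_counts):
    """Weighted average of client parameter vectors with weights n_k / n_total."""
    if not client_weights or len(client_weights) != len(sample_counts):
        raise ValueError("need one sample count per client model")
    n_total = sum(sample_counts)
    dim = len(client_weights[0])
    # For each parameter j: sum_k n_k * w_k[j], divided by n_total
    return [
        sum(n_k * w[j] for n_k, w in zip(sample_counts, client_weights)) / n_total
        for j in range(dim)
    ]


# The demo's numbers: one scalar parameter per client model (A, B, C, D).
counts = [500, 1200, 300, 800]
weights = [[0.3], [-0.1], [0.5], [0.2]]
print(fedavg_aggregate(weights, counts))  # approximately [0.1214], i.e. 340 / 2800
```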

📖 Algorithm Reference

FedAvg Pseudocode

// Server executes:
function ServerUpdate():
    initialize w_0
    for each round t = 0, 1, 2, ... do
        m ← max(C × K, 1)
        S_t ← random subset of m clients                  // select fraction C of the K clients
        for each client k ∈ S_t in parallel do
            w_{t+1}^k ← ClientUpdate(k, w_t)
        n ← Σ_{k ∈ S_t} n_k                               // samples held by the selected clients
        w_{t+1} ← Σ_{k ∈ S_t} (n_k / n) × w_{t+1}^k       // weighted average

// Client k executes:
function ClientUpdate(k, w):
    batches ← split client k's local data into batches of size B
    for each local epoch e = 1 to E do
        for each batch b ∈ batches do
            w ← w - η × ∇L(w; b)                          // SGD step
    return w to the server
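A minimal, runnable Python sketch of the pseudocode, assuming a toy linear model y ≈ w[0] + w[1]·x trained with squared error. The helpers `client_update`, `fedavg`, and `line_data`, and the noise-free synthetic data, are hypothetical illustrations rather than a reference implementation; `random.Random(seed)` makes client sampling and batch shuffling reproducible, and clients are run one after another as a stand-in for "in parallel".

```python
import random


def client_update(w, data, epochs, batch_size, lr, rng):
    """ClientUpdate(k, w): local mini-batch SGD on 0.5 * (w[0] + w[1]*x - y)**2."""
    w = list(w)                      # local copy; the global model is not mutated
    data = list(data)
    for _ in range(epochs):          # E local epochs
        rng.shuffle(data)            # new batch split each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            g0 = g1 = 0.0
            for x, y in batch:
                err = w[0] + w[1] * x - y
                g0 += err            # d/dw0 of 0.5 * err**2
                g1 += err * x        # d/dw1 of 0.5 * err**2
            w[0] -= lr * g0 / len(batch)   # SGD step with the batch-mean gradient
            w[1] -= lr * g1 / len(batch)
    return w


def fedavg(clients, rounds, frac, epochs, batch_size, lr, seed=0):
    """ServerUpdate: `clients` is a list of per-client [(x, y), ...] datasets."""
    rng = random.Random(seed)
    K = len(clients)
    w = [0.0, 0.0]                                   # initialize w_0
    for _ in range(rounds):
        m = max(round(frac * K), 1)                  # m = max(C * K, 1)
        selected = rng.sample(range(K), m)           # S_t
        local = [client_update(w, clients[k], epochs, batch_size, lr, rng)
                 for k in selected]                  # sequential stand-in for "in parallel"
        counts = [len(clients[k]) for k in selected]
        n = sum(counts)                              # samples held by the selected clients
        w = [sum(n_k * w_k[j] for n_k, w_k in zip(counts, local)) / n
             for j in range(len(w))]                 # weighted average
    return w


def line_data(n):
    """Hypothetical noise-free samples of y = 1 + 2x with x evenly spaced in [-1, 1]."""
    xs = [-1 + 2 * i / (n - 1) for i in range(n)]
    return [(x, 1 + 2 * x) for x in xs]


clients = [line_data(n) for n in (20, 50, 30)]       # three clients of unequal size
w = fedavg(clients, rounds=100, frac=0.7, epochs=2, batch_size=10, lr=0.2)
print(w)  # should be close to [1.0, 2.0]
```

Here `frac` plays the role of C and `lr` the role of η. Because every client samples the same underlying line, the averaged model converges to it; with heterogeneous client objectives that is no longer guaranteed.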

Key Parameters

K (Total Clients)

Total number of clients available in the federation. Typically ranges from tens (e.g., organizations in cross-silo settings) to millions (e.g., phones in cross-device settings).

C (Client Fraction)

Fraction of clients selected per round. C=1 means all clients participate; C=0.1 means 10% are sampled.

E (Local Epochs)

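Number of passes each client makes over its local data before sending its update. Higher E means more local computation per round, which can reduce the number of communication rounds needed but increases the risk of client drift on heterogeneous data.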

B (Batch Size)

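Mini-batch size used for local SGD. Smaller B adds more stochasticity, can improve generalization, and means more local SGD steps per round (about E × n_k / B for client k). B = ∞ denotes treating the whole local dataset as a single batch; with E = 1 and B = ∞, FedAvg reduces to FedSGD.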

η (Learning Rate)

Step size for local SGD updates. It may need to be tuned differently than in centralized training, since its effect depends on how many local steps E and B imply per round.

T (Rounds)

Total number of communication rounds. Each round involves one broadcast-train-aggregate cycle.
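These parameters can be gathered into a small config object to derive per-round quantities. This is a sketch with a hypothetical `FedAvgConfig` dataclass; the values K = 100, C = 0.1, E = 5, B = 10, η = 0.01, T = 200 are illustrative only. `local_steps` counts mini-batch SGD steps per round as E × ⌈n_k / B⌉.

```python
import math
from dataclasses import dataclass


@dataclass
class FedAvgConfig:
    K: int        # total clients in the federation
    C: float      # fraction of clients selected per round
    E: int        # local epochs per round
    B: int        # local mini-batch size
    eta: float    # local SGD learning rate (η)
    T: int        # communication rounds

    def clients_per_round(self) -> int:
        # m = max(C * K, 1): even C = 0 still selects one client
        return max(round(self.C * self.K), 1)

    def local_steps(self, n_k: int) -> int:
        # E passes over n_k samples, ceil(n_k / B) mini-batches per pass
        return self.E * math.ceil(n_k / self.B)


cfg = FedAvgConfig(K=100, C=0.1, E=5, B=10, eta=0.01, T=200)   # illustrative values
print(cfg.clients_per_round())   # 10
print(cfg.local_steps(600))      # 300
```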

Common Failure Modes

Non-IID Data Distribution

When client data is highly non-IID (e.g., each client has a different label distribution), local updates point in conflicting directions and the local models diverge from one another, causing slow convergence or even divergence of the global model.
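One common way to simulate label skew in experiments is to sort samples by label, cut them into equal shards, and give each client a few shards, similar in spirit to the shard-based split used in the original FedAvg experiments. The function `shard_partition` and the toy labels below are hypothetical.

```python
import random


def shard_partition(labels, num_clients, shards_per_client, seed=0):
    """Return one list of sample indices per client, skewed by label."""
    order = sorted(range(len(labels)), key=lambda i: labels[i])   # group indices by label
    num_shards = num_clients * shards_per_client
    size = len(order) // num_shards          # leftover samples (if any) are dropped
    shards = [order[s * size:(s + 1) * size] for s in range(num_shards)]
    random.Random(seed).shuffle(shards)      # random shard-to-client assignment
    return [
        [i for shard in shards[c * shards_per_client:(c + 1) * shards_per_client] for i in shard]
        for c in range(num_clients)
    ]


labels = [i % 10 for i in range(200)]        # hypothetical: 10 classes, 20 samples each
parts = shard_partition(labels, num_clients=10, shards_per_client=2)
for c, idx in enumerate(parts[:3]):
    print(f"client {c}: labels {sorted({labels[i] for i in idx})}")
```

With 20 samples per class and shards of 10, every shard holds a single label, so each client sees at most two of the ten classes.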

Client Drift

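Too many local epochs (high E) pull each client model toward the optimum of its own local objective and away from the global optimum, especially with heterogeneous data; the averaged model can then settle at a point that does not minimize the global objective. This is sometimes called "weight divergence."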
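A toy illustration, assuming two hypothetical clients with one-dimensional quadratic losses f_k(w) = 0.5 · a_k · (w - c_k)², different curvatures a_k and optima c_k, equal weights p_k = 0.5, and full-batch gradient steps. With one local step per round, FedAvg's fixed point is the true minimizer of Σ_k p_k f_k; with many local steps each client runs almost all the way to its own c_k, and the average settles near the plain mean of the c_k instead.

```python
def fedavg_quadratic(a, c, p, local_steps, lr=0.1, rounds=300):
    """FedAvg on 1-D client losses f_k(w) = 0.5 * a_k * (w - c_k)**2."""
    w = 0.0
    for _ in range(rounds):
        local = []
        for a_k, c_k in zip(a, c):
            v = w
            for _ in range(local_steps):
                v -= lr * a_k * (v - c_k)       # full-batch gradient step on client k
            local.append(v)
        w = sum(p_k * v for p_k, v in zip(p, local))   # weighted average, sum(p) == 1
    return w


a, c, p = [1.0, 4.0], [0.0, 1.0], [0.5, 0.5]   # hypothetical curvatures, optima, weights
w_star = sum(pk * ak * ck for pk, ak, ck in zip(p, a, c)) / sum(pk * ak for pk, ak in zip(p, a))
print(w_star)                          # 0.8, the minimizer of sum_k p_k * f_k
print(fedavg_quadratic(a, c, p, 1))    # approximately 0.8
print(fedavg_quadratic(a, c, p, 50))   # approximately 0.501, near the plain mean of c_k
```

The bias comes from the heterogeneous curvature: the client with larger a_k should dominate the global optimum, but after many local steps both clients report points near their own optima, which are then averaged with equal weight.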

Straggler Effect

Synchronous FedAvg waits for the slowest selected client each round, so clients with limited compute or bandwidth become bottlenecks. Solutions include asynchronous aggregation or dropping clients that miss a timeout.
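A sketch of timeout-based dropping, assuming the server knows each selected client's (hypothetical) reporting latency: clients that miss the deadline are excluded, and the n_k / n weights are renormalized over the clients that did report. `aggregate_with_deadline` is a hypothetical helper, and the latencies below are made up for illustration.

```python
def aggregate_with_deadline(updates, counts, latencies, deadline):
    """FedAvg-weighted average over the clients whose latency is <= deadline."""
    kept = [k for k, lat in enumerate(latencies) if lat <= deadline]
    if not kept:
        return None                        # nobody reported in time: keep the old model
    n = sum(counts[k] for k in kept)       # renormalize over the survivors only
    dim = len(updates[kept[0]])
    return [sum(counts[k] * updates[k][j] for k in kept) / n for j in range(dim)]


updates = [[0.3], [-0.1], [0.5], [0.2]]    # client models A, B, C, D
counts = [500, 1200, 300, 800]
latencies = [2.0, 9.5, 1.2, 3.1]           # hypothetical seconds; B is the straggler
print(aggregate_with_deadline(updates, counts, latencies, deadline=5.0))  # approximately [0.2875]
```

Dropping the same slow clients round after round can bias the global model toward the data of faster clients.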

Communication Bottleneck

Sending full model weights each round is expensive for large models. Compressing or quantizing the updates, or using federated distillation, can reduce communication costs.
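One simple form of compression is symmetric 8-bit quantization of each client's update (for example, its local weights minus the global weights it started from). In this sketch, with hypothetical helper names, the client sends one float scale plus one small integer per parameter, so each entry fits in a signed byte instead of four bytes of float32; the per-entry rounding error is at most half the scale.

```python
def quantize_int8(delta):
    """Symmetric uniform quantization of a model update to integers in [-127, 127]."""
    scale = max(abs(x) for x in delta) / 127 or 1.0   # all-zero update: any scale works
    q = [round(x / scale) for x in delta]            # one small int per parameter
    return q, scale


def dequantize_int8(q, scale):
    """Server side: approximate reconstruction of the update."""
    return [v * scale for v in q]


delta = [0.02, -0.5, 0.13, 0.0, 0.37]    # hypothetical update: w_k minus the global w
q, scale = quantize_int8(delta)
print(q)                                  # integers with -127 <= q <= 127
print(dequantize_int8(q, scale))          # close to delta, error <= scale / 2 per entry
```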

Byzantine / Poisoning Attacks

Malicious clients can send corrupted updates to manipulate the global model. Robust aggregation methods (e.g., trimmed mean, Krum) mitigate this but add overhead.
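A sketch of coordinate-wise trimmed mean, one of the robust aggregators named above: for each parameter, drop the `trim` largest and `trim` smallest client values and average the rest. Unlike the FedAvg average, this version ignores the n_k weights; the helper name and the client values are hypothetical.

```python
def trimmed_mean(updates, trim):
    """Coordinate-wise trimmed mean over a list of client parameter vectors."""
    k = len(updates)
    if 2 * trim >= k:
        raise ValueError("need more than 2 * trim client updates")
    out = []
    for j in range(len(updates[0])):
        vals = sorted(u[j] for u in updates)   # this parameter across all clients
        kept = vals[trim:k - trim]             # discard the extremes on both sides
        out.append(sum(kept) / len(kept))
    return out


honest = [[0.30, 1.0], [0.28, 1.1], [0.33, 0.9], [0.31, 1.0]]
poisoned = honest + [[50.0, -40.0]]            # one hypothetical malicious client
print(trimmed_mean(poisoned, trim=1))          # approximately [0.313, 0.967]
```

Here the plain average of the first coordinate would be about 10.24 because of the single poisoned update, while the trimmed mean stays near the honest values.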