An interactive walkthrough of how FedAvg coordinates distributed model training across multiple clients without sharing raw data.
Given client model weights: w_A=0.3, w_B=-0.1, w_C=0.5, w_D=0.2
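As a rough sketch of the aggregation step, the four weights above can be combined with FedAvg's sample-weighted average. The per-client sample counts are not given here, so the code assumes every client holds the same number of examples (a hypothetical 100 each), which reduces the weighted average to a plain mean. The helper name `fedavg` is illustrative only.

```python
def fedavg(weights, num_samples):
    """Sample-weighted average of client weights (scalar case)."""
    total = sum(num_samples)
    return sum(n * w for w, n in zip(weights, num_samples)) / total

# Client weights from the walkthrough.
client_weights = {"A": 0.3, "B": -0.1, "C": 0.5, "D": 0.2}

# Assumption: equal data sizes, since the sample counts are not stated.
samples = {name: 100 for name in client_weights}

global_w = fedavg(list(client_weights.values()), list(samples.values()))
print(round(global_w, 3))  # 0.225 under the equal-size assumption
```

If the clients held different amounts of data, the clients with more examples would pull the global weight toward their own values.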
When client data is highly non-IID (not independent and identically distributed), for example when each client has a different label distribution, local models are pulled toward different optima, and averaging them can slow convergence or even cause the global model to diverge.
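One common way to reproduce this kind of label skew in experiments is to sort the examples by label, cut them into shards, and hand each client only a few shards. The sketch below is a minimal version under stated assumptions: the toy dataset, the helper name `shard_partition`, and the deterministic shard assignment (experiments usually shuffle the shards) are all illustrative choices.

```python
from collections import Counter

def shard_partition(labels, num_clients, shards_per_client=2):
    """Assign sorted-by-label shards to clients to create label skew."""
    # Sort example indices by label so each shard is dominated by one class.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    num_shards = num_clients * shards_per_client
    shard_size = len(order) // num_shards
    shards = [order[s * shard_size:(s + 1) * shard_size] for s in range(num_shards)]
    # Deterministic assignment: client k gets shards k, k + num_clients, ...
    return [
        [i for s in range(k, num_shards, num_clients) for i in shards[s]]
        for k in range(num_clients)
    ]

# Toy dataset (hypothetical): 4 classes, 20 examples each.
labels = [c for c in range(4) for _ in range(20)]
clients = shard_partition(labels, num_clients=4)
for k, idx in enumerate(clients):
    print(k, Counter(labels[i] for i in idx))
```

In this toy setup each client ends up seeing only two of the four classes, so a model trained locally never observes the other two.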
Too many local epochs (high E) let each client model drift toward its own local optimum and away from the global optimum, especially with heterogeneous data. This is sometimes called "client drift" or "weight divergence."
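The effect can be seen in a toy setting. The sketch below assumes two clients with 1-D quadratic losses that differ in both curvature and optimum (all values are hypothetical), and it counts local gradient steps rather than full epochs. With a single local step, FedAvg behaves like gradient descent on the average loss. With many local steps, each client nearly reaches its own optimum before averaging.

```python
def run_fedavg(local_steps, rounds=100, lr=0.05):
    """FedAvg on two 1-D quadratic clients f_k(w) = 0.5 * a_k * (w - c_k)**2."""
    clients = [(1.0, 0.0), (10.0, 1.0)]  # (curvature a_k, local optimum c_k)
    w_global = 0.0
    for _ in range(rounds):
        local_models = []
        for a, c in clients:
            w = w_global
            for _ in range(local_steps):
                w -= lr * a * (w - c)  # gradient step on the client's own loss
            local_models.append(w)
        w_global = sum(local_models) / len(local_models)  # equal-weight average
    return w_global

# Minimizer of the average loss: sum(a_k * c_k) / sum(a_k) = 10 / 11.
print("global optimum: ", 10 / 11)
print("1 local step:   ", run_fedavg(local_steps=1))
print("200 local steps:", run_fedavg(local_steps=200))
```

In this example the single-step run settles close to 10/11 (about 0.909), while the 200-step run settles near 0.5, the plain mean of the two local optima, which is not the minimizer of the average loss.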
Synchronous FedAvg waits for the slowest client in each round, so clients with limited compute or bandwidth (stragglers) become bottlenecks. Solutions include asynchronous aggregation or dropping clients that miss a per-round timeout.
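Timeout-based dropping can be sketched as a filter applied before averaging. The arrival times, sample counts, and the helper name `aggregate_with_timeout` below are hypothetical; a real system would track arrivals as they happen rather than receive them in a dictionary.

```python
def aggregate_with_timeout(updates, arrival_times, num_samples, timeout):
    """Average only the updates that arrived within `timeout` seconds."""
    on_time = [k for k in updates if arrival_times[k] <= timeout]
    if not on_time:
        return None, on_time  # no update this round; keep the old global model
    total = sum(num_samples[k] for k in on_time)
    avg = sum(num_samples[k] * updates[k] for k in on_time) / total
    return avg, on_time

# Hypothetical round: client D is a straggler.
updates = {"A": 0.3, "B": -0.1, "C": 0.5, "D": 0.2}
arrival = {"A": 2.0, "B": 3.5, "C": 1.2, "D": 40.0}  # seconds
samples = {"A": 100, "B": 100, "C": 100, "D": 100}

avg, used = aggregate_with_timeout(updates, arrival, samples, timeout=10.0)
print(used, round(avg, 4))
```

One trade-off of this design is that consistently slow clients may rarely contribute, which can bias the global model toward the data held by faster clients.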
Sending full model weights every round is expensive for large models. Update compression (for example sparsification or quantization) or federated distillation can reduce communication costs.
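As one illustration of quantization, the sketch below applies uniform 8-bit quantization to a small update vector: each float is mapped to an integer code, and the receiver reconstructs an approximation from the codes plus an offset and a step size. The update values and function names are hypothetical, and this is only one of many possible schemes.

```python
def quantize(values, bits=8):
    """Uniformly quantize floats to integers in [0, 2**bits - 1]."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Map integer codes back to approximate float values."""
    return [lo + q * scale for q in codes]

# Hypothetical model update (difference between local and global weights).
update = [0.12, -0.40, 0.033, 0.0, 0.27]
codes, lo, scale = quantize(update)
restored = dequantize(codes, lo, scale)

max_err = max(abs(a - b) for a, b in zip(update, restored))
print(codes)
print(max_err <= scale / 2 + 1e-12)  # rounding error is at most half a step
```

Each 8-bit code takes one byte instead of the four bytes of a 32-bit float, at the cost of a bounded reconstruction error per value.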
Malicious clients can send corrupted updates to manipulate the global model. Robust aggregation methods (e.g., trimmed mean, Krum) mitigate this but add computational overhead at the server.
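A coordinate-wise trimmed mean, one of the methods named above, can be sketched in a few lines: for each parameter, the largest and smallest values are discarded before averaging. The updates below are hypothetical, and the `trim` count is assumed to be at least the number of malicious clients.

```python
def trimmed_mean(updates, trim):
    """Coordinate-wise trimmed mean: drop the `trim` largest and smallest
    values in each coordinate, then average the rest."""
    n = len(updates)
    if 2 * trim >= n:
        raise ValueError("trim must be less than half the number of clients")
    result = []
    for coord in zip(*updates):  # iterate over coordinates
        kept = sorted(coord)[trim:n - trim]
        result.append(sum(kept) / len(kept))
    return result

# Hypothetical 2-parameter updates; the last client is malicious.
updates = [
    [0.30, -0.10],
    [0.28, -0.12],
    [0.33, -0.09],
    [0.31, -0.11],
    [100.0, -100.0],
]
plain = [sum(c) / len(c) for c in zip(*updates)]
robust = trimmed_mean(updates, trim=1)
print(plain)
print(robust)
```

The plain average is dragged far from the honest values by the single outlier, while the trimmed mean stays within their range. The extra cost is a sort per coordinate.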