
Client Data Distributions

Toggle between IID (Independent and Identically Distributed) and Non-IID modes to see how data is partitioned across 6 federated clients. Each client represents a user with their own local dataset of movie ratings.


Figure: Global Dataset Distribution (Reference)

Convergence Impact

Compare how IID and Non-IID data distributions affect the global model's convergence during federated training. Non-IID data leads to client drift, causing oscillations and slower improvement in global model performance.

Figure: global model convergence for IID Training (smooth convergence) and Non-IID Training (oscillating)
IID Convergence: When every client's data follows the same distribution, each client's local update is a good approximation of the global gradient. Aggregation produces consistent improvement with minimal variance between rounds.
Non-IID Challenge: Heterogeneous data causes each client to optimize toward its own local distribution. When aggregated, these conflicting updates create oscillation and "client drift," slowing convergence significantly.

Key Concepts

Understanding the theoretical foundations and practical implications of data distribution in federated learning.

What Does IID Mean?

Independent and Identically Distributed means each client's data is drawn from the same underlying distribution, and samples are independent of each other. In federated learning, IID data means every client holds a representative subset of the full dataset: all genres, demographics, and patterns appear in roughly the same proportions on each device.
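As a rough illustration of an IID split, the sketch below shuffles a toy dataset and deals the samples evenly to 6 clients. The function name `iid_partition`, the genre labels, and the dataset sizes are assumptions made up for this example, not part of any particular library or of the page's own demo.

```python
import random
from collections import Counter

def iid_partition(labels, num_clients, seed=0):
    """Shuffle sample indices, then deal them round-robin to the clients."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    return [indices[c::num_clients] for c in range(num_clients)]

# Hypothetical toy dataset: 600 ratings, each tagged with one genre label.
genres = ["action"] * 300 + ["drama"] * 200 + ["horror"] * 100
clients = iid_partition(genres, num_clients=6)

for cid, idx in enumerate(clients):
    counts = Counter(genres[i] for i in idx)
    print(f"client {cid}: {dict(counts)}")
```

Because the assignment is random, each client's genre counts should land close to the global 3:2:1 ratio, though not exactly on it.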

Why Real-World Data Is Non-IID

In practice, user data is almost never IID. People have unique preferences, behaviors, and contexts. A horror fan's watch history looks nothing like a rom-com enthusiast's. Geographic, demographic, and temporal factors all create systematic differences between clients, making non-IID the default condition for federated systems.

Client Drift Problem

When a client trains on skewed local data, its model parameters "drift" away from the optimal global model. After several local SGD steps, the client's model becomes specialized for its own distribution. Aggregating divergent models produces a global model that may perform poorly for all clients, not just some.
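Client drift can be seen on a deliberately tiny problem. The sketch below assumes two clients with one-dimensional quadratic losses 0.5 * a * (w - c)^2 that differ in both curvature `a` and local optimum `c`, and uses full-batch gradient steps as a stand-in for SGD. All names and numbers are illustrative assumptions.

```python
def local_gd(w, a, c, lr, steps):
    """Run `steps` gradient steps on the local loss 0.5 * a * (w - c)**2."""
    for _ in range(steps):
        w -= lr * a * (w - c)
    return w

def fedavg(clients, local_steps, rounds=200, lr=0.1, w0=0.0):
    """Plain federated averaging with equal client weights."""
    w = w0
    for _ in range(rounds):
        local_models = [local_gd(w, a, c, lr, local_steps) for a, c in clients]
        w = sum(local_models) / len(local_models)
    return w

# (curvature a, local optimum c) for two heterogeneous clients (assumed values)
clients = [(1.0, -1.0), (4.0, 2.0)]

# Minimizer of the average of the two local losses
w_star = sum(a * c for a, c in clients) / sum(a for a, _ in clients)

w_one = fedavg(clients, local_steps=1)    # one local step per round
w_many = fedavg(clients, local_steps=20)  # many local steps per round

print(f"global optimum:    {w_star:.4f}")
print(f"1 local step:      {w_one:.4f}")
print(f"20 local steps:    {w_many:.4f}")
```

With a single local step, averaging the client models is equivalent to a gradient step on the global objective, so the result matches the global optimum. With many local steps, each client moves most of the way to its own optimum before averaging, and the aggregate settles away from the global optimum.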

Label Distribution Skew

The most common form of non-IID data in recommendation systems is label distribution skew, where different clients have different proportions of each class/genre. Other forms include feature skew (clients hold different input features for the same label), quantity skew (varying dataset sizes), and concept drift (distributions changing over time).
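One common way to simulate label distribution skew is to split each class across clients using proportions drawn from a Dirichlet distribution, where a smaller concentration parameter gives a more skewed split. The sketch below is a minimal stdlib version under that assumption. The function name `dirichlet_label_partition`, the value `alpha=0.3`, and the toy genre data are hypothetical.

```python
import random
from collections import Counter

def dirichlet_label_partition(labels, num_clients, alpha, seed=0):
    """Split each class across clients with Dirichlet(alpha) proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    clients = [[] for _ in range(num_clients)]
    for idx in by_class.values():
        rng.shuffle(idx)
        # A Dirichlet sample is a set of Gamma draws normalized to sum to 1.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        start, cumulative = 0, 0.0
        for c in range(num_clients):
            cumulative += draws[c] / total
            # The last client takes whatever remains, so no sample is dropped.
            end = len(idx) if c == num_clients - 1 else int(round(cumulative * len(idx)))
            clients[c].extend(idx[start:end])
            start = end
    return clients

# Hypothetical toy dataset: 600 ratings, each tagged with one genre label.
labels = ["action"] * 300 + ["drama"] * 200 + ["horror"] * 100
clients = dirichlet_label_partition(labels, num_clients=6, alpha=0.3)

for cid, idx in enumerate(clients):
    print(f"client {cid}: {dict(Counter(labels[i] for i in idx))}")
```

Raising `alpha` (for example to 100) should make the per-client genre mix approach the global proportions, which is a convenient knob for moving between non-IID and near-IID settings.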

Strategies to Handle Non-IID Data

FedProx: Adds a proximal regularization term to the local objective, penalizing large deviations from the global model. This constrains client drift by keeping local updates close to the global parameters. The strength is controlled by a hyperparameter mu.
Client Weighting: Instead of simple averaging during aggregation, weight each client's contribution based on dataset size, data quality, or distribution similarity. Clients with more representative data can be given higher influence on the global model.
Data Sharing: Share a small subset of globally representative data with all clients to anchor local distributions. Even a modest shared dataset (1-5% of total data) can significantly improve convergence by reducing the effective heterogeneity.
SCAFFOLD: Uses control variates to correct for client drift during local updates. Each client maintains a correction term that estimates and compensates for the difference between local and global gradients, enabling faster and more stable convergence.
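The first two strategies can be sketched in a few lines. The example below assumes a one-dimensional quadratic local loss 0.5 * a * (w - c)^2 and adds the FedProx proximal term (mu / 2) * (w - w_global)^2 to it, then shows aggregation weighted by dataset size. The function names, learning rate, and numeric values are illustrative assumptions rather than a reference implementation.

```python
def fedprox_local_update(w_global, a, c, mu, lr=0.1, steps=20):
    """Local training on 0.5*a*(w-c)**2 + (mu/2)*(w-w_global)**2."""
    w = w_global
    for _ in range(steps):
        grad = a * (w - c)            # gradient of the local loss
        grad += mu * (w - w_global)   # gradient of the proximal penalty
        w -= lr * grad
    return w

def weighted_average(models, num_samples):
    """Aggregate client models, weighting each by its number of samples."""
    total = sum(num_samples)
    return sum(n * w for w, n in zip(models, num_samples)) / total

w_global = 0.0
# A client whose local optimum (c = 5.0) is far from the global model.
plain = fedprox_local_update(w_global, a=1.0, c=5.0, mu=0.0)  # mu = 0: no penalty
prox = fedprox_local_update(w_global, a=1.0, c=5.0, mu=1.0)   # mu = 1: drift penalized

print(f"local model without proximal term: {plain:.3f}")
print(f"local model with mu = 1.0:         {prox:.3f}")

# Two client models with 100 and 300 samples respectively (assumed sizes).
print(f"size-weighted aggregate: {weighted_average([1.0, 3.0], [100, 300]):.3f}")
```

Setting `mu` to zero recovers ordinary local training, so the penalty strength can be tuned from no constraint up to a model that barely leaves the global parameters.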

Connection to MovieLens

The MovieLens dataset is a natural example of non-IID federated data. When we treat each user as a federated client, the inherent non-IID nature becomes clear:

  • Taste Profiles: User A loves sci-fi and rates 80% sci-fi films. User B is a drama enthusiast with 70% drama ratings. Their local data distributions are fundamentally different.
  • Rating Patterns: Some users are generous raters (mean rating 4.2) while others are critical (mean rating 2.8). This skews the distribution of the ratings themselves (the labels), even within the same genre.
  • Activity Levels: Power users may have 500+ ratings while casual users have fewer than 20, creating significant quantity skew across clients.
  • Temporal Patterns: Users who joined in 2010 have different movie pools than users from 2020, introducing temporal concept drift.
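These kinds of skew can be measured directly once each user is treated as a client. The sketch below computes per-user rating count, mean rating, and dominant-genre share on a small hand-made list of (user, genre, rating) tuples. The data is invented to mirror the examples above and is not drawn from the actual MovieLens files, whose format is not assumed here.

```python
from collections import Counter, defaultdict

# Hypothetical toy ratings: (user_id, genre, rating)
ratings = (
    [("A", "sci-fi", 5)] * 2 + [("A", "sci-fi", 4)] * 2 + [("A", "drama", 3)]
    + [("B", "drama", 3)] * 6 + [("B", "drama", 2)]
    + [("B", "comedy", 3)] * 2 + [("B", "comedy", 2)]
)

def client_profiles(rows):
    """Summarize each user's local dataset: size, mean rating, top genre."""
    per_user = defaultdict(list)
    for user, genre, rating in rows:
        per_user[user].append((genre, rating))

    profiles = {}
    for user, items in per_user.items():
        genre_counts = Counter(genre for genre, _ in items)
        top_genre, top_count = genre_counts.most_common(1)[0]
        profiles[user] = {
            "n": len(items),                                   # quantity skew
            "mean": sum(r for _, r in items) / len(items),     # rating tendency
            "top_genre": top_genre,                            # taste profile
            "top_share": top_count / len(items),
        }
    return profiles

profiles = client_profiles(ratings)
for user, p in profiles.items():
    print(user, p)
```

Comparing these summaries across users gives a quick sense of how far a federated split is from IID before any training is run.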