Aggregation Strategies
The Aggregator is the mathematical core of the federated synchronization step: it defines how divergent client weights are fused into a consensus model.
FedPilot ships with two primary aggregation algorithms, both living in src/core/aggregator/.
1. FedAvg — Federated Averaging
The industry standard baseline. It performs a weighted average of all incoming model updates, where the weight for each client is proportional to the size of its local dataset.
\[w_{\text{global}} = \sum_{i=1}^{N} \frac{n_i}{N_{\text{total}}} \cdot w_i\]
where $n_i$ is the number of local samples for client $i$, and $N_{\text{total}} = \sum_{i=1}^{N} n_i$.
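As an illustration of the weighted sum above, here is a minimal pure-Python sketch. It is not FedPilot's implementation: the `fedavg` function and the `State` alias (a dict of parameter name to a flat list of floats) are hypothetical stand-ins for real model state dicts.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat values.
State = Dict[str, List[float]]


def fedavg(client_states: List[State], sample_counts: List[int]) -> State:
    """Sample-weighted average of client parameters (FedAvg)."""
    if not client_states or len(client_states) != len(sample_counts):
        raise ValueError("need exactly one sample count per client state")
    n_total = sum(sample_counts)
    if n_total <= 0:
        raise ValueError("total sample count must be positive")

    global_state: State = {}
    for name in client_states[0]:
        merged = [0.0] * len(client_states[0][name])
        for state, n_i in zip(client_states, sample_counts):
            scale = n_i / n_total  # the n_i / N_total factor from the formula
            for k, value in enumerate(state[name]):
                merged[k] += scale * value
        global_state[name] = merged
    return global_state


# Two clients: the second holds three times as much data, so it dominates.
clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 6.0]}]
counts = [100, 300]
global_model = fedavg(clients, counts)  # {"w": [2.5, 5.0]}
```

With counts of 100 and 300, the scaling factors are 0.25 and 0.75, so each global parameter sits three quarters of the way toward the larger client's value.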
Characteristics:
- Best For: IID (Independent and Identically Distributed) data, where clients hold roughly balanced class distributions.
- Speed: Extremely fast — a single weighted sum over the parameter tensors.
- Weakness: If clients hold highly heterogeneous (non-IID) data, their local updates pull toward different local optima. Averaging them directly causes client drift: the global model oscillates and can fail to converge.
Config:
aggregation_strategy: "FedAvg"
aggregation_sample_scaling: true # Scale by n_i / N_total (recommended)
2. FedProx — Federated Proximal
Designed specifically to combat non-IID data divergence. FedProx modifies the local training objective by adding a proximal regularization term that anchors each client to the global model:
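\[\min_w \; F_i(w) + \frac{\mu}{2} \| w - w_{\text{global}} \|^2\]
Here $F_i$ is the local loss of client $i$ and $w_{\text{global}}$ is the global model the client received at the start of the round. The penalty forces the local PyTorch optimizer to “stay close” to the global model during its local epochs, preventing aggressive overfitting to an unrepresentative local data slice.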
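To make the effect of the proximal term concrete, the sketch below runs plain gradient descent on a single client's objective $F_i(w) + \frac{\mu}{2}\|w - w_{\text{global}}\|^2$. It is a toy under stated assumptions, not FedPilot code: the function names are hypothetical, weights are plain Python lists rather than PyTorch tensors, and the local loss is assumed to be the quadratic $F_i(w) = \frac{1}{2}\|w - c_i\|^2$ so that the result can be checked by hand.

```python
from typing import List


def proximal_penalty(w: List[float], w_global: List[float], mu: float) -> float:
    """Value of (mu / 2) * ||w - w_global||^2."""
    return 0.5 * mu * sum((a - b) ** 2 for a, b in zip(w, w_global))


def fedprox_local_step(
    w: List[float],
    w_global: List[float],
    local_grad: List[float],
    mu: float,
    lr: float,
) -> List[float]:
    """One gradient step on F_i(w) + (mu / 2) * ||w - w_global||^2."""
    # The gradient of the proximal term is mu * (w - w_global).
    return [
        w_k - lr * (g_k + mu * (w_k - wg_k))
        for w_k, g_k, wg_k in zip(w, local_grad, w_global)
    ]


def train_locally(
    w_global: List[float],
    local_optimum: List[float],
    mu: float,
    lr: float = 0.1,
    steps: int = 500,
) -> List[float]:
    """Toy local training with F_i(w) = 0.5 * ||w - local_optimum||^2."""
    w = list(w_global)  # every client starts the round from the global model
    for _ in range(steps):
        grad = [w_k - c_k for w_k, c_k in zip(w, local_optimum)]
        w = fedprox_local_step(w, w_global, grad, mu, lr)
    return w


w_global = [0.0, 0.0]
client_optimum = [4.0, -2.0]
no_prox = train_locally(w_global, client_optimum, mu=0.0)    # drifts to the local optimum
with_prox = train_locally(w_global, client_optimum, mu=1.0)  # stays closer to w_global
```

For this quadratic loss the minimizer is $(c_i + \mu\, w_{\text{global}}) / (1 + \mu)$, so $\mu = 0$ recovers the unconstrained local optimum, while $\mu = 1$ stops halfway between the global model and the client's own optimum. Larger $\mu$ pulls the client further back toward the global model.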
Characteristics:
- Best For: Highly heterogeneous data (e.g., data_distribution_kind: "90" for extreme non-IID).
- Speed: Slightly slower, since the proximal penalty is computed and added to the loss at each local step.
- Hyperparameter: Requires tuning the $\mu$ (mu) proximal coefficient.
Config:
aggregation_strategy: "FedProx"
mu: 0.01 # Start here; increase for stronger regularization on severe non-IID
When to Use Which
| Scenario | Recommended Strategy |
|---|---|
| IID data, same dataset across clients | FedAvg |
| Mild non-IID (distribution_kind: "20") | FedAvg |
| Moderate non-IID (distribution_kind: "50") | Either; benchmark both |
| Severe non-IID (distribution_kind: "90") | FedProx |
| Dirichlet heterogeneity (distribution_kind: "dir") | FedProx |
| FEMNIST or Shakespeare (natural non-IID) | FedProx |
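The rows that name a distribution_kind value can also be read as a simple lookup. The helper below is purely illustrative: `recommend_strategy` is a hypothetical name and not part of FedPilot, and it covers only the four keys that appear in the table.

```python
# Mapping taken from the table rows that name a distribution_kind value.
RECOMMENDED_STRATEGY = {
    "20": "FedAvg",           # mild non-IID
    "50": "benchmark both",   # moderate non-IID
    "90": "FedProx",          # severe non-IID
    "dir": "FedProx",         # Dirichlet heterogeneity
}


def recommend_strategy(distribution_kind: str) -> str:
    """Return the table's recommendation for a given distribution_kind."""
    try:
        return RECOMMENDED_STRATEGY[distribution_kind]
    except KeyError:
        raise ValueError(
            f"no recommendation recorded for distribution_kind={distribution_kind!r}"
        ) from None


choice = recommend_strategy("90")  # "FedProx"
```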
Pro tip: Enable data-driven clustering alongside FedAvg. By grouping clients with similar distributions into ICRF communities, you reduce within-community heterogeneity enough that FedAvg can converge effectively, without the FedProx overhead.
Adding Custom Aggregators
Unlike topologies and metrics, the aggregation layer does not use a decorator-based registry. There is no @register_aggregator decorator in the codebase.
To integrate a novel algorithm (e.g., FedNova, SCAFFOLD, or geometric median aggregation):
- Inherit from AggregatorBase and implement the aggregate() method:
    from src.core.aggregator.aggregator_base import AggregatorBase

    class FedNovaAggregator(AggregatorBase):
        def aggregate(self, updates: list, weights: list) -> dict:
            # Your normalized averaging logic here
            ...
- Instantiate it directly inside the relevant Application or Schema executor (found in src/applications/), the place where the aggregator is called during each FL round. No other files need to change.
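For a fuller picture of what a custom aggregator might look like, here is a self-contained sketch. It makes several assumptions: the `AggregatorBase` class below is a hypothetical stand-in defined locally so the example runs outside the codebase (the real base class may require more than `aggregate()`), and each update is assumed to be a dict of parameter name to a flat list of floats. The algorithm is a coordinate-wise median, a simple robust aggregator chosen for brevity. It is not the geometric median and not FedNova.

```python
from abc import ABC, abstractmethod
from statistics import median


class AggregatorBase(ABC):
    """Hypothetical stand-in for src.core.aggregator.aggregator_base.AggregatorBase."""

    @abstractmethod
    def aggregate(self, updates: list, weights: list) -> dict:
        ...


class CoordinateMedianAggregator(AggregatorBase):
    """Takes the median of each parameter coordinate across clients."""

    def aggregate(self, updates: list, weights: list) -> dict:
        if not updates:
            raise ValueError("no client updates to aggregate")
        # `weights` is accepted to match the base signature but is unused:
        # an unweighted median ignores client sample counts.
        merged = {}
        for name in updates[0]:
            # zip(*...) regroups values so each tuple holds one coordinate
            # as seen by every client.
            columns = zip(*(update[name] for update in updates))
            merged[name] = [median(column) for column in columns]
        return merged


# The third client is an outlier; the median ignores its extreme values.
aggregator = CoordinateMedianAggregator()
updates = [{"w": [1.0, 10.0]}, {"w": [2.0, 20.0]}, {"w": [100.0, -5.0]}]
result = aggregator.aggregate(updates, weights=[1, 1, 1])  # {"w": [2.0, 10.0]}
```

In the real codebase the import from src.core.aggregator.aggregator_base would replace the local stand-in class, and the instance would be created in the executor as described above.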
For the four registries that do use decorator-based discovery, see the Registries Layer.
See also: Inter-Cluster Ray Fabric (ICRF) for how data-driven placement reduces the non-IID problem at the infrastructure level before aggregation even runs.