FedPilot: A Topology-Aware Platform for Federated Learning
Welcome to the FedPilot documentation hub. FedPilot is a Ray-backed platform for topology-aware federated learning and distributed-systems research.
Traditional FL frameworks hide low-level scheduling behind FL-specific abstractions. FedPilot’s central design principle is different: it decouples logical federation design from physical execution. Researchers first define schemas, virtual nodes, topology descriptors, and adaptation policies, and only then does the system materialize Ray actors and placement groups.
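The split between logical design and physical execution can be sketched in plain Python. The sketch also follows the lazy virtual-node materialization listed among the contributions: a runtime handle is created only when a node is first used. All names here (`VirtualNode`, `LogicalFederation`, `LazyMaterializer`) are hypothetical and are not FedPilot's API. The `factory` callable stands in for whatever actually starts an executor (in a Ray deployment, plausibly an actor inside a placement group), so the sketch runs without Ray.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class VirtualNode:
    """Logical description of a participant; holds no runtime resources."""
    node_id: str
    role: str      # e.g. "client" or "aggregator" (hypothetical roles)
    cluster: str   # physical cluster the node should eventually run on


@dataclass
class LogicalFederation:
    """Hypothetical container for the logical design: nodes plus edges."""
    nodes: Dict[str, VirtualNode] = field(default_factory=dict)
    edges: List[tuple] = field(default_factory=list)

    def add_node(self, node: VirtualNode) -> None:
        self.nodes[node.node_id] = node

    def connect(self, src: str, dst: str) -> None:
        self.edges.append((src, dst))


class LazyMaterializer:
    """Creates a runtime handle for a virtual node only on first access."""

    def __init__(self, federation: LogicalFederation,
                 factory: Callable[[VirtualNode], object]):
        self.federation = federation
        self.factory = factory  # hypothetical stand-in for actor creation
        self._handles: Dict[str, object] = {}

    def get(self, node_id: str) -> object:
        if node_id not in self._handles:
            node = self.federation.nodes[node_id]  # KeyError if never declared
            self._handles[node_id] = self.factory(node)
        return self._handles[node_id]

    @property
    def materialized(self) -> List[str]:
        return sorted(self._handles)


# Declare a star topology: three clients feeding one aggregator.
fed = LogicalFederation()
for i in range(3):
    fed.add_node(VirtualNode(f"client-{i}", "client", "cluster-a"))
    fed.connect(f"client-{i}", "agg")
fed.add_node(VirtualNode("agg", "aggregator", "cluster-a"))

mat = LazyMaterializer(fed, factory=lambda n: f"handle:{n.node_id}")
mat.get("client-0")
print(mat.materialized)  # ['client-0']
```

Declaring four nodes costs nothing at runtime; only `client-0` has been materialized after the single `get` call.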
The Four Core Contributions
FedPilot's architecture makes four systems-level contributions:
- Layered Architecture: A clean separation of Schema, Core, Communication, Infrastructure, and Observability layers.
- First-Class Systems Abstractions: Lazy virtual-node materialization, topology-aware message routing, and the Inter-Cluster Ray Fabric (ICRF), which is a core infrastructure primitive rather than an optional add-on.
- Data-Driven Topology Adaptation: Clients are clustered by their real label distributions, and those clusters drive ICRF placement decisions, so a single mechanism both improves convergence and scales the federation horizontally.
- Grounded Observability: A side-channel telemetry stack (OpenTelemetry, Prometheus, Grafana, Streamlit) that treats resource pressure and network I/O as critical experimental artifacts, not post-hoc add-ons.
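The data-driven adaptation described above can be illustrated with a small placement sketch. It assumes each client's local labels are summarized as a normalized histogram, similar histograms are grouped with k-means (deterministic farthest-point seeding), and each group is assigned to one physical cluster. k-means is a stand-in chosen for illustration: this page does not say which clustering algorithm FedPilot uses, and `place_clients` and the cluster names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def label_distribution(labels: Sequence[int], num_classes: int) -> List[float]:
    """Normalized label histogram for one client's local dataset."""
    counts = Counter(labels)
    return [counts.get(c, 0) / len(labels) for c in range(num_classes)]


def _sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(p: List[float], centroids: List[List[float]]) -> int:
    return min(range(len(centroids)), key=lambda j: _sq_dist(p, centroids[j]))


def _farthest_point_init(points: List[List[float]], k: int) -> List[List[float]]:
    """Seed with the first point, then repeatedly add the farthest point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Lloyd's k-means; returns a cluster index per point."""
    centroids = _farthest_point_init(points, k)
    assignment = [0] * len(points)
    for _ in range(iters):
        assignment = [_nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def place_clients(client_labels: Dict[str, Sequence[int]], num_classes: int,
                  physical_clusters: List[str]) -> Dict[str, str]:
    """Hypothetical policy: one label-distribution cluster per physical cluster."""
    ids = sorted(client_labels)
    dists = [label_distribution(client_labels[c], num_classes) for c in ids]
    groups = kmeans(dists, k=len(physical_clusters))
    return {cid: physical_clusters[g] for cid, g in zip(ids, groups)}


placement = place_clients(
    {"c0": [0, 0, 0, 1], "c1": [0, 0, 1, 0], "c2": [2, 2, 2, 1], "c3": [2, 2, 1, 2]},
    num_classes=3,
    physical_clusters=["ray-a", "ray-b"],
)
print(placement)  # {'c0': 'ray-a', 'c1': 'ray-a', 'c2': 'ray-b', 'c3': 'ray-b'}
```

Mapping a cluster index to a physical cluster by position is arbitrary in this sketch; a real placement policy would presumably also weigh cluster capacity.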
The Inter-Cluster Ray Fabric (ICRF)
The ICRF is the spine of FedPilot’s multi-cluster capability. It is a hybrid communication layer that maintains a single logical federation graph while automatically routing messages through:
- Ray shared memory: for nodes co-located on the same physical cluster (intra-cluster).
- HTTP via Ray Serve gateways: for nodes on separate physical clusters (inter-cluster).
Every part of the platform is aware of the ICRF: clustering determines its wiring, the HybridAdjacencyMatrix encodes its routing table, and the HybridTopologyManager enforces it at runtime.
→ Read the deep-dive: Inter-Cluster Ray Fabric (ICRF)
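The routing rule of the ICRF reduces to a per-edge decision: if a logical edge connects two nodes placed on the same physical cluster, the message travels over Ray shared memory; otherwise it goes through an HTTP gateway. The sketch encodes that rule over an adjacency matrix plus a node-to-cluster map. `HybridAdjacencySketch` and its methods are hypothetical illustrations, not the actual `HybridAdjacencyMatrix` or `HybridTopologyManager` API, and the returned transports are plain labels rather than real Ray or Ray Serve calls.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

SHARED_MEMORY = "ray_shared_memory"
HTTP_GATEWAY = "http_gateway"


@dataclass
class HybridAdjacencySketch:
    """Hypothetical routing table: logical edges plus physical placement.

    adjacency[i][j] == 1 means node i may send to node j; cluster_of maps
    each node to the physical cluster it was placed on.
    """
    nodes: List[str]
    adjacency: List[List[int]]
    cluster_of: Dict[str, str]

    def transport(self, src: str, dst: str) -> str:
        i, j = self.nodes.index(src), self.nodes.index(dst)
        if not self.adjacency[i][j]:
            raise ValueError(f"no logical edge {src} -> {dst}")
        if self.cluster_of[src] == self.cluster_of[dst]:
            return SHARED_MEMORY  # same physical cluster: intra-cluster path
        return HTTP_GATEWAY       # different clusters: inter-cluster gateway path

    def routing_table(self) -> Dict[Tuple[str, str], str]:
        """Transport label for every logical edge in the graph."""
        return {(s, d): self.transport(s, d)
                for i, s in enumerate(self.nodes)
                for j, d in enumerate(self.nodes)
                if self.adjacency[i][j]}


net = HybridAdjacencySketch(
    nodes=["c0", "c1", "agg"],
    adjacency=[[0, 0, 1],
               [0, 0, 1],
               [1, 1, 0]],
    cluster_of={"c0": "cluster-a", "c1": "cluster-b", "agg": "cluster-a"},
)
print(net.transport("c0", "agg"))  # ray_shared_memory
print(net.transport("c1", "agg"))  # http_gateway
```

Keeping the logical graph (`adjacency`) separate from placement (`cluster_of`) means re-clustering only changes the map, while the federation graph itself stays the same.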
Documentation Layers
The documentation is organized around the platform's operational layers. Choose a layer to dive into its technical details:
1. Entry & Configuration
Everything starts with how you boot up the framework.
2. Orchestration & Infrastructure
How the framework scales across physical hardware using Ray.
3. Schemas & Applications
How the federated paradigms are defined and mapped to execution engines.
4. Federated Core & Communication
The mathematical heart of the framework and the Inter-Cluster Ray Fabric.
- Inter-Cluster Ray Fabric (ICRF) ← Start here for multi-cluster deployments
- Federated Base
- Aggregators
- Model Compression & Chunking
- Shapley Value Analysis
5. Tool Registries
How to inject custom logic without editing core files. The platform provides four decorator-based plugin registries.
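Decorator-based registries usually follow one pattern: a decorator stores the decorated class or function under a name, and core code later resolves that name (typically taken from a config file) without importing the plugin module by name. The sketch shows that general pattern only. `Registry`, `AGGREGATORS`, and the `"mean"` plugin are hypothetical, since this page does not name FedPilot's four registries or their API.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class Registry:
    """Minimal name -> object registry populated by a decorator."""

    def __init__(self, kind: str):
        self.kind = kind
        self._entries: Dict[str, object] = {}

    def register(self, name: str) -> Callable[[T], T]:
        def decorator(obj: T) -> T:
            if name in self._entries:
                raise KeyError(f"{self.kind} '{name}' is already registered")
            self._entries[name] = obj
            return obj  # returned unchanged, so the decorated object stays usable
        return decorator

    def get(self, name: str) -> object:
        try:
            return self._entries[name]
        except KeyError:
            raise KeyError(f"unknown {self.kind} '{name}'; "
                           f"known: {sorted(self._entries)}") from None


# Hypothetical registry instance; not one of FedPilot's actual registries.
AGGREGATORS = Registry("aggregator")


@AGGREGATORS.register("mean")
def mean_aggregate(updates):
    """Element-wise average of equally weighted client update vectors."""
    n = len(updates)
    return [sum(vals) / n for vals in zip(*updates)]


# Core code looks the plugin up by name, e.g. a value read from a config file.
agg = AGGREGATORS.get("mean")
print(agg([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

Rejecting duplicate names keeps two plugins from silently shadowing each other, which matters when plugins are imported from several modules.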
6. Security & Privacy
Protecting distributed data from inference attacks.
7. Dashboards & Telemetry
Deep visibility into your experiments and production networks.