
FedPilot: A Topology-Aware Platform for Federated Learning

Welcome to the FedPilot documentation hub. FedPilot is a Ray-backed platform for topology-aware federated learning and distributed systems research.

Traditional federated learning (FL) frameworks hide low-level scheduling behind FL-specific abstractions. FedPilot instead follows one central design principle: decouple logical federation design from physical execution. Researchers first define schemas, virtual nodes, topology descriptors, and adaptation policies. Only after that does the system materialize the corresponding actors and placement groups.
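The following sketch shows the logical/physical split as lazy materialization. `VirtualNode`, `LazyFederation`, and the `factory` callback are hypothetical names, not FedPilot's API. In a Ray deployment the factory would presumably create an actor. Here a plain function stands in for it so the example runs without Ray. Declaring a node only records its logical description, and a physical handle exists only for nodes that are actually used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class VirtualNode:
    """Logical description of one federation participant (hypothetical schema)."""
    node_id: str
    cluster: str          # physical cluster the node is intended to run on
    role: str = "client"


@dataclass
class LazyFederation:
    """Stores logical nodes and builds a physical handle only on first access."""
    factory: Callable[[VirtualNode], object]   # e.g. an actor constructor in a real system
    nodes: Dict[str, VirtualNode] = field(default_factory=dict)
    _materialized: Dict[str, object] = field(default_factory=dict, init=False)

    def declare(self, node: VirtualNode) -> None:
        # Declaring is cheap: only the logical description is stored.
        self.nodes[node.node_id] = node

    def get(self, node_id: str) -> object:
        # First access materializes the node; later accesses reuse the same handle.
        if node_id not in self._materialized:
            self._materialized[node_id] = self.factory(self.nodes[node_id])
        return self._materialized[node_id]

    @property
    def materialized_count(self) -> int:
        return len(self._materialized)


# Usage: declare 1000 virtual nodes, but materialize only the one we touch.
created = []


def fake_actor_factory(node: VirtualNode) -> str:
    # Stand-in for creating a Ray actor; records what was built.
    created.append(node.node_id)
    return f"actor<{node.node_id}@{node.cluster}>"


fed = LazyFederation(factory=fake_actor_factory)
for i in range(1000):
    fed.declare(VirtualNode(f"c{i}", cluster="A" if i % 2 == 0 else "B"))

handle = fed.get("c3")
```

Only `c3` is built here (`fed.materialized_count` is 1), and a second `fed.get("c3")` returns the same handle.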


The Four Core Contributions

As outlined in the architectural design of the platform, FedPilot contributes four key systems-level innovations:

  1. Layered Architecture: A clean separation of Schema, Core, Communication, Infrastructure, and Observability layers.
  2. First-Class Systems Abstractions: Lazy virtual-node materialization, topology-aware message routing, and the Inter-Cluster Ray Fabric (ICRF) as a core infrastructure primitive — not an optional add-on.
  3. Data-Driven Topology Adaptation: Clustering clients by their real label distributions drives ICRF placement decisions, so a single mechanism both improves convergence and scales the federation horizontally.
  4. Grounded Observability: A side-channel telemetry stack (OpenTelemetry, Prometheus, Grafana, Streamlit) that treats resource pressure and network I/O as critical experimental artifacts, not post-hoc add-ons.
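Contribution 3 can be illustrated with a small sketch. It summarizes each client's label distribution as a normalized histogram. It then uses plain k-means with farthest-point seeding as a stand-in clustering method, because this page does not say which algorithm FedPilot uses. `label_histogram`, `cluster_clients`, and `place_clusters` are hypothetical helpers. The round-robin mapping from data clusters to physical clusters is also an assumption, meant only to illustrate how placement could follow clustering.

```python
from collections import Counter
from typing import Dict, List, Sequence


def label_histogram(labels: Sequence[int], num_classes: int) -> List[float]:
    """Normalized label distribution of one client's (non-empty) local dataset."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in range(num_classes)]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_clients(hists: Dict[str, List[float]], k: int, iters: int = 20) -> Dict[str, int]:
    """Plain k-means on label histograms with deterministic farthest-point seeding."""
    ids = sorted(hists)
    centers = [hists[ids[0]]]
    while len(centers) < k:
        # Next seed: the client farthest from all existing centers.
        far = max(ids, key=lambda i: min(_sq_dist(hists[i], c) for c in centers))
        centers.append(hists[far])
    assign: Dict[str, int] = {}
    for _ in range(iters):
        new_assign = {i: min(range(k), key=lambda j: _sq_dist(hists[i], centers[j]))
                      for i in ids}
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        for j in range(k):
            members = [hists[i] for i in ids if assign[i] == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def place_clusters(assign: Dict[str, int], physical: List[str]) -> Dict[str, str]:
    """Hypothetical placement: pin each data cluster to one physical cluster."""
    return {cid: physical[c % len(physical)] for cid, c in assign.items()}


# Usage: two clients skewed toward labels 0/1, two toward labels 2/3.
clients = {
    "c0": [0, 0, 0, 1],
    "c1": [0, 0, 1, 0],
    "c2": [2, 2, 3, 3],
    "c3": [3, 2, 2, 3],
}
hists = {cid: label_histogram(lbls, num_classes=4) for cid, lbls in clients.items()}
assign = cluster_clients(hists, k=2)
placement = place_clusters(assign, ["cluster-A", "cluster-B"])
```

Clients with similar label skew end up in the same data cluster and therefore on the same physical cluster. Their messages can then stay on the cheaper intra-cluster path.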

The Inter-Cluster Ray Fabric (ICRF)

The ICRF is the spine of FedPilot’s multi-cluster capability. It is a hybrid communication layer that maintains a single logical federation graph while automatically routing messages through:

  • Ray shared memory — for nodes co-located on the same physical cluster (intra-cluster).
  • HTTP via Ray Serve gateways — for nodes spanning separate physical clusters (inter-cluster).

The ICRF is built into the rest of the platform rather than bolted on. Clustering determines its wiring, the HybridAdjacencyMatrix encodes its routing table, and the HybridTopologyManager enforces that routing at runtime.
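The routing rule is easy to state. If both endpoints of a logical edge are placed on the same physical cluster, the message uses Ray shared memory. Otherwise it goes through an HTTP gateway. The sketch below pairs that rule with an adjacency matrix. `HybridRoutingTable` and `Transport` are hypothetical stand-ins written for illustration. They are not the interface of FedPilot's HybridAdjacencyMatrix or HybridTopologyManager, and the sketch only selects a transport without sending any messages.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Transport(Enum):
    SHARED_MEMORY = "ray-shared-memory"   # intra-cluster path
    HTTP_GATEWAY = "ray-serve-http"       # inter-cluster path


class HybridRoutingTable:
    """Hypothetical stand-in: logical edges plus each node's physical cluster."""

    def __init__(self, placement: Dict[str, str]):
        self.placement = placement                      # node_id -> physical cluster
        self.nodes = sorted(placement)
        self.index = {n: i for i, n in enumerate(self.nodes)}
        size = len(self.nodes)
        self.adj = [[False] * size for _ in range(size)]

    def connect(self, a: str, b: str) -> None:
        # In this sketch the logical federation graph is undirected.
        i, j = self.index[a], self.index[b]
        self.adj[i][j] = self.adj[j][i] = True

    def route(self, src: str, dst: str) -> Transport:
        # Only logical edges are routable; transport depends on physical placement.
        if not self.adj[self.index[src]][self.index[dst]]:
            raise ValueError(f"no logical edge {src} -> {dst}")
        same_cluster = self.placement[src] == self.placement[dst]
        return Transport.SHARED_MEMORY if same_cluster else Transport.HTTP_GATEWAY

    def table(self) -> List[Tuple[str, str, Transport]]:
        """Every logical edge (listed once) with the transport it would use."""
        return [(a, b, self.route(a, b))
                for i, a in enumerate(self.nodes)
                for j, b in enumerate(self.nodes)
                if i < j and self.adj[i][j]]


# Usage: a server and one client on cluster A, another client on cluster B.
rt = HybridRoutingTable({"server": "A", "c1": "A", "c2": "B"})
rt.connect("server", "c1")
rt.connect("server", "c2")
```

In this sketch, placement is kept separate from the adjacency matrix, which mirrors the logical/physical split. Re-placing a node changes which transport its edges use without changing the federation graph.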

→ Read the deep-dive: Inter-Cluster Ray Fabric (ICRF)


Documentation Layers

The documentation is organized around the platform's operational layers. Choose a layer to dive into its technical details:

1. Entry & Configuration

Everything starts with how you boot up the framework.

2. Orchestration & Infrastructure

How the framework scales across physical hardware using Ray.

3. Schemas & Applications

How the federated paradigms are defined and mapped to execution engines.

4. Federated Core & Communication

The mathematical heart of the framework and the Inter-Cluster Ray Fabric.

5. Tool Registries

How to inject custom logic without editing core files. The platform provides four decorator-based plugin registries.
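The decorator-based registry pattern can be sketched in a few lines. This page does not name FedPilot's four registries, so `Registry`, `AGGREGATORS`, and the `"fedavg"` entry are hypothetical examples. The idea is that a plugin registers itself with a decorator when its module is imported, and the core looks it up by name instead of importing it directly.

```python
from typing import Any, Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


class Registry:
    """Minimal name -> plugin registry populated by a decorator (hypothetical)."""

    def __init__(self, kind: str):
        self.kind = kind
        self._entries: Dict[str, Any] = {}

    def register(self, name: str) -> Callable[[T], T]:
        def decorator(obj: T) -> T:
            if name in self._entries:
                raise KeyError(f"{self.kind} '{name}' is already registered")
            self._entries[name] = obj
            return obj  # returned unchanged, so the decorated object stays usable
        return decorator

    def get(self, name: str) -> Any:
        try:
            return self._entries[name]
        except KeyError:
            raise KeyError(f"unknown {self.kind} '{name}'; "
                           f"available: {sorted(self._entries)}") from None


AGGREGATORS = Registry("aggregator")


@AGGREGATORS.register("fedavg")
def fedavg(updates: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of flat parameter vectors, one vector per client."""
    total = sum(weights)
    return [sum(w * u[k] for u, w in zip(updates, weights)) / total
            for k in range(len(updates[0]))]


# Usage: the core resolves the plugin by name, e.g. from a config string.
merged = AGGREGATORS.get("fedavg")([[1.0, 2.0], [3.0, 4.0]], weights=[1, 3])
```

Registering a second plugin under an existing name raises an error instead of silently replacing the first one. This keeps a misnamed plugin from overriding core behavior without notice.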

6. Security & Privacy

Protecting distributed data from inference attacks.

7. Dashboards & Telemetry

Deep visibility into your experiments and production networks.