Scalable, Resilient Federated Learning
Jan 1, 2025
The Scalable, Resilient Federated Learning (SRFL) project targets federated learning systems that remain scalable and resilient across heterogeneous compute and network environments. The project includes:
- FedDES, a discrete-event-based performance simulation framework for federated learning systems.
- FedMECA, a memory-efficient and concurrent aggregation approach for scalable federated learning.
- Long-haul RDMA studies for geo-distributed federated learning, including simulation, modeling, and real-world testbed validation.
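To illustrate the general idea behind discrete-event performance simulation, here is a minimal sketch of one synchronous federated learning round driven by a time-ordered event queue. It is not the FedDES implementation or API: the function `simulate_round`, the event names, and the per-client timing numbers are all hypothetical, and the server is assumed to aggregate updates one at a time in arrival order.

```python
import heapq


def simulate_round(clients, agg_time_per_update=0.0):
    """Simulate one synchronous FL round (illustrative sketch only).

    clients: list of (compute_seconds, upload_seconds), one tuple per client.
    Returns (round_completion_time, order_in_which_updates_arrived).
    """
    events = []  # min-heap of (time, sequence_number, kind, client_id)
    seq = 0      # tie-breaker so events with equal timestamps stay ordered

    # Every client starts local training at time 0.
    for cid, (compute_s, _upload_s) in enumerate(clients):
        heapq.heappush(events, (compute_s, seq, "train_done", cid))
        seq += 1

    server_free_at = 0.0
    arrival_order = []

    # Main loop: always process the earliest pending event.
    while events:
        now, _, kind, cid = heapq.heappop(events)
        if kind == "train_done":
            # Training finished; schedule the arrival of the uploaded update.
            upload_s = clients[cid][1]
            heapq.heappush(events, (now + upload_s, seq, "update_arrived", cid))
            seq += 1
        elif kind == "update_arrived":
            # Assumed server model: aggregate serially, one update at a time.
            arrival_order.append(cid)
            server_free_at = max(server_free_at, now) + agg_time_per_update

    return server_free_at, arrival_order


# Hypothetical heterogeneous clients: (compute seconds, upload seconds).
round_time, order = simulate_round(
    [(5.0, 1.0), (2.0, 0.5), (3.0, 4.0)], agg_time_per_update=0.25
)
print(round_time, order)  # 7.25 [1, 0, 2]
```

In this toy setup the round is gated by the client with the slowest upload rather than the slowest compute, which is the kind of effect such a simulator is meant to expose in heterogeneous environments.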
Related publications:
- FedDES: Discrete Event Based Performance Simulation for Federated Learning Systems
- FedMECA: Scalable Federated Learning via Memory-Efficient and Concurrent Aggregation
- When RDMA Goes Long-Haul: Characterization, Modeling, and Verbs-Level Emulation with Implications for Federated Learning
- Can Long-Haul RDMA Benefit Federated Learning?