Characterization, modeling, and verbs-level emulation of long-haul RDMA with implications for federated learning.
Jan 1, 2026
Scalable federated learning via memory-efficient and concurrent aggregation.
Jan 1, 2026
SRFL targets scalable and resilient federated learning systems across heterogeneous compute and network environments. The project includes FedDES, a discrete-event-based performance simulation framework for federated learning systems; FedMECA, a memory-efficient and concurrent aggregation approach for scalable federated learning; and long-haul RDMA studies for geo-distributed federated learning, covering simulation, modeling, and real-world testbed validation.
Jan 1, 2025
This project investigates long-haul RDMA for geo-distributed machine learning systems. It covers the characterization, modeling, and verbs-level emulation of long-haul RDMA behavior, and evaluates whether long-haul RDMA can improve geo-distributed federated learning through simulation and validation on a real-world testbed.
Jan 1, 2025
Discrete-event-based performance simulation for federated learning systems across heterogeneous compute and network settings.
Jan 1, 2025
Communication-centric study of long-haul RDMA for geo-distributed federated learning.
Jan 1, 2025