Zhonghao Chen

https://diogeneschen.github.io/
Mon, 24 Oct 2022 00:00:00 +0000

FedMECA: Scalable Federated Learning via Memory-Efficient and Concurrent Aggregation
https://diogeneschen.github.io/publication/fedmeca-cais-2026/
Thu, 01 Jan 2026 00:00:00 +0000
<p>FedMECA improves federated learning scalability by making aggregation more memory-efficient and concurrent, targeting complex FL workflows with large model and client counts.</p>

SPARe: Stacked Parallelism with Adaptive Reordering for Fault-Tolerant LLM Pretraining Systems with 100k+ GPUs
https://diogeneschen.github.io/publication/spare-icml-2026/
Thu, 01 Jan 2026 00:00:00 +0000
<p>SPARe studies fault-tolerant LLM pretraining at extreme scale, combining stacked parallelism with adaptive reordering to improve resilience and efficiency for 100k+ GPU systems.</p>

When RDMA Goes Long-Haul: Characterization, Modeling, and Verbs-Level Emulation with Implications for Federated Learning
https://diogeneschen.github.io/publication/long-haul-rdma-hpdc-2026/
Thu, 01 Jan 2026 00:00:00 +0000
<p>This work characterizes long-haul RDMA behavior, develops modeling and verbs-level emulation support, and studies the implications for geo-distributed federated learning systems.</p>

Building form optimization for renewable energy-economic utility of flexible solar cells as building integrated photovoltaics
https://diogeneschen.github.io/publication/flexible-solar-cells-scs-2025/
Wed, 01 Jan 2025 00:00:00 +0000
<p>Journal article on optimizing building form for renewable energy-economic utility of flexible solar cells as building integrated photovoltaics.</p>

Can Long-Haul RDMA Benefit Federated Learning?
https://diogeneschen.github.io/publication/long-haul-rdma-fl-sc25/
Wed, 01 Jan 2025 00:00:00 +0000
<p>This work studies the potential of long-haul RDMA for federated learning workloads and compares RDMA and TCP/IP under geo-distributed settings.</p>

FedDES: Discrete Event Based Performance Simulation for Federated Learning Systems
https://diogeneschen.github.io/publication/feddes-sec25/
Wed, 01 Jan 2025 00:00:00 +0000
<p>FedDES models federated learning training, communication, and aggregation as lightweight events, enabling systematic performance analysis of complex FL workflows under diverse networking conditions.</p>

Geometry and Material Criteria for Low-Carbon Design of I/H-Beams in Sustainable Steel Structures Considering Both Mechanical Properties and Carbon Emissions
https://diogeneschen.github.io/publication/ih-beams-materials-2025/
Wed, 01 Jan 2025 00:00:00 +0000
<p>Journal article on geometry and material criteria for low-carbon design of I/H-beams in sustainable steel structures.</p>

HPC-AI Convergence
https://diogeneschen.github.io/project/hpc-ai/
Wed, 01 Jan 2025 00:00:00 +0000
<p>This project targets HPC-AI convergence for efficient large-scale machine learning, including scheduling, optimization, characterization, and fault-tolerant training systems. The project includes:</p>
<ul>
<li>HPC-R1, a characterization of inference and distillation performance for large reasoning models on HPC-scale GPU clusters and interconnects.</li>
<li>SPARe, a fault-tolerant LLM pretraining system for 100k+ GPU scale using stacked parallelism and adaptive reordering.</li>
</ul>
<p>Related publications:</p>
<ul>
<li><a href="https://diogeneschen.github.io/publication/hpc-r1-sc25/">HPC-R1: Characterizing R1-like Large Reasoning Models on HPC</a></li>
<li><a href="https://diogeneschen.github.io/publication/spare-icml-2026/">SPARe: Stacked Parallelism with Adaptive Reordering for Fault-Tolerant LLM Pretraining Systems with 100k+ GPUs</a></li>
</ul>

HPC-R1: Characterizing R1-like Large Reasoning Models on HPC
https://diogeneschen.github.io/publication/hpc-r1-sc25/
Wed, 01 Jan 2025 00:00:00 +0000
<p>HPC-R1 characterizes inference and distillation performance of R1-like reasoning models on HPC platforms, identifying system bottlenecks and scalable deployment strategies.</p>

Long-Haul RDMA
https://diogeneschen.github.io/project/long-haul-rdma/
Wed, 01 Jan 2025 00:00:00 +0000
<p>This project investigates long-haul RDMA for geo-distributed machine learning systems. The project includes:</p>
<ul>
<li>Characterization, modeling, and verbs-level emulation of long-haul RDMA behavior.</li>
<li>Evaluation of whether long-haul RDMA can improve geo-distributed federated learning, including simulation and validation on a real-world testbed.</li>
</ul>
<p>Related publications:</p>
<ul>
<li><a href="https://diogeneschen.github.io/publication/long-haul-rdma-fl-sc25/">Can Long-Haul RDMA Benefit Federated Learning?</a></li>
<li><a href="https://diogeneschen.github.io/publication/long-haul-rdma-hpdc-2026/">When RDMA Goes Long-Haul: Characterization, Modeling, and Verbs-Level Emulation with Implications for Federated Learning</a></li>
</ul>

Scalable, Resilient Federated Learning
https://diogeneschen.github.io/project/srfl/
Wed, 01 Jan 2025 00:00:00 +0000
<p>SRFL targets scalable and resilient federated learning systems across heterogeneous compute and network environments. The project includes:</p>
<ul>
<li>FedDES, a discrete-event based performance simulation framework for federated learning systems.</li>
<li>FedMECA, a memory-efficient and concurrent aggregation approach for scalable federated learning.</li>
<li>Long-haul RDMA studies for geo-distributed federated learning, including simulation, modeling, and real-world testbed validation.</li>
</ul>
<p>Related publications:</p>
<ul>
<li><a href="https://diogeneschen.github.io/publication/feddes-sec25/">FedDES: Discrete Event Based Performance Simulation for Federated Learning Systems</a></li>
<li><a href="https://diogeneschen.github.io/publication/fedmeca-cais-2026/">FedMECA: Scalable Federated Learning via Memory-Efficient and Concurrent Aggregation</a></li>
<li><a href="https://diogeneschen.github.io/publication/long-haul-rdma-hpdc-2026/">When RDMA Goes Long-Haul: Characterization, Modeling, and Verbs-Level Emulation with Implications for Federated Learning</a></li>
<li><a href="https://diogeneschen.github.io/publication/long-haul-rdma-fl-sc25/">Can Long-Haul RDMA Benefit Federated Learning?</a></li>
</ul>

Projects
https://diogeneschen.github.io/projects/
Sun, 19 May 2024 00:00:00 +0000

Risk and Energy Based Optimization for Fire Monitoring System in Utility Tunnel Using Cellular Automata
https://diogeneschen.github.io/publication/fire-monitoring-utility-tunnel-sustainability-2024/
Mon, 01 Jan 2024 00:00:00 +0000
<p>Journal article on risk and energy based optimization for fire monitoring in utility tunnels using cellular automata.</p>

Experience
https://diogeneschen.github.io/experience/
Tue, 24 Oct 2023 00:00:00 +0000

Deep Learning Techniques for EEG-Based BCI: Analysis and Applications
https://diogeneschen.github.io/publication/eeg-bci-cisp-bmei-2023/
Sun, 01 Oct 2023 00:00:00 +0000
<p>Conference paper on deep learning techniques for EEG-based brain-computer interfaces (BCI).</p>

Is There Any Social Principle for LLM-Based Agents?
https://diogeneschen.github.io/publication/social-principle-llm-agents-arxiv-2023/
Tue, 01 Aug 2023 00:00:00 +0000
<p>Preprint exploring potential social principles for LLM-based agents.</p>

An algorithm and system design of Deep Learning based edge-cloud scheduling for neuro-electrophysiological signals
https://diogeneschen.github.io/publication/edge-cloud-scheduling-neuro-2023/
Thu, 01 Jun 2023 00:00:00 +0000
<p>Technical report on algorithm and system design of deep learning based edge–cloud scheduling for neuro-electrophysiological signal workloads.</p>

Edge–Cloud Scheduling for Neuro-Electrophysiological Signals
https://diogeneschen.github.io/project/edge-cloud-scheduling/
Thu, 01 Jun 2023 00:00:00 +0000
<p>Analysis and applications of deep learning techniques for EEG-based brain-computer interfaces (BCI). Algorithm and system design for deep learning based edge–cloud scheduling targeting neuro-electrophysiological signal workloads.</p>
<p>Related publication: <a href="https://diogeneschen.github.io/publication/edge-cloud-scheduling-neuro-2023/">Deep Learning based edge–cloud scheduling for neuro-electrophysiological signals</a>.</p>
<p>Related publication: <a href="https://diogeneschen.github.io/publication/eeg-bci-cisp-bmei-2023/">Deep Learning Techniques for EEG-Based BCI: Analysis and Applications</a>.</p>