NUAA-SmartSensing/FedModule

A modular federated learning framework supporting various types of FL, with free switching between thread and process modes.

This document is also available in: 中文 | English

keywords: federated-learning, asynchronous, synchronous, semi-asynchronous, personalized


Brief

  • One codebase adapts to multiple operating modes: thread, process, timeslice, and distributed.
  • One-click start; change the experimental environment without modifying the code.
  • Supports random seeds for reproducible experiments.
  • Redesigns the FL framework to be modular with high extensibility, supporting various mainstream federated learning paradigms: synchronous, asynchronous, semi-asynchronous, personalized, etc.
  • With wandb, experimental data is synchronized to the cloud, preventing data loss.

For more project information, please see the wiki.
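To illustrate the seed-based reproducibility mentioned above, here is a toy sketch (not the framework's actual seeding code; `seeded_run` is a hypothetical helper): with the same seed, a run reproduces the same sequence of random decisions, such as client selections.

```python
import random

def seeded_run(seed, n_rounds=5):
    """Toy illustration of reproducibility: the same seed yields the same
    sequence of (for example) random client selections across runs."""
    rng = random.Random(seed)
    return [rng.randrange(10) for _ in range(n_rounds)]
```

In the real framework the seed would also be propagated to numpy and torch generators, so that model initialization and data shuffling repeat exactly.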

Requirements

python3.8 + pytorch + linux

It has been validated on macOS.

Both single-GPU and multi-GPU setups are supported.

Getting Started

Environment

Install the dependencies into an existing Python environment using pip install -r requirements.txt

or

Create a new Python environment using conda:

conda env create -f environment.yml

Experiments

You can run python main.py directly (main.py is the main file in the fl directory). The program automatically reads the config.json file in the root directory and stores the results, along with the configuration file, in the specified path under results.

You can also specify a configuration file explicitly, e.g. python main.py ../../config.json. Please note that the path to config.json is resolved relative to main.py.

The config folder in the root directory provides configuration files for several algorithms proposed in papers. The following algorithm implementations are currently available:

Centralized Learning
FedAvg
FedAsync
FedProx
FedAT
FedLC
FedDL
M-Step AsyncFL
FedBuff
FedAdam
FedNova
FedBN
TWAFL

For more methods, refer to the wiki.
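For intuition, the aggregation step of FedAvg (the first federated algorithm in the list) is a sample-size-weighted average of client models. A minimal sketch, illustrative only and not the framework's implementation, operating on plain floats rather than tensors:

```python
def fedavg(client_states, client_sizes):
    """FedAvg aggregation: weight each client's parameters by its share
    of the total training samples, then sum.
    client_states: list of dicts mapping parameter name -> value.
    client_sizes:  list of per-client sample counts."""
    total = sum(client_sizes)
    agg = {}
    for key in client_states[0]:
        agg[key] = sum(
            (n / total) * state[key]
            for state, n in zip(client_states, client_sizes)
        )
    return agg
```

For example, two clients with parameters {"w": 1.0} and {"w": 3.0} and sample counts 1 and 3 aggregate to {"w": 2.5}, since the larger client carries three quarters of the weight.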

Docker

You can now pull and run a Docker image directly with the following commands:

docker pull desperadoccy/async-fl
docker run -it async-fl config/FedAvg-config.json

Similarly, it supports passing a config file path as an argument. You can also build the Docker image yourself:

cd docker
docker build -t async-fl .
docker run -it async-fl config/FedAvg-config.json 

Features

  • Asynchronous Federated Learning
  • Support model and dataset replacement
  • Support scheduling algorithm replacement
  • Support aggregation algorithm replacement
  • Support loss function replacement
  • Support client replacement
  • Synchronous federated learning
  • Semi-asynchronous federated learning
  • Provide test loss information
  • Custom label heterogeneity
  • Custom data heterogeneity
  • Support Dirichlet distribution
  • wandb visualization
  • Support for multiple GPUs
  • Docker deployment
  • Process thread switching
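The Dirichlet-distribution support listed above is a common way to simulate non-IID client data. Here is a self-contained, stdlib-only sketch of Dirichlet-proportioned index partitioning (the helper names are hypothetical, not the framework's API); it uses the fact that a Dirichlet sample is obtained by normalizing i.i.d. Gamma draws:

```python
import random

def dirichlet_proportions(alpha, n_clients, seed=None):
    """Sample client data proportions from a symmetric Dirichlet(alpha).
    A Dirichlet draw is n i.i.d. Gamma(alpha, 1) draws, normalized to sum to 1."""
    rng = random.Random(seed)
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_clients)]
    total = sum(draws)
    return [d / total for d in draws]

def partition_indices(n_samples, alpha, n_clients, seed=0):
    """Split sample indices among clients into Dirichlet-sized shards."""
    props = dirichlet_proportions(alpha, n_clients, seed)
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    splits, start = [], 0
    for i, p in enumerate(props):
        # the last client takes the remainder so every index is assigned
        end = n_samples if i == n_clients - 1 else start + int(p * n_samples)
        splits.append(idx[start:end])
        start = end
    return splits
```

Smaller alpha values produce more skewed (more heterogeneous) partitions; large alpha approaches an even split.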

Add new methods

Please refer to the wiki

Existing Bugs

Currently, there is a core issue in the framework: communication between clients and the server is implemented with multiprocessing queues. When a CUDA tensor is put on a queue and retrieved by another thread, it can cause a memory leak and may crash the program.

This bug stems from the interaction between PyTorch and the multiprocessing queue. The current workaround is to put only non-CUDA (CPU) tensors on the queue and convert them back to CUDA tensors during aggregation. Therefore, when adding an aggregation algorithm, code like the following is needed:

import torch

# Clone each received (CPU) tensor and move it back to the GPU if one is available
updated_parameters = {}
for key, var in client_weights.items():
    updated_parameters[key] = var.clone()
    if torch.cuda.is_available():
        updated_parameters[key] = updated_parameters[key].cuda()
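The same idea applies on the sending side: move every tensor to CPU before putting a state dict on a multiprocessing queue. A minimal sketch (`to_cpu_state_dict` is a hypothetical helper, not part of the framework; it is duck-typed so it also runs where PyTorch is not installed):

```python
def to_cpu_state_dict(state_dict):
    """Return a copy of state_dict with every tensor moved to CPU, making it
    safe to put on a multiprocessing queue (avoids the CUDA-tensor issue).
    Duck-typed: values without a .cpu() method pass through unchanged."""
    return {k: (v.cpu() if hasattr(v, "cpu") else v) for k, v in state_dict.items()}
```

A client would call this on its model's state dict right before queue.put(), and the server would apply the snippet above to move parameters back to the GPU during aggregation.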

Contributors

desperadoccy
jzj007
cauchyguo

Citation

Please cite our paper in your publications if this code helps your research.

@misc{chen2024fedmodulemodularfederatedlearning,
      title={FedModule: A Modular Federated Learning Framework}, 
      author={Chuyi Chen and Zhe Zhang and Yanchao Zhao},
      year={2024},
      eprint={2409.04849},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2409.04849}, 
}

Contact us

We have created a QQ group for discussing the asyncFL framework and FL in general; everyone is welcome to join.

Here is the group number:

895896624

QQ: 527707607

email: [email protected]

Suggestions for the project are welcome.

If you'd like to contribute to this project, please contact us.