|
| 1 | +.. _dataset-partition: |
| 2 | + |
| 3 | +************************************* |
| 4 | +Federated Dataset and DataPartitioner |
| 5 | +************************************* |
| 6 | + |
| 7 | +In real-world settings, FL needs to handle a variety of data distribution scenarios, including |
| 8 | +IID and non-IID ones. Although datasets and partition schemes already exist for published benchmarks, |
| 9 | +it can still be messy and hard for researchers to partition datasets according to their specific |
| 10 | +research problems and to maintain the partition results during simulation. FedLab provides :class:`fedlab.utils.dataset.partition.DataPartitioner`, which allows you to use pre-partitioned datasets as well as your own data. :class:`DataPartitioner` stores the sample indices for each client under a given data partition scheme. FedLab also provides some extra datasets that are used in current FL research but are not yet provided by the official PyTorch :class:`torchvision.datasets`. |
| 11 | + |
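As a rough illustration of what "storing sample indices for each client" means, the sketch below builds two such index mappings using only the Python standard library. It does not use FedLab's API: ``iid_partition`` and ``shard_partition`` are hypothetical helper names, and the toy label list is made up for the example.

```python
import random
from collections import Counter


def iid_partition(num_samples, num_clients, seed=0):
    """IID split: shuffle all sample indices and deal them out evenly."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    # Client `cid` takes every `num_clients`-th index of the shuffled list.
    return {cid: sorted(indices[cid::num_clients]) for cid in range(num_clients)}


def shard_partition(targets, num_clients, shards_per_client=2, seed=0):
    """Label-skewed split: sort indices by label, cut them into equal
    shards, and hand each client a few randomly chosen shards."""
    rng = random.Random(seed)
    order = sorted(range(len(targets)), key=lambda i: targets[i])
    num_shards = num_clients * shards_per_client
    shard_size = len(order) // num_shards
    shards = [order[s * shard_size:(s + 1) * shard_size] for s in range(num_shards)]
    rng.shuffle(shards)
    client_dict = {}
    for cid in range(num_clients):
        own = shards[cid * shards_per_client:(cid + 1) * shards_per_client]
        client_dict[cid] = [i for shard in own for i in shard]
    return client_dict


# Toy labels (an assumption for this example): 10 classes, 100 samples each.
targets = [label for label in range(10) for _ in range(100)]

iid = iid_partition(len(targets), num_clients=5)
skewed = shard_partition(targets, num_clients=5)

for cid in range(5):
    classes = sorted(Counter(targets[i] for i in skewed[cid]))
    print(f"client {cid}: {len(iid[cid])} IID samples, skewed classes {classes}")
```

Either mapping (client id to a list of sample indices) can then be used to select a per-client subset of a dataset, which is the role the partition result plays during simulation.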
| 12 | +.. note:: |
| 13 | + |
| 14 | + The current implementation and design of this part are based on LEAF :cite:p:`caldas2018leaf`, :cite:t:`acar2020federated`, :cite:t:`yurochkin2019bayesian` and NIID-Bench :cite:p:`li2021federated`. |
| 15 | + |
| 16 | +Vision Data |
| 17 | +=========== |
| 18 | + |
| 19 | +CIFAR10 |
| 20 | +^^^^^^^ |
| 21 | + |
| 22 | +FedLab provides a number of predefined partition schemes for some datasets (such as CIFAR10). Each scheme subclasses :class:`fedlab.utils.dataset.partition.DataPartitioner` and implements the functions specific to that partition scheme. They can be used to prototype and benchmark your FL algorithms. |
| 23 | + |
| 24 | +Tutorial for :class:`CIFAR10Partitioner`: :ref:`CIFAR10 tutorial <data-cifar10>`. |
| 25 | + |
| 26 | + |
| 27 | +CIFAR100 |
| 28 | +^^^^^^^^ |
| 29 | + |
| 30 | +Notebook tutorial for :class:`CIFAR100Partitioner`: `CIFAR100 tutorial <https://github.com/SMILELab-FL/FedLab-benchmarks/blob/master/fedlab_benchmarks/datasets/cifar100/data_partitioner.ipynb>`_. |
| 31 | + |
| 32 | + |
| 33 | + |
| 34 | +FMNIST |
| 35 | +^^^^^^ |
| 36 | + |
| 37 | +Notebook tutorial for data partition of FMNIST (FashionMNIST): `FMNIST tutorial <https://github.com/SMILELab-FL/FedLab-benchmarks/blob/master/fedlab_benchmarks/datasets/fmnist/fmnist_tutorial.ipynb>`_. |
| 38 | + |
| 39 | + |
| 40 | +MNIST |
| 41 | +^^^^^ |
| 42 | + |
| 43 | +MNIST is very similar to FMNIST; please see the `FMNIST tutorial <https://github.com/SMILELab-FL/FedLab-benchmarks/blob/master/fedlab_benchmarks/datasets/fmnist/fmnist_tutorial.ipynb>`_. |
| 44 | + |
| 45 | + |
| 46 | +CelebA |
| 47 | +^^^^^^ |
| 48 | + |
| 49 | +Data partition for CelebA: `CelebA tutorial <https://github.com/SMILELab-FL/FedLab-benchmarks/tree/master/fedlab_benchmarks/datasets/celeba>`_. |
| 50 | + |
| 51 | + |
| 52 | + |
| 53 | +FEMNIST |
| 54 | +^^^^^^^ |
| 55 | + |
| 56 | +Data partition of FEMNIST: `FEMNIST tutorial <https://github.com/SMILELab-FL/FedLab-benchmarks/tree/master/fedlab_benchmarks/datasets/femnist>`_. |
| 57 | + |
| 58 | + |
| 59 | +Text Data |
| 60 | +========= |
| 61 | + |
| 62 | +Shakespeare |
| 63 | +^^^^^^^^^^^ |
| 64 | + |
| 65 | +Data partition of Shakespeare dataset: `Shakespeare tutorial <https://github.com/SMILELab-FL/FedLab-benchmarks/tree/master/fedlab_benchmarks/datasets/shakespeare>`_. |
| 66 | + |
| 67 | + |
| 68 | +Sent140 |
| 69 | +^^^^^^^ |
| 70 | + |
| 71 | +Data partition of Sent140: `Sent140 tutorial <https://github.com/SMILELab-FL/FedLab-benchmarks/tree/master/fedlab_benchmarks/datasets/sent140>`_. |
| 72 | + |
| 73 | +Reddit |
| 74 | +^^^^^^ |
| 75 | +Data partition of Reddit: `Reddit tutorial <https://github.com/SMILELab-FL/FedLab-benchmarks/tree/master/fedlab_benchmarks/datasets/reddit>`_. |
| 76 | + |
| 77 | + |
| 78 | +Tabular Data |
| 79 | +============ |
| 80 | + |
| 81 | +Adult |
| 82 | +^^^^^ |
| 83 | + |
| 84 | +Adult is from `LIBSVM Data <https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html>`_. Its original source is `UCI <http://archive.ics.uci.edu/ml/index.php>`_/Adult. FedLab provides both a ``Dataset`` and a :class:`DataPartitioner` for Adult. Notebook tutorial for Adult: `Adult tutorial <https://github.com/SMILELab-FL/FedLab-benchmarks/blob/master/fedlab_benchmarks/datasets/adult/adult_tutorial.ipynb>`_. |
| 85 | + |
| 86 | + |
| 87 | +Covtype |
| 88 | +^^^^^^^ |
| 89 | + |
| 90 | +Covtype is from `LIBSVM Data <https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html>`_. Its original source is `UCI <http://archive.ics.uci.edu/ml/index.php>`_/Covtype. FedLab provides both a ``Dataset`` and a :class:`DataPartitioner` for Covtype. Notebook tutorial for Covtype: `Covtype tutorial <https://github.com/SMILELab-FL/FedLab-benchmarks/blob/master/fedlab_benchmarks/datasets/covtype/covtype_tutorial.ipynb>`_. |
| 91 | + |
| 92 | + |
| 93 | +RCV1 |
| 94 | +^^^^ |
| 95 | + |
| 96 | +RCV1 is from `LIBSVM Data <https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html>`_. Its original source is `UCI <http://archive.ics.uci.edu/ml/index.php>`_/RCV1. FedLab provides both a ``Dataset`` and a :class:`DataPartitioner` for RCV1. Notebook tutorial for RCV1: `RCV1 tutorial <https://github.com/SMILELab-FL/FedLab-benchmarks/blob/master/fedlab_benchmarks/datasets/rcv1/rcv1_tutorial.ipynb>`_. |
| 97 | + |
| 98 | + |
| 99 | +Synthetic Data |
| 100 | +============== |
| 101 | + |
| 102 | +FCUBE |
| 103 | +^^^^^ |
| 104 | + |
| 105 | +FCUBE is a synthetic dataset for federated learning. FedLab provides both a ``Dataset`` and a :class:`DataPartitioner` for FCUBE. Tutorial for FCUBE: :ref:`FCUBE tutorial <fcube-tutorial>`. |
| 106 | + |
| 107 | + |
| 108 | +LEAF-Synthetic |
| 109 | +^^^^^^^^^^^^^^ |
| 110 | + |
| 111 | +LEAF-Synthetic is a federated dataset proposed by LEAF. The number of clients, the number of classes and the feature dimension can all be customized by the user. |
| 112 | + |
| 113 | +Please check `LEAF-Synthetic <https://github.com/SMILELab-FL/FedLab-benchmarks/tree/master/fedlab_benchmarks/datasets/synthetic>`_ for more details. |