For each experiment, we use one sweep file, e.g., fed_iid/digits.yaml
, for training and one,
e.g., fed_iid/digits_test.yaml
, for testing.
Restuls in Sec. 5.1.
Plots:
- Visualize convergence curves (Fig 1,4,13): ../ipynb/Convergence curve.ipynb
- Benchmark tables (Table 1): ../ipynb/Convergence%20curve.ipynb
- DomainNet per domain acc (Fig. 6): ../ipynb/DomainNet%20per%20domain%20acc.ipynb
- Client-wise statistics of test acc, training and communication efficiency (Fig. 5): ../ipynb/client_stat.ipynb
NOTE
- By default, we use
--no_track_stat
to disable tracking statistics in BN, except in specific ablation study. - For class-niid experiments, the test set and training set may not share the same classes. Since we
do not track BN statistics (
--no_track_stat
), the split does not affect final evaluation. To enforce consistent test classes, use--con_test_cls
during training and testing.
- Cifar10 100% or 50% train data
wandb sweep sweeps/fed_iid/cifar10.yaml
# => [02/25] https://wandb.ai/jyhong/SplitMix_release/sweeps/m8tc32eg
# @GPU8 4 width_scale, 100%
# => [03/01] https://wandb.ai/jyhong/SplitMix_release/sweeps/biuwy5xb
# @GPU9 50% data
wandb sweep sweeps/fed_iid/cifar10_test.yaml
# => [02/26] https://wandb.ai/jyhong/SplitMix_release/sweeps/d6ua8kbt
# @GPU8 4 width_scale, 100%
# => [03/02] https://wandb.ai/jyhong/SplitMix_release/sweeps/jbn34q4n
# @GPU9 50% data
- Cifar10 full set class-niid
wandb sweep sweeps/fed_class_niid/cifar10.yaml
# => [02/27] https://wandb.ai/jyhong/SplitMix_release/sweeps/dlxe994l
# @GPU8
wandb sweep sweeps/fed_class_niid/cifar10_test.yaml
# => [02/28] https://wandb.ai/jyhong/SplitMix_release/sweeps/6ua8jh9x
- Digits
wandb sweep sweeps/fed_niid/digits.yaml
# => [02/24] https://wandb.ai/jyhong/SplitMix_release/sweeps/057l05ow
# @GPU8 4 width_scale
wandb sweep sweeps/fed_niid/digits_test.yaml
# => [02/26] https://wandb.ai/jyhong/SplitMix_release/sweeps/8g8s7kp4
- DomainNet
wandb sweep sweeps/fed_niid/domainnet.yaml
# => [02/25] https://wandb.ai/jyhong/SplitMix_release/sweeps/wf20oh8r
# @GPU8 4 width_scale
wandb sweep sweeps/fed_niid/domainnet_test.yaml
# => [03/01] https://wandb.ai/jyhong/SplitMix_release/sweeps/y489wn02
- Cifar10 100% or 50% train data
wandb sweep sweeps/fed_iid/cifar10_SHeteroFL.yaml
# => [02/25] https://wandb.ai/jyhong/SplitMix_release/sweeps/35idaxc0
# @GPU8
# => [03/01] https://wandb.ai/jyhong/SplitMix_release/sweeps/l0w7i3qn
# @GPU8 record all width acc. 100%
# => [03/01] https://wandb.ai/jyhong/SplitMix_release/sweeps/8bia6u1y
# @GPU9 Use 50% training data
wandb sweep sweeps/fed_iid/cifar10_SHeteroFL_test.yaml
# => [02/26] https://wandb.ai/jyhong/SplitMix_release/sweeps/13li9grh
# @GPU8 100%
# => [03/02] https://wandb.ai/jyhong/SplitMix_release/sweeps/6bbo3mwi
# @GPU9 50%
- Cifar10 full set, 10%val, class nidd
wandb sweep sweeps/fed_class_niid/cifar10_SHeteroFL.yaml
# => [02/26] https://wandb.ai/jyhong/SplitMix_release/sweeps/i2bp6jtm
# @GPU8
# => [03/09] https://wandb.ai/jyhong/SplitMix_release/sweeps/54zqzzno
# @GPU8 to record all width performance
wandb sweep sweeps/fed_class_niid/cifar10_SHeteroFL_test.yaml
# => [02/26] https://wandb.ai/jyhong/SplitMix_release/sweeps/fvg0045z
- Digits
wandb sweep sweeps/fed_niid/digits_SHeteroFL.yaml
# => [02/25] https://wandb.ai/jyhong/SplitMix_release/sweeps/zujysna2
# @GPU8
# => [03/01] https://wandb.ai/jyhong/SplitMix_release/sweeps/ufwuoldc
# @GPU8 record all-width val acc
# => [03/03] https://wandb.ai/jyhong/SplitMix_release/sweeps/wdu8cof1
# @GPU8 repeat 3 times
wandb sweep sweeps/fed_niid/digits_SHeteroFL_test.yaml
# => [02/25] https://wandb.ai/jyhong/SplitMix_release/sweeps/0lh7d73x
- DomainNet (From group_size)
wandb sweep sweeps/fed_niid/domainnet_SHeteroFL.yaml
# => [02/25] https://wandb.ai/jyhong/SplitMix_release/sweeps/cs6y3ir0
# @GPU8
# => [03/01] https://wandb.ai/jyhong/SplitMix_release/sweeps/dqfo7crn
# @GPU8 extend to 1k iters and record all width val_acc
wandb sweep sweeps/fed_niid/domainnet_SHeteroFL_test.yaml
# => [02/28] https://wandb.ai/jyhong/SplitMix_release/sweeps/shs7yw8p
We shift the atom model in budget-sufficient users until reach the max width.
- Cifar10 full set
wandb sweep sweeps/fed_iid/cifar10_SplitMix.yaml
# => [03/03] https://wandb.ai/jyhong/SplitMix_release/sweeps/a7oujc4r
# @GPU8 try lr_sch = multi_step100, 100%
# => [03/02] https://wandb.ai/jyhong/SplitMix_release/sweeps/yypftd0t
# @GPU8 try lr_sch = multi_step100, 50%
wandb sweep sweeps/fed_iid/cifar10_SplitMix_test.yaml
# => [03/02] https://wandb.ai/jyhong/SplitMix_release/sweeps/fjt4nczs
# @GPU8 multi_step50
# => [03/02] https://wandb.ai/jyhong/SplitMix_release/sweeps/y6e7r33c
# @GPU9 multi_step50, 50%
# => https://wandb.ai/jyhong/SplitMix_release/sweeps/32jld004
# @GPU9 multi_step100, 50%
# => [03/07] https://wandb.ai/jyhong/SplitMix_release/sweeps/u0g831ex
# @GPU8 multi_step100, 100%
- Cifar10 class niid
wandb sweep sweeps/fed_class_niid/cifar10_SplitMix.yaml
# => [02/26] https://wandb.ai/jyhong/SplitMix_release/sweeps/80p3ha4u
# @GPU8 ens_preresnet18
# => [03/09] https://wandb.ai/jyhong/SplitMix_release/sweeps/np49m5hp
# @GPU8 to record all
wandb sweep sweeps/fed_class_niid/cifar10_SplitMix_test.yaml
# => [02/28] https://wandb.ai/jyhong/SplitMix_release/sweeps/g71nb2yv
# @GPU8
- Digits
wandb sweep sweeps/fed_niid/digits_SplitMix.yaml
# => [03/03] https://wandb.ai/jyhong/SplitMix_release/sweeps/9ecf1cy1
# @GPU8 repeat 3 times
# ens_digit, seed=1 is runned in ablation study.
wandb sweep sweeps/fed_niid/digits_SplitMix_test.yaml
# => [02/28] https://wandb.ai/jyhong/SplitMix_release/sweeps/3wr7bsxb
- DomainNet
wandb sweep sweeps/fed_niid/domainnet_SplitMix.yaml
# => [02/23] https://wandb.ai/jyhong/SplitMix_release/sweeps/naglzvcl
# @GPU8 re-run with new ens_net
wandb sweep sweeps/fed_niid/domainnet_SplitMix_test.yaml
# => [02/28] https://wandb.ai/jyhong/SplitMix_release/sweeps/2kxrau5h
Results in Sec. A.3.
Plots:
- Ablation table (Table 4): ../ipynb/Ablation study.ipynb
Explanation of BN types in the table:
- batch average: Use
--no_track_stat
. - post average: Use
--no_track_stat
and use--test_refresh_bn
.--test_refresh_bn
is only needed in test yaml file. - tracked: Remove
--no_track_stat
- locally tracked: Remove
--no_track_stat
and use--lbn=true
You can also change --rescale_init
and --rescale_layer
to be true
or false
.
- Digits
wandb sweep sweeps/ablation/digits_SplitMix.yaml
# => [02/23] https://wandb.ai/jyhong/SplitMix_release/sweeps/ybief82d
# @GPU8 All-in-one ablation
# => [03/06] https://wandb.ai/jyhong/SplitMix_release/sweeps/3oe7oq7t
# @GPU8 All-in-one ablation + track bn stat
# => [03/07] https://wandb.ai/jyhong/SplitMix_release/sweeps/accu8vm4
# @GPU10 All-in-one ablation + track bn stat + LBN
wandb sweep sweeps/ablation/digits_SplitMix_test.yaml
# => [02/23] https://wandb.ai/jyhong/SplitMix_release/sweeps/cpzoxxq9
# @GPU8 All-in-one ablation
# => [04/07] https://wandb.ai/jyhong/SplitMix_release/sweeps/tr78ctgv
# @GPU8 All-in-one ablation + track bn stat
# => [04/07] https://wandb.ai/jyhong/SplitMix_release/sweeps/2di8eygl
# @GPU8 All-in-one ablation + refresh BN before test
# => [04/07] https://wandb.ai/jyhong/SplitMix_release/sweeps/x11krsq4
# @GPU8 All-in-one ablation + track bn stat + LBN
Results in Sec. A.5. Vary the users per round (pr_nuser
).
Plots:
- Budget distribution and per-width performance comparisons (Fig 11): ../ipynb/Digits vary budget distribution.ipynb
FedAvg: As FedAvg are individually trained for each width, there is no need to test with varying budgets.
- Skewed groups: We let more groups to be very sufficient or insufficient.
- Non-exp distribution: For example, step increase by 0.25
SplitMix:
- Digits
wandb sweep sweeps/vary_budget_dist/digits_SplitMix.yaml
# => [03/02] https://wandb.ai/jyhong/SplitMix_release/sweeps/0bzucbfp
# @GPu10
# => [03/07] https://wandb.ai/jyhong/SplitMix_release/sweeps/wrp3hoh9
# @GPu10 slim_ratios=ln0.5_0.4
wandb sweep sweeps/vary_budget_dist/digits_SplitMix_test.yaml
# => [03/03] https://wandb.ai/jyhong/SplitMix_release/sweeps/8g0irs68
# @GPU10
# => [03/09] https://wandb.ai/jyhong/SplitMix_release/sweeps/wz10puq8
# @GPu10 slim_ratios=ln0.5_0.4
SHeteroFL:
- Digits
wandb sweep sweeps/vary_budget_dist/digits_SHeteroFL.yaml
# => [03/02] https://wandb.ai/jyhong/SplitMix_release/sweeps/aym3td6d
# @GPU10
# => [03/07] https://wandb.ai/jyhong/SplitMix_release/sweeps/f87p0v6r
# @GPU10 slim_ratios=ln0.5_0.4
wandb sweep sweeps/vary_budget_dist/digits_SHeteroFL_test.yaml
# => [03/03] https://wandb.ai/jyhong/SplitMix_release/sweeps/jbak4jzs
# @GPU10
# => [03/09] https://wandb.ai/jyhong/SplitMix_release/sweeps/a36ramy7
# @GPU10 slim_ratios=ln0.5_0.4
Results in Sec. A.6. Vary the users per round (pr_nuser
).
Plots:
- Budget distribution and per-width performance comparisons (Fig 11): ../ipynb/Digits vary budget distribution.ipynb
SplitMix:
- Digits
wandb sweep sweeps/vary_pr_nuser/digits_SplitMix.yaml
# => [03/02] https://wandb.ai/jyhong/SplitMix_release/sweeps/cw23nwuv
# @GPU8
# => [03/03] https://wandb.ai/jyhong/SplitMix_release/sweeps/9ecf1cy1
# @GPU8 pr_nuser=-1
wandb sweep sweeps/vary_pr_nuser/digits_SplitMix_test.yaml
# => [03/09] https://wandb.ai/jyhong/SplitMix_release/sweeps/80ewd3yq
# @GPU8
SHeteroFL:
- Digits
wandb sweep sweeps/vary_pr_nuser/digits_SHeteroFL.yaml
# => [03/02] https://wandb.ai/jyhong/SplitMix_release/sweeps/hpij2kgi
# @GPU8
# => [03/03] https://wandb.ai/jyhong/SplitMix_release/sweeps/wdu8cof1
# @GPU8 repeat 3 times, pr_nuser=-1
wandb sweep sweeps/vary_pr_nuser/digits_SHeteroFL_test.yaml
# => [03/03] https://wandb.ai/jyhong/SplitMix_release/sweeps/la52g84k
# @GPU8 missing pr_nuser=-1 (seed>1)
# => [03/09] https://wandb.ai/jyhong/SplitMix_release/sweeps/0qjd6qdr
# @GPU8