UniFed-leaderboard
A Benchmark for Federated Learning Frameworks

In the UniFed leaderboard, we present both qualitative and quantitative evaluation results for 11 popular open-source federated learning (FL) frameworks across 15 evaluation scenarios, from the perspectives of functionality, usability, and system performance. Please find more details in our paper here.

Evaluation Scenarios

UniFed borrows 15 datasets from existing work to cover different FL settings, modalities, task types, and workload sizes.

Setting | Scenario name | Modality | Task type | Performance metric | Number of clients | Number of samples
Horizontal FL, cross-device | celeba | Image | Binary Classification (Smiling vs. Not smiling) | Accuracy | 894 | 20,028
Horizontal FL, cross-device | femnist | Image | Multiclass Classification (62 classes) | Accuracy | 178 | 40,203
Horizontal FL, cross-device | reddit | Text | Next-word Prediction | Accuracy | 813 | 27,738
Horizontal FL, cross-silo | breast_horizontal | Medical | Binary Classification | AUC | 2 | 569
Horizontal FL, cross-silo | default_credit_horizontal | Tabular | Binary Classification | AUC | 2 | 22,000
Horizontal FL, cross-silo | give_credit_horizontal | Tabular | Binary Classification | AUC | 2 | 150,000
Horizontal FL, cross-silo | student_horizontal | Tabular | Regression (Grade Estimation) | MSE | 2 | 395
Horizontal FL, cross-silo | vehicle_scale_horizontal | Image | Multiclass Classification (4 classes) | Accuracy | 2 | 846
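In the horizontal cross-silo scenarios above, every client holds the same feature columns but a different subset of samples, and each scenario uses 2 clients. As an illustration only, the sketch below deals a sample list into disjoint client shards after a seeded shuffle; the function name `horizontal_split`, the round-robin scheme, and the seed are assumptions, not UniFed's actual partitioning code.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def horizontal_split(samples: Sequence[T], num_clients: int, seed: int = 0) -> List[List[T]]:
    """Shuffle sample indices, then deal them out so each client gets a disjoint set of rows."""
    if num_clients < 1:
        raise ValueError("num_clients must be >= 1")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    shards: List[List[T]] = [[] for _ in range(num_clients)]
    for position, idx in enumerate(indices):
        # Round-robin assignment keeps shard sizes within one sample of each other.
        shards[position % num_clients].append(samples[idx])
    return shards

# Hypothetical rows: every row has the same schema (id, features, label).
rows = [{"id": i, "x": [float(i)], "y": i % 2} for i in range(569)]
shards = horizontal_split(rows, num_clients=2)
print([len(s) for s in shards])  # [285, 284]
```

With 569 rows (the size of breast_horizontal) and 2 clients, the shards hold 285 and 284 samples, and no sample appears in both.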
Setting | Scenario name | Modality | Task type | Performance metric | Vertical split details
Vertical FL | breast_vertical | Medical | Binary Classification | AUC | A party: 10 features, 1 label; B party: 20 features
Vertical FL | default_credit_vertical | Tabular | Binary Classification | AUC | A party: 13 features, 1 label; B party: 10 features
Vertical FL | dvisits_vertical | Tabular | Regression (Number of Consultations Estimation) | MSE | A party: 3 features, 1 label; B party: 9 features
Vertical FL | give_credit_vertical | Tabular | Binary Classification | AUC | A party: 5 features, 1 label; B party: 5 features
Vertical FL | motor_vertical | Sensor data | Regression (Temperature Estimation) | MSE | A party: 4 features, 1 label; B party: 7 features
Vertical FL | student_vertical | Tabular | Regression (Grade Estimation) | MSE | A party: 6 features, 1 label; B party: 7 features
Vertical FL | vehicle_scale_vertical | Image | Multiclass Classification (4 classes) | Accuracy | A party: 9 features, 1 label; B party: 9 features
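In the vertical scenarios, the parties share the same samples but hold different feature columns, and only party A holds the label. The following sketch splits a feature matrix accordingly, using the breast_vertical layout (10 features plus the label for A, 20 features for B) as an example. It assumes the parties' sample IDs have already been aligned and that A's features are the leading columns; both the column order and the helper name `vertical_split` are hypothetical.

```python
from typing import List, Sequence, Tuple

def vertical_split(
    features: Sequence[Sequence[float]],
    labels: Sequence[float],
    num_a_features: int,
) -> Tuple[List[Tuple[List[float], float]], List[List[float]]]:
    """Give party A the first `num_a_features` columns plus the label; party B gets the rest.

    Row i in both outputs refers to the same sample (IDs assumed already aligned).
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same number of rows")
    party_a = [(list(row[:num_a_features]), y) for row, y in zip(features, labels)]
    party_b = [list(row[num_a_features:]) for row in features]
    return party_a, party_b

# breast_vertical layout: 30 features in total, A holds 10 (+ label), B holds 20.
X = [[float(i * 30 + j) for j in range(30)] for i in range(4)]
y = [0.0, 1.0, 1.0, 0.0]
party_a, party_b = vertical_split(X, y, num_a_features=10)
print(len(party_a[0][0]), len(party_b[0]))  # 10 20
```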
Available Leaderboards
Leaderboard: Functionality support
Functionality All-in-one frameworks Horizontal-only frameworks Specialized frameworks
FATE | FedML | PaddleFL | Fedlearner | FederatedScope | TFF | Flower | FLUTE | FedScale | CrypTen | FedTree
Model support for Horizontal FL
Regression N/A
Neural network N/A
Tree-based model N/A
Model support for Vertical FL
Regression
Neural network
Tree-based model
Deployment support Single-host simulation
Multi-host simulation (one machine)
Multi-host simulation (multiple machines) N/A
Edge-device deployment (one machine)
Networking protocol | Customized | MPI, gRPC, TorchDistributed | gRPC | gRPC | gRPC | gRPC | gRPC | MPI, TorchDistributed | gRPC | TorchDistributed | gRPC
Privacy protection against a semi-honest server
Does not require a third-party aggregator
Aggregator does not access individual gradients/updates N/A
Privacy protection against semi-honest peer clients
Clients do not learn gradients from other clients N/A N/A N/A N/A N/A
Model params are partially revealed to clients N/A N/A
Privacy protection in the final model
Private training mechanisms
Utility GPU support
Rich optimizers Customized Customized Only SGD
ML backend | PyTorch, TF | PyTorch | PaddlePaddle | TF | PyTorch, TF | TF, JAX | PyTorch, TF, JAX | PyTorch | PyTorch, TF | PyTorch | scikit-learn

◯ indicates claimed support for a functionality that is missing from, or cannot run in, the open-source implementation.
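One common way to satisfy the "aggregator does not access individual gradients/updates" property is secure aggregation with pairwise additive masks: each pair of clients shares a random mask that one client adds and the other subtracts, so the masks cancel in the sum and the server learns only the aggregate. The toy sketch below illustrates that cancellation. It assumes updates are already encoded as integers modulo 2^32 and that each client pair already shares a seed; it uses Python's non-cryptographic `random` module and omits key agreement and dropout recovery, so it is not the protocol of any framework in the table.

```python
import random
from itertools import combinations
from typing import Dict, List

MODULUS = 2 ** 32  # assumption: updates are fixed-point encoded as integers mod 2^32

def pair_mask(seed: int, length: int) -> List[int]:
    # Both members of a pair derive the same mask from their shared seed.
    rng = random.Random(seed)
    return [rng.randrange(MODULUS) for _ in range(length)]

def masked_update(client: int, update: List[int],
                  seeds: Dict[frozenset, int], clients: List[int]) -> List[int]:
    out = [u % MODULUS for u in update]
    for other in clients:
        if other == client:
            continue
        mask = pair_mask(seeds[frozenset((client, other))], len(update))
        # Lower id adds the mask, higher id subtracts it, so each pair's masks cancel in the sum.
        sign = 1 if client < other else -1
        out = [(o + sign * m) % MODULUS for o, m in zip(out, mask)]
    return out

clients = [0, 1, 2]
updates = {0: [5, 1, 7], 1: [2, 2, 2], 2: [10, 0, 3]}
seed_rng = random.Random(42)
seeds = {frozenset(p): seed_rng.randrange(2 ** 31) for p in combinations(clients, 2)}

# The server only ever sees the masked vectors, yet their sum equals the true sum.
masked = [masked_update(c, updates[c], seeds, clients) for c in clients]
aggregate = [sum(col) % MODULUS for col in zip(*masked)]
print(aggregate)  # [17, 3, 12]
```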

Leaderboard: Usability feature
Functionality All-in-one frameworks Horizontal-only frameworks Specialized frameworks
FATE | FedML | PaddleFL | Fedlearner | FederatedScope | TFF | Flower | FLUTE | FedScale | CrypTen | FedTree
Documentation Detailed tutorial
Code example
API documentation
Engineering Native test & benchmark
Built-in ML building block CNN
Transformer
Leaderboard: FedAvg with different models on Horizontal FL
Setting | Model | FATE | FedML | PaddleFL | Fedlearner | FederatedScope | TFF | Flower | FLUTE | FedScale
femnist, cross-device (Accuracy ↑) | logistic_regression | / | 0.0443 ± 0.0114 | 0.0656 ± 0.0082 | / | 0.0039 ± 0.0028 | 0.0640 ± 0.0142 | 0.0382 ± 0.0040 | 0.0370 ± 0.0015 | 0.1265 ± 0.0171
femnist, cross-device (Accuracy ↑) | mlp_128 | / | 0.8081 ± 0.0063 | 0.7699 ± 0.0108 | / | 0.8339 ± 0.0008 | 0.8161 ± 0.0593 | 0.8161 ± 0.0047 | 0.7435 ± 0.0268 | 0.6327 ± 0.0145
femnist, cross-device (Accuracy ↑) | mlp_128_128_128 | / | 0.6470 ± 0.0044 | 0.5331 ± 0.0347 | / | 0.6550 ± 0.0057 | 0.6756 ± 0.0072 | 0.6524 ± 0.0106 | 0.6019 ± 0.0084 | 0.6735 ± 0.0000
femnist, cross-device (Accuracy ↑) | lenet | / | 0.6986 ± 0.0046 | 0.5935 ± 0.0149 | / | 0.6980 ± 0.0057 | 0.7569 ± 0.0620 | 0.6902 ± 0.0157 | 0.5885 ± 0.0087 | 0.6368 ± 0.0266
breast_horizontal, cross-silo (AUC ↑) | logistic_regression | 0.6239 ± 0.0414 | 0.9821 ± 0.0021 | 0.9813 ± 0.0013 | 0.9862 ± 0.0012 | 0.9813 ± 0.0028 | 0.9879 ± 0.0006 | 0.9818 ± 0.0016 | 0.9816 ± 0.0047 | /
breast_horizontal, cross-silo (AUC ↑) | mlp_128 | 0.9842 ± 0.0020 | 0.9871 ± 0.0025 | 0.9822 ± 0.0019 | 0.9879 ± 0.00071 | 0.9859 ± 0.0025 | 0.9831 ± 0.0011 | 0.98694 ± 0.0014 | 0.9850 ± 0.0035 | /
breast_horizontal, cross-silo (AUC ↑) | mlp_128_128_128 | 0.9856 ± 0.0021 | 0.9867 ± 0.0012 | 0.9819 ± 0.0022 | 0.9864 ± 0.0014 | 0.9873 ± 0.0011 | 0.9860 ± 0.0014 | 0.9874 ± 0.0026 | 0.9861 ± 0.0025 | /
give_credit_horizontal, cross-silo (AUC ↑) | logistic_regression | 0.6736 ± 0.0067 | 0.7885 ± 0.0046 | 0.7775 ± 0.0008 | 0.7856 ± 0.0068 | 0.7793 ± 0.0111 | 0.7756 ± 0.0140 | 0.7611 ± 0.0555 | 0.7853 ± 0.0000 | 0.7853 ± 0.0000
give_credit_horizontal, cross-silo (AUC ↑) | mlp_128 | 0.8281 ± 0.0016 | 0.8307 ± 0.0010 | 0.8282 ± 0.0001 | 0.8319 ± 0.0011 | 0.8312 ± 0.0008 | 0.8356 ± 0.0066 | 0.8293 ± 0.0039 | 0.8302 ± 0.0008 | 0.8296 ± 0.0000
give_credit_horizontal, cross-silo (AUC ↑) | mlp_128_128_128 | 0.8301 ± 0.0008 | 0.8337 ± 0.0003 | 0.8273 ± 0.0002 | 0.8339 ± 0.0016 | 0.8339 ± 0.0008 | 0.8331 ± 0.0070 | 0.8351 ± 0.0006 | 0.8342 ± 0.0008 | 0.8331 ± 0.0000
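FedAvg, the algorithm used throughout this leaderboard, builds the next global model by averaging the clients' locally trained parameters, weighted by each client's number of training samples. A minimal sketch of that server-side aggregation step, assuming models are flat lists of floats (the frameworks above use their own tensor types) and leaving out local training and client sampling; the name `fedavg_aggregate` is hypothetical:

```python
from typing import List, Sequence, Tuple

def fedavg_aggregate(client_results: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Average client parameter vectors, weighting each by its number of local samples."""
    total = sum(n for _, n in client_results)
    if total == 0:
        raise ValueError("at least one client must report a positive sample count")
    dim = len(client_results[0][0])
    global_params = [0.0] * dim
    for params, n in client_results:
        weight = n / total  # this client's share of all training samples
        for i, p in enumerate(params):
            global_params[i] += weight * p
    return global_params

# Two silos with 300 and 100 samples: the larger silo gets 3x the weight.
global_model = fedavg_aggregate([([1.0, 2.0], 300), ([5.0, 6.0], 100)])
print(global_model)  # [2.0, 3.0]
```

Weighting by sample count is what distinguishes FedAvg from a plain unweighted mean; with equal client sizes the two coincide.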
Leaderboard: Best algorithm and model combinations
Setting | Scenario (metric) | 1st: algorithm & model | 1st: performance | 2nd: algorithm & model | 2nd: performance | 3rd: algorithm & model | 3rd: performance
Cross-silo horizontal | breast_horizontal (AUC ↑) | HistSecAgg, gbdt_64_64_6 | 1.0000 ± 0.0000 | FedAvg, mlp_128_128_128 | 0.9874 ± 0.0026 | FedAvg, mlp_128 | 0.9869 ± 0.0014
Cross-silo horizontal | default_credit_horizontal (AUC ↑) | HistSecAgg, gbdt_64_64_6 | 0.7772 ± 0.0000 | FedAvg, mlp_128_128_128 | 0.7762 ± 0.0014 | FedAvg, mlp_128 | 0.7649 ± 0.0056
Cross-silo horizontal | give_credit_horizontal (AUC ↑) | HistSecAgg, gbdt_64_64_6 | 0.8610 ± 0.0000 | FedAvg, mlp_128 | 0.8357 ± 0.0006 | FedAvg, mlp_128_128_128 | 0.8351 ± 0.0060
Cross-silo horizontal | student_horizontal (MSE ↓) | HistSecAgg, gbdt_64_64_6 | 22.56 ± 0.98 | FedAvg, mlp_128_128_128 | 23.16 ± 0.44 | FedAvg, mlp_128 | 23.41 ± 0.39
Cross-silo horizontal | vehicle_scale_horizontal (Accuracy ↑) | FedAvg, mlp_128_128_128 | 1.0000 ± 0.0000 | FedAvg, mlp_128 | 1.0000 ± 0.0000 | HistSecAgg, gbdt_64_64_6 | 0.9906 ± 0.0000
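This page does not define HistSecAgg. Judging by its name and its pairing with a GBDT model, we assume it aggregates per-client gradient/hessian histograms under secure aggregation, so that tree splits can be chosen from the combined statistics. The sketch below shows only the plaintext core of that idea: bin-wise summation of client histograms, followed by the second-order split gain used by XGBoost-style GBDT, with `reg_lambda` as the L2 regularizer. The masking/encryption layer is omitted, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Per-bin (sum of gradients, sum of hessians) for one feature.
Histogram = List[Tuple[float, float]]

def aggregate_histograms(client_hists: Sequence[Histogram]) -> Histogram:
    """Sum gradient/hessian statistics bin by bin across clients.

    Plaintext stand-in: a secure version would sum masked or encrypted values instead.
    """
    num_bins = len(client_hists[0])
    return [
        (sum(h[b][0] for h in client_hists), sum(h[b][1] for h in client_hists))
        for b in range(num_bins)
    ]

def best_split(hist: Histogram, reg_lambda: float = 1.0) -> Tuple[int, float]:
    """Return (index of the last bin in the left child, gain) for the best split."""
    g_total = sum(g for g, _ in hist)
    h_total = sum(h for _, h in hist)
    parent_score = g_total ** 2 / (h_total + reg_lambda)
    best_bin, best_gain = -1, float("-inf")
    g_left = h_left = 0.0
    for b in range(len(hist) - 1):  # splitting after the last bin would leave the right child empty
        g_left += hist[b][0]
        h_left += hist[b][1]
        g_right, h_right = g_total - g_left, h_total - h_left
        gain = 0.5 * (
            g_left ** 2 / (h_left + reg_lambda)
            + g_right ** 2 / (h_right + reg_lambda)
            - parent_score
        )
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain

client_a = [(-2.0, 1.0), (-1.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
client_b = [(-1.0, 1.0), (-1.0, 1.0), (2.0, 1.0), (1.0, 1.0)]
merged = aggregate_histograms([client_a, client_b])
split_bin, gain = best_split(merged)
print(merged)     # [(-3.0, 2.0), (-2.0, 2.0), (3.0, 2.0), (3.0, 2.0)]
print(split_bin)  # 1
```

Only the summed histogram is needed to score splits, which is why this style of GBDT pairs naturally with secure aggregation: no client's individual gradients have to be revealed to pick the split.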
Decision Tree for Framework Selection