The UniFed leaderboard presents both qualitative and quantitative evaluation results for 11 popular open-source FL frameworks across 15 evaluation scenarios, from the perspectives of functionality, usability, and system performance. Please find more details in our paper.
UniFed borrows 15 datasets from existing work to cover different FL settings, modalities, task types, and workload sizes.

Setting | | Scenario name | Modality | Task type | Performance metrics | Client number | Sample number
---|---|---|---|---|---|---|---
Horizontal FL | cross-device | celeba | Image | Binary Classification (Smiling vs. Not smiling) | Accuracy | 894 | 20,028
Horizontal FL | cross-device | femnist | Image | Multiclass Classification (62 classes) | Accuracy | 178 | 40,203
Horizontal FL | cross-device | | Text | Next-word Prediction | Accuracy | 813 | 27,738
Horizontal FL | cross-silo | breast_horizontal | Medical | Binary Classification | AUC | 2 | 569
Horizontal FL | cross-silo | default_credit_horizontal | Tabular | Binary Classification | AUC | 2 | 22,000
Horizontal FL | cross-silo | give_credit_horizontal | Tabular | Binary Classification | AUC | 2 | 150,000
Horizontal FL | cross-silo | student_horizontal | Tabular | Regression (Grade Estimation) | MSE | 2 | 395
Horizontal FL | cross-silo | vehicle_scale_horizontal | Image | Multiclass Classification (4 classes) | Accuracy | 2 | 846

Setting | Scenario name | Modality | Task type | Performance metrics | Vertical split: A party | Vertical split: B party
---|---|---|---|---|---|---
Vertical FL | breast_vertical | Medical | Binary Classification | AUC | 10 features, 1 label | 20 features
Vertical FL | default_credit_vertical | Tabular | Binary Classification | AUC | 13 features, 1 label | 10 features
Vertical FL | dvisits_vertical | Tabular | Regression (Number of consultations Estimation) | MSE | 3 features, 1 label | 9 features
Vertical FL | give_credit_vertical | Tabular | Binary Classification | AUC | 5 features, 1 label | 5 features
Vertical FL | motor_vertical | Sensor data | Regression (Temperature Estimation) | MSE | 4 features, 1 label | 7 features
Vertical FL | student_vertical | Tabular | Regression (Grade Estimation) | MSE | 6 features, 1 label | 7 features
Vertical FL | vehicle_scale_vertical | Image | Multiclass Classification (4 classes) | Accuracy | 9 features, 1 label | 9 features
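
To make the vertical split concrete, the following is a minimal sketch of how one table could be divided column-wise between two parties, using the breast_vertical shape (10 features plus the label for party A, 20 features for party B). It is not UniFed's own preprocessing code: the helper name `vertical_split`, the choice of giving party A the first columns, the synthetic data, and the row count of 569 (borrowed from breast_horizontal) are all assumptions made for illustration.

```python
import random


def vertical_split(rows, labels, n_features_a):
    """Split a table column-wise between two parties (hypothetical helper).

    Party A keeps the first `n_features_a` feature columns plus the label.
    Party B keeps the remaining feature columns and never sees the label.
    Row i describes the same sample on both sides, so the row index acts
    as the shared sample ID.
    """
    party_a = [(row[:n_features_a], y) for row, y in zip(rows, labels)]
    party_b = [row[n_features_a:] for row in rows]
    return party_a, party_b


# Synthetic stand-in data: 569 samples, 30 features, binary label (assumed shape).
rng = random.Random(0)
rows = [[rng.random() for _ in range(30)] for _ in range(569)]
labels = [rng.randint(0, 1) for _ in range(569)]

party_a, party_b = vertical_split(rows, labels, n_features_a=10)
print(len(party_a[0][0]), len(party_b[0]))  # 10 20
```

In a horizontal scenario the same table would instead be split row-wise, with every client holding all feature columns and the label for its own subset of samples.
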

Functionality | | FATE | FedML | PaddleFL | Fedlearner | FederatedScope | TFF | Flower | FLUTE | FedScale | CrypTen | FedTree
---|---|---|---|---|---|---|---|---|---|---|---|---
Framework type | | All-in-one | All-in-one | All-in-one | All-in-one | All-in-one | Horizontal-only | Horizontal-only | Horizontal-only | Horizontal-only | Specialized | Specialized
Model support for Horizontal FL | Regression | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | N/A | ✘
Model support for Horizontal FL | Neural network | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | N/A | ✘
Model support for Horizontal FL | Tree-based model | ✔ | ◯ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | N/A | ✔
Model support for Vertical FL | Regression | ✔ | ✔ | ◯ | ✘ | ✔ | ◯ | ✘ | ✘ | ✘ | ✔ | ✘
Model support for Vertical FL | Neural network | ✔ | ✔ | ◯ | ✔ | ✘ | ◯ | ✘ | ✘ | ✘ | ✔ | ✘
Model support for Vertical FL | Tree-based model | ✔ | ◯ | ✘ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✔
Deployment support | Single-host simulation | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔
Deployment support | Multi-host simulation (one machine) | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔
Deployment support | Multi-host simulation (multiple machines) | ✘ | ✔ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | N/A | ✔
Deployment support | Edge-devices deployment (one machine) | ✘ | ✔ | ✘ | ✘ | ✘ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘
Deployment support | Networking Protocol | Customized | MPI, gRPC, TorchDistributed | gRPC | gRPC | gRPC | gRPC | gRPC | MPI, TorchDistributed | gRPC | TorchDistributed | gRPC
Privacy protection against semi-honest server | Does not require a 3rd party aggregator | ✔ | ✔ | ✔ | ✔ | ✘ | ✔ | ✘ | ✘ | ✘ | ✔ | ✔
Privacy protection against semi-honest server | Aggregator does not access individual gradients/updates | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✘ | ✘ | ✔ | N/A | ✘
Privacy protection against semi-honest peer clients | Clients do not learn gradients from other clients | ✔ | ✔ | ✔ | N/A | ✔ | N/A | N/A | N/A | N/A | ✔ | ✔
Privacy protection against semi-honest peer clients | Model params are partially revealed to clients | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | N/A | ✘ | N/A | ✔ | ✔
Privacy protection in the final model | Private training mechanisms | ✘ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✔
Utility | GPU support | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘
Utility | Rich optimizers | Customized | ✔ | Customized | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | Only SGD | ✘
Utility | ML backend | PyTorch, TF | PyTorch | PaddlePaddle | TF | PyTorch, TF | TF, JAX | PyTorch, TF, JAX | PyTorch | PyTorch, TF | PyTorch | scikit-learn

◯ indicates claimed support for a functionality that is missing from, or cannot run in, the open-source implementation.

Functionality | | FATE | FedML | PaddleFL | Fedlearner | FederatedScope | TFF | Flower | FLUTE | FedScale | CrypTen | FedTree
---|---|---|---|---|---|---|---|---|---|---|---|---
Framework type | | All-in-one | All-in-one | All-in-one | All-in-one | All-in-one | Horizontal-only | Horizontal-only | Horizontal-only | Horizontal-only | Specialized | Specialized
Documentation | Detailed tutorial | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔
Documentation | Code example | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔
Documentation | API documentation | ✔ | ✘ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔
Engineering | Native test & benchmark | ✔ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔
Built-in ML building block | CNN | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘
Built-in ML building block | Transformer | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✘

Setting | Model | FATE | FedML | PaddleFL | Fedlearner | FederatedScope | TFF | Flower | FLUTE | FedScale
---|---|---|---|---|---|---|---|---|---|---
femnist cross-device (Accuracy ↑) | logistic_regression | / | 0.0443 ± 0.0114 | 0.0656 ± 0.0082 | / | 0.0039 ± 0.0028 | 0.0640 ± 0.0142 | 0.0382 ± 0.0040 | 0.0370 ± 0.0015 | 0.1265 ± 0.0171
femnist cross-device (Accuracy ↑) | mlp_128 | / | 0.8081 ± 0.0063 | 0.7699 ± 0.0108 | / | 0.8339 ± 0.0008 | 0.8161 ± 0.0593 | 0.8161 ± 0.0047 | 0.7435 ± 0.0268 | 0.6327 ± 0.0145
femnist cross-device (Accuracy ↑) | mlp_128_128_128 | / | 0.6470 ± 0.0044 | 0.5331 ± 0.0347 | / | 0.6550 ± 0.0057 | 0.6756 ± 0.0072 | 0.6524 ± 0.0106 | 0.6019 ± 0.0084 | 0.6735 ± 0.0000
femnist cross-device (Accuracy ↑) | lenet | / | 0.6986 ± 0.0046 | 0.5935 ± 0.0149 | / | 0.6980 ± 0.0057 | 0.7569 ± 0.0620 | 0.6902 ± 0.0157 | 0.5885 ± 0.0087 | 0.6368 ± 0.0266
breast_horizontal cross-silo (AUC ↑) | logistic_regression | 0.6239 ± 0.0414 | 0.9821 ± 0.0021 | 0.9813 ± 0.0013 | 0.9862 ± 0.0012 | 0.9813 ± 0.0028 | 0.9879 ± 0.0006 | 0.9818 ± 0.0016 | 0.9816 ± 0.0047 | /
breast_horizontal cross-silo (AUC ↑) | mlp_128 | 0.9842 ± 0.0020 | 0.9871 ± 0.0025 | 0.9822 ± 0.0019 | 0.9879 ± 0.00071 | 0.9859 ± 0.0025 | 0.9831 ± 0.0011 | 0.98694 ± 0.0014 | 0.9850 ± 0.0035 | /
breast_horizontal cross-silo (AUC ↑) | mlp_128_128_128 | 0.9856 ± 0.0021 | 0.9867 ± 0.0012 | 0.9819 ± 0.0022 | 0.9864 ± 0.0014 | 0.9873 ± 0.0011 | 0.9860 ± 0.0014 | 0.9874 ± 0.0026 | 0.9861 ± 0.0025 | /
give_credit_horizontal cross-silo (AUC ↑) | logistic_regression | 0.6736 ± 0.0067 | 0.7885 ± 0.0046 | 0.7775 ± 0.0008 | 0.7856 ± 0.0068 | 0.7793 ± 0.0111 | 0.7756 ± 0.0140 | 0.7611 ± 0.0555 | 0.7853 ± 0.0000 | 0.7853 ± 0.0000
give_credit_horizontal cross-silo (AUC ↑) | mlp_128 | 0.8281 ± 0.0016 | 0.8307 ± 0.0010 | 0.8282 ± 0.0001 | 0.8319 ± 0.0011 | 0.8312 ± 0.0008 | 0.8356 ± 0.0066 | 0.8293 ± 0.0039 | 0.8302 ± 0.0008 | 0.8296 ± 0.0000
give_credit_horizontal cross-silo (AUC ↑) | mlp_128_128_128 | 0.8301 ± 0.0008 | 0.8337 ± 0.0003 | 0.8273 ± 0.0002 | 0.8339 ± 0.0016 | 0.8339 ± 0.0008 | 0.8331 ± 0.0070 | 0.8351 ± 0.0006 | 0.8342 ± 0.0008 | 0.8331 ± 0.0000
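
For readers who want to compare frameworks programmatically, here is a minimal sketch of parsing one row of the table above and ranking frameworks by their first reported value. It rests on two assumptions not stated on this page: that each cell reads as "mean ± spread" over repeated runs, and that "/" marks a combination with no reported result. The helpers `parse_cell` and `rank_frameworks` are hypothetical names, not part of UniFed.

```python
def parse_cell(cell):
    """Parse a 'mean ± spread' cell; return None for '/' (assumed: no result)."""
    cell = cell.strip()
    if cell == "/":
        return None
    mean, spread = cell.split("±")
    # float() tolerates surrounding whitespace, so '0.98± 0.01' also parses.
    return float(mean), float(spread)


def rank_frameworks(row, higher_is_better=True):
    """Order framework names by mean score, skipping cells without a result."""
    scored = {name: parse_cell(cell) for name, cell in row.items()}
    scored = {name: s for name, s in scored.items() if s is not None}
    return sorted(scored, key=lambda name: scored[name][0], reverse=higher_is_better)


# breast_horizontal, logistic_regression row (AUC, higher is better).
row = {
    "FATE": "0.6239 ± 0.0414",
    "FedML": "0.9821 ± 0.0021",
    "PaddleFL": "0.9813 ± 0.0013",
    "Fedlearner": "0.9862 ± 0.0012",
    "FederatedScope": "0.9813 ± 0.0028",
    "TFF": "0.9879 ± 0.0006",
    "Flower": "0.9818 ± 0.0016",
    "FLUTE": "0.9816 ± 0.0047",
    "FedScale": "/",
}

ranking = rank_frameworks(row)
print(ranking[:3])  # ['TFF', 'Fedlearner', 'FedML']
```

For metrics marked ↓ (such as MSE), pass `higher_is_better=False`. Ranking by the mean alone ignores the reported spread, so near-ties such as 0.9821 vs. 0.9818 should not be read as meaningful differences.
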

Setting | Name | 1st: algorithm & model | 1st: performance | 2nd: algorithm & model | 2nd: performance | 3rd: algorithm & model | 3rd: performance
---|---|---|---|---|---|---|---
cross-silo horizontal | breast_horizontal (AUC ↑) | HistSecAgg gbdt_64_64_6 | 1.0000 ± 0.0000 | FedAvg mlp_128_128_128 | 0.9874 ± 0.0026 | FedAvg mlp_128 | 0.9869 ± 0.0014
cross-silo horizontal | default_credit_horizontal (AUC ↑) | HistSecAgg gbdt_64_64_6 | 0.7772 ± 0.0000 | FedAvg mlp_128_128_128 | 0.7762 ± 0.0014 | FedAvg mlp_128 | 0.7649 ± 0.0056
cross-silo horizontal | give_credit_horizontal (AUC ↑) | HistSecAgg gbdt_64_64_6 | 0.8610 ± 0.0000 | FedAvg mlp_128 | 0.8357 ± 0.0006 | FedAvg mlp_128_128_128 | 0.8351 ± 0.0060
cross-silo horizontal | student_horizontal (MSE ↓) | HistSecAgg gbdt_64_64_6 | 22.56 ± 0.98 | FedAvg mlp_128_128_128 | 23.16 ± 0.44 | FedAvg mlp_128 | 23.41 ± 0.39
cross-silo horizontal | vehicle_scale_horizontal (Accuracy ↑) | FedAvg mlp_128_128_128 | 1.0000 ± 0.0000 | FedAvg mlp_128 | 1.0000 ± 0.0000 | HistSecAgg gbdt_64_64_6 | 0.9906 ± 0.0000