UniFed: A Benchmark for Federated Learning Frameworks

In the UniFed leaderboard, we use 15 evaluation scenarios to present both qualitative and quantitative evaluation results for 11 popular open-source FL frameworks, from the perspectives of functionality, usability, and system performance. More details can be found in our paper.

Evaluation Scenarios

UniFed borrows 15 datasets from existing works to cover different FL settings, data modalities, task types, and workload sizes.

Setting	Sub-setting	Scenario name	Modality	Task type	Performance metric	Number of clients	Number of samples
Horizontal FL	cross-device	celeba	Image	Binary Classification (Smiling vs. Not smiling)	Accuracy	894	20,028
		femnist	Image	Multiclass Classification (62 classes)	Accuracy	178	40,203
		reddit	Text	Next-word Prediction	Accuracy	813	27,738
	cross-silo	breast_horizontal	Medical	Binary Classification	AUC	2	569
		default_credit_horizontal	Tabular	Binary Classification	AUC	2	22,000
		give_credit_horizontal	Tabular	Binary Classification	AUC	2	150,000
		student_horizontal	Tabular	Regression (Grade Estimation)	MSE	2	395
		vehicle_scale_horizontal	Image	Multiclass Classification (4 classes)	Accuracy	2	846
Setting		Scenario name	Modality	Task type	Performance metric	Party A (holds the label)	Party B
Vertical FL		breast_vertical	Medical	Binary Classification	AUC	10 features + label	20 features
		default_credit_vertical	Tabular	Binary Classification	AUC	13 features + label	10 features
		dvisits_vertical	Tabular	Regression (Consultation Count Estimation)	MSE	3 features + label	9 features
		give_credit_vertical	Tabular	Binary Classification	AUC	5 features + label	5 features
		motor_vertical	Sensor data	Regression (Temperature Estimation)	MSE	4 features + label	7 features
		student_vertical	Tabular	Regression (Grade Estimation)	MSE	6 features + label	7 features
		vehicle_scale_vertical	Image	Multiclass Classification (4 classes)	Accuracy	9 features + label	9 features
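The two settings above differ in how a dataset is partitioned: in horizontal FL each client holds different samples with the same features, while in vertical FL each party holds different features of the same samples, and party A also holds the label. The minimal sketch below illustrates both partitions on in-memory rows. The function names are hypothetical, and the random round-robin sample assignment and the "first k columns go to party A" rule are assumptions for illustration; the actual UniFed splits may differ.

```python
import random
from typing import List, Sequence, Tuple

Row = List[float]


def horizontal_split(
    rows: Sequence[Row], labels: Sequence[float], num_clients: int, seed: int = 0
) -> List[Tuple[List[Row], List[float]]]:
    """Horizontal FL: each client keeps all features for a disjoint subset of samples."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)  # assumption: random round-robin assignment
    clients = []
    for c in range(num_clients):
        part = idx[c::num_clients]
        clients.append(([list(rows[i]) for i in part], [labels[i] for i in part]))
    return clients


def vertical_split(
    rows: Sequence[Row], labels: Sequence[float], num_features_a: int
) -> Tuple[Tuple[List[Row], List[float]], List[Row]]:
    """Vertical FL: both parties see every sample; party A gets the first
    num_features_a columns plus the label, party B gets the remaining columns."""
    party_a = ([list(r[:num_features_a]) for r in rows], list(labels))
    party_b = [list(r[num_features_a:]) for r in rows]
    return party_a, party_b


if __name__ == "__main__":
    # Toy stand-in shaped like breast_vertical: 30 features split 10 (+ label) / 20.
    rows = [[float(30 * i + j) for j in range(30)] for i in range(6)]
    labels = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
    (xa, ya), xb = vertical_split(rows, labels, num_features_a=10)
    print(len(xa[0]), len(xb[0]))  # 10 20
    for x, y in horizontal_split(rows, labels, num_clients=2):
        print(len(x), "samples on this client")
```

In the vertical case the parties must agree on a common sample order (in practice via private set intersection or shared IDs), which this sketch takes for granted.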

Available Leaderboards



Leaderboard: Functionality support

Functionality		All-in-one frameworks						Horizontal-only frameworks			Specialized frameworks
Functionality		FATE	FedML	PaddleFL	Fedlearner	FederatedScope	TFF	Flower	FLUTE	FedScale	CrypTen	FedTree
Model support for Horizontal FL	Regression	✔	✔	✔	✔	✔	✔	✔	✔	✔	N/A	✘
	Neural network	✔	✔	✔	✔	✔	✔	✔	✔	✔	N/A	✘
	Tree-based model	✔	◯	✘	✘	✘	✘	✘	✘	✘	N/A	✔
Model support for Vertical FL	Regression	✔	✔	◯	✘	✔	◯	✘	✘	✘	✔	✘
	Neural network	✔	✔	◯	✔	✘	◯	✘	✘	✘	✔	✘
	Tree-based model	✔	◯	✘	✔	✔	✘	✘	✘	✘	✘	✔
Deployment support	Single-host simulation	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔
	Multi-host simulation (one machine)	✔	✔	✔	✔	✔	✘	✔	✔	✔	✔	✔
	Multi-host simulation (multiple machines)	✘	✔	✔	✔	✔	✘	✔	✔	✔	N/A	✔
	Edge-devices deployment (one machine)	✘	✔	✘	✘	✘	✔	✘	✘	✘	✘	✘
	Networking protocol	Customized	MPI, gRPC, TorchDistributed	gRPC	gRPC	gRPC	gRPC	gRPC	MPI, TorchDistributed	gRPC	TorchDistributed	gRPC
Privacy protection against semi-honest server	Does not require a third-party aggregator	✔	✔	✔	✔	✘	✔	✘	✘	✘	✔	✔
	Aggregator does not access individual gradients/updates	✔	✔	✔	✘	✔	✔	✘	✘	✔	N/A	✘
Privacy protection against semi-honest peer clients	Clients do not learn gradients from other clients	✔	✔	✔	N/A	✔	N/A	N/A	N/A	N/A	✔	✔
	Model parameters are only partially revealed to clients	✔	✔	✔	✘	✔	✔	N/A	✘	N/A	✔	✔
Privacy protection in the final model	Private training mechanisms	✘	✔	✔	✘	✔	✔	✔	✔	✔	✘	✔
Utility	GPU support	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✘
	Rich optimizers	Customized	✔	Customized	✔	✔	✔	✔	✔	✔	Only SGD	✘
	ML backend	PyTorch, TF	PyTorch	PaddlePaddle	TF	PyTorch, TF	TF, JAX	PyTorch, TF, JAX	PyTorch	PyTorch, TF	PyTorch	scikit-learn

✔ = supported, ✘ = not supported, N/A = not applicable. ◯ indicates claimed support for a functionality that is missing or cannot run in the open-source implementation.
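The privacy row about the aggregator not accessing individual gradients/updates means the server should learn only the aggregate of client updates. One common way to achieve this is pairwise additive masking, the idea behind secure aggregation protocols: every pair of clients shares a random mask that one client adds and the other subtracts, so all masks cancel in the sum. The sketch below is a simplified illustration, not the mechanism of any framework in the table. It assumes that a single seeded generator stands in for pairwise key agreement (so it is not actually secure), that updates are already quantized to integers, and that no client drops out; the function names are hypothetical.

```python
import random
from typing import Dict, List

MODULUS = 2**32  # assumption: masked arithmetic over integers modulo 2^32


def pairwise_masks(client_ids: List[int], dim: int, seed: int = 0) -> Dict[int, List[int]]:
    """For each pair (i, j) with i listed before j, draw a shared random vector r_ij;
    client i adds r_ij and client j subtracts it, so all masks sum to zero."""
    rng = random.Random(seed)  # stand-in for pairwise key agreement (not secure)
    masks = {c: [0] * dim for c in client_ids}
    for a, i in enumerate(client_ids):
        for j in client_ids[a + 1:]:
            r = [rng.randrange(MODULUS) for _ in range(dim)]
            masks[i] = [(m + x) % MODULUS for m, x in zip(masks[i], r)]
            masks[j] = [(m - x) % MODULUS for m, x in zip(masks[j], r)]
    return masks


def mask_update(update: List[int], mask: List[int]) -> List[int]:
    """What a client actually sends: its quantized update plus its mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]


def aggregate(masked_updates: List[List[int]]) -> List[int]:
    """Server side: the pairwise masks cancel, leaving only the sum of the updates."""
    return [sum(col) % MODULUS for col in zip(*masked_updates)]


if __name__ == "__main__":
    updates = {1: [3, 5], 2: [10, 0], 3: [7, 2]}
    masks = pairwise_masks(list(updates), dim=2)
    sent = [mask_update(updates[c], masks[c]) for c in updates]
    print(aggregate(sent))  # [20, 7]
```

Each masked vector on its own looks uniformly random to the server, yet the sum is exact. Practical protocols additionally secret-share the mask seeds so the server can still unmask the sum when some clients drop out.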


Leaderboard: Usability features

Functionality		All-in-one frameworks						Horizontal-only frameworks			Specialized frameworks
Functionality		FATE	FedML	PaddleFL	Fedlearner	FederatedScope	TFF	Flower	FLUTE	FedScale	CrypTen	FedTree
Documentation	Detailed tutorial	✔	✔	✔	✘	✔	✔	✔	✔	✔	✔	✔
	Code example	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔
	API documentation	✔	✘	✘	✘	✔	✔	✔	✘	✔	✔	✔
Engineering	Native test & benchmark	✔	✔	✘	✘	✔	✔	✔	✔	✔	✔	✔
Built-in ML building block	CNN	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✘
	Transformer	✔	✔	✔	✔	✔	✔	✔	✔	✔	✘	✘


Leaderboard: FedAvg with different models on Horizontal FL

Setting	Model	FATE	FedML	PaddleFL	Fedlearner	FederatedScope	TFF	Flower	FLUTE	FedScale
femnist cross-device (Accuracy ↑)	logistic_regression	/	0.0443 ± 0.0114	0.0656 ± 0.0082	/	0.0039 ± 0.0028	0.0640 ± 0.0142	0.0382 ± 0.0040	0.0370 ± 0.0015	0.1265 ± 0.0171
	mlp_128	/	0.8081 ± 0.0063	0.7699 ± 0.0108	/	0.8339 ± 0.0008	0.8161 ± 0.0593	0.8161 ± 0.0047	0.7435 ± 0.0268	0.6327 ± 0.0145
	mlp_128_128_128	/	0.6470 ± 0.0044	0.5331 ± 0.0347	/	0.6550 ± 0.0057	0.6756 ± 0.0072	0.6524 ± 0.0106	0.6019 ± 0.0084	0.6735 ± 0.0000
	lenet	/	0.6986 ± 0.0046	0.5935 ± 0.0149	/	0.6980 ± 0.0057	0.7569 ± 0.0620	0.6902 ± 0.0157	0.5885 ± 0.0087	0.6368 ± 0.0266
breast_horizontal cross-silo (AUC ↑)	logistic_regression	0.6239 ± 0.0414	0.9821 ± 0.0021	0.9813 ± 0.0013	0.9862 ± 0.0012	0.9813 ± 0.0028	0.9879 ± 0.0006	0.9818 ± 0.0016	0.9816 ± 0.0047	/
	mlp_128	0.9842 ± 0.0020	0.9871 ± 0.0025	0.9822 ± 0.0019	0.9879 ± 0.00071	0.9859 ± 0.0025	0.9831 ± 0.0011	0.98694 ± 0.0014	0.9850 ± 0.0035	/
	mlp_128_128_128	0.9856 ± 0.0021	0.9867 ± 0.0012	0.9819 ± 0.0022	0.9864 ± 0.0014	0.9873 ± 0.0011	0.9860 ± 0.0014	0.9874 ± 0.0026	0.9861 ± 0.0025	/
give_credit_horizontal cross-silo (AUC ↑)	logistic_regression	0.6736 ± 0.0067	0.7885 ± 0.0046	0.7775 ± 0.0008	0.7856 ± 0.0068	0.7793 ± 0.0111	0.7756 ± 0.0140	0.7611 ± 0.0555	0.7853 ± 0.0000	0.7853 ± 0.0000
	mlp_128	0.8281 ± 0.0016	0.8307 ± 0.0010	0.8282 ± 0.0001	0.8319 ± 0.0011	0.8312 ± 0.0008	0.8356 ± 0.0066	0.8293 ± 0.0039	0.8302 ± 0.0008	0.8296 ± 0.0000
	mlp_128_128_128	0.8301 ± 0.0008	0.8337 ± 0.0003	0.8273 ± 0.0002	0.8339 ± 0.0016	0.8339 ± 0.0008	0.8331 ± 0.0070	0.8351 ± 0.0006	0.8342 ± 0.0008	0.8331 ± 0.0000
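Every entry in this leaderboard is trained with FedAvg; the frameworks and local models (logistic_regression, mlp_128, mlp_128_128_128, lenet) vary, but in its standard form the aggregation rule is the same. In each round, clients start from the global weights, train locally, and the server averages the resulting models weighted by local sample count n_k / n. The minimal sketch below follows that textbook update, not the code of any particular framework, and assumes a linear least-squares model with plain per-sample SGD as a hypothetical stand-in for the models in the table.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (features, target)


def local_sgd(w: Sequence[float], data: Sequence[Sample], lr: float = 0.1, epochs: int = 1) -> List[float]:
    """Client update: plain SGD on squared error for a linear model (hypothetical stand-in)."""
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def fedavg(client_weights: Sequence[Sequence[float]], num_samples: Sequence[int]) -> List[float]:
    """Server update: average client models weighted by n_k / n."""
    total = sum(num_samples)
    return [
        sum(n / total * w[k] for w, n in zip(client_weights, num_samples))
        for k in range(len(client_weights[0]))
    ]


def fedavg_round(global_w: List[float], client_data: Sequence[Sequence[Sample]],
                 lr: float = 0.1, epochs: int = 1) -> List[float]:
    """One FedAvg round: local training on every client, then weighted averaging."""
    local_models = [local_sgd(global_w, d, lr, epochs) for d in client_data]
    return fedavg(local_models, [len(d) for d in client_data])


if __name__ == "__main__":
    # Two clients whose data both follow y = 2x.
    clients = [[([1.0], 2.0)], [([1.0], 2.0), ([0.5], 1.0)]]
    w = [0.0]
    for _ in range(100):
        w = fedavg_round(w, clients)
    print(round(w[0], 3))  # 2.0
```

Weighting by sample count makes the average favor clients with more data; with a single local step per round this reduces to a weighted gradient step, while more local epochs trade communication for possible client drift on non-IID data.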


Leaderboard: Best algorithm and model combinations

Setting	Name	1st		2nd		3rd
Setting	Name	algorithm & model	performance	algorithm & model	performance	algorithm & model	performance
cross-silo horizontal	breast_horizontal (AUC ↑)	HistSecAgg gbdt_64_64_6	1.0000 ± 0.0000	FedAvg mlp_128_128_128	0.9874 ± 0.0026	FedAvg mlp_128	0.9869 ± 0.0014
	default_credit_horizontal (AUC ↑)	HistSecAgg gbdt_64_64_6	0.7772 ± 0.0000	FedAvg mlp_128_128_128	0.7762 ± 0.0014	FedAvg mlp_128	0.7649 ± 0.0056
	give_credit_horizontal (AUC ↑)	HistSecAgg gbdt_64_64_6	0.8610 ± 0.0000	FedAvg mlp_128	0.8357 ± 0.0006	FedAvg mlp_128_128_128	0.8351 ± 0.0060
	student_horizontal (MSE ↓)	HistSecAgg gbdt_64_64_6	22.56 ± 0.98	FedAvg mlp_128_128_128	23.16 ± 0.44	FedAvg mlp_128	23.41 ± 0.39
	vehicle_scale_horizontal (Accuracy ↑)	FedAvg mlp_128_128_128	1.0000 ± 0.0000	FedAvg mlp_128	1.0000 ± 0.0000	HistSecAgg gbdt_64_64_6	0.9906 ± 0.0000
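A gradient-boosted decision tree trained with HistSecAgg tops four of the five cross-silo horizontal scenarios. This sketch assumes HistSecAgg refers to histogram-based GBDT training: clients contribute per-bin sums of gradients and hessians, these are combined by (secure) aggregation, and the split search then needs only the summed histograms. It uses an XGBoost-style gain without the 1/2 factor and the γ penalty. The secure-aggregation layer is omitted, bin boundaries are assumed to be shared by all clients, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Hist = List[Tuple[float, float]]  # per bin: (sum of gradients, sum of hessians)


def local_histogram(bins: Sequence[int], grads: Sequence[float],
                    hess: Sequence[float], num_bins: int) -> Hist:
    """Client side: bucket each sample's gradient and hessian by its feature bin."""
    g = [0.0] * num_bins
    h = [0.0] * num_bins
    for b, gv, hv in zip(bins, grads, hess):
        g[b] += gv
        h[b] += hv
    return list(zip(g, h))


def merge_histograms(hists: Sequence[Hist]) -> Hist:
    """Aggregator side: element-wise sum (the step a secure aggregation would protect)."""
    return [
        (sum(h[k][0] for h in hists), sum(h[k][1] for h in hists))
        for k in range(len(hists[0]))
    ]


def best_split(hist: Hist, lam: float = 1.0) -> Tuple[int, float]:
    """Scan bin boundaries with gain = GL^2/(HL+lam) + GR^2/(HR+lam) - G^2/(H+lam).
    Returns (k, gain) meaning 'split after bin k', or (-1, 0.0) if no split helps."""
    G = sum(g for g, _ in hist)
    H = sum(h for _, h in hist)
    best_k, best_gain = -1, 0.0
    gl = hl = 0.0
    for k in range(len(hist) - 1):
        gl += hist[k][0]
        hl += hist[k][1]
        gr, hr = G - gl, H - hl
        gain = gl * gl / (hl + lam) + gr * gr / (hr + lam) - G * G / (H + lam)
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k, best_gain


if __name__ == "__main__":
    # One feature with 4 bins; client A holds negative-gradient samples, client B positive ones.
    hist_a = local_histogram([0, 0, 1], [-1.0, -1.0, -1.0], [1.0, 1.0, 1.0], num_bins=4)
    hist_b = local_histogram([2, 3, 3], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], num_bins=4)
    print(best_split(merge_histograms([hist_a, hist_b])))  # (1, 4.5)
```

Because the merged histogram equals the histogram of the pooled data, the chosen split matches centralized histogram GBDT on the same bins; how much the summed histograms themselves reveal depends on the protection applied to them.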


Decision Tree for Framework Selection