Splitting the data is wrong !? #194

Open
Amrusama opened this issue Jul 2, 2024 · 3 comments
Comments

Amrusama commented Jul 2, 2024

I generated a non-IID version of FashionMNIST for 15 clients using the following command:
python generate_FashionMNIST.py noniid - pat
The output of the data distribution on the terminal is as follows:

Screenshot 2024-07-02 141328

I then printed the number of samples per class for every client and plotted each client's class distribution, but the results did not match the output on the terminal.

Screenshot 2024-07-02 140351
Screenshot 2024-07-02 140116

Owner

TsingZ0 commented Jul 3, 2024

Please distinguish between the "entire set" and the "training set." For each client, the training set is roughly 75% of that client's entire set.
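To illustrate why the two counts differ, here is a minimal sketch using only NumPy. It is not the library's actual generation code: the helper names `split_client` and `class_counts`, the exact 0.75 ratio, and the example client are assumptions made for illustration. It shows that per-class counts taken from a client's training split are smaller than those of the client's entire set, and that train and test counts together add back up to the entire set.

```python
import numpy as np

TRAIN_RATIO = 0.75  # assumption: taken from the "75%" figure mentioned above


def split_client(labels, train_ratio=TRAIN_RATIO, seed=0):
    """Shuffle one client's sample indices and split them into train/test.

    Hypothetical helper, not part of any library.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    n_train = int(round(train_ratio * len(labels)))
    return idx[:n_train], idx[n_train:]


def class_counts(labels, num_classes=10):
    """Count how many samples of each class appear in `labels`."""
    return np.bincount(labels, minlength=num_classes)


# Hypothetical client holding two classes: 400 samples of class 3, 600 of class 7.
labels = np.array([3] * 400 + [7] * 600)

train_idx, test_idx = split_client(labels)
entire = class_counts(labels)
train = class_counts(labels[train_idx])
test = class_counts(labels[test_idx])

print("entire set  :", entire)
print("training set:", train)
print("test set    :", test)
```

If the terminal output describes the entire set per client while the plots are built from the training files only, a gap of about 25% per client is expected.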

Author

Amrusama commented Jul 4, 2024

@TsingZ0 Thank you for the explanation. I suggest including this information in the dataset generation section. For instance, the entire FashionMNIST dataset comprises 70,000 samples (60,000 training and 10,000 test). My expectation was that PFLlib would provide a non-IID version of that entire set.

Owner

TsingZ0 commented Jul 11, 2024

The test set can be reshuffled after it has been split.
