get_covid_data_dict.py Missing Classifications? #2

Open
beyerch opened this issue Apr 12, 2021 · 5 comments

Comments


beyerch commented Apr 12, 2021

While reviewing the metadata.csv file for the covid-chestxray-dataset-master information, it appears that there are many classifications that are not accounted for. Some of the omissions make sense, but others look like they should be included. This could be due to a change in the format of this file; however, could you confirm this is intended?

In the image below, the distinct findings are shown on the right-hand side. The yellow-highlighted items are forms of pneumonia that would not get picked up by the Python script's logic, as they are not included in the lists defined in the processing script.

image
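A rough way to reproduce this check without a spreadsheet is sketched below. It assumes the label column in metadata.csv is named `finding`; the `handled` set and the sample rows are hypothetical stand-ins, not the actual lists from get_covid_data_dict.py.

```python
import csv
import io
from collections import Counter


def count_findings(csv_file, column="finding"):
    """Count how often each distinct value appears in the given column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column].strip() for row in reader)


def unhandled_findings(counts, handled):
    """Return the findings (with counts) that are not in the handled set."""
    return {f: n for f, n in counts.items() if f not in handled}


# Illustrative rows only; in practice, pass open("metadata.csv", newline="").
sample = io.StringIO(
    "patientid,finding\n"
    "1,COVID-19\n"
    "2,Pneumonia/Fungal\n"
    "3,COVID-19\n"
    "4,Tuberculosis\n"
)

# Hypothetical set of labels the processing script already recognizes.
handled = {"COVID-19"}

counts = count_findings(sample)
for finding, n in sorted(unhandled_findings(counts, handled).items()):
    print(f"{finding}: {n}")
```

Any label printed here is one the script would silently skip, which is the situation the highlighted rows illustrate.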

Owner

qxiaobu commented Apr 12, 2021

Yeah, the latest COVID dataset has much richer findings. Unfortunately, I used an older version from March 2020, which had fewer than 200 images and only raw labels in the findings. Maybe you can try it with the latest data. Looking forward to your update.

Author

beyerch commented Apr 12, 2021

Happy to update; however, I just want to confirm how you think they should be handled.

Assume anything under Pneumonia/Bacterial/.... goes to pneumonia_bacteria
Assume anything under Pneumonia/Viral/... goes to pneumonia_virus

Should the other items simply be excluded at this point (e.g. Tuberculosis, unknown, todo, Pneumonia/Fungal, Pneumonia/Aspiration)?
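The proposed handling could be sketched roughly as follows. `map_finding` is a hypothetical helper, not code from get_covid_data_dict.py. The `covid` class name and the choice to check for COVID-19 before the viral rule are assumptions, so that COVID cases are not folded into pneumonia_virus.

```python
def map_finding(finding):
    """Map a raw finding string to a class name, or None to exclude it."""
    parts = [p.strip().lower() for p in finding.split("/")]

    # Assumption: COVID-19 gets its own class and is checked first,
    # because it would otherwise match the Pneumonia/Viral rule.
    if "covid-19" in parts:
        return "covid"

    # Only hierarchical pneumonia labels are mapped by the rules below.
    if len(parts) < 2 or parts[0] != "pneumonia":
        return None
    if parts[1] == "bacterial":
        return "pneumonia_bacteria"
    if parts[1] == "viral":
        return "pneumonia_virus"

    # Everything else (fungal, aspiration, ...) is excluded.
    return None


for label in ["Pneumonia/Bacterial/X", "Pneumonia/Viral/X", "Tuberculosis", "todo"]:
    print(label, "->", map_finding(label))
```

Returning None for unmapped labels keeps the exclusion explicit, so the caller can log or count what was dropped instead of losing it silently.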

Author

beyerch commented Apr 12, 2021

Also, do you know if there is a way to acquire the March 2020 dataset? In the short term, I'm looking to recreate the work that you did, and having the same starting point would make it easier.

Owner

qxiaobu commented Apr 13, 2021

In the early stage, labeled images were few and coarse, so we aimed to distinguish (detect) COVID images from pneumonia (viral + bacterial) images and healthy images. We then selected the Kaggle data as complementary data for the task, since it has plenty of pneumonia and healthy images. Your data has richer, more fine-grained labels, which makes it more valuable for COVID recognition.

I am not sure whether the other items have enough images for training. If they do, it may be better to do multi-class classification with a larger label set (far more than 4 classes). If not, it is also reasonable to exclude the other items and focus on pneumonia and COVID.
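One way to check whether a class has enough images is to count images per label and split on a minimum count. The sketch below is only illustrative: `split_by_support` is a hypothetical helper, and the `min_images` threshold is an arbitrary assumption that would need to be chosen for the actual data.

```python
from collections import Counter


def split_by_support(labels, min_images=50):
    """Split labels into classes to keep and classes to drop, by image count."""
    counts = Counter(labels)
    keep = {c: n for c, n in counts.items() if n >= min_images}
    drop = {c: n for c, n in counts.items() if n < min_images}
    return keep, drop


# Toy label list, one entry per image (illustrative values only).
labels = ["covid"] * 5 + ["pneumonia_virus"] * 4 + ["tuberculosis"]
keep, drop = split_by_support(labels, min_images=3)
print("keep:", keep)
print("drop:", drop)
```

Classes that land in `drop` are candidates for exclusion or for merging into a broader class.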

Owner

qxiaobu commented Apr 13, 2021

I have uploaded the metadata.csv to FLANNEL/original data/. It may help you find the corresponding images.
