get_covid_data_dict.py Missing Classifications? #2

beyerch · 2021-04-12T01:29:20Z

While reviewing the metadata.csv file for the covid-chestxray-dataset-master information, it appears that there are many classifications that are not accounted for. While some of these omissions make sense, others appear as though they should be included. This could be due to a change in the format of this file; however, could you confirm whether this is intended?

In the image below, the distinct findings are shown on the right-hand side. The yellow highlighted items are forms of pneumonia that would not be picked up by the Python script's logic, because they are not included in the lists defined in the processing script.
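Since the image is not reproduced here, one way to get the same view is to count the distinct values of the finding column that no class list covers. This is a minimal sketch, assuming the column is named `finding` and using a hypothetical `known` set in place of the lists actually defined in get_covid_data_dict.py:

```python
import csv
import io
from collections import Counter


def uncovered_findings(csv_text, known_findings, column="finding"):
    """Return {finding: row_count} for every finding value not in known_findings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[column] for row in reader)
    return {label: n for label, n in counts.items() if label not in known_findings}


# Tiny illustrative sample; real rows would come from metadata.csv.
sample = """patientid,finding
1,Pneumonia/Viral/COVID-19
2,Pneumonia/Bacterial/Streptococcus
3,Tuberculosis
4,Pneumonia/Viral/COVID-19
5,todo
"""

# Hypothetical stand-in for the class lists in get_covid_data_dict.py.
known = {"Pneumonia/Viral/COVID-19"}
print(uncovered_findings(sample, known))
```

For the real file, the CSV text could be read with `open("metadata.csv", newline="").read()` before passing it in.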

qxiaobu · 2021-04-12T13:50:49Z

Yeah, the latest COVID dataset has much richer findings. Unfortunately, I used an older version from March 2020, which had fewer than 200 images and only raw labels in the findings. Maybe you can try it with the latest data. Looking forward to your update.

beyerch · 2021-04-12T16:31:23Z


Happy to update; however, I just want to confirm how you think they should be handled.

Assume anything under Pneumonia/Bacterial/... goes to pneumonia_bacteria.
Assume anything under Pneumonia/Viral/... goes to pneumonia_virus.

Should the other items simply be excluded at this point (e.g. Tuberculosis, unknown, todo, Pneumonia/Fungal, Pneumonia/Aspiration)?
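The proposed rules can be sketched as a small mapping function. This is only an illustration, not the script's actual logic: it assumes COVID-19 stays its own class and is checked first (it also sits under Pneumonia/Viral/), and the `covid` and `normal` class names and the "No Finding" label are assumptions. Everything that matches no rule returns `None`, i.e. it is excluded:

```python
from typing import Optional


def map_finding(finding: str) -> Optional[str]:
    """Map a hierarchical finding string to a class name, or None to exclude it."""
    parts = [p.strip() for p in finding.split("/")]
    # Check COVID-19 before the generic viral rule, since it is also viral pneumonia.
    if "COVID-19" in parts:
        return "covid"  # hypothetical class name
    if parts[:2] == ["Pneumonia", "Bacterial"]:
        return "pneumonia_bacteria"
    if parts[:2] == ["Pneumonia", "Viral"]:
        return "pneumonia_virus"
    if finding.strip() == "No Finding":  # assumed label for healthy images
        return "normal"  # hypothetical class name
    # Tuberculosis, Unknown, todo, Pneumonia/Fungal, Pneumonia/Aspiration, ...
    return None


for f in ["Pneumonia/Viral/COVID-19", "Pneumonia/Viral/SARS",
          "Pneumonia/Bacterial/Streptococcus", "Pneumonia/Fungal/Pneumocystis",
          "Tuberculosis", "todo"]:
    print(f, "->", map_finding(f))
```

The order of the checks matters: if the generic Pneumonia/Viral rule ran first, COVID-19 images would be folded into pneumonia_virus.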

beyerch · 2021-04-12T16:32:11Z

Also, do you know if there is a way to acquire the March 2020 dataset? In the short term, I'm looking to recreate the work that you did, and having the same starting point would make it easier.

qxiaobu · 2021-04-13T02:08:57Z

At that early stage, labeled images were few and coarse, so we aimed to distinguish COVID-19 images from pneumonia (viral + bacterial) images and healthy images. We then selected the Kaggle data as complementary data for the task, since it has many pneumonia and healthy images. Your data has richer, more fine-grained labels, which makes it more valuable for COVID-19 recognition.

I am not sure whether the other items have enough images for training. If they do, it may be better to do multi-class classification with a larger label set (far more than 4 classes); if not, it is also reasonable to exclude the other items and just focus on pneumonia and COVID-19.
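One rough way to answer the "enough images" question is to count images per finding label and split the labels around a minimum count: labels above it could become extra classes, while the rest would be excluded. A sketch, assuming `findings` is the list of values from the metadata's finding column; the threshold of 50 is an arbitrary placeholder, not a recommendation:

```python
from collections import Counter


def split_by_image_count(findings, min_images=50):
    """Split finding labels into (kept, dropped) dicts of {label: image_count}."""
    counts = Counter(findings)
    kept = {label: n for label, n in counts.items() if n >= min_images}
    dropped = {label: n for label, n in counts.items() if n < min_images}
    return kept, dropped


# Made-up counts, purely to show the shape of the result.
findings = (["Pneumonia/Viral/COVID-19"] * 60
            + ["Pneumonia/Bacterial/Streptococcus"] * 55
            + ["Tuberculosis"] * 10
            + ["Pneumonia/Fungal/Pneumocystis"] * 20)
kept, dropped = split_by_image_count(findings)
print("kept:", kept)
print("dropped:", dropped)
```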

qxiaobu · 2021-04-13T02:17:15Z

I have uploaded the metadata.csv to FLANNEL/original data/. It may help you find the corresponding images.
