Script to take BIDS dataset and generate MONAI-compatible JSON list of subjects for training #21

Open
jcohenadad opened this issue Jul 11, 2023 · 4 comments

@jcohenadad
Member

jcohenadad commented Jul 11, 2023

While talking with @louisfb01, I realized that the lab does not have a procedure for training MONAI models directly from a BIDS dataset and instead converts the data physically, which is problematic because:

  • it duplicates the data (more space on HD). EDIT: actually it does not (see Naga's comment below)
  • the dataset used for training is no longer synced with the source dataset (which defeats the purpose of version-tracking our datasets to keep provenance in trained models). EDIT: this is also wrong (see Naga's comment below)

I know the MONAI folks have been working on BIDS compatibility. Can people please link all the existing resources in this GH discussion thread, and also discuss strategies for the lab to come up with a unified protocol/script for preparing a JSON file for MONAI training?

The solution should accommodate the aggregation of multiple BIDS datasets.

Some resource:

@naga-karthik
Member

naga-karthik commented Jul 11, 2023

Thanks for opening the issue! It seems that there's some misunderstanding about what the conversion scripts are doing.

> it duplicates the data (more space on HD)

No, it does NOT duplicate the data. The MSD conversion script only produces a pointer to the original, version-tracked BIDS dataset. This line shows that the output is just a .json file containing the paths to the images and labels of the original BIDS dataset.

> the dataset used for training is not synced anymore

Based on what I wrote above, since the output is a JSON file pointing to the BIDS dataset, the script only takes the latest paths to the BIDS dataset. There is NO duplication of the datasets anywhere.

What do I mean by "pointing to the original BIDS dataset"? Here's a screenshot of what the JSON file looks like:

[Screenshot: example MSD JSON file]

As you can see, we're referring to images in the root folder of the dataset and the labels in the derivatives folder.
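
For readers who cannot see the screenshot, here is a minimal sketch of how such a pointer-only datalist could be generated. It is not the lab's actual conversion script: the function names `build_datalist` and `write_datalist`, the `T2w` image suffix, the `derivatives/labels` folder, and the `_seg` label suffix are all assumptions chosen for illustration. The sketch only records file paths and never copies image data.

```python
import json
from pathlib import Path


def build_datalist(bids_root, image_suffix="T2w", label_suffix="seg",
                   labels_dir="derivatives/labels"):
    """Return MSD-style entries that point at files inside a BIDS dataset.

    All suffixes and folder names are illustrative assumptions.
    """
    bids_root = Path(bids_root)
    entries = []
    # Images live under the dataset root, e.g. sub-01/anat/sub-01_T2w.nii.gz
    for subject_dir in sorted(bids_root.glob("sub-*")):
        for image in sorted(subject_dir.rglob(f"*_{image_suffix}.nii.gz")):
            rel = image.relative_to(bids_root)
            # Labels are assumed to mirror the image layout under derivatives
            label_name = image.name.replace(".nii.gz", f"_{label_suffix}.nii.gz")
            label = bids_root / labels_dir / rel.parent / label_name
            if not label.exists():
                continue  # skip images that have no matching label
            entries.append({"image": str(image), "label": str(label)})
    return entries


def write_datalist(entries, out_path, key="training"):
    """Write the entries to a JSON file under the given top-level key."""
    out_path = Path(out_path)
    out_path.write_text(json.dumps({key: entries}, indent=2))
    return out_path
```

A file written this way, with a top-level `training` key and `image`/`label` entries, should be readable with MONAI's `load_decathlon_datalist`, although that has not been verified against any of the lab's datasets.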

Hope this clarifies some things a bit!

@jcohenadad
Member Author

> Hope this clarifies some things a bit!

It does! Thanks a lot @naga-karthik! Your solution is exactly what we need. I would just like to make it more visible to the lab, e.g., create a template script in this repo maybe?

@naga-karthik
Member

I created something like that here (and the students in the lab do know that the conversion scripts exist).

> create a template script in this repo maybe?

It's pretty hard to create a template script that just works in a plug-and-play manner. The suffixes, contrasts, sessions, etc. differ too much across the kinds of datasets we have, so the script I linked above is only meant to be a starting point. The students would have to look at the code and make small modifications depending on how their data looks (I also make it a bit easier by adding TODOs for where to add stuff).
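
One possible way to keep those per-dataset modifications small is to isolate them in a configuration object, so that only the configuration changes from one dataset to the next. The sketch below is an illustration under stated assumptions, not the linked script: `DatasetSpec`, `collect_pairs`, `aggregate`, and every suffix and folder name are hypothetical. It also shows how several BIDS datasets could be aggregated into a single list, as requested in the original post.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class DatasetSpec:
    """Per-dataset settings; all defaults are illustrative assumptions."""
    root: str
    contrast: str = "T2w"            # image suffix, e.g. "T1w", "T2star"
    label_suffix: str = "seg"        # label file suffix, e.g. "lesion"
    labels_dir: str = "derivatives/labels"
    session: Optional[str] = None    # e.g. "ses-01"; None keeps every image


def collect_pairs(spec):
    """Return image/label path pairs for one dataset, without copying data."""
    root = Path(spec.root)
    pairs = []
    for subject_dir in sorted(root.glob("sub-*")):
        for image in sorted(subject_dir.rglob(f"*_{spec.contrast}.nii.gz")):
            rel = image.relative_to(root)
            # Keep only the requested session folder, if one was given
            if spec.session is not None and spec.session not in rel.parts:
                continue
            label_name = image.name.replace(
                ".nii.gz", f"_{spec.label_suffix}.nii.gz")
            label = root / spec.labels_dir / rel.parent / label_name
            if label.exists():
                pairs.append({"image": str(image), "label": str(label),
                              "dataset": root.name})
    return pairs


def aggregate(specs):
    """Concatenate the pairs of several datasets into one list."""
    merged = []
    for spec in specs:
        merged.extend(collect_pairs(spec))
    return merged
```

With this layout, a new dataset would only need a new `DatasetSpec` entry, while anything the configuration cannot express would still require editing the code, which is consistent with the point above that a fully plug-and-play template is hard to achieve.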

@jcohenadad
Member Author

The fact that @louisfb01 started off with @naga-karthik's script (instead of starting from scratch) is evidence that having at least a script to start from is better than having no script at all. That justifies putting something in this repo and redirecting students to it (and, importantly, improving that script over time).

valosekj added the MONAI label Jul 14, 2023