Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to read and manipulate file meta #184

Merged
merged 4 commits into from
Sep 6, 2021
Merged

Conversation

vsoch
Copy link
Member

@vsoch vsoch commented Sep 2, 2021

This is the start of work to read/parse file_meta to address #183. I made some time this evening and I think the implementation was easier than I anticipated, but I have a few questions / concerns to address!

Remaining questions:

Do file meta fields overlap with fields in the dicom?

The simplest implementation I could do is to look for another section in the recipe, filemeta, that gets parsed just like the header section. I simply create DicomField with is_filemeta = True. However, I realized that if the identifiers you find in file_meta are unique to that (and vice versa with the dicom) then we don't need to check this value - we can simply replace/add/ etc based on the unique identifier just as we would. So this is how I implemented it to start. However, if it's the case that the same Field can appear in file meta OR the regular dicom, then we'll need to add flag actions that are just for filemeta, and add checks all over the place that essentially say:

# Process if it's an action for file meta, and this is a file meta field
if is_filemeta_action and field.is_filemeta:
   # process

# Process if it's an action not for filemeta, and this is a regular header field
elif not is_filemeta and not field.is_filemeta:
   # process

# otherwise not

TLDR: If fields are not unique to the sections we need to be checking each field if it is a filemeta or not before any action is taken. As implemented now, the fields are parsing over equivalently and it is assumed that file meta fields do not appear in the dicom and vice versa.

Do we need to make any special changes to the file meta, e.g., perhaps the size?

I have never manipulated it before, and I am hoping that pydicom save handles these changes, but if not we should do it manually. We can check with some of the other maintainers of the pydicom org that know pydicom well if we aren't sure here, and after we've addressed the point above.

cc @wetzelj and @adildahlan for your feedback!

Signed-off-by: vsoch [email protected]

Description

Related issues: # (issue)

Please include a summary of the change(s) and if relevant, any related issues above.

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • My code follows the style guidelines of this project

Open questions

See above!

@vsoch vsoch force-pushed the add/support-file-meta branch from b175e5b to eb7d76d Compare September 2, 2021 00:15
Remaining questions:
1. Do file meta fields overlap with fields in the dicom? If so, we need to be checking each field if it is a filemeta or not before replace. As implemented now, the fields are parsing over equivalently and it is assumed that file meta fields do not appear in the dicom and vice versa.
2. Do we need to make any special changes to the file meta, e.g., perhaps the size? I have never manipulated it before, and I am hoping that pydicom save handles these changes, but if not we should do it manually.

Signed-off-by: vsoch <[email protected]>
@vsoch vsoch force-pushed the add/support-file-meta branch from eb7d76d to 39d8e01 Compare September 2, 2021 00:18
@adildahlan
Copy link

Hi @vsoch,

Many thanks for working on this very quickly!

I like the idea of having a section within the deid.dicom recipe file. Just like we have %Header section (https://pydicom.github.io/deid/examples/recipe/), maybe we could have something like %File_meta section for the file meta.
My understanding is that the attributes (data elements) within the dicom file meta are different to those of the dicom file.
Here is a link to the official page of DICOM files where all the attributes within the file meta are listed (of course not all dicom files meta have all these attributes, but it could be). NB: All the dicom file meta have a tag starting with 0002.

My intuition would be that we wouldn't require any changes to be done to the file meta unless the user has that as part of the deid.dicom recipe. I am not sure TBH about that...

Many thanks again!

Regards

Adil

@vsoch
Copy link
Member Author

vsoch commented Sep 2, 2021

%File_meta section for the file meta.

Yes that's correct! The PR here adds %filemeta (take a look at the example https://github.com/pydicom/deid/blob/39d8e019c1d3c85047a2c5a20b514d141d6e8535/examples/dicom/header-manipulation/file-meta/deid.dicom)

My understanding is that the attributes (data elements) within the dicom file meta are different to those of the dicom file.
Here is a link to the official page of DICOM files where all the attributes within the file meta are listed (of course not all dicom files meta have all these attributes, but it could be). NB: All the dicom file meta have a tag starting with 0002.

If this is the case, then my strategy was right on! Do you want to test the branch here?

The last remaining bit is if we need to do something special for save. Let's wait for @wetzelj feedback then ping some of the other maintainers to ask.

@wetzelj
Copy link
Contributor

wetzelj commented Sep 2, 2021

I also like the idea of having a separate %file_meta section. We would, however, probably want to ensure that standard %header actions don't overlap and act against filemeta attributes (thinking of expanders and %values based rules).

I don't think we would want a recipe set up as below to remove the MediaStorageSOPInstanceUID that it just replaced:

%filemeta
REPLACE MediaStorageSOPInstanceUID new-id

%header
REMOVE endswith:UID 

Instead, if we did want to remove fields ending with UID from both sections of the file, we'd want to build a recipe like:

%filemeta
REMOVE endswith:UID

%header
REMOVE endswith:UID 

@adildahlan
Copy link

Yes that's correct! The PR here adds %filemeta (take a look at the example https://github.com/pydicom/deid/blob/39d8e019c1d3c85047a2c5a20b514d141d6e8535/examples/dicom/header-manipulation/file-meta/deid.dicom)

Oh yes that's perfect!

@adildahlan
Copy link

%filemeta
REPLACE MediaStorageSOPInstanceUID new-id

%header
REMOVE endswith:UID


Instead, if we did want to remove fields ending with UID from both sections of the file, we'd want to build a recipe like:

%filemeta
REMOVE endswith:UID

%header
REMOVE endswith:UID

I am wondering why do we need to have REMOVE endswith:UID as part of the recipe?

@wetzelj
Copy link
Contributor

wetzelj commented Sep 2, 2021

That was just a sample to indicate the way we would/would not want the actions to be applied - that wasn't anything we'd really want to do. :)

@vsoch
Copy link
Member Author

vsoch commented Sep 2, 2021

That’s a good point - but @wetzelj to go in the other direction if fields are unique why even have the second section? A rule to replace all ending with UID should hit those in file meta, why should they be special? But I do think we perhaps want to protect some of the fields like the size and transfer syntax from change?

@adildahlan
Copy link

That was just a sample to indicate the way we would/would not want the actions to be applied - that wasn't anything we'd really want to do. :)

Oh I see, that makes sense

@adildahlan
Copy link

That’s a good point - but @wetzelj to go in the other direction if fields are unique why even have the second section? A rule to replace all ending with UID should hit those in file meta, why should they be special? But I do think we perhaps want to protect some of the fields like the size and transfer syntax from change?

My initial intuitive understanding was that because the data elements part of the file_meta are inherently separate for the dicom file data elements, it would probably make sense to have a seperate section for them within the recipe (to access the dicom data elements it is enough to do ds.DATAELEMENT whereas to access file_meta data elements, we would need to do ds.file_meta.DATAELEMENT) I don't have a strong opinion on this to be honest.

@wetzelj
Copy link
Contributor

wetzelj commented Sep 2, 2021

I guess you're right. It probably doesn't make much sense to split it out. My statement was coming from my line of thinking that filemeta fields were special, and should be somewhat protected from "bad"/overly wide recipes... but really, that's not the case.

Honestly, I'm with @adildahlan and don't have a strong opinion on this either.

@vsoch
Copy link
Member Author

vsoch commented Sep 2, 2021

(to access the dicom data elements it is enough to do ds.DATAELEMENT whereas to access file_meta data elements, we would need to do ds.file_meta.DATAELEMENT) I don't have a strong opinion on this to be honest.

We can actually handle this easily because the dicom.file_meta is just another kind of dataset, so we add it as a dataset to parse (and save fields for). Since the fields we add to our lookup data structure are the same ones that point back into the dicom and they are accessed easily, we can actually just parse them akin to any other header.

Okay so here is my thoughts for the recipe:

  1. I think file_meta fields should just work, and not have their own section. It makes sense only up until we consider statements like REPLACE endswith:uid and in this case I think we either have to only parse one of the header or file_meta (and add a lot of custom logic that is going to slow things down) OR allow the parsing to be seamless, and to treat the file meta fields just like the header fields. I actually think it's more risky that people miss file meta fields with unique ids that maybe should be replaced because they have to specify it twice. So I want to remove the filemeta section.
  2. Given the point above and looking here we need to protect some fields in the file_meta. I think we'd want to protect:
FileMetaInformationGroupLength
FileMetaInformationVersion
TransferSyntaxUID
ImplementationClassUID

So those are my thoughts for moving forward - remove the special section and treat the fields like any other, and then protect ones that shouldn't be changed. Let me know what you think!

@adildahlan
Copy link

adildahlan commented Sep 2, 2021

So those are my thoughts for moving forward - remove the special section and treat the fields like any other, and then protect ones that shouldn't be changed. Let me know what you think!

Sounds perfect!

@wetzelj
Copy link
Contributor

wetzelj commented Sep 2, 2021

Agreed. This solution will be great!

@vsoch
Copy link
Member Author

vsoch commented Sep 2, 2021

okay fantastic! I'll get in a PR after the end of the work day. Thanks for the quick, fun discussion y'all!

instead of having a special section, we can include file-meta parsing alongside the
header parsing, as the names are unique. The one tiny detail is that we want to be
careful with fields that should not be touched, so those are now added to the
set of "skip." This set also includes other like PixelData that was not previously
being loaded, which I would consider a bug (that should now be fixed)

Signed-off-by: vsoch <[email protected]>
@vsoch
Copy link
Member Author

vsoch commented Sep 3, 2021

Okay @wetzelj @adildahlan this should be ready for review! Let me know if there are tests you think we should add, or if your use case is not addressed.

@adildahlan
Copy link

adildahlan commented Sep 5, 2021

Hi @vsoch !
Many thanks for working on this.
So I created a new conda environment and installed the current branch of the died library using the following code:
pip install git+https://github.com/pydicom/deid.git@add/support-file-meta

And I have my recipe as follows:


%header

REPLACE PatientName var:new_PatientName
REPLACE PatientID var:new_PatientID
ADD SOPInstanceUID var:new_SOPInstanceUID
ADD MediaStorageSOPInstanceUID var:new_MediaStorageSOPInstanceUID

And in my code I define each of these new variables as done in. this example code https://github.com/pydicom/deid/blob/master/examples/dicom/recipe/deid-dicom-example.py

However, I am getting this error
image

I hope you could help me figure out what's wrong

Many thanks!

Adil

@vsoch
Copy link
Member Author

vsoch commented Sep 5, 2021

Oh interesting, looks like an issue with save. Can you share with me a dummy example to reproduce the issue? Are you just using the provided cat.dcm but updated the deid recipe?

@adildahlan
Copy link

adildahlan commented Sep 5, 2021

Oh interesting, looks like an issue with save. Can you share with me a dummy example to reproduce the issue? Are you just using the provided cat.dcm but updated the deid recipe?

Hi @vsoch !

Oh I see. I am using the data I downloaded from the following link.

I would need to check if I am allowed to share the code as it is a private project I am collaborating on. Can you please try running your script with the dataset in the link above? If you don't get the same error maybe then I can check with my supervisor if I can share bits of the code with you. Apologies about that...

Many thanks!

@vsoch
Copy link
Member Author

vsoch commented Sep 5, 2021

it's okay I was able to reproduce on save:

In [3]: dicom.save_as('test.dcm')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/Desktop/Code/deid/examples/dicom/header-manipulation/filemeta/example.py in <module>
----> 1 dicom.save_as('test.dcm')

~/Desktop/Code/deid/env/lib/python3.8/site-packages/pydicom/dataset.py in save_as(self, filename, write_like_original)
   1885             Write a DICOM file from a :class:`FileDataset` instance.
   1886         """
-> 1887         pydicom.dcmwrite(filename, self, write_like_original)
   1888 
   1889     def ensure_file_meta(self) -> None:

~/Desktop/Code/deid/env/lib/python3.8/site-packages/pydicom/filewriter.py in dcmwrite(filename, dataset, write_like_original)
    969     #   place
    970     if dataset.group_dataset(0x0002) != Dataset():
--> 971         raise ValueError(
    972             f"File Meta Information Group Elements (0002,eeee) should be in "
    973             f"their own Dataset object in the "

ValueError: File Meta Information Group Elements (0002,eeee) should be in their own Dataset object in the 'FileDataset.file_meta' attribute.

Let me take a look, I might need to ask for help from the other maintainers of pydicom on this one!

@adildahlan
Copy link

it's okay I was able to reproduce on save:

In [3]: dicom.save_as('test.dcm')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/Desktop/Code/deid/examples/dicom/header-manipulation/filemeta/example.py in <module>
----> 1 dicom.save_as('test.dcm')

~/Desktop/Code/deid/env/lib/python3.8/site-packages/pydicom/dataset.py in save_as(self, filename, write_like_original)
   1885             Write a DICOM file from a :class:`FileDataset` instance.
   1886         """
-> 1887         pydicom.dcmwrite(filename, self, write_like_original)
   1888 
   1889     def ensure_file_meta(self) -> None:

~/Desktop/Code/deid/env/lib/python3.8/site-packages/pydicom/filewriter.py in dcmwrite(filename, dataset, write_like_original)
    969     #   place
    970     if dataset.group_dataset(0x0002) != Dataset():
--> 971         raise ValueError(
    972             f"File Meta Information Group Elements (0002,eeee) should be in "
    973             f"their own Dataset object in the "

ValueError: File Meta Information Group Elements (0002,eeee) should be in their own Dataset object in the 'FileDataset.file_meta' attribute.

Let me take a look, I might need to ask for help from the other maintainers of pydicom on this one!

Oh perfect! Many thanks!

Temperarly, I found a way to go around the issue with anonymising the MediaStorageSOPInstanceUID. I did so by running the anonymisation script without having in the recipe the MediaStorageSOPInstanceUID in it. Then afterwards I loop through all the anonymised dicom files and if the MediaStorageSOPInstanceUID dataelement doesn't match the SOPInstanceUID then I change the former to the later and save the dicom file as shown below.

if 'var:new_SOPInstanceUID' in new_variables:
        # Read in the new anonymised dicom files
        dicom_files_path = list(get_files(output_directory, pattern='*.dcm'))
        # Dictionary of dicom directories and header dictionary
        dicoms_dict = get_identifiers(dicom_files_path)
        # Loop through the new anonymised dicom files
        for i, (dicom_path, dicom_header) in enumerate(dicoms_dict.items()):
            # Read the anonymised dicom file
            ds = pydicom.dcmread(dicom_path)
            # Check if the MediaStorageSOPInstanceUID wasn't anonymised
            if ds.file_meta.MediaStorageSOPInstanceUID != ds.SOPInstanceUID:
                # Anonymise the MediaStorageSOPInstanceUID
                ds.file_meta.MediaStorageSOPInstanceUID = ds.SOPInstanceUID
                # Add the new MediaStorageSOPInstanceUID to the MediaStorageSOPInstanceUID_old_new_pair
                MediaStorageSOPInstanceUID_old_new_pair[ds.file_meta.MediaStorageSOPInstanceUID] = ds.SOPInstanceUID
                # Save the anonymised dicom file
                ds.save_as(dicom_path)

This worked so I guess saving the dicoms file after changing the MediaStorageSOPInstanceUID works but probably not with having MediaStorageSOPInstanceUID in the recipe.

Many thanks for all your help!

there is currently a bug that when we add (or otherwise manipulate a field) we assume
retrieving the field from the main dicom. We actually need to check if the field is from
the file meta, and change behavior based on that. So these changes should fix a bug of
running the example and trying to save the dicom.

Signed-off-by: vsoch <[email protected]>
@vsoch
Copy link
Member Author

vsoch commented Sep 5, 2021

hey @adildahlan so I figured it out! What was happening is that for any action that uses the add functionality, the assumption was adding to the dicom directly as a field. Of course with file meta this is problematic because we need to save to the filemeta instead. So the fix was to restore preserving the boolean if something was file meta, and then taking a different action in the parser depending on if it was file meta or not. Take a look at the changes here: a7f3f20 and when tests pass give the updated branch a try.

@adildahlan
Copy link

hey @adildahlan so I figured it out! What was happening is that for any action that uses the add functionality, the assumption was adding to the dicom directly as a field. Of course with file meta this is problematic because we need to save to the filemeta instead. So the fix was to restore preserving the boolean if something was file meta, and then taking a different action in the parser depending on if it was file meta or not. Take a look at the changes here: a7f3f20 and when tests pass give the updated branch a try.

Hi @vsoch, it is working!!!
Many thanks for having worked on this issue very quickly. You have done a great job at building this library and maintaining it 👏🏼!
Wish you all the best!
Warm regards
Adil

@vsoch
Copy link
Member Author

vsoch commented Sep 6, 2021

Woot! Let's merge and release then.

@vsoch vsoch merged commit 418ce32 into master Sep 6, 2021
@vsoch
Copy link
Member Author

vsoch commented Sep 6, 2021

@vsoch vsoch deleted the add/support-file-meta branch September 10, 2022 20:31
@wetzelj wetzelj mentioned this pull request Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants