-
-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Study Instance UID and Series Instance UID not changed when dicom contains Related Series Sequence tags #153
Comments
Thank you for the report, and the clear way to reproduce! I'm testing locally and I've reproduced, and I'll share what I've found. It does look like the replacement function itself works okay - here we see that all the occurrences of the instance UIDs are changed to modified:
However the bug seems to be in actually using those fields to do the save, because the resulting file appears as you've showed. I suspect it's indexing. In the DeidParser perform_action we do grab the fields correctly: fields
{'(0020, 000d)': (0020, 000d) Study Instance UID UI: RelStudy0 [StudyInstanceUID],
'(0008, 1250)__0__(0020, 000d)': (0020, 000d) Study Instance UID UI: RelStudy0 [RelatedSeriesSequence__StudyInstanceUID],
'(0008, 1250)__1__(0020, 000d)': (0020, 000d) Study Instance UID UI: RelStudy1 [RelatedSeriesSequence__StudyInstanceUID],
'(0008, 1250)__2__(0020, 000d)': (0020, 000d) Study Instance UID UI: RelStudy2 [RelatedSeriesSequence__StudyInstanceUID]} Ah so I found the bug (now we are in the class add_value method. The issue is that the dicom generated doesn't have a value attribute, likely because you created it on the fly. The reason it has the field but then only replaces the nested ones is because we use not having the value directly as an indicator that there is a nested field, e.g.,
I don't know why, but the iteration through the self.fields seems to change both instances (the nested and non-nested) but doesn't actual edit the dicom. Here is an example - see that modified appears twice in self.fields, but only once in the self.dicom self.fields
{'(0008, 0012)': (0008, 0012) Instance Creation Date DA: '20201001' [InstanceCreationDate],
'(0008, 0033)': (0008, 0033) Content Time TM: '20201001' [ContentTime],
'(0008, 1250)': (0008, 1250) Related Series Sequence SQ: <Sequence, length 3> [RelatedSeriesSequence],
'(0020, 000d)': (0020, 000d) Study Instance UID UI: modified [StudyInstanceUID],
'(0020, 000e)': (0020, 000e) Series Instance UID UI: RelSeries0 [SeriesInstanceUID],
'(0010, 0010)': (0010, 0010) Patient's Name PN: 'ReproducingTheError' [PatientName],
'(0008, 1250)__0__(0020, 000d)': (0020, 000d) Study Instance UID UI: modified [RelatedSeriesSequence__StudyInstanceUID],
'(0008, 1250)__0__(0020, 000e)': (0020, 000e) Series Instance UID UI: RelSeries0 [RelatedSeriesSequence__SeriesInstanceUID],
'(0008, 1250)__1__(0020, 000d)': (0020, 000d) Study Instance UID UI: RelStudy1 [RelatedSeriesSequence__StudyInstanceUID],
'(0008, 1250)__1__(0020, 000e)': (0020, 000e) Series Instance UID UI: RelSeries1 [RelatedSeriesSequence__SeriesInstanceUID],
'(0008, 1250)__2__(0020, 000d)': (0020, 000d) Study Instance UID UI: RelStudy2 [RelatedSeriesSequence__StudyInstanceUID],
'(0008, 1250)__2__(0020, 000e)': (0020, 000e) Series Instance UID UI: RelSeries2 [RelatedSeriesSequence__SeriesInstanceUID]}
> self.dicom
(0008, 0012) Instance Creation Date DA: '20201001'
(0008, 0033) Content Time TM: '20201001'
(0008, 1250) Related Series Sequence 3 item(s) ----
(0020, 000d) Study Instance UID UI: modified
(0020, 000e) Series Instance UID UI: RelSeries0
---------
(0020, 000d) Study Instance UID UI: RelStudy1
(0020, 000e) Series Instance UID UI: RelSeries1
---------
(0020, 000d) Study Instance UID UI: RelStudy2
(0020, 000e) Series Instance UID UI: RelSeries2
---------
(0010, 0010) Patient's Name PN: 'ReproducingTheError'
(0020, 000d) Study Instance UID UI: SeriesUID
(0020, 000e) Series Instance UID UI: StudyUID The way this works is that (technically) the self.fields should always have a DicomField class that references the same element in the dicom data, that way when we edit one, the other edits as well. But this doesn't appear to be happening for some reason. I tried removing the deepcopy of the fields (added recently) and this isn't the issue - we still don't change the first one. I then moved earlier in the code and just tried manually changing a value - this hints that perhaps the reference to the top level field is incorrect, in fact it's pointing to the one that comes much later! field
Out[5]: (0020, 000d) Study Instance UID UI: RelStudy0 [StudyInstanceUID]
In [6]: field.element.value
Out[6]: 'RelStudy0'
In [7]: self.dicom
Out[7]:
(0008, 0012) Instance Creation Date DA: '20201001'
(0008, 0033) Content Time TM: '20201001'
(0008, 1250) Related Series Sequence 3 item(s) ----
(0020, 000d) Study Instance UID UI: RelStudy0
(0020, 000e) Series Instance UID UI: RelSeries0
---------
(0020, 000d) Study Instance UID UI: RelStudy1
(0020, 000e) Series Instance UID UI: RelSeries1
---------
(0020, 000d) Study Instance UID UI: RelStudy2
(0020, 000e) Series Instance UID UI: RelSeries2
---------
(0010, 0010) Patient's Name PN: 'ReproducingTheError'
(0020, 000d) Study Instance UID UI: SeriesUID
(0020, 000e) Series Instance UID UI: StudyUID
In [8]: field.element.value
Out[8]: 'RelStudy0'
In [9]: field.element.value = 'ADIFFERENTONE'
In [10]: self.dicom
Out[10]:
(0008, 0012) Instance Creation Date DA: '20201001'
(0008, 0033) Content Time TM: '20201001'
(0008, 1250) Related Series Sequence 3 item(s) ----
(0020, 000d) Study Instance UID UI: ADIFFERENTONE
(0020, 000e) Series Instance UID UI: RelSeries0
---------
(0020, 000d) Study Instance UID UI: RelStudy1
(0020, 000e) Series Instance UID UI: RelSeries1
---------
(0020, 000d) Study Instance UID UI: RelStudy2
(0020, 000e) Series Instance UID UI: RelSeries2
---------
(0010, 0010) Patient's Name PN: 'ReproducingTheError'
(0020, 000d) Study Instance UID UI: SeriesUID
(0020, 000e) Series Instance UID UI: StudyUID In the above, since we indexed uid In [19]: for key, value in dataset.items():
...: print(key)
...: print(value)
...:
(0008, 0012)
(0008, 0012) Instance Creation Date DA: '20201001'
(0008, 0033)
(0008, 0033) Content Time TM: '20201001'
(0008, 1250)
(0008, 1250) Related Series Sequence SQ: <Sequence, length 3>
(0010, 0010)
(0010, 0010) Patient's Name PN: 'ReproducingTheError'
(0020, 000d)
(0020, 000d) Study Instance UID UI: SeriesUID
(0020, 000e)
(0020, 000e) Series Instance UID UI: StudyUID
In [20]: for item in dataset.iterall():
...: print(item)
...:
...:
(0008, 0012) Instance Creation Date DA: '20201001'
(0008, 0033) Content Time TM: '20201001'
(0008, 1250) Related Series Sequence SQ: <Sequence, length 3>
(0020, 000d) Study Instance UID UI: RelStudy0
(0020, 000e) Series Instance UID UI: RelSeries0
(0020, 000d) Study Instance UID UI: RelStudy1
(0020, 000e) Series Instance UID UI: RelSeries1
(0020, 000d) Study Instance UID UI: RelStudy2
(0020, 000e) Series Instance UID UI: RelSeries2
(0010, 0010) Patient's Name PN: 'ReproducingTheError'
(0020, 000d) Study Instance UID UI: SeriesUID
(0020, 000e) Series Instance UID UI: StudyUID So I've been interactively debugging for a few hours this morning - I know the issue is here with respect to how we iterate through the elements and save the association, but I haven't yet figured out what's wrong / how to fix it. @wetzelj do you have any ideas? |
oh oops, I spoke too soon... I think I fixed it! Let me put in a PR. |
this leads to an error in replacement of nested fields and top level fields having mixed up references. We fixed this by iterating through the dataset as is, instead of using iterall. Signed-off-by: vsoch <[email protected]>
I've only cursorily read this (and have to hop on to a meeting now), but wasn't the iterall introduced to enable us to act on sequences and sub-sequences as well? Does removing the iterall break anything with detecting data in those subsequences? |
Here is the PR! #154 @wetzelj actually I don't think so! The reason is that we already loop through a sequence, discover that it is a sequence, and then add it's values as datasets to parse through. So we sort of unwrap the list. I think this issue likely arose because this function first just iterated through fields (and did it's own manual extraction of nested) but then you showed me the magic of iterall and I changed it, and I suspect we didn't fully test the replace again. That said, possibly now we lost something that iterall was providing? Please take a look at the PR and let me know what you think! |
Sounds good. I ran some tests and it seems to be okay - I didn't notice anything being lost, just the new corrected functionality. :) I'll keep an eye out as I continue development/testing. |
fixing issue #153 that fields are not properly indexed
Fixed with #154 |
It seems that Deid does not manage to find and change the the Study and Series Instance UID tags that are outside the Related Series Sequence when Related Series Sequence tags are present.
I've appended a (almost) minimally working example reproducing the error I see in my data. The python code first generates a dicom file and then tries to clean it by changing
contains:StudyInstanceUID
andcontains:SeriesInstanceUID
from their original values tomodified
.A copy of the output is copied below. Ignore the weird warning formatting it is due to the windows console not having colors.
Producing code:
Deid recipe
The text was updated successfully, but these errors were encountered: