Skip to content

Commit

Permalink
Add/remove keep behavior (Previous PR #199 - 0.2.29 rc) (#237)
Browse files Browse the repository at this point in the history
* #193 'bump pydicom' - bumped pydicom to 2.2.2, local tests green, changelog update, added PyCharm to gitignore
* WIP 197 remove + keep combo should keep (#198)
* #197 'remove and keep combo' - added tests that describe situation where remove overrules keep, and also explicitly show that except is using value as substring/regex match
* #197 'remove and keep combo' - added quite straightforward "add fields marked with KEEP to skiplist", also fixed docstring on parser#perform_action and removed unused parameter, fixed typo in random place
* #197 'remove and keep combo' - quotes changed as per code format convention
* #197 'remove and keep combo' - changed keep list to be separate property, updated docs on action order, added two tests for ensuring blank action order
* #197 'remove and keep combo' - changed keep fields collector to be more pythonic
* #197 'remove and keep combo' - more safeguards in keep fields collector, clarified docs upon field that should not be None
* #197 'remove and keep combo' - changelog update, typo fix, version bump
* #197 'remove and replace/jitter combo' - added once-evaluated list of fields being jittered or replaced to prevent them from being removed, tests, extended description of @keep property, changelog update, typo fix, version bump
* #197 'remove and replace/jitter combo' - run black
* Update deid/version.py - agreed to bump to 0.3
* Bug with return of derive_ctp_coordinate
Downstream code expects coordinate definitions to be a comma separated list of values, not a list of ints.
* Update deid/dicom/parser.py
Committing suggestion.
* change derive_ctp_coordinates to private function

Co-authored-by: Gleb <[email protected]>
Co-authored-by: Vanessasaurus <[email protected]>
  • Loading branch information
3 people authored Nov 21, 2022
1 parent 264f057 commit 5b458d1
Show file tree
Hide file tree
Showing 7 changed files with 539 additions and 158 deletions.
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@ env
_build

private
# dev tools
.idea
.vscode

# osx
Expand Down
163 changes: 84 additions & 79 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,93 +1,98 @@
# CHANGELOG

This is a manually generated log to track changes to the repository for each release.
Each section should include general headers such as **Implemented enhancements**
This is a manually generated log to track changes to the repository for each release.
Each section should include general headers such as **Implemented enhancements**
and **Merged pull requests**. Critical items to know are:

- renamed commands
- deprecated / removed commands
- changed defaults
- backward incompatible changes (recipe file format? image file format?)
- migration guidance (how to convert images?)
- changed behaviour (recipe sections work differently)
- renamed commands
- deprecated / removed commands
- changed defaults
- backward incompatible changes (recipe file format? image file format?)
- migration guidance (how to convert images?)
- changed behaviour (recipe sections work differently)

Referenced versions in headers are tagged on Github, in parentheses are for pypi.

## [vxx](https://github.com/pydicom/deid/tree/master) (master)
- Add `ctpcoordinates` and `ctpkeepcoordinates` to handle different formats (0.3.0)
- Minimum Python required is 3.7, numpy 1.20
- Remove unecessary typing - adds bugs (0.2.37)
- Provide data as an external package (0.2.36)
- Restore expand_sequences to get_identifiers (0.2.35)
- Add function to clean datasets without `DicomCleaner` [#223](https://github.com/pydicom/deid/pull/223) (0.2.34)
- add select:vr:XX field expander to select elements by VR (0.2.33)
- rename group:XXXX field expander to select:group:XXXX
- add group:XXXX field expander to select all elements with a specified DICOM tag group (0.2.32)
- custom class example for using dicom.Dataset, not requiring on client init [#211](https://github.com/pydicom/deid/pull/211) (0.2.31)
- adding support for deid provided functions [#207](https://github.com/pydicom/deid/issues/207) (0.2.3)
- update CTP deid.dicom up until [this commit](https://github.com/johnperry/CTP/commit/345b05b157c046532e8791a63ababbf6d0dba59b) (0.2.29)
- various LGTM alert fixes [#186](https://github.com/pydicom/deid/pull/186) (0.0.28)
- bug fix for exception when attempting to jitter DA/DT which cannot be jittered (space) [#189] (https://github.com/pydicom/deid/issues/189) (0.2.27)
- adding support to manipulate file meta [#183](https://github.com/pydicom/deid/issues/183) (0.2.26)
- updated pydicom dependency from 1.3.0 to 2.1.1 [#171](https://github.com/pydicom/deid/issues/171) (0.2.25)
- bug fix for multivalued fields in %values lists [#174](https://github.com/pydicom/deid/issues/174)
- allowing other VR types for jitter [#175](https://github.com/pydicom/deid/issues/175)
- ensuring that an add/replace of an existing value is also updated in fields [#173](https://github.com/pydicom/deid/issues/173)
- change to correct issue with deidentifying RGB images [#165](https://github.com/pydicom/deid/issues/165) (0.2.24)
- removing verbosity of debug logger (0.2.23)
- changing iteration technique through fields to properly add nested uids [#153](https://github.com/pydicom/deid/issues/153) (0.2.22)
- change to return results from detect when recipe does not contain filters [#155](https://github.com/pydicom/deid/issues/155)
- fix to correct bug in detect [#142](https://github.com/pydicom/deid/issues/142) (0.2.21)
- fixes to detect and clean to better represent keep/coordinates (0.2.20)
- modify default VR for added tags [#146](https://github.com/pydicom/deid/issues/146), bug with private tags in %fields section [#147](https://github.com/pydicom/deid/issues/147) (0.2.19)
- adding keepcoordinates to allow for reverse specified coordinates (0.2.18)
- bug with empty values within %values sections [#144](https://github.com/pydicom/deid/issues/144) (0.2.17)
- ability to save 4d dicom series as animation (0.2.16)
- bug with multiple actions on the same header field [#140](https://github.com/pydicom/deid/issues/140) (0.2.15)
- nested sequence in replace_identifiers/strip_sequences bug [#137](https://github.com/pydicom/deid/issues/137) (0.2.14)
- adding custom cleaner to specify region fields (0.2.13)
- bug with replace for tested fields (0.2.12)
- fixing func args to be kwargs (0.2.11)
- DicomCleaner empty method does not handle list (0.2.1)
- refactor to use iterall for all fields (0.2.0)
- including private tags in parsing (0.1.42)
- adding filters (contains through missing) for REMOVE (0.1.41)
- adding support for tag groups (values, fields) (0.1.4)
- Adding option to provide function to remove (must return boolean) (0.1.38)
- removing matplotlib version requirement (0.1.37)
- Matplotlib dependency >= 2.1.2 (0.1.36)
- Adding black formatting, tests run in GitHub actions (0.1.35)
- Adding option to recursively replace sequences (0.1.34)
- adding pylint to clean up code (0.1.33)
- removing dependency that isn't used (simplejson) (0.1.31)
- updating cleaner to use pixel array (0.1.30)
- validation function should use debug verbository, bumping license year [#92](https://github.com/pydicom/deid/issues/92) (0.1.29)
- bumping pydicom to 1.2.0 [#91](https://github.com/pydicom/deid/issues/91) (0.1.28)
- header expansions starts with and ends with should support regular expression OR (|) [#89](https://github.com/pydicom/deid/issues/89) (0.1.27)
- added all, allexcept, contains to field filters [#87](https://github.com/pydicom/deid/pull/87) (0.1.26)
- fixing bug in jitter timestamp function [#85](https://github.com/pydicom/deid/pull/85) (0.1.25)
- installation order of pydicom / matplotlib changes default python [#81](https://www.github.com/pydicom/deid/issues/81) (0.1.23)
- updating deid.dicom with contribution from @fimafurman [#63](https://github.com/pydicom/deid/issues/63) (0.1.22)
- adding "func" option for recipe to pass function (0.1.21)
- fixing client bug, redoing docs to be better organized (0.1.20)
- Removing MediaStorageSOPInstanceUID from file_meta, issue #72 (0.1.19)
- need to clean up temporary directory (mkdtemp), issue #68 (0.1.18)
- fixing issue #65, save for compressed data (0.1.17)
- matplotlib must be less than or equal to 2.1.2 for install (0.1.16)
- fixing bug with clean coordinate flipping rectangle
- Fixing bug with saving self.cleaned (0.1.15)
- Allowing for datasets to be passed in functions (not necessary for files) (0.1.14)
- index should be full path in header.py (0.1.13)
- pydicom bumped to install latest (1.0.2) (0.1.12)
- ensuring that ids for images are full paths (0.1.11)
- addition of the DeidRecipe class to better interact with and combine deid recipe files.
- the get_files function now returns a generator instead of a list.

- Add `ctpcoordinates` and `ctpkeepcoordinates` to handle different formats (0.3.0)
- Minimum Python required is 3.7, numpy 1.20
- Remove unecessary typing - adds bugs (0.2.37)
- Provide data as an external package (0.2.36)
- Restore expand_sequences to get_identifiers (0.2.35)
- Add function to clean datasets without `DicomCleaner` [#223](https://github.com/pydicom/deid/pull/223) (0.2.34)
- add select:vr:XX field expander to select elements by VR (0.2.33)
- rename group:XXXX field expander to select:group:XXXX
- add group:XXXX field expander to select all elements with a specified DICOM tag group (0.2.32)
- custom class example for using dicom.Dataset, not requiring on client init [#211](https://github.com/pydicom/deid/pull/211) (0.2.31)
- adding support for deid provided functions [#207](https://github.com/pydicom/deid/issues/207) (0.2.3)
- update CTP deid.dicom up until [this commit](https://github.com/johnperry/CTP/commit/345b05b157c046532e8791a63ababbf6d0dba59b) (0.2.29)
- various LGTM alert fixes [#186](https://github.com/pydicom/deid/pull/186) (0.0.28)
- `REPLACE/JITTER` actions have now higher priority than `REMOVE`, allowing to whitelist fields from `REMOVE ALL/Field` [#197](https://github.com/pydicom/deid/issues/197)
- `ADD/KEEP` actions have now higher priority than `REMOVE`, allowing to whitelist fields from `REMOVE ALL/Field` [#197](https://github.com/pydicom/deid/issues/197)
- updated pydicom dependency from 2.1.1 to 2.2.2 [#194](https://github.com/pydicom/deid/issues/194)
- bug fix for exception when attempting to jitter DA/DT which cannot be jittered (space) [#189] (<https://github.com/pydicom/deid/issues/189>) (0.2.27)
- adding support to manipulate file meta [#183](https://github.com/pydicom/deid/issues/183) (0.2.26)
- updated pydicom dependency from 1.3.0 to 2.1.1 [#171](https://github.com/pydicom/deid/issues/171) (0.2.25)
- bug fix for multivalued fields in %values lists [#174](https://github.com/pydicom/deid/issues/174)
- allowing other VR types for jitter [#175](https://github.com/pydicom/deid/issues/175)
- ensuring that an add/replace of an existing value is also updated in fields [#173](https://github.com/pydicom/deid/issues/173)
- change to correct issue with deidentifying RGB images [#165](https://github.com/pydicom/deid/issues/165) (0.2.24)
- removing verbosity of debug logger (0.2.23)
- changing iteration technique through fields to properly add nested uids [#153](https://github.com/pydicom/deid/issues/153) (0.2.22)
- change to return results from detect when recipe does not contain filters [#155](https://github.com/pydicom/deid/issues/155)
- fix to correct bug in detect [#142](https://github.com/pydicom/deid/issues/142) (0.2.21)
- fixes to detect and clean to better represent keep/coordinates (0.2.20)
- modify default VR for added tags [#146](https://github.com/pydicom/deid/issues/146), bug with private tags in %fields section [#147](https://github.com/pydicom/deid/issues/147) (0.2.19)
- adding keepcoordinates to allow for reverse specified coordinates (0.2.18)
- bug with empty values within %values sections [#144](https://github.com/pydicom/deid/issues/144) (0.2.17)
- ability to save 4d dicom series as animation (0.2.16)
- bug with multiple actions on the same header field [#140](https://github.com/pydicom/deid/issues/140) (0.2.15)
- nested sequence in replace_identifiers/strip_sequences bug [#137](https://github.com/pydicom/deid/issues/137) (0.2.14)
- adding custom cleaner to specify region fields (0.2.13)
- bug with replace for tested fields (0.2.12)
- fixing func args to be kwargs (0.2.11)
- DicomCleaner empty method does not handle list (0.2.1)
- refactor to use iterall for all fields (0.2.0)
- including private tags in parsing (0.1.42)
- adding filters (contains through missing) for REMOVE (0.1.41)
- adding support for tag groups (values, fields) (0.1.4)
- Adding option to provide function to remove (must return boolean) (0.1.38)
- removing matplotlib version requirement (0.1.37)
- Matplotlib dependency >= 2.1.2 (0.1.36)
- Adding black formatting, tests run in GitHub actions (0.1.35)
- Adding option to recursively replace sequences (0.1.34)
- adding pylint to clean up code (0.1.33)
- removing dependency that isn't used (simplejson) (0.1.31)
- updating cleaner to use pixel array (0.1.30)
- validation function should use debug verbository, bumping license year [#92](https://github.com/pydicom/deid/issues/92) (0.1.29)
- bumping pydicom to 1.2.0 [#91](https://github.com/pydicom/deid/issues/91) (0.1.28)
- header expansions starts with and ends with should support regular expression OR (|) [#89](https://github.com/pydicom/deid/issues/89) (0.1.27)
- added all, allexcept, contains to field filters [#87](https://github.com/pydicom/deid/pull/87) (0.1.26)
- fixing bug in jitter timestamp function [#85](https://github.com/pydicom/deid/pull/85) (0.1.25)
- installation order of pydicom / matplotlib changes default python [#81](https://www.github.com/pydicom/deid/issues/81) (0.1.23)
- updating deid.dicom with contribution from @fimafurman [#63](https://github.com/pydicom/deid/issues/63) (0.1.22)
- adding "func" option for recipe to pass function (0.1.21)
- fixing client bug, redoing docs to be better organized (0.1.20)
- Removing MediaStorageSOPInstanceUID from file_meta, issue #72 (0.1.19)
- need to clean up temporary directory (mkdtemp), issue #68 (0.1.18)
- fixing issue #65, save for compressed data (0.1.17)
- matplotlib must be less than or equal to 2.1.2 for install (0.1.16)
- fixing bug with clean coordinate flipping rectangle
- Fixing bug with saving self.cleaned (0.1.15)
- Allowing for datasets to be passed in functions (not necessary for files) (0.1.14)
- index should be full path in header.py (0.1.13)
- pydicom bumped to install latest (1.0.2) (0.1.12)
- ensuring that ids for images are full paths (0.1.11)
- addition of the DeidRecipe class to better interact with and combine deid recipe files.
- the get_files function now returns a generator instead of a list.

## [0.1.1](https://pypi.python.org/packages/28/26/ee80e7f1c3f65fae1c901497bb2388701158f0c96e0d633ab301abeaa478/deid-0.1.1.tar.gz#md5=39df7efb03e5d3b63308016742062a43) (0.1.1)

**additions**
- addition of this CHANGELOG and an AUTHORS and CONTRIBUTING file to properly open source the project.

- addition of this CHANGELOG and an AUTHORS and CONTRIBUTING file to properly open source the project.
**bug fix**
- when the user specifies a deid recipe, instead of adding it to a base template we honor the choice and don't append a base.
- when the user specifies a deid recipe, instead of adding it to a base template we honor the choice and don't append a base.
**creation**
- this is the initial creation of deid, including recipes for cleaning of image headers and flagging of potential phi in pixels.
- this is the initial creation of deid, including recipes for cleaning of image headers and flagging of potential phi in pixels.
10 changes: 6 additions & 4 deletions deid/config/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -258,7 +258,7 @@ def parse_filter_group(spec):
return members


def derive_ctp_coordinate(raw):
def _derive_ctp_coordinate(raw):
"""
Derive a ctp coordinate from a raw (comma separated) string.
Expand All @@ -284,9 +284,11 @@ def derive_ctp_coordinate(raw):
if len(new_coordinate) != 4:
bot.exit("Coordinates are expected to have length of 4, found %s" % raw)
xmin, ymin, width, height = new_coordinate
new_coordinate[2] = xmin + width
new_coordinate[3] = ymin + height

# Translate CTP coordinate to the convention we use
return [xmin, ymin, xmin + width, ymin + height]
return ",".join([str(i) for i in new_coordinate])


def parse_label(section, config, section_name, members, label=None):
Expand All @@ -311,14 +313,14 @@ def parse_label(section, config, section_name, members, label=None):
member = members.pop(0).strip()

if member.lower().startswith("ctpcoordinates"):
coordinate = derive_ctp_coordinate(
coordinate = _derive_ctp_coordinate(
member.replace("ctpcoordinates", "").strip()
)
criteria["coordinates"].append([0, coordinate])
continue

elif member.lower().startswith("ctpkeepcoordinates"):
coordinate = derive_ctp_coordinate(
coordinate = _derive_ctp_coordinate(
member.replace("ctpkeepcoordinates", "").strip()
)
criteria["coordinates"].append([1, coordinate])
Expand Down
52 changes: 46 additions & 6 deletions deid/dicom/parser.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,11 @@ class DicomParser:
file. For each, we store the element and child elements
"""

"""
list of fields to exclude from remove because they are replaced and jittered
"""
_excluded_fields = None

def __init__(
self, dicom_file, recipe=None, config=None, force=True, disable_skip=False
):
Expand Down Expand Up @@ -280,6 +285,41 @@ def skip(self):
skips = self.config.get("get", {}).get("skip", {})
return skips

@property
def keep(self):
"""
Return a list of fields to keep original, as defined by all KEEP actions in recipe
Those fields are not impacted by REPLACE/JITTER actions
"""
keeps = []
if self.recipe.deid is not None:
keeps = [
action.get("field")
for action in self.recipe.get_actions(action="KEEP")
if action and action.get("field")
]
return keeps

@property
def excluded_from_deletion(self):
"""
Return once-evaluated list of fields that are not removed by REMOVE ALL or REMOVE SomeField,
as they later have to be changed by REPLACE / JITTER
That allows whitelisting fields from REMOVE ALL/SomeField to change them if needed (i.e. obfuscation)
"""
if self._excluded_fields is None:
self._excluded_fields = []
if self.recipe.deid is not None:
self._excluded_fields = [
action.get("field")
for action in (
self.recipe.get_actions(action="JITTER")
+ self.recipe.get_actions(action="REPLACE")
)
if action and action.get("field")
]
return self._excluded_fields

def get_fields(self, expand_sequences=True):
"""expand all dicom fields into a list, where each entry is
a DicomField. If we find a sequence, we unwrap it and
Expand All @@ -290,7 +330,7 @@ def get_fields(self, expand_sequences=True):
dicom=self.dicom,
expand_sequences=expand_sequences,
seen=self.seen,
skip=self.skip,
skip=self.skip + self.keep,
)
return self.fields

Expand Down Expand Up @@ -347,12 +387,12 @@ def perform_action(self, field, value, action, filemeta=False):
Parameters
==========
fields: if provided, a filtered list of fields for expand
field: a field for expand
value: field value
action: the action from the parsed deid to take
"field" (eg, PatientID) the header field to process
"action" (eg, REPLACE) what to do with the field
"value": if needed, the field from the response to replace with
filemeta (bool) perform on filemeta
"""
# Validate the action
if action not in valid_actions:
Expand All @@ -364,7 +404,7 @@ def perform_action(self, field, value, action, filemeta=False):
values = self.lookup.get(re.sub("^values:", "", field), [])
fields = self.find_by_values(values=values)

# A fields list is used vertbatim
# A fields list is used verbatim
# In expand_field_expression below, the stripped_tag is being passed in to field. At this point,
# expanders for %fields lists have already been processed and each of the contenders is an
# identified, unique field. It is important to use stripped_tag at this point instead of
Expand Down Expand Up @@ -509,7 +549,7 @@ def _run_action(self, field, action, value=None):

# If a value is defined, parse it (could be filter)
do_removal = True
if value != None:
if value is not None:
do_removal = parse_value(
item=self.lookup,
dicom=self.dicom,
Expand All @@ -518,7 +558,7 @@ def _run_action(self, field, action, value=None):
funcs=self.deid_funcs,
)

if do_removal is True:
if do_removal is True and field.name not in self.excluded_from_deletion:
self.delete_field(field)

def remove_private(self):
Expand Down
Loading

0 comments on commit 5b458d1

Please sign in to comment.