-
-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deidentifying pydicom.Dataset with custom class #211
Conversation
dont come from a file directly
This is the example that I have added: In this example we will create a custom class to deidentify a single instance of a OverviewWe will use four files for this example:
The
We can install them by running the following commands (requires conda create -n deid_example python=3.9
conda activate deid_example
cd my_deid_example
pip install -r requirements.txt The contents of {
"SpecificCharacterSet":{"vr":"CS","Value":["ISO_IR 100"]},
"ImageType":{"vr":"CS","Value":["DERIVED","PRIMARY"]},
"SOPClassUID":{"vr":"UI","Value":["1.2.840.10008.5.1.4.1.1.1.2"]},
"StudyDate":{"vr":"DA","Value":["20220627"]},
"SeriesDate":{"vr":"DA","Value":["20220627"]},
"AcquisitionDate":{"vr":"DA","Value":["20220627"]},
"ContentDate":{"vr":"DA","Value":["20220627"]},
"StudyTime":{"vr":"TM","Value":["080803"]},
"ContentTime":{"vr":"TM","Value":["080808.202000"]},
"PatientName":{"vr":"PN","Value":[{"Alphabetic":"Maria^Doe"}]},
"PatientID":{"vr":"LO","Value":["1234567890"]},
"PatientBirthDate":{"vr":"DA","Value":["19900606"]},
"Modality":{"vr":"CS","Value":["MG"]},
"PatientSex":{"vr":"CS","Value":["F"]},
"PatientAge":{"vr":"AS","Value":["032Y"]},
"StudyID":{"vr":"SH","Value":["mammogram87654"]}
} The recipeWe create a custom recipe
The custom deidentifier classfrom deid.config import DeidRecipe
from deid.dicom.parser import DicomParser
import pydicom
from Crypto.Hash import SHA512
from datetime import datetime
class DeidDataset:
"""This class allows to pseudonymize an instance of
pydicom.Dataset with our custom recipe and functions.
"""
def __init__(self, secret_salt: str, recipe_path: str):
"""New instance of our pseudonymizer class.
:param secret_salt: a random string that makes the
hashing harder to break.
:param recipe_path: path to our deid recipe.
"""
self.secret_salt = secret_salt
self.recipe = DeidRecipe(recipe_path)
def pseudonymize(self, dataset:pydicom.Dataset) -> pydicom.Dataset:
"""Pseudonymize a single dicom dataset
:param dataset: dataset that will be pseudonymized
:returns: pseudonymized dataset
"""
parser = DicomParser(dataset, self.recipe, from_file=False)
# register functions that are specified in the recipe
parser.define('replace_name', self.replace_name)
parser.define('hash_func', self.deid_hash_func)
parser.define('remove_day', self.remove_day)
# parse the dataset and apply the deidentification
parser.parse(strip_sequences=True, remove_private=True)
return parser.dicom
# All registered functions that are used in the recipe must
# receive the arguments: `item`, `value`, `field`, `dicom`
def deid_hash_func(self, item, value, field, dicom) -> str:
"""Performs self.hash to field.element.value"""
val = field.element.value
return self.hash(str(val))
@staticmethod
def remove_day(item, value, field, dicom) -> str:
"""Removes the day from a DT field in the deid framework"""
dt = datetime.strptime(field.element.value, '%Y%m%d')
return dt.strftime("%Y%m01")
@staticmethod
def replace_name(item, value, field, dicom) -> str:
"""Replace PatientName with PatientSex and coarse PatientAge"""
sex = dicom.get('PatientSex')
sex = {"F":'Female', "M": 'Male', 'O':'Other'}[sex]
age = DeidDataset.round_to_nearest(int(dicom.get('PatientAge')[:-1]), 5)
return f"{sex} {age:03d}Y {dicom.get('Modality')}"
# Helper methods for our registered ones
@staticmethod
def round_to_nearest(value, interval):
"""Rounds value to closest multiple of interval"""
return interval * round(value/interval)
def hash(self, msg: str) -> str:
"""
:param msg: message that we want to encrypt,
normally the PatientID or the StudyID.
:return: the encrypted message as hexdigest
(in characters from '0' to '9' and 'a' to 'f')
"""
assert type(msg) == str, f"value is not of type str, {type(msg)}"
h = SHA512.new(truncate="256")
bytes_str = bytes(f"{self.secret_salt}{msg}", "utf-8")
h.update(bytes_str)
return str(h.hexdigest())
# Load the pydicom Dataset
import json
# Unorthodox way of loading a pydicom.Dataset
# please see pydicom documentation for more information
# on how to load dicom files
with open('my_dicom_file.json') as f:
dataset_dict = json.load(f)
dataset = pydicom.Dataset.from_json(dataset_dict)
print('Dataset before pseudonymization')
print(dataset)
#create an instance of our class
deid_ds = DeidDataset("!2#4%6&7abc", 'my_deid_recipe.dicom')
#pseudonymize the dataset
print('\nDataset after pseudonymization')
pseudonymized = deid_ds.pseudonymize(dataset)
print(pseudonymized) If we execute our python module python my_module.py It will give us the following output:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really excited about this example - it's beautifully written! I added some comments for discussion on the implementation, let me know your thoughts!
deid/dicom/parser.py
Outdated
@@ -81,6 +91,8 @@ def __init__( | |||
# Deid can be a recipe or filename | |||
if not isinstance(recipe, DeidRecipe): | |||
recipe = DeidRecipe(recipe) | |||
|
|||
self.from_file = from_file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason this from_file should be saved with the class? E.g., isn't it just used for the load, and if so, would it make sense to pass there?
Also, is there a better way to determine this? E.g., if it's not a file that exists could we not just check for the type of the variable and not require the extra param?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason to pass it in the constructor is that init runs self.load.
You are right that a more intelligent way of doing it is checking if the variable exists and if it does not, just skip the necessary lines. I will modify the PR 😄
Hi @vsoch, |
deid/dicom/fields.py
Outdated
# Retrieve both dicom and file meta fields | ||
datasets = [dicom, dicom.file_meta] | ||
# Retrieve both dicom and file meta fields if dicom came from a file | ||
datasets = [d for d in [dicom, dicom.get('file_meta', None)] if d is not None] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a get defaults to None when it isn't there so you don't need to explicitly define it.
datasets = [d for d in [dicom, dicom.get('file_meta', None)] if d is not None] | |
datasets = [d for d in [dicom, dicom.get('file_meta')] if d is not None] |
I think you can also just do (but I'd test first)
datasets = [d for d in [dicom, dicom.get('file_meta', None)] if d is not None] | |
datasets = [d for d in [dicom, dicom.get('file_meta')] if d] |
deid/dicom/parser.py
Outdated
@@ -301,7 +311,7 @@ def get_fields(self, expand_sequences=True): | |||
dicom=self.dicom, | |||
expand_sequences=expand_sequences, | |||
seen=self.seen, | |||
skip=self.skip, | |||
skip=self.skip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you run a recent version of black on this? Usually black adds a trailing comma.
deid/dicom/parser.py
Outdated
self, dicom_file, recipe=None, config=None, force=True, disable_skip=False | ||
): | ||
self, dicom_file, recipe=None, config=None, force=True, disable_skip=False): | ||
"""Create new instance of DicomParser |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Create new instance of DicomParser | |
""" | |
Create new instance of DicomParser |
Beautiful docstring! My preference is for a newline after the """
for the first sentence.
deid/dicom/parser.py
Outdated
:param config: deid config, defaults to None | ||
:param force: ignore errors when reading a dicom file, defaults to True | ||
:param disable_skip: _description_, defaults to False | ||
:param from_file: the dicom_file comes from an actual file, defaults to True. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was removed?
:param from_file: the dicom_file comes from an actual file, defaults to True. |
deid/dicom/parser.py
Outdated
# If we must read the file, the path must exist | ||
if not os.path.exists(dicom_file): | ||
bot.exit("%s does not exist." % dicom_file) | ||
self.dicom = read_file(dicom_file, force=force) | ||
|
||
# Set class variables that might be helpful later | ||
self.dicom_file = os.path.abspath(self.dicom.filename) | ||
self.dicom_name = os.path.basename(self.dicom_file) | ||
if self.dicom.get('filename', None) is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if self.dicom.get('filename', None) is not None: | |
if self.dicom.get('filename'): |
@@ -0,0 +1,254 @@ | |||
--- | |||
title: Deidentify a Pydicom Dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really excited to add this! Note that we already have up to "5" in the examples folder, so either you can adjust the others order (and keep 3 here) or this should be 6. https://github.com/pydicom/deid/tree/master/docs/_docs/examples
updated the docstring remove extra "None"s
More updates :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are close! One more bug I just realized.
deid/dicom/parser.py
Outdated
# If we must read the file, the path must exist | ||
if not os.path.exists(dicom_file): | ||
bot.exit("%s does not exist." % dicom_file) | ||
self.dicom = read_file(dicom_file, force=force) | ||
|
||
# Set class variables that might be helpful later | ||
self.dicom_file = os.path.abspath(self.dicom.filename) | ||
self.dicom_name = os.path.basename(self.dicom_file) | ||
if self.dicom.get("filename", None) is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to possibly error later if there isn't a file name (and the user expects the attribute to be set). How about instead:
df = self.dicom.get("filename")
self.dicom_file = None if not df else os.path.abspath(df)
self.dicom_name = None if not df else os.path.basename(self.dicom_file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed
This is ready for merge! @fcossio I'm in the middle of a work day, but I'll merge and then prepare the release tonight, and will share with the community. Thank you for adding this! |
Excited to hear and looking forward to it! And thanks for the detailed feedback and all the patience |
Thank you again! https://twitter.com/vsoch/status/1563307727811596289 |
Description
Modified DicomParser to be able to load datasets that don't come from a file (that are loaded as json) by adding a
from_file
argument to the constructor. The default value forfrom_file
isTrue
so that the interface remains unaffected.Added an example of how to create a custom class with a custom recipe and custom functions.
Related issues: #210
Please include a summary of the change(s) and if relevant, any related issues above.
Checklist
Open questions
Questions that require more discussion or to be addressed in future development: