This repository contains the preliminary specification for the Imaging Pipeline Toolkit (IPTK). A reference implementation in Python is available at iptk/iptk-py. This document is a work in progress; issues and pull requests against the spec are welcome.
At its core, IPTK is just a structured way of saving arbitrary data in a folder structure on disk. While we provide additional tools such as web-based access and database indices, the truth is always stored within this simple structure, and all other applications are required to eventually sync with the file system.
IPTK defines three core entities: identifiers, datasets, and metadata. Each dataset and each type of metadata (called a metadata specification) has a unique identifier. Each dataset can be associated with no more than one set of metadata of each type.
A valid identifier is a lowercase hexadecimal string of exactly 40 characters, i.e. a string matching the regular expression [0-9a-f]{40}. There are no strict rules on how to generate identifiers for new datasets. Taking the SHA-1 hash of some random data works just fine. Throughout this document we will use <id> as a shorthand for a valid identifier.
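As a minimal sketch of the identifier rules above, the following Python helpers check and generate identifiers. The function names are hypothetical and not taken from the reference implementation; the pattern reads "hexadecimal" as the digits 0-9 and the letters a-f, and the choice of 64 random bytes is arbitrary.

```python
import hashlib
import os
import re

# Exactly 40 lowercase hexadecimal characters (assumption: "hexadecimal" means 0-9 and a-f).
IDENTIFIER_PATTERN = re.compile(r"[0-9a-f]{40}")


def is_valid_identifier(value) -> bool:
    """Return True if value is a string of exactly 40 lowercase hex characters."""
    return isinstance(value, str) and IDENTIFIER_PATTERN.fullmatch(value) is not None


def generate_identifier() -> str:
    """Create a new identifier by taking the SHA-1 hash of 64 random bytes."""
    return hashlib.sha1(os.urandom(64)).hexdigest()
```

A SHA-1 hex digest is always 40 lowercase hexadecimal characters, so `is_valid_identifier(generate_identifier())` holds for every generated value.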
An IPTK dataset is a folder containing arbitrary data stored in a defined format. The name of the dataset folder itself must be its identifier and it must contain the two subfolders data/ and meta/. It may additionally contain a folder called lock/.
The data/ folder contains the raw data of the dataset. It may contain arbitrary subfolders and files. For a DICOM dataset, for example, this may consist of all images of one study instance.
The meta/ folder may only contain metadata as specified below. It must not contain any subfolders, and all files within it must be named <id>.json, where the identifier is that of a valid metadata specification.
The lock/ directory indicates the state of the dataset. If it is present, IPTK-compatible programs will assume the contents of the dataset to be immutable. You must not edit or amend the contents of the data/ folder if lock/ exists.
The lock/ folder must not be deleted, i.e. a locked dataset cannot be unlocked again. To edit a locked dataset, create a new empty dataset, copy the contents over, then edit the new dataset.
The metadata of locked datasets may still be edited without any restrictions.
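The locking rules above could be handled along these lines. This is only a sketch: the function names are hypothetical, the new dataset is assumed to live next to the original, and copying meta/ along with data/ is an assumption about what "copy the contents over" includes.

```python
import shutil
from pathlib import Path


def is_locked(dataset: Path) -> bool:
    """A dataset counts as locked as soon as its lock/ folder exists."""
    return (dataset / "lock").is_dir()


def lock(dataset: Path) -> None:
    """Lock a dataset by creating lock/. There is deliberately no unlock()."""
    (dataset / "lock").mkdir(exist_ok=True)


def copy_for_editing(dataset: Path, new_id: str) -> Path:
    """Copy a dataset into a new, unlocked dataset that can be edited.

    Assumptions: the copy is created as a sibling folder, and both data/
    and meta/ are copied. The lock/ folder is intentionally not copied.
    """
    target = dataset.parent / new_id
    target.mkdir()  # fails if a dataset with this identifier already exists
    shutil.copytree(dataset / "data", target / "data")
    shutil.copytree(dataset / "meta", target / "meta")
    return target
```

Leaving out any function that removes lock/ mirrors the rule that a locked dataset cannot be unlocked again.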
The following is an example of a valid IPTK dataset containing some DICOM images and two sets of associated metadata.
92024b2371150d11001491646e2c18390e702255
├── data
│   ├── slice_001.dcm
│   ├── slice_002.dcm
│   └── slice_003.dcm
├── meta
│   ├── 21926de7e59ea3818fa340090a295191b88c6b2b.json
│   └── 64e246bc37f9aa94ac13b848149ed750a4a689c6.json
└── lock
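A layout like the one above can be checked mechanically. The sketch below is an illustration, not part of the reference implementation: `dataset_problems` is a hypothetical name, and the check only covers the rules stated in this document (folder name, required subfolders, and the contents of meta/).

```python
import re
from pathlib import Path

# Assumption: identifiers are 40 lowercase hexadecimal characters (0-9, a-f).
IDENTIFIER = re.compile(r"[0-9a-f]{40}")


def dataset_problems(dataset: Path) -> list[str]:
    """Return a list of layout violations; an empty list means no problem was found."""
    problems = []
    if not IDENTIFIER.fullmatch(dataset.name):
        problems.append("folder name is not a valid identifier")
    for required in ("data", "meta"):
        if not (dataset / required).is_dir():
            problems.append(f"missing required subfolder {required}/")
    meta = dataset / "meta"
    if meta.is_dir():
        for entry in sorted(meta.iterdir()):
            if entry.is_dir():
                problems.append(f"meta/ contains a subfolder: {entry.name}")
            elif entry.suffix != ".json" or not IDENTIFIER.fullmatch(entry.stem):
                problems.append(f"meta/ file is not named <id>.json: {entry.name}")
    return problems
```

Returning a list of messages instead of a boolean lets a caller report every violation at once. The presence of lock/ is not checked because it is optional.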
Metadata within IPTK consists of JSON-serialized key-value pairs stored within the meta/ subfolder of each dataset. Each metadata set is uniquely defined by its specification and its dataset.
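Because a metadata set is identified by its dataset and its specification, its location on disk follows directly from those two values. The helpers below sketch this; their names are hypothetical, and returning `None` for a missing metadata set is a design choice of this example rather than something the specification prescribes.

```python
import json
from pathlib import Path


def metadata_path(dataset: Path, spec_id: str) -> Path:
    """Location of the metadata set for one specification within a dataset."""
    return dataset / "meta" / f"{spec_id}.json"


def write_metadata(dataset: Path, spec_id: str, values: dict) -> None:
    """Serialize the key-value pairs as JSON, replacing any existing set."""
    text = json.dumps(values, indent=2, ensure_ascii=False)
    metadata_path(dataset, spec_id).write_text(text, encoding="utf-8")


def read_metadata(dataset: Path, spec_id: str):
    """Return the stored key-value pairs, or None if no such set exists."""
    path = metadata_path(dataset, spec_id)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

Writing to the same path twice overwrites the earlier file, so a dataset never holds more than one metadata set per specification.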
A metadata specification reserves an identifier for a specific kind of metadata. While you can simply generate a random identifier and use that for all your metadata of a specific kind, all users are encouraged to share their specifications at iptk/metadata-specs.
At a minimum, a metadata specification should contain the reserved identifier, a short name, and some description of the kind of data that is stored under this specification. Additionally, a contact, url, and organization may be specified to indicate where further information about the metadata can be obtained. The name and description may be used in IPTK-based user interfaces.
Several versions of the same metadata specification can coexist, even if they are otherwise identical, as long as their identifiers differ.
// Simple specification to reserve an identifier
{
    "name": "DICOM Tags",
    "description": "Values extracted from frequently used DICOM header fields.",
    "identifier": "52c1bba9c08888c2e530166b8bd1d62db76f89cc",
    "contact": "Jan-Gerd Tenberge <[email protected]>",
    "organization": "University of Münster",
    "url": "https://example.com"
}
// Minimal specification
{
    "name": "Tags",
    "description": "User-defined tags are stored here.",
    "identifier": "2bc88bb1cbe97e9fa747ea54635888983de942d6"
}
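A specification like the ones above could be checked with a small function such as the following sketch. The function name is hypothetical. It treats the three minimum fields as required non-empty strings and the optional fields as strings when present, which is one reading of "should contain"; unknown keys are ignored.

```python
import re

# Assumption: identifiers are 40 lowercase hexadecimal characters (0-9, a-f).
IDENTIFIER = re.compile(r"[0-9a-f]{40}")
REQUIRED_FIELDS = ("identifier", "name", "description")
OPTIONAL_FIELDS = ("contact", "url", "organization")


def specification_problems(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the specification looks fine."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = spec.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty field: {field}")
    identifier = spec.get("identifier")
    if isinstance(identifier, str) and identifier and not IDENTIFIER.fullmatch(identifier):
        problems.append("identifier is not a valid identifier")
    for field in OPTIONAL_FIELDS:
        if field in spec and not isinstance(spec[field], str):
            problems.append(f"optional field is not a string: {field}")
    return problems
```

Note that the `//` comments in the examples are annotations for the reader; a strict JSON parser such as Python's `json.loads` rejects them, so they have to be removed before parsing.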
The following rules apply to metadata objects:
- Each key must be a string.
- Each value may only be a string, a boolean, a number, an array thereof, or null.
- If the value is an array, all items must be of the same type.
- A value cannot contain nested arrays or objects.
Date and time values should be saved as strings in the ISO 8601 format.
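In Python, for instance, the standard library already produces ISO 8601 strings through `isoformat()`. The snippet below is a small illustration; the key `acquiredAt` is a made-up example and not part of any published metadata specification.

```python
from datetime import date, datetime, timezone

# Convert date and time objects to ISO 8601 strings before storing them.
metadata = {
    "dateOfBirth": date(1992, 10, 4).isoformat(),
    # Hypothetical key; an explicit time zone avoids ambiguity.
    "acquiredAt": datetime(2017, 3, 1, 14, 30, tzinfo=timezone.utc).isoformat(),
}

# Reading the value back yields the original date.
parsed = date.fromisoformat(metadata["dateOfBirth"])
```

Here `metadata["dateOfBirth"]` is the string `"1992-10-04"`, and `parsed` equals `date(1992, 10, 4)`.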
// A valid set of metadata
{
    "patientName": "John Doe",
    "patientAge": 25,
    "patientWeight": 70.23,
    "parentNames": ["Jane Doe", "James Doe"],
    "dateOfBirth": "1992-10-04"
}
// Valid, but strongly discouraged (ambiguous date format)
{
    "dateOfBirth": "5/6/92"
}
// Invalid (value is an object)
{
    "patientDetails": {
        "age": 25,
        "name": "John Doe"
    }
}
// Invalid (nested array)
{
    "freeIntervals": [
        [1999, 2001],
        [2004, 2017]
    ]
}
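The rules for metadata objects can be expressed as a short validation routine. The following is a sketch with hypothetical function names. It makes two assumptions that the rules do not spell out: integers and floating-point values count as the same type ("number"), and null is not allowed inside arrays.

```python
def _scalar_kind(value):
    """Classify a scalar as 'boolean', 'number', or 'string'; return None otherwise."""
    if isinstance(value, bool):  # checked first because bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):  # assumption: int and float are both "number"
        return "number"
    if isinstance(value, str):
        return "string"
    return None


def metadata_problems(metadata) -> list[str]:
    """Return a list of rule violations; an empty list means the metadata is valid."""
    if not isinstance(metadata, dict):
        return ["metadata must be a JSON object"]
    problems = []
    for key, value in metadata.items():
        if not isinstance(key, str):
            problems.append(f"key {key!r} is not a string")
        if value is None or _scalar_kind(value) is not None:
            continue  # null, string, boolean, or number
        if not isinstance(value, list):
            problems.append(f"{key!r}: value is not a string, boolean, number, array, or null")
            continue
        kinds = {_scalar_kind(item) for item in value}
        if None in kinds:
            # Assumption: null inside an array is rejected along with nesting.
            problems.append(f"{key!r}: array contains a nested array, an object, or null")
        elif len(kinds) > 1:
            problems.append(f"{key!r}: array items are of mixed types")
    return problems
```

Applied to the four examples above, the first two produce an empty list and the last two produce one problem each. The routine cannot detect the discouraged date format, since "5/6/92" is still a valid string.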