## This project has been archived as I finished my PhD! It won't be maintained anymore
This package offers a tool to convert your raw (f)MRI data to the "Brain Imaging Data Structure" (BIDS) format. Using only a simple (JSON or YAML) config file, it renames, reformats, and restructures your files so that they fit the BIDS naming scheme and conform to the file formats specified by BIDS. After using `bidsify`, you can run your data through BIDS-compatible analysis/preprocessing pipelines such as fmriprep and mriqc.
Currently, we use `bidsify` at the Spinoza Centre for Neuroimaging (location REC) to convert data to BIDS after each scan-session. We automated this process, including automatic preprocessing and quality control, using another package, nitools (which essentially "glues together" `bidsify`, fmriprep, and mriqc).
This package was originally developed to handle MRI-data from Philips scanners, which are traditionally exported in the "PAR/REC" format. Currently, `bidsify` also supports Philips (enhanced) DICOM (`DICOM` format) and Siemens DICOM (`.dcm` extension), but the latter has not been fully tested yet!
`bidsify` is still very much in development, so there are probably still some bugs for data that differs from our standard format (at the Spinoza Centre in Amsterdam), and the API might change in the future. If you encounter any issues, please submit an issue or (better yet) submit a pull-request with your proposed solution!
This package can be installed using `pip`:

    $ pip install bidsify

To get the "bleeding edge" version, you can install the master branch from GitHub:

    $ pip install git+https://github.com/spinoza-rec/bidsify.git@master
In terms of dependencies: `bidsify` uses dcm2niix under the hood to convert PAR/REC and DICOM files to NIfTI. Make sure you're using release v1.0.20181125 or newer.

Apart from dcm2niix, `bidsify` depends on the following Python packages:
- nibabel
- scipy
- numpy
- joblib (for parallelization)
- pandas
Moreover, if you want to use the defacing option (i.e., removing facial features from anatomical images), make sure you have FSL installed, as well as the pydeface Python package. Also, to enable validating the BIDS-conversion process (i.e., running `bidsify` with the `-v` flag), make sure to install bids-validator.

Lastly, if you want to use the Docker interface (i.e., running `bidsify` with the `-D` flag), which obviates the need for installing dcm2niix/FSL/bids-validator, make sure to install Docker and make sure your user account has permission to run Docker (see below).
The current version (master branch) allows you to run `bidsify` from Docker, so you don't have to install all the (large) dependencies (FSL, pydeface, dcm2niix, bids-validator, etc.). To do so, you need to do the following:

- Install Docker (if you haven't already) and make sure you have permission to run Docker;
- Pull the Docker image with `docker pull lukassnoek/bidsify:0.x.x` (fill in the latest version at the `x.x`);
- Run `bidsify` with the `-D` flag (e.g., `bidsify -c /home/user/config.yml -d /home/user/data -D`).

Now you can use `bidsify` even without having FSL, dcm2niix, and other dependencies installed! (You do need to install `bidsify` itself though.)
After installing, the `bidsify` command can be called as follows:

    $ bidsify [-c config_file] [-d path_to_data_directory] [-o output_directory] [-v] [-D]

- The `-c` flag defaults to `config.yml` in the current working directory.
- The `-d` flag defaults to the current working directory.
- The `-o` flag defaults to the parent-directory of the data-directory.
- The `-v` flag calls bids-validator after BIDS-conversion (optional).
- The `-D` flag runs `bidsify` from Docker (recommended; see "Docker" section above).
For example, if you call the following command ...

    $ bidsify -c /home/user/data/config.yml -d /home/user/data

... your bidsified data will be in the following location:

    /home/user
    ├── data
    │   ├── config.yml
    │   ├── s01
    │   └── s02
    └── bids
        ├── dataset_description.json
        ├── participants.tsv
        ├── sub-01
        └── sub-02
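
The default output location described above can be sketched in a few lines of Python. This is only an illustration of the rule "parent of the data directory, plus the `out_dir` name"; `default_output_dir` is a hypothetical helper, not part of the `bidsify` API.

```python
from pathlib import PurePosixPath


def default_output_dir(data_dir, out_dir="bids"):
    """Hypothetical helper: place `out_dir` next to the data directory."""
    return PurePosixPath(data_dir).parent / out_dir


# For the example command, the data directory is /home/user/data.
print(default_output_dir("/home/user/data"))
```

With the example command, this yields a `bids` directory that sits next to `data`, matching the layout shown.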
This package aims to take in any MRI-dataset and convert it to BIDS using information from the config-file provided by the user. Obviously, `bidsify` cannot handle all types of scans/data, but it can process most of the default scans/files we use at our MRI centre (Spinoza Centre), including:
- Standard (gradient-echo) EPI scans, both multiband and sequential
- Standard (spin-echo) DWI scans
- "Pepolar" (gradient-echo) EPI scans (also called "topup")
- B0-based fieldmap scans (1 phase-difference + 1 magnitude image)
- T1-weighted and T2-weighted scans
`bidsify` can handle both PAR/REC and DICOM files. Moreover, in the future we want to enable processing of:
- Philips physiology-files ("SCANPHYSLOG" files; WIP, not functional yet)
In terms of "structure", this package allows the following "types" of datasets:
- Multi-subject, multi-session datasets
`bidsify` only needs a config-file in either the JSON or YAML format. This file should contain information that can be used to rename and convert the raw files.
The config file contains a couple of sections, which are explained below (we'll use the YAML format).
The first (top-level) section (or "attribute" in JSON/YAML-lingo) in the file is the "options" section. An example of this section could be:

    options:
        mri_ext: PAR  # alternatives: DICOM, dcm, nifti
        debug: False
        n_cores: -1
        subject_stem: sub
        deface: True
        spinoza_data: True
        out_dir: bids
No options need to be set explicitly as they all have sensible defaults. The attribute-value pairs mean the following:
- `mri_ext`: filetype of MRI-scans (PAR, dcm, DICOM, nifti; default: PAR)
- `n_cores`: how many CPUs to use during conversion (default: -1, all CPUs)
- `debug`: whether to print extra output for debugging (default: False)
- `subject_stem`: prefix for subject-directories, e.g. "subject" in "subject-001" (default: sub)
- `deface`: whether to deface the data (default: True, takes substantially longer though)
- `spinoza_data`: whether data is from the Spinoza centre (default: False)
- `out_dir`: name of directory to save results to (default: bids), relative to project-root.
Note that with respect to DICOM files, the `mri_ext` option can be set to `DICOM` (referring to Philips [enhanced] DICOM files) or `dcm` (referring to Siemens DICOM files with the extension `.dcm`).
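
Because every option has a default, a partial "options" section is enough. The sketch below shows one way such defaults could be overlaid with user-set values; the default values are taken from the option descriptions above, while `resolve_options` is a hypothetical helper and not necessarily how `bidsify` does this internally.

```python
# Defaults as stated in the option descriptions.
DEFAULT_OPTIONS = {
    "mri_ext": "PAR",
    "n_cores": -1,
    "debug": False,
    "subject_stem": "sub",
    "deface": True,
    "spinoza_data": False,
    "out_dir": "bids",
}


def resolve_options(user_options=None):
    """Hypothetical helper: user-set options override the defaults."""
    return {**DEFAULT_OPTIONS, **(user_options or {})}


# A config that only sets the file type and disables defacing.
options = resolve_options({"mri_ext": "dcm", "deface": False})
print(options)
```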
The BIDS-format specifies the naming and format of several types of MRI(-related) filetypes. These filetypes have specific suffixes, which are appended to the filenames in the renaming process handled by `bidsify`. The "mappings" section in the config is meant to tell `bidsify` which filetype can be identified by which "key". Thus, the mappings section consists of "filetype": "identifier" pairs. Basically, if BIDS requires a specific suffix for a filetype, you need to specify that here. For example, a standard dataset with several BOLD-fMRI files, a T1, and physiological recordings could have a mappings section like this:

    options:
        # ............. #

    mappings:
        bold: _func
        T1w: 3DT1
        dwi: DWI
        physio: ppuresp
        events: log
        phasediff: _ph
        magnitude: _mag
        epi: topup
        T2w: T2w
Note that every file should belong to one, and only one, file-type! In other words, `bidsify` should be able to figure out what kind of file it's dealing with from the filename. For example, if you have a file named `my_mri_file.PAR` and you have configured the mappings as in the example above, `bidsify` won't be able to figure out what file-type it's dealing with (a `bold` file? A `T1w` file?), because the filename does not contain any of the identifiers in the mappings (e.g., `_func`, `3DT1`, or `DWI`).

Moreover, the filename should not contain more than one file-type identifier! Suppose you have a file named `workingmemory_func_ppuresp.nii.gz`; with the above mappings, `bidsify` would conclude that it's either a `bold` file (because the name contains `_func`) OR a `physio` file (because the name contains `ppuresp`). As such, `bidsify` is going to skip converting/renaming this file and move it to the `unallocated` directory. In summary: filenames should contain one, and only one, identifier (such as `_func`) mapping to a particular file-type (e.g., `bold`).

Also, check the BIDS-specification for all filetypes supported by the format.
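
The "one, and only one, identifier" rule can be illustrated with a small Python sketch. It assumes plain, case-sensitive substring matching, which may differ from what `bidsify` does internally; `classify_file` is a hypothetical helper written for this example.

```python
def classify_file(filename, mappings):
    """Return the file-type whose identifier occurs in `filename`.

    Returns None when no identifier or more than one identifier matches,
    i.e. the cases in which the file would end up unallocated.
    """
    matches = [ftype for ftype, ident in mappings.items() if ident in filename]
    return matches[0] if len(matches) == 1 else None


mappings = {
    "bold": "_func", "T1w": "3DT1", "dwi": "DWI", "physio": "ppuresp",
    "events": "log", "phasediff": "_ph", "magnitude": "_mag",
    "epi": "topup", "T2w": "T2w",
}

for name in ("sub-001_func.PAR", "my_mri_file.PAR",
             "workingmemory_func_ppuresp.nii.gz"):
    print(name, "->", classify_file(name, mappings))
```

Only the first filename is classified (as `bold`); the second matches no identifier and the third matches two, so both come back as `None`.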
At the same (hierarchical) level as the "mappings" and "options" sections, a section with the name "metadata" can optionally be specified. This attribute may contain an arbitrary number of attribute-value pairs, which will be appended to each JSON-metadata file during the conversion. These are thus "dataset-general" metadata parameters. For example, you could specify the date of conversion here or, as in the example below, scanner and institution details:

    options:
        # some options

    mappings:
        # some mappings

    metadata:
        MagneticFieldStrength: 3
        ParallelAcquisitionTechnique: SENSE
        InstitutionName: Spinoza Centre for Neuroimaging, location REC
After the `options`, `mappings`, and (optionally) `metadata` sections, the specifications for the four general "BIDS-datatypes" (`func`, `anat`, `dwi`, and `fmap`) are listed in separate sections.
Each section, like `func`, can contain multiple sub-sections referring to different scans for that datatype. For example, you could have two different functional runs, each with a different task ("workingmemory" and "nback"). In that case, the "func" section could look like:

    options:
        # some options

    mappings:
        # some mappings

    func:
        wm-task:
            id: wmtask
            task: workingmemory
        nback-task:
            id: nbacktask
            task: nback
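
Since the config file may also be JSON, the same nested structure can be produced from Python with the standard library. The sketch below is only an illustration: the option and mapping values are a small subset chosen for brevity, and the sub-section names `wm-task` and `nback-task` are the arbitrary names used in this README.

```python
import json

# Nested config: top-level sections, then one sub-section per scan.
config = {
    "options": {"mri_ext": "PAR", "deface": True},
    "mappings": {"bold": "_func", "T1w": "3DT1"},
    "func": {
        "wm-task": {"id": "wmtask", "task": "workingmemory"},
        "nback-task": {"id": "nbacktask", "task": "nback"},
    },
}

config_text = json.dumps(config, indent=4)
print(config_text)
```

Saving `config_text` to a file and passing that file with the `-c` flag would then play the same role as the YAML version.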
The exact naming of the "attributes" (here: `wm-task` and `nback-task`) of the sub-sections does not matter, but the subsequent key-value pairs do matter. You always need to set the `id` key, which is used to identify the files that belong to this particular task. Any key-value pair besides the `id` key-value pair is appended to the renamed filename, following the BIDS-format. For example, suppose you have a raw file `sub-001_wmtask.PAR`. With the above config-file, this file will be renamed to `sub-001_task-workingmemory_bold.nii.gz`.
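
The renaming rule can be sketched as follows. This is a simplified illustration, not the actual `bidsify` code: `bids_name` is a hypothetical helper, and it appends the key-value pairs in the order in which they appear in the config, which is an assumption.

```python
def bids_name(subject, element, suffix, ext=".nii.gz"):
    """Hypothetical helper: build a BIDS-style name from one config element.

    Every key-value pair except `id` becomes a "key-value" part of the name.
    """
    parts = [subject]
    parts += [f"{key}-{value}" for key, value in element.items() if key != "id"]
    parts.append(suffix)
    return "_".join(parts) + ext


element = {"id": "wmtask", "task": "workingmemory"}
print(bids_name("sub-001", element, "bold"))
```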
As discussed, any key-value pair besides `id` will be appended (in the format "key-value") to the filename during the renaming-process. Imagine, for example, that you have only one task ("nback"), but you acquired four runs of it per subject, of which the first two were acquired with a sequential acquisition protocol and the last two with a multiband protocol (e.g., if you want to do some methodological comparison). The config-file should, in that case, look like:

    options:
        # some options

    mappings:
        # some mappings

    func:
        nback-task1:
            id: nback1
            task: nback
            run: 1
            acq: sequential
        nback-task2:
            id: nback2
            task: nback
            run: 2
            acq: sequential
        nback-task3:
            id: nback3
            task: nback
            run: 3
            acq: multiband
        nback-task4:
            id: nback4
            task: nback
            run: 4
            acq: multiband

`bidsify` will then create four files, one per run (assuming that they can be "found" using their corresponding `id`s).
The same logic can be applied to the "dwi", "anat", and "fmap" sections. For example, if you have two T1-weighted structural scans, the "anat" section could look like:

    options:
        # some options

    mappings:
        # some mappings

    anat:
        firstT1:  # arbitrary name
            id: 3DT1_1
            run: 1
        secondT1:  # arbitrary name
            id: 3DT1_2
            run: 2
Importantly, any UNIX-style wildcard (e.g. `*`, `?`, and `[a,A,1-9]`) can be used in the values in these sections!
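
To get a feel for how such wildcards behave, Python's standard `fnmatch` module implements the same UNIX-style patterns. The sketch below assumes that an `id` may match anywhere in the filename (hence the surrounding `*`); that is an assumption about `bidsify`, and `matches_id` and the filenames are hypothetical.

```python
from fnmatch import fnmatchcase


def matches_id(filename, id_pattern):
    """Hypothetical helper: does `id_pattern` match somewhere in `filename`?"""
    return fnmatchcase(filename, f"*{id_pattern}*")


files = ["sub-001_nback1.PAR", "sub-001_nback2.PAR", "sub-001_3DT1_1.PAR"]

# A character class selects the first two runs.
print([f for f in files if matches_id(f, "nback[12]")])
# "?" matches exactly one character.
print([f for f in files if matches_id(f, "3DT1_?")])
```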
Lastly, apart from the different elements (such as `nback-task1` in the previous example), each datatype-section (`func`, `anat`, `fmap`, and `dwi`) may also include a `metadata` section, similar to the "toplevel" `metadata` section. This field may include key-value pairs that will be appended to each JSON-file within that datatype. This is especially useful if you want to add metadata that is needed for specific preprocessing/analysis pipelines that are based on the BIDS-format. For example, the fmriprep package provides preprocessing pipelines for BIDS-datasets, but sometimes needs specific metadata. For each BOLD-fMRI file, it needs a field `EffectiveEchoSpacing` in the corresponding JSON-file, and for B0-files (one phasediff, one magnitude image) it needs the fields `EchoTime1` and `EchoTime2`. To include those metadata fields in the corresponding JSON-files, just include a `metadata` field under the appropriate datatype section. For example, to do so for the previous examples:

    func:
        metadata:
            EffectiveEchoSpacing: 0.00365
            PhaseEncodingDirection: "j"
        nback-task:  # arbitrary name
            id: nback
            task: nback

    fmap:
        metadata:
            EchoTime1: 0.003
            EchoTime2: 0.008
        B0:  # arbitrary name
            id: B0
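
How the two levels of metadata end up in a JSON sidecar can be sketched like this. It is an illustration only: `build_sidecar` is a hypothetical helper, the `RepetitionTime` entry is a made-up placeholder for whatever the converter writes, and letting datatype-level values win over top-level ones on key clashes is an assumption.

```python
import json


def build_sidecar(converted, toplevel_meta, datatype_meta):
    """Hypothetical helper: add config metadata to a JSON sidecar dict.

    Assumption: datatype-level values override top-level ones on clashes.
    """
    sidecar = dict(converted)
    sidecar.update(toplevel_meta)
    sidecar.update(datatype_meta)
    return sidecar


toplevel = {"MagneticFieldStrength": 3, "ParallelAcquisitionTechnique": "SENSE"}
func_meta = {"EffectiveEchoSpacing": 0.00365, "PhaseEncodingDirection": "j"}
from_converter = {"RepetitionTime": 2.0}  # placeholder value for illustration

sidecar = build_sidecar(from_converter, toplevel, func_meta)
print(json.dumps(sidecar, indent=4))
```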
After installing this package, the `bidsify` command should be available. This command assumes a specific organization of your directory with raw data. Below, I outlined the assumed structure for a simple dataset with one BOLD run and one T1-weighted scan across two sessions:

    /home/user/data/
    ├── config.yml
    ├── sub-01
    │   ├── ses-1
    │   │   ├── boldrun1.PAR
    │   │   ├── boldrun1.REC
    │   │   ├── T1.PAR
    │   │   └── T1.REC
    │   └── ses-2
    │       ├── boldrun1.PAR
    │       ├── boldrun1.REC
    │       ├── T1.PAR
    │       └── T1.REC
    └── sub-02
        ├── ses-1
        │   ├── boldrun1.PAR
        │   ├── boldrun1.REC
        │   ├── T1.PAR
        │   └── T1.REC
        └── ses-2
            ├── boldrun1.PAR
            ├── boldrun1.REC
            ├── T1.PAR
            └── T1.REC
(If you have DICOM-files with the `.dcm` extension, just replace the PAR/REC files with a single dcm file.)

So all raw files should be in a single directory, which can be the subject-directory or, optionally, a session-directory. Note: the session directory must be named `ses-<something>`.
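
A quick way to check whether a raw-data directory follows this layout is to list the directories in which raw files are expected. The sketch below is a hypothetical check, not part of `bidsify`: it assumes subject directories are named `<subject_stem>-<something>` and uses `ses-*` sub-directories when they exist, falling back to the subject directory otherwise.

```python
import tempfile
from pathlib import Path


def find_raw_dirs(data_dir, subject_stem="sub"):
    """Hypothetical helper: list the directories expected to hold raw files."""
    raw_dirs = []
    for sub_dir in sorted(Path(data_dir).glob(f"{subject_stem}-*")):
        if not sub_dir.is_dir():
            continue
        ses_dirs = sorted(d for d in sub_dir.glob("ses-*") if d.is_dir())
        # Use session directories when present, else the subject directory.
        raw_dirs.extend(ses_dirs or [sub_dir])
    return raw_dirs


# Build a throwaway example layout: one subject with sessions, one without.
with tempfile.TemporaryDirectory() as root:
    for rel in ("sub-01/ses-1", "sub-01/ses-2", "sub-02"):
        (Path(root) / rel).mkdir(parents=True)
    found = [d.relative_to(root).as_posix() for d in find_raw_dirs(root)]

print(found)
```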