
Error reading some .nii.gz files #52

Closed
chrisgorgo opened this issue Nov 20, 2015 · 14 comments

@chrisgorgo
Contributor

I'm getting errors reading this file https://drive.google.com/folderview?id=0B2JWN60ZLkgkcndhTWpLYmswRDA&usp=sharing&tid=0B2JWN60ZLkgkN0dMTVQ1QU1IUEk (actually all NIfTI files from this dataset, ds109):

On commandline:

orange:bids-validator filo$ node nameOfFile.js /Volumes/Samsung_T1/bids_examples/symlinked/ds109/sub-01/anat/sub-01_T1w.nii.gz
{ error: 'Unable to read /Volumes/Samsung_T1/bids_examples/symlinked/ds109/sub-01/anat/sub-01_T1w.nii.gz' }

With the online version, when I try to validate the full version of ds109, I get a JS error:

r   @   app.min.js:24606
i.onloadend @   app.min.js:30419

(which is also concerning - why would I get a different error on the command line than online?)

BTW the file is not corrupted:

orange:bids-validator filo$ fslinfo /Volumes/Samsung_T1/bids_examples/symlinked/ds109/sub-01/anat/sub-01_T1w.nii.gz
data_type      INT16
dim1           144
dim2           192
dim3           192
dim4           1
datatype       4
pixdim1        1.200000
pixdim2        1.197917
pixdim3        1.197917
pixdim4        2.200000
cal_max        0.0000
cal_min        0.0000
file_type      NIFTI-1+
@chrisgorgo chrisgorgo added the bug label Nov 20, 2015
@constellates
Collaborator

Did you send me the wrong file? When I download the linked file it's a zero-byte file.

@constellates
Collaborator

The online error also looks like the error from #50, which was caused by not handling errors while reading empty files. If you're still seeing that online you may want to try clearing your cache. And if that fixes it, we probably want to implement cache busting on every deploy.
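A minimal sketch of the cache-busting idea (the bustCache helper and build-hash parameter are made up for illustration, not part of the validator): tag the bundle URL with a per-deploy token so browsers re-fetch it instead of serving a stale cached copy.

```javascript
// Hypothetical helper: append a build hash to an asset URL so browsers
// re-fetch the bundle (e.g. app.min.js) after every deploy instead of
// serving a stale cached copy.
function bustCache(url, buildHash) {
  var sep = url.indexOf('?') === -1 ? '?' : '&';
  return url + sep + 'v=' + buildHash;
}

// e.g. bustCache('app.min.js', 'a1b2c3') -> 'app.min.js?v=a1b2c3'
```

A changed query string is enough to make the browser treat it as a new resource, so stale-cache reports like the one above go away after each deploy.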

@chrisgorgo
Contributor Author

@constellates
Collaborator

Thanks. It looks like some files are not making it through gzip decompression. I'm looking into it.

@constellates constellates mentioned this issue Nov 23, 2015
@constellates
Collaborator

I'm having difficulty reproducing this identically. Both of the above files work for me if I add them to a valid dataset. I tried validating my local version of ds109 and got the decompression error for one file. I then tested decompressing that file in OS X by double-clicking it: it decompresses to a .cpgz file, which then decompresses to a .gz, and loops like that forever. The file still opens in the Mango viewer. I added the above pull request so that the errors will at least match in the browser and CLI, but I'm continuing to look into this issue.

@constellates
Collaborator

I've figured out what is happening with my data and I'm hoping it's the same with yours. I get this issue when a scan ends in the .gz extension but is not actually gzipped. A good test is the unix file command: run "file /path/to/file" and it will print the file name followed by some information. An actually compressed file shows "gzip compressed data", while a non-compressed file shows just "data", regardless of the file extension.

When this issue occurs I can either detect it and silently read the file while skipping the decompression step, or I can throw an error or warning with a message that the file ends in .gz but does not appear to be gzipped. I'm guessing we want to throw this as an error because it could cause analysis issues down the road, but let me know if you prefer something else. I'll add these changes to the currently open "unzip-fail" pull request.
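The same check the file command does can be done in the validator itself by inspecting the gzip magic number: every gzip stream starts with the two bytes 0x1f 0x8b. A minimal sketch (the isGzipped helper is hypothetical, not part of the validator API):

```javascript
// Sketch: detect whether a buffer is actually gzip data by checking the
// two-byte gzip magic number (0x1f, 0x8b) before trying to decompress it.
// `isGzipped` is a hypothetical helper, not part of the validator API.
function isGzipped(buffer) {
  return buffer.length >= 2 && buffer[0] === 0x1f && buffer[1] === 0x8b;
}
```

A file that ends in .gz but fails this check could then produce the "ends in .gz but does not appear to be gzipped" error described above, instead of an opaque read failure.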

@chrisgorgo
Contributor Author

I cannot confirm your theory that those files are not compressed:

orange:bids-validator filo$ file ~/drive/tmp/sub-01_T1w.nii.gz
/Users/filo/drive/tmp/sub-01_T1w.nii.gz: gzip compressed data, from Unix

orange:bids-validator filo$ file ~/drive/tmp/highres001.nii.gz
/Users/filo/drive/tmp/highres001.nii.gz: gzip compressed data, from Unix

orange:bids-validator filo$ file ~/drive/tmp/sub-01_task-flankertask_run-01_bold.nii.gz
/Users/filo/drive/tmp/sub-01_task-flankertask_run-01_bold.nii.gz: gzip compressed data, was "bold.nii", last modified: Sat Nov 21 20:00:13 2015, max compression

orange:bids-validator filo$ gunzip ~/drive/tmp/sub-01_T1w.nii.gz

orange:bids-validator filo$ ls -al ~/drive/tmp/sub-01_T1w.nii*
-rwxr-xr-x  1 filo  staff  23071168 Nov 23 09:20 /Users/filo/drive/tmp/sub-01_T1w.nii

orange:bids-validator filo$ gunzip  ~/drive/tmp/sub-01_task-flankertask_run-01_bold.nii.gz

orange:bids-validator filo$ ls -al  ~/drive/tmp/sub-01_task-flankertask_run-01_bold.ni*
-rwxr-xr-x  1 filo  staff  47841632 Nov 23 09:20 /Users/filo/drive/tmp/sub-01_task-flankertask_run-01_bold.nii

orange:bids-validator filo$ gunzip ~/drive/tmp/highres001.nii.gz

orange:bids-validator filo$ ls -al  ~/drive/tmp/highres001.nii
-rwxr-xr-x  1 filo  staff  10617184 Nov 20 15:10 /Users/filo/drive/tmp/highres001.nii

@constellates
Collaborator

Yea that's definitely not what I was experiencing with different files.

file ds109/sub-01/func/sub-01_task-theoryofmindwithmanualresponse_run-01_bold.nii.gz
ds109/sub-01/func/sub-01_task-theoryofmindwithmanualresponse_run-01_bold.nii.gz: data

I'm somewhat stumped at this point. The above file from ds109 has the .gz without compression issue for me. Other than that there are no file reading issues for me in that dataset. And the two files you linked above work without file reading issues if I add them to a dataset. Is it possible the linked files differ from the versions you have locally or that our ds109 datasets have diverged?

I did go ahead and update the unzip-fail branch to catch non-zipped scans. It looks like you're not experiencing that issue, but it may be worth trying that branch as it handles unzipping errors more granularly.

@chrisgorgo
Contributor Author

I am still experiencing problems with both master and unzip-fail branches:

orange:bids-validator filo$ git checkout master
Already on 'master'

orange:bids-validator filo$ node nameOfFile.js /Volumes/Samsung_T1/bids_examples/symlinked/ds102/sub-01/func/sub-01_task-flankertask_run-01_bold.nii.gz
{ error: 'Unable to read /Volumes/Samsung_T1/bids_examples/symlinked/ds102/sub-01/func/sub-01_task-flankertask_run-01_bold.nii.gz' }

orange:bids-validator filo$ git checkout unzip-fail
Switched to branch 'unzip-fail'
Your branch is up-to-date with 'upstream/unzip-fail'.

orange:bids-validator filo$ node nameOfFile.js /Volumes/Samsung_T1/bids_examples/symlinked/ds102/sub-01/func/sub-01_task-flankertask_run-01_bold.nii.gz
dim[0] is out-of-range, we'll simply try continuing to read the file, but this will most likely fail horribly.
{ error:
   { code: 26,
     file: { path: '/Volumes/Samsung_T1/bids_examples/symlinked/ds102/sub-01/func/sub-01_task-flankertask_run-01_bold.nii.gz' },
     evidence: null,
     line: null,
     character: null,
     severity: 'error',
     reason: 'We were unable to read the contents of this file.' } }

orange:bids-validator filo$ file /Volumes/Samsung_T1/bids_examples/symlinked/ds102/sub-01/func/sub-01_task-flankertask_run-01_bold.nii.gz
/Volumes/Samsung_T1/bids_examples/symlinked/ds102/sub-01/func/sub-01_task-flankertask_run-01_bold.nii.gz: gzip compressed data, was "bold.nii", last modified: Sat Nov 21 20:00:13 2015, max compression

Here's the file in question: https://docs.google.com/uc?id=0B77zr9yIiKOTRDJDTTR2QkJfMTA&export=download

@constellates
Collaborator

I downloaded that file and added it to a known-good dataset, and it validated without the file-reading error in both the currently deployed web demo and the master branch of the CLI. Is it possible this issue is caused by symlinking? I noticed "symlinked" in your file path. How are you accessing these files?

And are you using an older file-reading test script, "nameOfFile.js"? It's possible it's doing something the actual validator isn't.


I also added a log for the header values of that file.

if (file.name == 'sub-01_task-flankertask_run-01_bold.nii.gz') {
    console.log(nifti.parseNIfTIHeader(unzipped));
}

And got the following values.

{
    littleEndian: true,
    sizeof_hdr: 348,
    dim_info: 0,
    dim: [ 4, 64, 64, 40, 146 ],
    intent_p1: 0,
    intent_p2: 0,
    intent_p3: 0,
    intent_code: 0,
    datatype: 'int16',
    bitpix: 16,
    slice_start: 0,
    pixdim: [ 1, 3, 3, 4, 2 ],
    vox_offset: 352,
    scl_slope: 1,
    scl_inter: 0,
    slice_end: 0,
    slice_code: 0,
    xyzt_units: [ 'mm', 'mm', 'mm', 's' ],
    cal_max: 0,
    cal_min: 0,
    slice_duration: 0,
    toffset: 0,
    descrip: 'FSL4.0\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000',
    aux_file: '\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000',
    qform_code: 1,
    sform_code: 1,
    quatern_b: 0,
    quatern_c: 0,
    quatern_d: 0,
    qoffset_x: -95.22289276123047,
    qoffset_y: -90.16265106201172,
    qoffset_z: -78,
    srow:
    {
        '0': 3,
        '1': 0,
        '2': 0,
        '3': -95.22289276123047,
        '4': 0,
        '5': 3,
        '6': 0,
        '7': -90.16265106201172,
        '8': 0,
        '9': 0,
        '10': 4,
        '11': -78
    },
    intent_name: '\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000',
    magic: 'n+1\u0000',
    extension: [ 0, 0, 0, 0 ]
}
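For reference, the earlier "dim[0] is out-of-range" warning can be checked against the NIfTI-1 spec: dim[0] (the number of dimensions) sits at byte offset 40 of the 348-byte header and must be between 1 and 7. Anything outside that range usually means the header was read with the wrong endianness, or (as here) that the decompressed buffer is bad. A sketch of that check (the checkDim0 helper is hypothetical):

```javascript
// Sketch: validate dim[0] of a NIfTI-1 header per the spec. The dim[8]
// array starts at byte offset 40, and dim[0] must be in 1..7. An
// out-of-range value read little-endian may just be a big-endian file;
// if neither byte order is in range, the buffer is probably truncated
// or corrupted. `checkDim0` is a hypothetical helper.
function checkDim0(headerBuffer) {
  var dim0 = headerBuffer.readInt16LE(40);
  if (dim0 >= 1 && dim0 <= 7) return { dim0: dim0, littleEndian: true };
  var swapped = headerBuffer.readInt16BE(40);
  if (swapped >= 1 && swapped <= 7) return { dim0: swapped, littleEndian: false };
  return null; // likely a truncated or corrupted read
}
```

The header logged above (dim: [ 4, 64, 64, 40, 146 ]) passes this check, which points at the decompression step rather than the file itself.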

@constellates
Collaborator

I just tested with a symlinked directory without any issues. The symlink and the actual files were on the same hard drive, so I suppose there could be an issue in a more complicated symlink situation.

@chrisgorgo
Contributor Author

Found it! It breaks with node 5.1! Works for 4.2.2. Could you try to replicate?

@constellates
Collaborator

Nice work! I was able to reproduce this. Looks like zlib (node's default compression library) has changed behavior somewhere along the way. Still researching what exactly is happening.

@constellates
Collaborator

So this is still proving somewhat complicated. It looks like we were unknowingly taking advantage of a bug: zlib in Node previously did not throw errors for truncated inputs (partial files), and that was patched in v5.0.0.

https://nodejs.org/en/blog/release/v5.0.0/
nodejs/node#2595

I've added a fix to the unzip-fail branch that circumvents this by handling the error with streams, but as of right now it's not a great or robust solution. I still need to ensure it handles cases where files are not actually zipped, and that the first stream chunk is at least 500 bytes. I've also reached out on Stack Overflow and Node's GitHub issues to see if there are known solutions for working with partial files in Node's zlib.

I'll keep you posted.
