
@mkitti
Contributor

@mkitti mkitti commented Dec 14, 2023

Add Zstandard dependency

		<dependency>
			<groupId>org.janelia</groupId>
			<artifactId>n5-zstandard</artifactId>
			<version>1.0.1</version>
		</dependency>

@bogovicj
Collaborator

Turns out the unit tests are not quite thorough enough :-/

@mkitti could you please merge https://github.com/saalfeldlab/n5-zarr/commits/zstandard/ into your branch?

I'm also running into an issue when using zarr-python to read data written by n5-zarr.

example

Write the data:

final String root = "...";
final N5Writer zarr = new N5Factory().openWriter( root );
final String dset = "simple-zst";
ArrayImg<UnsignedByteType, ByteArray> img = ArrayImgs.unsignedBytes(new byte[]{0,1,2,3,4,5,6,7,8,9,10,11}, 12);
N5Utils.save(img, zarr, dset, new int[]{12}, new ZstandardCompression());

Read the data:

import zarr
root = zarr.open('zstd-test.zarr')
arr = root['n5-test/simple-zst']
arr[:]
This results in the following error:
RuntimeError                              Traceback (most recent call last)
Cell In[3], line 4
      2 root = zarr.open('zstd-test.zarr')
      3 arr = root['n5-test/simple-zst']
----> 4 arr[:]

File ~/.local/lib/python3.10/site-packages/zarr/core.py:844, in Array.__getitem__(self, selection)
    842     result = self.get_orthogonal_selection(pure_selection, fields=fields)
    843 else:
--> 844     result = self.get_basic_selection(pure_selection, fields=fields)
    845 return result

File ~/.local/lib/python3.10/site-packages/zarr/core.py:970, in Array.get_basic_selection(self, selection, out, fields)
    968     return self._get_basic_selection_zd(selection=selection, out=out, fields=fields)
    969 else:
--> 970     return self._get_basic_selection_nd(selection=selection, out=out, fields=fields)

File ~/.local/lib/python3.10/site-packages/zarr/core.py:1012, in Array._get_basic_selection_nd(self, selection, out, fields)
   1006 def _get_basic_selection_nd(self, selection, out=None, fields=None):
   1007     # implementation of basic selection for array with at least one dimension
   1008 
   1009     # setup indexer
   1010     indexer = BasicIndexer(selection, self)
-> 1012     return self._get_selection(indexer=indexer, out=out, fields=fields)

File ~/.local/lib/python3.10/site-packages/zarr/core.py:1388, in Array._get_selection(self, indexer, out, fields)
   1385 if math.prod(out_shape) > 0:
   1386     # allow storage to get multiple items at once
   1387     lchunk_coords, lchunk_selection, lout_selection = zip(*indexer)
-> 1388     self._chunk_getitems(
   1389         lchunk_coords,
   1390         lchunk_selection,
   1391         out,
   1392         lout_selection,
   1393         drop_axes=indexer.drop_axes,
   1394         fields=fields,
   1395     )
   1396 if out.shape:
   1397     return out

File ~/.local/lib/python3.10/site-packages/zarr/core.py:2228, in Array._chunk_getitems(self, lchunk_coords, lchunk_selection, out, lout_selection, drop_axes, fields)
   2226 for ckey, chunk_select, out_select in zip(ckeys, lchunk_selection, lout_selection):
   2227     if ckey in cdatas:
-> 2228         self._process_chunk(
   2229             out,
   2230             cdatas[ckey],
   2231             chunk_select,
   2232             drop_axes,
   2233             out_is_ndarray,
   2234             fields,
   2235             out_select,
   2236             partial_read_decode=partial_read_decode,
   2237         )
   2238     else:
   2239         # check exception type
   2240         if self._fill_value is not None:

File ~/.local/lib/python3.10/site-packages/zarr/core.py:2098, in Array._process_chunk(self, out, cdata, chunk_selection, drop_axes, out_is_ndarray, fields, out_selection, partial_read_decode)
   2096     if isinstance(cdata, PartialReadBuffer):
   2097         cdata = cdata.read_full()
-> 2098     self._compressor.decode(cdata, dest)
   2099 else:
   2100     if isinstance(cdata, UncompressedPartialReadBufferV3):

File numcodecs/zstd.pyx:219, in numcodecs.zstd.Zstd.decode()

File numcodecs/zstd.pyx:153, in numcodecs.zstd.decompress()

RuntimeError: Zstd decompression error: invalid input data

@mkitti
Contributor Author

mkitti commented Dec 20, 2023

Where does N5Factory().openWriter( root ) come from? I don't see that method in N5Utils.

@mkitti
Contributor Author

mkitti commented Dec 20, 2023

This seems to be a bug in zarr-developers/numcodecs, which uses the C function ZSTD_getDecompressedSize:

https://github.com/zarr-developers/numcodecs/blob/366318f3b82403fe56db5ae647f8747e7a4aaf38/numcodecs/zstd.pyx#L151C21-L153

According to the Zstandard manual, that function is now deprecated. One problem with it is that it returns 0 whether the frame content is empty, its size is unknown, or an error has occurred.

https://facebook.github.io/zstd/zstd_manual.html
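A small Python model makes that ambiguity concrete. It does not call libzstd; it only encodes the two return contracts described in the manual. The sentinel values match the zstd.h definitions (ZSTD_CONTENTSIZE_UNKNOWN is (0ULL - 1) and ZSTD_CONTENTSIZE_ERROR is (0ULL - 2)), while describe and legacy_decompressed_size are hypothetical helper names used only for illustration.

```python
ZSTD_CONTENTSIZE_UNKNOWN = 2**64 - 1  # (0ULL - 1) == 0xffffffffffffffff in zstd.h
ZSTD_CONTENTSIZE_ERROR = 2**64 - 2    # (0ULL - 2) in zstd.h


def describe(frame_content_size: int) -> str:
    """Classify a ZSTD_getFrameContentSize()-style result; every case stays distinct."""
    if frame_content_size == ZSTD_CONTENTSIZE_UNKNOWN:
        return "unknown"
    if frame_content_size == ZSTD_CONTENTSIZE_ERROR:
        return "error"
    if frame_content_size == 0:
        return "empty"
    return f"{frame_content_size} bytes"


def legacy_decompressed_size(frame_content_size: int) -> int:
    """Model of the deprecated ZSTD_getDecompressedSize() contract.

    "Empty", "unknown" and "error" all come back as 0, so a caller that
    sees 0 cannot tell which of the three happened.
    """
    if frame_content_size in (ZSTD_CONTENTSIZE_UNKNOWN, ZSTD_CONTENTSIZE_ERROR):
        return 0
    return frame_content_size  # an empty frame is already 0
```

Calling describe(v) and legacy_decompressed_size(v) for v in (0, ZSTD_CONTENTSIZE_UNKNOWN, ZSTD_CONTENTSIZE_ERROR) gives "empty", "unknown" and "error" from the first function but 0 three times from the second, which is exactly the information a caller loses with the deprecated API.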

The numcodecs bug is that it treats a return value of 0 as an error. In this case, 0 actually means the size is unknown. I know this because ZSTD_getFrameContentSize returns 0xffffffffffffffff, i.e. ZSTD_CONTENTSIZE_UNKNOWN, for the same data.
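Because Frame_Content_Size is stored in the frame header, the same check can be done by hand. The following is a minimal Python sketch, assuming a single regular frame laid out as in RFC 8878 (magic number, Frame_Header_Descriptor, optional Window_Descriptor, optional Dictionary_ID, optional Frame_Content_Size). frame_content_size is a hypothetical helper that mirrors ZSTD_getFrameContentSize for well-formed headers; it ignores skippable frames and does not validate the rest of the frame.

```python
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"      # magic number 0xFD2FB528, little-endian
ZSTD_CONTENTSIZE_UNKNOWN = 2**64 - 1  # same sentinel as in zstd.h


def frame_content_size(frame: bytes) -> int:
    """Return the Frame_Content_Size recorded in a Zstandard frame header.

    Returns ZSTD_CONTENTSIZE_UNKNOWN if the header omits the field and raises
    ValueError if the data does not start with a complete frame header.
    """
    if len(frame) < 5 or frame[:4] != ZSTD_MAGIC:
        raise ValueError("not a Zstandard frame or truncated header")
    descriptor = frame[4]
    fcs_flag = descriptor >> 6              # bits 7-6: Frame_Content_Size_Flag
    single_segment = (descriptor >> 5) & 1  # bit 5: Single_Segment_Flag
    did_flag = descriptor & 0b11            # bits 1-0: Dictionary_ID_Flag

    pos = 5
    if not single_segment:
        pos += 1                            # skip the Window_Descriptor byte
    pos += (0, 1, 2, 4)[did_flag]           # skip the Dictionary_ID field

    # Flag 0 means a 1-byte field in single-segment mode and no field otherwise.
    fcs_size = (single_segment, 2, 4, 8)[fcs_flag]
    if fcs_size == 0:
        return ZSTD_CONTENTSIZE_UNKNOWN     # the writer did not record a size
    field = frame[pos:pos + fcs_size]
    if len(field) != fcs_size:
        raise ValueError("truncated frame header")
    value = int.from_bytes(field, "little")
    return value + 256 if fcs_size == 2 else value  # the 2-byte form is offset by 256
```

When both the Frame_Content_Size_Flag and the Single_Segment_Flag are 0, the header carries no size at all and the sketch returns ZSTD_CONTENTSIZE_UNKNOWN, the same 0xffffffffffffffff value reported for the n5-zarr chunks.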

@mkitti
Contributor Author

mkitti commented Dec 20, 2023

Here's what my current test class looks like:

package org.janelia.saalfeldlab.n5.zarr;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.janelia.saalfeldlab.n5.N5Writer;
import org.janelia.saalfeldlab.n5.imglib2.N5Utils;
import org.janelia.scicomp.n5.zstandard.ZstandardCompression;
import org.junit.Test;

import com.github.luben.zstd.Zstd;

import net.imglib2.img.array.ArrayImg;
import net.imglib2.img.array.ArrayImgs;
import net.imglib2.img.basictypeaccess.array.ByteArray;
import net.imglib2.type.numeric.integer.UnsignedByteType;

public class ZstandardTest {

	@Test
	public void testZstandard() throws IOException {
		final String root = "/home/mkitti/eclipse-workspace/n5-zarr/test.zarr";
		final N5Writer zarr = new N5ZarrWriter(root);
		final String dset = "simple-zst";
		// 1 MiB of test data, saved below as a 1-D dataset with 1024-byte chunks
		final byte[] bytes = new byte[1024*1024];
		for(int i=0; i < bytes.length; ++i) {
			bytes[i] = (byte)(i*5-128);
		}
		//bytes = new byte[]{0,1,2,3,4,5,6,7,8,9,10,11};
		ArrayImg<UnsignedByteType, ByteArray> img = ArrayImgs.unsignedBytes(bytes, bytes.length);
		ZstandardCompression compressor = new ZstandardCompression();
		compressor.setSetCloseFrameOnFlush(true);
		N5Utils.save(img, zarr, dset, new int[]{1024}, compressor);

		// Read the raw bytes of chunk "0" and print the content size stored in its Zstandard
		// frame header; ZSTD_CONTENTSIZE_UNKNOWN (0xffffffffffffffff) is -1 as a signed long
		byte[] compressedBytes = Files.readAllBytes(Paths.get(root, dset, "0"));
		System.out.println(Zstd.getFrameContentSize(compressedBytes));
	}
}

@mkitti
Contributor Author

mkitti commented Dec 20, 2023

Basically, the problem is that when the Zstandard frame header is written, the compressor does not seem to know the size of the input, so it marks the content size as unknown. numcodecs does not know what to do with an unknown size.
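One way a reader could cope with an unknown size, sketched in Python: when the header does not record it, fall back to the decoded size implied by the array metadata. This is a hypothetical policy, not what numcodecs does; output_buffer_size and expected_chunk_nbytes are illustrative names, and the chunk-size formula assumes a fixed-size dtype and no filters that change the byte count. The traceback above shows that zarr already passes a destination buffer (dest) to the codec's decode call, so its length is one plausible source for expected_nbytes.

```python
import math

ZSTD_CONTENTSIZE_UNKNOWN = 2**64 - 1  # sentinels from zstd.h
ZSTD_CONTENTSIZE_ERROR = 2**64 - 2


def expected_chunk_nbytes(chunk_shape: tuple, itemsize: int) -> int:
    """Decoded byte size of one full chunk (assumes no size-changing filters)."""
    return math.prod(chunk_shape) * itemsize


def output_buffer_size(frame_content_size: int, expected_nbytes: int) -> int:
    """Choose how many bytes to decompress a chunk into (hypothetical policy).

    A recorded size must agree with the metadata; an unknown size falls back
    to the metadata instead of being treated as an error.
    """
    if frame_content_size == ZSTD_CONTENTSIZE_ERROR:
        raise ValueError("invalid Zstandard frame header")
    if frame_content_size == ZSTD_CONTENTSIZE_UNKNOWN:
        return expected_nbytes
    if frame_content_size != expected_nbytes:
        raise ValueError(
            f"header records {frame_content_size} bytes, metadata implies {expected_nbytes}"
        )
    return frame_content_size
```

For the 12-element unsigned byte example, expected_chunk_nbytes((12,), 1) is 12, so under this policy a frame without a recorded size would be decoded into a 12-byte buffer instead of raising an error.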

Rather than using the stream API, we may need to use a buffer API.

To address the issue more specifically, we may need to use setPledgedSrcSize:
https://www.javadoc.io/doc/com.github.luben/zstd-jni/latest/com/github/luben/zstd/ZstdCompressCtx.html
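Pledging the size before the header is written is what allows Frame_Content_Size to be filled in. The Python sketch below only builds a header, following the RFC 8878 layout, to show the difference. build_frame_header is a hypothetical name; it never compresses anything, it always writes a Window_Descriptor (Single_Segment_Flag 0) with a placeholder value, and it is not how zstd-jni or libzstd actually choose header fields.

```python
from typing import Optional

ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # magic number 0xFD2FB528, little-endian


def build_frame_header(pledged_size: Optional[int], window_descriptor: int = 0x58) -> bytes:
    """Build a minimal Zstandard frame header following the RFC 8878 layout.

    Single_Segment_Flag, Content_Checksum_Flag and Dictionary_ID_Flag are all 0,
    so a Window_Descriptor byte is always present (0x58 encodes a 2 MiB window)
    and Frame_Content_Size is written only when a size is pledged.
    """
    if pledged_size is None:
        fcs_flag, field = 0, b""  # nothing recorded: decoders see "unknown"
    elif pledged_size < 0:
        raise ValueError("pledged size must be non-negative")
    elif 256 <= pledged_size <= 65535 + 256:
        fcs_flag, field = 1, (pledged_size - 256).to_bytes(2, "little")  # offset by 256
    elif pledged_size < 2**32:
        fcs_flag, field = 2, pledged_size.to_bytes(4, "little")
    else:
        fcs_flag, field = 3, pledged_size.to_bytes(8, "little")
    descriptor = fcs_flag << 6  # Frame_Content_Size_Flag lives in bits 7-6
    return ZSTD_MAGIC + bytes([descriptor, window_descriptor]) + field
```

build_frame_header(None) is six bytes with no size field, whereas build_frame_header(1024) appends a 2-byte field holding 1024 - 256 = 768, which a decoder reads back as 1024.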

@bogovicj
Collaborator

Thanks for investigating, @mkitti!

@mkitti
Contributor Author

mkitti commented Dec 20, 2023

This PR to n5-zstandard fixes the issue for me.

JaneliaSciComp/n5-zstandard#3

@bogovicj
Collaborator

N5Factory().openWriter( root ) comes from n5-universe. The tests I was running involved adding Zstandard compression to the list of options in the ImageJ export plugin in https://github.com/saalfeldlab/n5-ij

@mkitti
Contributor Author

mkitti commented Dec 21, 2023

@bogovicj I updated n5-zstandard to version 1.0.2



2 participants