Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zarr streaming #6144

Merged
merged 31 commits into from
May 4, 2022
Merged

Zarr streaming #6144

merged 31 commits into from
May 4, 2022

Conversation

leowe
Copy link
Contributor

@leowe leowe commented Apr 8, 2022

URL of deployed dev instance (used for testing):

Steps to test:

  • Use this script to test:
def _assert_check_equality(args) -> None:
    view_a, view_b, _ = args
    assert np.array_equal(view_a.read(absolute_offset=(3072, 3072, 512), size=(32, 32, 32)),
                          view_b.read(absolute_offset=(3072, 3072, 512), size=(32, 32, 32)))
    print("They are equal!!")


if __name__ == "__main__":
    p = UPath('http://localhost:9000/data/zarr/sample_organization/l4_sample/', **{"headers": {"X-Auth-Token": "secretSampleUserToken"}})
    assert p.exists()

    disk = wk.Dataset.open("/Users/leo/Documents/work/scm/webknossos/binaryData/sample_organization/l4_sample/")
    http = wk.Dataset.open(p)

    mag_disk = disk.get_layer("color").get_mag("1-1-1")
    mag_http = http.get_layer("color").get_mag("1-1-1")
    mag_disk.for_zipped_chunks(_assert_check_equality, mag_http)

There are currently 4 issues on the client side: 1) is_dir is wrongly implemented and mostly returns false in fsspec (should be fixed in UPath PR), 2) encoding=utf-8 caused a problem, 3) iterdir in UPath uses a false implementation of _sub_path (should be fixed in UPath), and 4) is_file check, doesn't check first whether file exists which leads to an exception (change first line of wk.WKWArray.open to if (path / "header.wkw").exists() and (path / "header.wkw").is_file():

Issues:


(Please delete unneeded items, merge only when none are left open)


ToDo

  • check with Zarr files
  • check with different mags
  • check with different layer names

@leowe leowe requested a review from fm3 April 22, 2022 11:04
@leowe
Copy link
Contributor Author

leowe commented Apr 22, 2022

@fm3 Can you have a look already?

What doesn't seem to work yet is working with different mags... Somehow the request always leads to empty responses.

When substituting the VoxelPosition calculation in requestRawZarr with one that doesn't multiply by cubeLength and mag, it worked for higher mags, but not for Mag-1.

Copy link
Member

@fm3 fm3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good stuff, thank you for pushing this so far already :)

I added some comments about what I noticed while first reading the code.

Regarding different mags, I did not immediately see an issue, though I think I would need a concrete test case to debug this. Maybe we can look at this together this week.
One thing that could be a culprit is the supplied coordinates. We should double check whether the requested coordinates need to still be in mag1 or not, and what the client that you tested, does.

@leowe leowe marked this pull request as ready for review April 28, 2022 10:47
@leowe
Copy link
Contributor Author

leowe commented Apr 28, 2022

Since currently all buckets are uncompressed when serving them, we don't need to check the compression.
For reference for later:

  private def isWkwCompressed(mag: Vec3Int,
                              dataSource: DataSource,
                              dataLayer: DataLayer,
                              dataLayerName: String): Fox[Boolean] = {
    val result = dataLayer.bucketProvider match {
      case provider: WKWBucketProvider =>
        WKWHeader(
          provider
            .wkwHeaderFilePath(mag, Some(dataSource.id), Some(dataLayerName), binaryDataService.dataBaseDir)
            .toFile)
      case provider: ZarrBucketProvider => ???
    }

    result.flatMap((h) => Full(h.isCompressed))
  }

@leowe leowe requested a review from fm3 April 28, 2022 11:27
Copy link
Member

@fm3 fm3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting somewhere :) I left some minor comments, but I’d say this will soon be good to go!
One more thing: please write follow-up issues for the things that were discussed as future work (e.g. sending 404 for chunks outside the real bounding box instead of empty 200 replies; handing through already-compressed buckets)

leowe and others added 3 commits April 28, 2022 14:54
Co-authored-by: Florian M <[email protected]>
…e/controllers/ZarrStreamingController.scala

Co-authored-by: Florian M <[email protected]>
…e/controllers/ZarrStreamingController.scala

Co-authored-by: Florian M <[email protected]>
@leowe leowe requested a review from fm3 April 28, 2022 16:45
Copy link
Member

@fm3 fm3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@leowe leowe merged commit 8d11c24 into master May 4, 2022
@leowe leowe deleted the zarr-streaming branch May 4, 2022 12:32
@normanrz
Copy link
Member

normanrz commented May 4, 2022

whoop whoop 🙌 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose datasets through a Zarr-compatible interface
4 participants