
Implement viewing sharded neuroglancer precomputed datasets #6920

Merged: frcroth merged 11 commits into master from sharding on Mar 27, 2023

Conversation

frcroth (Member) commented Mar 15, 2023

URL of deployed dev instance (used for testing):

Steps to test:

  • Sharded datasets to explore and view:
    • gs://h01-release/data/20210601/4nm_raw
    • See neuroglancer for comparison

Issues:


(Please delete unneeded items, merge only when none are left open)

  • Updated changelog
  • Needs datastore update after deployment

@frcroth frcroth requested a review from fm3 March 20, 2023 16:57
@frcroth frcroth changed the title WIP: Neuroglancer Precomputed Sharding Implement viewing sharded neuroglancer precomputed datasets Mar 20, 2023
@frcroth frcroth self-assigned this Mar 20, 2023
fm3 (Member) commented Mar 21, 2023

I managed to improve the performance :) The main problem was that Array[Int] does not work as a cache key, because Java arrays are not equal just because they have the same content. Sorry for suggesting it in the first place 🙈 I changed it back to a string now.

In the process I also increased parallelization by fetching the source chunks for a given read request in parallel (Future.sequence rather than Fox.serialCombined), and I also noticed that the histogram's sampled positions weren't aligned with the 32 grid, so they used far more source chunks than intended. I added a commit addressing all these things. I'll add a review for the rest of the code in the coming days :)
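
For illustration only, a standalone Scala sketch (not the actual webKnossos code) of the two points above: reference equality of array keys versus string keys, and eager Futures versus serial combination:

import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global

object ShardingNotes extends App {
  // 1) Arrays compare by reference, so two arrays with equal content
  //    are still distinct cache keys:
  val a = Array(1, 2, 3)
  val b = Array(1, 2, 3)
  println(a == b)                             // false: reference equality
  println(a.mkString(",") == b.mkString(",")) // true: string keys compare by value

  // 2) Futures start eagerly, so mapping first and collecting with
  //    Future.sequence fetches the chunks concurrently, whereas a serial
  //    combinator would start each fetch only after the previous one finished.
  def fetchChunk(i: Int): Future[Array[Byte]] =
    Future(Array.fill(4)(i.toByte)) // placeholder for an actual store read
  val allChunks: Future[Seq[Array[Byte]]] =
    Future.sequence((0 until 8).map(fetchChunk))
}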

fm3 (Member) left a comment

Cool stuff! Nice handling of the sharding metadata!
I added a question about unifying the sharded vs. non-sharded reading code; otherwise this already looks very good :)

Comment on lines 131 to 133
for {
bytes <- Fox.option2Fox(shardPath.readBytes(Some(shardIndexRange)))
} yield bytes

Suggested change
for {
bytes <- Fox.option2Fox(shardPath.readBytes(Some(shardIndexRange)))
} yield bytes
Fox.option2Fox(shardPath.readBytes(Some(shardIndexRange)))

should be identical
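(For context: a single-generator for-comprehension such as for { x <- e } yield x desugars to e.map(x => x), so dropping the wrapper is behaviorally identical.)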

Comment on lines 98 to 109
if (header.isSharded) {
for {
chunkData: Array[Byte] <- readShardedChunk(chunkIndex)
chunkShape = header.chunkSizeAtIndex(chunkIndex)
multiArray <- chunkReader.parseChunk(chunkData, chunkShape)
} yield multiArray
} else {
val chunkFilename = getChunkFilename(chunkIndex)
val chunkFilePath = relativePath.resolve(chunkFilename)
val storeKey = chunkFilePath.storeKey
val chunkShape = header.chunkSizeAtIndex(chunkIndex)
chunkReader.read(storeKey, chunkShape)

I’m wondering whether we can merge these two branches into one. As I understand it, both branches currently have their own version of reading from the store, decompressing, and then typing. Could that be unified? Maybe the sharding implementation could just return the chunk path plus a byte range to be passed to the existing chunkReader, with the non-sharded case returning None for the range?
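
For illustration, a rough sketch of what such a unification could look like; the names ChunkLocation and resolveChunk, as well as the extended chunkReader.read signature, are hypothetical, not the existing webKnossos API:

// Hypothetical sketch of a unified reading path.
case class ChunkLocation(path: String, byteRange: Option[Range])

def resolveChunk(chunkIndex: Array[Int], sharded: Boolean): ChunkLocation =
  if (sharded) {
    // Sharded: look up the (mini)shard index and return the shard file
    // plus the byte range of the requested chunk within it.
    ChunkLocation("shards/0.shard", Some(1024 until 4096))
  } else {
    // Non-sharded: one file per chunk, no byte range needed.
    ChunkLocation("chunks/" + chunkIndex.mkString("_"), None)
  }

// A single reader could then fetch, decompress, and type the bytes in one place,
// issuing a ranged read only when a byte range is present:
//   chunkReader.read(location.path, chunkShape, location.byteRange)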

@frcroth frcroth marked this pull request as ready for review March 27, 2023 08:21
@frcroth frcroth requested a review from fm3 March 27, 2023 08:21
fm3 (Member) left a comment

Works for me :) Thanks for addressing the feedback

@frcroth frcroth merged commit d46eb37 into master Mar 27, 2023
@frcroth frcroth deleted the sharding branch March 27, 2023 09:49
hotzenklotz added a commit that referenced this pull request Apr 3, 2023
…come-toast

* 'master' of github.com:scalableminds/webknossos:
  Log all details on deleting annotation layer (#6950)
  fix typo
  Rename demo instance to wkorg instance (#6941)
  Add LOD mesh support for frontend (#6909)
  Fix layout of view mode switcher and move it (#6949)
  VaultPath no longer extends nio.Path (#6942)
  Release 23.04.0 (#6945)
  Use new zip.js version to allow zip64 uploads (#6939)
  Implement viewing sharded neuroglancer precomputed datasets (#6920)
  Reject dataset uploads if organization storage quota is exceeded (#6893)
  Refactor deprecated antd Dropdown menus (#6898)
hotzenklotz added a commit that referenced this pull request Apr 4, 2023
…wings

* 'master' of github.com:scalableminds/webknossos:
  updates docs for docker installation (#6963)
  Fix misc stuff when viewing tasks/annotations of another user (#6957)
  Remove segment from list and add undo/redo for segments (#6944)
  Log all details on deleting annotation layer (#6950)
  fix typo
  Rename demo instance to wkorg instance (#6941)
  Add LOD mesh support for frontend (#6909)
  Fix layout of view mode switcher and move it (#6949)
  VaultPath no longer extends nio.Path (#6942)
  Release 23.04.0 (#6945)
  Use new zip.js version to allow zip64 uploads (#6939)
  Implement viewing sharded neuroglancer precomputed datasets (#6920)
  Reject dataset uploads if organization storage quota is exceeded (#6893)
  Refactor deprecated antd Dropdown menus (#6898)
Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues: Support sharding for remote datasets
2 participants