Speed up Zarr3 data reading by fixing VaultPath equality check #7363
@@ -76,4 +77,9 @@ class VaultPath(uri: URI, dataVault: DataVault) extends LazyLogging {
  override def toString: String = uri.toString

  def summary: String = s"VaultPath: ${this.toString} for ${dataVault.getClass.getSimpleName}"

  override def equals(obj: Any): Boolean = hashCode() == obj.hashCode()
Is this collision-safe?
good point, should be now :)
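To illustrate the collision concern raised here, a minimal sketch: `HashOnlyKey` is a hypothetical class (not part of the PR) whose `equals` compares hash codes only, as in the first version of the diff. The strings "Aa" and "BB" are a well-known pair with identical `String.hashCode`, so two keys with different contents compare as equal.

```scala
// Hypothetical class for illustration only; it mirrors the hash-only
// equals from the first version of the diff.
class HashOnlyKey(val value: String) {
  override def hashCode(): Int = value.hashCode

  // Equality decided purely by hash code: not collision-safe.
  override def equals(obj: Any): Boolean = hashCode() == obj.hashCode()
}

object HashCollisionDemo {
  def main(args: Array[String]): Unit = {
    val a = new HashOnlyKey("Aa")
    val b = new HashOnlyKey("BB")
    // "Aa" and "BB" share the same String.hashCode, so the two keys
    // are treated as equal although their contents differ.
    println(a == b)
    // Without a type check, even an unrelated object with a matching
    // hash code is considered equal.
    println(a.equals("BB"))
  }
}
```

Used as cache keys, such objects could return an entry stored for a different path, which is why a field-by-field comparison behind a type check is the safer choice.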
  override def hashCode(): Int =
    new HashCodeBuilder(19, 31).toHashCode

  override def equals(obj: Any): Boolean = true
note to self: this should still do the typecheck with match
Different instances of VaultPath were never considered equal, even with identical contents, so the caching mechanism in Zarr3Array.scala (where VaultPath objects are used as cache keys) did not work as intended. As a result, the shard index was parsed anew on every single chunk request and stored in memory hundreds of times.
This PR implements content-dependent equals and hashCode methods for VaultPath, FileSystemVaultPath, and the DataVaults themselves, enabling the cache to work properly.
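A minimal sketch of what content-dependent equality for a cache key can look like. `SketchDataVault` and `SketchVaultPath` are hypothetical stand-ins, not the classes from this PR, and the hash is derived from a tuple of the fields instead of the `HashCodeBuilder` seen in the diff, to keep the example dependency-free.

```scala
import java.net.URI
import scala.collection.mutable

// Hypothetical stand-in for a DataVault; the real class carries more state.
class SketchDataVault(val name: String) {
  override def equals(obj: Any): Boolean = obj match {
    case other: SketchDataVault => other.name == name
    case _                      => false
  }

  override def hashCode(): Int = name.hashCode
}

// Hypothetical stand-in for VaultPath with content-based equality.
class SketchVaultPath(val uri: URI, val dataVault: SketchDataVault) {
  // Type check via match first, then compare the fields themselves.
  override def equals(obj: Any): Boolean = obj match {
    case other: SketchVaultPath => other.uri == uri && other.dataVault == dataVault
    case _                      => false
  }

  // Hash over the same fields that equals compares, so equal objects
  // always produce equal hash codes.
  override def hashCode(): Int = (uri, dataVault).hashCode
}

object VaultPathEqualityDemo {
  def main(args: Array[String]): Unit = {
    val cache = mutable.Map.empty[SketchVaultPath, String]
    val first = new SketchVaultPath(new URI("s3://bucket/shard/0"), new SketchDataVault("s3"))
    val second = new SketchVaultPath(new URI("s3://bucket/shard/0"), new SketchDataVault("s3"))

    cache.put(first, "parsed shard index")
    // A distinct instance with the same contents now hits the cache entry.
    println(cache.contains(second))
  }
}
```

With the default reference equality, the lookup with `second` would miss and the value would be recomputed, which matches the repeated shard index parsing described above.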
URL of deployed dev instance (used for testing):
Steps to test:
_ = logger.info(s"cache miss! shardPath: $shardPath")
in Zarr3Array.scala line 119