-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Use dedicated cache keys instead of relying on an absolute path #51669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use dedicated cache keys instead of relying on an absolute path #51669
Conversation
|
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore) |
DaveCTurner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I left a few minor comments/questions.
| private final String fileName; | ||
| private final long fileLength; | ||
|
|
||
| CacheKey(SnapshotId snapshotId, IndexId indexId, ShardId shardId, Path cacheDir, String fileName, long fileLength) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have an EqualsHashCodeTestUtils#checkEqualsAndHashCode-based test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦♂️ Of course!
| private final ShardId shardId; | ||
| private final Path cacheDir; | ||
| private final String fileName; | ||
| private final long fileLength; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels strange to have the length as part of the key. I kinda see that it's here to make some plumbing a bit simpler, but I think I'd still rather move it back out again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I look at it again I agree it feels strange. I reverted this.
| private final SnapshotId snapshotId; | ||
| private final IndexId indexId; | ||
| private final ShardId shardId; | ||
| private final Path cacheDir; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly cacheDir. In my mind the thing in the cache is the Lucene file, which is uniquely identified by snapshot/index/shard/filename, and cacheDir is a bit of an implementation detail that I feel should be hidden here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, thanks
|
Thanks @DaveCTurner, I've applied your feedback. |
DaveCTurner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM; I left one additional optional request.
| // directory has been closed. | ||
| cacheService.removeFromCache(key -> key.startsWith(cacheDir.toString())); | ||
| cacheService.removeFromCache(key -> | ||
| Objects.equals(key.getSnapshotId(), snapshotId) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move this lambda to a method CacheKey#belongsTo(SnapshotId, IndexId, ShardId) (not sure about the name but that's the best I've got)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I kept the name and added a test in e8afbe7
|
Thanks David! |
Today cache files are identified in cache using a string representing an absolute path to a file on disk. This path is a sub directory of the current shard data path and as such already contains identification bits like the current index id and the shard id. It also contains the snapshot id that is passed at
CacheDirectorycreation time.While this has been done for quick prototyping and already been improved in #51520, it feels wrong to rely on a path converted to a string as cache keys. Instead we should have a distinct
CacheKeyobject to identifyCacheFilein cache.Relates #50693 (comment)