Account already downloaded data when resetting a prefetcher #797
Conversation
Signed-off-by: Vladislav Volodkin <[email protected]>
I like this idea better than #795. Just a couple notes.
Signed-off-by: Vladislav Volodkin <[email protected]>
Force-pushed from 4a302da to 451b2a2.
let part_queue = PartQueue {
    current_part: AsyncMutex::new(None),
    receiver,
    failed: AtomicBool::new(false),
    downloaded: Arc::clone(&downloaded),
    bytes_arrived_total: Arc::clone(&bytes_counter),
nit: maybe bytes_received?
Well, I put some thinking into that, tbh! bytes_received may be misleading because of PartQueue.receiver, which may imply that those bytes were received (i.e. read through the receiver), but what it actually tracks is the number of bytes that have arrived in the queue (which may also not have the clearest meaning; naming is complicated).
Well, they are the bytes received by the receiver, so it still sounds good to me. I'd rather clarify in the rustdoc than add a different name.
I liked bytes_downloaded, but also not worth bikeshedding over the name.
I'll replace it with bytes_received. It may not be completely accurate, but it's symmetrical to the sending part, and there is a comment with a clarification.
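A minimal sketch of how the renamed counter and its clarifying rustdoc might look; this is an illustration, not the actual mountpoint-s3 definition, and the field type and helper method are simplified assumptions:

use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

struct PartQueue {
    /// Total bytes that have arrived in this queue from the sending side.
    /// Hypothetical doc: named symmetrically to the sender's counter; note
    /// this is NOT the number of bytes already read out through `receiver`.
    bytes_received: Arc<AtomicU64>,
}

impl PartQueue {
    /// Record a part's bytes as they arrive in the queue (illustrative helper).
    fn record_arrived(&self, len: u64) {
        self.bytes_received.fetch_add(len, Ordering::SeqCst);
    }
}

fn main() {
    let q = PartQueue { bytes_received: Arc::new(AtomicU64::new(0)) };
    q.record_arrived(8 * 1024 * 1024);
    assert_eq!(q.bytes_received.load(Ordering::SeqCst), 8 * 1024 * 1024);
}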
mountpoint-s3/src/prefetch.rs (Outdated)
// it. Otherwise, we're willing to wait for the bytes to download only if they're coming "soon", where
// soon is defined as up to `max_forward_seek_distance` bytes ahead of the available offset.
let available_offset = current_task.available_offset();
if offset > available_offset.saturating_add(self.config.max_forward_seek_distance) {
max_forward_seek_distance is now the maximum number of bytes we are happy to wait to be downloaded when seeking forward. Can we think of a more appropriate name?
max_forward_seek_wait_distance?
Also, there's a subtle thing here: it doesn't matter if available_offset + max_forward_seek_wait_distance > current_task.end_offset(), because the loop above guarantees that the offset lands within this task. Maybe worth a comment on the line above this one that selects current_task?
Oh, yeah, not regarding this particular line, but I also thought about the case when offset is beyond the last task's range. I'll add a comment and will try to add a test (if possible without too many hacks).
max_forward_seek_wait_distance looks good to me; I can't think of anything less wordy.
Yeah I like this. We should think about how we want metrics for the prefetcher to work now, but let's do that in a separate PR.
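To make the subtlety above concrete, here is a hedged sketch of the check with the proposed rename applied; the function shape and names are illustrative, not the actual prefetcher code:

/// Returns true if a forward seek to `offset` should wait for the download
/// rather than resetting the prefetcher. `offset` is assumed to already land
/// within the current task (the loop that selects the task guarantees it),
/// so it does not matter if the sum below exceeds the task's end offset.
fn should_wait_for_forward_seek(
    offset: u64,
    available_offset: u64,
    max_forward_seek_wait_distance: u64,
) -> bool {
    // saturating_add avoids overflow near u64::MAX, matching the real check
    offset <= available_offset.saturating_add(max_forward_seek_wait_distance)
}

fn main() {
    // Within the wait window: keep the prefetched stream and wait.
    assert!(should_wait_for_forward_seek(10, 4, 16));
    // Beyond the wait window: reset instead of waiting.
    assert!(!should_wait_for_forward_seek(100, 4, 16));
}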
Signed-off-by: Vladislav Volodkin <[email protected]>
let send_result = self.sender.send_blocking(part);
if let Some(part_size) = part_size {
    self.bytes_sent.fetch_add(part_size, Ordering::SeqCst);
    if let Ok(ref part) = part {
nit: you can also use:
- if let Ok(ref part) = part {
+ if let Ok(part) = &part {
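For reference, a standalone example showing that the two forms are equivalent; both bind the value by reference without consuming the Result (the data here is made up):

fn main() {
    let part: Result<Vec<u8>, String> = Ok(vec![0u8; 4]);

    // Original form: explicit `ref` binding inside the pattern.
    if let Ok(ref p) = part {
        println!("ref pattern sees {} bytes", p.len());
    }

    // Suggested form: match on a reference; match ergonomics makes
    // `p` a `&Vec<u8>` here as well.
    if let Ok(p) = &part {
        println!("&-pattern sees {} bytes", p.len());
    }

    // `part` is still usable because neither branch moved it.
    assert!(part.is_ok());
}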
Description of change
This is a development of an idea from this PR: #795
The idea is to avoid discarding prefetched data if it has already been downloaded. This optimises a certain read pattern, which reads data sequentially but skips chunks between reads.
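As an illustration (a hypothetical reader, not code from this PR), the pattern that benefits looks roughly like this: sequential reads with a fixed gap skipped between them, where the forward seek over each gap previously could discard parts that were already downloaded:

use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

/// Read `chunk`-sized blocks, skipping `gap` bytes between reads.
/// Against a mounted bucket, each skip is a small forward seek that the
/// prefetcher can now satisfy from already-downloaded data.
fn strided_read(path: &str, chunk: usize, gap: u64) -> std::io::Result<u64> {
    let mut f = File::open(path)?;
    let mut buf = vec![0u8; chunk];
    let mut offset = 0u64;
    let mut total = 0u64;
    while f.read_exact(&mut buf).is_ok() {
        total += chunk as u64;
        offset += chunk as u64 + gap; // skip the gap with a forward seek
        f.seek(SeekFrom::Start(offset))?;
    }
    Ok(total)
}

fn main() -> std::io::Result<()> {
    // Hypothetical mount path; 1 MiB reads with 1 MiB gaps.
    let total = strided_read("/mnt/bucket/10gb.bin", 1 << 20, 1 << 20)?;
    println!("read {total} bytes");
    Ok(())
}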
Here is some benchmarking with the fio job included in the PR (no cache usage, remounted between fio invocations).
Upd: updated benchmark results with a longer runtime (300s) and a larger file (10 GB); the lowest observed improvement is 6.8x.
Relevant issues: None
Does this change impact existing behavior?
This is not a functional change and thus not a breaking one. We should reason about possible performance regressions in certain read patterns before merging.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).