pull: Add --per-object-fsync #2152
Conversation
For lots and lots more info, see: https://hackmd.io/WeqiDWMAQP2sNtuPRul9QA (I will probably copy/paste that doc into here for posterity too when it's finalized)
I am also considering adding a repo option for this to force it on by default. But the next step here is to change e.g. rpm-ostree to set this by default when it's pulling from a local repo for updates.
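If a repo-level default like that were added (purely hypothetical at this point; this PR only adds the pull-time flag), it would presumably end up as a boolean in the repo config, along these lines:

```ini
# Hypothetical sketch only: the per-object-fsync key below is an assumption
# about what a repo-level default might look like; this PR itself only adds
# the `ostree pull --per-object-fsync` command-line flag.
[core]
repo_version=1
mode=bare
per-object-fsync=true
```

The pull-time form added here would then be something like `ostree pull --per-object-fsync <remote> <ref>` (remote/ref names are placeholders).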
Force-pushed 8665a67 to 52172aa
 * https://github.com/openshift/machine-config-operator/issues/1897
 */
-#define _OSTREE_MAX_OUTSTANDING_WRITE_REQUESTS 16
+#define _OSTREE_MAX_OUTSTANDING_WRITE_REQUESTS 3
It shouldn't, though it would be good to do a sanity check that over-the-network pull performance isn't affected by this.
With plain `kola http-server` over localhost I get 155MB/s, and changing this slows it down from 9s to 15s... but that doesn't really matter a lot IMO.
I briefly investigated trying out `tc`, but ended up doing `strace -f -e write --inject=write:delay_enter=5ms kola http-server 2>/dev/null`, and there's no appreciable difference with that (~21MB/s, still a quite fast connection speed obviously). Both end up at 36s for my test case.
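As an aside for readers, the cap being tuned above (`_OSTREE_MAX_OUTSTANDING_WRITE_REQUESTS`) is simply a bound on how many object writes may be in flight at once. ostree drives this through its GLib-based async pull machinery, but the underlying pattern can be sketched standalone with a counting semaphore. The names, the cap value, and the thread-per-write structure below are illustrative only, not ostree's actual implementation:

```c
/* Minimal sketch: bound the number of concurrently in-flight object writes,
 * analogous to _OSTREE_MAX_OUTSTANDING_WRITE_REQUESTS above.  Not ostree's
 * actual implementation (which uses GLib async I/O); illustrative only. */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define MAX_OUTSTANDING_WRITE_REQUESTS 3  /* the lowered cap from the diff */
#define N_OBJECTS 10

static sem_t write_slots;

static void *
write_one_object (void *arg)
{
  int idx = *(int *) arg;
  /* ... write (and, with --per-object-fsync, fsync) the object here ... */
  printf ("wrote object %d\n", idx);
  sem_post (&write_slots);      /* free a slot once this write completes */
  return NULL;
}

int
main (void)
{
  pthread_t threads[N_OBJECTS];
  int ids[N_OBJECTS];

  sem_init (&write_slots, 0, MAX_OUTSTANDING_WRITE_REQUESTS);
  for (int i = 0; i < N_OBJECTS; i++)
    {
      ids[i] = i;
      sem_wait (&write_slots);  /* block if the cap is already reached */
      pthread_create (&threads[i], NULL, write_one_object, &ids[i]);
    }
  for (int i = 0; i < N_OBJECTS; i++)
    pthread_join (&threads[i], NULL);

  sem_destroy (&write_slots);
  return 0;
}
```

Lowering the cap trades some pull throughput (the 9s to 15s localhost case above) for a smaller burst of concurrent writeback competing with other workloads.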
      return FALSE;

  if (!fsync_object_dirs (self, cancellable, error))
    return FALSE;
Is there a purpose to the refactor into a separate function? Seems like fsync behaviour in this path hasn't really changed here, right?
In per-object-fsync mode we aren't using the staging dir; we still allocate one because making that conditional would be a notably bigger patch. So we need to sync the actual target objects/ directory. And in any case I think we can get away with doing it this way rather than syncing the staging dir in both cases.
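One common way to implement "write each object durably into objects/ and then make the directory entries durable" looks roughly like the sketch below. This is illustrative POSIX code only, not ostree's actual implementation (which goes through its glnx helpers); `objdir_fd` is assumed to be an open directory fd on the destination objects/ directory:

```c
/* Illustrative sketch of a per-object-fsync write pattern; not ostree's
 * actual code. */
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

static int
write_object_synced (int objdir_fd, const char *name,
                     const void *buf, size_t len)
{
  char tmpname[PATH_MAX];
  snprintf (tmpname, sizeof (tmpname), "%s.tmp", name);

  int fd = openat (objdir_fd, tmpname, O_WRONLY | O_CREAT | O_EXCL, 0644);
  if (fd < 0)
    return -1;
  if (write (fd, buf, len) != (ssize_t) len)   /* simplified: no partial-write loop */
    { close (fd); return -1; }
  if (fsync (fd) < 0)                          /* per-object fsync: this object is durable now */
    { close (fd); return -1; }
  close (fd);

  if (renameat (objdir_fd, tmpname, objdir_fd, name) < 0)
    return -1;

  /* Make the new directory entry durable too.  In ostree this directory
   * sync is batched at the end of the transaction (see fsync_object_dirs()
   * in the hunk above) rather than done per object as in this sketch. */
  if (fsync (objdir_fd) < 0)
    return -1;
  return 0;
}
```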
        return glnx_throw_errno_prefix (error, "syncfs");
    }

  if (!rename_pending_loose_objects (self, cancellable, error))
Can't we skip this function entirely in the case where we're writing directly into the object dir?
Probably, but per above we'd need to make more of the transaction/staging dir conditional.
Flake in test-concurrency.py still; I thought we'd fixed that 😢
@cgwalters: Overrode contexts on behalf of cgwalters: continuous-integration/travis-ci/pr
It's also interesting to compare the effect of … The more I think about this though, I think ultimately what we want is to better control the total bandwidth of writes, which will come with cgroups v2, but we can't rely on that yet.
jlebon left a comment
I sanity-checked this locally as well. I wanted to make sure that on my slow Internet connection (5 MB/s) I didn't see a slowdown, because that would indicate we're putting pressure on the wrong thing. And indeed I didn't!
LGTM overall, just some minor/optional things I noticed while trying it out.
Pairs with: ostreedev/ostree#2152

Be nice to concurrent processes; operating system updates are usually a background thing. See e.g. openshift/machine-config-operator#1897 ostreedev/ostree#2152

This option is most effective in combination with a block scheduler such as `bfq`, which is the systemd default since systemd/systemd#13321
Force-pushed 52172aa to a527823
This is the opposite of ostreedev#1184

Motivated by OpenShift seeing etcd performance issues during OS updates: openshift/machine-config-operator#1897

Basically, if we switch to invoking `fsync()` as we go, it makes ostree performance worse (in my tests, 31s to write 2G versus 7s if we delay sync) but it avoids *huge* outliers in `fsync()` time for etcd.
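For contrast with the per-object mode, the default delayed-sync strategy the commit message compares against amounts to writing all objects with no per-object fsync() and issuing a single filesystem-wide sync when the transaction commits. A minimal illustrative sketch, not the actual ostree code, assuming a dirfd open somewhere on the repo's filesystem:

```c
/* Illustrative sketch of the default delayed-sync strategy: no fsync() per
 * object, one syncfs() over the repo's filesystem when the transaction
 * commits.  Not ostree's actual implementation. */
#define _GNU_SOURCE
#include <unistd.h>

static int
commit_transaction_delayed_sync (int repo_dfd)
{
  /* ... all objects were already written, with no fsync() calls ... */

  /* A single syncfs() flushes everything at once: fast overall, but the
   * resulting burst of writeback is what other fsync()-sensitive workloads
   * (like etcd) can observe as large latency outliers. */
  if (syncfs (repo_dfd) < 0)
    return -1;
  return 0;
}
```

That single flush is why the delayed path is much faster overall (7s vs 31s in the test above) but produces the writeback bursts that show up as `fsync()` latency outliers for co-located workloads like etcd.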
Force-pushed a527823 to a615d35
Hmm, Travis CI looks like a flake. Restarted it.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: cgwalters, jlebon
It looks like for some reason the GPG agent socket is being written into the tempdir. Need to chase that down, but …
@cgwalters: Overrode contexts on behalf of cgwalters: continuous-integration/travis-ci/pr