
Conversation

@cgwalters
Member

This is the opposite of
#1184

Motivated by OpenShift seeing etcd performance issues during
OS updates: openshift/machine-config-operator#1897

Basically, if we switch to invoking fsync() as we go, it makes
ostree performance worse (in my tests, 31s to write 2G versus 7s if we
delay sync) but it avoids huge outliers in fsync() time for etcd.
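
Roughly, the tradeoff looks like this (a minimal sketch, not the actual ostree code; the function names and the flag are illustrative):

```c
/* Illustrative sketch of the tradeoff described above: fsync() each object
 * as it is written, versus one syncfs() at the end. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

static int
write_one_object (int objects_dfd, const char *name,
                  const void *buf, size_t len, int per_object_fsync)
{
  int fd = openat (objects_dfd, name, O_WRONLY | O_CREAT | O_CLOEXEC, 0644);
  if (fd < 0)
    return -1;
  if (write (fd, buf, len) < 0)
    {
      close (fd);
      return -1;
    }
  /* Per-object mode: pay the sync cost now.  Slower overall (31s vs 7s for
   * 2G in the tests above), but dirty pages never pile up, so a concurrent
   * fsync() from e.g. etcd doesn't land behind a huge writeback queue. */
  if (per_object_fsync && fsync (fd) < 0)
    {
      close (fd);
      return -1;
    }
  return close (fd);
}

/* Deferred mode: nothing is synced until commit, which is much faster for
 * the pull itself but queues all the dirty data behind one big flush. */
static int
commit_deferred (int repo_dfd)
{
  return syncfs (repo_dfd);
}
```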

@cgwalters
Member Author

For lots and lots more info, see: https://hackmd.io/WeqiDWMAQP2sNtuPRul9QA

(I will probably copy/paste that doc into here for posterity too when it's finalized)

@cgwalters
Member Author

I am also considering adding a repo option for this to force it on by default.

But the next step here is to change e.g. rpm-ostree to set this by default when it's pulling from a local repo for updates.

 * https://github.com/openshift/machine-config-operator/issues/1897
 */
-#define _OSTREE_MAX_OUTSTANDING_WRITE_REQUESTS 16
+#define _OSTREE_MAX_OUTSTANDING_WRITE_REQUESTS 3
Member


It shouldn't, though it would be good to do a sanity check that over-the-network pull performance isn't affected by this.

Member Author

@cgwalters cgwalters Jul 17, 2020


With a plain `kola http-server` over localhost I get 155 MB/s, and changing this slows that pull down from 9s to 15s... but that doesn't really matter a lot IMO.

I briefly investigated trying out `tc`, but ended up doing `strace -f -e write --inject=write:delay_enter=5ms kola http-server 2>/dev/null`, and there's no appreciable difference with that (~21 MB/s, still quite a fast connection speed obviously). Both variants end up at 36s for my test case.

    return FALSE;

  if (!fsync_object_dirs (self, cancellable, error))
    return FALSE;
Member


Is there a purpose to the refactor into a separate function? Seems like fsync behaviour in this path hasn't really changed here, right?

Member Author


In per-object-fsync mode we aren't using the staging dir; we still allocate one because making that conditional would be a notably bigger patch. So we need to sync the actual target objects/ directory. And in any case I think we can get away with doing it this way rather than syncing the staging dir in both cases.
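
Roughly what syncing the target objects/ directory means here — a sketch only; the helper name matches the diff, but the body (fsync() each objects/XX/ fanout dir, then objects/ itself) is illustrative rather than the actual patch:

```c
/* Illustrative sketch: make the loose-object directory entries durable by
 * fsync()ing each objects/XX/ fanout directory and then objects/ itself.
 * Error handling and the real ostree helpers (glnx_*) are elided. */
#include <dirent.h>
#include <fcntl.h>
#include <unistd.h>

static int
fsync_object_dirs_sketch (int repo_dfd)
{
  int objects_dfd = openat (repo_dfd, "objects", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
  if (objects_dfd < 0)
    return -1;

  DIR *d = fdopendir (dup (objects_dfd));
  if (d != NULL)
    {
      struct dirent *dent;
      while ((dent = readdir (d)) != NULL)
        {
          if (dent->d_name[0] == '.')
            continue;
          int subdfd = openat (objects_dfd, dent->d_name,
                               O_RDONLY | O_DIRECTORY | O_CLOEXEC);
          if (subdfd >= 0)
            {
              fsync (subdfd);   /* make the newly linked object names durable */
              close (subdfd);
            }
        }
      closedir (d);
    }

  fsync (objects_dfd);          /* and the objects/ directory itself */
  close (objects_dfd);
  return 0;
}
```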

      return glnx_throw_errno_prefix (error, "syncfs");
    }

  if (!rename_pending_loose_objects (self, cancellable, error))
Member


Can't we skip this function entirely in the case where we're writing directly into the object dir?

Member Author


Probably, but per above we'd need to make more of the transaction/staging dir conditional.
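
For context, a sketch of the staging-dir step under discussion — the renameat() into objects/ that could in principle be skipped when we write objects directly into place. The staging layout and names here are placeholders, not the real ostree code:

```c
/* Illustrative only: loose objects accumulate in a transaction staging
 * directory and get renamed into objects/XX/ at commit time.  In
 * per-object-fsync mode, where the object is written straight into
 * objects/, this step would be unnecessary. */
#include <stdio.h>

static int
rename_pending_loose_object_sketch (int staging_dfd, int objects_dfd,
                                    const char *checksum, const char *objtype)
{
  char src[128], dest[160];

  /* e.g. staging:  "abcdef....file"
   *      objects:  "ab/cdef....file" (first two checksum chars pick the subdir) */
  snprintf (src, sizeof src, "%s.%s", checksum, objtype);
  snprintf (dest, sizeof dest, "%.2s/%s.%s", checksum, checksum + 2, objtype);

  return renameat (staging_dfd, src, objects_dfd, dest);
}
```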

@cgwalters
Member Author

flake in test-concurrency.py still, I thought we'd fixed that 😢
/override continuous-integration/travis-ci/pr

@openshift-ci-robot
Collaborator

@cgwalters: Overrode contexts on behalf of cgwalters: continuous-integration/travis-ci/pr


In response to this:

flake in test-concurrency.py still, I thought we'd fixed that 😢
/override continuous-integration/travis-ci/pr


@cgwalters
Member Author

It's also interesting to compare the effect of `--fsync-incremental` on "fast ethernet" HTTP pulls. It looks like it roughly doubles the time, from 35s to 68s with the same data, with the same "avoids huge latency spikes" effect.

The more I think about this, though, the more I think what we ultimately want is better control over the total write bandwidth. That will come with cgroups v2, but we can't rely on that yet.
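
For reference, the cgroup v2 io controller exposes exactly that kind of cap via the io.max file; a sketch of setting a write-bandwidth limit (the cgroup name and the 8:0 device numbers are placeholders) looks like this:

```c
/* Illustrative sketch: cap write bandwidth for a cgroup via the cgroup v2
 * io controller.  The cgroup path and block device major:minor (8:0) are
 * placeholders; a real caller would discover both at runtime. */
#include <stdio.h>

static int
limit_write_bandwidth (const char *cgroup, unsigned long long wbps)
{
  char path[256];
  snprintf (path, sizeof path, "/sys/fs/cgroup/%s/io.max", cgroup);

  FILE *f = fopen (path, "w");
  if (f == NULL)
    return -1;
  /* Format is "$MAJ:$MIN key=value ..."; here we only set wbps (write bytes/s). */
  fprintf (f, "8:0 wbps=%llu\n", wbps);
  return fclose (f) == 0 ? 0 : -1;
}
```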

Member

@jlebon jlebon left a comment


I sanity-checked this locally as well. I wanted to make sure that on my slow Internet connection (5 MB/s) I didn't see a slowdown, because that would indicate we're putting pressure on the wrong thing. And indeed I didn't!

LGTM overall, just some minor/optional things I noticed while trying it out.

cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request Jul 18, 2020
Pairs with: ostreedev/ostree#2152

Be nice to concurrent processes; operating system updates
are usually a background thing.  See e.g.
openshift/machine-config-operator#1897
ostreedev/ostree#2152
This option is most effective in combination with
a block scheduler such as `bfq`, which is the systemd
default since systemd/systemd#13321
@cgwalters cgwalters changed the title from "pull: Add --fsync-incremental" to "pull: Add --per-object-fsync" on Jul 18, 2020
This is the opposite of
ostreedev#1184

Motivated by OpenShift seeing etcd performance issues during
OS updates: openshift/machine-config-operator#1897

Basically, if we switch to invoking `fsync()` as we go, it makes
ostree performance worse (in my tests, 31s to write 2G versus 7s if we
delay sync) but it avoids *huge* outliers in `fsync()` time for etcd.
@jlebon
Member

jlebon commented Jul 20, 2020

Hmm, Travis CI looks like a flake. Restarted it.
/lgtm

@openshift-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, jlebon


@cgwalters
Member Author

It looks like for some reason the GPG agent socket is being written into the tempdir. Need to chase that down but
/override continuous-integration/travis-ci/pr

@openshift-ci-robot
Collaborator

@cgwalters: Overrode contexts on behalf of cgwalters: continuous-integration/travis-ci/pr


In response to this:

It looks like for some reason the GPG agent socket is being written into the tempdir. Need to chase that down but
/override continuous-integration/travis-ci/pr

