
Conversation

@cgwalters
Member

This is the opposite of
#1184

Motivated by OpenShift seeing etcd performance issues during
OS updates: openshift/machine-config-operator#1897

Basically, if we switch to invoking fsync() as we go, it makes
ostree performance worse (in my tests, 31s to write 2G versus 7s if we
delay sync) but it avoids huge outliers in fsync() time for etcd.
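
Roughly, the tradeoff looks like this (a minimal sketch, not the actual ostree code; the function names and the flag are illustrative):

```c
/* Illustrative sketch of the tradeoff described above: fsync() each object
 * as it is written, versus one syncfs() at the end. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

static int
write_one_object (int objects_dfd, const char *name,
                  const void *buf, size_t len, int per_object_fsync)
{
  int fd = openat (objects_dfd, name, O_WRONLY | O_CREAT | O_CLOEXEC, 0644);
  if (fd < 0)
    return -1;
  if (write (fd, buf, len) < 0)
    {
      close (fd);
      return -1;
    }
  /* Per-object mode: pay the sync cost now.  Slower overall (31s vs 7s for
   * 2G in the tests above), but dirty pages never pile up, so a concurrent
   * fsync() from e.g. etcd doesn't land behind a huge writeback queue. */
  if (per_object_fsync && fsync (fd) < 0)
    {
      close (fd);
      return -1;
    }
  return close (fd);
}

/* Deferred mode: nothing is synced until commit, which is much faster for
 * the pull itself but queues all the dirty data behind one big flush. */
static int
commit_deferred (int repo_dfd)
{
  return syncfs (repo_dfd);
}
```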

@cgwalters
Member Author

For lots and lots more info, see: https://hackmd.io/WeqiDWMAQP2sNtuPRul9QA

(I will probably copy/paste that doc into here for posterity too when it's finalized)

@cgwalters
Member Author

I am also considering adding a repo option for this to force it on by default.

But the next step here is to change e.g. rpm-ostree to set this by default when it's pulling from a local repo for updates.

 * https://github.com/openshift/machine-config-operator/issues/1897
 */
-#define _OSTREE_MAX_OUTSTANDING_WRITE_REQUESTS 16
+#define _OSTREE_MAX_OUTSTANDING_WRITE_REQUESTS 3
Member


It shouldn't, though it would be good to do a sanity check that over-the-network pull performance isn't affected by this.

Member Author

@cgwalters cgwalters Jul 17, 2020


With a plain `kola http-server` over localhost I get 155 MB/s, and changing this slows that pull down from 9s to 15s... but that doesn't really matter a lot IMO.

I briefly investigated trying out `tc`, but ended up doing `strace -f -e write --inject=write:delay_enter=5ms kola http-server 2>/dev/null`, and there's no appreciable difference with that (~21 MB/s, still quite a fast connection speed obviously). Both variants end up at 36s for my test case.

    return FALSE;

  if (!fsync_object_dirs (self, cancellable, error))
    return FALSE;
Member


Is there a purpose to the refactor into a separate function? Seems like fsync behaviour in this path hasn't really changed here, right?

Member Author


In per-object-fsync mode we aren't using the staging dir; we still allocate one because making that conditional would be a notably bigger patch. So we need to sync the actual target objects/ directory. And in any case I think we can get away with doing it this way rather than syncing the staging dir in both cases.
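
Roughly what syncing the target objects/ directory means here — a sketch only; the helper name matches the diff, but the body (fsync() each objects/XX/ fanout dir, then objects/ itself) is illustrative rather than the actual patch:

```c
/* Illustrative sketch: make the loose-object directory entries durable by
 * fsync()ing each objects/XX/ fanout directory and then objects/ itself.
 * Error handling and the real ostree helpers (glnx_*) are elided. */
#include <dirent.h>
#include <fcntl.h>
#include <unistd.h>

static int
fsync_object_dirs_sketch (int repo_dfd)
{
  int objects_dfd = openat (repo_dfd, "objects", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
  if (objects_dfd < 0)
    return -1;

  DIR *d = fdopendir (dup (objects_dfd));
  if (d != NULL)
    {
      struct dirent *dent;
      while ((dent = readdir (d)) != NULL)
        {
          if (dent->d_name[0] == '.')
            continue;
          int subdfd = openat (objects_dfd, dent->d_name,
                               O_RDONLY | O_DIRECTORY | O_CLOEXEC);
          if (subdfd >= 0)
            {
              fsync (subdfd);   /* make the newly linked object names durable */
              close (subdfd);
            }
        }
      closedir (d);
    }

  fsync (objects_dfd);          /* and the objects/ directory itself */
  close (objects_dfd);
  return 0;
}
```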

      return glnx_throw_errno_prefix (error, "syncfs");
    }

  if (!rename_pending_loose_objects (self, cancellable, error))
Member


Can't we skip this function entirely in the case where we're writing directly into the object dir?

Member Author


Probably, but per above we'd need to make more of the transaction/staging dir conditional.
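
For context, a sketch of the staging-dir step under discussion — the renameat() into objects/ that could in principle be skipped when we write objects directly into place. The staging layout and names here are placeholders, not the real ostree code:

```c
/* Illustrative only: loose objects accumulate in a transaction staging
 * directory and get renamed into objects/XX/ at commit time.  In
 * per-object-fsync mode, where the object is written straight into
 * objects/, this step would be unnecessary. */
#include <stdio.h>

static int
rename_pending_loose_object_sketch (int staging_dfd, int objects_dfd,
                                    const char *checksum, const char *objtype)
{
  char src[128], dest[160];

  /* e.g. staging:  "abcdef....file"
   *      objects:  "ab/cdef....file" (first two checksum chars pick the subdir) */
  snprintf (src, sizeof src, "%s.%s", checksum, objtype);
  snprintf (dest, sizeof dest, "%.2s/%s.%s", checksum, checksum + 2, objtype);

  return renameat (staging_dfd, src, objects_dfd, dest);
}
```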

@cgwalters
Member Author

flake in test-concurrency.py still, I thought we'd fixed that 😢
/override continuous-integration/travis-ci/pr

@openshift-ci-robot
Collaborator

@cgwalters: Overrode contexts on behalf of cgwalters: continuous-integration/travis-ci/pr


In response to this:

flake in test-concurrency.py still, I thought we'd fixed that 😢
/override continuous-integration/travis-ci/pr


@cgwalters
Member Author

It's also interesting to compare the effect of `--fsync-incremental` on "fast ethernet" HTTP pulls. It looks like it roughly doubles the time, from 35s to 68s with the same data, with the same "avoids huge latency spikes" effect.

The more I think about this, though, the more I think what we ultimately want is better control over the total write bandwidth. That will come with cgroups v2, but we can't rely on that yet.
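
For reference, the cgroup v2 io controller exposes exactly that kind of cap via the io.max file; a sketch of setting a write-bandwidth limit (the cgroup name and the 8:0 device numbers are placeholders) looks like this:

```c
/* Illustrative sketch: cap write bandwidth for a cgroup via the cgroup v2
 * io controller.  The cgroup path and block device major:minor (8:0) are
 * placeholders; a real caller would discover both at runtime. */
#include <stdio.h>

static int
limit_write_bandwidth (const char *cgroup, unsigned long long wbps)
{
  char path[256];
  snprintf (path, sizeof path, "/sys/fs/cgroup/%s/io.max", cgroup);

  FILE *f = fopen (path, "w");
  if (f == NULL)
    return -1;
  /* Format is "$MAJ:$MIN key=value ..."; here we only set wbps (write bytes/s). */
  fprintf (f, "8:0 wbps=%llu\n", wbps);
  return fclose (f) == 0 ? 0 : -1;
}
```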

Member

@jlebon jlebon left a comment


I sanity-checked this locally as well. I wanted to make sure that on my slow Internet connection (5 MB/s) I didn't see a slowdown, because that would indicate we're putting pressure on the wrong thing. And indeed I didn't!

LGTM overall, just some minor/optional things I noticed while trying it out.

cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request Jul 18, 2020
Pairs with: ostreedev/ostree#2152

Be nice to concurrent processes; operating system updates
are usually a background thing.  See e.g.
openshift/machine-config-operator#1897
ostreedev/ostree#2152
This option is most effective in combination with
a block scheduler such as `bfq`, which is the systemd
default since systemd/systemd#13321
@cgwalters cgwalters changed the title from "pull: Add --fsync-incremental" to "pull: Add --per-object-fsync" on Jul 18, 2020
This is the opposite of
ostreedev#1184

Motivated by OpenShift seeing etcd performance issues during
OS updates: openshift/machine-config-operator#1897

Basically, if we switch to invoking `fsync()` as we go, it makes
ostree performance worse (in my tests, 31s to write 2G versus 7s if we
delay sync) but it avoids *huge* outliers in `fsync()` time for etcd.
@jlebon
Member

jlebon commented Jul 20, 2020

Hmm, Travis CI looks like a flake. Restarted it.
/lgtm

@openshift-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, jlebon


@cgwalters
Member Author

It looks like for some reason the GPG agent socket is being written into the tempdir. Need to chase that down but
/override continuous-integration/travis-ci/pr

@openshift-ci-robot
Collaborator

@cgwalters: Overrode contexts on behalf of cgwalters: continuous-integration/travis-ci/pr


In response to this:

It looks like for some reason the GPG agent socket is being written into the tempdir. Need to chase that down but
/override continuous-integration/travis-ci/pr

