integration tests: learn to start a dummy registry #3894

nalind · 2022-04-01T21:24:22Z

What type of PR is this?

/kind other

What this PR does / why we need it:

When a test needs to talk to a registry server, launch one as part of the test rather than depending on it having been started by someone else.

Use run_buildah where we used to use 'run buildah' without checking the return code, and in a few cases where we did check it.
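This sketch contrasts `run buildah`, which records a status the caller must remember to check, with a wrapper that does the checking itself. `run_checked` is a made-up name and is not buildah's actual `run_buildah`. It only mimics the idea: store `$output`/`$status` the way bats' `run` does, and fail on an unexpected exit code.

```bash
# Hypothetical, simplified stand-in for a run_buildah-style helper (NOT the
# real one): run a command, save its combined output and exit code in
# $output and $status like bats' `run`, and fail unless the exit code
# matches the expected one (an optional leading number, default 0).
run_checked() {
  local expected=0
  if [[ $1 =~ ^[0-9]+$ ]]; then
    expected=$1
    shift
  fi
  status=0
  # "|| status=$?" records the failure instead of letting errexit abort here
  output=$("$@" 2>&1) || status=$?
  if (( status != expected )); then
    echo "FAIL: '$*' exited with $status, expected $expected" >&2
    printf '%s\n' "$output" >&2
    return 1
  fi
}
```

With this shape, `run_checked 125 some-command` would assert a specific failure code, and a bare `run_checked some-command` asserts success.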

In the "from with non buildah container" test, use "podman create" with host networking, in an attempt to avoid messing with networking in cases where we're running on a system with a version of podman that will create a bridge with CNI that we'll also create with netavark. We're not sharing storage between the two invocations, so the logic that tries to detect this problem won't detect it.

How to verify it

Updates to tests!

Which issue(s) this PR fixes:

None

Special notes for your reviewer:

This gets us closer to being able to run integration tests anywhere by pointing bats at our tests directory.

Does this PR introduce a user-facing change?

None

TomSweeneyRedHat · 2022-04-02T21:15:55Z

tests/helpers.bash

+# Sets REGISTRY_PID, REGISTRY_PORT (to append to "localhost:"), and
+# REGISTRY_DIR (where the CA cert can be found) on success.
+function start_registry() {
+  local REGISTRY_IMAGE=docker.io/library/registry:2


I think we may call this a fair number of times during the tests and I wonder if we'll hit the rate limits. Would it make sense to have our own registry image tucked away on quay.io somewhere? Do they have one for Podman already?

I'm not against that, but I have no idea if we have a plan in place to keep any of those up to date. The start_registry function uses _prefetch, so it should be pulled, at most, once in a given CI job.
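One way to get "pulled at most once per CI job" is an on-disk marker, because variables set inside one bats test aren't visible in the next. This is a hypothetical sketch, not buildah's `_prefetch`. `prefetch_once` and its argument order are invented here, and the caller passes in the actual fetch command.

```bash
# Hypothetical sketch (not buildah's _prefetch): run FETCH_CMD IMAGE only if
# no marker for IMAGE exists yet under CACHE_DIR, so repeated calls from
# different tests in the same job fetch each image at most once.
# Usage: prefetch_once CACHE_DIR IMAGE FETCH_CMD [ARGS...]
prefetch_once() {
  local cache=$1 image=$2
  shift 2
  # turn the image reference into a safe file name, e.g. "a/b:2" -> "a_b_2"
  local marker
  marker="${cache}/$(printf '%s' "$image" | tr -c 'A-Za-z0-9._-' '_')"
  if [[ -e $marker ]]; then
    return 0
  fi
  mkdir -p "$cache"
  # only record the image as fetched if the fetch succeeded
  "$@" "$image" || return 1
  touch "$marker"
}
```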

Consider using quay.io/libpod/registry:2.6. I recall that an update to the one on Docker Hub once broke gating.

Using 2.7 now. 2.6 and 2.7 are not multi-arch, and 2.8 isn't in that repository. How/when do they get updated?

I think they're under the loving care of @edsantiago

2.8 copy in progress; it'll take a long time (30m?) due to my slow network.

$ skopeo copy --all docker://docker.io/registry:2.8 docker://quay.io/libpod/registry:2.8

@edsantiago, feel free to ping me next time. Still enjoying fast French fibre :)

Done:

$ buildah manifest inspect quay.io/libpod/registry:2.8 | jq '.manifests[].platform.architecture'
"amd64"
"arm"
"arm"
"arm64"
"ppc64le"
"s390x"
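A check like this can be scripted, for instance to confirm that a mirrored image covers the architectures CI needs before switching to it. In this minimal sketch, `missing_arches` (a made-up name) reads the quoted architecture names that the `jq` filter prints, one per line. It then reports any required architectures that are absent.

```bash
# Hypothetical helper: read architecture names (one per line, possibly
# JSON-quoted) on stdin and print each required architecture that is absent.
# Usage: ... | jq '.manifests[].platform.architecture' | missing_arches amd64 arm64
missing_arches() {
  local have arch
  have=$(tr -d '"')
  for arch in "$@"; do
    # -x: whole-line match, -F: fixed string, -q: just the exit status
    grep -qxF -- "$arch" <<<"$have" || printf '%s\n' "$arch"
  done
}
```

Empty output means every requested architecture is present.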

edsantiago · 2022-04-04T19:20:48Z

This one is failing on setxattr (without the l)

nalind · 2022-04-04T21:10:19Z

This one is failing on setxattr (without the l)

This was trying to re-enable a test that was previously disabled under SELinux, but I guess we didn't work around it after all.

TomSweeneyRedHat · 2022-04-05T00:12:48Z

The integration test is failing with one that I've not seen before:

[+1249s] not ok 286 bud-multiple-platform-no-run
...
[+1249s] # 9505e448a8192a7713bf10737028537f74f36a1fb89298e8b4914548eaca053c
[+1249s] # --> dc60ecdbd35
[+1249s] # dc60ecdbd35d1c5a9e89efd7f77f8c820d97643f3a5e4c03e2bf4d23e670a5b3
[+1249s] # error creating build container: writing blob: adding layer with blob "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1": ApplyLayer exit status 1 stdout:  stderr: Error making old root private after pivot: no such file or directory
[+1249s] # [ rc=125 (** EXPECTED 0 **) ]

Later, it can't find containers.conf in a few tests:

[+1532s] ok 364 containers.conf selinux test
[+1532s] time="2022-04-04T16:56:02-05:00" level=warning msg="Error loading container config when searching for local runtime: finding config on system: CONTAINERS_CONF file: stat /var/tmp/go/src/github.com/containers/buildah/tests/./containers1.conf: no such file or directory"
[+1532s] time="2022-04-04T16:56:02-05:00" level=error msg="failed to setup From and Build flags: failed to get container config: finding config on system: CONTAINERS_CONF file: stat /var/tmp/go/src/github.com/containers/buildah/tests/./containers1.conf: no such file or directory"

edsantiago · 2022-04-05T12:37:20Z

@TomSweeneyRedHat that's our nemesis, #3710

TomSweeneyRedHat · 2022-04-05T18:20:32Z

tests/copy.bats

 @test "copy-preserving-extended-attributes" {
  createrandom ${TESTDIR}/randomfile
-  image="quay.io/libpod/fedora-minimal:34"
+  image="registry.fedoraproject.org/fedora-minimal:35"


A thought for later: should we define ${FEDORA_MINIMAL} or some such in helpers.bash? That way, if we need to bump from 35 to 36, we only have one place to do it.

This could as likely have left the version tag off of the image spec, since the test doesn't care about it as much, but don't let me stop you.

The problem with omitting the version spec is that the next (or next-next) version will introduce some sort of breakage that will cause us to scramble in a panic. Example: containers/podman#12343

Fair enough.

See also #3640. If there is a critical need to use f35 here, can we push a copy to quay instead?

It isn't critical that it be Fedora or Fedora-based; what matters is that it have working setcap, getfattr, and setfattr commands. I'll add a comment there to try to clarify that. If you have another image in mind that would work better, I'm happy to switch to using it.

My question was, is fedora-minimal:34 not working? If it no longer works, can we push a new image to quay? If it still works, can we keep it?

I guess it will. I switched to 35 mainly because I know 34 is going EOL in a little over a month. Changing it back to 34.
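Tom's single-variable suggestion could look like the following. `FEDORA_MINIMAL` is the hypothetical name floated above. The `:-` default lets someone override the image from the environment without editing the file, while the pinned tag avoids the "next version breaks us" problem.

```bash
# In tests/helpers.bash (hypothetical): one place to bump the pinned tag.
# ${VAR:-default} keeps any value already set in the environment.
FEDORA_MINIMAL=${FEDORA_MINIMAL:-quay.io/libpod/fedora-minimal:34}

# In a test such as copy-preserving-extended-attributes:
#   image="${FEDORA_MINIMAL}"
```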

TomSweeneyRedHat · 2022-04-05T18:23:51Z

LGTM

TomSweeneyRedHat · 2022-04-06T22:12:21Z

Happy green test buttons.

edsantiago · 2022-04-07T13:30:48Z

tests/helpers.bash

+  local htpasswd='testuser:$2y$05$OW6Qlf1ygRuqJxt/CYcBq.0MZARYLIvr.mgjksw2m7K4cwWkw3Pda'
+  if test "$1" = testuserfoo && test "$2" = testpassword ; then
+    htpasswd='testuserfoo:$2y$05$Pkly/9sA8iGY..2SaerFH.wr4qGPPaGZ66KMrvVmcCP8gaRPpeoeC'
+  elif test "${1:-testuser}" != testuser && test "${2:-testpassword}" != testpassword ; then


Isn't this just a complicated way of saying:

local testuser="${1:-testuser}"
local testpass="${2:-testpassword}"

That whole section probably makes more sense as a case statement. Reworking it.
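Combining the two suggestions (defaults declared once as locals, and a `case` instead of chained `test`s) might look like this sketch. The `htpasswd_entry` name is hypothetical; the hashes are the precomputed bcrypt values from the diff. They stay in single quotes because inside double quotes `$2y$05$...` would be mangled by parameter expansion (`$2` is a positional parameter).

```bash
# Hypothetical helper: print the htpasswd line for a known user/password
# pair, using precomputed bcrypt hashes so htpasswd isn't needed at runtime.
htpasswd_entry() {
  local testuser="${1:-testuser}"
  local testpass="${2:-testpassword}"
  case "${testuser}:${testpass}" in
    testuser:testpassword)
      printf '%s\n' 'testuser:$2y$05$OW6Qlf1ygRuqJxt/CYcBq.0MZARYLIvr.mgjksw2m7K4cwWkw3Pda' ;;
    testuserfoo:testpassword)
      printf '%s\n' 'testuserfoo:$2y$05$Pkly/9sA8iGY..2SaerFH.wr4qGPPaGZ66KMrvVmcCP8gaRPpeoeC' ;;
    *)
      echo "no precomputed hash for ${testuser}" >&2
      return 1 ;;
  esac
}
```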

What I don't understand is, why the special case at all?

It's an attempt to sort of future-proof that, but thinking on it more, it's better to just avoid needing htpasswd, so I'll drop that case.

htpasswd is already a hard requirement for the buildah-tests package, in both fedora and rhel, because a registry is already mandatory for running tests (and has been from day one).

If we use previously-computed hash values for passwords that we hard-code in the tests, we'll no longer need to call out to htpasswd when the tests are run. Are you saying that's harmful? I'm missing how that would be.

It's the hardcoding I most object to: although right now all tests use testuser/testpassword, we have the option (and should take advantage of it) of using randomly-generated user and password which would improve confidence in testing.

We check in "authenticate: cert and credentials" that we can successfully authenticate to the registry with a known-good username/password pair, and that we get an error when we intentionally supply values that we know the registry isn't configured to accept. What types of bugs do you see us catching by randomly selecting unique known-good values for each test?

Mostly my paranoia: with static values (credentials, input/output test strings) there's always a chance that you're not really testing what you think you're testing: that there's a leftover process somewhere and you're talking to it instead of the registry/httpserver/whatever that you think you're talking to. With unique values that worry is diminished.

Hmm, I still like the idea of reducing the set of requirements that someone needs to have installed in order to run the tests, and it turns out we can generate the hashes for passwords ourselves by wrapping the right function call. I'll add that and switch the "authenticate: cert and credentials" test to randomize them.
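Randomized credentials could be generated along these lines. `random_string` here is written from scratch for illustration in pure bash using `$RANDOM`, so it needs nothing beyond the shell. It is not cryptographically strong. That is fine for telling this test's registry apart from a leftover one, but not for real secrets. The variable names follow edsantiago's `testuser`/`testpass` suggestion.

```bash
# Hypothetical helper: print LENGTH (default 10) random lowercase
# alphanumeric characters using bash's $RANDOM.
random_string() {
  local length=${1:-10}
  local chars=abcdefghijklmnopqrstuvwxyz0123456789
  local out=
  while (( ${#out} < length )); do
    # the substring offset is an arithmetic context, so RANDOM needs no '$'
    out+=${chars:RANDOM % ${#chars}:1}
  done
  printf '%s\n' "$out"
}

# per-test credentials, unique enough to rule out talking to a stale server
testuser="user$(random_string 8)"
testpass="pass$(random_string 16)"
```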

edsantiago · 2022-04-07T13:37:07Z

tests/helpers.bash

+      echo error computing hashed password
+      exit 1


How about die "Error computing hashed password"?

That'll work. Changing it.

edsantiago · 2022-04-07T13:37:35Z

tests/helpers.bash

+    echo error creating new key and certificate
+    return 1


Likewise, die might be better

Changing it.
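The benefit of `die` over `echo` plus `exit`/`return` is a single, recognizable failure path. Below is a minimal stand-in. It assumes nothing about the real helper in tests/helpers.bash beyond "print a message and fail".

```bash
# Hypothetical minimal die(): report the message on stderr and fail.
# Ending with `false` means that, under bats' errexit, the test aborts here;
# outside errexit the function simply returns 1.
die() {
  printf '#| FAIL: %s\n' "$*" >&2
  false
}

# usage: some_step || die "error creating new key and certificate"
```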

edsantiago · 2022-04-07T13:39:13Z

tests/helpers.bash

+    echo error determining listening port from log:
+    cat ${TESTDIR}/registry/registry.log
+    stop_registry
+    return 1


Just plain false is enough, that will abort bats for the given test

(sorry for all the single-comments; I'm just worried that this will merge too early. Feel free to batch up my comments until I say "done with review")

Changing it. No worries, I'll try to make sure it's all sorted before the next rebase.
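Plain `false` is enough because bats runs each test body with errexit (`set -e`) in effect, so the first failing command ends the test. In the demonstration below, a `bash -c` script stands in for a test body. One caveat: errexit is suspended for commands run as an `if`/`while` condition or on the left of `&&`/`||`. So a helper that ends in `false` only aborts the test when it isn't called from such a context.

```bash
# Stand-in for a bats test body: with errexit on, the first failing
# command ends the script, so the last echo never runs.
test_body='set -e
echo before
false
echo "never printed"'

# The same failing command used as an `if` condition does not trigger
# errexit, so execution continues past it.
guarded_body='set -e
if false; then echo "not taken"; fi
echo "still running"'

bash -c "$test_body" || echo "test body failed with status $?"
bash -c "$guarded_body"
```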

edsantiago · 2022-04-07T13:39:26Z

tests/helpers.bash

+  if ! ${BUILDAH_BINARY} --storage-driver vfs --root "${REGISTRY_DIR}"/root --runroot "${REGISTRY_DIR}"/run push --cert-dir "${REGISTRY_DIR}" --creds "${1:-testuser}":"${2:-testpassword}" "${REGISTRY_IMAGE}" localhost:"${REGISTRY_PORT}"/registry; then
+    echo error pushing to /registry repository at localhost:$REGISTRY_PORT
+    stop_registry
+    return 1


likewise, false

Changing it.

edsantiago · 2022-04-07T13:39:37Z

tests/helpers.bash

+    return 1
+  fi
+
+  return 0


unnecessary

Dropping it.
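The trailing `return 0` is redundant because a function returns the status of the last command it ran. An `if` whose condition failed, with no `else`, has exit status 0. Here is a small illustration with a made-up `check_port` function.

```bash
# Hypothetical example: fail only when the argument is empty.
check_port() {
  if [[ -z $1 ]]; then
    echo "no port given" >&2
    return 1
  fi
  # no "return 0" needed: an `if` that ran no branch has exit status 0,
  # and that becomes the function's return status
}

check_port 5000 && echo "ok"
```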

edsantiago · 2022-04-07T16:02:57Z

tests/helpers.bash

+  # wait for it to start logging things, then give it a second more
+  local waited=0
+  while ! test -s "${TESTDIR}"/registry/registry.log ; do
+    if test $waited -ge $BUILDAH_TIMEOUT ; then
+      break
+    fi
+    sleep 1
+    waited=$((${waited}+1))
+  done
+  sleep 1


I'm not a big fan of wait-loops that simply time out without failing. If you're going to time out, I like it big & bold. Would you consider:

# record the coprocess's ID and try to parse the listening port from the log
# we're separating all of this from the storage for any test that might call
# this function and using vfs to minimize the cleanup required
REGISTRY_PID="${COPROC_PID}"
REGISTRY_DIR="${TESTDIR}"/registry
local waited=0
REGISTRY_PORT=
while [[ -z "$REGISTRY_PORT" ]]; do
    if [[ $waited -gt $BUILDAH_TIMEOUT ]]; then
        echo "Could not determine listening port from log:"
        sed -e 's/^/ >/' <${REGISTRY_DIR}/registry.log
        false
    fi
    waited=$((waited+1))
    sleep 1
    REGISTRY_PORT=$(sed -ne 's^.*listening on.*:\([0-9]\+\),.*^\1^p' <${REGISTRY_DIR}/registry.log)
done

Taking this almost verbatim.
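The same loop could be split into a parser and a poller, which makes the port extraction easy to check on its own. This sketch uses hypothetical names (`parse_registry_port`, `wait_for_registry_port`) and a hypothetical sample log line. Like the suggested `sed`, it assumes the port is followed by a comma in the "listening on" message. It uses `[0-9][0-9]*` in place of GNU sed's `\+` so it also works with non-GNU sed.

```bash
# Hypothetical: print the port from any "listening on ...:PORT," line on stdin.
parse_registry_port() {
  sed -n -e 's/.*listening on.*:\([0-9][0-9]*\),.*/\1/p'
}

# Hypothetical: poll LOG once a second until a port appears, failing loudly
# (and dumping the log) once TIMEOUT seconds have passed.
wait_for_registry_port() {
  local log=$1 timeout=$2 waited=0 port=
  while :; do
    port=$(parse_registry_port <"$log")
    if [[ -n $port ]]; then
      printf '%s\n' "$port"
      return 0
    fi
    if (( waited >= timeout )); then
      echo "Could not determine listening port from log:" >&2
      sed -e 's/^/  >/' "$log" >&2
      return 1
    fi
    sleep 1
    # not ((waited++)): that evaluates to 0 on the first pass and trips errexit
    waited=$((waited+1))
  done
}
```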

edsantiago · 2022-04-07T16:03:36Z

tests/helpers.bash

+  fi
+
+  # push the registry image we just started... to itself, as a confidence check
+  if ! ${BUILDAH_BINARY} --storage-driver vfs --root "${REGISTRY_DIR}"/root --runroot "${REGISTRY_DIR}"/run push --cert-dir "${REGISTRY_DIR}" --creds "${1:-testuser}":"${2:-testpassword}" "${REGISTRY_IMAGE}" localhost:"${REGISTRY_PORT}"/registry; then


Another argument for declaring local testuser/testpass variables at function start

edsantiago

LGTM with a few suggestions. Thanks for doing this, it greatly simplifies our gating-test setup.

Signed-off-by: Nalin Dahyabhai

rhatdan · 2022-04-08T13:49:25Z

/approve
/lgtm

openshift-ci · 2022-04-08T13:49:32Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nalind, rhatdan

Needs approval from an approver in each of these files:

~~OWNERS~~ [nalind,rhatdan]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added kind/other approved labels Apr 1, 2022

nalind force-pushed the test-registry branch from ecb5ed5 to b3351d2 April 1, 2022 21:25

TomSweeneyRedHat reviewed Apr 2, 2022

nalind force-pushed the test-registry branch 3 times, most recently from 450122d to 9b11911 April 4, 2022 17:02

nalind force-pushed the test-registry branch from 9b11911 to 4510c85 April 4, 2022 21:11

nalind force-pushed the test-registry branch 4 times, most recently from 75ed189 to 24481f5 April 5, 2022 17:19

TomSweeneyRedHat reviewed Apr 5, 2022

nalind force-pushed the test-registry branch 2 times, most recently from e9962ee to 3135ab8 April 6, 2022 17:43

edsantiago reviewed Apr 7, 2022

nalind force-pushed the test-registry branch from 3135ab8 to b4c4926 April 7, 2022 20:50

openshift-ci bot assigned rhatdan Apr 8, 2022

openshift-ci bot added the lgtm label Apr 8, 2022

openshift-merge-robot merged commit 2ea2c07 into containers:main Apr 8, 2022

nalind deleted the test-registry branch April 8, 2022 13:55

github-actions bot added the locked - please file new issue/PR label Sep 10, 2023

github-actions bot locked as resolved and limited conversation to collaborators Sep 10, 2023
