Skip to content

Conversation

@nalind
Copy link
Member

@nalind nalind commented Apr 1, 2022

What type of PR is this?

/kind other

What this PR does / why we need it:

When a test needs to talk to a registry server, launch one as part of the test rather than depending on it having been started by someone else.

Use run_buildah where we used to use 'run buildah' without checking the return code, and in a few cases where we did check it.

In the "from with non buildah container" test, use "podman create" with host networking, in an attempt to avoid messing with networking in cases where we're running on a system with a version of podman that will create a bridge with CNI that we'll also create with netavark. We're not sharing storage between the two invocations, so the logic that tries to detect this problem won't detect it.

How to verify it

Updates to tests!

Which issue(s) this PR fixes:

None

Special notes for your reviewer:

This gets us closer to being able to run integration tests anywhere by pointing bats at our tests directory.

Does this PR introduce a user-facing change?

None

# Sets REGISTRY_PID, REGISTRY_PORT (to append to "localhost:"), and
# REGISTRY_DIR (where the CA cert can be found) on success.
function start_registry() {
local REGISTRY_IMAGE=docker.io/library/registry:2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may call this a fair number of times during the tests and I wonder if we'll hit the rate limits. Would it make sense to have our own registry image tucked away on quay.io somewhere? Do they have one for Podman already?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not against that, but I have no idea if we have a plan in place to keep any of those up to date. The start_registry function uses _prefetch, so it should be pulled, at most, once in a given CI job.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using quay.io/libpod/registry:2.6. I recall that an update to the one on Docker Hub once broke gating.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using 2.7 now. 2.6 and 2.7 are not multi-arch, and 2.8 isn't in that repository. How/when do they get updated?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they're under the loving care of @edsantiago

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2.8 copy in progress; it'll take a long time (30m?) due to my slow network.

$ skopeo copy --all docker://docker.io/registry:2.8 docker://quay.io/libpod/registry:2.8

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edsantiago, feel free to ping me next time. Still enjoying fast French fibre :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done:

$ buildah manifest inspect quay.io/libpod/registry:2.8 | jq '.manifests[].platform.architecture'
"amd64"
"arm"
"arm"
"arm64"
"ppc64le"
"s390x"

@nalind nalind force-pushed the test-registry branch 3 times, most recently from 450122d to 9b11911 Compare April 4, 2022 17:02
@edsantiago
Copy link
Member

This one is failing on setxattr (without the l)

@nalind
Copy link
Member Author

nalind commented Apr 4, 2022

This one is failing on setxattr (without the l)

This was trying to re-enable a test that was previously disabled under SELinux, but I guess we didn't work around it after all.

@TomSweeneyRedHat
Copy link
Member

The integration test is failing with one that I've not seen before:

[+1249s] not ok 286 bud-multiple-platform-no-run
...
[+1249s] # 9505e448a8192a7713bf10737028537f74f36a1fb89298e8b4914548eaca053c
[+1249s] # --> dc60ecdbd35
[+1249s] # dc60ecdbd35d1c5a9e89efd7f77f8c820d97643f3a5e4c03e2bf4d23e670a5b3
[+1249s] # error creating build container: writing blob: adding layer with blob "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1": ApplyLayer exit status 1 stdout:  stderr: Error making old root private after pivot: no such file or directory
[+1249s] # [ rc=125 (** EXPECTED 0 **) ]

Later, it can't find containers.conf in a few tests:

[+1532s] ok 364 containers.conf selinux test
[+1532s] time="2022-04-04T16:56:02-05:00" level=warning msg="Error loading container config when searching for local runtime: finding config on system: CONTAINERS_CONF file: stat /var/tmp/go/src/github.com/containers/buildah/tests/./containers1.conf: no such file or directory"
[+1532s] time="2022-04-04T16:56:02-05:00" level=error msg="failed to setup From and Build flags: failed to get container config: finding config on system: CONTAINERS_CONF file: stat /var/tmp/go/src/github.com/containers/buildah/tests/./containers1.conf: no such file or directory"

@edsantiago
Copy link
Member

@TomSweeneyRedHat that's our nemesis, #3710

@nalind nalind force-pushed the test-registry branch 4 times, most recently from 75ed189 to 24481f5 Compare April 5, 2022 17:19
tests/copy.bats Outdated
@test "copy-preserving-extended-attributes" {
createrandom ${TESTDIR}/randomfile
image="quay.io/libpod/fedora-minimal:34"
image="registry.fedoraproject.org/fedora-minimal:35"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thought for later. Should we define ${FEDORA_MINIMAL} or some such in helpers.bat? That way if we need to bump from 35 to 36, we only have one place to do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could as likely have left the version tag off of the image spec, since the test doesn't care about it as much, but don't let me stop you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with omitting the version spec is that the next (or next-next) version will introduce some sort of breakage that will cause us to scramble in a panic. . Example: containers/podman#12343

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also #3640. If there is a critical need to use f35 here, can we push a copy to quay instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't critical that it be Fedora or Fedora-based, more that it have working setcap, getfattr, and setfattr commands. I'll add comment there to try to clarify that. If you've another image in mind that would work better, I'm happy to switch to using it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question was, is fedora-minimal:34 not working? If it no longer works, can we push a new image to quay? If it still works, can we keep it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it will. I switched to 35 mainly because I know 34 is going EOL in a little over a month. Changing it back to 34.

@TomSweeneyRedHat
Copy link
Member

LGTM

@nalind nalind force-pushed the test-registry branch 2 times, most recently from e9962ee to 3135ab8 Compare April 6, 2022 17:43
@TomSweeneyRedHat
Copy link
Member

Happy green test buttons.

Comment on lines 617 to 620
local htpasswd='testuser:$2y$05$OW6Qlf1ygRuqJxt/CYcBq.0MZARYLIvr.mgjksw2m7K4cwWkw3Pda'
if test "$1" = testuserfoo && test "$2" = testpassword ; then
htpasswd='testuserfoo:$2y$05$Pkly/9sA8iGY..2SaerFH.wr4qGPPaGZ66KMrvVmcCP8gaRPpeoeC'
elif test "${1:-testuser}" != testuser && test "${2:-testpassword}" != testpassword ; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this just a complicated way of saying:

    local testuser="${1:-testuser}"
    local testpass="${2:-testpassword}"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That whole section probably makes more sense as a case statement. Reworking it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I don't understand is, why the special case at all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an attempt to sort of future-proof that, but thinking on it more, it's better to just avoid needing htpasswd, so I'll drop that case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

htpasswd is already a hard requirement for the buildah-tests package, in both fedora and rhel, because a registry is already mandatory for running tests (and has been from day one).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use previously-computed hash values for passwords that we hard-code in the tests, we'll no longer need to call out to htpasswd when the tests are run. Are you saying that's harmful? I'm missing how that would be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the hardcoding I most object to: although right now all tests use testuser/testpassword, we have the option (and should take advantage of it) of using randomly-generated user and password which would improve confidence in testing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check in "authenticate: cert and credentials" that we can successfully authenticate to the registry with a known-good username/password pair, and that we get an error when we intentionally supply values that we know the registry isn't configured to accept. What types of bugs do you see us catching by randomly selecting unique known-good values for each test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly my paranoia: with static values (credentials, input/output test strings) there's always a chance that you're not really testing what you think you're testing: that there's a leftover process somewhere and you're talking to it instead of the registry/httpserver/whatever that you think you're talking to. With unique values that worry is diminished.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I still like the idea of reducing the set of requirements that someone needs to have installed in order to run the tests, and it turns out we can generate the hashes for passwords ourselves by wrapping the right function call. I'll add that and switch the "authenticate: cert and credentials" test to randomize them.

Comment on lines 623 to 624
echo error computing hashed password
exit 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about die "Error computing hashed password"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'll work. Changing it.

Comment on lines 636 to 637
echo error creating new key and certificate
return 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, die might be better

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing it.

echo error determining listening port from log:
cat ${TESTDIR}/registry/registry.log
stop_registry
return 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just plain false is enough, that will abort bats for the given test

(sorry for all the single-comments; I'm just worried that this will merge too early. Feel free to batch up my comments until I say "done with review")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing it. No worries, I'll try to make sure it's all sorted before the next rebase.

if ! ${BUILDAH_BINARY} --storage-driver vfs --root "${REGISTRY_DIR}"/root --runroot "${REGISTRY_DIR}"/run push --cert-dir "${REGISTRY_DIR}" --creds "${1:-testuser}":"${2:-testpassword}" "${REGISTRY_IMAGE}" localhost:"${REGISTRY_PORT}"/registry; then
echo error pushing to /registry repository at localhost:$REGISTRY_PORT
stop_registry
return 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likewise, false

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing it.

return 1
fi

return 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropping it.

Comment on lines 652 to 661
# wait for it to start logging things, then give it a second more
local waited=0
while ! test -s "${TESTDIR}"/registry/registry.log ; do
if test $waited -ge $BUILDAH_TIMEOUT ; then
break
fi
sleep 1
waited=$((${waited}+1))
done
sleep 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of wait-loops that simply time out without failing. If you're going to time out, I like it big & bold. Would you consider:

  # record the coprocess's ID and try to parse the listening port from the log
  # we're separating all of this from the storage for any test that might call
  # this function and using vfs to minimize the cleanup required
  REGISTRY_PID="${COPROC_PID}"
  REGISTRY_DIR="${TESTDIR}"/registry
  local waited=0
  REGISTRY_PORT=
  while [[ -z "$REGISTRY_PORT" ]]; do
    if [[ $waited -gt $BUILDAH_TIMEOUT ]]; then
      echo "Could not determine listening port from log:"
      sed -e 's/^/  >/' <${REGISTRY_DIR}/registry.log
      false
    fi
    waited=$((waited+1))

    sleep 1
    REGISTRY_PORT=$(sed -ne 's^.*listening on.*:\([0-9]\+\),.*^\1^p' <${REGISTRY_DIR}/registry.log)
  done

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking this almost verbatim.

fi

# push the registry image we just started... to itself, as a confidence check
if ! ${BUILDAH_BINARY} --storage-driver vfs --root "${REGISTRY_DIR}"/root --runroot "${REGISTRY_DIR}"/run push --cert-dir "${REGISTRY_DIR}" --creds "${1:-testuser}":"${2:-testpassword}" "${REGISTRY_IMAGE}" localhost:"${REGISTRY_PORT}"/registry; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another argument for declaring local testuser/testpass variables at function start

Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a few suggestions. Thanks for doing this, it greatly simplifies our gating-test setup.

When a test needs to talk to a registry server, launch one as part of
the test rather than depending on it having been started by someone
else.

Use run_buildah where we used to use 'run buildah' without checking the
return code, and in a few cases where we did check it.

In the "from with non buildah container" test, use "podman create" with
host networking, in an attempt to avoid messing with networking in cases
where we're running on a system with a version of podman that will
create a bridge with CNI that we'll also create with netavark.  We're
not sharing storage between the two invocations, so the logic that tries
to detect this problem won't detect it.

Signed-off-by: Nalin Dahyabhai <[email protected]>
@rhatdan
Copy link
Member

rhatdan commented Apr 8, 2022

/approve
/lgtm

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 8, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nalind, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 2ea2c07 into containers:main Apr 8, 2022
@nalind nalind deleted the test-registry branch April 8, 2022 13:55
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants