-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Checkpoint/Restore test fixes #11955
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Checkpoint/Restore test fixes #11955
Conversation
|
@edsantiago PTAL |
|
My only concern is that perhaps the 3-second timeout (w/ 5 retries) is too long. However, the code also isn't waiting for a 'ready' signal from the container (can it?), and the |
ee739d7 to
40ef16c
Compare
|
Switched to |
edsantiago
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, one suggestion.
|
LGTM |
40ef16c to
b33845f
Compare
|
/lgtm |
|
/hold |
|
/hold cancel |
jwhonce
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issues I see are mostly modifying a known good port. Those new ports could be used elsewhere which cause CI flakes. Generate all "new" port numbers using GetRandomPort()
test/e2e/checkpoint_test.go
Outdated
| fmt.Fprintf(os.Stderr, "Trying to connect to redis server at localhost:%d", orginalPort) | ||
| // Open a network connection to the redis server via initial port mapping | ||
| conn, err := net.DialTimeout("tcp4", fmt.Sprintf("localhost:%d", orginalPort), time.Duration(3)*time.Second) | ||
| Expect(err).To(BeNil()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Expect(err).To(BeNil()) | |
| Expect(err).ShouldNot(HaveOccurred()) |
test/e2e/checkpoint_test.go
Outdated
| } | ||
| fmt.Fprintf(os.Stderr, "Trying to reconnect to redis server at localhost:%d", orginalPort+100) | ||
| conn, err = net.DialTimeout("tcp4", fmt.Sprintf("localhost:%d", orginalPort+100), time.Duration(3)*time.Second) | ||
| Expect(err).To(BeNil()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Expect(err).To(BeNil()) | |
| Expect(err).ShouldNot(HaveOccurred()) |
test/e2e/checkpoint_test.go
Outdated
| localRunString := getRunString([]string{"-p", "1234:6379", "--rm", redis}) | ||
| port, err := utils.GetRandomPort() | ||
| Expect(err).ShouldNot(HaveOccurred()) | ||
| orginalPort := port + GinkgoParallelProcess() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused, you get a random port (that is "safe") and then add something to it?
test/e2e/checkpoint_test.go
Outdated
| Fail("Container failed to get ready") | ||
| } | ||
|
|
||
| fmt.Fprintf(os.Stderr, "Trying to connect to redis server at localhost:%d", orginalPort) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debugging information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, the test was hanging just after this point, so knowing what it was trying to do may also be helpful in the future. Ginkgo doesn't give us this detail otherwise 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. It was not added for debugging just for better feedback during the test to know what is happening.
test/e2e/checkpoint_test.go
Outdated
|
|
||
| // Restore container with different port mapping | ||
| result = podmanTest.Podman([]string{"container", "restore", "-p", "1235:6379", "-i", fileName}) | ||
| result = podmanTest.Podman([]string{"container", "restore", "-p", fmt.Sprintf("%d:6379", orginalPort+100), "-i", fileName}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, adding a number to a "safe" port?
test/e2e/checkpoint_test.go
Outdated
| // Open a network connection to the redis server via initial port mapping | ||
| // This should fail | ||
| conn, err = net.Dial("tcp", "localhost:1234") | ||
| conn, err = net.DialTimeout("tcp4", fmt.Sprintf("localhost:%d", orginalPort), time.Duration(3)*time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to steal this one. :-)
Oooohhhh, I understand. So the concern is if this test goes south/hangs/leaks, another test might also ask for a port and receive the same one (since it was |
Moving to Fedora 35 showed test failures (time outs) in the test
"podman checkpoint and restore container with different port mappings"
The test starts a container and maps the internal port 6379 to the local
port 1234 ('-p 1234:6379') and then tries to connect to localhost:1234
On Fedora 35 this failed and blocked the test because the container was
not yet ready. The test was trying to connect to localhost:1234 but
nothing was running there. So the error was not checkpointing related.
Before trying to connect to the container the test is now waiting for
the container to be ready.
Another problem with this test and running ginkgo in parallel was that
it was possible that the port was already in use. Now for each run a
random port is selected to decrease the chance of collisions.
Signed-off-by: Adrian Reber <[email protected]>
b33845f to
8439a6d
Compare
|
@jwhonce Thanks for the review. I adapted the PR to fix what you commented on. Besides the debug output which was added to know that the test is trying to connect to the container. I can remove it but I think it can be helpful understanding what is going on. |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: adrianreber, cevich, jwhonce The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Thanks everyone. |
Moving to Fedora 35 showed test failures (time outs) in the test
"podman checkpoint and restore container with different port mappings"
The test starts a container and maps the internal port 6379 to the local
port 1234 ('-p 1234:6379') and then tries to connect to localhost:1234
On Fedora 35 this failed and blocked the test because the container was
not yet ready. The test was trying to connect to localhost:1234 but
nothing was running there. So the error was not checkpointing related.
Another problem with this test and running ginkgo in parallel was that
it was possible that the port was already in use. Now for each run a
random port is selected to decrease the chance of collisions.
This tries to fix problems seen in #11795
CC: @cevich