UPSTREAM: 23894: OOM errors when processes exit rapidly #8412

Merged 1 commit on Apr 12, 2016

smarterclayton
Contributor

This is on the bubble for 1.2, but I wanted to see if it helps clear up our failures.

@ncdc @derekwaynecarr @liggitt [test]

@smarterclayton
Contributor Author

[test]

@ncdc
Contributor

ncdc commented Apr 8, 2016

@smarterclayton upstream PR was updated and hopefully will fix the type issues in all the right places (by removing the type assertions).

@smarterclayton
Contributor Author

Updated against upstream

@ncdc
Contributor

ncdc commented Apr 11, 2016

Conformance tests failed, again on the update-demo test that scales an RC, but the failure text doesn't appear to be the same. Still tracking it down. I did notice that the docker.log captured in the Jenkins artifacts isn't complete. For example, the failing test created its first container that had an issue at 12:04, but docker.log starts at 12:08 😢

@smarterclayton
Contributor Author

#8441 is the other failure.

@smarterclayton
Contributor Author

You can extend the docker log time. Going up to 30m is probably fine.

@ncdc
Contributor

ncdc commented Apr 11, 2016

How do we do that?

I'll spin up a rhel7 vm in ec2 to try to repro manually.

@smarterclayton
Contributor Author

It's in the test failure trap where we shut down the server: we grab the docker logs from the journal in hack/test-end-to-end-docker.sh.

@smarterclayton
Contributor Author

[test]

@smarterclayton
Contributor Author

Updated

@ncdc
Contributor

ncdc commented Apr 11, 2016

Because we run tests in parallel, each test's namespace needs to be added to the various SCCs to ensure upstream e2es can pass against OpenShift's security model. It looks like that code caused each namespace to stomp on the others, so that only a single e2e namespace at a time was ever a member of the various SCCs.

#8465 should fix this issue.
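
For anyone reading later, here is a minimal sketch of the failure mode described above. It is not the origin test setup code: the SCC is modeled as a plain dict with a `groups` list, the function names `add_namespace_overwrite` and `add_namespace_merge` are hypothetical, the namespace names are made up, and it is an assumption that membership is granted through the `system:serviceaccounts:<namespace>` group.

```python
# Illustrative model only: an SCC is a dict with a "groups" list.
# The real SCC type and the real e2e setup code are not shown here.

def service_account_group(namespace):
    # Assumed membership unit: the group of all service accounts in a namespace.
    return "system:serviceaccounts:" + namespace


def add_namespace_overwrite(scc, namespace):
    """Buggy pattern: replaces the membership, dropping other namespaces."""
    scc["groups"] = [service_account_group(namespace)]


def add_namespace_merge(scc, namespace):
    """Safer pattern: appends to the existing membership if absent."""
    group = service_account_group(namespace)
    if group not in scc["groups"]:
        scc["groups"].append(group)


buggy = {"name": "privileged", "groups": []}
merged = {"name": "privileged", "groups": []}

# Hypothetical namespaces standing in for parallel e2e tests.
for ns in ["e2e-tests-a", "e2e-tests-b", "e2e-tests-c"]:
    add_namespace_overwrite(buggy, ns)
    add_namespace_merge(merged, ns)

print(buggy["groups"])   # only the most recently added namespace remains
print(merged["groups"])  # every namespace that was added is still a member
```

With truly concurrent writers, a merge like this would presumably also need to re-read the object and retry on an update conflict; the sketch leaves that out.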

@smarterclayton
Contributor Author

smarterclayton commented Apr 11, 2016 via email

@spadgett
Member

@smarterclayton looking at it

@spadgett
Member

@jwforres new font-awesome update today is breaking us

@smarterclayton
Contributor Author

[test]

@smarterclayton
Contributor Author

Flaked #8399 [test]

On Mon, Apr 11, 2016 at 7:50 PM, OpenShift Bot wrote:

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/2907/)



@openshift-bot
Contributor

Evaluated for origin test up to 9ec799d

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/2913/)

@smarterclayton
Contributor Author

Flaked on

Apr 11 20:39:02.217: INFO: Error running &{/data/src/github.com/openshift/origin/_output/local/bin/linux/amd64/oc [oc create --namespace=extended-test-scoped-router-27a6z-0lea8 --config=/tmp/openshift-extended-tests/extended-test-scoped-router-27a6z-0lea8-user.kubeconfig -f /data/src/github.com/openshift/origin/test/extended/fixtures/scoped-router.yaml] []   Error from server: User "extended-test-scoped-router-27a6z-0lea8-user" cannot create pods in project "extended-test-scoped-router-27a6z-0lea8"

@smarterclayton
Contributor Author

Have not seen OOMs reoccur - [merge]

@smarterclayton
Contributor Author

I think that flake is an extended flake w.r.t. the policy cache falling behind. Not sure though - @deads2k?

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/5568/) (Image: devenv-rhel7_3953)

@openshift-bot
Contributor

Evaluated for origin merge up to 9ec799d

@openshift-bot merged commit 5aa33d1 into openshift:master on Apr 12, 2016

@deads2k
Contributor

deads2k commented Apr 12, 2016

> I think that flake is an extended flake w.r.t. the policy cache falling behind. Not sure though - @deads2k?

It's likely. We have a method WaitForPolicyUpdate to avoid that problem during our integration tests.
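
The thread does not show how WaitForPolicyUpdate works, so the following is only a generic poll-until-visible sketch of the idea: after changing policy, a test keeps checking until the change is observable instead of assuming the cache is already current. The names `wait_for` and `can_create_pods` are hypothetical, and the lagging cache is simulated with a counter.

```python
import time


def wait_for(condition, timeout=10.0, interval=0.1):
    """Poll condition() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a cache that lags behind a write: the permission only
# becomes visible on the third check.
checks = {"count": 0}


def can_create_pods():
    checks["count"] += 1
    return checks["count"] >= 3


ready = wait_for(can_create_pods, timeout=2.0, interval=0.01)
print(ready, checks["count"])
```

A test would call something like `wait_for(...)` right after granting the permission and only then attempt the create, failing with a clear timeout message if the change never becomes visible.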

@ncdc
Contributor

ncdc commented Apr 12, 2016

@pecameron this already merged. No need to re-test.

@deads2k
Contributor

deads2k commented Apr 12, 2016

> @pecameron this already merged. No need to re-test.

Man, he meant it too. :)

@liggitt mentioned this pull request on Apr 16, 2016
85 tasks