Java jobs using ubuntu-latest get cancelled/skipped midway with no logs available #1491
Comments
@jaikiran, thank you for the report. This is definitely something different from the macOS issue. I have not seen similar reports for Ubuntu before, and we have not changed the infrastructure. We will take a look. A few questions:
Hello @maxim-lobanov,
It's (relatively) big and it does do things like:
We started observing this at least 3 weeks back (around 22 to 23 days back somewhere around 2nd August), I think. @gsmet who keeps a more detailed eye on these PRs in that project can confirm this or correct me if that isn't an accurate timeline.
From what I can gather, we rarely saw this with |
By the way, this is what the workflow file looks like: https://github.com/quarkusio/quarkus/blob/master/.github/workflows/ci-actions.yml (the one which triggers those 31-odd jobs). The jobs that fail are defined here: https://github.com/quarkusio/quarkus/blob/master/.github/workflows/ci-actions.yml#L100 (that file has seen some minor changes in the past few days, but in its current form it still runs into the issue).
One of the project members just replied on the Quarkus dev mailing list[1]:
So it looks like these jobs run into resource issues (in our case, memory and swap), get initially marked as cancelled/skipped, and finally error out. So I think this now comes down to:
[1] https://groups.google.com/d/msg/quarkus-dev/yfbiBPjm6cM/JJEMSX9pAAAJ
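Since the runner gives no logs when it dies, one way to confirm memory and swap exhaustion is to print usage periodically from inside the job, so the last lines of the log show the state just before the VM went silent. The following is a minimal sketch, not something taken from the Quarkus workflow: it assumes a Linux runner where /proc/meminfo is readable, and the function names (parse_meminfo, summarize, monitor) are hypothetical.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value"
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def summarize(info):
    """Reduce the parsed values to used/total memory and swap, in MiB."""
    mem_total = info.get("MemTotal", 0)
    mem_available = info.get("MemAvailable", 0)
    swap_total = info.get("SwapTotal", 0)
    swap_free = info.get("SwapFree", 0)
    return {
        "mem_used_mib": (mem_total - mem_available) // 1024,
        "mem_total_mib": mem_total // 1024,
        "swap_used_mib": (swap_total - swap_free) // 1024,
        "swap_total_mib": swap_total // 1024,
    }


def monitor(samples, interval_seconds=30, path="/proc/meminfo"):
    """Print one usage line per sample; flush so the line reaches the job log promptly."""
    for _ in range(samples):
        with open(path) as f:
            usage = summarize(parse_meminfo(f.read()))
        print(
            "mem {mem_used_mib}/{mem_total_mib} MiB, "
            "swap {swap_used_mib}/{swap_total_mib} MiB".format(**usage),
            flush=True,
        )
        time.sleep(interval_seconds)
```

Calling something like monitor(samples=240, interval_seconds=30) from a background process started before the build would give roughly two hours of samples; the sample count and interval here are arbitrary choices for illustration.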
Such issues can't be reported any better for now. From the backend side, this case looks like the VM stopped sending heartbeats and stopped receiving any messages. We are doing some internal work to improve the handling of such issues (freezing such machines, collecting all the necessary logs, etc.), but unfortunately I can't provide an ETA since this work is still in progress.
Thanks for looking into this, @maxim-lobanov.
That's alright, I'll just keep a watch on this issue for the progress. Given that we have narrowed this down to the root cause, we are taking some measures to try and prevent this in the meantime in the Quarkus project.
I am closing this issue for now and we will announce separately when the work to improve handling of exhausted machines is done.
We managed to solve this issue by making changes to our workflow jobs and fixing a few other things, so we are no longer running into resource exhaustion on these VMs. Thank you for your help on this issue.
I have hit the same problem. When I tried to build an app written in Kotlin, it also threw the error "The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.", and it did not give a detailed log.
For the past few weeks, one of the projects that I contribute to (https://github.com/quarkusio/quarkus/) has been running into odd issues with GitHub Actions. The project uses GitHub Actions to trigger a workflow on PR creation, which internally triggers around 31 jobs. Most of these jobs succeed. However, 2-3 jobs have been exhibiting odd failures for the past few weeks. Our investigation so far has been almost entirely blocked by the lack of logs from these jobs when they fail. The symptoms we have seen so far are the following:
This issue has made it almost impossible for us to validate PRs for the past few weeks.
All these jobs are Java jobs (i.e. they run Java applications and tests) and all of these use "ubuntu-latest".
@maxim-lobanov, I see a similar issue reported for macOS runners here #736 and I see that you asked users to report if they still see this issue. I didn't want to add these details there since this is ubuntu-latest images, so I created this one.
If you want to see a sample PR which exhibited this issue (which we went ahead and merged anyway), here's one: quarkusio/quarkus#11512 (I'm sure you know this, but if not: to check the jobs that were run and failed, click on "View Details" at the end of the PR and select the "JDK Java8 JVM Tests" job for details and logs). If you look at the recent PRs in that repo, almost all are affected by this. We have a discussion going on in our mailing list too, if it helps to set some context: https://groups.google.com/d/msg/quarkus-dev/yfbiBPjm6cM/G1Na5I0TBgAJ