[CI] Centos docker builds failing during yum install #38832
Comments
Pinging @elastic/es-core-infra |
@tvernum thanks for creating this. Ironically enough it seems that this is happening on CentOS-7 workers attempting to I have been unable to reproduce this using a simple reproduction inside a clean CentOS 7 vagrant box with 20 iterations. Attempted reproduction script
Spinning up a worker similar to the one used in the failures above I observed that the base OS has certain images already present (incl. centos7) and the above reproduction script was basically a no-op; while this speeds up things, it might be the reason behind some stale things. |
Looked at this again. It mostly happens on centos-7 workers, but I also spotted it happening on an Ubuntu kibana-ci worker, on devops-ci Ubuntu (beats) immutable workers and even on an openSUSE Leap 42.3 non-immutable worker, so it doesn't look specifically related to centos-7 workers. The history seems to start on Jan 17 this year; this doesn’t really correspond chronologically to the packer cache script fix PR in https://github.com/elastic/elasticsearch/pull/38023/files. Had a brainstorming session with @atorok on some ideas. Alpar pointed out that the failing jobs were for non-master branches and hence did not benefit from caching. He'll work on a simple PR to extend the caching to older versions. I looked at our Dockerfile and as an additional action we can:
Hopefully with all the above steps combined we will get rid of the noise. |
As the Dockerfile evolved, we no longer need certain commands like `unzip`, `which` and `wget`, which allows us to slightly shrink the images too. Relates elastic#38832
Re-occurred in builds: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.7+periodic/89/console
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.0+periodic/89/console
|
@benwtrent can we please add at least the relevant build output here to help investigating? Jenkins links are usually invalid after a few weeks. While it is still possible to get the build output it is much more trouble to go to build-stats and dig for the respective logs. |
@danielmitterdorfer I updated my comment to include the snippet of the timeout failure. I initially did not add them as it did not provide any more information than what was already included in this issue. |
@atorok this issue seems to persist e.g. in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.7+matrix-java-periodic/ES_BUILD_JAVA=java11,ES_RUNTIME_JAVA=java11,nodes=immutable&&linux&&docker/40/console. AFAICS updated base images incl. caching support have been building successfully. This is a list of failures the past few days: WDYT, should we be more resilient in all cases by introducing the retry commit: 6f52008? |
I just checked and we do call the relevant task when building the image I'm not against the retries, just looking to understand this a bit better as we also don't seem to be getting all the benefits of the caching. |
I think it would be better to use |
I checked and |
Another occurrence has been raised in #40205. The Docker caches didn't seem to get honored. During a team discussion with @danielmitterdorfer / @rjernst and @mark-vieira we thought it may make sense to bring back this yum retries commit: 6f52008 and backport it to 7.x/7.0 and even 6.7. |
yum has a retries setting in yum.conf right? Why don't these retries work? |
We should do the retries. I looked into it, and the caching we do on images won't help us with this as I initially thought, because we copy the Dockerfile (even if we weren't, the one used for the cache is still a different checkout). We also have changing dependencies, as we just built the distribution, so we do want to regenerate the image. |
Yes there is |
I raised #40349 resurrecting the retries |
#40349 (retries for yum commands in Dockerfile with in-between sleep period) has been merged, I'll close this out now. If the problem surfaces again, feel free to re-open. |
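For readers unfamiliar with the approach, retrying a command with a sleep between attempts can be sketched in POSIX shell roughly as follows. This is only an illustration of the general technique, not the code merged in #40349; the `retry` function name, its arguments and the attempt/delay values in the usage comment are assumptions made for this example.

```shell
# Hypothetical helper (not taken from the Elasticsearch Dockerfile):
# run a command up to <attempts> times, sleeping <delay> seconds
# between failed attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Stop as soon as the command succeeds.
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    # Only sleep if another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  # Every attempt failed.
  return 1
}

# Example call (values are illustrative only):
#   retry 10 10 yum update -y
```

Because each Dockerfile `RUN` instruction starts a fresh shell, a loop like this would normally have to be written inline in the same `RUN` instruction as the `yum` call rather than defined once as a shared function.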
I don't know if there's any solution to this, but I didn't want the issue to just get lost in build noise.
We seem to have recurring failures building the Docker image on CentOS because yum fails to retrieve the mirror list.
Feb 13 https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+matrix-java-periodic/ES_BUILD_JAVA=java11,ES_RUNTIME_JAVA=java11,nodes=centos-7&&immutable&&linux&&docker/14/console
Feb 11 https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+periodic/30/console
Feb 10 https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+matrix-java-periodic/ES_BUILD_JAVA=openjdk12,ES_RUNTIME_JAVA=zulu11,nodes=immutable&&linux&&docker/233/console