
[CI] Centos docker builds failing during yum install #38832

Closed
tvernum opened this issue Feb 13, 2019 · 16 comments · Fixed by #38956
Labels
:Delivery/Build Build or test infrastructure Team:Delivery Meta label for Delivery team >test-failure Triaged test failures from CI

Comments

@tvernum
Contributor

tvernum commented Feb 13, 2019

I don't know if there's any solution to this, but I didn't want the issue to just get lost in build noise.

We seem to have recurring failures building the docker image on Centos because yum fails to retrieve the mirror list.

Feb 13 https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+matrix-java-periodic/ES_BUILD_JAVA=java11,ES_RUNTIME_JAVA=java11,nodes=centos-7&&immutable&&linux&&docker/14/console

05:06:48 Cannot find a valid baseurl for repo: base/7/x86_64
05:06:48 Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=container error was
05:06:48 12: Timeout on http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=container: (28, 'Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds')
05:06:49 
05:06:49 The command '/bin/sh -c yum install -y unzip which' returned a non-zero code: 1
05:06:49 

Feb 11 https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+periodic/30/console

11:12:05 Cannot find a valid baseurl for repo: base/7/x86_64
11:12:05 Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=container error was
11:12:05 14: curl#7 - "Failed to connect to 2604:1580:fe02:2::10: Network is unreachable"
11:12:05 The command '/bin/sh -c yum install -y unzip which' returned a non-zero code: 1
11:12:05 
11:12:05 

Feb 10 https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+matrix-java-periodic/ES_BUILD_JAVA=openjdk12,ES_RUNTIME_JAVA=zulu11,nodes=immutable&&linux&&docker/233/console

05:07:21 Cannot find a valid baseurl for repo: base/7/x86_64
05:07:21 Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=container error was
05:07:21 12: Timeout on http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=container: (28, 'Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds')
05:07:21 The command '/bin/sh -c yum update -y &&     yum install -y nc unzip wget which &&     yum clean all' returned a non-zero code: 1
05:07:21 
05:07:21 > Task

@tvernum tvernum added :Delivery/Build Build or test infrastructure >test-failure Triaged test failures from CI labels Feb 13, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@dliappis dliappis self-assigned this Feb 13, 2019
@dliappis
Contributor

dliappis commented Feb 13, 2019

@tvernum thanks for creating this. Ironically enough, it seems that this is happening on CentOS 7 workers attempting to `yum install` inside a container based on centos:7.

I have been unable to reproduce this using a simple reproduction script inside a clean CentOS 7 Vagrant box, even with 20 iterations.

Attempted reproduction script:

$ cat Dockerfile
FROM centos:7 AS builder
ENV PATH /usr/share/elasticsearch/bin:$PATH
ENV JAVA_HOME /opt/jdk-11.0.2
RUN curl --retry 8 -s https://download.java.net/java/GA/jdk11/9/GPL/openjdk-11.0.2_linux-x64_bin.tar.gz | tar -C /opt -zxf -
RUN ln -sf /etc/pki/ca-trust/extracted/java/cacerts /opt/jdk-11.0.2/lib/security/cacerts
RUN yum install -y unzip which

$ for i in {1..20}; do docker build -t testimage . && echo ">>>> Test $i successful"; docker rmi testimage; done

Spinning up a worker similar to the one used in the failures above, I observed that the base OS already has certain images present (incl. centos:7), so the above reproduction script was basically a no-op; while this speeds things up, it might be the reason behind some stale state.

@dliappis
Contributor

Looked at this again.

It mostly happens on centos-7 workers, but I also spotted it happening on an Ubuntu kibana-ci worker, devops-ci Ubuntu (Beats) immutable workers and even on an openSUSE Leap 42.3 non-immutable worker, so it doesn't look specifically related to centos-7 workers. The history seems to start on Jan 17 this year; this doesn't chronologically correspond to the packer cache script fix PR in https://github.com/elastic/elasticsearch/pull/38023/files.

Had a brainstorming session with @atorok on some ideas.

Alpar pointed out that the failing jobs were for non-master branches and hence not benefiting from caching. He'll work on a simple PR to extend the caching to older versions.

I looked at our Dockerfile and as an additional action we can:

  1. Remove `yum install -y unzip which` from the stage 0 image -- it hasn't been needed for some time now.
  2. Clean up the yum section in stage 1; we still need `yum update -y`, but I'll add a retry loop with some sleep to help even further (see the sketch below).
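For illustration only, here is a minimal sketch of what such a retry loop with sleep could look like in the Dockerfile; the attempt count, sleep interval and package list are placeholders, not necessarily what the eventual change will use:

```
# Sketch only: retry the yum commands a few times with a pause between attempts.
RUN for attempt in 1 2 3 4 5; do \
      yum update -y && \
      yum install -y nc && \
      yum clean all && \
      exit 0; \
      echo "yum failed on attempt ${attempt}, sleeping before retry"; \
      sleep 10; \
    done; \
    echo "yum failed after 5 attempts"; \
    exit 1
```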

Hopefully with all the above steps combined we will get rid of the noise.

dliappis added a commit to dliappis/elasticsearch that referenced this issue Feb 18, 2019
As the Dockerfile evolved we no longer need certain commands like
`unzip`, `which` and `wget`, allowing us to slightly shrink the images
too.

Relates elastic#38832
@benwtrent
Member

benwtrent commented Feb 21, 2019

Re-occurred in builds:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.7+periodic/89/console

02:06:40 Cannot find a valid baseurl for repo: base/7/x86_64
02:06:40 Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=container error was
02:06:40 14: curl#7 - "Failed to connect to 2607:f8f8:700:12::10: Network is unreachable"

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.0+periodic/89/console

04:06:23 Cannot find a valid baseurl for repo: base/7/x86_64
04:06:23 Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=container error was
04:06:23 12: Timeout on http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=container: (28, 'Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds')

@benwtrent benwtrent reopened this Feb 21, 2019
@danielmitterdorfer
Member

@benwtrent can we please add at least the relevant build output here to help with the investigation? Jenkins links are usually invalid after a few weeks. While it is still possible to get the build output, it is much more trouble to go to build-stats and dig for the respective logs.

@benwtrent
Member

@danielmitterdorfer I updated my comment to include the snippet of the timeout failure. I initially did not add it as it did not provide any more information than what was already included in this issue.

@dliappis
Contributor

@atorok this issue seems to persist, e.g. in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.7+matrix-java-periodic/ES_BUILD_JAVA=java11,ES_RUNTIME_JAVA=java11,nodes=immutable&&linux&&docker/40/console. AFAICS the updated base images (incl. caching support) have been building successfully.

This is a list of failures the past few days:

(screenshot of the recent failing builds omitted)

WDYT, should we be more resilient in all cases by introducing the retry commit: 6f52008?

@alpar-t
Contributor

alpar-t commented Feb 27, 2019

I just checked and we do call the relevant task when building the image: `[7.1.0] [6.7.0] > Task :distribution:docker:pullFixture`. I think we build a different image for the fixture (since we do it through docker compose) than our regular build. I thought that Docker would cache and reuse the layers, but this doesn't seem to be the case.

I'm not against the retries, just looking to understand this a bit better as we also don't seem to be getting all the benefits of the caching.

@alpar-t
Contributor

alpar-t commented Feb 27, 2019

I think it would be better to use `image: "docker.elastic.co/elasticsearch/elasticsearch-oss:8.0.0-SNAPSHOT"` in docker-compose.yml and have the fixture depend on the image build; we can then avoid building two images and should also benefit more from the caching.
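Roughly, that could look like the sketch below; this is illustrative only, and the service name and extra settings are made up rather than taken from the actual docker-compose.yml:

```
version: "2.4"
services:
  elasticsearch:
    # Reference the snapshot image produced by the regular build instead of
    # rebuilding a separate fixture image from a Dockerfile.
    image: "docker.elastic.co/elasticsearch/elasticsearch-oss:8.0.0-SNAPSHOT"
    environment:
      - discovery.type=single-node
    ports:
      - "9200"
```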

@alpar-t
Contributor

alpar-t commented Feb 27, 2019

I checked and `:x-pack:test:smb-fixture:composeUp` is also rebuilt, so something is definitely not right with the caches.

@dliappis
Contributor

Another occurrence has been raised in #40205. The Docker caches didn't seem to be honored.

During a team discussion with @danielmitterdorfer / @rjernst and @mark-vieira we thought it may make sense to bring back the yum retries commit 6f52008 and backport it to 7.x/7.0 and even 6.7.

@rjernst
Member

rjernst commented Mar 20, 2019

yum has a retries setting in yum.conf right? Why don't these retries work?

@alpar-t
Contributor

alpar-t commented Mar 21, 2019

We should do the retries. I looked into it and the caching we do on images won't help us with this as I initially thought, since we copy the Dockerfile (and even if we didn't, the one used for the cache is still from a different checkout). We also have changing dependencies, as we just built the distribution, so we do want to regenerate the image.

@dliappis
Contributor

> yum has a retries setting in yum.conf, right? Why don't these retries work?

Yes, there is a retries setting and by default it's set to 10; however, from what I've seen yum only honors it for failures pertaining to specific package files, not for earlier failures such as pulling the mirrorlist (Timeout on http://mirrorlist.centos.org).
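For reference, the settings in question live in /etc/yum.conf under [main]; the values below are illustrative (retries already defaults to 10), and the comments reflect the behaviour described above:

```
[main]
# Retries for package downloads; does not cover fetching the mirrorlist itself.
retries=10
# Per-connection timeout in seconds.
timeout=30
# Abort a transfer if throughput stays below this many bytes/sec
# (this is where the "Less than 1000 bytes/sec" error above comes from).
minrate=1000
```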

@dliappis
Contributor

I raised #40349 resurrecting the retries.

@dliappis
Contributor

#40349 (retries for yum commands in the Dockerfile with an in-between sleep period) has been merged; I'll close this out now. If the problem surfaces again, feel free to re-open.

@mark-vieira mark-vieira added the Team:Delivery Meta label for Delivery team label Nov 11, 2020