Cleanup output of docker cache generation #16412
Conversation
I think the log is quite useful for debugging, so I'd like to keep it. While output from the different threads is intermingled, it still gives you enough data to track down an issue without having to run it locally first.
LGTM
Marco, would it be possible to merge this? Splitting the PR is reasonable, but CI is very flaky, and getting this in would help alleviate the timeouts. Also, the output from containers built in parallel is not really useful or needed, in my opinion. We would appreciate getting this merged so we can unblock CI soon. Thanks.
@marcoabreu Logging is great. Based on what I understand, this PR still logs enough for one to understand what causes an error, without giving too much cluttered info. Lots of info is great, but since the builds execute in parallel, the output seems to get garbled. What do you reckon?
Keep in mind that this job only runs on master, which means that every job has already passed as a requirement to be merged in the first place. So if this job suddenly fails, we have an inconsistency - which could be flakiness. The logs don't do any harm, and in 99% of cases the build is a no-op since it's fully incremental and no changes have been made (thus the log is super short); the output is only really there if something actually changed, and that's potentially exactly the log you're interested in at that moment. Also, without the log you wouldn't be able to tell at first glance whether the caching is working properly, since the "using cached" output would be missing. Generally, I take this change as fixing something that's not broken.

You constantly bring up CI being flaky as an argument to merge multiple changes in one PR. I have said it on multiple occasions and still stand by the stance that this is not acceptable. Everybody else in this community is able to split up their changes and get them through CI, so you should be able to as well. The time spent trying to convince me could have been used instead to extract the crucial fix (the one-liner that increases the timeout) and retry CI in case it flakes out.
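To illustrate the trade-off being debated here - keeping the logs versus keeping them readable when builds run in parallel - below is a minimal sketch of one way to capture each build's output and only dump it as a single block when a build fails. The helper names and the docker invocation are hypothetical; this is not the actual ci/docker_cache.py implementation or this PR's diff.

```python
# Sketch only: capture per-build output instead of streaming interleaved lines
# from several threads, and print the full log only for failed builds.
import subprocess
from concurrent.futures import ThreadPoolExecutor

def build_container(tag: str):
    """Run one docker build with all output captured (hypothetical helper)."""
    proc = subprocess.run(
        ["docker", "build", "--cache-from", tag, "-t", tag, "."],
        capture_output=True, text=True)
    return tag, proc.returncode, proc.stdout + proc.stderr

def build_all(tags):
    with ThreadPoolExecutor(max_workers=4) as pool:
        for tag, returncode, output in pool.map(build_container, tags):
            if returncode != 0:
                # Dump the complete log for the failed build as one block,
                # so it is readable even though builds ran in parallel.
                print(f"=== build failed for {tag} ===")
                print(output)
            else:
                # Successful builds stay quiet to keep the CI log short.
                print(f"build succeeded for {tag}")
```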
ci/docker_cache.py (Outdated)
@@ -37,7 +37,7 @@
 DOCKERHUB_LOGIN_NUM_RETRIES = 5
 DOCKERHUB_RETRY_SECONDS = 5
 DOCKER_CACHE_NUM_RETRIES = 3
-DOCKER_CACHE_TIMEOUT_MINS = 15
+DOCKER_CACHE_TIMEOUT_MINS = 30
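For context, a timeout constant in minutes like this is typically passed to whichever call pushes or pulls the cache image. The sketch below is only an assumption about how such a constant could be wired up together with the retry constants above; the _push_cache helper is hypothetical and not the actual code in ci/docker_cache.py.

```python
# Hedged sketch: bound a cache push by DOCKER_CACHE_TIMEOUT_MINS and retry
# a few times before giving up. Not the real docker_cache.py implementation.
import subprocess
import time

DOCKERHUB_RETRY_SECONDS = 5
DOCKER_CACHE_NUM_RETRIES = 3
DOCKER_CACHE_TIMEOUT_MINS = 30

def _push_cache(image_tag: str) -> None:
    for attempt in range(1, DOCKER_CACHE_NUM_RETRIES + 1):
        try:
            subprocess.run(
                ["docker", "push", image_tag],
                check=True,
                timeout=DOCKER_CACHE_TIMEOUT_MINS * 60)  # subprocess timeout is in seconds
            return
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError) as exc:
            print(f"push attempt {attempt} for {image_tag} failed: {exc}")
            time.sleep(DOCKERHUB_RETRY_SECONDS)
    raise RuntimeError(f"Could not push cache image {image_tag}")
```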
So does this mean that even though we have a working container that has all of the deps in it, we'll still build a new one from scratch after 15 (or now 30) minutes?
If this is the case, why isn't this number much, much higher? Like in terms of hours or days? I am always frustrated by watching the same intermediate dependencies being built when there doesn't seem to be any change at all.
No, this is the job that publishes the cache. Right now, the job fails because it runs into a timeout. Once the job works again, everything will be back to normal with incremental builds.
Why should we set it much higher? The job is only supposed to run for a few minutes.
Never mind then; I thought this would explain why I see the cache invalidated so often for no apparent reason.
Yeah, the reason for the cache invalidations is that this job has been broken since the 20th of September.
Marco, please check the logs from this job: it has been failing for two months and causing containers to be rebuilt on CI. I'm really surprised that you are making such comments; also, what you are saying doesn't match my observations. I have run this on a separate machine to diagnose the problems and timeouts.
With my quote I'm referring to the majority of this PR, which is the log change. It won't solve the problem but is merely the reason the issue hasn't been resolved yet. With regard to the timeout change, I'm happy to merge it ASAP.
Marco, there is a cost to the project in needlessly discussing minor things. It prevents us from making progress and is not useful. Have a look at this article: https://www.theregister.co.uk/2008/05/30/google_open_source_talk/
dbf5981 to 7dc622a (Compare)
7dc622a to c340488 (Compare)
@marcoabreu Could you have another look at this PR? The changes were separated as you requested. Thanks.
Thanks, but I'd prefer not to suppress the output and not to move forward with this change.
Thanks for your review. So, going forward, will you maintain this infrastructure then?
Description
Clean up the output of the docker cache generation script.
@mseth10