This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Cleanup output of docker cache generation #16412

Closed
wants to merge 1 commit

Conversation

@larroy (Contributor) commented Oct 9, 2019

Description

Clean up the output of the docker cache generation script.
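
For illustration, a minimal sketch of the general idea: capture each parallel build's log and only print it when that build fails. This is not the actual diff; the image names, Dockerfile paths, and platform list below are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def build_quietly(platform: str) -> None:
    """Build one CI container, keeping its log off the shared console unless it fails."""
    result = subprocess.run(
        ["docker", "build",
         "-f", f"docker/Dockerfile.build.{platform}",  # hypothetical path
         "-t", f"mxnetci/build.{platform}",            # hypothetical tag
         "docker"],
        capture_output=True, text=True)
    if result.returncode != 0:
        # Only surface the (long) build log when something actually went wrong.
        print(result.stdout)
        print(result.stderr)
        raise RuntimeError(f"docker build failed for {platform}")

platforms = ["ubuntu_cpu", "ubuntu_gpu"]  # hypothetical subset
with ThreadPoolExecutor(max_workers=len(platforms)) as pool:
    # list() forces exceptions from failed builds to propagate.
    list(pool.map(build_quietly, platforms))
```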

@mseth10

@marcoabreu (Contributor) left a comment

I think the log is quite useful to debug, thus I'd like to keep it. While the different threads are intermingled, it still gives you enough data to trace down an issue without having to run it locally first.

@mseth10 (Contributor) left a comment

LGTM

@larroy (Contributor, Author) commented Oct 10, 2019

Marco, would it be possible to merge this? Splitting the PR is reasonable, but CI is very flaky; it would help to get this in to alleviate the timeouts.

Also, the output from the containers built in parallel is not really useful and not needed, in my opinion.

We would appreciate getting this merged so we can unblock CI soon. Thanks.

@ChaiBapchya (Contributor)

@marcoabreu Logging is great. Based on what I understand, this PR keeps enough of the log for one to understand what causes an error, without too much cluttered info.

Lots of info is great, but since the builds execute in parallel, the output gets garbled. So in that sense, it's better to keep it clean and clear than to lead people astray.

What do you reckon?

@marcoabreu (Contributor) commented Oct 10, 2019

Keep in mind that this job only runs on master, which means every job has already passed as a requirement for the change to be merged in the first place. So if this job suddenly fails, we have an inconsistency, which could be flakiness. The logs don't do any harm, and in 99% of cases the build is a no-op since it's fully incremental and nothing has changed (so the log is super short); the output is only really there if something actually changed, and that's potentially exactly the log you're interested in at that particular moment. Also, without the log for this job you wouldn't be able to tell at first glance whether the caching is working properly, since the "using cached" output would be missing.
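
As a side note, cache effectiveness can even be checked mechanically from that output. A minimal sketch, assuming the classic Docker builder (which prints " ---> Using cache" for every reused layer); the helper name is hypothetical:

```python
import subprocess

def cached_layer_count(dockerfile: str, tag: str) -> int:
    """Count reused layers by looking for the classic builder's 'Using cache' lines."""
    result = subprocess.run(
        ["docker", "build", "--cache-from", tag, "-f", dockerfile, "-t", tag, "."],
        capture_output=True, text=True, check=True)
    return sum("Using cache" in line for line in result.stdout.splitlines())
```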

Generally, I take this change as "fixing something that's not broken".

You constantly bring up CI being flaky as an argument for merging multiple changes in one PR. I have said it on multiple occasions, and I still stand by the stance that this is not acceptable. Everybody else in this community is able to split up their changes and get them through CI, so you should be able to as well.

The time that was spent trying to convince me could've been used instead on extracting the crucial fix (which is the one-liner that increases the timeout) and retrying CI in case it flakes out.

@@ -37,7 +37,7 @@
 DOCKERHUB_LOGIN_NUM_RETRIES = 5
 DOCKERHUB_RETRY_SECONDS = 5
 DOCKER_CACHE_NUM_RETRIES = 3
-DOCKER_CACHE_TIMEOUT_MINS = 15
+DOCKER_CACHE_TIMEOUT_MINS = 30
Contributor:

So does this mean that even though we have a working container that has all of the deps in it, we'll still build a new one from scratch after 15 (or now 30) minutes?

If this is the case, why isn't this number much, much higher? Like in terms of hours or days? I am always frustrated by watching the same intermediate dependencies being built when there doesn't seem to be any change at all.

Contributor:

No, this is the job that publishes the cache. Right now, the job fails because it runs into a timeout. Once the job works again, everything will be back to normal with incremental builds.

Why should we set it way higher? The job is only supposed to run for a few minutes.
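
For reference, a rough sketch of how retry/timeout constants like the ones in this diff might bound the cache-publishing step; the helper and image tag are hypothetical, not the actual script:

```python
import subprocess

DOCKER_CACHE_NUM_RETRIES = 3
DOCKER_CACHE_TIMEOUT_MINS = 30

def push_cache(image: str) -> None:
    """Push one cache image, bounding each attempt by the timeout and retrying."""
    for attempt in range(1, DOCKER_CACHE_NUM_RETRIES + 1):
        try:
            subprocess.run(["docker", "push", image], check=True,
                           timeout=DOCKER_CACHE_TIMEOUT_MINS * 60)
            return
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError) as err:
            print(f"Attempt {attempt}/{DOCKER_CACHE_NUM_RETRIES} failed for {image}: {err}")
    raise RuntimeError(f"Could not publish cache image {image}")

push_cache("mxnetci/build.ubuntu_cpu")  # hypothetical image tag
```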

Contributor:

Nevermind then, I thought this would explain why I see the cache invalidated so often and for no apparent reason.

Contributor:

Yeah, the reason for the cache invalidations is that this job has been broken since the 20th of September.

@larroy (Contributor, Author) commented Oct 10, 2019

> Generally, I take this change as "fixing something that's not broken".

Marco, please check the logs from this job: it has been failing for two months and causing containers to be rebuilt on CI. I'm really surprised that you are making such comments. Also, what you are saying doesn't match my observations; I have run this on a separate machine to diagnose the problems and timeouts.

@marcoabreu (Contributor)

With my quote I'm referring to the majority of this PR, which is the log change. It won't solve the problem, but it is the very reason the issue hasn't been resolved yet. With regards to the timeout change, I'm happy to merge it asap.

@larroy (Contributor, Author) commented Oct 10, 2019

Marco, there is a cost to the project in needlessly discussing minor things. It prevents us from making progress and is not useful. Have a look at this article:

https://www.theregister.co.uk/2008/05/30/google_open_source_talk/

@larroy changed the title from "Fix docker cache generation, cleanup output" to "Cleanup output of docker cache generation" on Oct 18, 2019
@larroy (Contributor, Author) commented Nov 6, 2019

@marcoabreu could you have a look at this PR again? The changes were separated as you requested. Thanks.

@marcoabreu (Contributor)

Thanks, I'd prefer not to suppress the output and not to move forward with this change.

@larroy (Contributor, Author) commented Nov 8, 2019

Thanks for your review. So going forward will you maintain this infrastructure then?

@marcoabreu closed this on Nov 10, 2019