-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Conversation
@mxnet-label-bot add [CI] |
@mxnet-label-bot add [pr-awaiting-review] |
d0c9197
to
13708f8
Compare
ln -s ccache /usr/local/bin/clang++-5.0 | ||
ln -s ccache /usr/local/bin/clang-5.0 | ||
ln -s ccache /usr/local/bin/clang++-6.0 | ||
ln -s ccache /usr/local/bin/clang-6.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also restore the links in /usr/local/bin/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need, also they fail with permission denied when executing locally or without root privileges.
No need because they are added to the path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you aware that the links you created previously were dangling? (basically didn't work)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you think the links to /usr/local/bin should be restored? Could you explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We spoke about this already a dozen times, so I will only link the official guide: https://ccache.dev/manual/latest.html#_run_modes
Back when I developed this, I tested it only by relaying on the PATH (hence I introduced /tmp/ccache-redirects) and it didn't work. Thus, I added the /usr/local/bin method.
The links did work - why shouldn't they?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't work because your links were broken. So with my changes it works in /tmp/ccache-redirects . there's no need to touch /usr/local/bin this is not allowed for users, requires sudo privileges, it's not necessary, etc. Please elaborate why you don't accept my fix, which uses the ccache wrappers in the /tmp folder. The guide says "something like this" you are taking the guide too seriously, having in any other folder which is then injected first in the path is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on why are you requesting changes when my changes actually fix the problem and make ccache work? or do you think there is something that is not working with my changes? Is not clear from your comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are not answering my comments, can you get into the container and see that this PR is fixing your broken links? That's the reason your approach didn't work, and you still left the /tmp/ccache-redirect broken links. I fixed this, thus no need to write in /usr/local/bin which prevents the build functions from being used locally or without root permissions.
Thanks for making it idempotent! |
549d1f1
to
01540a2
Compare
@marcoabreu can you have another look please? CI is passing. |
@KellenSunderland you might want to chime in as well. |
@marcoabreu @KellenSunderland Can you help review this PR? |
This code produces the following output:
Thus, I can not confirm the statement that the creation of the simlinks don't work. I have made the experience that in some cases the PATH method with /usr/local/bin was not picked up. Thus, I added /usr/local/bin as suggested by CCache. When I developed this integration, I did my due diligence to compare the ccache hit rates. I don't have the cycles to dive deep into this again and stand by the stance that my developed solution is working without major issues. Since you have not provided any instructions to reproduce your described problem, I can only execute the code our CI is running - which is running fine. This modification would reduce the CCache hitrate and thus my veto stands - as I have thoroughly explained to you over the course of the last months. I have already given my review and still stand by it. Thus there are no further actions required from the reviewer side until the comments have been addressed. |
We don't want to write in /usr/local/bin because then the script requires root privileges. The links in /tmp/ccache-redirects didn't work for you because the links were broken. Check the manpage of ln for details on usage. Here is an example of what I'm trying to explain to you, what I fixed, and that links in /usr/local/bin are not necessary.
|
Btw I tested that this is calling ccache using statistics as expected. And I'm using the redirect method as in the ccache documentation, just not in /usr/local/bin path. Why are you insisting on having them in /usr/local/bin when you admit that the reason is that in /tmp/ccache-redirects it didn't work for you due to your broken links? |
As you can see in your patch, ccache is disabled from one build because there's a problem. My PR fixes that problems, makes the function work without needing root privileges and more. Can you provide a reason why are you blocking this PR? This PR doesn't affect the hit rate of ccache but increase it, based on what are you making these claims? It's also not idempotent, the second time you call it it will fail, preventing all the build functions to be called multiple times which is essential if you are making changes or want to rebuild.
|
As written in the description, addresses and (fully) fixes the two linked issues. |
My PR increases ccache hit rate because ccache is disabled for "build_ubuntu_gpu_cuda91_cudnn7" as you can see in the PR. This is slowing down development if you need to use this build, as it doesn't use ccache. |
jenkins_slave@1a383a8c60d8:/work/mxnet$ ccache -s |
@mxnet-label-bot add [pr-awaiting-review] remove [pr-awaiting-response] |
Our Docker setup is running under the expectation to fully run as root. Usually these scripts are not intended to be called multiple times within the same session, so idempotency was never an issue. But great that you fixed, I'm happy to merge that part. For the broken symlink, it now makes sense - thanks for pointing that out. Since the /usr/local/bin method is recommended by the ccache authors, I'd just keep it in to not divert from the standard usage. It doesn't hurt to stay in, so I don't see much reason to have a discussion about it. Back when I worked on ccache, I had no issue with NVCC and it was officially supported (although the hit-rate wasn't that high). And as we can see, the setup we have works since otherwise our CI would break. There might be some use-cases in which it doesn't work, but until ccache/ccache#373 (comment) is confirmed and contains proper steps to reproduce the error, I would like to keep it as-is á la "never change a running system". As a summary, please split your change into two atomic PRs that address a single concern:
I won't merge removing /usr/local/bin, so there's no need to make a PR for that. I'm happy to merge PR 1 right away and would wait with merging PR 2 until detailed instructions to reproduce are available and have been confirmed. In my tests it worked and thus I'd like to have confirmation first. Right now, CI works and thus I'd keep it until it breaks. |
Thanks for your response. I don't think NVCC through ccache worked, otherwise why did you leave the build that this PR enables disabled? Having the ccache links in /usr/local/bin makes absolutely no difference versus having them in any other path is which prepended to the PATH variable. The result of insisting on keeping the paths in /usr/local/bin results in our build functions not working from a local environment to reproduce exact builds (since they require root) and because they are not idempotent. I can split the PR as you request. Have you seen that I verified that the hit rate is 100% while using the redirects in the /tmp path? Isn't this enough validation that the links in /usr/local/bin are not needed? |
@larroy can you please update the PR as per your comments above |
74816e8
to
a5c630a
Compare
@marcoabreu Can this PR be merged ? @mxnet-label-bot Update [pr-awaiting-merge, CI] |
* Fix broken links * Make it idempotent fixes apache#13456 fixes apache#14117 fixes apache#11516
Description
fixes Improve CCache handling #13456
fixes Would like to be able to execute build function several times, building ccache wrapers prevents this #14117
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.