Stricter unused dependency checking for plus one deps mode #867
Comments
Thanks for the issue!
TLDR: Unfortunately not.
Longer:
The current unused deps mode uses a heuristic which works sometimes (in
Wix’s experience less, in Stripe’s experience more).
It’s not source based but rather uses some internal accounting scalac has
(which has both false positives and false negatives).
Can something be done?
1. Develop a different unused deps mechanism which is source based as part
of the build. I’m not sure how complicated this is, to be honest. The Java Bazel
people have something in the Bazel repo which we looked at in the past but
haven’t been able to get anywhere with. My initial plan with the +1 was to
have something like this (as a separate action which will fail the build
but doesn’t block the compilation graph).
2. Have an IntelliJ-based, source-based unused deps mechanism. Since
IntelliJ does much of the heavy lifting AST-wise, it seems easier to
have some sort of IJ Inspection that looks at the entire target and the
sources and removes deps. I think this won’t be cheap perf-wise but it is an
interesting approach.
3. Drop and then recreate pattern. With buildozer you can drop all deps in
one command, and if you then have a tool which automatically adds
dependencies you can use this pattern every so often to clean up. We (Wix)
have one such tool which we plan on open sourcing (tentatively Q1) and are
working on another one inside of our IJ plugin.
Would love to hear your thoughts
On Fri, 1 Nov 2019 at 21:36 Jamie5 wrote:
When using plus one deps mode, many unused deps do get marked, but some
deps can be left out because they are already the dep of another dep, and
it also makes sense to leave them out because they never appear in the
source code of the package being compiled, and the only reason the dep is
needed is to make scalac happy. Would it be possible to have a unused deps
mode for plus one deps mode where a dep is marked as an unused, unless it
is explicitly referenced in the source code? (Not sure if this would end up
with a number of false positives - not that familiar with scalac's needs)
|
Re 1, I don't think walking over the tree and finding used types would be very hard - the unused deps mechanism already does something similar (though they walk over different things). The tricky part may be to understand, when you see a type, should it be a direct dep? Though that might not be tricky at all and actually be really straightforward. For 3, that seems useful though 1 feels more preferable. Also wouldn't this potentially remove direct dependencies that you also happen to depend on in a +1 situation, which would violate the desire of strict-deps? |
Re 3- you nailed it on both problems this brings. Our existing solution is
indeed not a very good one but better than letting build files rot. The new
mechanism in the plugin won’t solve the need for user activation but will
be source based so should solve the +1 issue.
Re 1- how do you suggest doing this? Our main goal was to have small to
zero overhead for the unused deps mechanism. Performing this iteration in the
scalac action (as a plugin for example) was deemed costly by people more
familiar with scalac than me. Performing this in a separate action has a
big cost resource-wise even though you might not increase the effective
build time.
This is the main reason we went with the current heuristic, which
capitalizes on rough information scalac already collects.
Another thought was to work with the zinc people to extract their analysis
module to be more decoupled and then depend on that but it required
bandwidth we didn’t have.
|
Yes, agreed that 3 is better than nothing, even with the problem of non-strict deps. For 1, IME iterating over the tree as a plugin was not overly expensive (though non-zero). But certainly I'm not that experienced with scalac, nor the cost of the specific operations needed for this. From my understanding the existing strict-deps mechanism already iterates over the full AST and pays some reasonable amount of cost for it. Actually is there a reason the existing strict-deps mechanism can't give the list of directly-referenced deps (from my understanding, it should be able to do that) which we can use to find the unneeded ones? |
Have you read the code of the existing strict deps mechanism? It doesn’t
do any iteration over the AST but rather just takes the list of jars scalac
needed to load.
If you can do it via a plugin then maybe do it externally and measure the
cost?
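The difference between the two approaches discussed so far can be illustrated with a toy model. This is only a sketch: the target labels, the function names, and the two input sets are hypothetical and are not produced by rules_scala or scalac. It assumes the heuristic mode sees the jars the compiler loaded, while a source-based mode sees only the deps the sources actually name.

```python
# Toy model only: labels and input sets are hypothetical, not rules_scala data.

def unused_by_loaded_jars(declared_deps, jars_loaded_by_compiler):
    """Heuristic mode: a declared dep is unused if the compiler never loaded it."""
    return sorted(set(declared_deps) - set(jars_loaded_by_compiler))


def unused_by_source_refs(declared_deps, deps_referenced_in_source):
    """Source-based mode: a declared dep is unused if the sources never name it."""
    return sorted(set(declared_deps) - set(deps_referenced_in_source))


# //lib:c is never named in the sources; the compiler loads it only because
# a signature in //lib:b mentions it.
declared = ["//lib:b", "//lib:c"]
loaded = ["//lib:b", "//lib:c"]
referenced = ["//lib:b"]

heuristic_report = unused_by_loaded_jars(declared, loaded)
source_report = unused_by_source_refs(declared, referenced)
print(heuristic_report)
print(source_report)
```

In this model the heuristic reports nothing, while the source-based check flags `//lib:c`, which is the behavior the original request asks for in plus one mode.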
|
Ah I see, I misread the code. It does seem to use the native java strict_deps but only for java files. Which does appear to iterate over the AST. Now I see that the strict_deps for scala files does not do that. Just to be sure, is https://github.com/bazelbuild/rules_scala/blob/master/third_party/dependency_analyzer/src/main/io/bazel/rulesscala/dependencyanalyzer/DependencyAnalyzer.scala what handles strict deps or am I misreading again? That one does appear to do very limited iteration over the AST in a way I don't fully follow. Hmm maybe will try that out and see, it would hopefully answer the question easily enough. Is there a particular big bazel-ified codebase you would recommend? |
None of the needed combo (OSS+Scala+Bazel).
Do you work for a company that uses Bazel and Scala? Maybe you can time it
on an internal codebase. If diff is small enough we can continue the
discussion (we can time it on our codebase as well)
|
Ok Jamie5@cbba543 has a diff which is rather hacky (at least in terms of some plumbing) but does appear to do what we want. Run on a 2.12.8 codebase, some of the rules testing old vs new unused dependency checker were as follows. Note that while testing methodology was probably fairly reasonable, it would be far from airtight. The results suggest that timing is not significantly different, but if you have good infra for timing it would probably have more reliable results. Rule 1 Rule 2 Rule 3 Some notes
|
@ittaiz did you get a chance to look at this? |
No I'm sorry. This is really interesting to me but I'm a bit under capacity
trying to wrap my head around the refactor PR.
I'll do my best to get to it in the next few days, ok?
|
Sounds good, no worries I just never know if something is in someone's queue or got lost in the notification void. |
Fair enough, unfortunately we have so many notifications it does happen
sometimes. Please feel free to ping me again mid of next week if I don't
respond.
|
One other random thought: if this works, could a similar mechanism be used to not do plus one deps but instead examine only the ijar, determine which deps are actually needed to compile the ijar, and propagate only those? (IIRC I read somewhere that the java rules did something like that, but I'm not sure about it.) And maybe strict deps can stem from this as well; the mechanisms seem similar at least. |
Potentially. I think that there are more nuances in this space so work will need to be iterative but a strong enough tool can definitely help us improve things. |
@ittaiz did you get a chance to take a look? |
I've re-read the thread and I think I originally replied to what I read and not what you wrote :) To align our discussion let's agree on the issues:
To mitigate these issues rules_scala currently has 3 strategies which impact the classpath:
Note all of the above tackle the hygiene issue, even if in different ways and with different tradeoffs. Do I understand correctly that you want to solve
I took a look at the code and it's nice! If we decide to merge it in we'll of course need to clean it up a bit like you mentioned, but my two main concerns are still performance and false positives. Can you check whether these +1 tests still pass with the flag turned on? To summarize (and assuming I understand correctly what you're trying to solve):
Thank you for your efforts here! I think that if we'll be able to polish it product, performance and false positives wise this will be a very big jump forward! Note- Direct only does solve the above problems while introducing others mentioned above. |
I guess SCP-009 has not made much progress since? Because if it did and they can backport it to 2.12.x or even just 2.13.x then that would be great and this becomes much less important IIUC. This diff would tackle 2.2.2, and if strict-deps was added (which I will try out) then it will also tackle 2.2.1. I don't fully follow what 2.2.3 and 2.2.4 are. If 2.2.3 means that some unneeded deps are captured by the +1 then in theory this might be able to help but it might be easier to do it by examining the ijar directly after it is produced, or something like that. Because otherwise we need to make assumptions about what exactly the ijar keeps and doesn't keep (which I guess might not be that complicated). But this would definitely be very up in the air and with very unclear feasibility/correctness/timelines. If I understand correctly, to run the tests you specified one only needs to do Regarding your last questions
|
SCP-009 made some progress and this is how we built the strict deps mechanism. We copied this work into rules_scala and adapted it into bazel. It’s too simplistic however... Never mind 2.2.3/2.2.4 for now Re the tests- the support you added works only if the user turns on both unused deps and plus 1, no? Because the tests I linked to only turn on plus 1 (you can modify them to only turn on unused deps and see) |
Ok, Jamie5@8724a32#diff-3830e6e26d863974d38e511b04761916 has code for unused_deps as well. Notes
As for potential small steps while validating the overall things
|
Thanks!
Completely fine for POC. When we'll want to merge it we'll need to consider if it's ok or not.
Again for POC fine, for merge we'll need the +1.
Our experience from working with the existing strict-deps for a long time (6 months? 1 year?) on a very large codebase with many developers is that outputting unclear errors (to the developer) is super harmful. People started saying they need to "please the bazel beast". This was one of the main reasons why we moved to +1. Re code duplication- I agree. I'd probably prefer this be in separate commits to ease review. |
Agreed on both counts,
Fine with me, we can hammer the issues as we discover them. Would want to look at potential approaches for this, have some ideas but maybe some compiler expert knows the actual correct way to do things. One thing is that as we do more of this then the risk of breaking on different scala versions matters more and it would be useful to run the unit tests against all supported scala versions, not sure if there is already some mechanism to do that. (We already have this issue with final vals which are another false positive as mentioned above)
Agreed, would prefer to merge in small chunks that don't break things. I can look at the initial steps here if we gain confidence on the overall idea. |
@ittaiz just to make sure, you are not waiting for anything from me in order to test, right? (I want to make sure we are not both thinking we are waiting on something from the other and hence nothing ever happens) |
Indeed. Sorry for the silence, I was sick and at BazelCon. |
The amazing @anchlovi is taking this week to run it on some large codebases internally |
Just started my tests and there is an issue with external source repos. I'm not sure that the unused deps tool should test external source repos targets. Also buildozer (at least the version I'm running - 0.29.0) cannot handle external source repos |
I think this is a bazel wide issue and not this tool specifically.
This is because Bazel treats everything as a mono repo (once fetch
finishes).
The pattern I think we'd need to use is to run with "warn" mode for
alignment and then switch to error.
WDYT?
|
It will work for our use case |
Don't you think it can work for all use-cases?
Or do you mean if you use something you don't control like rules_scala?
Because that is indeed a known issue in the bazel ecosystem
|
Exactly, your suggestion to start with |
👍 let's continue then. Mainly since in Bazel it's very easy to control external source repos if you must (fork them) |
Yes. Note even under the previous plan there would have been an |
We are now using the new ast mode and strict deps with our +1 and things seem to work fine. @anchlovi / others if you have time to test the process is now much simpler, just updating to latest rules_scala and then adjusting the toolchain as stated in README.md. Known potential issues
Things to do
|
Ok those timelines sound good. Regarding handling unused deps with +1 when the strict-deps definition doesn't work: another possibility is that in the case of plus one deps we simply define the strict deps to be
I would prefer to defer this until we get an example of this in real life, rather than just jumping in right now. A real example may show the sorts of patterns in which this happens, which may in turn show which approaches are better than others. This is with the awareness that the first examples will not necessarily be representative of all of rules_scala's customers, but it would at least be something more than the guessing of right now. I think it is reasonable that this may delay turning on ast mode by default. |
+1 on deferring until real examples. I really hope people at WixEng will be able to send examples (repros are hard). S+UP- probably a dumb question but doesn’t it effectively mean no unused deps? I think I missed your point |
Under the alternative proposal (let's call it proposal B), we would need to include both One concrete difference is that suppose we have class F subclassing D. Under the original proposal (proposal A, let's say), then it would be valid to have However under proposal B, we would emit a strict deps error that The downside of proposal B is that we need to have |
Can we somehow have the best of both worlds? It feels close |
One possibility is to take proposal A but change it so that in step 6 This means that random things like Another is to take proposal B, and change it so that the new strict deps definition includes, of So hopefully the real examples will help to clarify things. |
Thanks! First let me reiterate my agreement about real examples, they’re crucial. Second the first alternative which trims UP in proposal A sounds really good. Dropping F to get D is no fun but isn’t it the correct thing to do? This what we’re trying to improve in our current discussion. Maybe I’m missing something. |
So if we accept the fact that people will receive confusing errors then yes it is fine. Though it isn't a strict dep, I guess we can report the deps which would be orphaned by removing the unused dep so that the user can add them in in the same fell swoop. There may well be complaints that unused deps checker isn't working, because they removed a dep and it wasn't complained about as strict and also wasn't complained about as unused originally. (This dep would have been pinned by the +1 of some other dep in Implementation wise it is a bit more complex but that seems acceptable since it'll be simple on the outside. |
hmm, I think I understand. |
Summary of plus one/unused deps issue and potential solutions.
Note: historical discussion on #991 (comment) and on this thread as well.
Problem
+1 deps combined with strict deps is just an approximation of what deps the scalac compiler actually needs and does not always work. For example assuming we have the inheritance chain A -> B -> C -> D, each in its own bazel package. For A, its only strict dep is B. B's +1 brings in C. But nothing brings in D as a dep, and scalac does require all of A, B, C, D to compile. As of right now, if C or D was included in A's deps, it would be reported as an unused dependency. Note that this problem is limited to +1 mode and does not apply to transitive mode.
Existing mitigations include properly exporting iface deps in the cases in which this happens, and using
The plan is to wait for examples in the wild where this is a problem to get more information about what sort of situations this occurs in practice.
Solutions
There are a few potential solutions, which share the initial steps
In order to do step 3, we will need to pass the plus one deps of each dep to the plugin. This information already exists in bazel, it just gets merged into a flat list of direct and indirect dependencies today before reaching the plugin.
A note about the computation of R - it need not be perfect. We only need that
Another note: There may be a desire, depending on the solution, to have this augmented behavior be opt-in.
Solution A
For every target in U, report it as an unused dep unless it appears in UP. When we report an unused dep, we need to check if it "pins" any deps which are in UP, and if so emit errors to add those deps manually. Otherwise the user will get compile errors when they blindly follow the buildozer command.
Solution B
We keep the overall framework of code the same, but we use
Advantages and Disadvantages
One True Deps List
Under solution B, for any state of code, there is only one valid set of deps, and any other set of deps will have either unused deps warnings or compile errors. There aren't situations where there is a dep which you can remove without an error - either that compilation fails, or that a strict dep is missing. No one will complain that there are false negatives in the unused deps checker, based on the fact that even when they delete the dep, things still compile.
On the other hand, in solution A, this is not the case. For example, in a subclassing chain A -> B -> C -> D -> E, we can have A's deps be
Less deps
Under solution A, we can elide some deps due to the plus one rule. For example in a subclassing chain A -> B -> C -> D -> E, for A the deps
Implementation Complexity
Without thinking too much about it, right now A feels somewhat more complex to implement than B, and both need a decent amount of work to bring in the +1 deps of each direct dep to the plugin.
Computing the true required deps
This is a non-exhaustive list of things that we should do to find true required deps.
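To make the Problem statement in this summary concrete, here is a minimal Python sketch of the A -> B -> C -> D chain. It is a toy model under stated assumptions: the graph, the function names, and the use of the transitive closure as a stand-in for "what scalac needs" are all illustrative, and that stand-in holds only for a pure inheritance chain like this one. It is not rules_scala code.

```python
# Toy model of the A -> B -> C -> D example; names are illustrative assumptions.

# Each target lists only its declared direct deps.
DEPS = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}


def plus_one_classpath(target, deps):
    """Direct deps of `target` plus the direct deps of each of those deps."""
    direct = deps[target]
    classpath = set(direct)
    for dep in direct:
        classpath.update(deps[dep])
    return classpath


def transitive_classpath(target, deps):
    """Every target reachable from `target` through declared deps."""
    seen = set()
    stack = list(deps[target])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(deps[dep])
    return seen


plus_one = plus_one_classpath("A", DEPS)
# Assumption: for a pure inheritance chain the compiler needs the whole closure.
needed = transitive_classpath("A", DEPS)
missing = needed - plus_one
print(sorted(plus_one))
print(sorted(missing))
```

In this model the plus one classpath for A contains B and C, and D is the target that nothing brings in, which is the gap both proposed solutions try to account for.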
|
…ror toolchain (#1030)
* move ast_plus_one_deps_strict_deps_unused_deps_error to //scala package also mention it in readme
* add e2e tests to use ast_plus_one_deps_strict_deps_unused_deps_error
* fix lint
* rename ast_plus_one_deps_strict_deps_unused_deps_error to minimal_direct_source_deps
@ittaiz From the first post
Is this accurate, in that May 17 would be the day we could switch over, assuming no further reports come in? AFAIK there are a few potential issues such as the subclassing thing, but no real world examples. Should we actually only switch the default to be ast for transitive/plus-one, while still giving the user the option to toggle? Which might force more people onto ast by accident and hence maybe generate more bug reports and various issues? If so, should we do that now and then reset the 1 month/3 month clock, and then remove |
Sounds good re the extra caution of switching the default. |
Ok. In the near future I will switch the default, and then wait 3 months/1 months thing before removing the default. For any code which is public and reasonably easy to compile I can also help look directly. |
To make sure- we’re aligned that at any point people will be able to use +1/direct without strict deps or unused deps, yes? |
Yes - this removal of options is only for ast vs high-level. (and if someone turns off both strict_deps and unused_deps, then dependency_analyzer wouldn't run anyways) |
A few repros of non-trivial situations:
|
One more:
Despite my annoying repros, this feature seems to be working very well! Thanks for all this fantastic work, @Jamie5 |
Example of how strict deps can create unnecessary coupling on the user side: #1052. This probably means that there's a lot of value in the solutions mentioned in #867 (comment) |
@ittaiz we have passed the time gate that |
When using plus one deps mode, many unused deps do get marked, but some deps can be left out because they are already the dep of another dep, and it also makes sense to leave them out because they never appear in the source code of the package being compiled, and the only reason the dep is needed is to make scalac happy. Would it be possible to have a unused deps mode for plus one deps mode where a dep is marked as an unused, unless it is explicitly referenced in the source code? (Not sure if this would end up with a number of false positives - not that familiar with scalac's needs)
This is now in progress, the below summarizes the status
Known potential issues
Things to do
a. At this point take another look at [WIP] Handle a unused_dependency_checker_ignored_targets #1034 and see if it is landable
error on toolchain, and strict_deps=default is deprecated
strict_deps=default as an option