Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

review: Use matching package with most contained types #4139

Merged
merged 9 commits into from
Sep 23, 2021

Conversation

I-Al-Istannen
Copy link
Collaborator

@I-Al-Istannen I-Al-Istannen commented Aug 29, 2021

About this PR

This PR closes #4051 by always using the matching package with most contained types. This ensures we will skip "synthetic" packages in overlapping modules and fetch the package from the correct module.

Considerations

  • The getContainedTypeCount getter was added as CtPackage#getAllTypes copies the full set just to call size on it. The getter does pollute the CtPackage interface a bit though.

Further Work

  • Without the Cache packages in CtModule (e1b82d3) commit the PR significantly increases runtime on codebases with modules. This happens as it calls getPackageFromModule on quite a few modules and that is a costly operation -- it iterates through the whole package chain each time:
    moduleA
     - fr
       - inria
         - spoon
           - a
    moduleB
     - fr
       - inria
         - spoon
           - b
    moduleC
     - fr
       - inria
         - spoon
           - c
    
    In this example every type lookup of fr.inria.spoon.* would walk down the chain in every module this application has, causing a lot of unnecessary work.
    The cache would need to be kept up-to-date though, which this PR doesn't implement as of now and might pose a challenge. Therefore I would trim off the commit and open a different PR for it after this one is merged.

@I-Al-Istannen I-Al-Istannen changed the title Use matching package with most contained types wip: Use matching package with most contained types Aug 29, 2021
@I-Al-Istannen
Copy link
Collaborator Author

Okay, the version in this PR is slightly optimized (max instead of reverse sorting) and clocks in at ~ the same time for projects without modules (it shouldn't do anything extra there, as the comparator is not called for single element streams and we should not have duplicates there) and slows down JDK indexing by ~40 seconds (so ~20%).

That is probably more acceptable, though I don't know where the slowdown even comes from. Maybe it increases the lock contention on the concurrent skip list map.

Though running it with one instead of 8 threads (I have an 8 thread CPU) yields:

With this PR
java16 -Xmx8G -XX:ActiveProcessorCount=1   619,10s user 3,65s system 111% cpu 9:17,34 total

without this PR
java16 -Xmx8G -XX:ActiveProcessorCount=1   420,27s user 4,09s system 119% cpu 5:54,36 total

The output without this PR is wrong though and correctness trumps performance, it just looks so weird.

If you want I can detail the steps you'd need to do to try to reproduce this but your time is probably quite a bit more valuable.

@I-Al-Istannen
Copy link
Collaborator Author

I-Al-Istannen commented Aug 30, 2021

Most of the slowdown is due to the following: With the comparator java needs to evaluate getPackageFromModule for all modules and not just lazily for the first. That method is really heavy, causing substantial regressions in the JDK code where quite a few packages are shared between modules.

I optimized the getPackageFromModule visible in #4145 resulting in the following change:
image
Most of that time is actually spent in ConcurrentSkipListMap:
image
Changing it to a ConcurrentHashMap is quite a bit faster but breaks the ordering guarantees:
image

The last commit (Cache packages in CtModule) introduces a package cache in CtModule - which I am sure will make you cry and destroy quite a few usecases where people decide to move packages between modules or change package instances or who knows what. If we could somehow cleanly maintain that it would bring down the time to around where it was before this PR, but with the correct behaviour now :)

/usr/lib/jvm/java-16-adoptopenjdk/bin/java -Xmx8G  -XX:+DebugNonSafepoints     1080,13s user 7,14s system 533% cpu 3:23,86 total

Code without modules is not affected at all from this PR as far as I can see (see here):
image


I will make a separate PR with the implicit solution. That one should have no performance regressions and therefore the optimizations in here should be a net positive and we can leave out the more or less invasive CtModule cache (though it might still be slightly beneficial). You can then have a look whether you want to follow the siren call of the implicit or whether you value the compatibility and the extensive yak shaving of this PR more.

@I-Al-Istannen I-Al-Istannen force-pushed the fix/overlapping-packages-2 branch 2 times, most recently from e1b82d3 to 6c83152 Compare September 7, 2021 10:28
@I-Al-Istannen I-Al-Istannen changed the title wip: Use matching package with most contained types review: Use matching package with most contained types Sep 7, 2021
@monperrus
Copy link
Collaborator

Thanks! The optimization is good, but I'm not sure we need a new method in the interface. WDYT?

@I-Al-Istannen
Copy link
Collaborator Author

That only exists to prevent a useless set copy from being created every time, but I think with some stuff from SirYwells PR we can sidestep that issue and maintain the cleaner API.

I will remove that method in this PR, we can always re-add some smarter set creation later on.

@monperrus
Copy link
Collaborator

LGTM. Could you document the core idea and contract also directly in the methods in PackageFactory?

@I-Al-Istannen
Copy link
Collaborator Author

I-Al-Istannen commented Sep 15, 2021

While writing the comment you requested I noticed a possible simplification:
javac does not allow overlapping packages. So each package either has zero types in it (if it is synthetic) or one or more if it is the real one. This means we could remove the max calculation and just pick the first package that has a non-zero amount of types in it.

Sadly, this does not seem to be part of the java spec but is something javac enforces. JDT does not. With JDT you can have the exact same class in the same package in different modules. I am not sure how I should handle that case.

I could:

  1. Assume overlapping packages won't ever occur and use the first package with a non-empty package
  2. Throw an error if overlap is detected

I would prefer 1 as I do not believe that any real-world program would be written to contain overlapping packages if javac doesn't allow it.

What do you think?

@slarse
Copy link
Collaborator

slarse commented Sep 15, 2021

javac does not allow overlapping packages. So each package either has zero types in it (if it is synthetic) or one or more if it is the real one. This means we could remove the max calculation and just pick the first package that has a non-zero amount of types in it.

javac does allow overlapping packages in modules, it just doesn't allow overlapping exported packages. At least this was the case last time I tried, with JDK 15 I believe. See my comment here: #4051 (comment)

I'm not quite sure what to make of that.

@I-Al-Istannen
Copy link
Collaborator Author

I-Al-Istannen commented Sep 15, 2021

Right, I misremembered that. Then we are kinda screwed, no? If we need to look up a type we'd need to give that information to the getPackage method because it needs to resolve overlaps. I am not sure we can retrofit that into the current architecture without changing the existing methods. We could duplicate them, but that doesn't sound ideal.

This PR is probably fine as-is then but only addresses the problem partially. Using the package with most types might statistically have a few more hits than taking the first, but I am not sure that's worth it...

@slarse
Copy link
Collaborator

slarse commented Sep 16, 2021

Right, I misremembered that. Then we are kinda screwed, no?

Well, in terms of compile time, I suppose so. But as the module system system doesn't seem to be onboard with overlapping packages at runtime, I think it's pretty safe to assume that no one will write code like that in a single project.

That being said, it shouldn't be particularly expensive to check for overlap and throw an exception if there is overlap, right? I think doing that would be a nice safeguard, and good for usability in case someone does something crazy, or we have misunderstood something about the module system.

@I-Al-Istannen
Copy link
Collaborator Author

I-Al-Istannen commented Sep 16, 2021

I had that originally but JDT seems to allow having overlapping packages and that caused the exception to trigger. If you are fine with rejecting such code I will re-add the check.

EDIT: I accidentally broke the implementation with the first commit after this comment, but the tests will pass nonetheless. That is because my code works if the last package is correct and the original code works if the first one is correct. Should I add a test with the reverse order as well? The order of getAllModules doesn't seem to be guaranteed so it might not actually help with catching more bugs.

Copy link
Collaborator

@slarse slarse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had that originally but JDT seems to allow having overlapping packages and that caused the exception to trigger. If you are fine with rejecting such code I will re-add the check.

Yes, I think we'll go with that first as it's an easy solution, and having overlapping packages seems to be entirely incompatible with the module system at runtime. So, we'll do it like that and... see what happens.

Otherwise, LGTM, thanks @I-Al-Istannen !

@slarse slarse merged commit 1922bf5 into INRIA:master Sep 23, 2021
@I-Al-Istannen I-Al-Istannen deleted the fix/overlapping-packages-2 branch September 23, 2021 10:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: Classes in overlapping modules are not resolved correctly
3 participants