-
Notifications
You must be signed in to change notification settings - Fork 315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Oheger bosch/analyzer/dependency representation #3502
Oheger bosch/analyzer/dependency representation #3502
Conversation
d6d38b2
to
6a1e2f0
Compare
analyzer/src/main/kotlin/managers/utils/GradleDependencyGraphBuilder.kt
Outdated
Show resolved
Hide resolved
analyzer/src/main/kotlin/managers/utils/GradleDependencyGraphBuilder.kt
Outdated
Show resolved
Hide resolved
analyzer/src/main/kotlin/managers/utils/GradleDependencyGraphBuilder.kt
Outdated
Show resolved
Hide resolved
analyzer/src/main/kotlin/managers/utils/GradleDependencyGraphBuilder.kt
Outdated
Show resolved
Hide resolved
e6bd23e
to
8e6b318
Compare
By tweaking the representation of dependency references a bit, the size of the result file could be reduced even further to 25.2 MB. As this approach uses numeric indices to reference dependencies, the result - at least the part with the dependency tree - is now not really readable for human beings. I am not sure whether this is a criterion or not. It would be possible, based on this strongly optimized format, to come to a representation that is still free of most redundancies, but would be more readable (something like a tree of packages that replaces duplicate sub trees by references). But this format would have to be implemented, and it would increase the result size again of course. |
@sschuberth, @mnonnenmacher, @fviernau: It would be great to have a discussion regarding the format of the dependency tree in the Analyzer result. Should it be as efficient as possible (using indices as dependency references) or do you prefer a more human-readable format? What has been implemented here for Gradle can be applied to other Analyzer implementations as well. I would like to look at Maven next. I have some hope that the de-duplication of dependencies will also improve the performance of this package manager. |
Hi @oheger-bosch, can you please rebase this PR onto latest |
8e6b318
to
5c5aa45
Compare
@sschuberth, the PR has been rebased. |
analyzer/src/funTest/assets/projects/external/multi-kotlin-project-expected-output-cli.yml
Outdated
Show resolved
Hide resolved
analyzer/src/funTest/assets/projects/external/multi-kotlin-project-expected-output-cli.yml
Outdated
Show resolved
Hide resolved
analyzer/src/funTest/assets/projects/external/multi-kotlin-project-expected-output-cli.yml
Outdated
Show resolved
Hide resolved
analyzer/src/funTest/assets/projects/external/multi-kotlin-project-expected-output-cli.yml
Outdated
Show resolved
Hide resolved
analyzer/src/main/kotlin/managers/utils/GradleDependencyGraphBuilder.kt
Outdated
Show resolved
Hide resolved
5c5aa45
to
a96a410
Compare
This class implements an alternative layout to store information about the scopes of a project and the dependencies they contain. This layout avoids the duplication of dependency trees over different scopes by introducing a minimized graph of shared dependency nodes. Especially in large projects with many scopes, this storage format reduces the size of analyzer results significantly. Signed-off-by: Oliver Heger <[email protected]>
This class is going to replace parts of the dependency analysis implementation of the Gradle analyzer. It constructs a representation of dependencies, which is more efficient than the original representation and can hence deal with projects with many scopes and large dependency sets. Signed-off-by: Oliver Heger <[email protected]>
The project's dependencies can now be represented either by the classic set of scopes or a DependencyGraph. If a DependencyGraph is used, generate the scopes representation on demand. Use the original 'scopes' property for this purpose, so that read access to this information does not need to be changed; however, the construction to Project instances now has to be adapted, which affects also the analyzer implementations. Signed-off-by: Oliver Heger <[email protected]>
a96a410
to
b4d74d7
Compare
Use the GradleDependencyGraphBuilder class to create a significantly smaller result that avoids redundancies in the storage format of dependencies. Adapt functional tests to expect the new output format. Signed-off-by: Oliver Heger <[email protected]>
b4d74d7
to
4ac7626
Compare
@mnonnenmacher @fviernau this is looking good to me now, and I'd like to get this merged very soon. While this is not a breaking change in the strict sense, it changes the default output format for new Gradle project analyses. Any objections that I merge this? |
Please let me have another look before merging it, I'm currently busy with testing #3564. In the meantime I have a question, do you plan to implement this for the other package managers anytime soon? Archiving two different formats in parallel makes any analysis of the analyzer results very difficult. |
No, probably not soon. We're planning to do this on-demand, if and and when we run into memory issues with other package managers. But that's purely out of a lack of time and other priorities; otherwise I'd of course like to migrate consistently to the new format as soon as possible.
Are you referring to using |
This would also be a chance to align the implementation of the different package managers a bit if the
Yes, good idea, I have already prepared that change and will make a PR once this is merged. |
Indeed, we should take that opportunity to work a bit towards a more DRY implementation of the package managers which @haikoschol was mentioning a long time ago (I thought we had an issue for that, but I cannot find it anymore). |
This PR addresses #3321 by proposing a more efficient storage format for the dependency representation in Gradle analyzer results. The basic idea is to not repeat the dependency trees for all scopes, but construct an optimized dependency graph, in which dependencies are shared. Ideally, in this graph each dependency occurs once. However, as sometimes packages are referenced from scopes with different sets of dependencies, multiple occurrences are possible.
The most important class of this PR is the new
GradleDependencyGraphBuilder
class that integrates the project's dependencies into a dependency graph (represented by theDependencyGraph
class). The class is currently specific to Gradle, but it should be possible to make it more generic, so that other analyzer implementations can make use of this optimized format as well.Here are some numbers to give an impression of the size reduction of the dependency representation. The numbers are based on the Mozilla Focus Android project, which was used for testing. This project uses a large number of scopes causing the dependency graph in the old representation to become really huge:
Original representation:
New representation:
Remarks:
DependencyRef
is the counter-part ofPackageRef
in the new dependency graph representation. It references dependencies by a numeric index rather than an identifier.Scope
s is constructed, but also in an optimized form (shared references to packages). Therefore, the number ofPackageRef
instances is low.