
931: binary cataloger exclusion defaults #1948

Merged
spiffcs merged 18 commits into main from 931-binary-cataloger-defaults on Aug 8, 2023

spiffcs (Contributor) commented Jul 20, 2023

Binary cataloger exclusion defaults

Fixes #931

PR #1948 introduces a new implicit exclusion for binary packages that overlap another package by file ownership and meet all of the following conditions:

// 1) the relationship between packages is OwnershipByFileOverlap
// 2) the parent is an "os" package
// 3) the child is a synthetic package generated by the binary cataloger
// 4) the package names are identical

Packages found by the following OS catalogers cause an overlapping synthetic binary package, as described above, to be deduplicated:

apkdb,
alpm,
deb,
nix,
rpm (file and db)
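The four conditions and the cataloger list above can be read as a single predicate over an overlap relationship. The Go sketch below is a simplified illustration under stated assumptions, not syft's implementation: `Package`, `Relationship`, `OwnershipByFileOverlap` and `excludeBinaryByFileOverlap` are hypothetical stand-ins for syft's `artifact.Relationship` and `pkg.Collection`. The cataloger names are taken from the benchmark output in this thread, except `rpm-file-cataloger`, which is a placeholder because the rpm file cataloger's name does not appear here.

```go
package main

import "fmt"

// Package is a hypothetical, simplified stand-in for a cataloged package.
type Package struct {
	Name    string
	Version string
	FoundBy string // name of the cataloger that produced the package
}

// RelationshipType is a hypothetical stand-in for a relationship kind.
type RelationshipType string

// OwnershipByFileOverlap marks a parent that owns files the child was found in.
const OwnershipByFileOverlap RelationshipType = "ownership-by-file-overlap"

// Relationship links a parent package to a child package.
type Relationship struct {
	Parent Package
	Child  Package
	Type   RelationshipType
}

// osCatalogers holds the OS catalogers named in the PR description.
// "rpm-file-cataloger" is a hypothetical placeholder name.
var osCatalogers = map[string]bool{
	"apkdb-cataloger":     true,
	"alpmdb-cataloger":    true,
	"dpkgdb-cataloger":    true,
	"nix-store-cataloger": true,
	"rpm-db-cataloger":    true,
	"rpm-file-cataloger":  true,
}

// binaryCataloger is the cataloger that emits synthetic binary packages.
const binaryCataloger = "binary-cataloger"

// excludeBinaryByFileOverlap reports whether the child of r should be dropped.
func excludeBinaryByFileOverlap(r Relationship) bool {
	return r.Type == OwnershipByFileOverlap && // 1) file-ownership overlap
		osCatalogers[r.Parent.FoundBy] && // 2) parent found by an OS cataloger
		r.Child.FoundBy == binaryCataloger && // 3) child is a synthetic binary package
		r.Parent.Name == r.Child.Name // 4) identical package names
}

func main() {
	osPkg := Package{Name: "busybox", Version: "1.36.1-r0", FoundBy: "apkdb-cataloger"}
	binPkg := Package{Name: "busybox", Version: "1.36.1", FoundBy: binaryCataloger}
	r := Relationship{Parent: osPkg, Child: binPkg, Type: OwnershipByFileOverlap}
	fmt.Println(excludeBinaryByFileOverlap(r)) // true
}
```

All four checks must hold, so a binary package whose name differs from its overlapping OS package (or one that overlaps a non-OS package) is still reported.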

I've added an integration test that captures the new default: scanning an Alpine image with busybox goes from:

alpine-baselayout       3.4.3-r1     apk
alpine-baselayout-data  3.4.3-r1     apk
alpine-keys             2.4-r1       apk
apk-tools               2.14.0-r2    apk
busybox                 1.36.1       binary
busybox                 1.36.1-r0    apk
busybox-binsh           1.36.1-r0    apk
ca-certificates-bundle  20230506-r0  apk
libc-utils              0.7.2-r5     apk
libcrypto3              3.1.1-r1     apk
libssl3                 3.1.1-r1     apk
musl                    1.2.4-r0     apk
musl-utils              1.2.4-r0     apk
scanelf                 1.3.7-r1     apk
ssl_client              1.36.1-r0    apk
zlib                    1.2.13-r1    apk

to

alpine-baselayout       3.4.3-r1     apk
alpine-baselayout-data  3.4.3-r1     apk
alpine-keys             2.4-r1       apk
apk-tools               2.14.0-r2    apk
busybox                 1.36.1-r0    apk
busybox-binsh           1.36.1-r0    apk
ca-certificates-bundle  20230506-r0  apk
libc-utils              0.7.2-r5     apk
libcrypto3              3.1.1-r1     apk
libssl3                 3.1.1-r1     apk
musl                    1.2.4-r0     apk
musl-utils              1.2.4-r0     apk
scanelf                 1.3.7-r1     apk
ssl_client              1.36.1-r0    apk
zlib                    1.2.13-r1    apk

spiffcs added 5 commits July 20, 2023 08:53
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
github-actions (Bot) commented Jul 20, 2023

Benchmark Test Results

Benchmark results from the latest changes vs base branch
goos: linux
goarch: amd64
pkg: github.com/anchore/syft/test/integration
cpu: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
file: ./.tmp/benchmark-6cc1b86.txt

benchmark                                                      sec/op          B/op            allocs/op
ImagePackageCatalogers/alpmdb-cataloger-2                      15.43m ±  8%    5.144Mi ± 0%    88.14k ± 0%
ImagePackageCatalogers/apkdb-cataloger-2                       1.012m ±  6%    205.7Ki ± 0%    4.190k ± 0%
ImagePackageCatalogers/binary-cataloger-2                      262.6µ ±  8%    30.46Ki ± 0%    848.0 ± 0%
ImagePackageCatalogers/dpkgdb-cataloger-2                      771.9µ ±  8%    172.6Ki ± 0%    3.145k ± 0%
ImagePackageCatalogers/dotnet-portable-executable-cataloger-2  28.92µ ±  8%    3.695Ki ± 0%    132.0 ± 0%
ImagePackageCatalogers/go-module-binary-cataloger-2            128.6µ ±  5%    9.906Ki ± 0%    281.0 ± 0%
ImagePackageCatalogers/java-cataloger-2                        16.19m ± 25%    2.843Mi ± 0%    40.19k ± 0%
ImagePackageCatalogers/graalvm-native-image-cataloger-2        134.2µ ±  5%    8.595Ki ± 0%    228.0 ± 0%
ImagePackageCatalogers/javascript-package-cataloger-2          537.6µ ±  5%    94.20Ki ± 0%    1.342k ± 0%
ImagePackageCatalogers/nix-store-cataloger-2                   401.9µ ±  3%    49.33Ki ± 0%    898.0 ± 0%
ImagePackageCatalogers/php-composer-installed-cataloger-2      1.071m ±  7%    186.6Ki ± 0%    4.080k ± 0%
ImagePackageCatalogers/portage-cataloger-2                     685.0µ ±  8%    120.2Ki ± 0%    2.272k ± 0%
ImagePackageCatalogers/python-package-cataloger-2              4.479m ±  6%    1.003Mi ± 0%    16.45k ± 0%
ImagePackageCatalogers/r-package-cataloger-2                   322.7µ ±  7%    53.29Ki ± 0%    929.0 ± 0%
ImagePackageCatalogers/rpm-db-cataloger-2                      738.0µ ±  3%    181.4Ki ± 0%    3.992k ± 0%
ImagePackageCatalogers/ruby-gemspec-cataloger-2                1.385m ±  7%    144.1Ki ± 0%    2.447k ± 0%
ImagePackageCatalogers/sbom-cataloger-2                        159.3µ ±  2%    14.20Ki ± 0%    394.0 ± 0%
geomean                                                        664.2µ          100.6Ki         2.062k

spiffcs changed the title from "931: binary cataloger defaults" to "931: binary cataloger exclusion defaults" on Jul 20, 2023
spiffcs self-assigned this on Jul 27, 2023
spiffcs (Contributor, Author) commented Jul 31, 2023

Quick update on this PR: after some discussion, we're going to dial back the feature so that it is not exposed to users yet, and instead just fix the narrower case of a name-to-name overlap where an os package overlaps a binary (synthetic) package.

spiffcs added 2 commits August 7, 2023 15:25
spiffcs force-pushed the 931-binary-cataloger-defaults branch from ace0cb0 to d45458e on August 7, 2023 20:21
spiffcs and others added 6 commits August 7, 2023 16:22
spiffcs marked this pull request as ready for review on August 7, 2023 22:01
Comment thread syft/lib.go
kzantow (Contributor) left a review:

No blockers, but left some feedback.

Comment thread syft/pkg/cataloger/package_exclusions.go Outdated
Comment thread syft/pkg/cataloger/package_exclusions.go Outdated
Comment thread syft/pkg/cataloger/package_exclusions.go Outdated
Comment on lines +15 to +22
type CategoryType string

const (
	OsCatalogerType     CategoryType = "os"
	BinaryCatalogerType CategoryType = "binary"
)

var CatalogerTypeIndex = map[CategoryType][]string{
Contributor:

Could these just be simplified into two variables? Something like:

var parentCatalogerTypes = []string { .... }
var childCatalogerTypes = []string { .... }

spiffcs (Contributor, Author) replied Aug 8, 2023:

Nice! Yeah, that would be a good simplification here.

My only hesitation about changing it back to that is the original config object we had discussed on the issue:
#931 (comment)

I think keeping this as-is has two advantages:

  1. It makes clear to future users/contributors that the OS/binary categorization types were an explicit choice and an additional condition. The parent/child designation loses this nuance a little.
  2. It keeps us open to category-based configuration options we may want to consider in the future.

WDYT?
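A minimal sketch of the category-index approach being defended here, assuming a simple lookup helper. `CategoryType`, the two constants and `CatalogerTypeIndex` come from the diff excerpt above; the map's contents are elided in the excerpt, so the entries below are hypothetical, as is the `inCategory` helper.

```go
package main

import "fmt"

// CategoryType groups catalogers by the kind of packages they produce.
type CategoryType string

const (
	OsCatalogerType     CategoryType = "os"
	BinaryCatalogerType CategoryType = "binary"
)

// CatalogerTypeIndex maps each category to cataloger names.
// These entries are hypothetical; the PR excerpt elides the real contents.
var CatalogerTypeIndex = map[CategoryType][]string{
	OsCatalogerType:     {"apkdb-cataloger", "dpkgdb-cataloger", "rpm-db-cataloger"},
	BinaryCatalogerType: {"binary-cataloger"},
}

// inCategory reports whether the named cataloger belongs to the category.
func inCategory(category CategoryType, catalogerName string) bool {
	for _, name := range CatalogerTypeIndex[category] {
		if name == catalogerName {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(inCategory(OsCatalogerType, "apkdb-cataloger"))     // true
	fmt.Println(inCategory(BinaryCatalogerType, "apkdb-cataloger")) // false
}
```

With this shape, a future category-based option would add another key to the index rather than another top-level variable.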

Contributor:

If the concern is that we want to be explicit about OS and binary cataloger types, these could be named:

var osCatalogerTypes = []string { .... }
var binaryCatalogerTypes = []string { .... }

> keeps us open to category based configuration options we may want to consider in the future

I'm all for forward-thinking, such as being open to more configuration. The suggestion was more that since we're not doing that at the moment, we don't necessarily know what it would look like (although you had an option originally), so it might be better to make those changes when we actually change the feature. Again, this is not a blocker and I'll leave it to your discernment.
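For comparison, a sketch of the flat-slice alternative suggested here. The names `osCatalogerTypes` and `binaryCatalogerTypes` come from the suggestion; their contents are elided there, so the entries are hypothetical, as are the `contains` and `isOSToBinary` helpers.

```go
package main

import "fmt"

// Hypothetical entries; the review suggestion elides the real lists.
var (
	osCatalogerTypes     = []string{"apkdb-cataloger", "dpkgdb-cataloger", "rpm-db-cataloger"}
	binaryCatalogerTypes = []string{"binary-cataloger"}
)

// contains reports whether name appears in list.
func contains(list []string, name string) bool {
	for _, v := range list {
		if v == name {
			return true
		}
	}
	return false
}

// isOSToBinary reports whether a parent/child pair of catalogers is the
// os -> binary combination this PR targets.
func isOSToBinary(parentCataloger, childCataloger string) bool {
	return contains(osCatalogerTypes, parentCataloger) &&
		contains(binaryCatalogerTypes, childCataloger)
}

func main() {
	fmt.Println(isOSToBinary("apkdb-cataloger", "binary-cataloger")) // true
	fmt.Println(isOSToBinary("binary-cataloger", "apkdb-cataloger")) // false
}
```

The check is as explicit about OS and binary catalogers as the map in the diff excerpt, but there is no category type to maintain until a configuration option actually needs one.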

Comment thread syft/pkg/cataloger/package_exclusions.go Outdated
Comment thread test/integration/package_ownership_relationship_test.go Outdated
kzantow (Contributor) commented Aug 8, 2023

One more question I forgot: should this PR include a boolean config option to revert this behavior?

spiffcs added 3 commits August 8, 2023 10:31
spiffcs (Contributor, Author) commented Aug 8, 2023

> One more question I forgot: should this PR include a boolean config option to revert this behavior?

Yeah, good call. This is now added in 58f6d69.

I've made the new exclusion behavior the default, since we've identified that the synthetic binary packages are, in some cases, a mistake. Users can add the following to their config to re-enable the old behavior:

exclude-binary-overlap-by-ownership: false

kzantow (Contributor) left a review:

Note the comment about the change to encode_decode_cycle_test; I think this might be a blocker (and an accidental commit?).

Comment thread syft/pkg/cataloger/rpm/cataloger.go
Comment thread syft/pkg/cataloger/package_exclusions.go Outdated
Comment thread syft/pkg/cataloger/package_exclusions.go Outdated
Comment thread test/integration/encode_decode_cycle_test.go
kzantow (Contributor) left a review:

LGTM 👍, and I definitely agree with making the default behavior, as you noted, to exclude these entries from the SBOM.

spiffcs merged commit 466da7c into main on Aug 8, 2023
spiffcs deleted the 931-binary-cataloger-defaults branch on August 8, 2023 17:00
Comment thread README.md

# allows users to exclude synthetic binary packages from the sbom
# these packages are removed if an overlap with a non-synthetic package is found
exclude-overlap-by-ownership: true

Parallelism int
}

func DefaultConfig() Config {
Contributor:

Why delete the DefaultConfig method?

Contributor (Author):

This function was only used in *_test.go files, so it was moved here:

func defaultConfig() cataloger.Config {
	return cataloger.Config{
		Search:                          cataloger.DefaultSearchConfig(),
		Parallelism:                     1,
		LinuxKernel:                     kernel.DefaultLinuxCatalogerConfig(),
		Python:                          python.DefaultCatalogerConfig(),
		ExcludeBinaryOverlapByOwnership: true,
	}
}

Apologies for the boy-scout change on an unrelated PR: my IDE kept flagging this as dead code and I could not figure out why. Moving it into the test code resolved that issue.

// 3) the child is a synthetic package generated by the binary cataloger
// 4) the package names are identical
// This exclude was implemented as a way to help resolve: https://github.com/anchore/syft/issues/931
func Exclude(r artifact.Relationship, c *pkg.Collection) bool {
Contributor:

This function seems very specific but has a very generic name. I think the name should be tweaked to be a little more specific.

)

var (
osCatalogerTypes = []string{
Contributor:

I think the filtering should be based on the package type, not the cataloger names.
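A sketch of what type-based filtering could look like, assuming the caller has already restricted itself to OwnershipByFileOverlap relationships. The `PackageType` values mirror the TYPE column of the scan output (`apk`, `binary`) and the OS ecosystems listed in the PR description, with the exact strings assumed; the constants, `osPackageTypes`, `Package` and `excludeByType` are hypothetical local definitions rather than imports from syft.

```go
package main

import "fmt"

// PackageType is a local stand-in; the string values are assumed to match
// the package types syft reports.
type PackageType string

const (
	ApkPkg    PackageType = "apk"
	AlpmPkg   PackageType = "alpm"
	DebPkg    PackageType = "deb"
	NixPkg    PackageType = "nix"
	RpmPkg    PackageType = "rpm"
	BinaryPkg PackageType = "binary"
)

// osPackageTypes is keyed by package type instead of cataloger name.
var osPackageTypes = map[PackageType]bool{
	ApkPkg: true, AlpmPkg: true, DebPkg: true, NixPkg: true, RpmPkg: true,
}

// Package is a hypothetical, simplified package with only the fields needed here.
type Package struct {
	Name string
	Type PackageType
}

// excludeByType reports whether child is a synthetic binary duplicate of an
// OS package. The caller is assumed to pass only file-ownership overlaps.
func excludeByType(parent, child Package) bool {
	return osPackageTypes[parent.Type] &&
		child.Type == BinaryPkg &&
		parent.Name == child.Name
}

func main() {
	parent := Package{Name: "busybox", Type: ApkPkg}
	child := Package{Name: "busybox", Type: BinaryPkg}
	fmt.Println(excludeByType(parent, child)) // true
}
```

Keying on package type means a renamed or newly added cataloger that emits, say, `rpm` packages is still covered without updating a list of cataloger names.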

spiffcs mentioned this pull request on Aug 9, 2023
GijsCalis pushed a commit to GijsCalis/syft that referenced this pull request Feb 19, 2024
…chore#1948)

Fixes anchore#931

PR anchore#1948 introduces a new implicit exclusion for binary packages that overlap by file ownership and have certain characteristics:

1) the relationship between packages is OwnershipByFileOverlap
2) the parent package is an "os" package - see changelog for included catalogers
3) the child is a synthetic package generated by the binary cataloger - see changelog for included catalogers
4) the package names are identical

---------

Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Successfully merging this pull request may close these issues.

Package duplicated by different cataloger

3 participants