-
Notifications
You must be signed in to change notification settings - Fork 595
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose option to scan without indexing #3360
base: main
Are you sure you want to change the base?
Expose option to scan without indexing #3360
Conversation
8a76ec3
to
fb1ca8b
Compare
- Implements some new functions on unindexed directory which previously had 'panic' for their impl - Fixed RelativeFileByPath for unindexed directory resolver Signed-off-by: adammcclenaghan <[email protected]>
d913786
to
019df1c
Compare
@@ -228,6 +229,9 @@ func (cfg *Catalog) AddFlags(flags clio.FlagSet) { | |||
|
|||
flags.StringVarP(&cfg.Source.BasePath, "base-path", "", | |||
"base directory for scanning, no links will be followed above this directory, and all paths will be reported relative to this directory") | |||
|
|||
flags.BoolVarP(&cfg.Unindexed, "unindexed", "", | |||
"whether to index the file system or not, indexing can improve scan times but incurs a memory overhead, default false.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we expose a CLI option for this? I feel like this is more of a low-level option, which hints at leaving it only in the config (and overridable via env vars too).
if err != nil { | ||
return nil, fmt.Errorf("unable to create directory resolver: %w", err) | ||
if s.config.Unindexed { | ||
s.resolver = fileresolver.NewFromRootedUnindexedDirectory(s.config.Path, s.config.Base) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll need to modify this resolver to accept exclusions -- I think it would be surprising to users to have different exclusion behavior based off of changing only an index related configuration. If this is too difficult or infeasible then we should at least warn the user with log.Warn()
.
@@ -10,20 +10,22 @@ import ( | |||
"github.com/anchore/syft/syft/source" | |||
) | |||
|
|||
func NewSourceProvider(path string, exclude source.ExcludeConfig, alias source.Alias, basePath string) source.Provider { | |||
func NewSourceProvider(path string, exclude source.ExcludeConfig, alias source.Alias, basePath string, unindexed bool) source.Provider { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're in v1.0 would this be considered a breaking change for user's who are using this method for syft as a library?
@anchore/tools before we go to merge this sans other comments is this something we want to put more 👀 on before merge, release, and then potentially break some people's usage?
Description
There was some discussion recently regarding supporting scanning without indexing in this issue
In that issue, the main complaint is related to the time taken to scan.
The most promising solution presented was:
However, on memory constrained systems with large disks, file system indexing can use considerable memory.
I am raising this PR as I'd like to propose exposing the unindexed resolver, as a form of preferring slower scan times in favour of lower memory usage.
As a result, I would like to propose getting this change in without the added complexity of
Allow the user to opt into non-** globs
. Perhaps adding that is a reasonable next step as part of a follow up PR?This partially addresses #3145 but does not necessarily fix as it will not improve their scan times, however it will allow for reduced memory imapct on memory constrained systems.
As part of this change, I've changed the behaviour of
RelativeFileByPath
and added a unit test, the existing implementation didn't seem to work.When
package.go::fetchMd5Contents
ran with the unindexed resolver it did not resolve correctly because thelocation
field passed into the function points at the DPKGstatus
file, so the existing code which tried to dopath.Clean(path.Join(l.RealPath, p))
wound up trying to join something like/var/lib/dpkg/status
with/info
to search/var/ib/dpkg/status/info
which is incorrect as we want to search in/var/lib/dpkg/info
.One other thing I've noticed while testing this change is that I get more 'location' files for DPKG artifacts when scanning with the unindexed resolver than with the indexed resolver, however I'm not sure this is a bug with the unindexed resolver, perhaps there is a bug with the existing indexed resolver?
With the indexed resolver, I get these locations for one of my DPKG artifacts:
With the unindexed resolver, I get these locations for the same DPKG artifact:
The difference in behaviour relates to
parse_dpkg_db.go::findDpkgInfoFiles
and this call in particular:Here we pass something like:
/var/lib/dpkg/info/
acpid.*
Giving us a
FilesByGlob
path ofvar/lib/dpkg/info/acpid.*
On my system, I believe we should be matching all of these:
However, with the indexed directory resolver the pattern
var/lib/dpkg/info/acpid.*
gets converted in ther.chroot.ToNativeGlob(pattern)
call to/var/lib/dpkg/info/acpid./*
which returns no matches. This explains why the indexed resolver doesn't find locations likeacpid.prerm
in the output. Themd5sums
andconffiles
are found bypackage.go::fetchMd5Contents
andpackage.go::fetchConffileContents
respectively and they useRelativeFileByPath
which explains why they still work.Maybe I'm unclear on what
FilesByGlob
should be returning here, and perhaps the issue is with the existing unindexed_directory impl, but I'd like to get some clarification on what we expect the behaviour to be here.Type of change
Checklist: