-
Notifications
You must be signed in to change notification settings - Fork 2k
Fix legacy json flag for Github and Gitlab private repos #4386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix legacy json flag for Github and Gitlab private repos #4386
Conversation
c8cbe27
to
dda5192
Compare
34e2614
to
d9633a4
Compare
1c8ce6b
to
bbedf6c
Compare
558cdda
to
e8b1127
Compare
main.go
Outdated
tmpDir := filepath.Join(os.TempDir(), "trufflehog_"+strconv.Itoa(os.Getpid())) | ||
// We need to persist the repo(s) if we're using legacy JSON output | ||
// because it requires commit SHAs in the output. | ||
persistRepo := *gitNoCleanup || *githubNoCleanup || *gitlabNoCleanup | ||
if *jsonLegacy && !persistRepo { | ||
if err := os.MkdirAll(tmpDir, os.ModePerm); err != nil { | ||
return scanMetrics, fmt.Errorf("failed to create temporary directory: %v", err) | ||
} | ||
*gitNoCleanup, *githubNoCleanup, *gitlabNoCleanup = true, true, true | ||
*gitClonePath, *githubClonePath, *gitlabClonePath = tmpDir, tmpDir, tmpDir | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few concerns about this approach:
-
First, we shouldn't be setting flags for sources that aren't being run. For example, if I'm only using the
git
source, this approach still sets flags forgitlab
andgithub
too. I haven't tested whether that causes issues, but it doesn't seem like a good idea. A cleaner solution would be to move this logic into each source block so that we only set flags relevant to the source that's actually running. -
Second, why are we creating a temp directory here? Let's say a user sets the
--clone-path
flag but doesn't set--no-cleanup
. That means they want to use a custom path but don’t need to retain the repo afterward. However, the code hits theif
condition at line 716, overwrites their--clone-path
with a temp path, and clones the repo there instead. So we're kind of misleading the user here. They think we’re cloning to the path they specified, but we silently use a temp path behind the scenes which is not great. -
Third, if someone passes both
--no-cleanup
and--clone-path
, then (as far as I can tell) this logic doesn't apply at all. So what happens in that case? Do we run into the same error again?
Right now, the only scenario where this seems to work cleanly is a basic scan with no --clone-path
and no --no-cleanup
set and in that approach we by default use temp path to clone repository and cleanup afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, we shouldn't be setting flags for sources that aren't being run. For example, if I'm only using the
git
source, this approach still sets flags forgitlab
andgithub
too. I haven't tested whether that causes issues, but it doesn't seem like a good idea. A cleaner solution would be to move this logic into each source block so that we only set flags relevant to the source that's actually running.
I initially combined them into one block to avoid repeating if checks in each source block, but I’ve now updated the code to handle them separately.
Second, why are we creating a temp directory here? Let's say a user sets the
--clone-path
flag but doesn't set--no-cleanup
. That means they want to use a custom path but don’t need to retain the repo afterward. However, the code hits theif
condition at line 716, overwrites their--clone-path
with a temp path, and clones the repo there instead. So we're kind of misleading the user here. They think we’re cloning to the path they specified, but we silently use a temp path behind the scenes which is not great.
Nice catch. I wasn't fully aware of the --clone-path
functionality. I'll address this.
Third, if someone passes both
--no-cleanup
and--clone-path
, then (as far as I can tell) this logic doesn't apply at all. So what happens in that case? Do we run into the same error again?
That’s the expected behavior. If both --no-cleanup
and --clone-path
are set, there’s nothing additional we need to do, we simply rely on the provided clone path and use it for printing results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your initial review. I’ve updated the implementation based on your feedback and included a list of the tests I ran along with the observed behavior. Please take a look when you get a chance
f7731b9
to
d7098d0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a Q about source metadata--will ping in Slack
@@ -99,6 +99,7 @@ message Git { | |||
string repository = 4; | |||
string timestamp = 5; | |||
int64 line = 6; | |||
string repository_local_path = 7; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's your thinking behind putting this info in the source metadata? My read is that we don't super need it for secrets so like, we could just not do it and save a fair amount of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to do this so the legacy printer on the detector side can print the right thing
if !s.conn.GetPrintLegacyJson() { | ||
if strings.HasPrefix(path, filepath.Join(os.TempDir(), "trufflehog")) || (!s.conn.NoCleanup && s.conn.GetClonePath() != "") { | ||
defer os.RemoveAll(path) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eek, I didn't realize there was so much duplication between the git sources. OK that goes on the list haha 📝
(to be clear, nothing to change here, just remarking)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK looks good! If it makes sense you can do this thing, but totally up to you
@kashifkhan0771 please re-review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Awesome work ❤️
|
Description:
This PR fixes an issue with scanning private Git repositories (GitHub/GitLab) when using the legacy JSON output format.
Previously, the scan process attempted to clone the repository a second time during result printing. Since authentication was not passed along in this second clone, the process failed for private repositories. As a result, legacy JSON output could not be generated for private Git repositories.
Fix
Implementation Details
PrintLegacyJSON
.main.go
, if the--jsonLegacy
flag is set, cleanup is performed at the end.Benefits
Checklist:
make test-community
)?make lint
this requires golangci-lint)?