Contributor

@shahzadhaider1 shahzadhaider1 commented Aug 12, 2025

Description:

This PR fixes an issue with scanning private Git repositories (GitHub/GitLab) when using the legacy JSON output format.

Previously, the scan process attempted to clone the repository a second time during result printing. Since authentication was not passed along in this second clone, the process failed for private repositories. As a result, legacy JSON output could not be generated for private Git repositories.

Fix

  • Repositories are now persisted after the initial scan instead of being deleted immediately.
  • When generating legacy JSON output, the scan reuses the already cloned repositories, eliminating the need for a second clone and preventing authentication errors.
  • After results are printed, we perform a cleanup step to remove any cloned repositories and free up system resources.

Implementation Details

  • Added a new flag: PrintLegacyJSON.
  • Sources check this flag; if it is true, they skip immediate cleanup after cloning/scanning.
  • In main.go, if the --json-legacy flag is set, cleanup is performed at the end, after the results have been printed.
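
A minimal sketch of the deferred-cleanup idea described above, not the code from this PR. Only the PrintLegacyJSON name comes from the description; scanConfig, scanRepo, and cleanup are hypothetical names, and the clone/scan step is stubbed out with an empty temp directory.

```go
package main

import (
	"fmt"
	"os"
)

// scanConfig is a hypothetical stand-in for a source's settings.
// Only the PrintLegacyJSON name is taken from the PR description.
type scanConfig struct {
	PrintLegacyJSON bool
}

// scanRepo pretends to clone into a fresh temp dir and scan it.
// When PrintLegacyJSON is set, the clone is left on disk and its path is
// returned so the caller can reuse it for printing and remove it later.
func scanRepo(cfg scanConfig) (string, error) {
	dir, err := os.MkdirTemp("", "trufflehog-sketch-")
	if err != nil {
		return "", fmt.Errorf("create clone dir: %w", err)
	}
	// ... the real clone and scan would happen here ...
	if !cfg.PrintLegacyJSON {
		// Nothing needs the clone after scanning, so remove it right away.
		if err := os.RemoveAll(dir); err != nil {
			return "", fmt.Errorf("remove clone dir: %w", err)
		}
		return "", nil
	}
	return dir, nil
}

// cleanup removes a persisted clone; an empty path means nothing was kept.
func cleanup(dir string) error {
	if dir == "" {
		return nil
	}
	return os.RemoveAll(dir)
}

func main() {
	kept, err := scanRepo(scanConfig{PrintLegacyJSON: true})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Legacy output is printed from the existing clone; no second clone.
	fmt.Println("printing legacy JSON using clone at", kept)
	// Cleanup runs last, once the results are out.
	if err := cleanup(kept); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

The point of the split is that the source decides whether to keep the clone, while the caller that prints results owns the final removal.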

Benefits

  • Legacy JSON output now works reliably for both public and private GitHub/GitLab repositories.
  • Avoids redundant cloning, which improves performance.
  • Ensures proper cleanup after scan results are printed.

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint; this requires golangci-lint)?
source-legacy-json-bug

@shahzadhaider1 shahzadhaider1 force-pushed the fix/OSS-283-legacy-json-gitlab-private-repo branch from c8cbe27 to dda5192 Compare August 12, 2025 14:13
@shahzadhaider1 shahzadhaider1 force-pushed the fix/OSS-283-legacy-json-gitlab-private-repo branch from 34e2614 to d9633a4 Compare August 20, 2025 15:36
@shahzadhaider1 shahzadhaider1 force-pushed the fix/OSS-283-legacy-json-gitlab-private-repo branch from 1c8ce6b to bbedf6c Compare August 21, 2025 07:30
@shahzadhaider1 shahzadhaider1 force-pushed the fix/OSS-283-legacy-json-gitlab-private-repo branch from 558cdda to e8b1127 Compare August 22, 2025 11:04
@shahzadhaider1 shahzadhaider1 self-assigned this Aug 22, 2025
@shahzadhaider1 shahzadhaider1 marked this pull request as ready for review August 22, 2025 11:16
@shahzadhaider1 shahzadhaider1 requested review from a team as code owners August 22, 2025 11:16
@kashifkhan0771 kashifkhan0771 changed the title from "fix legacy json flag for gitlab private repos" to "Fix legacy json flag for Github and Gitlab private repos" Aug 22, 2025
main.go Outdated
Comment on lines 712 to 723
tmpDir := filepath.Join(os.TempDir(), "trufflehog_"+strconv.Itoa(os.Getpid()))
// We need to persist the repo(s) if we're using legacy JSON output
// because it requires commit SHAs in the output.
persistRepo := *gitNoCleanup || *githubNoCleanup || *gitlabNoCleanup
if *jsonLegacy && !persistRepo {
    if err := os.MkdirAll(tmpDir, os.ModePerm); err != nil {
        return scanMetrics, fmt.Errorf("failed to create temporary directory: %v", err)
    }
    *gitNoCleanup, *githubNoCleanup, *gitlabNoCleanup = true, true, true
    *gitClonePath, *githubClonePath, *gitlabClonePath = tmpDir, tmpDir, tmpDir
}

Contributor

I have a few concerns about this approach:

  • First, we shouldn't be setting flags for sources that aren't being run. For example, if I'm only using the git source, this approach still sets flags for gitlab and github too. I haven't tested whether that causes issues, but it doesn't seem like a good idea. A cleaner solution would be to move this logic into each source block so that we only set flags relevant to the source that's actually running.

  • Second, why are we creating a temp directory here? Let's say a user sets the --clone-path flag but doesn't set --no-cleanup. That means they want to use a custom path but don’t need to retain the repo afterward. However, the code hits the if condition at line 716, overwrites their --clone-path with a temp path, and clones the repo there instead. So we're kind of misleading the user here. They think we’re cloning to the path they specified, but we silently use a temp path behind the scenes which is not great.

  • Third, if someone passes both --no-cleanup and --clone-path, then (as far as I can tell) this logic doesn't apply at all. So what happens in that case? Do we run into the same error again?

Right now, the only scenario where this seems to work cleanly is a basic scan with neither --clone-path nor --no-cleanup set. In that case we already clone to a temp path by default and clean up afterwards.

Contributor Author

> First, we shouldn't be setting flags for sources that aren't being run. For example, if I'm only using the git source, this approach still sets flags for gitlab and github too. I haven't tested whether that causes issues, but it doesn't seem like a good idea. A cleaner solution would be to move this logic into each source block so that we only set flags relevant to the source that's actually running.

I initially combined them into one block to avoid repeating if checks in each source block, but I’ve now updated the code to handle them separately.

> Second, why are we creating a temp directory here? Let's say a user sets the --clone-path flag but doesn't set --no-cleanup. That means they want to use a custom path but don’t need to retain the repo afterward. However, the code hits the if condition at line 716, overwrites their --clone-path with a temp path, and clones the repo there instead. So we're kind of misleading the user here. They think we’re cloning to the path they specified, but we silently use a temp path behind the scenes which is not great.

Nice catch. I wasn't fully aware of the --clone-path functionality. I'll address this.

> Third, if someone passes both --no-cleanup and --clone-path, then (as far as I can tell) this logic doesn't apply at all. So what happens in that case? Do we run into the same error again?

That’s the expected behavior. If both --no-cleanup and --clone-path are set, there’s nothing additional we need to do; we simply rely on the provided clone path and use it for printing results.
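
One possible reading of the three points in this thread, sketched as a small per-source helper. This is an illustration under assumptions, not the updated implementation: cloneOptions and resolveForLegacyJSON are hypothetical names, and the rule shown (keep the user's --clone-path, fall back to a temp directory only when none was given, delete afterwards unless --no-cleanup was set) is inferred from the discussion.

```go
package main

import "fmt"

// cloneOptions is a hypothetical bundle of one source's clone flags.
type cloneOptions struct {
	ClonePath string // value of --clone-path, "" when unset
	NoCleanup bool   // value of --no-cleanup
}

// resolveForLegacyJSON returns the directory to clone into and whether the
// caller should delete it once legacy JSON results have been printed.
// It is meant to be called only for the source that is actually running.
func resolveForLegacyJSON(opts cloneOptions, tmpDir string) (dir string, cleanupAfterPrint bool) {
	dir = opts.ClonePath
	if dir == "" {
		// Fall back to a temp dir only when the user gave no path,
		// so a user-supplied path is never silently replaced.
		dir = tmpDir
	}
	// If the user asked to keep the clone, it is never deleted.
	return dir, !opts.NoCleanup
}

func main() {
	tmp := "/tmp/trufflehog_example" // placeholder temp dir for the demo
	cases := []cloneOptions{
		{},                                        // neither flag set
		{ClonePath: "/data/clones"},               // custom path, no retention
		{ClonePath: "/data/clones", NoCleanup: true}, // both flags set
	}
	for _, c := range cases {
		dir, cleanup := resolveForLegacyJSON(c, tmp)
		fmt.Printf("opts=%+v dir=%s cleanupAfterPrint=%v\n", c, dir, cleanup)
	}
}
```

Keeping the decision in one function per source avoids setting flags for sources that are not part of the scan, which was the first concern raised.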

Contributor Author

Thanks for your initial review. I’ve updated the implementation based on your feedback and included a list of the tests I ran along with the observed behavior. Please take a look when you get a chance.

@shahzadhaider1 shahzadhaider1 force-pushed the fix/OSS-283-legacy-json-gitlab-private-repo branch from f7731b9 to d7098d0 Compare August 26, 2025 11:07
Contributor

@camgunz camgunz left a comment

Just a Q about source metadata--will ping in Slack

@@ -99,6 +99,7 @@ message Git {
  string repository = 4;
  string timestamp = 5;
  int64 line = 6;
  string repository_local_path = 7;
Contributor

What's your thinking behind putting this info in the source metadata? My read is that we don't super need it for secrets so like, we could just not do it and save a fair amount of code.

Contributor

We have to do this so the legacy printer on the detector side can print the right thing.
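
To illustrate why a local path in the source metadata helps the printer, here is a rough sketch. It assumes the printer prefers an existing clone when one is recorded and only falls back to the remote URL otherwise; gitMetadata and repoLocation are hypothetical names, and only the repository and repository_local_path fields come from the diff above.

```go
package main

import "fmt"

// gitMetadata is a hypothetical, trimmed-down mirror of the Git source
// metadata message; the two fields match names shown in the diff.
type gitMetadata struct {
	Repository          string // remote URL
	RepositoryLocalPath string // path of the persisted clone, "" if none
}

// repoLocation reports where a printer could read the repository from and
// whether it would have to clone first.
func repoLocation(m gitMetadata) (location string, needsClone bool) {
	if m.RepositoryLocalPath != "" {
		// A clone already exists on disk, so no credentials are needed.
		return m.RepositoryLocalPath, false
	}
	return m.Repository, true
}

func main() {
	withPath := gitMetadata{
		Repository:          "https://example.com/org/private.git",
		RepositoryLocalPath: "/tmp/trufflehog_example/private",
	}
	loc, clone := repoLocation(withPath)
	fmt.Println(loc, clone)

	loc, clone = repoLocation(gitMetadata{Repository: "https://example.com/org/public.git"})
	fmt.Println(loc, clone)
}
```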

Comment on lines +740 to +743
if !s.conn.GetPrintLegacyJson() {
    if strings.HasPrefix(path, filepath.Join(os.TempDir(), "trufflehog")) || (!s.conn.NoCleanup && s.conn.GetClonePath() != "") {
        defer os.RemoveAll(path)
    }
Contributor

Eek, I didn't realize there was so much duplication between the git sources. OK that goes on the list haha 📝

(to be clear, nothing to change here, just remarking)
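
Since the same cleanup condition is repeated across the git sources, one way to picture deduplicating it is a shared predicate. The sketch below restates the condition from the snippet in this thread as a pure function; shouldRemoveClone, defaultTempClone, and the parameter names are hypothetical, and in the PR the values come from s.conn.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// shouldRemoveClone restates the reviewed condition: never remove when
// legacy JSON printing still needs the clone; otherwise remove clones under
// the default temp prefix, or clones in a user path when cleanup is allowed.
func shouldRemoveClone(path string, printLegacyJSON, noCleanup bool, clonePath string) bool {
	if printLegacyJSON {
		return false
	}
	inDefaultTemp := strings.HasPrefix(path, filepath.Join(os.TempDir(), "trufflehog"))
	return inDefaultTemp || (!noCleanup && clonePath != "")
}

// defaultTempClone builds an example path under the default temp prefix.
func defaultTempClone() string {
	return filepath.Join(os.TempDir(), "trufflehog_1234", "repo")
}

func main() {
	fmt.Println(shouldRemoveClone(defaultTempClone(), false, false, ""))
	fmt.Println(shouldRemoveClone(defaultTempClone(), true, false, ""))
	fmt.Println(shouldRemoveClone("/data/clones/repo", false, true, "/data/clones"))
}
```

A function like this could be unit-tested once and called from each source, instead of keeping several copies of the condition in sync.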

Contributor

@camgunz camgunz left a comment

OK looks good! If it makes sense you can do this thing, but totally up to you

@shahzadhaider1
Contributor Author

@kashifkhan0771 please re-review.

Contributor

@kashifkhan0771 kashifkhan0771 left a comment

LGTM! Awesome work ❤️

@kashifkhan0771
Contributor

kashifkhan0771 commented Sep 3, 2025

  • Make sure to test the changes with both Public/Private repositories for all git sources.

@shahzadhaider1 shahzadhaider1 merged commit 2114e77 into trufflesecurity:main Sep 3, 2025
16 of 19 checks passed
@shahzadhaider1 shahzadhaider1 deleted the fix/OSS-283-legacy-json-gitlab-private-repo branch September 3, 2025 06:19
3 participants