Commit 27ab668

Merge pull request gitleaks#48 from zricethezav/develop
Develop
2 parents 73387cb + 87ee13f commit 27ab668

165 files changed

+45208
-206
lines changed

CHANGELOG.md

+26
@@ -0,0 +1,26 @@
+CHANGELOG
+=========
+
+0.2.0
+-----
+Version 0.2.0 of Gitleaks is the first version update since this got relatively popular. Based on the issues raised, it seems that folks want better support for integration into their pipelines. I hear ya. This is what this update tries to provide. So... what are the changes?
+
+* Additional regex checks
+* $HOME/.gitleaks/ directory for clones and reports
+* Clone into temp dir option
+* Persistent repos for Orgs and Users (no more re-cloning)
+* Pagination for Org/User list... no more partial repo lists
+* Since commit option
+* Updated README
+* Multi-stage Docker build
+* Travis CI
+
+
+0.1.0
+-----
+
+Version 0.1.0 of Gitleaks demonstrates:
+
+* full git history search
+* regex/entropy checks
+* report generation

Gopkg.lock

+65
Some generated files are not rendered by default.

Gopkg.toml

+42
@@ -0,0 +1,42 @@
+# Gopkg.toml example
+#
+# Refer to https://golang.github.io/dep/docs/Gopkg.toml.html
+# for detailed Gopkg.toml documentation.
+#
+# required = ["github.com/user/thing/cmd/thing"]
+# ignored = ["github.com/user/project/pkgX", "bitbucket.org/user/project/pkgA/pkgY"]
+#
+# [[constraint]]
+#   name = "github.com/user/project"
+#   version = "1.0.0"
+#
+# [[constraint]]
+#   name = "github.com/user/project2"
+#   branch = "dev"
+#   source = "github.com/myfork/project2"
+#
+# [[override]]
+#   name = "github.com/x/y"
+#   version = "2.4.0"
+#
+# [prune]
+#   non-go = false
+#   go-tests = true
+#   unused-packages = true
+
+
+[[constraint]]
+  name = "github.com/google/go-github"
+  version = "15.0.0"
+
+[[constraint]]
+  branch = "master"
+  name = "github.com/mitchellh/go-homedir"
+
+[[constraint]]
+  branch = "master"
+  name = "golang.org/x/oauth2"
+
+[prune]
+  go-tests = true
+  unused-packages = true

README.md

+65-25
@@ -5,11 +5,6 @@
 
 ## Check git repos for secrets and keys
 
-### Features
-
-* Search all commits on all branches in topological order
-* Regex/Entropy checks
-
 #### Installing
 
 ```bash
@@ -24,34 +19,80 @@ go get -u github.com/zricethezav/gitleaks
 ./gitleaks {git url}
 ```
 
-This example will clone the target `{git url}` and run a diff on all commits. A report will be outputted to `{repo_name}_leaks.json`
-Gitleaks scans all lines of all commits and checks if there are any regular expression matches. The regexs are defined in `main.go`. Work largely based on [https://people.eecs.berkeley.edu/~rohanpadhye/files/key_leaks-msr15.pdf](https://people.eecs.berkeley.edu/~rohanpadhye/files/key_leaks-msr15.pdf) and regexes from https://github.com/dxa4481/truffleHog and https://github.com/anshumanbh/git-all-secrets.
-
-##### gitLeaks User
-```bash
-./gitleaks -u {user git url}
+Gitleaks will clone the target `<git url>` to `$HOME/.gitleaks/clones/<owner>/<repo name>` and run a regex check against all diffs of all commits on all remotes in topological order. If any leaks are found, gitleaks will output each leak as JSON. Example:
 ```
-##### gitLeaks Org
+{
+  "line": "-const AWS_KEY = \"AKIALALEMEL33243OLIAE\"",
+  "commit": "eaeffdc65b4c73ccb67e75d96bd8743be2c85973",
+  "string": "AKIALALEMEL33243OLIA",
+  "reason": "AWS",
+  "commitMsg": "remove fake key",
+  "time": "2018-02-04 19:43:28 -0600",
+  "author": "Zachary Rice",
+  "file": "main.go",
+  "repoURL": "https://github.com/zricethezav/gronit"
+}
+```
+Gitleaks will not re-clone repos unless the temporary flag is set (see the Options section); instead, gitleaks will `fetch` all new changes before the scan. This works for user and organization repos as well. Regexes for the scan are defined in `main.go`; feel free to open a PR if you have additional regexes you want included. Work largely based on [https://people.eecs.berkeley.edu/~rohanpadhye/files/key_leaks-msr15.pdf](https://people.eecs.berkeley.edu/~rohanpadhye/files/key_leaks-msr15.pdf) and regexes from https://github.com/dxa4481/truffleHog and https://github.com/anshumanbh/git-all-secrets.
+
+#### Example with Report
 ```bash
-./gitleaks -o {org git url}
+gitleaks --json https://github.com/zricethezav/gronit
+```
+This will run gitleaks on one of my projects, gronit, and create the following structure in `$HOME/.gitleaks`:
 ```
+.
+├── clones
+│   └── zricethezav
+│       └── gronit
+│           ├── README.md
+│           ├── main.go
+│           ├── options.go
+│           ├── server.go
+│           └── utils.go
+└── report
+    └── zricethezav
+        └── gronit_leaks.json
+```
+The clones directory contains the repo owner (me) and any repos gitleaks has scanned. The next time we run gitleaks on gronit, it will `fetch` gronit rather than `clone` it. Reports are written to `$HOME/.gitleaks/report/<owner>/<repo>_leaks.json`.
 
-#### Help
+#### Options
 ```
 usage: gitleaks [options] <url>
 
 Options:
- -c                Concurrency factor (default is 10)
- -u --user         Git user url
- -r --repo         Git repo url
- -o --org          Git organization url
- -s --since        Scan until this commit (SHA)
- -b --b64Entropy   Base64 entropy cutoff (default is 70)
- -x --hexEntropy   Hex entropy cutoff (default is 40)
- -e --entropy      Enable entropy
-    --strict       Enables stopwords
- -h --help         Display this message
+ -c --concurrency  Upper bound on concurrent diffs
+ -u --user         Git user url
+ -r --repo         Git repo url
+ -o --org          Git organization url
+ -s --since        Commit to stop at
+ -b --b64Entropy   Base64 entropy cutoff (default is 70)
+ -x --hexEntropy   Hex entropy cutoff (default is 40)
+ -e --entropy      Enable entropy
+ -j --json         Output gitleaks report
+    --token        Github API token
+    --strict       Enables stopwords
+ -h --help         Display this message
+
 ```
+
+##### Options Explained
+
+| Option | Explanation |
+| ------------- | ------------- |
+| -c --concurrency | Set the limit on the number of concurrent diffs. If unbounded, your system would throw a `too many open files` error. Tweak `ulimit` for quicker scans at your own risk. Ex: `gitleaks -c 100 <repo_url>` |
+| -u --user | Target git user. Reports and clones are dumped to `$HOME/.gitleaks/clones/<user>/<user_repos>` and `$HOME/.gitleaks/reports/<user>/<gitleaks_reports>`. Ex: `gitleaks -u <user_git_url>` |
+| -o --org | Target git organization. Reports and clones are dumped to `$HOME/.gitleaks/clones/<org>/<org_repos>` and `$HOME/.gitleaks/reports/<org>/<gitleaks_reports>`. Ex: `gitleaks -o <org_git_url>` |
+| -r --repo | Target git repo. Targeting a specific repo is the default behavior, so this option is unnecessary. Reports and clones are dumped to `$HOME/.gitleaks/clones/<owner>/<repos>` and `$HOME/.gitleaks/reports/<owner>/<gitleaks_reports>` |
+| -s --since | Accepts a commit hash and scans the repo history up to and including this hash. Ex: `gitleaks -s <HASH> <repo_url>` |
+| -b --b64Entropy | Entropy cutoff for base 64 characters. Ex: `gitleaks -e -b 70 <repo_url>` |
+| -x --hexEntropy | Entropy cutoff for hex characters. Ex: `gitleaks -e -x 70 <repo_url>` |
+| -e --entropy | Enable entropy checks. Ex: `gitleaks -e <repo_url>` |
+| -j --json | Enable report generation. Ex: `gitleaks --json <repo_url>` |
+| -t --temporary | Repos will be cloned into a temp directory and removed after gitleaks exits. Ex: `gitleaks -t <repo_url>` |
+| --token | GitHub API token, needed for scanning private repos and for pagination when fetching repo lists from GitHub's API. NOTE: you should use the env var `GITHUB_TOKEN` instead of this flag. |
+| --strict | Enable stopwords. Ex: `gitleaks --strict <repo_url>` |
+
 NOTE: your mileage may vary so if you aren't getting the results you expected try updating the regexes to fit your needs or try tweaking the entropy cutoffs and stopwords. Entropy cutoff for base64 alphabets seemed to give good results around 70 and hex alphabets seemed to give good results around 40. Entropy is calculated using [Shannon entropy](http://www.bearcave.com/misl/misl_tech/wavelets/compression/shannon.html).
 
 
@@ -69,4 +110,3 @@ docker build -t gitleaks .
 docker run --rm --name=gitleaks gitleaks https://github.com/zricethezav/gitleaks
 ```
 
-
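Reviewer note: the README change above documents the JSON shape of a reported leak. For anyone consuming that output in a pipeline, a minimal Go sketch of decoding one entry might look like the following. The `Leak` type, its field names, and `parseLeak` are hypothetical; only the JSON keys are taken from the README sample, and whether the report file holds a single object or a list of them is not shown in this diff.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Leak is a hypothetical type whose JSON tags mirror the keys of the
// sample entry in the README. It is not gitleaks' own LeakElem definition.
type Leak struct {
	Line      string `json:"line"`
	Commit    string `json:"commit"`
	Offender  string `json:"string"`
	Reason    string `json:"reason"`
	CommitMsg string `json:"commitMsg"`
	Time      string `json:"time"`
	Author    string `json:"author"`
	File      string `json:"file"`
	RepoURL   string `json:"repoURL"`
}

// parseLeak decodes a single leak entry from its JSON form.
func parseLeak(data []byte) (Leak, error) {
	var l Leak
	err := json.Unmarshal(data, &l)
	return l, err
}

func main() {
	// Sample entry copied from the README text in this commit.
	sample := []byte(`{
  "line": "-const AWS_KEY = \"AKIALALEMEL33243OLIAE\"",
  "commit": "eaeffdc65b4c73ccb67e75d96bd8743be2c85973",
  "string": "AKIALALEMEL33243OLIA",
  "reason": "AWS",
  "commitMsg": "remove fake key",
  "time": "2018-02-04 19:43:28 -0600",
  "author": "Zachary Rice",
  "file": "main.go",
  "repoURL": "https://github.com/zricethezav/gronit"
}`)

	leak, err := parseLeak(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%s leak in %s at commit %s\n", leak.Reason, leak.File, leak.Commit)
}
```

Keeping `time` as a plain string avoids guessing at a layout for `time.Parse`; a consumer that needs real timestamps would have to confirm the format first.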

checks.go

+26-8
@@ -1,32 +1,50 @@
 package main
 
 import (
+	_ "fmt"
 	"math"
 	"strings"
 )
 
+// TODO LOCAL REPO!!!!
+
 // checks Regex and if enabled, entropy and stopwords
-func doChecks(diff string, commit string) []LeakElem {
-	var match string
-	var leaks []LeakElem
-	var leak LeakElem
+func doChecks(diff string, commit Commit, opts *Options, repo RepoDesc) []LeakElem {
+	var (
+		match string
+		leaks []LeakElem
+		leak  LeakElem
+	)
+
 	lines := strings.Split(diff, "\n")
+	file := "unable to determine file"
 	for _, line := range lines {
+		if strings.Contains(line, "diff --git a") {
+			idx := fileDiffRegex.FindStringIndex(line)
+			if len(idx) == 2 {
+				file = line[idx[1]:]
+			}
+		}
+
 		for leakType, re := range regexes {
 			match = re.FindString(line)
 			if len(match) == 0 ||
 				(opts.Strict && containsStopWords(line)) ||
-				(opts.Entropy && !checkShannonEntropy(line)) {
+				(opts.Entropy && !checkShannonEntropy(line, opts)) {
 				continue
 			}
 
 			leak = LeakElem{
 				Line:     line,
-				Commit:   commit,
+				Commit:   commit.Hash,
 				Offender: match,
 				Reason:   leakType,
+				Msg:      commit.Msg,
+				Time:     commit.Time,
+				Author:   commit.Author,
+				File:     file,
+				RepoURL:  repo.url,
 			}
-
 			leaks = append(leaks, leak)
 		}
 	}
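Reviewer note: the new `doChecks` tracks the current file by watching for `diff --git a` header lines and slicing off everything the regex matched. `fileDiffRegex` itself is defined outside this diff, so the sketch below substitutes a hypothetical pattern (`fileHeaderRe`) and a hypothetical helper (`fileFromHeader`) purely to illustrate the technique; the real pattern may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// fileHeaderRe is an assumed stand-in for fileDiffRegex, which is not
// shown in this commit. It matches a git diff header up to and including
// the "b/" prefix of the new path.
var fileHeaderRe = regexp.MustCompile(`^diff --git a/.+ b/`)

// fileFromHeader returns the path named by a "diff --git" header line,
// or the current value unchanged when the line is not a header.
func fileFromHeader(line, current string) string {
	if !strings.HasPrefix(line, "diff --git a/") {
		return current
	}
	// FindStringIndex returns nil on no match, so len(idx) is 0 or 2.
	idx := fileHeaderRe.FindStringIndex(line)
	if len(idx) == 2 {
		return line[idx[1]:]
	}
	return current
}

func main() {
	diff := strings.Join([]string{
		"diff --git a/main.go b/main.go",
		"+const key = \"example\"",
		"diff --git a/lib/util.go b/lib/util.go",
		"+var retries = 3",
	}, "\n")

	// Same default as in doChecks when no header has been seen yet.
	file := "unable to determine file"
	for _, line := range strings.Split(diff, "\n") {
		file = fileFromHeader(line, file)
		fmt.Printf("%-12s %s\n", file, line)
	}
}
```

Anchoring the pattern on the space before `b/` keeps paths that themselves contain a `b/` directory intact, though paths with spaces would still need extra care.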
@@ -35,7 +53,7 @@ func doChecks(diff string, commit string) []LeakElem {
 }
 
 // checkShannonEntropy checks entropy of target
-func checkShannonEntropy(target string) bool {
+func checkShannonEntropy(target string, opts *Options) bool {
 	var (
 		sum             float64
 		targetBase64Len int
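Reviewer note: the body of `checkShannonEntropy` is cut off in this view, so for readers unfamiliar with the measure, here is a generic Go sketch of Shannon entropy over the characters of a string. It is not gitleaks' implementation: `shannonEntropy` is a hypothetical helper that returns bits per character, and it makes no attempt to reproduce how gitleaks scales the result against the `B64EntropyCutoff` (70) and `HexEntropyCutoff` (40) options.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the Shannon entropy of s in bits per character:
// H = -sum(p_i * log2(p_i)), where p_i is the relative frequency of each
// distinct character. This is the textbook formula, not gitleaks' code.
func shannonEntropy(s string) float64 {
	counts := make(map[rune]int)
	total := 0
	for _, r := range s {
		counts[r]++
		total++
	}
	if total == 0 {
		return 0
	}

	var h float64
	for _, c := range counts {
		p := float64(c) / float64(total)
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	// Strings taken from the test cases in checks_test.go.
	for _, s := range []string{
		"heroku_client_secret = simple",
		"aws_secret= \"AKIAIMNOJVGFDXXFE4OA\"",
	} {
		fmt.Printf("%.3f  %s\n", shannonEntropy(s), s)
	}
}
```

A string of one repeated character scores 0, and a string of four distinct characters scores 2 bits, which is why random-looking keys tend to score higher than ordinary identifiers.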

checks_test.go

+15-8
@@ -4,24 +4,25 @@ import (
 	"testing"
 )
 
-func init() {
-	opts = &Options{
+func TestCheckRegex(t *testing.T) {
+	var results []LeakElem
+	opts := &Options{
 		Concurrency:      10,
 		B64EntropyCutoff: 70,
 		HexEntropyCutoff: 40,
 		Entropy:          false,
 	}
-}
-
-func TestCheckRegex(t *testing.T) {
-	var results []LeakElem
+	repo := RepoDesc{
+		url: "someurl",
+	}
+	commit := Commit{}
 	checks := map[string]int{
 		"aws=\"AKIALALEMEL33243OLIAE": 1,
 		"aws\"afewafewafewafewaf\"": 0,
 	}
 
 	for k, v := range checks {
-		results = doChecks(k, "commit")
+		results = doChecks(k, commit, opts, repo)
 		if v != len(results) {
 			t.Errorf("regexCheck failed on string %s", k)
 		}
@@ -30,14 +31,20 @@ func TestCheckRegex(t *testing.T) {
 
 func TestEntropy(t *testing.T) {
 	var enoughEntropy bool
+	opts := &Options{
+		Concurrency:      10,
+		B64EntropyCutoff: 70,
+		HexEntropyCutoff: 40,
+		Entropy:          false,
+	}
 	checks := map[string]bool{
 		"reddit_api_secret = settings./.http}": false,
 		"heroku_client_secret = simple": false,
 		"reddit_api_secret = \"4ok1WFf57-EMswEfAFGewa\"": true,
 		"aws_secret= \"AKIAIMNOJVGFDXXFE4OA\"": true,
 	}
 	for k, v := range checks {
-		enoughEntropy = checkShannonEntropy(k)
+		enoughEntropy = checkShannonEntropy(k, opts)
 		if v != enoughEntropy {
 			t.Errorf("checkEntropy failed for %s. Expected %t, got %t", k, v, enoughEntropy)
 		}
