Add experimental 'git survey' builtin (#5174)
This introduces `git survey` to Git for Windows ahead of upstream for
the express purpose of getting the path-based analysis into the hands of
more folks.

The inspiration for this builtin is
[`git-sizer`](https://github.com/github/git-sizer), but since that
command relies on `git cat-file --batch` to get the contents of objects,
it is limited in how much information it can provide.

This is mostly a rewrite of the `git survey` builtin that was introduced
into the `microsoft/git` fork in microsoft#667. That version had a
lot more bells and whistles, including an analysis much closer to what
`git-sizer` provides.

The biggest difference in this version is that this one is focused on
using the path-walk API in order to visit batches of objects based on a
common path. This allows identifying, for instance, the path that is
contributing the most to the on-disk size across all versions at that
path.

For example, here are the top ten paths by on-disk size in my local Git
repository (which includes `microsoft/git` and `gitster/git`):

```
TOP FILES BY DISK SIZE
============================================================================
                                    Path | Count | Disk Size | Inflated Size
-----------------------------------------+-------+-----------+--------------
                       whats-cooking.txt |  1373 |  11637459 |      37226854
             t/helper/test-gvfs-protocol |     2 |   6847105 |      17233072
                      git-rebase--helper |     1 |   6027849 |      15269664
                          compat/mingw.c |  6111 |   5194453 |     463466970
             t/helper/test-parse-options |     1 |   3420385 |       8807968
                  t/helper/test-pkt-line |     1 |   3408661 |       8778960
      t/helper/test-dump-untracked-cache |     1 |   3408645 |       8780816
            t/helper/test-dump-fsmonitor |     1 |   3406639 |       8776656
                                po/vi.po |   104 |   1376337 |      51441603
                                po/de.po |   210 |   1360112 |      71198603
```

This kind of analysis has been helpful in identifying the reasons for
growth in a few internal monorepos. Those findings motivated the changes
in #5157 and #5171.
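
For readers without a build that includes `git survey`, a rough approximation of the per-path table above can be sketched with existing commands. This is not the path-walk API: the helper name `top_paths` is hypothetical, and it only sums sizes from lines of the form `<type> <disk-size> <inflated-size> <path>`.

```shell
#!/bin/sh
# top_paths is a hypothetical helper, not part of Git.
# It reads lines of "<type> <disk-size> <inflated-size> <path>" on stdin,
# sums the sizes of all blobs seen at each path, and prints the N largest
# paths by disk size as tab-separated "disk count inflated path" rows.
top_paths () {
	awk '
		$1 == "blob" && NF >= 4 {
			# Rebuild the path in case it contains single spaces.
			path = $4
			for (i = 5; i <= NF; i++)
				path = path " " $i
			count[path]++
			disk[path] += $2
			inflated[path] += $3
		}
		END {
			for (p in count)
				printf "%d\t%d\t%d\t%s\n", disk[p], count[p], inflated[p], p
		}
	' |
	# Largest disk size first; ties are broken by path name.
	sort -t "$(printf '\t')" -k1,1nr -k4,4 |
	head -n "${1:-10}"
}
```

Inside a repository, the input can be produced with `git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectsize:disk) %(objectsize) %(rest)' | top_paths 10`. Note that `git rev-list --objects` reports each object once, under the path where the traversal first finds it, so the numbers are only an approximation of what a walk batched by path would report.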

With this early version in Git for Windows, we can expand the reach of
the experimental tool in advance of it being contributed to the upstream
project.

Unfortunately, this means that in the next `microsoft/git` rebase,
@jeffhostetler's version will need to be pulled out, since the two
versions conflict substantially. These conflicts include how tables are
stored and generated, as the version in this PR is slightly more general
to allow for different kinds of data.
dscho authored and Git for Windows Build Agent committed Jan 6, 2025
2 parents 11b91cf + 3703f39 commit 5d2243b
Showing 12 changed files with 1,138 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -165,6 +165,7 @@
/git-submodule
/git-submodule--helper
/git-subtree
/git-survey
/git-svn
/git-switch
/git-symbolic-ref
2 changes: 2 additions & 0 deletions Documentation/config.txt
@@ -536,6 +536,8 @@ include::config/status.txt[]

include::config/submodule.txt[]

include::config/survey.txt[]

include::config/tag.txt[]

include::config/tar.txt[]
14 changes: 14 additions & 0 deletions Documentation/config/survey.txt
@@ -0,0 +1,14 @@
survey.*::
These variables adjust the default behavior of the `git survey`
command. The intention is that this command could be run in the
background with these options.
+
--
verbose::
This boolean value implies the `--[no-]verbose` option.
progress::
This boolean value implies the `--[no-]progress` option.
top::
This integer value implies `--top=<N>`, specifying the
number of entries in the detail tables.
--
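
As a small illustration of the `survey.*` keys documented above, the following sketch writes a config fragment suited to a quiet background run. The file name and the chosen values are assumptions for this example; only the key names come from the documentation text.

```shell
#!/bin/sh
# Write an illustrative config fragment. The file name and the values
# are assumptions for this example; only the key names (verbose,
# progress, top) come from the survey.* documentation.
cat >survey-example.gitconfig <<'EOF'
[survey]
	verbose = false
	progress = false
	top = 20
EOF
```

Such a fragment could be pulled into a user's configuration with `git config --global include.path <path-to-file>`, or the same keys could be set one at a time, for instance `git config survey.top 20`.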
83 changes: 83 additions & 0 deletions Documentation/git-survey.txt
@@ -0,0 +1,83 @@
git-survey(1)
=============

NAME
----
git-survey - EXPERIMENTAL: Measure various repository dimensions of scale

SYNOPSIS
--------
[verse]
(EXPERIMENTAL!) 'git survey' <options>

DESCRIPTION
-----------

Survey the repository and measure various dimensions of scale.

As repositories grow to "monorepo" size, certain data shapes can cause
performance problems. `git-survey` attempts to measure and report on
known problem areas.

Ref Selection and Reachable Objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this first analysis phase, `git survey` will iterate over the set of
requested branches, tags, and other refs and treewalk over all of the
reachable commits, trees, and blobs and generate various statistics.

OPTIONS
-------

--progress::
Show progress. This is automatically enabled when interactive.

Ref Selection
~~~~~~~~~~~~~

The following options control the set of refs that `git survey` will examine.
By default, `git survey` will look at tags, local branches, and remote refs.
If any of the following options are given, the default set is cleared and
only refs for the given options are added.

--all-refs::
Use all refs. This includes local branches, tags, remote refs,
notes, and stashes. This option overrides all of the following.

--branches::
Add local branches (`refs/heads/`) to the set.

--tags::
Add tags (`refs/tags/`) to the set.

--remotes::
Add remote branches (`refs/remotes/`) to the set.

--detached::
Add HEAD to the set.

--other::
Add notes (`refs/notes/`) and stashes (`refs/stash/`) to the set.

OUTPUT
------

By default, `git survey` will print information about the repository in a
human-readable format that includes overviews and tables.

References Summary
~~~~~~~~~~~~~~~~~~

The references summary includes a count of each kind of reference,
including branches, remote refs, and tags (split by "all" and
"annotated").

Reachable Object Summary
~~~~~~~~~~~~~~~~~~~~~~~~

The reachable object summary shows the total number of each kind of Git
object, including tags, commits, trees, and blobs.

GIT
---
Part of the linkgit:git[1] suite
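
The ref selection rules described in the documentation above (a default set that is cleared as soon as any selection option is given, and `--all-refs` overriding the rest) can be sketched as a small shell function. The name `select_ref_patterns` is hypothetical and this is not how `builtin/survey.c` is written; it uses Git's standard `refs/remotes/` namespace for remote-tracking branches. Whether `--all-refs` also adds `HEAD` is not stated in the text, so the sketch leaves it out.

```shell
#!/bin/sh
# select_ref_patterns is a hypothetical helper, not part of Git.
# It prints the ref patterns that the documented options would select.
select_ref_patterns () {
	all=0 branches=0 tags=0 remotes=0 detached=0 other=0 given=0
	for arg in "$@"
	do
		case "$arg" in
		--all-refs) all=1 given=1 ;;
		--branches) branches=1 given=1 ;;
		--tags) tags=1 given=1 ;;
		--remotes) remotes=1 given=1 ;;
		--detached) detached=1 given=1 ;;
		--other) other=1 given=1 ;;
		esac
	done

	if [ "$given" -eq 0 ]
	then
		# Default set: tags, local branches, and remote refs.
		branches=1 tags=1 remotes=1
	fi
	if [ "$all" -eq 1 ]
	then
		# --all-refs overrides the individual selection options.
		# Assumption: HEAD is only added by an explicit --detached.
		branches=1 tags=1 remotes=1 other=1
	fi

	if [ "$detached" -eq 1 ]; then echo HEAD; fi
	if [ "$branches" -eq 1 ]; then echo refs/heads/; fi
	if [ "$tags" -eq 1 ]; then echo refs/tags/; fi
	if [ "$remotes" -eq 1 ]; then echo refs/remotes/; fi
	if [ "$other" -eq 1 ]
	then
		echo refs/notes/
		echo refs/stash/
	fi
}

# Example: only tags and HEAD are selected; the default set is cleared.
select_ref_patterns --tags --detached
```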
1 change: 1 addition & 0 deletions Makefile
@@ -1314,6 +1314,7 @@ BUILTIN_OBJS += builtin/sparse-checkout.o
BUILTIN_OBJS += builtin/stash.o
BUILTIN_OBJS += builtin/stripspace.o
BUILTIN_OBJS += builtin/submodule--helper.o
BUILTIN_OBJS += builtin/survey.o
BUILTIN_OBJS += builtin/symbolic-ref.o
BUILTIN_OBJS += builtin/tag.o
BUILTIN_OBJS += builtin/unpack-file.o
1 change: 1 addition & 0 deletions builtin.h
@@ -231,6 +231,7 @@ int cmd_sparse_checkout(int argc, const char **argv, const char *prefix, struct
int cmd_status(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_stash(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_stripspace(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_survey(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_submodule__helper(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_switch(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_symbolic_ref(int argc, const char **argv, const char *prefix, struct repository *repo);
930 changes: 930 additions & 0 deletions builtin/survey.c

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions command-list.txt
@@ -187,6 +187,7 @@ git-stash mainporcelain
git-status mainporcelain info
git-stripspace purehelpers
git-submodule mainporcelain
git-survey mainporcelain
git-svn foreignscminterface
git-switch mainporcelain history
git-symbolic-ref plumbingmanipulators
1 change: 1 addition & 0 deletions git.c
@@ -627,6 +627,7 @@ static struct cmd_struct commands[] = {
{ "status", cmd_status, RUN_SETUP | NEED_WORK_TREE },
{ "stripspace", cmd_stripspace },
{ "submodule--helper", cmd_submodule__helper, RUN_SETUP },
{ "survey", cmd_survey, RUN_SETUP },
{ "switch", cmd_switch, RUN_SETUP | NEED_WORK_TREE },
{ "symbolic-ref", cmd_symbolic_ref, RUN_SETUP },
{ "tag", cmd_tag, RUN_SETUP | DELAY_PAGER_CONFIG },
1 change: 1 addition & 0 deletions meson.build
@@ -592,6 +592,7 @@ builtin_sources = [
'builtin/stash.c',
'builtin/stripspace.c',
'builtin/submodule--helper.c',
'builtin/survey.c',
'builtin/symbolic-ref.c',
'builtin/tag.c',
'builtin/unpack-file.c',
1 change: 1 addition & 0 deletions t/meson.build
@@ -957,6 +957,7 @@ integration_tests = [
't8012-blame-colors.sh',
't8013-blame-ignore-revs.sh',
't8014-blame-ignore-fuzzy.sh',
't8100-git-survey.sh',
't9001-send-email.sh',
't9002-column.sh',
't9003-help-autocorrect.sh',
102 changes: 102 additions & 0 deletions t/t8100-git-survey.sh
@@ -0,0 +1,102 @@
#!/bin/sh

test_description='git survey'

GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME

TEST_PASSES_SANITIZE_LEAK=0
export TEST_PASSES_SANITIZE_LEAK

. ./test-lib.sh

test_expect_success 'git survey -h shows experimental warning' '
test_expect_code 129 git survey -h 2>usage &&
grep "EXPERIMENTAL!" usage
'

test_expect_success 'create a semi-interesting repo' '
test_commit_bulk 10 &&
git tag -a -m one one HEAD~5 &&
git tag -a -m two two HEAD~3 &&
git tag -a -m three three two &&
git tag -a -m four four three &&
git update-ref -d refs/tags/three &&
git update-ref -d refs/tags/two
'

test_expect_success 'git survey --progress' '
GIT_PROGRESS_DELAY=0 git survey --all-refs --progress >out 2>err &&
grep "Preparing object walk" err
'

test_expect_success 'git survey (default)' '
git survey --all-refs >out 2>err &&
test_line_count = 0 err &&
test_oid_cache <<-EOF &&
commits_size_on_disk sha1: 1523
commits_size_on_disk sha256: 1811
commits_size sha1: 2153
commits_size sha256: 2609
trees_size_on_disk sha1: 495
trees_size_on_disk sha256: 635
trees_size sha1: 1706
trees_size sha256: 2366
tags_size sha1: 528
tags_size sha256: 624
tags_size_on_disk sha1: 510
tags_size_on_disk sha256: 569
EOF
tr , " " >expect <<-EOF &&
GIT SURVEY for "$(pwd)"
-----------------------------------------------------
REFERENCES SUMMARY
========================
, Ref Type | Count
-----------------+------
, Branches | 1
Remote refs | 0
Tags (all) | 2
Tags (annotated) | 2
REACHABLE OBJECT SUMMARY
========================
Object Type | Count
------------+------
Tags | 4
Commits | 10
Trees | 10
Blobs | 10
TOTAL OBJECT SIZES BY TYPE
===============================================
Object Type | Count | Disk Size | Inflated Size
------------+-------+-----------+--------------
Commits | 10 | $(test_oid commits_size_on_disk) | $(test_oid commits_size)
Trees | 10 | $(test_oid trees_size_on_disk) | $(test_oid trees_size)
Blobs | 10 | 191 | 101
Tags | 4 | $(test_oid tags_size_on_disk) | $(test_oid tags_size)
EOF
lines=$(wc -l <expect) &&
head -n $lines out >out-trimmed &&
test_cmp expect out-trimmed &&
for type in "DIRECTORIES" "FILES"
do
for metric in "COUNT" "DISK SIZE" "INFLATED SIZE"
do
grep "TOP $type BY $metric" out || return 1
done || return 1
done
'

test_done
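
The default-output test above leans on two small shell techniques: leading commas are turned into spaces with `tr` so that right-aligned table rows keep their padding, and only the first lines of the output are compared so that later detail tables can vary. A standalone sketch of both, outside the Git test framework, might look like this; `produce_report` is a hypothetical stand-in for the command under test and its rows are illustrative.

```shell
#!/bin/sh
# Work in a scratch directory so the example files stay out of the way.
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-in for the command under test: a fixed table header
# followed by a detail line the test does not want to pin down.
produce_report () {
	printf '%s\n' \
		'    Ref Type | Count' \
		'    Branches |     1' \
		'detail that may vary'
}

# Commas become spaces, so the padding of right-aligned columns survives
# tools (or "<<-" heredocs) that strip leading whitespace.
tr , ' ' >expect <<EOF
,,,,Ref Type | Count
,,,,Branches |,,,,,1
EOF

produce_report >out

# Compare only as many lines as the expectation contains. The arithmetic
# expansion strips the padding that some wc implementations print.
lines=$(( $(wc -l <expect) ))
head -n "$lines" out >out-trimmed

if cmp -s expect out-trimmed
then
	echo "header matches"
else
	echo "header differs"
fi
```

The trade-off of this approach is that anything after the expected prefix goes unchecked, which is why the test above follows up with separate `grep` checks for the remaining table titles.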
