survey: started TODO list at bottom of source file
jeffhostetler authored and dscho committed Jul 17, 2024
1 parent d06b3ff commit ec39557
Showing 1 changed file with 46 additions and 0 deletions.
46 changes: 46 additions & 0 deletions builtin/survey.c
@@ -2294,3 +2294,49 @@ int cmd_survey(int argc, const char **argv, const char *prefix)

return 0;
}

/*
* NEEDSWORK: The following is a bit of a laundry list of things
* that I'd like to add.
*
* [] Dump stats on all of the packfiles. The number and size of each.
* Whether each is in the .git directory or in an alternate. The state
* of the IDX and MIDX files, etc. Delta chain stats. All of this
* data is relative to the "lived-in" state of the repository. Stuff
* that may change after a GC or repack.
*
* [] Dump stats on each remote. When we fetch from a remote, the size
* of the response is related to the set of haves on the server. You
* can see this in `GIT_TRACE_CURL=1 git fetch`. We get an `ls-refs`
* payload that lists all of the branches and tags on the server, so
* at a minimum the RefName and SHA for each. But for annotated tags
* we also get the peeled SHA. The size of this overhead on every
* fetch is proportional to the size of the `git ls-remote` response
* (roughly, although the latter repeats the RefName of the peeled
* tag). If, for example, you have 500K refs on a remote, you're
* going to have a long "haves" message, so every fetch will be slow
* just because of that overhead (not counting new objects to be
* downloaded).
*
* Note that the local set of tags in "refs/tags/" is a union over all
* remotes. However, since most people only have one remote, we can
* probably estimate the overhead value directly from the size of the
* set of "refs/tags/" that we visited while building the `ref_info`
* and `ref_array` and not need to ask the remote.
*
* [] Dump info on the complexity of the DAG. Criss-cross merges.
* The number of edges that must be touched to compute merge bases.
* Edge length. The number of parallel lanes in the history that must
* be navigated to get to the merge base. What affects the cost of
* the Ahead/Behind computation? How often do criss-crosses occur and
* do they cause various operations to slow down?
*
* [] If there are primary branches (like "main" or "master"), are they
* always on the left side of merges? Does the graph have a clean
* left edge? Or are there normal and "backwards" merges? Do these
* cause problems at scale?
*
* [] If we have a hierarchy of FI/RI branches like "L1", "L2", ...,
* can we learn anything about the shape of the repo around these FI
* and RI integrations?
*/
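The packfile item in the list above can already be approximated with stock git plumbing. A rough sketch (the throwaway temp repo is purely illustrative scaffolding, not part of the proposed survey output):

```shell
# Build a throwaway repo, then dump the kind of packfile stats the
# survey could report: pack counts/sizes and per-object delta info.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo hello >file.txt
git add file.txt
git -c user.name=x -c user.email=x@y commit -qm init
git repack -adq                  # force loose objects into one packfile
git count-objects -v             # in-pack object count, pack count, size-pack
for idx in .git/objects/pack/*.idx; do
  # per-object type, size, and delta info for this pack
  git verify-pack -v "$idx" | head -n 5
done
```

`git verify-pack -v` is the piece that exposes delta-chain statistics; its trailing summary lines report chain-length histograms for larger packs.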

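For the DAG-complexity item, `git merge-base --all` already exposes one useful signal: a criss-cross merge yields more than one merge base. A minimal reproduction (branch names `A`/`B` are illustrative):

```shell
# Construct a criss-cross: A and B each merge the other's pre-merge
# tip, leaving two equally good merge bases.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
g() { git -c user.name=x -c user.email=x@y "$@"; }
g commit -q --allow-empty -m c0
git checkout -qb A
g commit -q --allow-empty -m a1
a1=$(git rev-parse HEAD)
git checkout -qb B HEAD~1           # branch B from c0
g commit -q --allow-empty -m b1
b1=$(git rev-parse HEAD)
g merge -q --no-edit "$a1"          # B merges A's old tip
git checkout -q A
g merge -q --no-edit "$b1"          # A merges B's old tip
git merge-base --all A B | wc -l    # more than one base => criss-cross
```

A survey-style metric could count, per merge in history, how many merge bases its parents had, which is one way to quantify how often criss-crosses slow down ahead/behind and merge-base computations.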