From 49edce4ff96d003e2b50202105107ef42e1f730e Mon Sep 17 00:00:00 2001
From: Junio C Hamano <gitster@pobox.com>
Date: Fri, 20 Dec 2024 10:02:15 -0800
Subject: [PATCH 01/28] show-index: the short help should say the command reads
 from its input

The short help text given by "git show-index -h" says

    $ git show-index -h
    usage: git show-index [--object-format=<hash-algorithm>]

        --[no-]object-format <hash-algorithm>
                              specify the hash algorithm to use

The command takes a pack .idx file from its standard input.  The
user has to _know_ this, as there is no indication from this output.

Give a hint that the data to work on is fed from its standard input.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-show-index.txt | 2 +-
 builtin/show-index.c             | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/git-show-index.txt b/Documentation/git-show-index.txt
index e49318a5a0aec7..7e574ea243e58a 100644
--- a/Documentation/git-show-index.txt
+++ b/Documentation/git-show-index.txt
@@ -9,7 +9,7 @@ git-show-index - Show packed archive index
 SYNOPSIS
 --------
 [verse]
-'git show-index' [--object-format=<hash-algorithm>]
+'git show-index' [--object-format=<hash-algorithm>] < <pack-idx-file>
 
 
 DESCRIPTION
diff --git a/builtin/show-index.c b/builtin/show-index.c
index f164c01bbea400..8678b741a47e67 100644
--- a/builtin/show-index.c
+++ b/builtin/show-index.c
@@ -7,7 +7,7 @@
 #include "parse-options.h"
 
 static const char *const show_index_usage[] = {
-	"git show-index [--object-format=<hash-algorithm>]",
+	"git show-index [--object-format=<hash-algorithm>] < <pack-idx-file>",
 	NULL
 };
 

From 0ad3d656521aa16a6496aa855bbde97160a2b2bc Mon Sep 17 00:00:00 2001
From: Patrick Steinhardt <ps@pks.im>
Date: Mon, 30 Dec 2024 11:32:23 +0100
Subject: [PATCH 02/28] object-file: fix race in object collision check

One of the tests in t5616 asserts that git-fetch(1) with `--refetch`
triggers repository maintenance with the correct set of arguments. This
test is flaky and sometimes fails:

    ++ git -c protocol.version=0 -c gc.autoPackLimit=0 -c maintenance.incremental-repack.auto=1234 -C pc1 fetch --refetch origin
    error: unable to open .git/objects/pack/pack-029d08823bd8a8eab510ad6ac75c823cfd3ed31e.pack: No such file or directory
    fatal: unable to rename temporary file to '.git/objects/pack/pack-029d08823bd8a8eab510ad6ac75c823cfd3ed31e.pack'
    fatal: could not finish pack-objects to repack local links
    fatal: index-pack failed
    error: last command exited with $?=128

The error message is quite confusing as it talks about trying to rename
a temporary packfile. A first hunch would thus be that this packfile
gets written by git-fetch(1) but removed by git-maintenance(1) before it
has been finalized, which should never happen. And indeed, on closer
inspection one notices that the supposedly temporary file does not have
the typical `tmp_pack_` prefix.

As it turns out, the "unable to rename temporary file" fatal error is a
red herring and the real error is "unable to open". That error is raised
by `check_collision()`, which is called by `finalize_object_file()` when
moving the new packfile into place. Because t5616 re-fetches objects, we
end up with the exact same pack as we already have in the repository. So
when the concurrent git-maintenance(1) process rewrites the preexisting
pack and unlinks it exactly at the point in time where git-fetch(1)
wants to check the old and new packfiles for equality, we will see ENOENT
and thus `check_collision()` returns an error, which gets bubbled up by
`finalize_object_file()` and is then handled by `rename_tmp_packfile()`.
That function does not know about the exact root cause of the error and
instead just claims that the rename has failed.

This race was thus introduced by b1b8dfde69 (finalize_object_file():
implement collision check, 2024-09-26), which added the collision check.

By definition, two files cannot collide with each other when one of them
has been removed. We can thus trivially fix the issue by ignoring ENOENT
when opening either of the files we're about to check for collision.
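
The resulting open-and-compare logic can be sketched in standalone POSIX C,
assuming plain file descriptors instead of git's helpers. The names
read_full() and collision_check_sketch() are hypothetical: git itself uses
read_in_full() and error_errno(). As in this patch, ENOENT on either file
is taken to mean "no collision".

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes, retrying short reads. */
static ssize_t read_full(int fd, char *buf, size_t len)
{
	size_t total = 0;

	while (total < len) {
		ssize_t n = read(fd, buf + total, len - total);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			break; /* EOF */
		total += (size_t)n;
	}
	return (ssize_t)total;
}

/*
 * Hypothetical sketch: returns 0 if the files are identical or if either
 * one has vanished (ENOENT), -1 if they differ or on any other error.
 */
static int collision_check_sketch(const char *a, const char *b)
{
	char buf_a[4096], buf_b[4096];
	int fd_a = -1, fd_b = -1, ret = 0;

	fd_a = open(a, O_RDONLY);
	if (fd_a < 0) {
		if (errno != ENOENT)
			ret = -1;
		goto out;
	}
	fd_b = open(b, O_RDONLY);
	if (fd_b < 0) {
		if (errno != ENOENT)
			ret = -1;
		goto out;
	}
	for (;;) {
		ssize_t sz_a = read_full(fd_a, buf_a, sizeof(buf_a));
		ssize_t sz_b = read_full(fd_b, buf_b, sizeof(buf_b));

		/* read error, length mismatch or differing bytes */
		if (sz_a < 0 || sz_b < 0 || sz_a != sz_b ||
		    memcmp(buf_a, buf_b, (size_t)sz_a)) {
			ret = -1;
			break;
		}
		if ((size_t)sz_a < sizeof(buf_a))
			break; /* both files hit EOF together */
	}
out:
	if (fd_a >= 0)
		close(fd_a);
	if (fd_b >= 0)
		close(fd_b);
	return ret;
}
```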

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 object-file.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/object-file.c b/object-file.c
index b1a3463852c451..0293b93bbc5ce9 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1982,13 +1982,15 @@ static int check_collision(const char *filename_a, const char *filename_b)
 
 	fd_a = open(filename_a, O_RDONLY);
 	if (fd_a < 0) {
-		ret = error_errno(_("unable to open %s"), filename_a);
+		if (errno != ENOENT)
+			ret = error_errno(_("unable to open %s"), filename_a);
 		goto out;
 	}
 
 	fd_b = open(filename_b, O_RDONLY);
 	if (fd_b < 0) {
-		ret = error_errno(_("unable to open %s"), filename_b);
+		if (errno != ENOENT)
+			ret = error_errno(_("unable to open %s"), filename_b);
 		goto out;
 	}
 

From c1acf1a31761d0cfddc3ea6d39c92a6528cd9c5c Mon Sep 17 00:00:00 2001
From: Patrick Steinhardt <ps@pks.im>
Date: Mon, 6 Jan 2025 10:24:25 +0100
Subject: [PATCH 03/28] object-file: rename variables in `check_collision()`

Rename variables used in `check_collision()` to clearly identify which
file is the source and which is the destination. This will make the next
step easier to reason about when we start to treat those files
differently from one another.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 object-file.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/object-file.c b/object-file.c
index 0293b93bbc5ce9..e2fa1be303cce3 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1974,56 +1974,56 @@ static void write_object_file_prepare_literally(const struct git_hash_algo *algo
 	hash_object_body(algo, &c, buf, len, oid, hdr, hdrlen);
 }
 
-static int check_collision(const char *filename_a, const char *filename_b)
+static int check_collision(const char *source, const char *dest)
 {
-	char buf_a[4096], buf_b[4096];
-	int fd_a = -1, fd_b = -1;
+	char buf_source[4096], buf_dest[4096];
+	int fd_source = -1, fd_dest = -1;
 	int ret = 0;
 
-	fd_a = open(filename_a, O_RDONLY);
-	if (fd_a < 0) {
+	fd_source = open(source, O_RDONLY);
+	if (fd_source < 0) {
 		if (errno != ENOENT)
-			ret = error_errno(_("unable to open %s"), filename_a);
+			ret = error_errno(_("unable to open %s"), source);
 		goto out;
 	}
 
-	fd_b = open(filename_b, O_RDONLY);
-	if (fd_b < 0) {
+	fd_dest = open(dest, O_RDONLY);
+	if (fd_dest < 0) {
 		if (errno != ENOENT)
-			ret = error_errno(_("unable to open %s"), filename_b);
+			ret = error_errno(_("unable to open %s"), dest);
 		goto out;
 	}
 
 	while (1) {
 		ssize_t sz_a, sz_b;
 
-		sz_a = read_in_full(fd_a, buf_a, sizeof(buf_a));
+		sz_a = read_in_full(fd_source, buf_source, sizeof(buf_source));
 		if (sz_a < 0) {
-			ret = error_errno(_("unable to read %s"), filename_a);
+			ret = error_errno(_("unable to read %s"), source);
 			goto out;
 		}
 
-		sz_b = read_in_full(fd_b, buf_b, sizeof(buf_b));
+		sz_b = read_in_full(fd_dest, buf_dest, sizeof(buf_dest));
 		if (sz_b < 0) {
-			ret = error_errno(_("unable to read %s"), filename_b);
+			ret = error_errno(_("unable to read %s"), dest);
 			goto out;
 		}
 
-		if (sz_a != sz_b || memcmp(buf_a, buf_b, sz_a)) {
+		if (sz_a != sz_b || memcmp(buf_source, buf_dest, sz_a)) {
 			ret = error(_("files '%s' and '%s' differ in contents"),
-				    filename_a, filename_b);
+				    source, dest);
 			goto out;
 		}
 
-		if (sz_a < sizeof(buf_a))
+		if (sz_a < sizeof(buf_source))
 			break;
 	}
 
 out:
-	if (fd_a > -1)
-		close(fd_a);
-	if (fd_b > -1)
-		close(fd_b);
+	if (fd_source > -1)
+		close(fd_source);
+	if (fd_dest > -1)
+		close(fd_dest);
 	return ret;
 }
 

From cfae50e40eb72d6116ad56c616b3322474df4a75 Mon Sep 17 00:00:00 2001
From: Patrick Steinhardt <ps@pks.im>
Date: Mon, 6 Jan 2025 10:24:26 +0100
Subject: [PATCH 04/28] object-file: don't special-case missing source file in
 collision check

In 0ad3d65652 (object-file: fix race in object collision check,
2024-12-30) we started to ignore ENOENT when opening either the
source or destination file of the collision check. This was done to
handle races more gracefully in case either of the potentially-colliding
files disappears.

The fix is overly broad though: while the destination file may indeed
vanish racily, this shouldn't ever happen for the source file, which is
a temporary object file (either loose or in packfile format) that we
have just created. So if a concurrent process had removed that
temporary file, it would indicate an actual issue.

Stop treating ENOENT specially for the source file so that we always
bubble up this error.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 object-file.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/object-file.c b/object-file.c
index e2fa1be303cce3..c1bd746d9e238e 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1982,8 +1982,7 @@ static int check_collision(const char *source, const char *dest)
 
 	fd_source = open(source, O_RDONLY);
 	if (fd_source < 0) {
-		if (errno != ENOENT)
-			ret = error_errno(_("unable to open %s"), source);
+		ret = error_errno(_("unable to open %s"), source);
 		goto out;
 	}
 

From d7fcbe2c56468ac780c689b02c6a9e056ce39c12 Mon Sep 17 00:00:00 2001
From: Patrick Steinhardt <ps@pks.im>
Date: Mon, 6 Jan 2025 10:24:27 +0100
Subject: [PATCH 05/28] object-file: retry linking file into place when
 occluding file vanishes

Prior to 0ad3d65652 (object-file: fix race in object collision check,
2024-12-30), callers could expect that a successful return from
`finalize_object_file()` means that either the file was moved into
place, or the identical bytes were already present. If neither of those
happens, we'd return an error.

Since that commit, if the destination file disappears between our
link(3p) call and the collision check, we'd return success without
actually checking the contents, and without retrying the link. This
solves the common case that the files were indeed the same, but it means
that we may corrupt the repository if they weren't (this implies a hash
collision, but the whole point of this function is protecting against
hash collisions).

We can't be pessimistic and assume they're different; that hurts the
common case that the mentioned commit was trying to fix. But after
seeing that the destination file went away, we can retry the link.
Adapt the code to do so when we see that the destination file has racily
vanished. This should generally succeed as we have just observed that
the destination file no longer exists, except in the very unlikely
event that another concurrent process recreates it in the meantime.
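
The retry flow can be sketched roughly as follows. This is an illustration
under assumptions, not the code in the diff. Linking and comparing sit
behind hypothetical place_fn/check_fn callbacks so the retry logic can be
shown in isolation. DEST_VANISHED stands in for
CHECK_COLLISION_DEST_VANISHED, and the bound copies the diff's
`retries++ > 5` check.

```c
#include <errno.h>

/* Hypothetical stand-in for CHECK_COLLISION_DEST_VANISHED. */
#define DEST_VANISHED (-2)
/* Same bound as `retries++ > 5`: at most six relink attempts. */
#define MAX_RETRIES 5

/* Link tmpfile into place: 0 on success, EEXIST if occupied, else failure. */
typedef int (*place_fn)(void *ctx);
/* Compare tmpfile with the occupant: 0 same, DEST_VANISHED, or -1. */
typedef int (*check_fn)(void *ctx);

static int finalize_sketch(place_fn place, check_fn check, void *ctx)
{
	unsigned retries = 0;

	for (;;) {
		int ret = place(ctx);

		if (!ret)
			return 0;           /* moved into place */
		if (ret != EEXIST)
			return -1;          /* unrelated failure */

		ret = check(ctx);
		if (ret == DEST_VANISHED) {
			if (retries++ > MAX_RETRIES)
				return -1;  /* keeps vanishing, give up */
			continue;           /* occupant is gone, link again */
		}
		return ret ? -1 : 0;        /* identical bytes count as success */
	}
}
```

Looping back to the link step, rather than returning success, means
identical bytes are still the only way to succeed without the file
actually being moved into place.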

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 object-file.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/object-file.c b/object-file.c
index c1bd746d9e238e..008ddf59a56a6e 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1974,6 +1974,8 @@ static void write_object_file_prepare_literally(const struct git_hash_algo *algo
 	hash_object_body(algo, &c, buf, len, oid, hdr, hdrlen);
 }
 
+#define CHECK_COLLISION_DEST_VANISHED -2
+
 static int check_collision(const char *source, const char *dest)
 {
 	char buf_source[4096], buf_dest[4096];
@@ -1990,6 +1992,8 @@ static int check_collision(const char *source, const char *dest)
 	if (fd_dest < 0) {
 		if (errno != ENOENT)
 			ret = error_errno(_("unable to open %s"), dest);
+		else
+			ret = CHECK_COLLISION_DEST_VANISHED;
 		goto out;
 	}
 
@@ -2037,8 +2041,11 @@ int finalize_object_file(const char *tmpfile, const char *filename)
 int finalize_object_file_flags(const char *tmpfile, const char *filename,
 			       enum finalize_object_file_flags flags)
 {
-	struct stat st;
-	int ret = 0;
+	unsigned retries = 0;
+	int ret;
+
+retry:
+	ret = 0;
 
 	if (object_creation_mode == OBJECT_CREATION_USES_RENAMES)
 		goto try_rename;
@@ -2059,6 +2066,8 @@ int finalize_object_file_flags(const char *tmpfile, const char *filename,
 	 * left to unlink.
 	 */
 	if (ret && ret != EEXIST) {
+		struct stat st;
+
 	try_rename:
 		if (!stat(filename, &st))
 			ret = EEXIST;
@@ -2074,9 +2083,17 @@ int finalize_object_file_flags(const char *tmpfile, const char *filename,
 			errno = saved_errno;
 			return error_errno(_("unable to write file %s"), filename);
 		}
-		if (!(flags & FOF_SKIP_COLLISION_CHECK) &&
-		    check_collision(tmpfile, filename))
+		if (!(flags & FOF_SKIP_COLLISION_CHECK)) {
+			ret = check_collision(tmpfile, filename);
+			if (ret == CHECK_COLLISION_DEST_VANISHED) {
+				if (retries++ > 5)
+					return error(_("unable to write repeatedly vanishing file %s"),
+						     filename);
+				goto retry;
+			}
+			else if (ret)
 				return -1;
+		}
 		unlink_or_warn(tmpfile);
 	}
 

From 8d24d56ce1da13caff82cfa8950413309e08da13 Mon Sep 17 00:00:00 2001
From: Jeff King <peff@peff.net>
Date: Tue, 7 Jan 2025 02:05:01 -0500
Subject: [PATCH 06/28] test-lib: invert return value of
 check_test_results_san_file_empty

We have a function to check whether LSan logged any leaks. It returns
success for no leaks, and non-zero otherwise. This is the simplest thing
for its callers, who want to say "if no leaks then return early". But
because it's implemented as a shell pipeline, you end up with the
awkward:

  ! find ... |
  xargs grep leaks |
  grep -v false-positives

where the "!" is actually negating the final grep. Switch the return
value (and name) to return success when there are leaks. This should
make the code a little easier to read, and the negation in the callers
still reads pretty naturally.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/test-lib-functions.sh | 2 +-
 t/test-lib.sh           | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 78e054ab503a65..c25cee0ad8651a 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -927,7 +927,7 @@ test_expect_success () {
 		test -n "$test_skip_test_preamble" ||
 		say >&3 "expecting success of $TEST_NUMBER.$test_count '$1': $test_body"
 		if test_run_ "$test_body" &&
-		   check_test_results_san_file_empty_
+		   ! check_test_results_san_file_has_entries_
 		then
 			test_ok_ "$1"
 		else
diff --git a/t/test-lib.sh b/t/test-lib.sh
index d1f62adbf82931..be3553e40e9052 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1169,12 +1169,12 @@ test_atexit_handler () {
 	teardown_malloc_check
 }
 
-check_test_results_san_file_empty_ () {
-	test -z "$TEST_RESULTS_SAN_FILE" && return 0
+check_test_results_san_file_has_entries_ () {
+	test -z "$TEST_RESULTS_SAN_FILE" && return 1
 
 	# stderr piped to /dev/null because the directory may have
 	# been "rmdir"'d already.
-	! find "$TEST_RESULTS_SAN_DIR" \
+	find "$TEST_RESULTS_SAN_DIR" \
 		-type f \
 		-name "$TEST_RESULTS_SAN_FILE_PFX.*" 2>/dev/null |
 	xargs grep ^DEDUP_TOKEN |
@@ -1182,7 +1182,7 @@ check_test_results_san_file_empty_ () {
 }
 
 check_test_results_san_file_ () {
-	if check_test_results_san_file_empty_
+	if ! check_test_results_san_file_has_entries_
 	then
 		return
 	fi &&

From b9a9df93a3f5580c7f7b8cc099aad1c204ced8a4 Mon Sep 17 00:00:00 2001
From: Jeff King <peff@peff.net>
Date: Tue, 7 Jan 2025 02:07:52 -0500
Subject: [PATCH 07/28] test-lib: simplify lsan results check

We want to know if there are any leaks logged by LSan in the results
directory, so we run "find" on the containing directory and pipe it to
xargs. We can accomplish the same thing by just globbing in the shell
and passing the result to grep, which has a few advantages:

  - it's one fewer process to run

  - we can glob on the TEST_RESULTS_SAN_FILE pattern, which is what we
    checked at the beginning of the function, and is the same glob used
    to show the logs in check_test_results_san_file_

  - this correctly handles the case where TEST_OUTPUT_DIRECTORY has a
    space in it. For example doing:

       mkdir "/tmp/foo bar"
       TEST_OUTPUT_DIRECTORY="/tmp/foo bar" make SANITIZE=leak test

    would yield a lot of:

      grep: /tmp/foo: No such file or directory
      grep: bar/test-results/t0006-date.leak/trace.test-tool.582311: No such file or directory

    when there are leaks. We could do the same thing with "xargs
    --null", but that isn't portable.

We are now subject to command-line length limits, but that is also true
of the globbing cat used to show the logs themselves. This hasn't been a
problem in practice.

We do need to use "grep -s" for the case where the glob does not expand
(i.e., there are not any log files at all). This option is in POSIX, and
has been used in t7407 for several years without anybody complaining.
This also naturally handles the case where the surrounding
directory has already been removed (in which case there are likewise no
files!), dropping the need to comment about it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/test-lib.sh | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index be3553e40e9052..898c2267b8a400 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1172,12 +1172,7 @@ test_atexit_handler () {
 check_test_results_san_file_has_entries_ () {
 	test -z "$TEST_RESULTS_SAN_FILE" && return 1
 
-	# stderr piped to /dev/null because the directory may have
-	# been "rmdir"'d already.
-	find "$TEST_RESULTS_SAN_DIR" \
-		-type f \
-		-name "$TEST_RESULTS_SAN_FILE_PFX.*" 2>/dev/null |
-	xargs grep ^DEDUP_TOKEN |
+	grep -s ^DEDUP_TOKEN "$TEST_RESULTS_SAN_FILE".* |
 	grep -qv sanitizer::GetThreadStackTopAndBottom
 }
 

From 164a2516eb622fdf032ce526ec97e79a53bf2893 Mon Sep 17 00:00:00 2001
From: Jeff King <peff@peff.net>
Date: Tue, 7 Jan 2025 02:08:31 -0500
Subject: [PATCH 08/28] test-lib: add a few comments to LSan log checking

Commit b119a687d4 (test-lib: ignore leaks in the sanitizer's thread
code, 2025-01-01) added code to suppress a false positive in the leak
checker. But if you're just reading the code, the obscure grep call is a
bit of a head-scratcher. Let's add a brief comment explaining what's
going on (and anybody digging further can find this commit or that one
for all the details).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/test-lib.sh | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 898c2267b8a400..9f27a49995160e 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1172,6 +1172,11 @@ test_atexit_handler () {
 check_test_results_san_file_has_entries_ () {
 	test -z "$TEST_RESULTS_SAN_FILE" && return 1
 
+	# Lines marked with DEDUP_TOKEN show unique leaks. We only care that we
+	# found at least one.
+	#
+	# But also suppress any false positives caused by bugs or races in the
+	# sanitizer itself.
 	grep -s ^DEDUP_TOKEN "$TEST_RESULTS_SAN_FILE".* |
 	grep -qv sanitizer::GetThreadStackTopAndBottom
 }

From 0b432748507a12b92677653104b18834d83cfb10 Mon Sep 17 00:00:00 2001
From: M Hickford <mirth.hickford@gmail.com>
Date: Thu, 9 Jan 2025 22:45:20 +0000
Subject: [PATCH 09/28] credential-cache: respect authtype capability

Previously, credential-cache populated authtype regardless of whether
the "get" request had the authtype capability. As documented in
git-credential.txt, authtype "should not be sent unless the appropriate
capability ... is provided".

Add a test. Without this change, the test failed because "credential fill"
printed an incomplete credential with only protocol and host attributes
(the unexpected authtype attribute was discarded by credential.c).
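
The rule can be sketched as a small writer, assuming a trimmed-down entry
type. struct cred_sketch and write_cached_cred() are hypothetical, not
git's struct credential or serve_one_client(). The request_has_authtype
flag plays the role of the CREDENTIAL_OP_RESPONSE capability check in the
diff: authtype and credential are only written when the "get" request
itself advertised the capability.

```c
#include <stdio.h>

/* Hypothetical, trimmed-down cache entry. */
struct cred_sketch {
	const char *username;
	const char *password;
	const char *authtype;
	const char *credential;
};

/*
 * Answer a "get" request from the cache. request_has_authtype says
 * whether that request sent capability[]=authtype.
 */
static void write_cached_cred(FILE *out, const struct cred_sketch *c,
			      int request_has_authtype)
{
	if (c->username)
		fprintf(out, "username=%s\n", c->username);
	if (c->password)
		fprintf(out, "password=%s\n", c->password);
	if (request_has_authtype && c->authtype)
		fprintf(out, "authtype=%s\n", c->authtype);
	if (request_has_authtype && c->credential)
		fprintf(out, "credential=%s\n", c->credential);
}
```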

Signed-off-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/credential-cache--daemon.c |  4 ++--
 t/lib-credential.sh                | 15 +++++++++++++++
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/builtin/credential-cache--daemon.c b/builtin/credential-cache--daemon.c
index 4952b225477227..9cdc3077ecfa02 100644
--- a/builtin/credential-cache--daemon.c
+++ b/builtin/credential-cache--daemon.c
@@ -141,9 +141,9 @@ static void serve_one_client(FILE *in, FILE *out)
 				fprintf(out, "username=%s\n", e->item.username);
 			if (e->item.password)
 				fprintf(out, "password=%s\n", e->item.password);
-			if (credential_has_capability(&c.capa_authtype, CREDENTIAL_OP_HELPER) && e->item.authtype)
+			if (credential_has_capability(&c.capa_authtype, CREDENTIAL_OP_RESPONSE) && e->item.authtype)
 				fprintf(out, "authtype=%s\n", e->item.authtype);
-			if (credential_has_capability(&c.capa_authtype, CREDENTIAL_OP_HELPER) && e->item.credential)
+			if (credential_has_capability(&c.capa_authtype, CREDENTIAL_OP_RESPONSE) && e->item.credential)
 				fprintf(out, "credential=%s\n", e->item.credential);
 			if (e->item.password_expiry_utc != TIME_MAX)
 				fprintf(out, "password_expiry_utc=%"PRItime"\n",
diff --git a/t/lib-credential.sh b/t/lib-credential.sh
index 58b9c740605890..cc6bf9aa5f3717 100644
--- a/t/lib-credential.sh
+++ b/t/lib-credential.sh
@@ -566,6 +566,21 @@ helper_test_authtype() {
 		EOF
 	'
 
+	test_expect_success "helper ($HELPER) gets authtype and credential only if request has authtype capability" '
+		check fill $HELPER <<-\EOF
+		protocol=https
+		host=git.example.com
+		--
+		protocol=https
+		host=git.example.com
+		username=askpass-username
+		password=askpass-password
+		--
+		askpass: Username for '\''https://git.example.com'\'':
+		askpass: Password for '\''https://askpass-username@git.example.com'\'':
+		EOF
+	'
+
 	test_expect_success "helper ($HELPER) stores authtype and credential with username" '
 		check approve $HELPER <<-\EOF
 		capability[]=authtype

From 447cdec2e9b2e45b14ea26f7fcaf054f7ded12bc Mon Sep 17 00:00:00 2001
From: Bence Ferdinandy <bence@ferdinandy.com>
Date: Sun, 12 Jan 2025 17:51:22 +0100
Subject: [PATCH 10/28] fetch set_head: fix non-mirror remotes in bare
 repositories

In b1b713f722 (fetch set_head: handle mirrored bare repositories,
2024-11-22) it was implicitly assumed that all remotes would be mirrors
in a bare repository, so fetching a non-mirrored remote could lead to
HEAD pointing to a non-existent reference. Make sure we only overwrite
HEAD if we are in a bare repository and fetching from a mirror.
Otherwise, proceed as normal and create
refs/remotes/<nonmirrorremote>/HEAD instead.
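
The ref selection can be sketched as a pure function. head_refs_sketch()
is hypothetical, and follow_always stands in for
`follow_remote_head == FOLLOW_REMOTE_ALWAYS`. The non-mirror HEAD name is
taken from the description above, not copied from the elided part of the
hunk.

```c
#include <stdio.h>

/*
 * Hypothetical helper mirroring the branch in set_head(): fill in the
 * symref to update (head) and the ref it should point at (target), and
 * return the resulting create_only flag.
 */
static int head_refs_sketch(int is_bare, int mirror, int follow_always,
			    const char *remote, const char *head_name,
			    char *head, size_t head_sz,
			    char *target, size_t target_sz)
{
	int baremirror = is_bare && mirror;

	if (baremirror) {
		/* bare mirror: the remote's HEAD becomes our own HEAD */
		snprintf(head, head_sz, "HEAD");
		snprintf(target, target_sz, "refs/heads/%s", head_name);
	} else {
		/* everything else, including non-mirror remotes in bare repos */
		snprintf(head, head_sz, "refs/remotes/%s/HEAD", remote);
		snprintf(target, target_sz, "refs/remotes/%s/%s",
			 remote, head_name);
	}
	return follow_always ? 0 : !baremirror;
}
```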

Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Reported-by: Christian Hesse <list@eworm.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/fetch.c   | 15 ++++++++-------
 t/t5505-remote.sh | 10 ++++++++++
 t/t5510-fetch.sh  | 13 +++++++++++++
 3 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index fe2b26c74aecab..625d45be8ba565 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1618,9 +1618,9 @@ static void report_set_head(const char *remote, const char *head_name,
 }
 
 static int set_head(const struct ref *remote_refs, int follow_remote_head,
-		const char *no_warn_branch)
+		const char *no_warn_branch, int mirror)
 {
-	int result = 0, create_only, is_bare, was_detached;
+	int result = 0, create_only, baremirror, was_detached;
 	struct strbuf b_head = STRBUF_INIT, b_remote_head = STRBUF_INIT,
 		      b_local_head = STRBUF_INIT;
 	const char *remote = gtransport->remote->name;
@@ -1655,9 +1655,9 @@ static int set_head(const struct ref *remote_refs, int follow_remote_head,
 
 	if (!head_name)
 		goto cleanup;
-	is_bare = is_bare_repository();
-	create_only = follow_remote_head == FOLLOW_REMOTE_ALWAYS ? 0 : !is_bare;
-	if (is_bare) {
+	baremirror = is_bare_repository() && mirror;
+	create_only = follow_remote_head == FOLLOW_REMOTE_ALWAYS ? 0 : !baremirror;
+	if (baremirror) {
 		strbuf_addstr(&b_head, "HEAD");
 		strbuf_addf(&b_remote_head, "refs/heads/%s", head_name);
 	} else {
@@ -1665,7 +1665,7 @@ static int set_head(const struct ref *remote_refs, int follow_remote_head,
 		strbuf_addf(&b_remote_head, "refs/remotes/%s/%s", remote, head_name);
 	}
 		/* make sure it's valid */
-	if (!is_bare && !refs_ref_exists(refs, b_remote_head.buf)) {
+	if (!baremirror && !refs_ref_exists(refs, b_remote_head.buf)) {
 		result = 1;
 		goto cleanup;
 	}
@@ -1925,7 +1925,8 @@ static int do_fetch(struct transport *transport,
 		}
 	}
 	if (set_head(remote_refs, transport->remote->follow_remote_head,
-		transport->remote->no_warn_branch))
+		transport->remote->no_warn_branch,
+		transport->remote->mirror))
 		;
 		/*
 		 * Way too many cases where this can go wrong
diff --git a/t/t5505-remote.sh b/t/t5505-remote.sh
index 519f7973e31287..c75cfe968f30e1 100755
--- a/t/t5505-remote.sh
+++ b/t/t5505-remote.sh
@@ -589,6 +589,16 @@ test_expect_success 'add --mirror setting HEAD' '
 	)
 '
 
+test_expect_success 'non-mirror fetch does not interfere with mirror' '
+	mkdir headnotmain &&
+	(
+		cd headnotmain &&
+		git init --bare -b notmain &&
+		git remote add -f other ../two &&
+		test "$(git symbolic-ref HEAD)" = "refs/heads/notmain"
+	)
+'
+
 test_expect_success 'add --mirror=fetch' '
 	mkdir mirror-fetch &&
 	git init -b main mirror-fetch/parent &&
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index 2d9587059f5083..cfa63ae0867241 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -84,6 +84,19 @@ test_expect_success "fetch test remote HEAD" '
 	branch=$(git rev-parse refs/remotes/origin/main) &&
 	test "z$head" = "z$branch"'
 
+test_expect_success "fetch test remote HEAD in bare repository" '
+	cd "$D" &&
+	git init --bare barerepo &&
+	cd barerepo &&
+	git remote add upstream ../two &&
+	git fetch upstream &&
+	git rev-parse --verify refs/remotes/upstream/HEAD &&
+	git rev-parse --verify refs/remotes/upstream/main &&
+	head=$(git rev-parse refs/remotes/upstream/HEAD) &&
+	branch=$(git rev-parse refs/remotes/upstream/main) &&
+	test "z$head" = "z$branch"'
+
+
 test_expect_success "fetch test remote HEAD change" '
 	cd "$D" &&
 	cd two &&

From 71e19a003197960cec38d30e71b49d182bcf8510 Mon Sep 17 00:00:00 2001
From: Elijah Newren <newren@gmail.com>
Date: Mon, 13 Jan 2025 17:13:36 +0000
Subject: [PATCH 11/28] object-name: fix resolution of object names containing
 curly braces

Given a branch name of 'foo{bar', commands like

    git cat-file -p foo{bar:README.md

should succeed (assuming that branch had a README.md file, of course).
However, the change in cce91a2caef9 (Change 'master@noon' syntax to
'master@{noon}'., 2006-05-19) presumed that curly braces would always
come after an '@' or '^' and be paired, causing e.g. 'foo{bar:README.md'
to entirely miss the ':' and assume there's no object being referenced.
In short, git would report:

    fatal: Not a valid object name foo{bar:README.md

Change the parsing to assume paired curly braces only when the '{'
immediately follows an '@' or '^' character.

Add tests for this, as well as for a few other test cases that initial
versions of this patch broke:
  * 'foo@@{...}'
  * 'foo^{/${SEARCH_TEXT_WITH_COLON}}:${PATH}'

Note that we'd prefer not to duplicate the special logic for "@^"
characters here, because if get_oid_basic(), interpret_nth_prior_checkout(),
or similar functions gain extra methods of using curly braces, then the
logic in get_oid_with_context_1() would need to be updated as well.  But
it's not clear how to refactor all of these to have a simple common
callpoint with the specialized logic.
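
The new scanning rule can be lifted into a standalone helper to show which
':' is taken as the revision/path separator. find_rev_path_colon() is
hypothetical. Its loop body follows the diff's change to
get_oid_with_context_1().

```c
#include <string.h>

/*
 * Return the offset of the ':' separating "<rev>:<path>", or -1 if
 * there is none. Braces only nest when opened by "@{" or "^{".
 */
static int find_rev_path_colon(const char *name)
{
	int depth = 0;
	const char *cp;

	for (cp = name; *cp; cp++) {
		if (strchr("@^", *cp) && cp[1] == '{') {
			cp++;           /* also consume the '{' */
			depth++;
		} else if (depth && *cp == '}') {
			depth--;
		} else if (!depth && *cp == ':') {
			return (int)(cp - name);
		}
	}
	return -1;
}
```

For example, "foo{bar:README.md" now splits at offset 7. In
"other^{/testing:}:hello", the colon inside the search text is skipped and
the split happens at offset 17.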

Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
Helped-by: Michael Haggerty <mhagger@github.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 object-name.c       |  8 +++++---
 t/t1006-cat-file.sh | 31 ++++++++++++++++++++++++++++++-
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/object-name.c b/object-name.c
index 88d1313028cf0b..f43fdba24017ea 100644
--- a/object-name.c
+++ b/object-name.c
@@ -2052,12 +2052,14 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo,
 		return -1;
 	}
 	for (cp = name, bracket_depth = 0; *cp; cp++) {
-		if (*cp == '{')
+		if (strchr("@^", *cp) && cp[1] == '{') {
+			cp++;
 			bracket_depth++;
-		else if (bracket_depth && *cp == '}')
+		} else if (bracket_depth && *cp == '}') {
 			bracket_depth--;
-		else if (!bracket_depth && *cp == ':')
+		} else if (!bracket_depth && *cp == ':') {
 			break;
+		}
 	}
 	if (*cp == ':') {
 		struct object_id tree_oid;
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index ff9bf213aa2c73..398865d6ebe9c6 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -240,7 +240,8 @@ test_expect_success "setup" '
 	git config extensions.objectformat $test_hash_algo &&
 	git config extensions.compatobjectformat $test_compat_hash_algo &&
 	echo_without_newline "$hello_content" > hello &&
-	git update-index --add hello
+	git update-index --add hello &&
+	git commit -m "add hello file"
 '
 
 run_blob_tests () {
@@ -602,6 +603,34 @@ test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
 	test_cmp expect actual
 '
 
+test_expect_success 'setup with curly braches in input' '
+	git branch "foo{bar" HEAD &&
+	git branch "foo@" HEAD
+'
+
+test_expect_success 'object reference with curly brace' '
+	git cat-file -p "foo{bar:hello" >actual &&
+	git cat-file -p HEAD:hello >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'object reference with at-sign' '
+	git cat-file -p "foo@@{0}:hello" >actual &&
+	git cat-file -p HEAD:hello >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'setup with commit with colon' '
+	git commit-tree -m "testing: just a bunch of junk" HEAD^{tree} >out &&
+	git branch other $(cat out)
+'
+
+test_expect_success 'object reference via commit text search' '
+	git cat-file -p "other^{/testing:}:hello" >actual &&
+	git cat-file -p HEAD:hello >expect &&
+	test_cmp expect actual
+'
+
 test_expect_success 'setup blobs which are likely to delta' '
 	test-tool genrandom foo 10240 >foo &&
 	{ cat foo && echo plus; } >foo-plus &&

From 191f0c8db22267cab55472961524c70a1d692025 Mon Sep 17 00:00:00 2001
From: Elijah Newren <newren@gmail.com>
Date: Mon, 13 Jan 2025 17:13:37 +0000
Subject: [PATCH 12/28] object-name: be more strict in parsing describe-like
 output

From Documentation/revisions.txt:
    '<describeOutput>', e.g. 'v1.7.4.2-679-g3bee7fb'::
      Output from `git describe`; i.e. a closest tag, optionally
      followed by a dash and a number of commits, followed by a dash, a
      'g', and an abbreviated object name.
which means that output of the format
    ${REFNAME}-${INTEGER}-g${HASH}
should parse to fully expanded ${HASH}.  This is fine.  However, we
currently don't validate any of ${REFNAME}-${INTEGER}; we only parse
-g${HASH} and assume the rest is valid.  That is problematic, since it
breaks things like

    git cat-file -p branchname:path/to/file/named/i-gaffed

which, when commit (or tree or blob) affed exists, will not return us
information about the file we are looking for but will instead
erroneously tell us about object affed.

A few additional notes:
  - This is a slight break in backward compatibility, because we used
    to allow ${GARBAGE}-g${HASH} as a way to spell ${HASH}.  However,
    the break is necessary, because there is no other way for someone
    to be more specific and disambiguate that they want the blob
    master:path/to/who-gabbed instead of the object abbed.
  - There is a possibility that check_refname_format() rules change in
    the future.  However, we can only realistically loosen the rules
    for what that function accepts rather than tighten them.  If we were
    to tighten the rules, some real-world repositories may already have
    refnames that would suddenly become unacceptable, and we would break
    those repositories.  As such, any describe-like syntax of the form
    ${VALID_FOR_A_REFNAME}-${INTEGER}-g${HASH} that is valid with the
    changes in this commit will remain valid in the future.
  - The fact that check_refname_format() rules could loosen in the
    future is probably also an important reason to make this change.  If
    the rules loosen, there might be additional cases within
    ${GARBAGE}-g${HASH} that become ambiguous in the future.  While
    abbreviated hashes can be disambiguated by abbreviating less, it may
    well be that these alternative object names have no way of being
    disambiguated (much like pathnames cannot be).  Accepting all random
    ${GARBAGE} thus makes it difficult for us to allow future
    extensions to object naming.

So, tighten up the parsing to make sure ${REFNAME} and ${INTEGER} are
present in the string, and would be considered a valid ref and
non-negative integer.
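
To illustrate the stricter rule, here is a minimal standalone sketch in
C.  It is not the code in this patch: looks_like_describe() and
toy_refname_ok() are hypothetical names, toy_refname_ok() is a
deliberately simplified stand-in for check_refname_format(), and the
count is checked as a decimal integer, as described above.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for check_refname_format(): it only
 * rejects an empty name, control characters, spaces and a few characters
 * that are never allowed in a refname.  Git's real rules are stricter.
 */
static int toy_refname_ok(const char *ref, size_t len)
{
	if (!len)
		return 0;
	for (size_t i = 0; i < len; i++) {
		unsigned char c = ref[i];
		if (c <= ' ' || strchr("~^:?*[\\", c))
			return 0;
	}
	return 1;
}

/* Return 1 if name looks like "<ref>-<count>-g<hex>", 0 otherwise. */
static int looks_like_describe(const char *name)
{
	size_t len = strlen(name);
	size_t hex = len, count_end, count_start;

	/* Walk back over the trailing hex digits of the abbreviated hash. */
	while (hex > 0 && isxdigit((unsigned char)name[hex - 1]))
		hex--;
	/* The old, lenient check: just require "-g" before the hex digits. */
	if (hex == len || hex < 2 || name[hex - 1] != 'g' || name[hex - 2] != '-')
		return 0;

	/* name[count_end] is the '-' in front of "g<hex>". */
	count_end = hex - 2;

	/* The stricter part: at least one decimal digit, preceded by '-'. */
	count_start = count_end;
	while (count_start > 0 && isdigit((unsigned char)name[count_start - 1]))
		count_start--;
	if (count_start == count_end || count_start == 0 ||
	    name[count_start - 1] != '-')
		return 0;

	/* Everything before that dash must be a plausible refname. */
	return toy_refname_ok(name, count_start - 1);
}
```

With it, "v1.7.4.2-679-g3bee7fb" is accepted, while "i-gaffed" and
"garbage-g1234abc" are rejected, because nothing that looks like
"-<count>" precedes their "-g".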

Also, add a few tests for git describe using object names of the form
    ${REVISION_NAME}${MODIFIERS}
since an early version of this patch failed on constructs like
    git describe v2.48.0-rc2-161-g6c2274cdbc^0

Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 object-name.c       | 55 ++++++++++++++++++++++++++++++++++++++++++++-
 t/t6120-describe.sh | 24 ++++++++++++++++++++
 2 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/object-name.c b/object-name.c
index f43fdba24017ea..945d5bdef25217 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1272,6 +1272,58 @@ static int peel_onion(struct repository *r, const char *name, int len,
 	return 0;
 }
 
+/*
+ * Documentation/revisions.txt says:
+ *    '<describeOutput>', e.g. 'v1.7.4.2-679-g3bee7fb'::
+ *      Output from `git describe`; i.e. a closest tag, optionally
+ *      followed by a dash and a number of commits, followed by a dash, a
+ *      'g', and an abbreviated object name.
+ *
+ * which means that the stuff before '-g${HASH}' needs to be a valid
+ * refname, a dash, and a non-negative integer.  This function verifies
+ * that.
+ *
+ * In particular, we do not want to treat
+ *   branchname:path/to/file/named/i-gaffed
+ * as a request for commit affed.
+ *
+ * More generally, we should probably not treat
+ *   'refs/heads/./../.../ ~^:/?*[////\\\&}/busted.lock-g050e0ef6ead'
+ * as a request for object 050e0ef6ead either.
+ *
+ * We are called with name[len] == '-' and name[len+1] == 'g', i.e.
+ * we are verifying the ${REFNAME}-${INTEGER} part of the name.
+ */
+static int ref_and_count_parts_valid(const char *name, int len)
+{
+	struct strbuf sb;
+	const char *cp;
+	int flags = REFNAME_ALLOW_ONELEVEL;
+	int ret = 1;
+
+	/* Ensure we have at least one digit */
+	if (!isxdigit(name[len-1]))
+		return 0;
+
+	/* Skip over digits backwards until we get to the dash */
+	for (cp = name + len - 2; name < cp; cp--) {
+		if (*cp == '-')
+			break;
+		if (!isxdigit(*cp))
+			return 0;
+	}
+	/* Ensure we found the leading dash */
+	if (*cp != '-')
+		return 0;
+
+	len = cp - name;
+	strbuf_init(&sb, len);
+	strbuf_add(&sb, name, len);
+	ret = !check_refname_format(sb.buf, flags);
+	strbuf_release(&sb);
+	return ret;
+}
+
 static int get_describe_name(struct repository *r,
 			     const char *name, int len,
 			     struct object_id *oid)
@@ -1285,7 +1337,8 @@ static int get_describe_name(struct repository *r,
 			/* We must be looking at g in "SOMETHING-g"
 			 * for it to be describe output.
 			 */
-			if (ch == 'g' && cp[-1] == '-') {
+			if (ch == 'g' && cp[-1] == '-' &&
+			    ref_and_count_parts_valid(name, cp - 1 - name)) {
 				cp++;
 				len -= cp - name;
 				return get_short_oid(r,
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 3f6160d702bc20..76843a61691cb5 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -82,11 +82,13 @@ check_describe R-2-gHASH HEAD^^
 check_describe A-3-gHASH HEAD^^2
 check_describe B HEAD^^2^
 check_describe R-1-gHASH HEAD^^^
+check_describe R-1-gHASH R-1-g$(git rev-parse --short HEAD^^)~1
 
 check_describe c-7-gHASH --tags HEAD
 check_describe c-6-gHASH --tags HEAD^
 check_describe e-1-gHASH --tags HEAD^^
 check_describe c-2-gHASH --tags HEAD^^2
+check_describe c-2-gHASH --tags c-2-g$(git rev-parse --short HEAD^^2)^0
 check_describe B --tags HEAD^^2^
 check_describe e --tags HEAD^^^
 check_describe e --tags --exact-match HEAD^^^
@@ -725,4 +727,26 @@ test_expect_success '--exact-match does not show --always fallback' '
 	test_must_fail git describe --exact-match --always
 '
 
+test_expect_success 'avoid being fooled by describe-like filename' '
+	test_when_finished rm out &&
+
+	git rev-parse --short HEAD >out &&
+	FILENAME=filename-g$(cat out) &&
+	touch $FILENAME &&
+	git add $FILENAME &&
+	git commit -m "Add $FILENAME" &&
+
+	git cat-file -t HEAD:$FILENAME >actual &&
+
+	echo blob >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'do not be fooled by invalid describe format ' '
+	test_when_finished rm out &&
+
+	git rev-parse --short HEAD >out &&
+	test_must_fail git cat-file -t "refs/tags/super-invalid/./../...../ ~^:/?*[////\\\\\\&}/busted.lock-42-g"$(cat out)
+'
+
 test_done

From bc67b4ab5f8bc268ecd2d9bb7dc1b7bf26884a8e Mon Sep 17 00:00:00 2001
From: Karthik Nayak <karthik.188@gmail.com>
Date: Wed, 15 Jan 2025 11:54:51 +0000
Subject: [PATCH 13/28] reftable: write correct max_update_index to header

In 297c09eabb (refs: allow multiple reflog entries for the same refname,
2024-12-16), the reftable backend learned to handle multiple reflog
entries within the same transaction. This was done by modifying the
`update_index` for reflogs with multiple indices. While writing the
logs, the `max_update_index` of the writer was raised to cover the
modified `update_index` values.

However, since ref entries are written before the modification to the
`max_update_index`, if there are multiple blocks to be written, the
reftable backend writes the header with the old `max_update_index`. When
all logs are finally written, the footer will be written with the new
`max_update_index`. This causes a mismatch between the header and the
footer and corrupts the reftable file. The existing tests only span a
single block, and since headers are lazily written with the first
block, the tests didn't capture this bug.

To fix the issue, the appropriate `max_update_index` limit must be set
even before the first block is written. Add a `max_index` field to the
transaction which holds the maximum index across all its updates, then
propagate this value to the reftable backend, where it is used to set
the `max_update_index` correctly.
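
The ordering problem can be modelled with a toy writer in C.  This is
a sketch under stated assumptions, not the reftable API: struct
toy_writer and the toy_*() functions are hypothetical, and writing a
"block" is reduced to the moment at which the lazily written header
captures the current limits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in reftable/. */
struct toy_writer {
	uint64_t min_update_index, max_update_index;
	int header_written;
	uint64_t header_max, footer_max;
};

static void toy_set_limits(struct toy_writer *w, uint64_t min, uint64_t max)
{
	w->min_update_index = min;
	w->max_update_index = max;
}

/* The header is emitted lazily, together with the first block. */
static void toy_write_block(struct toy_writer *w)
{
	if (!w->header_written) {
		w->header_max = w->max_update_index;
		w->header_written = 1;
	}
}

/* The footer records whatever the limits are at the very end. */
static void toy_finish(struct toy_writer *w)
{
	w->footer_max = w->max_update_index;
}

/*
 * Log entries use update indices up to ts + max_index.  Returns 1 when
 * the header and footer agree on max_update_index.
 */
static int toy_transaction(uint64_t ts, uint64_t max_index,
			   int refs_fill_a_block, int set_upfront)
{
	struct toy_writer w = {0};

	/* Fixed order: the limit covers every log entry from the start. */
	toy_set_limits(&w, ts, set_upfront ? ts + max_index : ts);

	/* Enough ref records to fill a block: it is flushed right away. */
	if (refs_fill_a_block)
		toy_write_block(&w);

	/* Old order: the limit was raised only just before adding logs. */
	if (!set_upfront)
		toy_set_limits(&w, ts, ts + max_index);
	toy_write_block(&w);	/* remaining refs and the log records */

	toy_finish(&w);
	return w.header_max == w.footer_max;
}
```

Here toy_transaction(10, 3, 1, 0) reports a mismatch (header 10, footer
13), toy_transaction(10, 3, 0, 0) does not, which mirrors why
single-block tests missed the bug, and toy_transaction(10, 3, 1, 1)
agrees again once the limit is set up front.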

Add a test which creates a few thousand reference updates with multiple
reflog entries, which should trigger the bug.

Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 refs.c                  |  7 +++++++
 refs/refs-internal.h    |  1 +
 refs/reftable-backend.c | 20 ++++++++++----------
 t/t1460-refs-migrate.sh | 12 ++++++++++++
 4 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/refs.c b/refs.c
index c55583986940d8..a5851a7de00878 100644
--- a/refs.c
+++ b/refs.c
@@ -1297,6 +1297,13 @@ int ref_transaction_update_reflog(struct ref_transaction *transaction,
 	update->flags &= ~REF_HAVE_OLD;
 	update->index = index;
 
+	/*
+	 * Reference backends may need to know the max index to optimize
+	 * their writes. So we store the max_index on the transaction level.
+	 */
+	if (index > transaction->max_index)
+		transaction->max_index = index;
+
 	return 0;
 }
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 79b287c5ec5c7d..2aaff91ab4826a 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -203,6 +203,7 @@ struct ref_transaction {
 	enum ref_transaction_state state;
 	void *backend_data;
 	unsigned int flags;
+	unsigned int max_index;
 };
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index bec5962debea7b..68db2baa8f15c4 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -852,6 +852,7 @@ struct write_transaction_table_arg {
 	size_t updates_nr;
 	size_t updates_alloc;
 	size_t updates_expected;
+	unsigned int max_index;
 };
 
 struct reftable_transaction_data {
@@ -1302,7 +1303,6 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	struct reftable_log_record *logs = NULL;
 	struct ident_split committer_ident = {0};
 	size_t logs_nr = 0, logs_alloc = 0, i;
-	uint64_t max_update_index = ts;
 	const char *committer_info;
 	int ret = 0;
 
@@ -1312,7 +1312,12 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 
 	QSORT(arg->updates, arg->updates_nr, transaction_update_cmp);
 
-	reftable_writer_set_limits(writer, ts, ts);
+	/*
+	 * During reflog migration, we add indexes for a single reflog with
+	 * multiple entries. Each entry will contain a different update_index,
+	 * so set the limits accordingly.
+	 */
+	reftable_writer_set_limits(writer, ts, ts + arg->max_index);
 
 	for (i = 0; i < arg->updates_nr; i++) {
 		struct reftable_transaction_update *tx_update = &arg->updates[i];
@@ -1414,12 +1419,6 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 				 */
 				log->update_index = ts + u->index;
 
-				/*
-				 * Note the max update_index so the limit can be set later on.
-				 */
-				if (log->update_index > max_update_index)
-					max_update_index = log->update_index;
-
 				log->refname = xstrdup(u->refname);
 				memcpy(log->value.update.new_hash,
 				       u->new_oid.hash, GIT_MAX_RAWSZ);
@@ -1483,8 +1482,6 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	 * and log blocks.
 	 */
 	if (logs) {
-		reftable_writer_set_limits(writer, ts, max_update_index);
-
 		ret = reftable_writer_add_logs(writer, logs, logs_nr);
 		if (ret < 0)
 			goto done;
@@ -1505,6 +1502,9 @@ static int reftable_be_transaction_finish(struct ref_store *ref_store UNUSED,
 	struct reftable_transaction_data *tx_data = transaction->backend_data;
 	int ret = 0;
 
+	if (tx_data->args)
+		tx_data->args->max_index = transaction->max_index;
+
 	for (size_t i = 0; i < tx_data->args_nr; i++) {
 		ret = reftable_addition_add(tx_data->args[i].addition,
 					    write_transaction_table, &tx_data->args[i]);
diff --git a/t/t1460-refs-migrate.sh b/t/t1460-refs-migrate.sh
index f59bc4860f19c4..307b2998efe6ac 100755
--- a/t/t1460-refs-migrate.sh
+++ b/t/t1460-refs-migrate.sh
@@ -227,6 +227,18 @@ do
 	done
 done
 
+test_expect_success 'multiple reftable blocks with multiple entries' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	test_commit -C repo first &&
+	printf "create refs/heads/ref-%d HEAD\n" $(test_seq 5000) >stdin &&
+	git -C repo update-ref --stdin <stdin &&
+	test_commit -C repo second &&
+	printf "update refs/heads/ref-%d HEAD\n" $(test_seq 3000) >stdin &&
+	git -C repo update-ref --stdin <stdin &&
+	test_migration repo reftable
+'
+
 test_expect_success 'migrating from files format deletes backend files' '
 	test_when_finished "rm -rf repo" &&
 	git init --ref-format=files repo &&

From 5e58db65750a20ae6d0c620f8a2010e038d408f5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ren=C3=A9=20Scharfe?= <l.s.r@web.de>
Date: Sat, 18 Jan 2025 18:11:34 +0100
Subject: [PATCH 14/28] ref-filter: move ahead-behind bases into used_atom
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

verify_ref_format() parses a ref-filter format string and stores
recognized items in the static array "used_atom".  For
"ahead-behind:<committish>" it stores the committish part in a
string_list member "bases" of struct ref_format.

ref_sorting_options() also parses bare ref-filter format items and
stores recognized ones in "used_atom" as well.  The committish
parts go to a dummy struct ref_format in parse_sorting_atom(), though,
and are leaked and forgotten.

If verify_ref_format() is called before ref_sorting_options(), like in
git for-each-ref, then all works well if the sort key is included in the
format string.  If it isn't, then sorting cannot work, as the
committishes are missing.

If ref_sorting_options() is called first, like in git branch, then we
have the additional issue that, if the sort key is included in the
format string, filter_ahead_behind() can't see its committish, will not
generate any results for it, and thus it will be expanded to an empty
string.

Fix those issues by replacing the string_list with a field in used_atom
for storing the committish.  This way it can be shared for handling both
ref-filter format strings and sorting options in the same command.
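
A much simplified model of the idea in C, assuming atoms are identified
by their full text.  The names toy_atom, toy_parse_atom() and
toy_count_bases() are hypothetical and do not exist in ref-filter.c.
Both the format parser and the sort parser would call toy_parse_atom(),
so a committish seen by either one ends up in the single shared table.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, heavily simplified model of ref-filter's atom table. */
enum toy_atom_type { TOY_OTHER, TOY_AHEADBEHIND };

struct toy_atom {
	const char *name;	/* full atom text, e.g. "ahead-behind:HEAD" */
	enum toy_atom_type type;
	const char *base;	/* the "<committish>" part, or NULL */
};

static struct toy_atom toy_used_atom[16];
static size_t toy_used_atom_cnt;

/* Called for both --format and --sort atoms; returns the atom's slot. */
static size_t toy_parse_atom(const char *name)
{
	static const char prefix[] = "ahead-behind:";
	struct toy_atom *a;

	/* Reuse an atom already seen by either parser. */
	for (size_t i = 0; i < toy_used_atom_cnt; i++)
		if (!strcmp(toy_used_atom[i].name, name))
			return i;

	assert(toy_used_atom_cnt < sizeof(toy_used_atom) / sizeof(*toy_used_atom));
	a = &toy_used_atom[toy_used_atom_cnt];
	a->name = name;
	if (!strncmp(name, prefix, sizeof(prefix) - 1)) {
		/* The argument lives in the atom, not in a per-format list. */
		a->type = TOY_AHEADBEHIND;
		a->base = name + sizeof(prefix) - 1;
	} else {
		a->type = TOY_OTHER;
		a->base = NULL;
	}
	return toy_used_atom_cnt++;
}

/* Like the new filter_ahead_behind(): find bases by scanning the table. */
static size_t toy_count_bases(void)
{
	size_t n = 0;

	for (size_t i = 0; i < toy_used_atom_cnt; i++)
		if (toy_used_atom[i].type == TOY_AHEADBEHIND)
			n++;
	return n;
}
```

Parsing "ahead-behind:HEAD" once from --format and once from --sort
yields the same slot, and toy_count_bases() counts it once, regardless
of which parser ran first.  An atom that appears only in --sort is
still counted.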

Reported-by: Ross Goldberg <ross.goldberg@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/branch.c         |  2 +-
 ref-filter.c             | 50 ++++++++++++++++++++++++----------------
 ref-filter.h             |  5 ----
 t/t3203-branch-output.sh | 28 ++++++++++++++++++++++
 4 files changed, 59 insertions(+), 26 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 6e7b0cfddbd607..fbb9536282dcaa 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -473,7 +473,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
 	if (verify_ref_format(format))
 		die(_("unable to parse format string"));
 
-	filter_ahead_behind(the_repository, format, &array);
+	filter_ahead_behind(the_repository, &array);
 	ref_array_sort(sorting, &array);
 
 	if (column_active(colopts)) {
diff --git a/ref-filter.c b/ref-filter.c
index 23054694c2c960..250e412c47181a 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -235,6 +235,9 @@ static struct used_atom {
 			enum { S_BARE, S_GRADE, S_SIGNER, S_KEY,
 			       S_FINGERPRINT, S_PRI_KEY_FP, S_TRUST_LEVEL } option;
 		} signature;
+		struct {
+			struct commit *commit;
+		} base;
 		struct strvec describe_args;
 		struct refname_atom refname;
 		char *head;
@@ -891,18 +894,15 @@ static int rest_atom_parser(struct ref_format *format UNUSED,
 	return 0;
 }
 
-static int ahead_behind_atom_parser(struct ref_format *format,
-				    struct used_atom *atom UNUSED,
+static int ahead_behind_atom_parser(struct ref_format *format UNUSED,
+				    struct used_atom *atom,
 				    const char *arg, struct strbuf *err)
 {
-	struct string_list_item *item;
-
 	if (!arg)
 		return strbuf_addf_ret(err, -1, _("expected format: %%(ahead-behind:<committish>)"));
 
-	item = string_list_append(&format->bases, arg);
-	item->util = lookup_commit_reference_by_name(arg);
-	if (!item->util)
+	atom->u.base.commit = lookup_commit_reference_by_name(arg);
+	if (!atom->u.base.commit)
 		die("failed to find '%s'", arg);
 
 	return 0;
@@ -3084,22 +3084,30 @@ static void reach_filter(struct ref_array *array,
 }
 
 void filter_ahead_behind(struct repository *r,
-			 struct ref_format *format,
 			 struct ref_array *array)
 {
 	struct commit **commits;
-	size_t commits_nr = format->bases.nr + array->nr;
+	size_t bases_nr, commits_nr;
 
-	if (!format->bases.nr || !array->nr)
+	if (!array->nr)
 		return;
 
-	ALLOC_ARRAY(commits, commits_nr);
-	for (size_t i = 0; i < format->bases.nr; i++)
-		commits[i] = format->bases.items[i].util;
+	for (size_t i = bases_nr = 0; i < used_atom_cnt; i++) {
+		if (used_atom[i].atom_type == ATOM_AHEADBEHIND)
+			bases_nr++;
+	}
+	if (!bases_nr)
+		return;
 
-	ALLOC_ARRAY(array->counts, st_mult(format->bases.nr, array->nr));
+	ALLOC_ARRAY(commits, st_add(bases_nr, array->nr));
+	for (size_t i = 0, j = 0; i < used_atom_cnt; i++) {
+		if (used_atom[i].atom_type == ATOM_AHEADBEHIND)
+			commits[j++] = used_atom[i].u.base.commit;
+	}
 
-	commits_nr = format->bases.nr;
+	ALLOC_ARRAY(array->counts, st_mult(bases_nr, array->nr));
+
+	commits_nr = bases_nr;
 	array->counts_nr = 0;
 	for (size_t i = 0; i < array->nr; i++) {
 		const char *name = array->items[i]->refname;
@@ -3108,8 +3116,8 @@ void filter_ahead_behind(struct repository *r,
 		if (!commits[commits_nr])
 			continue;
 
-		CALLOC_ARRAY(array->items[i]->counts, format->bases.nr);
-		for (size_t j = 0; j < format->bases.nr; j++) {
+		CALLOC_ARRAY(array->items[i]->counts, bases_nr);
+		for (size_t j = 0; j < bases_nr; j++) {
 			struct ahead_behind_count *count;
 			count = &array->counts[array->counts_nr++];
 			count->tip_index = commits_nr;
@@ -3277,9 +3285,12 @@ static inline int can_do_iterative_format(struct ref_filter *filter,
 	 * - filtering on reachability
 	 * - including ahead-behind information in the formatted output
 	 */
+	for (size_t i = 0; i < used_atom_cnt; i++) {
+		if (used_atom[i].atom_type == ATOM_AHEADBEHIND)
+			return 0;
+	}
 	return !(filter->reachable_from ||
 		 filter->unreachable_from ||
-		 format->bases.nr ||
 		 format->is_base_tips.nr);
 }
 
@@ -3303,7 +3314,7 @@ void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
 	} else {
 		struct ref_array array = { 0 };
 		filter_refs(&array, filter, type);
-		filter_ahead_behind(the_repository, format, &array);
+		filter_ahead_behind(the_repository, &array);
 		filter_is_base(the_repository, format, &array);
 		ref_array_sort(sorting, &array);
 		print_formatted_ref_array(&array, format);
@@ -3647,7 +3658,6 @@ void ref_format_init(struct ref_format *format)
 
 void ref_format_clear(struct ref_format *format)
 {
-	string_list_clear(&format->bases, 0);
 	string_list_clear(&format->is_base_tips, 0);
 	ref_format_init(format);
 }
diff --git a/ref-filter.h b/ref-filter.h
index 754038ab078669..5f3dd6c9318915 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -99,9 +99,6 @@ struct ref_format {
 	/* Internal state to ref-filter */
 	int need_color_reset_at_eol;
 
-	/* List of bases for ahead-behind counts. */
-	struct string_list bases;
-
 	/* List of bases for is-base indicators. */
 	struct string_list is_base_tips;
 
@@ -117,7 +114,6 @@ struct ref_format {
 }
 #define REF_FORMAT_INIT {             \
 	.use_color = -1,              \
-	.bases = STRING_LIST_INIT_DUP, \
 	.is_base_tips = STRING_LIST_INIT_DUP, \
 }
 
@@ -205,7 +201,6 @@ struct ref_array_item *ref_array_push(struct ref_array *array,
  * If this is not called, then any ahead-behind atoms will be blank.
  */
 void filter_ahead_behind(struct repository *r,
-			 struct ref_format *format,
 			 struct ref_array *array);
 
 /*
diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index 500c9d0e727a29..a6bd88a58d0a8c 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -368,6 +368,34 @@ test_expect_success 'git branch --format with ahead-behind' '
 	test_cmp expect actual
 '
 
+test_expect_success 'git branch `--sort=[-]ahead-behind` option' '
+	cat >expect <<-\EOF &&
+	(HEAD detached from fromtag) 0 0
+	refs/heads/ambiguous 0 0
+	refs/heads/branch-two 0 0
+	refs/heads/branch-one 1 0
+	refs/heads/main 1 0
+	refs/heads/ref-to-branch 1 0
+	refs/heads/ref-to-remote 1 0
+	EOF
+	git branch --format="%(refname) %(ahead-behind:HEAD)" \
+		--sort=refname --sort=ahead-behind:HEAD >actual &&
+	test_cmp expect actual &&
+
+	cat >expect <<-\EOF &&
+	(HEAD detached from fromtag) 0 0
+	refs/heads/branch-one 1 0
+	refs/heads/main 1 0
+	refs/heads/ref-to-branch 1 0
+	refs/heads/ref-to-remote 1 0
+	refs/heads/ambiguous 0 0
+	refs/heads/branch-two 0 0
+	EOF
+	git branch --format="%(refname) %(ahead-behind:HEAD)" \
+		--sort=refname --sort=-ahead-behind:HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'git branch with --format=%(rest) must fail' '
 	test_must_fail git branch --format="%(rest)" >actual
 '

From 7ee4fd18ace71d187ee3ea5ba745a6a3493e0e0a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ren=C3=A9=20Scharfe?= <l.s.r@web.de>
Date: Sat, 18 Jan 2025 18:11:51 +0100
Subject: [PATCH 15/28] ref-filter: move is-base tip to used_atom
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The string_list "is_base_tips" in struct ref_format stores the
committish part of "is-base:<committish>".  It has the same problems
that its sibling string_list "bases" had.  Fix them the same way as the
previous commit did for the latter, by replacing the string_list with
fields in "used_atom".

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 ref-filter.c          | 56 +++++++++++++++++++++++++------------------
 ref-filter.h          |  5 ----
 t/t6600-test-reach.sh | 29 ++++++++++++++++++++++
 3 files changed, 62 insertions(+), 28 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 250e412c47181a..1c474224a04240 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -236,6 +236,7 @@ static struct used_atom {
 			       S_FINGERPRINT, S_PRI_KEY_FP, S_TRUST_LEVEL } option;
 		} signature;
 		struct {
+			char *name;
 			struct commit *commit;
 		} base;
 		struct strvec describe_args;
@@ -908,18 +909,16 @@ static int ahead_behind_atom_parser(struct ref_format *format UNUSED,
 	return 0;
 }
 
-static int is_base_atom_parser(struct ref_format *format,
-			       struct used_atom *atom UNUSED,
+static int is_base_atom_parser(struct ref_format *format UNUSED,
+			       struct used_atom *atom,
 			       const char *arg, struct strbuf *err)
 {
-	struct string_list_item *item;
-
 	if (!arg)
 		return strbuf_addf_ret(err, -1, _("expected format: %%(is-base:<committish>)"));
 
-	item = string_list_append(&format->is_base_tips, arg);
-	item->util = lookup_commit_reference_by_name(arg);
-	if (!item->util)
+	atom->u.base.name = xstrdup(arg);
+	atom->u.base.commit = lookup_commit_reference_by_name(arg);
+	if (!atom->u.base.commit)
 		die("failed to find '%s'", arg);
 
 	return 0;
@@ -3009,6 +3008,8 @@ void ref_array_clear(struct ref_array *array)
 			free(atom->u.head);
 		else if (atom->atom_type == ATOM_DESCRIBE)
 			strvec_clear(&atom->u.describe_args);
+		else if (atom->atom_type == ATOM_ISBASE)
+			free(atom->u.base.name);
 		else if (atom->atom_type == ATOM_TRAILERS ||
 			 (atom->atom_type == ATOM_CONTENTS &&
 			  atom->u.contents.option == C_TRAILERS)) {
@@ -3133,14 +3134,20 @@ void filter_ahead_behind(struct repository *r,
 }
 
 void filter_is_base(struct repository *r,
-		    struct ref_format *format,
 		    struct ref_array *array)
 {
 	struct commit **bases;
-	size_t bases_nr = 0;
+	size_t bases_nr = 0, is_base_nr;
 	struct ref_array_item **back_index;
 
-	if (!format->is_base_tips.nr || !array->nr)
+	if (!array->nr)
+		return;
+
+	for (size_t i = is_base_nr = 0; i < used_atom_cnt; i++) {
+		if (used_atom[i].atom_type == ATOM_ISBASE)
+			is_base_nr++;
+	}
+	if (!is_base_nr)
 		return;
 
 	CALLOC_ARRAY(back_index, array->nr);
@@ -3150,7 +3157,7 @@ void filter_is_base(struct repository *r,
 		const char *name = array->items[i]->refname;
 		struct commit *c = lookup_commit_reference_by_name_gently(name, 1);
 
-		CALLOC_ARRAY(array->items[i]->is_base, format->is_base_tips.nr);
+		CALLOC_ARRAY(array->items[i]->is_base, is_base_nr);
 
 		if (!c)
 			continue;
@@ -3160,15 +3167,20 @@ void filter_is_base(struct repository *r,
 		bases_nr++;
 	}
 
-	for (size_t i = 0; i < format->is_base_tips.nr; i++) {
-		struct commit *tip = format->is_base_tips.items[i].util;
-		int base_index = get_branch_base_for_tip(r, tip, bases, bases_nr);
+	for (size_t i = 0, j = 0; i < used_atom_cnt; i++) {
+		struct commit *tip;
+		int base_index;
+
+		if (used_atom[i].atom_type != ATOM_ISBASE)
+			continue;
 
+		tip = used_atom[i].u.base.commit;
+		base_index = get_branch_base_for_tip(r, tip, bases, bases_nr);
 		if (base_index < 0)
 			continue;
 
 		/* Store the string for use in output later. */
-		back_index[base_index]->is_base[i] = xstrdup(format->is_base_tips.items[i].string);
+		back_index[base_index]->is_base[j++] = xstrdup(used_atom[i].u.base.name);
 	}
 
 	free(back_index);
@@ -3260,8 +3272,7 @@ struct ref_sorting {
 };
 
 static inline int can_do_iterative_format(struct ref_filter *filter,
-					  struct ref_sorting *sorting,
-					  struct ref_format *format)
+					  struct ref_sorting *sorting)
 {
 	/*
 	 * Reference backends sort patterns lexicographically by refname, so if
@@ -3288,17 +3299,17 @@ static inline int can_do_iterative_format(struct ref_filter *filter,
 	for (size_t i = 0; i < used_atom_cnt; i++) {
 		if (used_atom[i].atom_type == ATOM_AHEADBEHIND)
 			return 0;
+		if (used_atom[i].atom_type == ATOM_ISBASE)
+			return 0;
 	}
-	return !(filter->reachable_from ||
-		 filter->unreachable_from ||
-		 format->is_base_tips.nr);
+	return !(filter->reachable_from || filter->unreachable_from);
 }
 
 void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
 			    struct ref_sorting *sorting,
 			    struct ref_format *format)
 {
-	if (can_do_iterative_format(filter, sorting, format)) {
+	if (can_do_iterative_format(filter, sorting)) {
 		int save_commit_buffer_orig;
 		struct ref_filter_and_format_cbdata ref_cbdata = {
 			.filter = filter,
@@ -3315,7 +3326,7 @@ void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
 		struct ref_array array = { 0 };
 		filter_refs(&array, filter, type);
 		filter_ahead_behind(the_repository, &array);
-		filter_is_base(the_repository, format, &array);
+		filter_is_base(the_repository, &array);
 		ref_array_sort(sorting, &array);
 		print_formatted_ref_array(&array, format);
 		ref_array_clear(&array);
@@ -3658,6 +3669,5 @@ void ref_format_init(struct ref_format *format)
 
 void ref_format_clear(struct ref_format *format)
 {
-	string_list_clear(&format->is_base_tips, 0);
 	ref_format_init(format);
 }
diff --git a/ref-filter.h b/ref-filter.h
index 5f3dd6c9318915..0ba94df65198a2 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -99,9 +99,6 @@ struct ref_format {
 	/* Internal state to ref-filter */
 	int need_color_reset_at_eol;
 
-	/* List of bases for is-base indicators. */
-	struct string_list is_base_tips;
-
 	struct {
 		int max_count;
 		int omit_empty;
@@ -114,7 +111,6 @@ struct ref_format {
 }
 #define REF_FORMAT_INIT {             \
 	.use_color = -1,              \
-	.is_base_tips = STRING_LIST_INIT_DUP, \
 }
 
 /*  Macros for checking --merged and --no-merged options */
@@ -210,7 +206,6 @@ void filter_ahead_behind(struct repository *r,
  * If this is not called, then any is-base atoms will be blank.
  */
 void filter_is_base(struct repository *r,
-		    struct ref_format *format,
 		    struct ref_array *array);
 
 void ref_filter_init(struct ref_filter *filter);
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 2591f8b8b39bf4..6638d1aa1dcebe 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -733,4 +733,33 @@ test_expect_success 'for-each-ref is-base:multiple' '
 		--format="%(refname)[%(is-base:commit-2-3)-%(is-base:commit-6-5)]" --stdin
 '
 
+test_expect_success 'for-each-ref is-base: --sort' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-4-2
+	refs/heads/commit-4-4
+	refs/heads/commit-8-4
+	EOF
+
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-4-4
+	refs/heads/commit-8-4
+	refs/heads/commit-4-2
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname)" --stdin \
+		--sort=refname --sort=is-base:commit-2-3 &&
+
+	cat >expect <<-\EOF &&
+	refs/heads/commit-4-2
+	refs/heads/commit-1-1
+	refs/heads/commit-4-4
+	refs/heads/commit-8-4
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname)" --stdin \
+		--sort=refname --sort=-is-base:commit-2-3
+'
+
 test_done

From c5490ce9d1b625516b17253d1d2d0352730b7b84 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ren=C3=A9=20Scharfe?= <l.s.r@web.de>
Date: Sat, 18 Jan 2025 18:11:59 +0100
Subject: [PATCH 16/28] ref-filter: remove ref_format_clear()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that ref_format_clear() no longer releases any memory we don't need
it anymore.  Remove it and its counterpart, ref_format_init().

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/branch.c       |  1 -
 builtin/for-each-ref.c |  1 -
 builtin/tag.c          |  1 -
 builtin/verify-tag.c   |  1 -
 ref-filter.c           | 11 -----------
 ref-filter.h           |  3 ---
 6 files changed, 18 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index fbb9536282dcaa..9a29de5bf1dd41 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -884,7 +884,6 @@ int cmd_branch(int argc,
 		string_list_clear(&output, 0);
 		ref_sorting_release(sorting);
 		ref_filter_clear(&filter);
-		ref_format_clear(&format);
 
 		ret = 0;
 		goto out;
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 715745a262aa35..8085ebd8fe97b5 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -108,7 +108,6 @@ int cmd_for_each_ref(int argc,
 	filter_and_format_refs(&filter, flags, sorting, &format);
 
 	ref_filter_clear(&filter);
-	ref_format_clear(&format);
 	ref_sorting_release(sorting);
 	strvec_clear(&vec);
 	return 0;
diff --git a/builtin/tag.c b/builtin/tag.c
index c4bd1458318900..e8a344b9264b9c 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -698,7 +698,6 @@ int cmd_tag(int argc,
 cleanup:
 	ref_sorting_release(sorting);
 	ref_filter_clear(&filter);
-	ref_format_clear(&format);
 	strbuf_release(&buf);
 	strbuf_release(&ref);
 	strbuf_release(&reflog_msg);
diff --git a/builtin/verify-tag.c b/builtin/verify-tag.c
index a7f20618ffd528..f6b97048a57d37 100644
--- a/builtin/verify-tag.c
+++ b/builtin/verify-tag.c
@@ -69,6 +69,5 @@ int cmd_verify_tag(int argc,
 		if (format.format)
 			pretty_print_ref(name, &oid, &format);
 	}
-	ref_format_clear(&format);
 	return had_error;
 }
diff --git a/ref-filter.c b/ref-filter.c
index 1c474224a04240..2ec72d66bbb256 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -3660,14 +3660,3 @@ void ref_filter_clear(struct ref_filter *filter)
 	free_commit_list(filter->unreachable_from);
 	ref_filter_init(filter);
 }
-
-void ref_format_init(struct ref_format *format)
-{
-	struct ref_format blank = REF_FORMAT_INIT;
-	memcpy(format, &blank, sizeof(blank));
-}
-
-void ref_format_clear(struct ref_format *format)
-{
-	ref_format_init(format);
-}
diff --git a/ref-filter.h b/ref-filter.h
index 0ba94df65198a2..013d4cfa64b310 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -211,7 +211,4 @@ void filter_is_base(struct repository *r,
 void ref_filter_init(struct ref_filter *filter);
 void ref_filter_clear(struct ref_filter *filter);
 
-void ref_format_init(struct ref_format *format);
-void ref_format_clear(struct ref_format *format);
-
 #endif /*  REF_FILTER_H  */

From 75bc40de27966ec02fc0fb91b875a47fff022f9b Mon Sep 17 00:00:00 2001
From: Junio C Hamano <gitster@pobox.com>
Date: Sun, 19 Jan 2025 08:23:08 -0500
Subject: [PATCH 17/28] bswap.h: squelch potential sparse -Wcast-truncate
 warnings

In put_be32(), we right-shift a uint32_t value various amounts and then
assign the low 8 bits to individual "unsigned char" bytes, throwing away
the high bits. For shifts smaller than 24 bits, those thrown-away bits
will be arbitrary bits from the original uint32_t.

This works exactly as we want, but if you feed a constant, then sparse
complains. For example, if we write this (which we plan to do in a
future patch):

  put_be32(hdr, PACK_SIGNATURE);

then "make sparse" produces:

  compat/bswap.h:175:22: error: cast truncates bits from constant value (5041 becomes 41)
  compat/bswap.h:176:22: error: cast truncates bits from constant value (504143 becomes 43)
  compat/bswap.h:177:22: error: cast truncates bits from constant value (5041434b becomes 4b)

And the same issue exists in the other put_be*() functions, when used
with a constant.

We can silence this warning by explicitly masking off the truncated
bits. The compiler is smart enough to know the result is the same, and
the asm generated by gcc (with both -O0 and -O2) is identical.
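
The following small standalone sketch places the old and new bodies of
put_be32() side by side to show that the masking changes nothing. The two
function names are made up for this illustration so that both variants
can coexist, and PACK_SIGNATURE is copied from pack.h. Both variants
store 0x5041434b as the bytes 'P', 'A', 'C', 'K'.

```c
#include <assert.h>
#include <stdint.h>

#define PACK_SIGNATURE 0x5041434b /* "PACK", as in pack.h */

/* New form: the "& 0xff" spells out the truncation explicitly. */
static void put_be32_masked(void *ptr, uint32_t value)
{
	unsigned char *p = ptr;
	p[0] = (value >> 24) & 0xff;
	p[1] = (value >> 16) & 0xff;
	p[2] = (value >>  8) & 0xff;
	p[3] = (value >>  0) & 0xff;
}

/* Old form: relies on the implicit modulo-256 conversion to unsigned char. */
static void put_be32_unmasked(void *ptr, uint32_t value)
{
	unsigned char *p = ptr;
	p[0] = value >> 24;
	p[1] = value >> 16;
	p[2] = value >>  8;
	p[3] = value >>  0;
}
```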

Curiously, this line already exists:

	put_be32(&hdr_version, INDEX_EXTENSION_VERSION2);

in the fsmonitor.c file, but it does not get flagged because the CPP
macro expands to a small integer (2).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 compat/bswap.h | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/compat/bswap.h b/compat/bswap.h
index 512f6f4b9937c8..b34054f2bd7284 100644
--- a/compat/bswap.h
+++ b/compat/bswap.h
@@ -171,23 +171,23 @@ static inline uint64_t get_be64(const void *ptr)
 static inline void put_be32(void *ptr, uint32_t value)
 {
 	unsigned char *p = ptr;
-	p[0] = value >> 24;
-	p[1] = value >> 16;
-	p[2] = value >>  8;
-	p[3] = value >>  0;
+	p[0] = (value >> 24) & 0xff;
+	p[1] = (value >> 16) & 0xff;
+	p[2] = (value >>  8) & 0xff;
+	p[3] = (value >>  0) & 0xff;
 }
 
 static inline void put_be64(void *ptr, uint64_t value)
 {
 	unsigned char *p = ptr;
-	p[0] = value >> 56;
-	p[1] = value >> 48;
-	p[2] = value >> 40;
-	p[3] = value >> 32;
-	p[4] = value >> 24;
-	p[5] = value >> 16;
-	p[6] = value >>  8;
-	p[7] = value >>  0;
+	p[0] = (value >> 56) & 0xff;
+	p[1] = (value >> 48) & 0xff;
+	p[2] = (value >> 40) & 0xff;
+	p[3] = (value >> 32) & 0xff;
+	p[4] = (value >> 24) & 0xff;
+	p[5] = (value >> 16) & 0xff;
+	p[6] = (value >>  8) & 0xff;
+	p[7] = (value >>  0) & 0xff;
 }
 
 #endif /* COMPAT_BSWAP_H */

From b3c9b6138373ccc8ca3568593b3ae0fe72c25ad0 Mon Sep 17 00:00:00 2001
From: Jeff King <peff@peff.net>
Date: Sun, 19 Jan 2025 08:23:37 -0500
Subject: [PATCH 18/28] packfile: factor out --pack_header argument parsing

Both index-pack and unpack-objects accept a --pack_header argument. This
is an undocumented internal argument used by receive-pack and fetch to
pass along information about the header of the pack, which they've
already read from the incoming stream.

In preparation for a bugfix, let's factor the duplicated code into a
common helper.

The callers are still responsible for identifying the option. While this
could likewise be factored out, it is more flexible this way (e.g., if
they ever started using parse-options and wanted to handle both the
stuck and unstuck forms).

Likewise, the callers are responsible for reporting errors, though they
both just call die(). I've tweaked unpack-objects to match index-pack in
marking the error for translation.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/index-pack.c     | 14 +++-----------
 builtin/unpack-objects.c | 16 ++++------------
 packfile.c               | 17 +++++++++++++++++
 packfile.h               |  6 ++++++
 4 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 763b01372aade4..bab42dfc2a491a 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1801,18 +1801,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 					nr_threads = 1;
 				}
 			} else if (starts_with(arg, "--pack_header=")) {
-				struct pack_header *hdr;
-				char *c;
-
-				hdr = (struct pack_header *)input_buffer;
-				hdr->hdr_signature = htonl(PACK_SIGNATURE);
-				hdr->hdr_version = htonl(strtoul(arg + 14, &c, 10));
-				if (*c != ',')
-					die(_("bad %s"), arg);
-				hdr->hdr_entries = htonl(strtoul(c + 1, &c, 10));
-				if (*c)
+				if (parse_pack_header_option(arg + 14,
+							     input_buffer,
+							     &input_len) < 0)
 					die(_("bad %s"), arg);
-				input_len = sizeof(*hdr);
 			} else if (!strcmp(arg, "-v")) {
 				verbose = 1;
 			} else if (!strcmp(arg, "--progress-title")) {
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 08fa2a7a743dc9..31614472749bf9 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -15,6 +15,7 @@
 #include "progress.h"
 #include "decorate.h"
 #include "fsck.h"
+#include "packfile.h"
 
 static int dry_run, quiet, recover, has_errors, strict;
 static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict]";
@@ -639,18 +640,9 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix UNUSED)
 				continue;
 			}
 			if (starts_with(arg, "--pack_header=")) {
-				struct pack_header *hdr;
-				char *c;
-
-				hdr = (struct pack_header *)buffer;
-				hdr->hdr_signature = htonl(PACK_SIGNATURE);
-				hdr->hdr_version = htonl(strtoul(arg + 14, &c, 10));
-				if (*c != ',')
-					die("bad %s", arg);
-				hdr->hdr_entries = htonl(strtoul(c + 1, &c, 10));
-				if (*c)
-					die("bad %s", arg);
-				len = sizeof(*hdr);
+				if (parse_pack_header_option(arg + 14,
+							     buffer, &len) < 0)
+					die(_("bad %s"), arg);
 				continue;
 			}
 			if (skip_prefix(arg, "--max-input-size=", &arg)) {
diff --git a/packfile.c b/packfile.c
index 813584646f762a..e2bdadc7cbd44b 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2294,3 +2294,20 @@ int is_promisor_object(const struct object_id *oid)
 	}
 	return oidset_contains(&promisor_objects, oid);
 }
+
+int parse_pack_header_option(const char *in, unsigned char *out, unsigned int *len)
+{
+	struct pack_header *hdr;
+	char *c;
+
+	hdr = (struct pack_header *)out;
+	hdr->hdr_signature = htonl(PACK_SIGNATURE);
+	hdr->hdr_version = htonl(strtoul(in, &c, 10));
+	if (*c != ',')
+		return -1;
+	hdr->hdr_entries = htonl(strtoul(c + 1, &c, 10));
+	if (*c)
+		return -1;
+	*len = sizeof(*hdr);
+	return 0;
+}
diff --git a/packfile.h b/packfile.h
index eb18ec15dbf3bc..41f38b4832bf91 100644
--- a/packfile.h
+++ b/packfile.h
@@ -210,4 +210,10 @@ int is_promisor_object(const struct object_id *oid);
 int load_idx(const char *path, const unsigned int hashsz, void *idx_map,
 	     size_t idx_size, struct packed_git *p);
 
+/*
+ * Parse a --pack_header option as accepted by index-pack and unpack-objects,
+ * turning it into the matching bytes we'd find in a pack.
+ */
+int parse_pack_header_option(const char *in, unsigned char *out, unsigned int *len);
+
 #endif

From 56c5e82ca8e351bb2093a804e6684a72dd392125 Mon Sep 17 00:00:00 2001
From: Jeff King <peff@peff.net>
Date: Sun, 19 Jan 2025 08:23:44 -0500
Subject: [PATCH 19/28] parse_pack_header_option(): avoid unaligned memory
 writes

In order to recreate a pack header in our in-memory buffer, we cast the
buffer to a "struct pack_header" and assign the individual fields. This
is reported to cause SIGBUS on sparc64 due to alignment issues.

We can work around this by using put_be32() which will write individual
bytes into the buffer.
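
Here is a minimal standalone sketch of the byte-wise approach. It
re-declares put_be32() and parse_pack_header_option() locally, copied
from compat/bswap.h and the patched packfile.c, so that it compiles on
its own. Because it performs only unsigned char stores, the output
buffer may start at any address, including a deliberately misaligned one
such as buf + 1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PACK_SIGNATURE 0x5041434b /* "PACK" */

/* Write a 32-bit value in big-endian order, one byte at a time. */
static void put_be32(void *ptr, uint32_t value)
{
	unsigned char *p = ptr;
	p[0] = (value >> 24) & 0xff;
	p[1] = (value >> 16) & 0xff;
	p[2] = (value >>  8) & 0xff;
	p[3] = (value >>  0) & 0xff;
}

/*
 * Parse "<version>,<entries>" into a 12-byte pack header at `out`.
 * No alignment is required of `out`. Returns 0 on success and -1 on
 * malformed input.
 */
static int parse_pack_header_option(const char *in, unsigned char *out,
				    unsigned int *len)
{
	unsigned char *hdr = out;
	char *c;

	put_be32(hdr, PACK_SIGNATURE);
	hdr += 4;
	put_be32(hdr, strtoul(in, &c, 10));
	hdr += 4;
	if (*c != ',')
		return -1;
	put_be32(hdr, strtoul(c + 1, &c, 10));
	hdr += 4;
	if (*c)
		return -1;
	*len = hdr - out;
	return 0;
}
```

For example, parsing "2,3" into buf + 1 sets len to 12, with buf[8] == 2
(the last byte of the version) and buf[12] == 3 (the last byte of the
entry count). Inputs such as "2" or "2,3x" are rejected.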

Reported-by: Koakuma <koachan@protonmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 packfile.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/packfile.c b/packfile.c
index e2bdadc7cbd44b..93f771ad95c949 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2297,17 +2297,20 @@ int is_promisor_object(const struct object_id *oid)
 
 int parse_pack_header_option(const char *in, unsigned char *out, unsigned int *len)
 {
-	struct pack_header *hdr;
+	unsigned char *hdr;
 	char *c;
 
-	hdr = (struct pack_header *)out;
-	hdr->hdr_signature = htonl(PACK_SIGNATURE);
-	hdr->hdr_version = htonl(strtoul(in, &c, 10));
+	hdr = out;
+	put_be32(hdr, PACK_SIGNATURE);
+	hdr += 4;
+	put_be32(hdr, strtoul(in, &c, 10));
+	hdr += 4;
 	if (*c != ',')
 		return -1;
-	hdr->hdr_entries = htonl(strtoul(c + 1, &c, 10));
+	put_be32(hdr, strtoul(c + 1, &c, 10));
+	hdr += 4;
 	if (*c)
 		return -1;
-	*len = sizeof(*hdr);
+	*len = hdr - out;
 	return 0;
 }

From 7215d586d4139a7f30039a4d106f3965ae8f6071 Mon Sep 17 00:00:00 2001
From: Jeff King <peff@peff.net>
Date: Sun, 19 Jan 2025 08:25:47 -0500
Subject: [PATCH 20/28] index-pack, unpack-objects: use get_be32() for reading
 pack header

Both of these commands read the incoming pack into a static unsigned
char buffer in BSS, and then parse it by casting the start of the buffer
to a struct pack_header. This can result in SIGBUS on some platforms if
the compiler doesn't place the buffer in a position that is properly
aligned for 4-byte integers.

This reportedly happens with unpack-objects (but not index-pack) on
sparc64 when compiled with clang (but not gcc). But we are definitely in
the wrong in both spots; since the buffer's type is unsigned char, we
can't depend on larger alignment. When it works, it is only because we
are lucky.

We'll fix this by switching to get_be32() to read the headers, just as
the previous commits switched us to put_be32() for writing into the same
buffer.

It would be nice to factor this out into a common helper function, but
the interface ends up quite awkward. Either the caller needs to hardcode
how many bytes we'll need, or it needs to pass us its fill()/use()
functions as pointers. So I've just fixed both spots in the same way;
this is not code that is likely to be repeated a third time (most of the
pack reading code uses an mmap'd buffer, which should be properly
aligned).

I did make one tweak to the shared code: our pack_version_ok() macro
expects us to pass the big-endian value we'd get by casting. We can
introduce a "native" variant which uses the host integer ordering.
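
The reading side can be sketched the same way. The read_pack_header()
helper below is hypothetical. Unlike the real code, it assumes the caller
has already filled 12 bytes, which sidesteps the fill()/use() awkwardness
described above. get_be32() is re-implemented locally in the spirit of
compat/bswap.h: it assembles the value from individual bytes, so no
4-byte alignment is required.

```c
#include <assert.h>
#include <stdint.h>

#define PACK_SIGNATURE 0x5041434b /* "PACK" */
#define pack_version_ok_native(v) ((v) == 2 || (v) == 3)

/* Read a big-endian 32-bit value one byte at a time. */
static uint32_t get_be32(const void *ptr)
{
	const unsigned char *p = ptr;
	return (uint32_t)p[0] << 24 |
	       (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] <<  8 |
	       (uint32_t)p[3] <<  0;
}

/*
 * Hypothetical helper: validate a 12-byte pack header and return the
 * number of objects it announces, or -1 on a bad signature or version.
 */
static int64_t read_pack_header(const unsigned char *hdr)
{
	if (get_be32(hdr) != PACK_SIGNATURE)
		return -1;
	hdr += 4;
	if (!pack_version_ok_native(get_be32(hdr)))
		return -1;
	hdr += 4;
	return get_be32(hdr);
}
```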

Reported-by: Koakuma <koachan@protonmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/index-pack.c     | 12 +++++++-----
 builtin/unpack-objects.c | 13 +++++++------
 pack.h                   |  3 ++-
 3 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index bab42dfc2a491a..5f0ff1ce04a22f 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -363,16 +363,18 @@ static const char *open_pack_file(const char *pack_name)
 
 static void parse_pack_header(void)
 {
-	struct pack_header *hdr = fill(sizeof(struct pack_header));
+	unsigned char *hdr = fill(sizeof(struct pack_header));
 
 	/* Header consistency check */
-	if (hdr->hdr_signature != htonl(PACK_SIGNATURE))
+	if (get_be32(hdr) != PACK_SIGNATURE)
 		die(_("pack signature mismatch"));
-	if (!pack_version_ok(hdr->hdr_version))
+	hdr += 4;
+	if (!pack_version_ok_native(get_be32(hdr)))
 		die(_("pack version %"PRIu32" unsupported"),
-			ntohl(hdr->hdr_version));
+		    get_be32(hdr));
+	hdr += 4;
 
-	nr_objects = ntohl(hdr->hdr_entries);
+	nr_objects = get_be32(hdr);
 	use(sizeof(struct pack_header));
 }
 
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 31614472749bf9..fc3de6dac8913e 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -576,15 +576,16 @@ static void unpack_one(unsigned nr)
 static void unpack_all(void)
 {
 	int i;
-	struct pack_header *hdr = fill(sizeof(struct pack_header));
+	unsigned char *hdr = fill(sizeof(struct pack_header));
 
-	nr_objects = ntohl(hdr->hdr_entries);
-
-	if (ntohl(hdr->hdr_signature) != PACK_SIGNATURE)
+	if (get_be32(hdr) != PACK_SIGNATURE)
 		die("bad pack file");
-	if (!pack_version_ok(hdr->hdr_version))
+	hdr += 4;
+	if (!pack_version_ok_native(get_be32(hdr)))
 		die("unknown pack file version %"PRIu32,
-			ntohl(hdr->hdr_version));
+		    get_be32(hdr));
+	hdr += 4;
+	nr_objects = get_be32(hdr);
 	use(sizeof(struct pack_header));
 
 	if (!quiet)
diff --git a/pack.h b/pack.h
index 3ab9e3f60c0b03..33b42fdad597dc 100644
--- a/pack.h
+++ b/pack.h
@@ -13,7 +13,8 @@ struct repository;
  */
 #define PACK_SIGNATURE 0x5041434b	/* "PACK" */
 #define PACK_VERSION 2
-#define pack_version_ok(v) ((v) == htonl(2) || (v) == htonl(3))
+#define pack_version_ok(v) pack_version_ok_native(ntohl(v))
+#define pack_version_ok_native(v) ((v) == 2 || (v) == 3)
 struct pack_header {
 	uint32_t hdr_signature;
 	uint32_t hdr_version;

From f2d9cf95f7f9f84d54e1a9f96f867e818fcb6a71 Mon Sep 17 00:00:00 2001
From: Jeff King <peff@peff.net>
Date: Sun, 19 Jan 2025 08:25:53 -0500
Subject: [PATCH 21/28] index-pack, unpack-objects: use skip_prefix to avoid
 magic number

When parsing --pack_header=, we manually skip 14 bytes to the data.
Let's use skip_prefix() to do this automatically.

Note that we overwrite our pointer to the front of the string, so we
have to add more context to the error message. We could avoid this by
declaring an extra pointer to hold the value, but I think the modified
message is actually preferable; it should give translators a bit more
context.
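
For reference, below is a local copy of skip_prefix() modeled on the
helper in git-compat-util.h. It comes with a hypothetical wrapper,
pack_header_value(), that mirrors how the callers now use it. On a match,
the output pointer lands just past the prefix, which is the same 14 bytes
the old code skipped by hand.

```c
#include <assert.h>
#include <string.h>

/*
 * Local copy modeled on git's skip_prefix(): if `str` starts with
 * `prefix`, point `*out` just past the prefix and return 1; otherwise
 * leave `*out` untouched and return 0.
 */
static int skip_prefix(const char *str, const char *prefix, const char **out)
{
	do {
		if (!*prefix) {
			*out = str;
			return 1;
		}
	} while (*str++ == *prefix++);
	return 0;
}

/* Hypothetical wrapper: the option's value, or NULL if not --pack_header. */
static const char *pack_header_value(const char *arg)
{
	if (skip_prefix(arg, "--pack_header=", &arg))
		return arg;
	return NULL;
}
```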

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/index-pack.c     | 6 +++---
 builtin/unpack-objects.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 5f0ff1ce04a22f..d80e05458121ea 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1802,11 +1802,11 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 					warning(_("no threads support, ignoring %s"), arg);
 					nr_threads = 1;
 				}
-			} else if (starts_with(arg, "--pack_header=")) {
-				if (parse_pack_header_option(arg + 14,
+			} else if (skip_prefix(arg, "--pack_header=", &arg)) {
+				if (parse_pack_header_option(arg,
 							     input_buffer,
 							     &input_len) < 0)
-					die(_("bad %s"), arg);
+					die(_("bad --pack_header: %s"), arg);
 			} else if (!strcmp(arg, "-v")) {
 				verbose = 1;
 			} else if (!strcmp(arg, "--progress-title")) {
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index fc3de6dac8913e..028cfe175e035e 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -640,10 +640,10 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix UNUSED)
 				fsck_set_msg_types(&fsck_options, arg);
 				continue;
 			}
-			if (starts_with(arg, "--pack_header=")) {
-				if (parse_pack_header_option(arg + 14,
+			if (skip_prefix(arg, "--pack_header=", &arg)) {
+				if (parse_pack_header_option(arg,
 							     buffer, &len) < 0)
-					die(_("bad %s"), arg);
+					die(_("bad --pack_header: %s"), arg);
 				continue;
 			}
 			if (skip_prefix(arg, "--max-input-size=", &arg)) {

From 66387793677c660bbafeb8bbf1ec2ac914e6a1d0 Mon Sep 17 00:00:00 2001
From: Karthik Nayak <karthik.188@gmail.com>
Date: Tue, 21 Jan 2025 04:34:10 +0100
Subject: [PATCH 22/28] refs: mark `ref_transaction_update_reflog()` as static

The `ref_transaction_update_reflog()` function is only used within
'refs.c', so mark it as static.

Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 refs.c | 22 +++++++++++++++-------
 refs.h | 14 --------------
 2 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/refs.c b/refs.c
index a5851a7de00878..4c67c95fbed0c9 100644
--- a/refs.c
+++ b/refs.c
@@ -1270,13 +1270,21 @@ int ref_transaction_update(struct ref_transaction *transaction,
 	return 0;
 }
 
-int ref_transaction_update_reflog(struct ref_transaction *transaction,
-				  const char *refname,
-				  const struct object_id *new_oid,
-				  const struct object_id *old_oid,
-				  const char *committer_info, unsigned int flags,
-				  const char *msg, unsigned int index,
-				  struct strbuf *err)
+/*
+ * Similar to`ref_transaction_update`, but this function is only for adding
+ * a reflog update. Supports providing custom committer information. The index
+ * field can be utiltized to order updates as desired. When not used, the
+ * updates default to being ordered by refname.
+ */
+static int ref_transaction_update_reflog(struct ref_transaction *transaction,
+					 const char *refname,
+					 const struct object_id *new_oid,
+					 const struct object_id *old_oid,
+					 const char *committer_info,
+					 unsigned int flags,
+					 const char *msg,
+					 unsigned int index,
+					 struct strbuf *err)
 {
 	struct ref_update *update;
 
diff --git a/refs.h b/refs.h
index b0dfc65ed2e59c..a5bedf48cf6de9 100644
--- a/refs.h
+++ b/refs.h
@@ -727,20 +727,6 @@ int ref_transaction_update(struct ref_transaction *transaction,
 			   unsigned int flags, const char *msg,
 			   struct strbuf *err);
 
-/*
- * Similar to`ref_transaction_update`, but this function is only for adding
- * a reflog update. Supports providing custom committer information. The index
- * field can be utiltized to order updates as desired. When not used, the
- * updates default to being ordered by refname.
- */
-int ref_transaction_update_reflog(struct ref_transaction *transaction,
-				  const char *refname,
-				  const struct object_id *new_oid,
-				  const struct object_id *old_oid,
-				  const char *committer_info, unsigned int flags,
-				  const char *msg, unsigned int index,
-				  struct strbuf *err);
-
 /*
  * Add a reference creation to transaction. new_oid is the value that
  * the reference should have after the update; it must not be

From a89e12dc164f43f9e728ac096db02bd7e403bdbc Mon Sep 17 00:00:00 2001
From: Karthik Nayak <karthik.188@gmail.com>
Date: Tue, 21 Jan 2025 04:34:11 +0100
Subject: [PATCH 23/28] refs: use 'uint64_t' for 'ref_update.index'

The 'ref_update.index' variable is used to store an index for a given
reference update. This index is used to order the updates in a
predetermined order, while the default ordering is alphabetical as per
the refname.

For large repositories with millions of references, it is safer to use
'uint64_t'. Let's do that, and apply the same change to all other code
that stores 'index' or passes it around.
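
The underlying concern is wraparound. The hypothetical helpers below
assume the common case where unsigned int is 32 bits wide, and compute
the next ordering index with a 32-bit and a 64-bit counter. The 32-bit
version silently wraps to 0 after UINT32_MAX, which would order a late
update before all earlier ones. The 64-bit version keeps counting.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: next index with a 32-bit counter (wraps past UINT32_MAX). */
static uint32_t next_index_32(uint32_t max_index)
{
	return max_index + 1;
}

/* Hypothetical: next index with a 64-bit counter. */
static uint64_t next_index_64(uint64_t max_index)
{
	return max_index + 1;
}
```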

Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 refs.c                  | 4 ++--
 refs/refs-internal.h    | 4 ++--
 refs/reftable-backend.c | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/refs.c b/refs.c
index 4c67c95fbed0c9..c65a554a7368ed 100644
--- a/refs.c
+++ b/refs.c
@@ -1283,7 +1283,7 @@ static int ref_transaction_update_reflog(struct ref_transaction *transaction,
 					 const char *committer_info,
 					 unsigned int flags,
 					 const char *msg,
-					 unsigned int index,
+					 uint64_t index,
 					 struct strbuf *err)
 {
 	struct ref_update *update;
@@ -2731,7 +2731,7 @@ static int migrate_one_ref(const char *refname, const char *referent UNUSED, con
 }
 
 struct reflog_migration_data {
-	unsigned int index;
+	uint64_t index;
 	const char *refname;
 	struct ref_store *old_refs;
 	struct ref_transaction *transaction;
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 2aaff91ab4826a..ecb25f6d2062f2 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -120,7 +120,7 @@ struct ref_update {
 	 * when migrating reflogs and we want to ensure we carry over the
 	 * same order.
 	 */
-	unsigned int index;
+	uint64_t index;
 
 	/*
 	 * If this ref_update was split off of a symref update via
@@ -203,7 +203,7 @@ struct ref_transaction {
 	enum ref_transaction_state state;
 	void *backend_data;
 	unsigned int flags;
-	unsigned int max_index;
+	uint64_t max_index;
 };
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 68db2baa8f15c4..628bfa0c5be922 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -852,7 +852,7 @@ struct write_transaction_table_arg {
 	size_t updates_nr;
 	size_t updates_alloc;
 	size_t updates_expected;
-	unsigned int max_index;
+	uint64_t max_index;
 };
 
 struct reftable_transaction_data {

From 148560f1e3bbdb9cf27533bed43cdf116bf61d01 Mon Sep 17 00:00:00 2001
From: Karthik Nayak <karthik.188@gmail.com>
Date: Tue, 21 Jan 2025 04:34:12 +0100
Subject: [PATCH 24/28] reftable: prevent 'update_index' changes after adding
 records

The function `reftable_writer_set_limits()` allows updating the
'min_update_index' and 'max_update_index' of a reftable writer. These
values are written to both the writer's header and footer.

Since the header is written during the first block write, any subsequent
changes to the update index would create a mismatch between the header
and footer values. The footer would contain the newer values while the
header retained the original ones.

To fix this bug, prevent callers from updating these values after any
record has been written. To do this, modify the function to return an
error whenever the limits are modified after a record has been added.
`reftable_writer_set_limits()` detects this by checking the `last_key`
variable, which is set whenever a new record is added.

Modify all callers of the function to check the new return value and
handle errors accordingly.
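
The guard can be illustrated with a toy model. struct toy_writer,
toy_set_limits() and toy_add_record() are hypothetical stand-ins for the
reftable writer, and TOY_API_ERROR stands in for REFTABLE_API_ERROR. Only
one idea comes from the patch: a non-empty `last_key` serves as the
"records were already added" flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_API_ERROR (-1) /* hypothetical stand-in for REFTABLE_API_ERROR */

/* Toy model of the writer state relevant to this fix. */
struct toy_writer {
	uint64_t min_update_index;
	uint64_t max_update_index;
	size_t last_key_len; /* non-zero once any record has been added */
};

static int toy_set_limits(struct toy_writer *w, uint64_t min, uint64_t max)
{
	/* The header carrying the limits may already have been written. */
	if (w->last_key_len)
		return TOY_API_ERROR;
	w->min_update_index = min;
	w->max_update_index = max;
	return 0;
}

/* Adding a record remembers its key, like the real writer's last_key. */
static void toy_add_record(struct toy_writer *w, size_t key_len)
{
	w->last_key_len = key_len;
}
```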

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 refs/reftable-backend.c         | 20 +++++++++++++++-----
 reftable/reftable-error.h       |  1 +
 reftable/reftable-writer.h      | 24 ++++++++++++++----------
 reftable/stack.c                |  6 ++++--
 reftable/writer.c               | 13 +++++++++++--
 t/unit-tests/t-reftable-stack.c |  8 +++++---
 6 files changed, 50 insertions(+), 22 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 628bfa0c5be922..b94a6abae0e4bb 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1317,7 +1317,9 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	 * multiple entries. Each entry will contain a different update_index,
 	 * so set the limits accordingly.
 	 */
-	reftable_writer_set_limits(writer, ts, ts + arg->max_index);
+	ret = reftable_writer_set_limits(writer, ts, ts + arg->max_index);
+	if (ret < 0)
+		goto done;
 
 	for (i = 0; i < arg->updates_nr; i++) {
 		struct reftable_transaction_update *tx_update = &arg->updates[i];
@@ -1640,7 +1642,9 @@ static int write_copy_table(struct reftable_writer *writer, void *cb_data)
 	deletion_ts = creation_ts = reftable_stack_next_update_index(arg->stack);
 	if (arg->delete_old)
 		creation_ts++;
-	reftable_writer_set_limits(writer, deletion_ts, creation_ts);
+	ret = reftable_writer_set_limits(writer, deletion_ts, creation_ts);
+	if (ret < 0)
+		goto done;
 
 	/*
 	 * Add the new reference. If this is a rename then we also delete the
@@ -2160,7 +2164,9 @@ static int write_reflog_existence_table(struct reftable_writer *writer,
 	if (ret <= 0)
 		goto done;
 
-	reftable_writer_set_limits(writer, ts, ts);
+	ret = reftable_writer_set_limits(writer, ts, ts);
+	if (ret < 0)
+		goto done;
 
 	/*
 	 * The existence entry has both old and new object ID set to the
@@ -2219,7 +2225,9 @@ static int write_reflog_delete_table(struct reftable_writer *writer, void *cb_da
 	uint64_t ts = reftable_stack_next_update_index(arg->stack);
 	int ret;
 
-	reftable_writer_set_limits(writer, ts, ts);
+	ret = reftable_writer_set_limits(writer, ts, ts);
+	if (ret < 0)
+		goto out;
 
 	ret = reftable_stack_init_log_iterator(arg->stack, &it);
 	if (ret < 0)
@@ -2295,7 +2303,9 @@ static int write_reflog_expiry_table(struct reftable_writer *writer, void *cb_da
 		if (arg->records[i].value_type == REFTABLE_LOG_UPDATE)
 			live_records++;
 
-	reftable_writer_set_limits(writer, ts, ts);
+	ret = reftable_writer_set_limits(writer, ts, ts);
+	if (ret < 0)
+		return ret;
 
 	if (!is_null_oid(&arg->update_oid)) {
 		struct reftable_ref_record ref = {0};
diff --git a/reftable/reftable-error.h b/reftable/reftable-error.h
index f4048265629fe4..a7e33d964d0cfe 100644
--- a/reftable/reftable-error.h
+++ b/reftable/reftable-error.h
@@ -30,6 +30,7 @@ enum reftable_error {
 
 	/* Misuse of the API:
 	 *  - on writing a record with NULL refname.
+	 *  - on writing a record before setting the writer limits.
 	 *  - on writing a reftable_ref_record outside the table limits
 	 *  - on writing a ref or log record before the stack's
 	 * next_update_inde*x
diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h
index e4fc95378835ff..46776578ef5573 100644
--- a/reftable/reftable-writer.h
+++ b/reftable/reftable-writer.h
@@ -109,17 +109,21 @@ int reftable_writer_new(struct reftable_writer **out,
 			int (*flush_func)(void *),
 			void *writer_arg, const struct reftable_write_options *opts);
 
-/* Set the range of update indices for the records we will add. When writing a
-   table into a stack, the min should be at least
-   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
-
-   For transactional updates to a stack, typically min==max, and the
-   update_index can be obtained by inspeciting the stack. When converting an
-   existing ref database into a single reftable, this would be a range of
-   update-index timestamps.
+/*
+ * Set the range of update indices for the records we will add. When writing a
+ * table into a stack, the min should be at least
+ * reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
+ *
+ * For transactional updates to a stack, typically min==max, and the
+ * update_index can be obtained by inspeciting the stack. When converting an
+ * existing ref database into a single reftable, this would be a range of
+ * update-index timestamps.
+ *
+ * The function should be called before adding any records to the writer. If not
+ * it will fail with REFTABLE_API_ERROR.
  */
-void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
-				uint64_t max);
+int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+			       uint64_t max);
 
 /*
   Add a reftable_ref_record. The record should have names that come after
diff --git a/reftable/stack.c b/reftable/stack.c
index c33979536efa3a..95938ec1ef5af2 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1045,8 +1045,10 @@ static int stack_write_compact(struct reftable_stack *st,
 
 	for (size_t i = first; i <= last; i++)
 		st->stats.bytes += st->readers[i]->size;
-	reftable_writer_set_limits(wr, st->readers[first]->min_update_index,
-				   st->readers[last]->max_update_index);
+	err = reftable_writer_set_limits(wr, st->readers[first]->min_update_index,
+					 st->readers[last]->max_update_index);
+	if (err < 0)
+		goto done;
 
 	err = reftable_merged_table_new(&mt, st->readers + first, subtabs_len,
 					st->opts.hash_id);
diff --git a/reftable/writer.c b/reftable/writer.c
index 0b88e5583b2993..0bd834090c0302 100644
--- a/reftable/writer.c
+++ b/reftable/writer.c
@@ -165,11 +165,20 @@ int reftable_writer_new(struct reftable_writer **out,
 	return 0;
 }
 
-void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
-				uint64_t max)
+int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+			       uint64_t max)
 {
+	/*
+	 * The limits should be set before any records are added to the writer.
+	 * Check if any records were added by checking if `last_key` was set.
+	 */
+	if (w->last_key.len)
+		return REFTABLE_API_ERROR;
+
 	w->min_update_index = min;
 	w->max_update_index = max;
+
+	return 0;
 }
 
 static void writer_release(struct reftable_writer *w)
diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c
index 52b81475c36aa9..2738f449f0b252 100644
--- a/t/unit-tests/t-reftable-stack.c
+++ b/t/unit-tests/t-reftable-stack.c
@@ -98,7 +98,8 @@ static void t_read_file(void)
 static int write_test_ref(struct reftable_writer *wr, void *arg)
 {
 	struct reftable_ref_record *ref = arg;
-	reftable_writer_set_limits(wr, ref->update_index, ref->update_index);
+	check(!reftable_writer_set_limits(wr, ref->update_index,
+					  ref->update_index));
 	return reftable_writer_add_ref(wr, ref);
 }
 
@@ -138,7 +139,8 @@ static int write_test_log(struct reftable_writer *wr, void *arg)
 {
 	struct write_log_arg *wla = arg;
 
-	reftable_writer_set_limits(wr, wla->update_index, wla->update_index);
+	check(!reftable_writer_set_limits(wr, wla->update_index,
+					  wla->update_index));
 	return reftable_writer_add_log(wr, wla->log);
 }
 
@@ -956,7 +958,7 @@ static void t_reflog_expire(void)
 
 static int write_nothing(struct reftable_writer *wr, void *arg UNUSED)
 {
-	reftable_writer_set_limits(wr, 1, 1);
+	check(!reftable_writer_set_limits(wr, 1, 1));
 	return 0;
 }
 

From 1edca76e805c22daeba4d7ea4193bcfe3df98ef6 Mon Sep 17 00:00:00 2001
From: Adam Murray <ad@canva.com>
Date: Fri, 10 Jan 2025 07:28:20 +0000
Subject: [PATCH 25/28] trace2: prevent segfault on config collection where no
 value specified

When TRACE2 analytics is enabled, a git config option that has no value
causes a segfault.

Steps to reproduce:

  GIT_TRACE2=true GIT_TRACE2_CONFIG_PARAMS=status.* \
    git -c status.relativePaths version

Expected result:

  git version 2.46.0

Actual result:

  zsh: segmentation fault GIT_TRACE2=true

Add NULL checks so that, instead of crashing, a parameter without a
value is traced with its value omitted.
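
With `-c status.relativePaths` (no `=`), the value handed to the trace2
callbacks is NULL. Passing NULL to a `%s` conversion or to string helpers
is undefined behavior. The hypothetical formatter below mirrors the
pattern the normal target now uses: it prints the parameter name and
appends `=<value>` only when a value exists.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical formatter: "def_param scope:<scope> <param>", plus
 * "=<value>" only when `value` is non-NULL.
 */
static void format_def_param(char *buf, size_t size, const char *scope,
			     const char *param, const char *value)
{
	int n = snprintf(buf, size, "def_param scope:%s %s", scope, param);

	if (value && n >= 0 && (size_t)n < size)
		snprintf(buf + n, size - n, "=%s", value);
}
```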

Signed-off-by: Adam Murray <ad@canva.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t0210-trace2-normal.sh | 8 ++++++++
 trace2.c                 | 2 +-
 trace2/tr2_tgt_event.c   | 3 ++-
 trace2/tr2_tgt_normal.c  | 5 +++--
 trace2/tr2_tgt_perf.c    | 5 +++--
 5 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
index eff9a59dbd0d5d..b95f0f8a9dc3b0 100755
--- a/t/t0210-trace2-normal.sh
+++ b/t/t0210-trace2-normal.sh
@@ -243,6 +243,14 @@ test_expect_success 'bug messages followed by BUG() are written to trace2' '
 	test_cmp expect actual
 '
 
+test_expect_success 'empty configuration values are handled' '
+  test_when_finished "rm trace2.normal actual expect" &&
+  echo >expect &&
+  GIT_TRACE2="$(pwd)/trace2.normal"  GIT_TRACE2_CONFIG_PARAMS=foo.empty \
+	  git -c foo.empty config foo.empty >actual &&
+	test_cmp expect actual
+'
+
 sane_unset GIT_TRACE2_BRIEF
 
 # Now test without environment variables and get all Trace2 settings
diff --git a/trace2.c b/trace2.c
index 82d16e2783d986..a21b1e91dbca20 100644
--- a/trace2.c
+++ b/trace2.c
@@ -764,7 +764,7 @@ void trace2_def_param_fl(const char *file, int line, const char *param,
 	if (!trace2_enabled)
 		return;
 
-	redacted = redact_arg(value);
+	redacted = value ? redact_arg(value): NULL;
 
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_param_fl)
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 69ee40449fa4a7..5a0381791f7eb4 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -493,7 +493,8 @@ static void fn_param_fl(const char *file, int line, const char *param,
 	event_fmt_prepare(event_name, file, line, NULL, &jw);
 	jw_object_string(&jw, "scope", scope_name);
 	jw_object_string(&jw, "param", param);
-	jw_object_string(&jw, "value", value);
+	if (value)
+		jw_object_string(&jw, "value", value);
 	jw_end(&jw);
 
 	tr2_dst_write_line(&tr2dst_event, &jw.json);
diff --git a/trace2/tr2_tgt_normal.c b/trace2/tr2_tgt_normal.c
index baef48aa6989ce..924736ab36093b 100644
--- a/trace2/tr2_tgt_normal.c
+++ b/trace2/tr2_tgt_normal.c
@@ -307,8 +307,9 @@ static void fn_param_fl(const char *file, int line, const char *param,
 	enum config_scope scope = kvi->scope;
 	const char *scope_name = config_scope_name(scope);
 
-	strbuf_addf(&buf_payload, "def_param scope:%s %s=%s", scope_name, param,
-		    value);
+	strbuf_addf(&buf_payload, "def_param scope:%s %s", scope_name, param);
+	if (value)
+		strbuf_addf(&buf_payload, "=%s", value);
 	normal_io_write_fl(file, line, &buf_payload);
 	strbuf_release(&buf_payload);
 }
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index 298ae27f9d7f24..4eb9289f950505 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -448,8 +448,9 @@ static void fn_param_fl(const char *file, int line, const char *param,
 	struct strbuf scope_payload = STRBUF_INIT;
 	enum config_scope scope = kvi->scope;
 	const char *scope_name = config_scope_name(scope);
-
-	strbuf_addf(&buf_payload, "%s:%s", param, value);
+	strbuf_addstr(&buf_payload, param);
+	if (value)
+		strbuf_addf(&buf_payload, ":%s", value);
 	strbuf_addf(&scope_payload, "%s:%s", "scope", scope_name);
 
 	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL,

From b7a9905f756b42aa4d2bafc07c489fa69e4fffca Mon Sep 17 00:00:00 2001
From: Jeff King <peff@peff.net>
Date: Mon, 13 Jan 2025 01:26:01 -0500
Subject: [PATCH 26/28] grep: prevent `^$` false match at end of file

In some implementations, `regexec_buf()` assumes that it is fed lines;
without `REG_NOTEOL` it treats the end of the buffer as the end of a
line. That makes sense, but it trips up this case, because we are not
feeding it lines but rather a whole buffer. The final newline is not
the start of an empty line, but the true end of the buffer; yet since
`^` matches right after a newline and `$` matches at the end of the
buffer, `^$` matches there anyway.

This causes an interesting bug:

  $ echo content >file.txt
  $ git grep --no-index -n '^$' file.txt
  file.txt:2:

Fix this by dropping a trailing newline from the buffer before
matching, so that the end of the buffer is consistently the end of the
final line.
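As a rough illustration, the standalone sketch below uses the POSIX
`regcomp()`/`regexec()` API with `REG_NEWLINE` instead of git's own
`regexec_buf()`; the helpers `first_match()` and `first_match_trimmed()`
are hypothetical. With glibc, searching "content\n" for `^$` should
report an empty match at offset 8, right after the final newline, while
trimming that newline first (as the fix does) should find nothing.
Other regex implementations may behave differently.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: search the first `size` bytes of `buf` for
 * `pattern`, treating the buffer as multi-line text (REG_NEWLINE).
 * Returns the offset of the first match, or -1 if there is none
 * (or if compiling the pattern or allocating memory fails).
 */
static long first_match(const char *pattern, const char *buf, size_t size)
{
	regex_t re;
	regmatch_t m;
	long ret = -1;
	char *copy;

	assert(pattern && buf);
	if (regcomp(&re, pattern, REG_NEWLINE))
		return -1;

	/* Plain POSIX regexec() needs a NUL-terminated string. */
	copy = malloc(size + 1);
	if (copy) {
		memcpy(copy, buf, size);
		copy[size] = '\0';
		/* eflags == 0: no REG_NOTEOL, so end of string is end of line. */
		if (!regexec(&re, copy, 1, &m, 0))
			ret = (long)m.rm_so;
		free(copy);
	}
	regfree(&re);
	return ret;
}

/* Same search, but first drop one trailing newline, as the fix does. */
static long first_match_trimmed(const char *pattern, const char *buf,
				size_t size)
{
	if (size && buf[size - 1] == '\n')
		size--;
	return first_match(pattern, buf, size);
}
```

A genuinely empty line, as in "a\n\nb\n", still matches after trimming,
so only the phantom line past the final newline goes away.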

The patch was applied from
https://lore.kernel.org/git/20250113062601.GD767856@coredump.intra.peff.net/

Reported-by: Olly Betts <olly@survex.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 grep.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/grep.c b/grep.c
index 4e155ee9e66367..c4bb9f10814024 100644
--- a/grep.c
+++ b/grep.c
@@ -1646,6 +1646,8 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 
 	bol = gs->buf;
 	left = gs->size;
+	if (left && gs->buf[left-1] == '\n')
+		left--;
 	while (left) {
 		const char *eol;
 		int hit;

From 86d0c304264527fd26d7f1d29bab83c6e566d947 Mon Sep 17 00:00:00 2001
From: Jeff King <peff@peff.net>
Date: Tue, 21 Jan 2025 16:52:35 -0500
Subject: [PATCH 27/28] update-ref: do set reflog's `old_oid`

In git 2.48.1, `git update-ref` no longer correctly updates the reflog
in some cases. Specifically, it appears that the `old_oid` field is not
set when modifying a branch referenced by another symbolic ref (e.g.
HEAD). This doesn't break `git reflog`, but it does break references
like `HEAD@{1}`, which appear to read the `old_oid` field. To
reproduce:

  git init -b main
  git commit --allow-empty -m "A"
  git commit --allow-empty -m "B"
  git update-ref -m "reason" refs/heads/main HEAD~ HEAD

After the last command, the `old_oid` field of the new entry is empty
(all zeroes). This only happens in derived reflogs (in this case
.git/logs/HEAD); the reflog for `refs/heads/main` appears to be updated
correctly.
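To spot the symptom in a reflog file, a check like the one below could
be used. It is only a sketch: it assumes the files-backend reflog line
format `<old-oid> <new-oid> <ident><TAB><message>` with hex object IDs
of 40 (SHA-1) or 64 (SHA-256) characters, and
`reflog_old_oid_is_null()` is a hypothetical helper, not a git API.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given one line of a reflog file, report whether
 * its first field (the old object ID) consists only of '0' characters,
 * i.e. the null OID. Returns 1 if it does, 0 if it does not, and -1 if
 * the first field is missing or has an unexpected length.
 */
static int reflog_old_oid_is_null(const char *line)
{
	const char *sp = strchr(line, ' ');
	size_t len, i;

	if (!sp)
		return -1;
	len = (size_t)(sp - line);
	/* Accept hex lengths for SHA-1 (40) and SHA-256 (64). */
	if (len != 40 && len != 64)
		return -1;
	for (i = 0; i < len; i++)
		if (line[i] != '0')
			return 0;
	return 1;
}
```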

This was broken in 297c09eabb (refs: allow multiple reflog entries for
the same refname, 2024-12-16).

The cause was the assumption that, for reflog-only updates,
`lock_ref_for_update()` only needs to acquire the lock, so it exited
early for `REF_LOG_ONLY` updates. That is wrong, because the early exit
also skips populating `old_oid`. Drop it, so that reflog-only updates
get their `old_oid` filled in as well.
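The shape of the bug can be shown with a deliberately simplified
model. The names below (`struct ref_update_model`, `MODEL_LOG_ONLY`,
`init_update_model()`, `lock_for_update_model()`) are hypothetical
stand-ins, not the real files-backend types; the only point is that an
early exit placed before the `old_oid` assignment leaves reflog-only
updates with a null `old_oid`.

```c
#include <assert.h>
#include <string.h>

#define MODEL_LOG_ONLY 0x1u /* hypothetical stand-in for REF_LOG_ONLY */

/* Hypothetical, heavily simplified stand-in for a ref update. */
struct ref_update_model {
	unsigned int flags;
	int locked;
	char old_oid[65]; /* hex object ID, NUL-terminated */
};

/* Start with the null SHA-1 (40 '0' characters) as old_oid. */
static void init_update_model(struct ref_update_model *u,
			      unsigned int flags)
{
	u->flags = flags;
	u->locked = 0;
	memset(u->old_oid, '0', 40);
	u->old_oid[40] = '\0';
}

/*
 * Lock the ref, then record its current value as old_oid. With `buggy`
 * set, reflog-only updates return right after locking, mimicking the
 * removed `goto out`, so old_oid keeps its null value.
 */
static void lock_for_update_model(struct ref_update_model *u,
				  const char *current_oid, int buggy)
{
	assert(u && current_oid);
	u->locked = 1;
	if (buggy && (u->flags & MODEL_LOG_ONLY))
		return;
	strncpy(u->old_oid, current_oid, sizeof(u->old_oid) - 1);
	u->old_oid[sizeof(u->old_oid) - 1] = '\0';
}
```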

Reported-by: Nika Layzell <nika@thelayzells.com>
Acked-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 refs/files-backend.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 5cfb8b7ca8678e..29f08dced40418 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2615,9 +2615,6 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 
 	update->backend_data = lock;
 
-	if (update->flags & REF_LOG_ONLY)
-		goto out;
-
 	if (update->type & REF_ISSYMREF) {
 		if (update->flags & REF_NO_DEREF) {
 			/*

From 290ad15c95ffece76bebd729edc5a929259a2a74 Mon Sep 17 00:00:00 2001
From: Karthik Nayak <karthik.188@gmail.com>
Date: Wed, 15 Jan 2025 11:54:51 +0000
Subject: [PATCH 28/28] fixup! reftable: write correct max_update_index to
 header

The original commit did not initialize `max_index` in every
`write_transaction_table_arg`. This led to a somewhat unintuitive
symptom (and one that was not reliably reproducible until the trick of
using `sanitize=address,undefined` was found): t1400.249 and/or
t2400.171 failed with:

  Assertion failed: (ret != REFTABLE_API_ERROR), function
  reftable_be_transaction_finish, file reftable-backend.c, line 1648.

or

  Assertion failed: (ret != REFTABLE_API_ERROR), function
  write_transaction_table, file reftable-backend.c, line 1619.
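Both halves of the fix follow a common C pattern: every field of a
freshly grown array element has to be set explicitly, and a value
meant for all elements has to be written to each of them, not just to
the first. A minimal sketch, where `struct table_arg_model`,
`struct tx_model`, `add_arg()` and `finish()` are hypothetical
stand-ins for the real reftable transaction code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct write_transaction_table_arg. */
struct table_arg_model {
	size_t updates_nr;
	uint64_t max_index;
};

struct tx_model {
	struct table_arg_model *args;
	size_t args_nr, args_alloc;
};

/*
 * Append a new arg. realloc() does not zero the new slot, so every
 * field, including max_index, is initialized explicitly.
 * Returns NULL on allocation failure.
 */
static struct table_arg_model *add_arg(struct tx_model *tx)
{
	struct table_arg_model *arg;

	if (tx->args_nr == tx->args_alloc) {
		size_t alloc = tx->args_alloc ? 2 * tx->args_alloc : 4;
		void *p = realloc(tx->args, alloc * sizeof(*tx->args));
		if (!p)
			return NULL;
		tx->args = p;
		tx->args_alloc = alloc;
	}
	arg = &tx->args[tx->args_nr++];
	arg->updates_nr = 0;
	arg->max_index = 0;
	return arg;
}

/* Propagate the transaction's max_index to every arg, not only args[0]. */
static void finish(struct tx_model *tx, uint64_t max_index)
{
	assert(tx);
	for (size_t i = 0; i < tx->args_nr; i++)
		tx->args[i].max_index = max_index;
}
```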

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 refs/reftable-backend.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 9cfb0cb26721a9..d26b5bf85c9b2c 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1020,6 +1020,7 @@ static int prepare_transaction_update(struct write_transaction_table_arg **out,
 		arg->updates_nr = 0;
 		arg->updates_alloc = 0;
 		arg->updates_expected = 0;
+		arg->max_index = 0;
 	}
 
 	arg->updates_expected++;
@@ -1634,6 +1635,8 @@ static int reftable_be_transaction_finish(struct ref_store *ref_store UNUSED,
 		tx_data->args->max_index = transaction->max_index;
 
 	for (size_t i = 0; i < tx_data->args_nr; i++) {
+		tx_data->args[i].max_index = transaction->max_index;
+
 		ret = reftable_addition_add(tx_data->args[i].addition,
 					    write_transaction_table, &tx_data->args[i]);
 		if (ret < 0)