Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*Sometimes* gix fetch gets stuck in negotiation with ssh:// remotes (hosted by gitea) #1061

Closed
jalil-salame opened this issue Oct 12, 2023 · 51 comments · Fixed by #1071
Closed
Labels
acknowledged an issue is accepted as shortcoming to be fixed feedback requested

Comments

@jalil-salame
Copy link
Contributor

Current behavior 😯

This happens on some repos sporadically (using v0.29.0, but it has happened long before that).

When you run gix fetch it is stuck in the negotiation phase forever(?) I tend to stop it after a few seconds, but I seem to remember it staying there for a few minutes.

Cancelling it with CTRL+C and rerunning the command causes the same behaviour.

Running a git fetch fixes the repo(?) and now gix fetch works again.

This is the error displayed after sending CTRL+C:

Error: An IO error occurred when talking to the server

Caused by:
    Broken pipe (os error 32)

Expected behavior 🤔

gix fetch should work or time out the negotiation after a resonable amount of time (a few seconds to a minute).

Steps to reproduce 🕹

?????

I see it happenning on my selfhosted gitea repos relatively often (~once every two or so weeks) but I have no idea how to reproduce this.

If you have any idea how I could go about diagnosing the issue I'll make sure to keep it in mind for the next time it happens. For now this is all I have.

@Byron Byron added acknowledged an issue is accepted as shortcoming to be fixed feedback requested labels Oct 13, 2023
@Byron
Copy link
Member

Byron commented Oct 13, 2023

Thanks for reporting!

From the error when hitting Ctrl+C I conclude that it's an ssh URL you are using?

It's too bad to hear that it's still not stable, but given the complexity of the protocol in conjunction with many kinds of transports, it's not entirely surprising either. It's strange that it's not always hanging though, which makes it appear like a dead-lock based on buffer sizes of some pipeline used to communicate. If SSH was trying to provide a response but gix is still trying to send data, it would deadlock. In theory, that shouldn't happen of course but apparently it does in practice. Maybe a flush is missing somewhere… and a write to ssh doesn't actually get there.

The reason it works after a git fetch is probably because then no negotiation is necessary anymore, or it's greatly simplified.

I think the best course of action would be to provide a public repository in its broken (local) state that will consistently let gix hang. From there it can be debugged so it becomes clear where it's hanging.

From my experience, HTTPS works fine, but it's a state-less mode of operation which is different from the stateful SSH connection, which I basically never use myself (note that stateful connections are tested, just not through SSH but through bare TCP).

To repeat, ideally for reproduction you can provide a local copy of a stuck public repository which I can use for reproduction and debugging.

@jalil-salame
Copy link
Contributor Author

Yes, I basically only use SSH for git.

I'll see if I can reproduce it in a small repo then upload the full (.git included?) repo as a test case?

It might be really hard to reproduce as most cases I remember this happening are from private repos...

I'm sorry, this is such a flaky thing to reproduce. Thanks for all the help!

@Byron Byron changed the title Sometimes gix fetch gets stuck in negotiation *Sometimes* gix fetch gets stuck in negotiation with ssh:// remotes (hosted by gitea) Oct 13, 2023
@Byron
Copy link
Member

Byron commented Oct 13, 2023

I'll see if I can reproduce it in a small repo then upload the full (.git included?) repo as a test case?

Yes, that would be optimal.

I'm sorry, this is such a flaky thing to reproduce. Thanks for all the help!

Thanks for helping me to make gix better!

There is also a --trace option, but right now it only prints at the end of an invocation which doesn't happen during hangs. Having an altenrative trace-mode that is instant might alleviate this, even though I don't think it would reveal that much.

There is a light at the end of the tunnel though, as it's definitely planned to offer a built-in native ssh client as transport as well instead of forwarding to the ssh binary. Once that is in place, and if this fixes the issue, it's clear that the cause of this issue is something about how gix communicates with the ssh process via pipelines, which is really easy to get wrong without noticing as many tests never reach certain thresholds that may cause these bugs to appear.

@jalil-salame
Copy link
Contributor Author

Using gix --trace fetch (private repo, can't upload .git repository) (using gix 0.30.0):

$ gix --trace fetch
^C 19:21:55 tracing INFO     run [ 7.06s | 23.06% / 100.00% ]                                                                                                                                    racing
 19:21:55 tracing INFO     ┝━ ThreadSafeRepository::discover() [ 10.2ms | 0.01% / 0.14% ]
 19:21:55 tracing INFO     │  ┕━ open_from_paths() [ 9.61ms | 0.03% / 0.14% ]
 19:21:55 tracing INFO     │     ┝━ gix_path::git::install_config_path() [ 7.43ms | 0.11% ]
 19:21:55 tracing INFO     │     ┕━ gix_odb::Store::at() [ 245µs | 0.00% ]
 19:21:55 tracing DEBUG    ┝━ 🐛 [debug]: gix_transport::SpawnProcessOnDemand | command: GIT_PROTOCOL="version=2" LANG="C" LC_ALL="C" "ssh" "-o" "SendEnv=GIT_PROTOCOL" "gitea@**censored**" "git-upload-pack" "\'jalil/**censored**.git\'"
 19:21:55 tracing INFO     ┕━ fetch::Prepare::receive() [ 5.42s | 0.00% / 76.80% ]
 19:21:55 tracing INFO        ┕━ negotiate [ 5.42s | 0.01% / 76.79% ]
 19:21:55 tracing DEBUG          ┝━ mark_complete_and_common_ref [ 1.52ms | 0.01% / 0.02% ] mappings: 1
 19:21:55 tracing INFO           │  ┝━ mark_all_refs [ 880µs | 0.01% ]
 19:21:55 tracing DEBUG          │  ┝━ mark_alternate_refs [ 1.18µs | 0.00% ] num_odb: 0
 19:21:55 tracing INFO           │  ┝━ mark known_common [ 2.48µs | 0.00% ]
 19:21:55 tracing DEBUG          │  ┕━ mark tips [ 2.77µs | 0.00% ] num_tips: 1
 19:21:55 tracing DEBUG          ┝━ negotiate round [ 5.42s | 76.77% ] round: 1
 19:21:55 tracing DEBUG          ┕━ negotiate round [ 90.7µs | 0.00% ] round: 2
Error: An IO error occurred when talking to the server

Caused by:
    Broken pipe (os error 32)

Killing it after 7s or 1min seems to make no difference to the trace output.

I will make a back up of this repo in case you have a fix you'd like to test.

@Byron
Copy link
Member

Byron commented Oct 14, 2023

Thanks so much, I forgot that it's possible to interrupt and then shut-down the application normally, showing the trace.

We see that it hangs in round two, which probably means it blocks while sending or… it blocks while receiving a reply maybe because the sending didn't get flushed so that would be a local problem. Since I pretty much trust negotiation by now I'd think it might be something silly like a flush that wasn't performed.

Byron added a commit that referenced this issue Oct 15, 2023
Otherwise it might be that the write-end still isn't flushed, so
the receiver didn't get the message it's waiting on, which wouild
cause us to deadlock while waiting for a response from the remote.
@Byron
Copy link
Member

Byron commented Oct 15, 2023

I think I have got something!

The gist is that I found a spot that would only work correctly in a stateless setting, i.e. with HTTP, as this would flush automatically I presume. In stateful connections, this wouldn't happen which leaves data-to-be-sent in the pipe without ever being seen by the remote, which then waits forever. Then we turn around and try to read the response to a message never sent, and wait forever as well.

The fix should alleviate that, so I recommend trying a custom-built binary from when #1067 was merged and see if this fixes the issue.

If you have any questions, please let me know - I'd love for this to be the fix.

Byron added a commit that referenced this issue Oct 15, 2023
Otherwise it might be that the write-end still isn't flushed, so
the receiver didn't get the message it's waiting on, which wouild
cause us to deadlock while waiting for a response from the remote.
Byron added a commit that referenced this issue Oct 15, 2023
Otherwise it might be that the write-end still isn't flushed, so
the receiver didn't get the message it's waiting on, which wouild
cause us to deadlock while waiting for a response from the remote.
@jalil-salame
Copy link
Contributor Author

jalil-salame commented Oct 16, 2023

Seems like it still happens (the bad repo still gets stuck in negotiation, trace looks the same).

But I am not sure I got the version from main. I'm on NixOS so cargo install doesn't work (problems with cmake), so I had to do a hack to get it installed. It should be the latest commit, but first time I do this.

GDB shows this is the stack frame it is stuck on:

#0  0x00007ffff7d1574c in read () from /nix/store/ld03l52xq2ssn4x0g5asypsxqls40497-glibc-2.37-8/lib/libc.so.6
#1  0x00005555561894e3 in <std::process::ChildStdout as std::io::Read>::read::h7716f9d9510e017e ()
#2  0x0000555555ebab45 in gix_transport::client::blocking_io::file::supervise_stderr::<impl std::io::Read for gix_transport::client::blocking_io::file::ReadStdoutFailOnError>::read::h6bb614d77c16cfcc ()
#3  0x0000555555ec3a3b in std::io::default_read_exact::h765482b456176831 ()
#4  0x0000555555eb637a in gix_packetline::read::blocking_io::<impl gix_packetline::StreamingPeekableIter<T>>::read_line_inner_exhaustive::hda32d152ad4b03a9 ()
#5  0x0000555555eb69db in gix_packetline::read::blocking_io::<impl gix_packetline::StreamingPeekableIter<T>>::peek_line::h751a7e10b8e7fb51 ()
#6  0x0000555555eccec5 in <gix_packetline::read::sidebands::blocking_io::WithSidebands<T,alloc::boxed::Box<dyn core::ops::function::FnMut<(bool,&[u8])>+Output = gix_packetline::read::ProgressAction>> as gix_transport::client::blocking_io::bufread_ext::ExtendedBufRead>::peek_data_line::h2d180765cb59cd52 ()
#7  0x0000555555752811 in gix_protocol::fetch::response::blocking_io::<impl gix_protocol::fetch::response::Response>::from_line_reader::hf0e62b51c6345a69 ()
#8  0x0000555555865293 in gix::remote::connection::fetch::receive_pack::<impl gix::remote::connection::fetch::Prepare<T>>::receive::h3ba96dacc5aacb3e ()
#9  0x00005555558c1c7d in gitoxide_core::repository::fetch::function::fetch::he4f05a27c760b3c9 ()
#10 0x00005555557079a9 in gix_trace::<impl gix_trace::enabled::Span>::into_scope::h454454c93e6ddecd ()
#11 0x00005555558ed14a in gitoxide::shared::pretty::prepare_and_run::h86c86f422ce1ff38 ()
#12 0x0000555555715122 in gix::plumbing::main::main::h35af222d41ff9360 ()
#13 0x000055555595dda3 in std::sys_common::backtrace::__rust_begin_short_backtrace::h9b4f271923e9f1ae ()
#14 0x000055555575c7cd in std::rt::lang_start::{{closure}}::h07ae7a6dd00ee5f6 ()
#15 0x0000555556187bb7 in std::rt::lang_start_internal::hbcbc84fca12d560d ()
#16 0x00005555557aa9a5 in main ()

I don't know how to enable debug symbols in NixOS. I'll look into that later.

@Byron
Copy link
Member

Byron commented Oct 17, 2023

Thanks for having taken a look! Could you verify that you see protocol_version in the trace? If not, it's not the latest version, it should be right with negotiate.

It's good to see that it tries to read from the server, which probably equally tries to read from the client because it didn't flush its output pipe yet.

PS: You should be able to install with cargo install --git https://github.com/Byron/gitoxide --bin gix --features max-pure --locked - it doesn't need cmake.

@jalil-salame
Copy link
Contributor Author

The right command is cargo install --git https://github.com/Byron/gitoxide --bin gix --features max-pure --locked gitoxide, but it keeps trying to compile lib-ng-sys and failing because of NixOS's cmake.

I'll whip up a quick flake to fix this and get debug symbols.

@Byron
Copy link
Member

Byron commented Oct 17, 2023

Ah, right! Then the real right command is :): cargo install --git https://github.com/Byron/gitoxide --bin gix --features max-pure --locked --no-default-features gitoxide.

That will definitely not need additional toolchains and is pure Rust.

@jalil-salame
Copy link
Contributor Author

That works, but still gets stuck in negotiation. This is the gdb stack trace (now with debug info!):

#0  0x00007ffff7dd174c in read () from /nix/store/ld03l52xq2ssn4x0g5asypsxqls40497-glibc-2.37-8/lib/libc.so.6
#1  0x00005555575cb0f3 in std::sys::unix::fd::FileDesc::read () at library/std/src/sys/unix/fd.rs:90
#2  std::sys::unix::pipe::AnonPipe::read () at library/std/src/sys/unix/pipe.rs:49
#3  std::process::{impl#16}::read () at library/std/src/process.rs:428
#4  0x00005555565988c2 in gix_transport::client::blocking_io::file::supervise_stderr::{impl#1}::read (self=0x555557ee53c0, buf=...) at gix-transport/src/client/blocking_io/file.rs:163
#5  0x00005555565a5563 in std::io::default_read_exact<gix_transport::client::blocking_io::file::ReadStdoutFailOnError> (this=0x555557ee53c0, buf=...)
    at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/io/mod.rs:467
#6  0x00005555565c5578 in std::io::Read::read_exact<gix_transport::client::blocking_io::file::ReadStdoutFailOnError> (self=0x555557ee53c0, buf=...)
    at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/io/mod.rs:825
#7  0x0000555556595f8f in std::io::impls::{impl#4}::read_exact<(dyn std::io::Read + core::marker::Send)> (self=0x555557f23b40, buf=...)
    at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/io/impls.rs:155
#8  0x00005555565b1462 in gix_packetline::StreamingPeekableIter<alloc::boxed::Box<(dyn std::io::Read + core::marker::Send), alloc::alloc::Global>>::read_line_inner<alloc::boxed::Box<(dyn std::io::Read + core::marker::Send), alloc::alloc::Global>> (reader=0x555557f23b40, buf=...) at gix-packetline/src/read/blocking_io.rs:18
#9  0x00005555565b18b1 in gix_packetline::StreamingPeekableIter<alloc::boxed::Box<(dyn std::io::Read + core::marker::Send), alloc::alloc::Global>>::read_line_inner_exhaustive<alloc::boxed::Box<(dyn std::io::Read + core::marker::Send), alloc::alloc::Global>> (reader=0x555557f23b40, buf=0x555557f23b50, delimiters=..., fail_on_err_lines=true, buf_resize=true)
    at gix-packetline/src/read/blocking_io.rs:45
#10 0x00005555565b2075 in gix_packetline::StreamingPeekableIter<alloc::boxed::Box<(dyn std::io::Read + core::marker::Send), alloc::alloc::Global>>::peek_line<alloc::boxed::Box<(dyn std::io::Read + core::marker::Send), alloc::alloc::Global>> (self=0x555557f23b28) at gix-packetline/src/read/blocking_io.rs:124
#11 0x00005555565b8f57 in gix_packetline::read::sidebands::blocking_io::WithSidebands<alloc::boxed::Box<(dyn std::io::Read + core::marker::Send), alloc::alloc::Global>, alloc::boxed::Box<dyn core::ops::function::FnMut<(bool, &[u8]), Output=gix_packetline::read::ProgressAction>, alloc::alloc::Global>>::peek_data_line<alloc::boxed::Box<(dyn std::io::Read + core::marker::Send), alloc::alloc::Global>, alloc::boxed::Box<dyn core::ops::function::FnMut<(bool, &[u8]), Output=gix_packetline::read::ProgressAction>, alloc::alloc::Global>> (self=0x555557ee6150)
    at gix-packetline/src/read/sidebands/blocking_io.rs:92
#12 0x00005555565ba633 in gix_transport::client::blocking_io::bufread_ext::{impl#4}::peek_data_line<alloc::boxed::Box<(dyn std::io::Read + core::marker::Send), alloc::alloc::Global>> (
    self=0x555557ee6150) at gix-transport/src/client/blocking_io/bufread_ext.rs:107
#13 0x00005555562ad454 in gix_transport::client::blocking_io::bufread_ext::{impl#1}::peek_data_line<(dyn gix_transport::client::blocking_io::bufread_ext::ExtendedBufRead + core::marker::Unpin)> (self=0x7fffffff2130) at gix-transport/src/client/blocking_io/bufread_ext.rs:70
#14 0x000055555622a61e in gix_protocol::fetch::response::Response::from_line_reader<alloc::boxed::Box<(dyn gix_transport::client::blocking_io::bufread_ext::ExtendedBufRead + core::marker::Unpin), alloc::alloc::Global>> (version=gix_transport::Protocol::V1, reader=0x7fffffff2130, client_expects_pack=false) at gix-protocol/src/fetch/response/blocking_io.rs:53
#15 0x00005555561b5145 in gix::remote::connection::fetch::Prepare<alloc::boxed::Box<(dyn gix_transport::client::blocking_io::traits::Transport + core::marker::Send), alloc::alloc::Global>>::receive_inner<alloc::boxed::Box<(dyn gix_transport::client::blocking_io::traits::Transport + core::marker::Send), alloc::alloc::Global>> (self=..., progress=...,
    should_interrupt=0x555557edef98 <gix::interrupt::IS_INTERRUPTED::h91f14977242c8b37>) at gix/src/remote/connection/fetch/receive_pack.rs:221
#16 0x00005555558277ad in gix::remote::connection::fetch::Prepare<alloc::boxed::Box<(dyn gix_transport::client::blocking_io::traits::Transport + core::marker::Send), alloc::alloc::Global>>::receive<alloc::boxed::Box<(dyn gix_transport::client::blocking_io::traits::Transport + core::marker::Send), alloc::alloc::Global>, &mut prodash::progress::utils::DoOrDiscard<prodash::tree::Item>> (self=..., progress=0x7fffffff7db0, should_interrupt=0x555557edef98 <gix::interrupt::IS_INTERRUPTED::h91f14977242c8b37>) at gix/src/remote/connection/fetch/receive_pack.rs:81
#17 0x00005555559be203 in gitoxide_core::repository::fetch::function::fetch<prodash::progress::utils::DoOrDiscard<prodash::tree::Item>, &mut dyn std::io::Write, &mut dyn std::io::Write> (
    repo=..., progress=..., out=..., err=...) at gitoxide-core/src/repository/fetch.rs:64
#18 0x0000555555a620d7 in gix::plumbing::main::main::{closure#11} (progress=..., out=..., err=...) at src/plumbing/main.rs:357
#19 0x0000555555bd359e in gitoxide::shared::pretty::prepare_and_run::{closure#0}<(), core::ops::range::RangeInclusive<u8>, gix::plumbing::main::main::{closure_env#11}> ()
    at src/shared.rs:170
#20 0x000055555590e739 in gix_trace::enabled::Span::into_scope<core::result::Result<(), anyhow::Error>, gitoxide::shared::pretty::prepare_and_run::{closure_env#0}<(), core::ops::range::RangeInclusive<u8>, gix::plumbing::main::main::{closure_env#11}>> (self=<error reading variable: Cannot access memory at address 0xfffffffffffffe00>,
    f=<error reading variable: Cannot access memory at address 0x75>) at gix-trace/src/lib.rs:43
#21 0x0000555555bbd3c2 in gitoxide::shared::pretty::prepare_and_run<(), core::ops::range::RangeInclusive<u8>, gix::plumbing::main::main::{closure_env#11}> (name=..., trace=false,
    verbose=true, progress=false, progress_keep_open=false, range=..., run=...) at src/shared.rs:169
#22 0x0000555555c7b2b8 in gix::plumbing::main::main () at src/plumbing/main.rs:349
#23 0x0000555555cc82a6 in gix::main () at src/gix.rs:9

@Byron
Copy link
Member

Byron commented Oct 17, 2023

I thinks that's it! gix_transport::client::blocking_io::file::supervise_stderr:: tells me that it's not the negotiation that hangs, but the 'supervisor' thread that blocks forever. Maybe that's by design though and it's just the stack-trace that doesn't make the different threads clear. Maybe the main thread is also hanging while reading.

@jalil-salame
Copy link
Contributor Author

I have no idea how to get gdb to debug threads, I'll see if I can figure it out later today.

@Byron
Copy link
Member

Byron commented Oct 18, 2023

I don't think this will be necessary, as I had a look at the code and it's made so that it most definitely can't deadlock.

Can you validate that this is the latest version by checking that the negotiate trace line shows the protocol version? If so, what is the protocol version?

In any case, you could try to set the protocol version to something else, as with gix -c protocol.version=1|2 fetch and see if that makes a difference.

Thanks for all your help.

@jalil-salame
Copy link
Contributor Author

Something is really weird with the protocol version; the trace says transport is trying to set version 2, but negotiate uses version 1:

$ gix --trace fetch
^C 09:32:15 tracing INFO     run [ 7.26s | 20.98% / 100.00% ]                                                                                                                                    racing
 09:32:15 tracing INFO     ┝━ ThreadSafeRepository::discover() [ 16.8ms | 0.01% / 0.23% ]
 09:32:15 tracing INFO     │  ┕━ open_from_paths() [ 15.8ms | 0.12% / 0.22% ]
 09:32:15 tracing INFO     │     ┝━ gix_path::git::install_config_path() [ 6.40ms | 0.09% ]
 09:32:15 tracing INFO     │     ┕━ gix_odb::Store::at() [ 510µs | 0.01% ]
 09:32:15 tracing DEBUG    ┝━ 🐛 [debug]: gix_transport::SpawnProcessOnDemand | command: GIT_PROTOCOL="version=2" LANG="C" LC_ALL="C" "ssh" "-o" "SendEnv=GIT_PROTOCOL" "gitea@**censored**" "git-upload-pack" "\'**censored**.git\'"
 09:32:15 tracing INFO     ┕━ fetch::Prepare::receive() [ 5.72s | 0.01% / 78.79% ]
 09:32:15 tracing DEBUG       ┕━ negotiate [ 5.72s | 0.02% / 78.78% ] protocol_version: 1
 09:32:15 tracing DEBUG          ┝━ mark_complete_and_common_ref [ 5.81ms | 0.05% / 0.08% ] mappings: 1
 09:32:15 tracing INFO           │  ┝━ mark_all_refs [ 2.01ms | 0.03% ]
 09:32:15 tracing DEBUG          │  ┝━ mark_alternate_refs [ 11.9µs | 0.00% ] num_odb: 0
 09:32:15 tracing INFO           │  ┝━ mark known_common [ 21.5µs | 0.00% ]
 09:32:15 tracing DEBUG          │  ┕━ mark tips [ 21.1µs | 0.00% ] num_tips: 1
 09:32:15 tracing DEBUG          ┝━ negotiate round [ 5.71s | 78.67% ] round: 1
 09:32:15 tracing DEBUG          ┕━ negotiate round [ 618µs | 0.01% ] round: 2
Error: An IO error occurred when talking to the server

Caused by:
    Broken pipe (os error 32)

Same when you force protocol v2:

gix -c protocol.version=2 --trace fetch (click to show output)
$ gix -c protocol.version=2 --trace fetch
^C 09:33:19 tracing INFO     run [ 15.3s | 8.78% / 100.00% ]                                                                                                                                     racing
 09:33:19 tracing INFO     ┝━ ThreadSafeRepository::discover() [ 9.95ms | 0.01% / 0.06% ]
 09:33:19 tracing INFO     │  ┕━ open_from_paths() [ 8.81ms | 0.01% / 0.06% ]
 09:33:19 tracing INFO     │     ┝━ gix_path::git::install_config_path() [ 6.80ms | 0.04% ]
 09:33:19 tracing INFO     │     ┕━ gix_odb::Store::at() [ 199µs | 0.00% ]
 09:33:19 tracing DEBUG    ┝━ 🐛 [debug]: gix_transport::SpawnProcessOnDemand | command: GIT_PROTOCOL="version=2" LANG="C" LC_ALL="C" "ssh" "-o" "SendEnv=GIT_PROTOCOL" "gitea@**censored**" "git-upload-pack" "\'**censored**.git\'"
 09:33:19 tracing INFO     ┕━ fetch::Prepare::receive() [ 14.0s | 0.01% / 91.15% ]
 09:33:19 tracing DEBUG       ┕━ negotiate [ 14.0s | 0.01% / 91.15% ] protocol_version: 1
 09:33:19 tracing DEBUG          ┝━ mark_complete_and_common_ref [ 2.48ms | 0.01% / 0.02% ] mappings: 1
 09:33:19 tracing INFO           │  ┝━ mark_all_refs [ 983µs | 0.01% ]
 09:33:19 tracing DEBUG          │  ┝━ mark_alternate_refs [ 4.36µs | 0.00% ] num_odb: 0
 09:33:19 tracing INFO           │  ┝━ mark known_common [ 8.74µs | 0.00% ]
 09:33:19 tracing DEBUG          │  ┕━ mark tips [ 8.10µs | 0.00% ] num_tips: 1
 09:33:19 tracing DEBUG          ┝━ negotiate round [ 14.0s | 91.12% ] round: 1
 09:33:19 tracing DEBUG          ┕━ negotiate round [ 549µs | 0.00% ] round: 2
Error: An IO error occurred when talking to the server

Caused by:
    Broken pipe (os error 32)

Forcing vesion 1 looks normal (still stuck though):

gix -c protocol.version=1 --trace fetch (click to show output)
$ gix -c protocol.version=1 --trace fetch
^C 09:36:28 tracing INFO     run [ 5.82s | 23.78% / 100.00% ]                                                                                                                                    racing
 09:36:28 tracing INFO     ┝━ ThreadSafeRepository::discover() [ 3.38ms | 0.00% / 0.06% ]
 09:36:28 tracing INFO     │  ┕━ open_from_paths() [ 3.18ms | 0.02% / 0.05% ]
 09:36:28 tracing INFO     │     ┝━ gix_path::git::install_config_path() [ 2.03ms | 0.03% ]
 09:36:28 tracing INFO     │     ┕━ gix_odb::Store::at() [ 155µs | 0.00% ]
 09:36:28 tracing DEBUG    ┝━ 🐛 [debug]: gix_transport::SpawnProcessOnDemand | command: LANG="C" LC_ALL="C" "ssh" "gitea@**censored**" "git-upload-pack" "\'**censored**.git\'"
 09:36:28 tracing INFO     ┕━ fetch::Prepare::receive() [ 4.44s | 0.01% / 76.17% ]
 09:36:28 tracing DEBUG       ┕━ negotiate [ 4.44s | 0.02% / 76.15% ] protocol_version: 1
 09:36:28 tracing DEBUG          ┝━ mark_complete_and_common_ref [ 1.97ms | 0.02% / 0.03% ] mappings: 1
 09:36:28 tracing INFO           │  ┝━ mark_all_refs [ 686µs | 0.01% ]
 09:36:28 tracing DEBUG          │  ┝━ mark_alternate_refs [ 4.81µs | 0.00% ] num_odb: 0
 09:36:28 tracing INFO           │  ┝━ mark known_common [ 8.97µs | 0.00% ]
 09:36:28 tracing DEBUG          │  ┕━ mark tips [ 8.36µs | 0.00% ] num_tips: 1
 09:36:28 tracing DEBUG          ┝━ negotiate round [ 4.43s | 76.09% ] round: 1
 09:36:28 tracing DEBUG          ┕━ negotiate round [ 727µs | 0.01% ] round: 2
Error: An IO error occurred when talking to the server

Caused by:
    Broken pipe (os error 32)

@jalil-salame
Copy link
Contributor Author

A bit more context on when it happens (I'm not 100% sure this is the pattern because it happens so infrequently).

I have two computers where I have the same git repos (the ones that get stuck from my selfhosted gitea instance). I have a preference for one PC so I leave the other alone for a while. When I return to the other computer the repos sometimes get stuck.

So how I think the issue could be reproduced is:

  1. Get a gitea repo with an ssh origin.
  2. Clone it on two different locations (maybe two computers?).
  3. Make a few commits on one location (maybe over a certain period of time).
  4. Try to fetch the updated repo on the out of date location.

I'll see if I can reproduce this like that.

@Byron
Copy link
Member

Byron commented Oct 18, 2023

The protocol-discrepancy is due to the server being able to downgrade the client. Setting the protocol version is merely a request, and it can be ignored. V2 is also the default and gitea seems to support V1 only.

The issue also seems to happen if multiple negotiation rounds are needed which is strange.

Which reminds me, please do try different settings for the fetch.negotiationAlgorithm, i.e. gix -c fetch.negotiationAlgorithm=consecutive|skipping|noop. consecutive is the default. Maybe that makes a difference, but I'd expect that in one of them it won't get to the second round which then makes it work.

I start to think it's something about the protocol that is wrong or unexpected that causes the probably custom gitea server side to not send a response.

I think the ultimate answer will be if you could use a way to make the packetlines sent over the wire visible. git can do that with GIT_TRACE_PACKET, but gix cannot yet. Probably the way to go here is to implement a debug-feature that will show everything that it sends or receives on stderr. As it's plain text, the content would be redactable while still revealing enough of what's happening to learn what the difference is between gix and git.

Independently of that, if you could provide a public repo in a state that reproduces it on my side, that would definitely help as well.

@jalil-salame
Copy link
Contributor Author

Both skipping and noop work (version 0.30.0 even).

Also, when setting an option with -c it would be nice to error if the config is wrong (I set skip instead of skipping and didn't notice). Should I post an issue about that?

@jalil-salame
Copy link
Contributor Author

Noop

$ gix --trace -c fetch.negotiationAlgorithm=noop fetch
10:26:20 indexing done 1.2k objects in 0.09s (12.7k objects/s)
 10:26:20 decompressing done 203.5KB in 0.09s (2.2MB/s)
 10:26:20     Resolving done 1.2k objects in 0.05s (23.4k objects/s)
 10:26:20      Decoding done 333.0KB in 0.05s (6.6MB/s)
 10:26:20 writing index file done 34.2KB in 0.00s (63.8MB/s)
 10:26:20  create index file done 1.2k objects in 0.15s (8.1k objects/s)
 10:26:20          read pack done 179.2KB in 0.18s (1003.3KB/s)
 10:26:20            tracing INFO     run [ 1.64s | 85.64% / 100.00% ]
 10:26:20            tracing INFO     ┝━ ThreadSafeRepository::discover() [ 3.12ms | 0.01% / 0.19% ]
 10:26:20            tracing INFO     │  ┕━ open_from_paths() [ 3.00ms | 0.04% / 0.18% ]
 10:26:20            tracing INFO     │     ┝━ gix_path::git::install_config_path() [ 2.25ms | 0.14% ]
 10:26:20            tracing INFO     │     ┕━ gix_odb::Store::at() [ 68.6µs | 0.00% ]
 10:26:20            tracing DEBUG    ┝━ 🐛 [debug]: gix_transport::SpawnProcessOnDemand | command: GIT_PROTOCOL="version=2" LANG="C" LC_ALL="C" "ssh" "-o" "SendEnv=GIT_PROTOCOL" "gitea@**censored**" "git-upload-pack" "\'**censored**.git\'"
 10:26:20            tracing INFO     ┕━ fetch::Prepare::receive() [ 232ms | 0.02% / 14.17% ]
 10:26:20            tracing INFO        ┝━ negotiate [ 50.0ms | 0.03% / 3.05% ]
 10:26:20            tracing DEBUG       │  ┝━ mark_complete_and_common_ref [ 2.00ms | 0.05% / 0.12% ] mappings: 1
 10:26:20            tracing INFO        │  │  ┝━ mark_all_refs [ 1.25ms | 0.08% ]
 10:26:20            tracing DEBUG       │  │  ┝━ mark_alternate_refs [ 1.81µs | 0.00% ] num_odb: 0
 10:26:20            tracing INFO        │  │  ┝━ mark known_common [ 2.53µs | 0.00% ]
 10:26:20            tracing DEBUG       │  │  ┕━ mark tips [ 1.59µs | 0.00% ] num_tips: 1
 10:26:20            tracing DEBUG       │  ┕━ negotiate round [ 47.6ms | 2.90% ] round: 1
 10:26:20            tracing INFO        ┝━ gix_pack::Bundle::write_to_directory() [ 179ms | 10.94% ]
 10:26:20            tracing DEBUG       ┕━ update_refs() [ 2.60ms | 0.05% / 0.16% ] mappings: 1
 10:26:20            tracing DEBUG          ┕━ apply [ 1.71ms | 0.10% ] edits: 1
+refs/heads/*:refs/remotes/origin/*
        0393a83903a10ac8c61112af371140bbc4e0dd75 refs/heads/main -> refs/remotes/origin/main [fast-forward]
pack  file: "./.git/objects/pack/pack-cc62afd989d58966bbd3b54ff6ba124b1b89f2a9.pack"
index file: "./.git/objects/pack/pack-cc62afd989d58966bbd3b54ff6ba124b1b89f2a9.idx"
server sent 2 tips, 1 were filtered due to 1 refspec(s).

Skipping

$ gix --trace -c fetch.negotiationAlgorithm=skipping fetch
 10:28:14 indexing done 6.0 objects in 0.00s (15.6k objects/s)
 10:28:14 decompressing done 1.8KB in 0.00s (4.6MB/s)
 10:28:14     Resolving done 6.0 objects in 0.05s (117.0 objects/s)
 10:28:14      Decoding done 1.8KB in 0.05s (36.0KB/s)
 10:28:14 writing index file done 1.2KB in 0.00s (25.2MB/s)
 10:28:14  create index file done 6.0 objects in 0.05s (116.0 objects/s)
 10:28:14          read pack done 1.6KB in 0.05s (31.1KB/s)
 10:28:14            tracing INFO     run [ 1.10s | 88.69% / 100.00% ]
 10:28:14            tracing INFO     ┝━ ThreadSafeRepository::discover() [ 4.41ms | 0.03% / 0.40% ]
 10:28:14            tracing INFO     │  ┕━ open_from_paths() [ 4.09ms | 0.10% / 0.37% ]
 10:28:14            tracing INFO     │     ┝━ gix_path::git::install_config_path() [ 2.91ms | 0.26% ]
 10:28:14            tracing INFO     │     ┕━ gix_odb::Store::at() [ 104µs | 0.01% ]
 10:28:14            tracing DEBUG    ┝━ 🐛 [debug]: gix_transport::SpawnProcessOnDemand | command: GIT_PROTOCOL="version=2" LANG="C" LC_ALL="C" "ssh" "-o" "SendEnv=GIT_PROTOCOL" "gitea@**censored**" "git-upload-pack" "\'**censored**.git\'"
 10:28:14            tracing INFO     ┕━ fetch::Prepare::receive() [ 120ms | 0.02% / 10.91% ]
 10:28:14            tracing INFO        ┝━ negotiate [ 67.2ms | 0.04% / 6.08% ]
 10:28:14            tracing DEBUG       │  ┝━ mark_complete_and_common_ref [ 1.58ms | 0.06% / 0.14% ] mappings: 1
 10:28:14            tracing INFO        │  │  ┝━ mark_all_refs [ 881µs | 0.08% ]
 10:28:14            tracing DEBUG       │  │  ┝━ mark_alternate_refs [ 1.09µs | 0.00% ] num_odb: 0
 10:28:14            tracing INFO        │  │  ┝━ mark known_common [ 2.68µs | 0.00% ]
 10:28:14            tracing DEBUG       │  │  ┕━ mark tips [ 25.1µs | 0.00% ] num_tips: 1
 10:28:14            tracing DEBUG       │  ┕━ negotiate round [ 65.2ms | 5.90% ] round: 1
 10:28:14            tracing INFO        ┝━ gix_pack::Bundle::write_to_directory() [ 52.4ms | 4.74% ]
 10:28:14            tracing DEBUG       ┕━ update_refs() [ 725µs | 0.03% / 0.07% ] mappings: 1
 10:28:14            tracing DEBUG          ┕━ apply [ 438µs | 0.04% ] edits: 1
+refs/heads/*:refs/remotes/origin/*
        0393a83903a10ac8c61112af371140bbc4e0dd75 refs/heads/main -> refs/remotes/origin/main [fast-forward]
pack  file: "./.git/objects/pack/pack-8bf88c208d478c44cfea799314371755fee90a9e.pack"
index file: "./.git/objects/pack/pack-8bf88c208d478c44cfea799314371755fee90a9e.idx"
server sent 2 tips, 1 were filtered due to 1 refspec(s).

@Byron
Copy link
Member

Byron commented Oct 18, 2023

How many negotiation rounds did you get with skipping? Just one, is my prediction. (Concurrent editing)

Also, when setting an option with -c it would be nice to error if the config is wrong (I set skip instead of skipping and didn't notice). Should I post an issue about that?

In theory, that's a feature, and it's intentionally lenient there. This makes it easy to change and maybe it should change. If you want to change it to strict mode please be my guest.

I am looking into adding tracing support similar to GIT_TRACE_PACKET now.

@jalil-salame
Copy link
Contributor Author

The repo is behind by just one commit, but I cannot reproduce it in Gitea's public git repo (haven't tried for long though).

The date of the last two commits:

Date: Thu Aug 24 15:43:58 2023 +0000 (origin has this one)
Date: Tue Aug 15 17:20:20 2023 +0000 (repo is stuck here)

@jalil-salame
Copy link
Contributor Author

Also, when setting an option with -c it would be nice to error if the config is wrong (I set skip instead of skipping and didn't notice). Should I post an issue about that?

In theory, that's a feature, and it's intentionally lenient there. This makes it easy to change and maybe it should change. If you want to change it to strict mode please be my guest.

My preference is .gitconfig should be lenient -c should be strict. I would rather my scripts/cli commands error quickly than silently do the wrong thing. Is there a config option for that?

@Byron
Copy link
Member

Byron commented Oct 18, 2023

Also, another question, which version of git is running on the gitea server? I'd hope it's old and can be updated.

@Byron
Copy link
Member

Byron commented Oct 18, 2023

From the code that I see, also on the git side, I don't see why it would hang. When decoding the server response it's so incredibly careful that as long as the server sends a pack, it will find it (it's made so that it consumes all the acks that the server sends but that aren't really required to get to the pack). But here it seems the server doesn't do as it's supposed to do when seeing the done of the client.

If the version check doesn't lead anywhere, i.e. it seems new enough, then we'd have to host the server-side locally with git daemon to get additional insights.

@jalil-salame
Copy link
Contributor Author

Also, another question, which version of git is running on the gitea server? I'd hope it's old and can be updated.

It is fairly recent (2.40.1 might also be 2.38.4? but I think that is just a backup) and I have no idea how to update it (although I can probably figure something up).

@Byron
Copy link
Member

Byron commented Oct 19, 2023

Thanks, I'd think that's recent enough, no need to investigate further.

What I'd like to learn is why the protocol V2 request sent by the client isn't respected by the server - my guess is that SSH is locked down and won't allow setting any environment variables. Could that be? Is this something you can validate?

Of course, the avenue above aims at side-stepping the actual problem, which might not be what you want in the first place.
I'd also love this hang to go away forever, but from all I can see is that it should work if the server would respond at all to the "done" that gitoxide sends before it reads.

It would be really helpful to see what git on the other end receives - even though I double-checked that gitoxide will flush all writes before reading, maybe that's not actually working so the "done" never reaches the server. This could be reproduced by hosting the server state locally with git daemon.

Going one step further down the rabbithole I saw that ChildStdin::flush() isn't actually doing anything - so the standard library seems to rightfully assume that writes are flushed immediately. Could it be that this sometimes isn't happening automatically?

@jalil-salame
Copy link
Contributor Author

Thanks, I'd think that's recent enough, no need to investigate further.

What I'd like to learn is why the protocol V2 request sent by the client isn't respected by the server - my guess is that SSH is locked down and won't allow setting any environment variables. Could that be? Is this something you can validate?

This is the issue with the protocol version, just verified it:

$ export GIT_PROTOCOL=v2
$ echo $GIT_PROTOCOL
v2
$ ssh -o 'SendEnv GIT_PROTOCOL' **REDACTED**
Last login: **REDACTED*
$ echo $GIT_PROTOCOL

@jalil-salame
Copy link
Contributor Author

Going one step further down the rabbithole I saw that ChildStdin::flush() isn't actually doing anything - so the standard library seems to rightfully assume that writes are flushed immediately. Could it be that this sometimes isn't happening automatically?

Some fds will flush immediately. I don't know if somewhere someone is assuming that is the case when it isn't so...

@jalil-salame
Copy link
Contributor Author

jalil-salame commented Oct 19, 2023

Trying to reproduce this in gitea.com through forcing protocol version 1. There might be a minimum number of commits required to cause the issue as it succeeds with 3 commits (2 in test repo).

Tested up to 19 commits, no luck reproducing the issue.

@Byron
Copy link
Member

Byron commented Oct 20, 2023

Thanks, I'd think that's recent enough, no need to investigate further.
What I'd like to learn is why the protocol V2 request sent by the client isn't respected by the server - my guess is that SSH is locked down and won't allow setting any environment variables. Could that be? Is this something you can validate?

This is the issue with the protocol version, just verified it:

$ export GIT_PROTOCOL=v2
$ echo $GIT_PROTOCOL
v2
$ ssh -o 'SendEnv GIT_PROTOCOL' **REDACTED**
Last login: **REDACTED*
$ echo $GIT_PROTOCOL

I tested this locally and figured that AcceptEnv GIT_PROTOCOL also needs to be added to /etc/sshd_config to be going through. It's easily verifiable locally as well. This gist gave the clue I needed.

With this configuration change, you'd be able to side-step the issue.

However, ideally there would be extended debug info of Protocol V1 to learn what the problem actually is. For that, the local clone that shows the hanging issue would get a remote like file://path/to/local/server-mirror-of-repo-that-would-be-on-gitea. As gix will start git under the hood, one can influence it with GIT_TRACE_PACKET=1 - now both gix as well as git will output their packet lines. That way it would be possible to see if the done line is actually ending up on the server side. If that works, we know that it's something about ssh and that could be validated by adding the same repository through ssh like ssh://localhost/path/to/… and putting git-upload-pack as script in the PATH which gets to set GIT_TRACE_PACKET=1 before actually launching git-upload-pack $@.

Trying to reproduce this in gitea.com through forcing protocol version 1. There might be a minimum number of commits required to cause the issue as it succeeds with 3 commits (2 in test repo).

Tested up to 19 commits, no luck reproducing the issue.

You could turn on packetline tracing to see what the difference is. I presume that in these cases, it only has one round of negotiation and sends the done right away, which then turns out to work well.

I also have tests with multi-round negotiation for both stateless and stateful connections, for V1 and V2 protocols, and it all works fine (by now, after fixing many hanging issues in the first place). This is what makes this issue here so puzzling - the receiving code is written in such a way that it basically skips over all unnecessary chatter right to the pack, and the pack is expected after sending done no matter what. Why is it not being sent? The best answer I have is that done never makes it to the server, and in the paragraph above I laid out how that could be tested locally, at least in theory.

Thanks again for your tremendous help - I'd love to squelch this bug and then hopefully V1 hangs will be a thing of the past, for good, for real this time :D .

@jalil-salame
Copy link
Contributor Author

Thanks, I'd think that's recent enough, no need to investigate further.
What I'd like to learn is why the protocol V2 request sent by the client isn't respected by the server - my guess is that SSH is locked down and won't allow setting any environment variables. Could that be? Is this something you can validate?

This is the issue with the protocol version, just verified it:

$ export GIT_PROTOCOL=v2
$ echo $GIT_PROTOCOL
v2
$ ssh -o 'SendEnv GIT_PROTOCOL' **REDACTED**
Last login: **REDACTED*
$ echo $GIT_PROTOCOL

I tested this locally and figured that AcceptEnv GIT_PROTOCOL also needs to be added to /etc/sshd_config to be going through. It's easily verifiable locally as well. This gist gave the clue I needed.

With this configuration change, you'd be able to side-step the issue.

I am tempted to never do this so I can keep catching bugs in v1 c:

I also have tests with multi-round negotiation for both stateless and stateful connections, for V1 and V2 protocols, and it all works fine (by now, after fixing many hanging issues in the first place). This is what makes this issue here so puzzling - the receiving code is written in such a way that it basically skips over all unnecessary chatter right to the pack, and the pack is expected after sending done no matter what. Why is it not being sent? The best answer I have is that done never makes it to the server, and in the paragraph above I laid out how that could be tested locally, at least in theory.

How do you go about requiring two negotiation rounds? I thought it would be after 16 commits, but it doesn't seem to be that, and I am not famliar with the git protocol.

I'll try to figure out how to get the repo from the gitea server to debug locally.

And thanks for your help! This whole process has been so pleasant <3

@Byron
Copy link
Member

Byron commented Oct 20, 2023

How do you go about requiring two negotiation rounds? I thought it would be after 16 commits, but it doesn't seem to be that, and I am not famliar with the git protocol.

You might find it amusing that I also don't have a clear-cut way of achieving this. The test-suite adapts a similar test from git itself, and I only went with it without thinking too much about it.

However, in theory it should be easy to get an even higher number of rounds by adding a remote that has nothing to do with the local repository. Then it should try to find common commits and just give up after sending 256 of them. Maybe… that's even the solution for me to reproduce this… .

Indeed, I could easily get it to do a lot of rounds in V1 one without SSH in the middle:

┕━ fetch::Prepare::receive() [ 1.80s | 0.01% / 98.11% ]
 09:57:08            tracing DEBUG       ┝━ negotiate [ 527ms | 0.17% / 28.75% ] protocol_version: 1
 09:57:08            tracing DEBUG       │  ┝━ mark_complete_and_common_ref [ 34.2ms | 1.75% / 1.87% ] mappings: 1752
 09:57:08            tracing INFO        │  │  ┝━ mark_all_refs [ 2.06ms | 0.11% ]
 09:57:08            tracing DEBUG       │  │  ┝━ mark_alternate_refs [ 208ns | 0.00% ] num_odb: 0
 09:57:08            tracing INFO        │  │  ┝━ mark known_common [ 14.1µs | 0.00% ]
 09:57:08            tracing DEBUG       │  │  ┕━ mark tips [ 32.8µs | 0.00% ] num_tips: 913
 09:57:08            tracing DEBUG       │  ┝━ negotiate round [ 2.22ms | 0.12% ] round: 1
 09:57:08            tracing DEBUG       │  ┝━ negotiate round [ 1.03ms | 0.06% ] round: 2
[..]
 09:57:08            tracing DEBUG       │  ┕━ negotiate round [ 273ms | 14.88% ] round: 68

With SSH, it's the same.

 10:01:18            tracing INFO     ┝━ remote::Connection::ref_map() [ 4.00s | 0.02% / 64.82% ]
 10:01:18            tracing INFO     │  ┕━ remote::Connection::fetch_refs() [ 4.00s | 0.01% / 64.80% ]
 10:01:18            tracing DEBUG    │     ┕━ gix_protocol::handshake() [ 4.00s | 64.79% ] service: UploadPack | extra_parameters: []
 10:01:18            tracing DEBUG    │        ┕━ 🐛 [debug]: gix_transport::SpawnProcessOnDemand | command: LANG="C" LC_ALL="C" "ssh" "byron@localhost" "git-upload-pack" "\'~/dev/github.com/Byron/gitoxide\'"
 10:01:18            tracing INFO     ┕━ fetch::Prepare::receive() [ 2.16s | 0.02% / 34.97% ]
 10:01:18            tracing DEBUG       ┝━ negotiate [ 700ms | 0.07% / 11.33% ] protocol_version: 1
 10:01:18            tracing DEBUG       │  ┝━ mark_complete_and_common_ref [ 57.5ms | 0.68% / 0.93% ] mappings: 1752
 10:01:18            tracing INFO        │  │  ┝━ mark_all_refs [ 15.2ms | 0.25% ]
 10:01:18            tracing DEBUG       │  │  ┝━ mark_alternate_refs [ 125ns | 0.00% ] num_odb: 0
 10:01:18            tracing INFO        │  │  ┝━ mark known_common [ 151µs | 0.00% ]
 10:01:18            tracing DEBUG       │  │  ┕━ mark tips [ 160µs | 0.00% ] num_tips: 913
 10:01:18            tracing DEBUG       │  ┝━ negotiate round [ 14.6ms | 0.24% ] round: 1
 10:01:18            tracing DEBUG       │  ┝━ negotiate round [ 3.53ms | 0.06% ] round: 2
[..]
 10:01:18            tracing DEBUG       │  ┕━ negotiate round [ 391ms | 6.33% ] round: 68

So there really, really seems to be something special about that specific repository state we are seeing :/.

I'll try to figure out how to get the repo from the gitea server to debug locally.

It should be fine just to copy it, scp -r <serverpath> localpath should do the trick. From there you can add a remote like you@localhost:~/server-repo-copy to go through SSH or file:///path/to/server-repo-copy to call git-upload-pack directly. Both should probably be tried - if local SSH reproduces, then local git-upload-pack should as well.

If both don't reproduce, one should validate that the interaction pattern truly is the same (and I'd expect that). Otherwise… I don't know.

And thanks for your help! This whole process has been so pleasant <3

Thank you, that's great to hear, as I have a feeling the debugging won't be finished anytime soon 😅.

@jalil-salame
Copy link
Contributor Author

It should be fine just to copy it, scp -r localpath should do the trick. From there you can add a remote like you@localhost:~/server-repo-copy to go through SSH or file:///path/to/server-repo-copy to call git-upload-pack directly. Both should probably be tried - if local SSH reproduces, then local git-upload-pack should as well.

file:///path/to/server-repo-copy reproduces the issue (I can't test ssh without it being slightly painful 'cause NixOS doesn't have a simple "install root service" command). How do I get debug information from git-upload-pack?

@jalil-salame
Copy link
Contributor Author

log of file:// url using v2 protocol
 11:47:52 indexing done 14.0 objects in 0.00s (9.9k objects/s)
 11:47:52 decompressing done 2.9KB in 0.00s (2.1MB/s)
 11:47:52     Resolving done 14.0 objects in 0.05s (273.0 objects/s)
 11:47:52      Decoding done 4.2KB in 0.05s (81.5KB/s)
 11:47:52 writing index file done 1.5KB in 0.00s (5.8MB/s)
 11:47:52  create index file done 14.0 objects in 0.05s (264.0 objects/s)
 11:47:52          read pack done 1.8KB in 0.06s (28.8KB/s)
 11:47:52            tracing INFO     run [ 75.5ms | 0.91% / 100.00% ]
 11:47:52            tracing INFO     ┝━ ThreadSafeRepository::discover() [ 3.65ms | 0.24% / 4.83% ]
 11:47:52            tracing INFO     │  ┕━ open_from_paths() [ 3.47ms | 1.13% / 4.59% ]
 11:47:52            tracing INFO     │     ┝━ gix_path::git::install_config_path() [ 2.51ms | 3.33% ]
 11:47:52            tracing INFO     │     ┕━ gix_odb::Store::at() [ 98.2µs | 0.13% ]
 11:47:52            tracing INFO     ┝━ remote::Connection::ref_map() [ 2.77ms | 0.02% / 3.67% ]
 11:47:52            tracing INFO     │  ┕━ remote::Connection::fetch_refs() [ 2.76ms | 0.05% / 3.66% ]
 11:47:52            tracing DEBUG    │     ┝━ gix_protocol::handshake() [ 2.24ms | 2.97% ] service: UploadPack | extra_parameters: []
 11:47:52            tracing DEBUG    │     │  ┝━ 🐛 [debug]: gix_transport::SpawnProcessOnDemand | command: GIT_PROTOCOL="version=2" "git-upload-pack" "/home/jalil/**CENSORED**.git"
 11:47:52            tracing TRACE    │     │  ┝━ 📍 [trace]: << version 2
 11:47:52            tracing TRACE    │     │  ┝━ 📍 [trace]: << agent=git/2.42.0
 11:47:52            tracing TRACE    │     │  ┝━ 📍 [trace]: << ls-refs=unborn
 11:47:52            tracing TRACE    │     │  ┝━ 📍 [trace]: << fetch=shallow wait-for-done
 11:47:52            tracing TRACE    │     │  ┝━ 📍 [trace]: << server-option
 11:47:52            tracing TRACE    │     │  ┝━ 📍 [trace]: << object-format=sha1
 11:47:52            tracing TRACE    │     │  ┝━ 📍 [trace]: << object-info
 11:47:52            tracing TRACE    │     │  ┕━ 📍 [trace]: << FLUSH
 11:47:52            tracing DEBUG    │     ┕━ gix_protocol::ls_refs() [ 485µs | 0.64% ] capabilities: Capabilities { data: "agent=git/2.42.0\nls-refs=unborn\nfetch=shallow wait-for-done\nserver-option\nobject-format=sha1\nobject-info", value_sep: 10 }
 11:47:52            tracing TRACE    │        ┝━ 📍 [trace]: >> command=ls-refs
 11:47:52            tracing TRACE    │        ┝━ 📍 [trace]: >> agent=git/oxide-0.55.2
 11:47:52            tracing TRACE    │        ┝━ 📍 [trace]: >> DELIM
 11:47:52            tracing TRACE    │        ┝━ 📍 [trace]: >> symrefs
 11:47:52            tracing TRACE    │        ┝━ 📍 [trace]: >> peel
 11:47:52            tracing TRACE    │        ┝━ 📍 [trace]: >> unborn
 11:47:52            tracing TRACE    │        ┝━ 📍 [trace]: >> ref-prefix refs/heads/
 11:47:52            tracing TRACE    │        ┝━ 📍 [trace]: >> ref-prefix refs/tags/
 11:47:52            tracing TRACE    │        ┝━ 📍 [trace]: >> FLUSH
 11:47:52            tracing TRACE    │        ┝━ 📍 [trace]: << ff30028d5fb155dcd6fcc250cb256fccba528245 refs/heads/main
 11:47:52            tracing TRACE    │        ┕━ 📍 [trace]: << FLUSH
 11:47:52            tracing INFO     ┕━ fetch::Prepare::receive() [ 68.4ms | 0.49% / 90.58% ]
 11:47:52            tracing DEBUG       ┝━ negotiate [ 3.31ms | 0.24% / 4.38% ] protocol_version: 2
 11:47:52            tracing DEBUG       │  ┝━ mark_complete_and_common_ref [ 595µs | 0.30% / 0.79% ] mappings: 1
 11:47:52            tracing INFO        │  │  ┝━ mark_all_refs [ 367µs | 0.49% ]
 11:47:52            tracing DEBUG       │  │  ┝━ mark_alternate_refs [ 445ns | 0.00% ] num_odb: 0
 11:47:52            tracing INFO        │  │  ┝━ mark known_common [ 861ns | 0.00% ]
 11:47:52            tracing DEBUG       │  │  ┕━ mark tips [ 1.50µs | 0.00% ] num_tips: 1
 11:47:52            tracing DEBUG       │  ┕━ negotiate round [ 2.53ms | 3.35% ] round: 1
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> command=fetch
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> agent=git/oxide-0.55.2
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> DELIM
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> thin-pack
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> ofs-delta
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> include-tag
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> want ff30028d5fb155dcd6fcc250cb256fccba528245
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> have 713418f2c3ef057538264c6c379d45836a86ca06
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> have 24ed375e274f6c90b846b5579ae01ab8ac77e631
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> have 4896909faec56497fdd03f7c00ec51570a6c5d88
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> have ef17d82d35d901a702d561561cedd6d9ee73af14
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> have 4da23d2886e815c9b3c524826bf3fdd83226a85e
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> have d29e402d3e90a3bcc352107a2ec7cc499d799479
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> have ec4b3825f8fe612a88e00ffdca41fb4b51972bd0
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> have 82a264f12274ca8e1999b73b5fd850c85118f50a
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> have 2204a3efc5352b1dbcc6d211aa2a53b23f74ff27
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> have 9a827eb717c1cde2a5e7f5046d5da869fcbe9be4
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> have b805cc79f48ff3b1adf0f52b5042b17108e0aaa1
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> have 795e583ee2931e25b7347616e013099fcaa9789c
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> have 552a1d3b6d81bb0d184bc93996b1dbdc20b0f2ca
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> have 4661adc46821622bcecc94188e2e1e4382f0c2cc
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> have d131fb4dd3611b1ceed828b3c3b61266e928d031
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> have ab24553467cc323133e9e81eeeabf21f08e2d97e
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: >> FLUSH
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: << acknowledgments
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: << ACK 713418f2c3ef057538264c6c379d45836a86ca06
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: << ACK 24ed375e274f6c90b846b5579ae01ab8ac77e631
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: << ACK 4896909faec56497fdd03f7c00ec51570a6c5d88
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: << ACK ef17d82d35d901a702d561561cedd6d9ee73af14
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: << ACK 4da23d2886e815c9b3c524826bf3fdd83226a85e
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: << ACK d29e402d3e90a3bcc352107a2ec7cc499d799479
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: << ACK ec4b3825f8fe612a88e00ffdca41fb4b51972bd0
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: << ACK 82a264f12274ca8e1999b73b5fd850c85118f50a
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: << ACK 2204a3efc5352b1dbcc6d211aa2a53b23f74ff27
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: << ACK 9a827eb717c1cde2a5e7f5046d5da869fcbe9be4
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: << ACK b805cc79f48ff3b1adf0f52b5042b17108e0aaa1
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: << ACK 795e583ee2931e25b7347616e013099fcaa9789c
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: << ACK 552a1d3b6d81bb0d184bc93996b1dbdc20b0f2ca
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: << ACK 4661adc46821622bcecc94188e2e1e4382f0c2cc
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: << ACK d131fb4dd3611b1ceed828b3c3b61266e928d031
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: << ACK ab24553467cc323133e9e81eeeabf21f08e2d97e
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: << ready
 11:47:52            tracing TRACE       │     ┝━ 📍 [trace]: << DELIM
 11:47:52            tracing TRACE       │     ┕━ 📍 [trace]: << packfile
**rest is broken by binary data**
log of file:// url using v1 protocol
 12:05:50 tracing INFO     run [ 137s | 0.00% / 100.00% ]
 12:05:50 tracing INFO     ┝━ ThreadSafeRepository::discover() [ 4.90ms | 0.00% / 0.00% ]
 12:05:50 tracing INFO     │  ┕━ open_from_paths() [ 4.45ms | 0.00% / 0.00% ]
 12:05:50 tracing INFO     │     ┝━ gix_path::git::install_config_path() [ 3.18ms | 0.00% ]
 12:05:50 tracing INFO     │     ┕━ gix_odb::Store::at() [ 136µs | 0.00% ]
 12:05:50 tracing INFO     ┝━ remote::Connection::ref_map() [ 3.67ms | 0.00% / 0.00% ]
 12:05:50 tracing INFO     │  ┕━ remote::Connection::fetch_refs() [ 3.66ms | 0.00% / 0.00% ]
 12:05:50 tracing DEBUG    │     ┕━ gix_protocol::handshake() [ 3.63ms | 0.00% ] service: UploadPack | extra_parameters: []
 12:05:50 tracing DEBUG    │        ┝━ 🐛 [debug]: gix_transport::SpawnProcessOnDemand | command: "git-upload-pack" "/home/jalil/**CENSORED**.git"
 12:05:50 tracing TRACE    │        ┝━ 📍 [trace]: << ff30028d5fb155dcd6fcc250cb256fccba528245 HEAD�multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed symref=HEAD:refs/heads/main object-format=sha1 agent=git/2.42.0
 12:05:50 tracing TRACE    │        ┝━ 📍 [trace]: << ff30028d5fb155dcd6fcc250cb256fccba528245 refs/heads/main
 12:05:50 tracing TRACE    │        ┕━ 📍 [trace]: << FLUSH
 12:05:50 tracing INFO     ┕━ fetch::Prepare::receive() [ 137s | 0.00% / 99.99% ]
 12:05:50 tracing DEBUG       ┕━ negotiate [ 137s | 0.00% / 99.99% ] protocol_version: 1
 12:05:50 tracing DEBUG          ┝━ mark_complete_and_common_ref [ 588µs | 0.00% / 0.00% ] mappings: 1
 12:05:50 tracing INFO           │  ┝━ mark_all_refs [ 370µs | 0.00% ]
 12:05:50 tracing DEBUG          │  ┝━ mark_alternate_refs [ 404ns | 0.00% ] num_odb: 0
 12:05:50 tracing INFO           │  ┝━ mark known_common [ 1.15µs | 0.00% ]
 12:05:50 tracing DEBUG          │  ┕━ mark tips [ 1.46µs | 0.00% ] num_tips: 1
 12:05:50 tracing DEBUG          ┝━ negotiate round [ 137s | 99.99% ] round: 1
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: >> want ff30028d5fb155dcd6fcc250cb256fccba528245 thin-pack side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative multi_ack_detailed agent=git/oxide-0.55.2 include-tag
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: >> FLUSH
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: >> have 713418f2c3ef057538264c6c379d45836a86ca06
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: >> have 24ed375e274f6c90b846b5579ae01ab8ac77e631
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: >> have 4896909faec56497fdd03f7c00ec51570a6c5d88
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: >> have ef17d82d35d901a702d561561cedd6d9ee73af14
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: >> have 4da23d2886e815c9b3c524826bf3fdd83226a85e
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: >> have d29e402d3e90a3bcc352107a2ec7cc499d799479
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: >> have ec4b3825f8fe612a88e00ffdca41fb4b51972bd0
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: >> have 82a264f12274ca8e1999b73b5fd850c85118f50a
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: >> have 2204a3efc5352b1dbcc6d211aa2a53b23f74ff27
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: >> have 9a827eb717c1cde2a5e7f5046d5da869fcbe9be4
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: >> have b805cc79f48ff3b1adf0f52b5042b17108e0aaa1
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: >> have 795e583ee2931e25b7347616e013099fcaa9789c
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: >> have 552a1d3b6d81bb0d184bc93996b1dbdc20b0f2ca
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: >> have 4661adc46821622bcecc94188e2e1e4382f0c2cc
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: >> have d131fb4dd3611b1ceed828b3c3b61266e928d031
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: >> have ab24553467cc323133e9e81eeeabf21f08e2d97e
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: >> FLUSH
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 713418f2c3ef057538264c6c379d45836a86ca06 common
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 24ed375e274f6c90b846b5579ae01ab8ac77e631 common
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 4896909faec56497fdd03f7c00ec51570a6c5d88 common
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: << ACK ef17d82d35d901a702d561561cedd6d9ee73af14 common
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 4da23d2886e815c9b3c524826bf3fdd83226a85e common
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: << ACK d29e402d3e90a3bcc352107a2ec7cc499d799479 common
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: << ACK ec4b3825f8fe612a88e00ffdca41fb4b51972bd0 common
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 82a264f12274ca8e1999b73b5fd850c85118f50a common
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 2204a3efc5352b1dbcc6d211aa2a53b23f74ff27 common
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 9a827eb717c1cde2a5e7f5046d5da869fcbe9be4 common
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: << ACK b805cc79f48ff3b1adf0f52b5042b17108e0aaa1 common
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 795e583ee2931e25b7347616e013099fcaa9789c common
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 552a1d3b6d81bb0d184bc93996b1dbdc20b0f2ca common
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 4661adc46821622bcecc94188e2e1e4382f0c2cc common
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: << ACK d131fb4dd3611b1ceed828b3c3b61266e928d031 common
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: << ACK ab24553467cc323133e9e81eeeabf21f08e2d97e common
 12:05:50 tracing TRACE          │  ┝━ 📍 [trace]: << ACK ab24553467cc323133e9e81eeeabf21f08e2d97e ready
 12:05:50 tracing TRACE          │  ┕━ 📍 [trace]: << NAK
 12:05:50 tracing DEBUG          ┕━ negotiate round [ 127µs | 0.00% ] round: 2
 12:05:50 tracing TRACE             ┕━ 📍 [trace]: >> done
�[2K
Error: An IO error occurred when talking to the server

Caused by:
    Broken pipe (os error 32)

@Byron
Copy link
Member

Byron commented Oct 20, 2023

🎉!

V1 of the protocol is very close as it reproduces the issue perfectly while using a local git-upload-pack invocation.

I assume that you have exported GIT_TRACE_PACKET=1 but it didn't actually provide the output of the local git invocation. The following patch will correct that:

diff --git a/gix-transport/src/client/blocking_io/file.rs b/gix-transport/src/client/blocking_io/file.rs
index e99f79bc9..446992ff8 100644
--- a/gix-transport/src/client/blocking_io/file.rs
+++ b/gix-transport/src/client/blocking_io/file.rs
@@ -209,7 +209,7 @@ impl client::Transport for SpawnProcessOnDemand {
                 Cow::Owned(command.to_owned()),
             ),
             None => (
-                gix_command::prepare(service.as_str()).stderr(Stdio::null()),
+                gix_command::prepare(service.as_str()).stderr(Stdio::inherit()),
                 None,
                 Cow::Borrowed(OsStr::new(service.as_str())),
             ),

(can be applied with patch -p1 the-patch-above.patch)

The output should contain everything that git sees, followed by the known output above after pressing Ctrl+C.

And that output will finally prove, once and for all, that git does or doesn't see the final done, and it will also become clear if for some reason it sends more than gix consumes (unlikely, but it's a possibility). Then it should become evident what has to be done here.

Thanks again for your help!

@jalil-salame
Copy link
Contributor Author

Here is the log with git-upload-pack's output:

log
13:15:09.294150 pkt-line.c:85           packet:  upload-pack> ff30028d5fb155dcd6fcc250cb256fccba528245 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed symref=HEAD:refs/heads/main object-format=sha1 agent=git/2.42.0
13:15:09.294589 pkt-line.c:85           packet:  upload-pack> ff30028d5fb155dcd6fcc250cb256fccba528245 refs/heads/main
13:15:09.294626 pkt-line.c:85           packet:  upload-pack> 0000
13:15:09.296360 pkt-line.c:85           packet:  upload-pack< want ff30028d5fb155dcd6fcc250cb256fccba528245 thin-pack side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative multi_ack_detailed agent=git/oxide-0.55.2 include-tag
13:15:09.296523 pkt-line.c:85           packet:  upload-pack< 0000
13:15:09.296538 pkt-line.c:85           packet:  upload-pack< have 713418f2c3ef057538264c6c379d45836a86ca06
13:15:09.296678 pkt-line.c:85           packet:  upload-pack> ACK 713418f2c3ef057538264c6c379d45836a86ca06 common
13:15:09.296698 pkt-line.c:85           packet:  upload-pack< have 24ed375e274f6c90b846b5579ae01ab8ac77e631
13:15:09.296781 pkt-line.c:85           packet:  upload-pack> ACK 24ed375e274f6c90b846b5579ae01ab8ac77e631 common
13:15:09.296799 pkt-line.c:85           packet:  upload-pack< have 4896909faec56497fdd03f7c00ec51570a6c5d88
13:15:09.296883 pkt-line.c:85           packet:  upload-pack> ACK 4896909faec56497fdd03f7c00ec51570a6c5d88 common
13:15:09.296902 pkt-line.c:85           packet:  upload-pack< have ef17d82d35d901a702d561561cedd6d9ee73af14
13:15:09.296985 pkt-line.c:85           packet:  upload-pack> ACK ef17d82d35d901a702d561561cedd6d9ee73af14 common
13:15:09.297002 pkt-line.c:85           packet:  upload-pack< have 4da23d2886e815c9b3c524826bf3fdd83226a85e
13:15:09.297075 pkt-line.c:85           packet:  upload-pack> ACK 4da23d2886e815c9b3c524826bf3fdd83226a85e common
13:15:09.297092 pkt-line.c:85           packet:  upload-pack< have d29e402d3e90a3bcc352107a2ec7cc499d799479
13:15:09.297165 pkt-line.c:85           packet:  upload-pack> ACK d29e402d3e90a3bcc352107a2ec7cc499d799479 common
13:15:09.297182 pkt-line.c:85           packet:  upload-pack< have ec4b3825f8fe612a88e00ffdca41fb4b51972bd0
13:15:09.297262 pkt-line.c:85           packet:  upload-pack> ACK ec4b3825f8fe612a88e00ffdca41fb4b51972bd0 common
13:15:09.297279 pkt-line.c:85           packet:  upload-pack< have 82a264f12274ca8e1999b73b5fd850c85118f50a
13:15:09.297352 pkt-line.c:85           packet:  upload-pack> ACK 82a264f12274ca8e1999b73b5fd850c85118f50a common
13:15:09.297370 pkt-line.c:85           packet:  upload-pack< have 2204a3efc5352b1dbcc6d211aa2a53b23f74ff27
13:15:09.297445 pkt-line.c:85           packet:  upload-pack> ACK 2204a3efc5352b1dbcc6d211aa2a53b23f74ff27 common
13:15:09.297462 pkt-line.c:85           packet:  upload-pack< have 9a827eb717c1cde2a5e7f5046d5da869fcbe9be4
13:15:09.297537 pkt-line.c:85           packet:  upload-pack> ACK 9a827eb717c1cde2a5e7f5046d5da869fcbe9be4 common
13:15:09.297554 pkt-line.c:85           packet:  upload-pack< have b805cc79f48ff3b1adf0f52b5042b17108e0aaa1
13:15:09.297628 pkt-line.c:85           packet:  upload-pack> ACK b805cc79f48ff3b1adf0f52b5042b17108e0aaa1 common
13:15:09.297645 pkt-line.c:85           packet:  upload-pack< have 795e583ee2931e25b7347616e013099fcaa9789c
13:15:09.297718 pkt-line.c:85           packet:  upload-pack> ACK 795e583ee2931e25b7347616e013099fcaa9789c common
13:15:09.297734 pkt-line.c:85           packet:  upload-pack< have 552a1d3b6d81bb0d184bc93996b1dbdc20b0f2ca
13:15:09.297807 pkt-line.c:85           packet:  upload-pack> ACK 552a1d3b6d81bb0d184bc93996b1dbdc20b0f2ca common
13:15:09.297824 pkt-line.c:85           packet:  upload-pack< have 4661adc46821622bcecc94188e2e1e4382f0c2cc
13:15:09.297962 pkt-line.c:85           packet:  upload-pack> ACK 4661adc46821622bcecc94188e2e1e4382f0c2cc common
13:15:09.297981 pkt-line.c:85           packet:  upload-pack< have d131fb4dd3611b1ceed828b3c3b61266e928d031
13:15:09.298459 pkt-line.c:85           packet:  upload-pack> ACK d131fb4dd3611b1ceed828b3c3b61266e928d031 common
13:15:09.298491 pkt-line.c:85           packet:  upload-pack< have ab24553467cc323133e9e81eeeabf21f08e2d97e
13:15:09.298530 pkt-line.c:85           packet:  upload-pack> ACK ab24553467cc323133e9e81eeeabf21f08e2d97e common
13:15:09.298543 pkt-line.c:85           packet:  upload-pack< 0000
13:15:09.298604 pkt-line.c:85           packet:  upload-pack> ACK ab24553467cc323133e9e81eeeabf21f08e2d97e ready
13:15:09.298616 pkt-line.c:85           packet:  upload-pack> NAK
 13:15:58 tracing INFO     run [ 49.1s | 0.02% / 100.00% ]
 13:15:58 tracing INFO     ┝━ ThreadSafeRepository::discover() [ 7.39ms | 0.00% / 0.02% ]
 13:15:58 tracing INFO     │  ┕━ open_from_paths() [ 7.02ms | 0.00% / 0.01% ]
 13:15:58 tracing INFO     │     ┝━ gix_path::git::install_config_path() [ 4.97ms | 0.01% ]
 13:15:58 tracing INFO     │     ┕━ gix_odb::Store::at() [ 195µs | 0.00% ]
 13:15:58 tracing INFO     ┝━ remote::Connection::ref_map() [ 3.83ms | 0.00% / 0.01% ]
 13:15:58 tracing INFO     │  ┕━ remote::Connection::fetch_refs() [ 3.82ms | 0.00% / 0.01% ]
 13:15:58 tracing DEBUG    │     ┕━ gix_protocol::handshake() [ 3.79ms | 0.01% ] service: UploadPack | extra_parameters: []
 13:15:58 tracing DEBUG    │        ┝━ 🐛 [debug]: gix_transport::SpawnProcessOnDemand | command: "git-upload-pack" "/home/jalil/**CENSORED**.git"
 13:15:58 tracing TRACE    │        ┝━ 📍 [trace]: << ff30028d5fb155dcd6fcc250cb256fccba528245 HEAD�multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed symref=HEAD:refs/heads/main object-format=sha1 agent=git/2.42.0
 13:15:58 tracing TRACE    │        ┝━ 📍 [trace]: << ff30028d5fb155dcd6fcc250cb256fccba528245 refs/heads/main
 13:15:58 tracing TRACE    │        ┕━ 📍 [trace]: << FLUSH
 13:15:58 tracing INFO     ┕━ fetch::Prepare::receive() [ 49.1s | 0.00% / 99.96% ]
 13:15:58 tracing DEBUG       ┕━ negotiate [ 49.1s | 0.00% / 99.96% ] protocol_version: 1
 13:15:58 tracing DEBUG          ┝━ mark_complete_and_common_ref [ 826µs | 0.00% / 0.00% ] mappings: 1
 13:15:58 tracing INFO           │  ┝━ mark_all_refs [ 401µs | 0.00% ]
 13:15:58 tracing DEBUG          │  ┝━ mark_alternate_refs [ 483ns | 0.00% ] num_odb: 0
 13:15:58 tracing INFO           │  ┝━ mark known_common [ 746ns | 0.00% ]
 13:15:58 tracing DEBUG          │  ┕━ mark tips [ 9.55µs | 0.00% ] num_tips: 1
 13:15:58 tracing DEBUG          ┝━ negotiate round [ 49.1s | 99.95% ] round: 1
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: >> want ff30028d5fb155dcd6fcc250cb256fccba528245 thin-pack side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative multi_ack_detailed agent=git/oxide-0.55.2 include-tag
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: >> FLUSH
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: >> have 713418f2c3ef057538264c6c379d45836a86ca06
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: >> have 24ed375e274f6c90b846b5579ae01ab8ac77e631
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: >> have 4896909faec56497fdd03f7c00ec51570a6c5d88
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: >> have ef17d82d35d901a702d561561cedd6d9ee73af14
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: >> have 4da23d2886e815c9b3c524826bf3fdd83226a85e
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: >> have d29e402d3e90a3bcc352107a2ec7cc499d799479
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: >> have ec4b3825f8fe612a88e00ffdca41fb4b51972bd0
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: >> have 82a264f12274ca8e1999b73b5fd850c85118f50a
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: >> have 2204a3efc5352b1dbcc6d211aa2a53b23f74ff27
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: >> have 9a827eb717c1cde2a5e7f5046d5da869fcbe9be4
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: >> have b805cc79f48ff3b1adf0f52b5042b17108e0aaa1
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: >> have 795e583ee2931e25b7347616e013099fcaa9789c
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: >> have 552a1d3b6d81bb0d184bc93996b1dbdc20b0f2ca
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: >> have 4661adc46821622bcecc94188e2e1e4382f0c2cc
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: >> have d131fb4dd3611b1ceed828b3c3b61266e928d031
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: >> have ab24553467cc323133e9e81eeeabf21f08e2d97e
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: >> FLUSH
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 713418f2c3ef057538264c6c379d45836a86ca06 common
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 24ed375e274f6c90b846b5579ae01ab8ac77e631 common
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 4896909faec56497fdd03f7c00ec51570a6c5d88 common
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: << ACK ef17d82d35d901a702d561561cedd6d9ee73af14 common
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 4da23d2886e815c9b3c524826bf3fdd83226a85e common
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: << ACK d29e402d3e90a3bcc352107a2ec7cc499d799479 common
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: << ACK ec4b3825f8fe612a88e00ffdca41fb4b51972bd0 common
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 82a264f12274ca8e1999b73b5fd850c85118f50a common
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 2204a3efc5352b1dbcc6d211aa2a53b23f74ff27 common
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 9a827eb717c1cde2a5e7f5046d5da869fcbe9be4 common
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: << ACK b805cc79f48ff3b1adf0f52b5042b17108e0aaa1 common
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 795e583ee2931e25b7347616e013099fcaa9789c common
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 552a1d3b6d81bb0d184bc93996b1dbdc20b0f2ca common
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: << ACK 4661adc46821622bcecc94188e2e1e4382f0c2cc common
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: << ACK d131fb4dd3611b1ceed828b3c3b61266e928d031 common
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: << ACK ab24553467cc323133e9e81eeeabf21f08e2d97e common
 13:15:58 tracing TRACE          │  ┝━ 📍 [trace]: << ACK ab24553467cc323133e9e81eeeabf21f08e2d97e ready
 13:15:58 tracing TRACE          │  ┕━ 📍 [trace]: << NAK
 13:15:58 tracing DEBUG          ┕━ negotiate round [ 132µs | 0.00% ] round: 2
 13:15:58 tracing TRACE             ┕━ 📍 [trace]: >> done
Error: An IO error occurred when talking to the server

Caused by:
    Broken pipe (os error 32)

@Byron
Copy link
Member

Byron commented Oct 20, 2023

That's great :)! And it clearly shows that git doesn't actually receive the done which is the reason for the hang. We also see that gix sends it, which is followed by flush right away. We also know that stdin doesn't seem to be flushed when flush() is called, and the underlying AnonPipe also doesn't seem to have a flush implementation.

In theory, it should be possible to try forcing a flush (just to prove that it works then) by flushing on the underlying file descriptor directly. But doing that would be cumbersome in Rust especially since it's just a test.

If you are on linux, maybe you could try to flush the descriptor while the process is hanging. One could just grab all open descriptors and flush them if it's unclear which one it is.

If that works, it would be clear that it's a bug in the standard library which ought to actually flush.

@jalil-salame
Copy link
Contributor Author

jalil-salame commented Oct 20, 2023

I tried attaching to gix with gdb, then calling fsync on all file descriptors, it didn't seem to fix it.

Are you sure it's stdin and not one of the open pipes (I tried all of the open fds though)? Trying the same on git-upload-pack didn't do anything either...

@jalil-salame
Copy link
Contributor Author

Using ChatGPT's python script didn't fix the issue either

@Byron
Copy link
Member

Byron commented Oct 20, 2023

Thanks a lot for trying!

This means I am puzzled as to where the done could have gone. Rust sends it into a pipe that should connect to the git-upload-pack which has just sent ACK and NAK and would now proceed to read the next packetline. That should be done and then the pack is sent.

But that clearly doesn't happen.

Maybe something else is happening here, somehow. What confuses me is that done is sent in the second round which would mean that it finished parsing the first response. But according to the logic here with client_expects_pack=false and saw_ready=true, we'd get false for the filter which would then try to read past the last NAK which should make it stall right there. Thus it wouldn't get to send done at all in the second round.

In any case, the way I understand the code in git-upload-pack, a logic change seems in order:

diff --git a/gix-protocol/src/fetch/response/blocking_io.rs b/gix-protocol/src/fetch/response/blocking_io.rs
index 309f5a7c5..d36a1a45f 100644
--- a/gix-protocol/src/fetch/response/blocking_io.rs
+++ b/gix-protocol/src/fetch/response/blocking_io.rs
@@ -85,7 +85,7 @@ impl Response {
                     assert_ne!(reader.readline_str(&mut line)?, 0, "consuming a peeked line works");
                     // When the server sends ready, we know there is going to be a pack so no need to stop early.
                     saw_ready |= matches!(acks.last(), Some(Acknowledgement::Ready));
-                    if let Some(Acknowledgement::Nak) = acks.last().filter(|_| !client_expects_pack && !saw_ready) {
+                    if let Some(Acknowledgement::Nak) = acks.last().filter(|_| !client_expects_pack || !saw_ready) {
                         break 'lines false;
                     }
                 };

Can you try it with this patch? it passes the test-suite, so that's a start (and it will hang if I butcher it too much).

@jalil-salame
Copy link
Contributor Author

diff --git a/gix-protocol/src/fetch/response/blocking_io.rs b/gix-protocol/src/fetch/response/blocking_io.rs
index 309f5a7c5..d36a1a45f 100644
--- a/gix-protocol/src/fetch/response/blocking_io.rs
+++ b/gix-protocol/src/fetch/response/blocking_io.rs
@@ -85,7 +85,7 @@ impl Response {
                     assert_ne!(reader.readline_str(&mut line)?, 0, "consuming a peeked line works");
                     // When the server sends ready, we know there is going to be a pack so no need to stop early.
                     saw_ready |= matches!(acks.last(), Some(Acknowledgement::Ready));
-                    if let Some(Acknowledgement::Nak) = acks.last().filter(|_| !client_expects_pack && !saw_ready) {
+                    if let Some(Acknowledgement::Nak) = acks.last().filter(|_| !client_expects_pack || !saw_ready) {
                         break 'lines false;
                     }
                 };

That did it! Works both locally and through ssh!

Byron added a commit that referenced this issue Oct 20, 2023
The logic previously tried to estimate when a pack can be expected,
and when a NAK is the end of a block, or the beginning of a pack.

This can be known because a pack (with our settings) needs two things:

* the server thinks it's ready
* a `done` sent by the client

If the server doesn't think it's ready it will send NAK and be done.

So the logic should be, for a NAK to stop the read-loop, that the client
expects a pack, and the server is ready. If the client is not ready, or
the server isn't ready, keep NAK and consider them the end of a round,
hence break the loop.
@Byron
Copy link
Member

Byron commented Oct 20, 2023

That's incredible! A small change with huge effect! I can now just hope that the test-coverage is as good as I think or else something else might break 😅 (at least the blast radios is limited to V1).

Alright, the PR is in flight and I hope it will be smooth sailing from now on :).

Byron added a commit that referenced this issue Oct 20, 2023
The logic previously tried to estimate when a pack can be expected,
and when a NAK is the end of a block, or the beginning of a pack.

This can be known because a pack (with our settings) needs two things:

* the server thinks it's ready
* a `done` sent by the client

If the server doesn't think it's ready it will send NAK and be done.

So the logic should be, for a NAK to stop the read-loop, that the client
expects a pack, and the server is ready. If the client is not ready, or
the server isn't ready, keep NAK and consider them the end of a round,
hence break the loop.
Byron added a commit that referenced this issue Oct 20, 2023
The logic previously tried to estimate when a pack can be expected,
and when a NAK is the end of a block, or the beginning of a pack.

This can be known because a pack (with our settings) needs two things:

* the server thinks it's ready
* a `done` sent by the client

If the server doesn't think it's ready it will send NAK and be done.

So the logic should be, for a NAK to stop the read-loop, that the client
expects a pack, and the server is ready. If the client is not ready, or
the server isn't ready, keep NAK and consider them the end of a round,
hence break the loop.
@Byron
Copy link
Member

Byron commented Oct 20, 2023

Can you try once more from this PR? It contains adjustments to the logic to work with more test-cases, and I can only hope that it also still covers your case.

Byron added a commit that referenced this issue Oct 20, 2023
The logic previously tried to estimate when a pack can be expected,
and when a NAK is the end of a block, or the beginning of a pack.

This can be known because a pack (with our settings) needs two things:

* the server thinks it's ready
* a `done` sent by the client

If the server doesn't think it's ready it will send NAK and be done.

So the logic should be, for a NAK to stop the read-loop, that the client
expects a pack, and the server is ready. If the client is not ready, or
the server isn't ready, keep NAK and consider them the end of a round,
hence break the loop.
@jalil-salame
Copy link
Contributor Author

Can you try once more from this PR? It contains adjustments to the logic to work with more test-cases, and I can only hope that it also still covers your case.

Verified that main fixes the issue c: Thanks for all of this!

@mainrs
Copy link

mainrs commented Nov 22, 2023

$ gix --trace fetch

I scimmed through the code but couldn't ultimately figure out what kind of logging was used for the hierarchies printed by this command.

It doesn't seem to be tracing or log as far as I can tell.

@Byron
Copy link
Member

Byron commented Nov 22, 2023

You are probably looking for this:

https://github.com/Byron/gitoxide/blob/b2300782ad09f17730fe2a5ba4938517d4e7ec16/src/shared.rs#L104-L132

It's tracing at the core, and visualized with tracing_forest. Note that the latter is finicky and works best in single-threaded applications. I tried it in a couple of other places but failed ungracefully, probably due to my own ignorance. However, when it works, it's great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledged an issue is accepted as shortcoming to be fixed feedback requested
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants