Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Network partition should cause 'host unreachable' not 'connection refused' for TCP #169

Open
cameronelliott opened this issue Feb 16, 2024 · 4 comments · May be fixed by #193
Open

Network partition should cause 'host unreachable' not 'connection refused' for TCP #169

cameronelliott opened this issue Feb 16, 2024 · 4 comments · May be fixed by #193
Labels
good first issue Good for newcomers

Comments

@cameronelliott
Copy link

Summary: Unreachable hosts should cause an UnreachableHost rather than ConnectionRefused
on network partitions, etc.
Summary: I am not sure if Shuttle needs this level of fidelity just yet, and if anyone would notice the difference at this time. But someday simulations using Shuttle might take different actions based upon UnreachableHost vs ConnectionRefused, so it might make sense to fix.

detail

I modified the axum example by adding a single line before the client request:
turmoil::partition("client", "server");

Doing so resulted in this output:

[...]
thread 'main' panicked at examples/axum/src/main.rs:71:15:
called `Result::unwrap()` on an `Err` value: Error { kind: Connect, source: Some(Custom { kind:ConnectionRefused, error: "192.168.0.1:9999" }) }
[...]

Normally when trying to reach a TCP server via a partitioned network, a HostUnreachable error will occurr after a timeout period. A ConnectionRefused occurr will not occur, because a ConnectionRefused occurr happens when a box receiving a TCP syn rejects it, because there is no listener or server running on that port.

This can be demostrated by using curl from the command kine.

# in this first example, I am curling an IP address without a computer. 
# thus there is nothing to respond. it will timeout after ~3 seconds, and return host unreachable
c@intel12400 ~/t/e/axum (main) [7]> time curl -vvvvv 192.168.86.33
*   Trying 192.168.86.33:80...
* connect to 192.168.86.33 port 80 from 192.168.86.5 port 59648 failed: Host is unreachable
* Failed to connect to 192.168.86.33 port 80 after 3055 ms: Couldn't connect to server
* Closing connection
curl: (7) Failed to connect to 192.168.86.33 port 80 after 3055 ms: Couldn't connect to server

________________________________________________________
Executed in    3.06 secs      fish           external
   usr time    6.04 millis  960.00 micros    5.08 millis
   sys time    0.32 millis  315.00 micros    0.00 millis
# this second example shows when a connection refused occurs
# I am curling to a valid IP with a computer running, but nothing running on the port specified
# thus the computer receives the TCP syn request, but denies it, cause nothing is on the port
c@intel12400 ~/t/e/axum (main) [7]> time curl -vvvvv 192.168.86.5:8888
*   Trying 192.168.86.5:8888...
* connect to 192.168.86.5 port 8888 from 192.168.86.5 port 41228 failed: Connection refused
* Failed to connect to 192.168.86.5 port 8888 after 0 ms: Couldn't connect to server
* Closing connection
curl: (7) Failed to connect to 192.168.86.5 port 8888 after 0 ms: Couldn't connect to server

________________________________________________________
Executed in    5.57 millis    fish           external
   usr time    5.49 millis  701.00 micros    4.79 millis
   sys time    0.23 millis  226.00 micros    0.00 millis
@mcches
Copy link
Contributor

mcches commented Feb 16, 2024

Good catch. There isn't a distinction between the network dropping the SYN and the recipient host dropping it today. I also noticed this path as well when a host doesn't exist: https://github.com/tokio-rs/turmoil/blob/main/src/top.rs#L227

Here is the path for partition: https://github.com/tokio-rs/turmoil/blob/main/src/top.rs#L341, where we drop the SYN and the oneshot inside drops, failing the connect: https://github.com/tokio-rs/turmoil/blob/main/src/net/tcp/stream.rs#L88

Both of these issues are relatively easy to fix. Do you have any interest in cutting a PR? If not, I can get to this next week.

@mcches mcches added the good first issue Good for newcomers label Feb 16, 2024
@tneely
Copy link

tneely commented Feb 17, 2024

It looks like the HostUnreachable error is only supported in nightly builds. It might be best to wait to bring this in until rust-lang/rust#86442 is finalized.

@Colin1860
Copy link

Colin1860 commented Aug 30, 2024

What's the current state with this? I see the tracking issue for io_error_more is still open, so should an MR for this wait until this issue is finalized? I would be more than happy to cut an MR for this

@mcches
Copy link
Contributor

mcches commented Sep 2, 2024

This can be actioned, even without needing the nightly io error kind. That can be left as a todo, but we should still fix the incorrect behavior outlined above.

@brandonpike brandonpike linked a pull request Nov 3, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants