fix: gracefully handle stale interface binding with fallback to default route#1400
fix: gracefully handle stale interface binding with fallback to default route#1400xiguagua wants to merge 17 commits into
Conversation
The script has been updated to include `aarch64-apple-ios-sim` in the build targets, ensuring that simulator builds are generated for iOS. This allows for more comprehensive testing and development on Apple platforms.
Enable the rustls provider required by clash-ffi and gate the TUN visibility constants so iOS builds stop failing on dead code warnings.
Make the Apple build script default to iOS-only output and gate the macOS build path behind an explicit full build flag.
# Conflicts: # Cargo.toml
…llation token CancellationToken::clone() shares cancellation state, so shutting down the old API server on reload would also cancel the new one's token, causing delay/healthcheck endpoints to become unreachable after the second hot-reload cycle.
|
Caution Review failedPull request was closed or merged during review 📝 WalkthroughWalkthroughSocket binding is made resilient to stale interface indices on Unix and Apple by converting specific errors to non-fatal warnings. Application logging and statistics configuration are updated to handle missing log files and remove memory export. Apple build script gains conditional iOS/macOS targeting via ChangesPlatform resilience and infrastructure updates
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
clash-lib/src/proxy/utils/platform/unix.rs (1)
1-43:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winPipeline failure: rustfmt formatting must be fixed before merge.
The CI pipeline reports rustfmt formatting mismatches at lines 17 and 25. Run
cargo +nightly fmt --allas per coding guidelines before committing.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@clash-lib/src/proxy/utils/platform/unix.rs` around lines 1 - 43, The file's formatting in function bind_socket_on_interface doesn't meet rustfmt rules; run the formatter (cargo +nightly fmt --all) and reformat the file so the socket.bind_device chaining and the tracing! macro calls (inside bind_socket_on_interface) are properly wrapped/indented per rustfmt; then commit the reformatted changes.clash-lib/src/proxy/utils/platform/apple.rs (1)
1-58:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winPipeline failure: rustfmt formatting must be fixed before merge.
The CI pipeline reports a rustfmt formatting mismatch. Run
cargo +nightly fmt --allas per coding guidelines before committing.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@clash-lib/src/proxy/utils/platform/apple.rs` around lines 1 - 58, The file fails rustfmt; run the formatter and commit the changes: run `cargo +nightly fmt --all` (or your project's configured formatter) and reformat the function `bind_socket_on_interface` in apple.rs so formatting matches CI expectations (ensure alignment/indentation around the match arms and the warn! macro calls for both `socket2::Domain::IPV4` and `socket2::Domain::IPV6` branches), then stage and push the updated formatted file.clash-lib/src/proxy/utils/platform/win.rs (1)
17-94: 🧹 Nitpick | 🔵 Trivial | ⚖️ Poor tradeoffWindows should handle stale interface errors gracefully like Apple and Unix platforms.
The Apple implementation catches
io::ErrorKind::AddrNotAvailablewhen binding to a stale interface and falls back to default routing (lines 25, 41). The Unix implementation similarly catchesAddrNotAvailable | NotFounderrors and gracefully degrades (lines 17-22).The Windows implementation propagates all
setsockoptfailures without attempting to detect or handle stale interfaces (lines 52-56, 87-91). While Windows uses a different binding mechanism (IP_UNICAST_IF/IPV6_UNICAST_IF), it should likewise recognize when an interface becomes unavailable and fall back gracefully to maintain consistent cross-platform behavior. Currently, the error handling converts Windows errors to strings viaGetLastError().to_hresult().message(), which prevents matching specific error conditions; this would need to be addressed to implement proper stale interface handling.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@clash-lib/src/proxy/utils/platform/win.rs` around lines 17 - 94, The Windows bind_socket_on_interface currently treats any setsockopt failure as fatal; change both error paths (the two errno != 0 checks inside bind_socket_on_interface) to read the raw Windows error code (GetLastError as i32), convert it to an io::Error via io::Error::from_raw_os_error(last_err), and then match on io_err.kind() (AddrNotAvailable and NotFound) and/or specific Windows socket codes like WSAEADDRNOTAVAIL (10049) to treat those as stale-interface cases by logging a debug/info message and returning Ok(()) to fall back to default routing; only for other errors keep the existing error logging (use the error message for logging) and return Err(new_io_error(...)).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@clash-lib/src/proxy/utils/platform/apple.rs`:
- Around line 22-51: The IPv4 and IPv6 branches duplicate the same or_else error
handling for AddrNotAvailable; extract that logic into a helper (e.g., a closure
or function like handle_addr_not_available) that accepts the Result from
bind_device_by_index_v4/bind_device_by_index_v6 and maps the io::Error with kind
AddrNotAvailable to Ok(()) while otherwise re-raising the error; then replace
both inline or_else blocks with a call to this helper, keeping the warn! call
and iface.name message intact.
In `@clash-lib/src/proxy/utils/socket_helpers.rs`:
- Line 71: The TCP bind call using bind_socket_on_interface(&socket, iface,
family)? silently propagates errors; update this call to mirror the UDP sites by
logging the error before propagating (e.g., call
bind_socket_on_interface(&socket, iface, family).inspect_err(|e| error!("binding
TCP socket on interface {} (family {:?}) failed: {:?}", iface, family, e))? or
otherwise log the error with the same logger used elsewhere), so failures from
bind_socket_on_interface are logged with iface, family and the error before
being returned.
---
Outside diff comments:
In `@clash-lib/src/proxy/utils/platform/apple.rs`:
- Around line 1-58: The file fails rustfmt; run the formatter and commit the
changes: run `cargo +nightly fmt --all` (or your project's configured formatter)
and reformat the function `bind_socket_on_interface` in apple.rs so formatting
matches CI expectations (ensure alignment/indentation around the match arms and
the warn! macro calls for both `socket2::Domain::IPV4` and
`socket2::Domain::IPV6` branches), then stage and push the updated formatted
file.
In `@clash-lib/src/proxy/utils/platform/unix.rs`:
- Around line 1-43: The file's formatting in function bind_socket_on_interface
doesn't meet rustfmt rules; run the formatter (cargo +nightly fmt --all) and
reformat the file so the socket.bind_device chaining and the tracing! macro
calls (inside bind_socket_on_interface) are properly wrapped/indented per
rustfmt; then commit the reformatted changes.
In `@clash-lib/src/proxy/utils/platform/win.rs`:
- Around line 17-94: The Windows bind_socket_on_interface currently treats any
setsockopt failure as fatal; change both error paths (the two errno != 0 checks
inside bind_socket_on_interface) to read the raw Windows error code
(GetLastError as i32), convert it to an io::Error via
io::Error::from_raw_os_error(last_err), and then match on io_err.kind()
(AddrNotAvailable and NotFound) and/or specific Windows socket codes like
WSAEADDRNOTAVAIL (10049) to treat those as stale-interface cases by logging a
debug/info message and returning Ok(()) to fall back to default routing; only
for other errors keep the existing error logging (use the error message for
logging) and return Err(new_io_error(...)).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: f724423f-648f-41d4-a796-f05f56c5e1e5
📒 Files selected for processing (5)
clash-lib/src/proxy/utils/platform/apple.rsclash-lib/src/proxy/utils/platform/mod.rsclash-lib/src/proxy/utils/platform/unix.rsclash-lib/src/proxy/utils/platform/win.rsclash-lib/src/proxy/utils/socket_helpers.rs
| socket | ||
| .bind_device_by_index_v4(std::num::NonZeroU32::new(index)) | ||
| .or_else(|e| { | ||
| if e.kind() == io::ErrorKind::AddrNotAvailable { | ||
| warn!( | ||
| "stale interface index {index} for '{}', \ | ||
| falling back to default route: {e}", | ||
| iface.name | ||
| ); | ||
| Ok(()) | ||
| } else { | ||
| Err(e) | ||
| } | ||
| }) | ||
| } | ||
| socket2::Domain::IPV6 => { | ||
| socket.bind_device_by_index_v6(std::num::NonZeroU32::new(index)) | ||
| socket | ||
| .bind_device_by_index_v6(std::num::NonZeroU32::new(index)) | ||
| .or_else(|e| { | ||
| if e.kind() == io::ErrorKind::AddrNotAvailable { | ||
| warn!( | ||
| "stale interface index {index} for '{}', \ | ||
| falling back to default route: {e}", | ||
| iface.name | ||
| ); | ||
| Ok(()) | ||
| } else { | ||
| Err(e) | ||
| } | ||
| }) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Extract duplicate error handling logic.
The IPv4 and IPv6 branches have identical error handling code for AddrNotAvailable. Consider extracting this into a helper closure to eliminate duplication.
♻️ Proposed refactor to eliminate duplication
pub(crate) fn bind_socket_on_interface(
socket: &socket2::Socket,
iface: &OutboundInterface,
family: socket2::Domain,
) -> io::Result<()> {
let index = iface.index;
if index == 0 {
warn!(
"OutboundInterface index is 0, skipping binding to interface {}",
iface.name
);
return Ok(());
}
+ let handle_stale = |e: io::Error| {
+ if e.kind() == io::ErrorKind::AddrNotAvailable {
+ warn!(
+ "stale interface index {index} for '{}', \
+ falling back to default route: {e}",
+ iface.name
+ );
+ Ok(())
+ } else {
+ Err(e)
+ }
+ };
match family {
socket2::Domain::IPV4 => {
socket
.bind_device_by_index_v4(std::num::NonZeroU32::new(index))
- .or_else(|e| {
- if e.kind() == io::ErrorKind::AddrNotAvailable {
- warn!(
- "stale interface index {index} for '{}', \
- falling back to default route: {e}",
- iface.name
- );
- Ok(())
- } else {
- Err(e)
- }
- })
+ .or_else(handle_stale)
}
socket2::Domain::IPV6 => {
socket
.bind_device_by_index_v6(std::num::NonZeroU32::new(index))
- .or_else(|e| {
- if e.kind() == io::ErrorKind::AddrNotAvailable {
- warn!(
- "stale interface index {index} for '{}', \
- falling back to default route: {e}",
- iface.name
- );
- Ok(())
- } else {
- Err(e)
- }
- })
+ .or_else(handle_stale)
}
_ => Err(io::Error::new(
io::ErrorKind::InvalidInput,
"unsupported socket family",
)),
}
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@clash-lib/src/proxy/utils/platform/apple.rs` around lines 22 - 51, The IPv4
and IPv6 branches duplicate the same or_else error handling for
AddrNotAvailable; extract that logic into a helper (e.g., a closure or function
like handle_addr_not_available) that accepts the Result from
bind_device_by_index_v4/bind_device_by_index_v6 and maps the io::Error with kind
AddrNotAvailable to Ok(()) while otherwise re-raising the error; then replace
both inline or_else blocks with a call to this helper, keeping the warn! call
and iface.name message intact.
| && let Some(iface) = iface | ||
| { | ||
| must_bind_socket_on_interface(&socket, iface, family)?; | ||
| bind_socket_on_interface(&socket, iface, family)?; |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Consider adding error logging for TCP socket binding failures.
The UDP socket binding call sites (lines 134-138, 253-255) use .inspect_err() to log binding failures before propagation, but the TCP binding call site at line 71 propagates errors silently. For consistency and easier debugging, consider adding similar error logging here.
📝 Proposed addition for consistency
if !cfg!(target_os = "android")
&& let Some(iface) = iface
{
- bind_socket_on_interface(&socket, iface, family)?;
+ bind_socket_on_interface(&socket, iface, family).inspect_err(
+ |x| {
+ error!("failed to bind TCP socket to interface: {}", x);
+ },
+ )?;
trace!("tcp socket bound to interface: {socket:?}");
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| bind_socket_on_interface(&socket, iface, family)?; | |
| if !cfg!(target_os = "android") | |
| && let Some(iface) = iface | |
| { | |
| bind_socket_on_interface(&socket, iface, family).inspect_err( | |
| |x| { | |
| error!("failed to bind TCP socket to interface: {}", x); | |
| }, | |
| )?; | |
| trace!("tcp socket bound to interface: {socket:?}"); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@clash-lib/src/proxy/utils/socket_helpers.rs` at line 71, The TCP bind call
using bind_socket_on_interface(&socket, iface, family)? silently propagates
errors; update this call to mirror the UDP sites by logging the error before
propagating (e.g., call bind_socket_on_interface(&socket, iface,
family).inspect_err(|e| error!("binding TCP socket on interface {} (family {:?})
failed: {:?}", iface, family, e))? or otherwise log the error with the same
logger used elsewhere), so failures from bind_socket_on_interface are logged
with iface, family and the error before being returned.
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
|
Wouldn't the connection fail anyways if it's supposed to go out via a specific interface when fallback to default? |
That's a different issue. This PR addresses an issue where interface binding fails after a device has been idle for a period of time on the iOS platform. causing endpoint speed tests to fail. |
|
In that case I think you should properly stop and resume on network path changed and populate the active interface again. This PR won't fix the issue it just hides it and resurface with a different error instead, right? |
It is not hiding,just let system choose. |
…lt route macOS returns EADDRNOTAVAIL (os error 49) when bind_device_by_index_v4/v6 is called with a stale interface index (after network changes like Wi-Fi switches or VPN connect/disconnect). Similarly, Linux bind_device returns ENODEV when the interface name no longer exists. Rename must_bind_socket_on_interface -> bind_socket_on_interface to reflect that binding is now best-effort rather than mandatory. When the target interface is stale (AddrNotAvailable/NotFound), log a warning and fall back to the default route instead of failing the connection.
|
It's probably fine and works on iOS where NE protects the sockets. But it's logically wrong when you decided to bind an interface for a socket but it ended up going through another interface. |
baf2beb to
f7cb640
Compare
|
Got it. |
iOS returns EADDRNOTAVAIL (os error 49) when bind_device_by_index_v4/v6 is called with a stale interface index (after network changes like Wi-Fi switches or VPN connect/disconnect). Similarly, Linux bind_device returns ENODEV when the interface name no longer exists.
Rename must_bind_socket_on_interface -> bind_socket_on_interface to reflect that binding is now best-effort rather than mandatory. When the target interface is stale (AddrNotAvailable/NotFound), log a warning and fall back to the default route instead of failing the connection.
What does this PR do?
Type
Checklist
Summary by CodeRabbit
Bug Fixes
New Features
--fullflag.Chores