Skip to content

fix: gracefully handle stale interface binding with fallback to default route#1400

Closed
xiguagua wants to merge 17 commits into
Watfaq:masterfrom
xiguagua:fix/stale-interface-binding
Closed

fix: gracefully handle stale interface binding with fallback to default route#1400
xiguagua wants to merge 17 commits into
Watfaq:masterfrom
xiguagua:fix/stale-interface-binding

Conversation

@xiguagua

@xiguagua xiguagua commented May 13, 2026

Copy link
Copy Markdown
Contributor

iOS returns EADDRNOTAVAIL (os error 49) when bind_device_by_index_v4/v6 is called with a stale interface index (after network changes like Wi-Fi switches or VPN connect/disconnect). Similarly, Linux bind_device returns ENODEV when the interface name no longer exists.

Rename must_bind_socket_on_interface -> bind_socket_on_interface to reflect that binding is now best-effort rather than mandatory. When the target interface is stale (AddrNotAvailable/NotFound), log a warning and fall back to the default route instead of failing the connection.

What does this PR do?

Type

  • Bug fix
  • New feature
  • Refactoring / cleanup
  • Other

Checklist

  • Tests added or not needed
  • Docs updated or not needed

Summary by CodeRabbit

  • Bug Fixes

    • Improved network interface binding reliability on Apple and Unix platforms by gracefully handling stale interface indices.
    • Fixed log file creation to properly create missing files instead of failing.
  • New Features

    • Build script now supports flexible iOS/macOS build targeting via a --full flag.
  • Chores

    • Updated dependencies for enhanced security and observability.
    • Removed memory field from statistics output.

Review Change Stack

The script has been updated to include `aarch64-apple-ios-sim` in the
build targets, ensuring that simulator builds are generated for iOS.
This allows for more comprehensive testing and development on Apple
platforms.
Enable the rustls provider required by clash-ffi and gate the TUN visibility constants so iOS builds stop failing on dead code warnings.
Make the Apple build script default to iOS-only output and gate the macOS build path behind an explicit full build flag.
# Conflicts:
#	Cargo.toml
…llation token

CancellationToken::clone() shares cancellation state, so shutting down
the old API server on reload would also cancel the new one's token,
causing delay/healthcheck endpoints to become unreachable after the
second hot-reload cycle.
@coderabbitai

coderabbitai Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

Socket binding is made resilient to stale interface indices on Unix and Apple by converting specific errors to non-fatal warnings. Application logging and statistics configuration are updated to handle missing log files and remove memory export. Apple build script gains conditional iOS/macOS targeting via --full flag with deployment version configuration.

Changes

Platform resilience and infrastructure updates

Layer / File(s) Summary
Socket interface binding stale index tolerance
clash-lib/src/proxy/utils/platform/apple.rs, clash-lib/src/proxy/utils/platform/unix.rs, clash-lib/src/proxy/utils/socket_helpers.rs
Apple and Unix socket binding implementations now catch stale interface errors (AddrNotAvailable or NotFound) and log a warning instead of propagating the error, allowing fallback to default routing. Socket helper call site formatting is adjusted.
Application logging, statistics, and dependencies
clash-lib/src/app/logging.rs, clash-lib/src/app/dispatcher/statistics_manager.rs, clash-ffi/Cargo.toml
Logging now creates the log file if missing. Statistics Snapshot struct removes the memory field from serialization. FFI dependencies expand to include tracing infrastructure (tracing, tracing-oslog, tracing-subscriber) and aws-lc-rs feature for clash-lib.
Mobile targeting and Apple build configuration
clash-lib/src/proxy/tun/runner.rs, scripts/build_apple.sh
TUN visibility constant is conditionally compiled only for non-mobile targets. Apple build script adds --full flag to optionally include macOS artifacts alongside iOS, with deployment target variables and conditional universal binary/XCFramework generation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Watfaq/clash-rs#1397: Enables and configures aws-lc-rs feature in the dependency chain; this PR also adds aws-lc-rs feature enablement in FFI Cargo.toml.

Suggested reviewers

  • ibigbug

Poem

🐰 Hop, hop—sockets bind more gently now,
When interfaces fade, we bow with grace,
Logs create themselves, stats wear no weight,
And Apple builds—iOS tall, macOS wide,
Platform-wise, we thrive and glide!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: handling stale interface binding gracefully with fallback to the default route, which is the primary objective across the platform-specific implementations.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
clash-lib/src/proxy/utils/platform/unix.rs (1)

1-43: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Pipeline failure: rustfmt formatting must be fixed before merge.

The CI pipeline reports rustfmt formatting mismatches at lines 17 and 25. Run cargo +nightly fmt --all as per coding guidelines before committing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@clash-lib/src/proxy/utils/platform/unix.rs` around lines 1 - 43, The file's
formatting in function bind_socket_on_interface doesn't meet rustfmt rules; run
the formatter (cargo +nightly fmt --all) and reformat the file so the
socket.bind_device chaining and the tracing! macro calls (inside
bind_socket_on_interface) are properly wrapped/indented per rustfmt; then commit
the reformatted changes.
clash-lib/src/proxy/utils/platform/apple.rs (1)

1-58: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Pipeline failure: rustfmt formatting must be fixed before merge.

The CI pipeline reports a rustfmt formatting mismatch. Run cargo +nightly fmt --all as per coding guidelines before committing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@clash-lib/src/proxy/utils/platform/apple.rs` around lines 1 - 58, The file
fails rustfmt; run the formatter and commit the changes: run `cargo +nightly fmt
--all` (or your project's configured formatter) and reformat the function
`bind_socket_on_interface` in apple.rs so formatting matches CI expectations
(ensure alignment/indentation around the match arms and the warn! macro calls
for both `socket2::Domain::IPV4` and `socket2::Domain::IPV6` branches), then
stage and push the updated formatted file.
clash-lib/src/proxy/utils/platform/win.rs (1)

17-94: 🧹 Nitpick | 🔵 Trivial | ⚖️ Poor tradeoff

Windows should handle stale interface errors gracefully like Apple and Unix platforms.

The Apple implementation catches io::ErrorKind::AddrNotAvailable when binding to a stale interface and falls back to default routing (lines 25, 41). The Unix implementation similarly catches AddrNotAvailable | NotFound errors and gracefully degrades (lines 17-22).

The Windows implementation propagates all setsockopt failures without attempting to detect or handle stale interfaces (lines 52-56, 87-91). While Windows uses a different binding mechanism (IP_UNICAST_IF/IPV6_UNICAST_IF), it should likewise recognize when an interface becomes unavailable and fall back gracefully to maintain consistent cross-platform behavior. Currently, the error handling converts Windows errors to strings via GetLastError().to_hresult().message(), which prevents matching specific error conditions; this would need to be addressed to implement proper stale interface handling.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@clash-lib/src/proxy/utils/platform/win.rs` around lines 17 - 94, The Windows
bind_socket_on_interface currently treats any setsockopt failure as fatal;
change both error paths (the two errno != 0 checks inside
bind_socket_on_interface) to read the raw Windows error code (GetLastError as
i32), convert it to an io::Error via io::Error::from_raw_os_error(last_err), and
then match on io_err.kind() (AddrNotAvailable and NotFound) and/or specific
Windows socket codes like WSAEADDRNOTAVAIL (10049) to treat those as
stale-interface cases by logging a debug/info message and returning Ok(()) to
fall back to default routing; only for other errors keep the existing error
logging (use the error message for logging) and return Err(new_io_error(...)).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@clash-lib/src/proxy/utils/platform/apple.rs`:
- Around line 22-51: The IPv4 and IPv6 branches duplicate the same or_else error
handling for AddrNotAvailable; extract that logic into a helper (e.g., a closure
or function like handle_addr_not_available) that accepts the Result from
bind_device_by_index_v4/bind_device_by_index_v6 and maps the io::Error with kind
AddrNotAvailable to Ok(()) while otherwise re-raising the error; then replace
both inline or_else blocks with a call to this helper, keeping the warn! call
and iface.name message intact.

In `@clash-lib/src/proxy/utils/socket_helpers.rs`:
- Line 71: The TCP bind call using bind_socket_on_interface(&socket, iface,
family)? silently propagates errors; update this call to mirror the UDP sites by
logging the error before propagating (e.g., call
bind_socket_on_interface(&socket, iface, family).inspect_err(|e| error!("binding
TCP socket on interface {} (family {:?}) failed: {:?}", iface, family, e))? or
otherwise log the error with the same logger used elsewhere), so failures from
bind_socket_on_interface are logged with iface, family and the error before
being returned.

---

Outside diff comments:
In `@clash-lib/src/proxy/utils/platform/apple.rs`:
- Around line 1-58: The file fails rustfmt; run the formatter and commit the
changes: run `cargo +nightly fmt --all` (or your project's configured formatter)
and reformat the function `bind_socket_on_interface` in apple.rs so formatting
matches CI expectations (ensure alignment/indentation around the match arms and
the warn! macro calls for both `socket2::Domain::IPV4` and
`socket2::Domain::IPV6` branches), then stage and push the updated formatted
file.

In `@clash-lib/src/proxy/utils/platform/unix.rs`:
- Around line 1-43: The file's formatting in function bind_socket_on_interface
doesn't meet rustfmt rules; run the formatter (cargo +nightly fmt --all) and
reformat the file so the socket.bind_device chaining and the tracing! macro
calls (inside bind_socket_on_interface) are properly wrapped/indented per
rustfmt; then commit the reformatted changes.

In `@clash-lib/src/proxy/utils/platform/win.rs`:
- Around line 17-94: The Windows bind_socket_on_interface currently treats any
setsockopt failure as fatal; change both error paths (the two errno != 0 checks
inside bind_socket_on_interface) to read the raw Windows error code
(GetLastError as i32), convert it to an io::Error via
io::Error::from_raw_os_error(last_err), and then match on io_err.kind()
(AddrNotAvailable and NotFound) and/or specific Windows socket codes like
WSAEADDRNOTAVAIL (10049) to treat those as stale-interface cases by logging a
debug/info message and returning Ok(()) to fall back to default routing; only
for other errors keep the existing error logging (use the error message for
logging) and return Err(new_io_error(...)).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: f724423f-648f-41d4-a796-f05f56c5e1e5

📥 Commits

Reviewing files that changed from the base of the PR and between d621c7b and baf2beb.

📒 Files selected for processing (5)
  • clash-lib/src/proxy/utils/platform/apple.rs
  • clash-lib/src/proxy/utils/platform/mod.rs
  • clash-lib/src/proxy/utils/platform/unix.rs
  • clash-lib/src/proxy/utils/platform/win.rs
  • clash-lib/src/proxy/utils/socket_helpers.rs

Comment on lines +22 to +51
socket
.bind_device_by_index_v4(std::num::NonZeroU32::new(index))
.or_else(|e| {
if e.kind() == io::ErrorKind::AddrNotAvailable {
warn!(
"stale interface index {index} for '{}', \
falling back to default route: {e}",
iface.name
);
Ok(())
} else {
Err(e)
}
})
}
socket2::Domain::IPV6 => {
socket.bind_device_by_index_v6(std::num::NonZeroU32::new(index))
socket
.bind_device_by_index_v6(std::num::NonZeroU32::new(index))
.or_else(|e| {
if e.kind() == io::ErrorKind::AddrNotAvailable {
warn!(
"stale interface index {index} for '{}', \
falling back to default route: {e}",
iface.name
);
Ok(())
} else {
Err(e)
}
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Extract duplicate error handling logic.

The IPv4 and IPv6 branches have identical error handling code for AddrNotAvailable. Consider extracting this into a helper closure to eliminate duplication.

♻️ Proposed refactor to eliminate duplication
 pub(crate) fn bind_socket_on_interface(
     socket: &socket2::Socket,
     iface: &OutboundInterface,
     family: socket2::Domain,
 ) -> io::Result<()> {
     let index = iface.index;
     if index == 0 {
         warn!(
             "OutboundInterface index is 0, skipping binding to interface {}",
             iface.name
         );
         return Ok(());
     }
+    let handle_stale = |e: io::Error| {
+        if e.kind() == io::ErrorKind::AddrNotAvailable {
+            warn!(
+                "stale interface index {index} for '{}', \
+                 falling back to default route: {e}",
+                iface.name
+            );
+            Ok(())
+        } else {
+            Err(e)
+        }
+    };
     match family {
         socket2::Domain::IPV4 => {
             socket
                 .bind_device_by_index_v4(std::num::NonZeroU32::new(index))
-                .or_else(|e| {
-                    if e.kind() == io::ErrorKind::AddrNotAvailable {
-                        warn!(
-                            "stale interface index {index} for '{}', \
-                             falling back to default route: {e}",
-                            iface.name
-                        );
-                        Ok(())
-                    } else {
-                        Err(e)
-                    }
-                })
+                .or_else(handle_stale)
         }
         socket2::Domain::IPV6 => {
             socket
                 .bind_device_by_index_v6(std::num::NonZeroU32::new(index))
-                .or_else(|e| {
-                    if e.kind() == io::ErrorKind::AddrNotAvailable {
-                        warn!(
-                            "stale interface index {index} for '{}', \
-                             falling back to default route: {e}",
-                            iface.name
-                        );
-                        Ok(())
-                    } else {
-                        Err(e)
-                    }
-                })
+                .or_else(handle_stale)
         }
         _ => Err(io::Error::new(
             io::ErrorKind::InvalidInput,
             "unsupported socket family",
         )),
     }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@clash-lib/src/proxy/utils/platform/apple.rs` around lines 22 - 51, The IPv4
and IPv6 branches duplicate the same or_else error handling for
AddrNotAvailable; extract that logic into a helper (e.g., a closure or function
like handle_addr_not_available) that accepts the Result from
bind_device_by_index_v4/bind_device_by_index_v6 and maps the io::Error with kind
AddrNotAvailable to Ok(()) while otherwise re-raising the error; then replace
both inline or_else blocks with a call to this helper, keeping the warn! call
and iface.name message intact.

&& let Some(iface) = iface
{
must_bind_socket_on_interface(&socket, iface, family)?;
bind_socket_on_interface(&socket, iface, family)?;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider adding error logging for TCP socket binding failures.

The UDP socket binding call sites (lines 134-138, 253-255) use .inspect_err() to log binding failures before propagation, but the TCP binding call site at line 71 propagates errors silently. For consistency and easier debugging, consider adding similar error logging here.

📝 Proposed addition for consistency
     if !cfg!(target_os = "android")
         && let Some(iface) = iface
     {
-        bind_socket_on_interface(&socket, iface, family)?;
+        bind_socket_on_interface(&socket, iface, family).inspect_err(
+            |x| {
+                error!("failed to bind TCP socket to interface: {}", x);
+            },
+        )?;
         trace!("tcp socket bound to interface: {socket:?}");
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
bind_socket_on_interface(&socket, iface, family)?;
if !cfg!(target_os = "android")
&& let Some(iface) = iface
{
bind_socket_on_interface(&socket, iface, family).inspect_err(
|x| {
error!("failed to bind TCP socket to interface: {}", x);
},
)?;
trace!("tcp socket bound to interface: {socket:?}");
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@clash-lib/src/proxy/utils/socket_helpers.rs` at line 71, The TCP bind call
using bind_socket_on_interface(&socket, iface, family)? silently propagates
errors; update this call to mirror the UDP sites by logging the error before
propagating (e.g., call bind_socket_on_interface(&socket, iface,
family).inspect_err(|e| error!("binding TCP socket on interface {} (family {:?})
failed: {:?}", iface, family, e))? or otherwise log the error with the same
logger used elsewhere), so failures from bind_socket_on_interface are logged
with iface, family and the error before being returned.

@codecov

codecov Bot commented May 13, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 4.00000% with 24 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
clash-lib/src/proxy/utils/platform/apple.rs 0.00% 16 Missing ⚠️
clash-lib/src/proxy/utils/platform/unix.rs 0.00% 7 Missing ⚠️
clash-lib/src/app/logging.rs 0.00% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@ibigbug

ibigbug commented May 13, 2026

Copy link
Copy Markdown
Member

Wouldn't the connection fail anyways if it's supposed to go out via a specific interface when fallback to default?

@xiguagua

xiguagua commented May 13, 2026

Copy link
Copy Markdown
Contributor Author

Wouldn't the connection fail anyways if it's supposed to go out via a specific interface when fallback to default?

That's a different issue. This PR addresses an issue where interface binding fails after a device has been idle for a period of time on the iOS platform. causing endpoint speed tests to fail.

@ibigbug

ibigbug commented May 13, 2026

Copy link
Copy Markdown
Member

In that case I think you should properly stop and resume on network path changed and populate the active interface again.

This PR won't fix the issue it just hides it and resurface with a different error instead, right?

@xiguagua

Copy link
Copy Markdown
Contributor Author

In that case I think you should properly stop and resume on network path changed and populate the active interface again.

This PR won't fix the issue it just hides it and resurface with a different error instead, right?

It is not hiding,just let system choose.
resurface with a error not associated

…lt route

macOS returns EADDRNOTAVAIL (os error 49) when bind_device_by_index_v4/v6
is called with a stale interface index (after network changes like Wi-Fi
switches or VPN connect/disconnect). Similarly, Linux bind_device returns
ENODEV when the interface name no longer exists.

Rename must_bind_socket_on_interface -> bind_socket_on_interface to
reflect that binding is now best-effort rather than mandatory. When the
target interface is stale (AddrNotAvailable/NotFound), log a warning and
fall back to the default route instead of failing the connection.
@ibigbug

ibigbug commented May 13, 2026

Copy link
Copy Markdown
Member

It's probably fine and works on iOS where NE protects the sockets.

But it's logically wrong when you decided to bind an interface for a socket but it ended up going through another interface.

@xiguagua xiguagua force-pushed the fix/stale-interface-binding branch from baf2beb to f7cb640 Compare May 13, 2026 03:24
@xiguagua

Copy link
Copy Markdown
Contributor Author

Got it.
Would make another PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants