fix: ss2022 throughput, simple-obfs TLS state machine, MMDB path#1330
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughCanonicalizes the CLI directory before changing CWD in compatibility mode; increases UDP and bidirectional buffer sizes and channel capacities; reduces per-poll UDP allocations by reusing a receive buffer; switches some mutexes to std::sync::Mutex; caches Shadowsocks Context in the handler; and rewrites TLS obfs read/write to persist state across Pending. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
|
@copilot update PR title and description accordingly |
Updated the PR title to "fix: Shadowsocks outbound and simple-obfs TLS state machine fixes" and filled in the description with a full changelog covering the SS context sharing, mutex swap, channel/buffer size increases, TLS obfs read/write state machine rewrite, datagram buffer pre-allocation, and the CLI compatibility-mode path canonicalization fix. |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
clash-lib/src/app/dispatcher/dispatcher_impl.rs (1)
258-258: Consider extracting UDP channel capacity into a named constant.
256is duplicated in two channel initializations. A single constant will keep tuning safer and prevent drift.Suggested refactor
+const UDP_CHANNEL_CAPACITY: usize = 256; ... - let (remote_receiver_w, mut remote_receiver_r) = - tokio::sync::mpsc::channel(256); + let (remote_receiver_w, mut remote_receiver_r) = + tokio::sync::mpsc::channel(UDP_CHANNEL_CAPACITY); ... - let (remote_sender, mut remote_forwarder) = - tokio::sync::mpsc::channel::<UdpPacket>(256); + let (remote_sender, mut remote_forwarder) = + tokio::sync::mpsc::channel::<UdpPacket>(UDP_CHANNEL_CAPACITY);Also applies to: 370-370
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clash-lib/src/app/dispatcher/dispatcher_impl.rs` at line 258, Extract the duplicated numeric literal 256 used in tokio::sync::mpsc::channel(...) calls into a single named constant (e.g., UDP_CHANNEL_CAPACITY) in dispatcher_impl.rs and replace both channel initializations with that constant; locate the two occurrences where tokio::sync::mpsc::channel(256) is called (around the usages that create the UDP channels) and update them to use the constant so capacity tuning is centralized and cannot drift.clash-lib/src/proxy/transport/simple_obfs/tls.rs (1)
47-50: Avoid heap-allocating header scratch on every record boundary.
ReadState::SkippingHeader(Vec<u8>, usize)plusvec![0u8; 3]allocates for every TLS record. A fixed scratch buffer with atarget_len/filledpair would keep this parser allocation-free on the hot path.Also applies to: 217-229, 329-331
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clash-lib/src/proxy/transport/simple_obfs/tls.rs` around lines 47 - 50, ReadState is currently allocating a Vec for SkippingHeader on every record boundary; replace the heap allocation with a fixed-size stack scratch buffer and explicit counters: change SkippingHeader(Vec<u8>, usize) to something like SkippingHeader([u8; 3], usize /*filled*/, usize /*target_len*/) (or two usize fields for filled/target_len plus a [u8;3] buffer), and update all code paths that construct or consume ReadState (parsing loops that previously used vec![0u8;3]) to write into the fixed buffer and advance counters without allocating. Apply the same replacement pattern to any other places using heap scratch buffers for header parsing so the parser is allocation-free on the hot path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@clash-lib/src/proxy/shadowsocks/outbound/datagram.rs`:
- Line 229: Occurrences of let mut w = self.w.lock().unwrap() in datagram.rs
will panic on a poisoned mutex; change each to handle LockResult by matching or
using .map_err to convert a poisoned lock into a recoverable io::Error (e.g.,
return Err(std::io::Error::new(std::io::ErrorKind::Other, "mutex poisoned"))) so
UDP poll paths return an I/O error instead of panicking; update all three sites
(the three occurrences of self.w.lock().unwrap()) to propagate a std::io::Error
when the lock is poisoned.
In `@clash-lib/src/proxy/transport/simple_obfs/tls.rs`:
- Around line 78-108: poll_write currently wraps the plaintext into
this.write_buf and then immediately calls ready!(drain_write_buf(this, cx)),
which can cause partial inner writes while poll_write returns Pending —
violating AsyncWrite. Change poll_write so that after computing end and
populating this.write_buf (using make_client_hello_msg or the TLS record
branch), you set this.write_pos = 0 and this.write_committed = end and
immediately return Poll::Ready(Ok(end)) without calling drain_write_buf; leave
the existing early check for this.write_committed > 0 (and its drain_write_buf
usage) intact so actual sending is performed later by poll_flush / subsequent
calls, and remove the post-population ready!(drain_write_buf(this, cx))
invocation.
---
Nitpick comments:
In `@clash-lib/src/app/dispatcher/dispatcher_impl.rs`:
- Line 258: Extract the duplicated numeric literal 256 used in
tokio::sync::mpsc::channel(...) calls into a single named constant (e.g.,
UDP_CHANNEL_CAPACITY) in dispatcher_impl.rs and replace both channel
initializations with that constant; locate the two occurrences where
tokio::sync::mpsc::channel(256) is called (around the usages that create the UDP
channels) and update them to use the constant so capacity tuning is centralized
and cannot drift.
In `@clash-lib/src/proxy/transport/simple_obfs/tls.rs`:
- Around line 47-50: ReadState is currently allocating a Vec for SkippingHeader
on every record boundary; replace the heap allocation with a fixed-size stack
scratch buffer and explicit counters: change SkippingHeader(Vec<u8>, usize) to
something like SkippingHeader([u8; 3], usize /*filled*/, usize /*target_len*/)
(or two usize fields for filled/target_len plus a [u8;3] buffer), and update all
code paths that construct or consume ReadState (parsing loops that previously
used vec![0u8;3]) to write into the fixed buffer and advance counters without
allocating. Apply the same replacement pattern to any other places using heap
scratch buffers for header parsing so the parser is allocation-free on the hot
path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 56ef81c0-f001-4987-9900-7054b5901513
📒 Files selected for processing (6)
clash-bin/src/main.rsclash-lib/src/app/dispatcher/dispatcher_impl.rsclash-lib/src/proxy/datagram.rsclash-lib/src/proxy/shadowsocks/outbound/datagram.rsclash-lib/src/proxy/shadowsocks/outbound/mod.rsclash-lib/src/proxy/transport/simple_obfs/tls.rs
| target: std::net::SocketAddr, | ||
| ) -> Poll<io::Result<usize>> { | ||
| let mut w = self.w.try_lock().expect("must acquire"); | ||
| let mut w = self.w.lock().unwrap(); |
There was a problem hiding this comment.
Avoid panicking on poisoned mutexes in UDP poll paths.
Line 229, Line 247, and Line 259 use lock().unwrap(). If the lock is poisoned, this panics and can tear down active UDP handling instead of returning an I/O error.
Suggested fix
- let mut w = self.w.lock().unwrap();
+ let mut w = match self.w.lock() {
+ Ok(guard) => guard,
+ Err(_) => {
+ return Poll::Ready(Err(io::Error::other(
+ "shadowsocks udp writer lock poisoned",
+ )));
+ }
+ };
- let mut w = self.w.lock().unwrap();
+ let mut w = match self.w.lock() {
+ Ok(guard) => guard,
+ Err(_) => {
+ return Poll::Ready(Err(io::Error::other(
+ "shadowsocks udp writer lock poisoned",
+ )));
+ }
+ };
- let mut g = self.r.lock().unwrap();
+ let mut g = match self.r.lock() {
+ Ok(guard) => guard,
+ Err(_) => {
+ return Poll::Ready(Err(io::Error::other(
+ "shadowsocks udp reader lock poisoned",
+ )));
+ }
+ };Also applies to: 247-247, 259-259
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@clash-lib/src/proxy/shadowsocks/outbound/datagram.rs` at line 229,
Occurrences of let mut w = self.w.lock().unwrap() in datagram.rs will panic on a
poisoned mutex; change each to handle LockResult by matching or using .map_err
to convert a poisoned lock into a recoverable io::Error (e.g., return
Err(std::io::Error::new(std::io::ErrorKind::Other, "mutex poisoned"))) so UDP
poll paths return an I/O error instead of panicking; update all three sites (the
three occurrences of self.w.lock().unwrap()) to propagate a std::io::Error when
the lock is poisoned.
| // If a previous chunk is still being sent, finish draining it first. | ||
| // Returning Ok(write_committed) signals to the caller how many source | ||
| // bytes that chunk covered; the caller then advances its window. | ||
| if this.write_committed > 0 { | ||
| ready!(drain_write_buf(this, cx))?; | ||
| let committed = this.write_committed; | ||
| this.write_committed = 0; | ||
| return Poll::Ready(Ok(committed)); | ||
| } | ||
|
|
||
| // Wrap the next chunk (at most CHUNK_SIZE source bytes) into a TLS | ||
| // Application Data record (or ClientHello for the very first write). | ||
| let end = CHUNK_SIZE.min(buf.len()); | ||
| let chunk = &buf[..end]; | ||
| this.write_buf = if this.first_request { | ||
| this.first_request = false; | ||
| make_client_hello_msg(chunk, &this.server).into_owned() | ||
| } else { | ||
| let mut v = Vec::with_capacity(5 + chunk.len()); | ||
| v.extend_from_slice(&[0x17, 0x03, 0x03]); | ||
| v.push((chunk.len() >> 8) as u8); | ||
| v.push((chunk.len() & 0xff) as u8); | ||
| v.extend_from_slice(chunk); | ||
| v | ||
| }; | ||
| this.write_pos = 0; | ||
| this.write_committed = end; | ||
|
|
||
| ready!(drain_write_buf(this, cx))?; | ||
| this.write_committed = 0; | ||
| Poll::Ready(Ok(end)) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
fd -t f "tls.rs" clash-lib/src/proxy/transport/simple_obfs/Repository: Watfaq/clash-rs
Length of output: 108
🏁 Script executed:
wc -l clash-lib/src/proxy/transport/simple_obfs/tls.rsRepository: Watfaq/clash-rs
Length of output: 112
🏁 Script executed:
cat -n clash-lib/src/proxy/transport/simple_obfs/tls.rsRepository: Watfaq/clash-rs
Length of output: 14190
🌐 Web query:
Tokio AsyncWrite poll_write contract Pending semantics
💡 Result:
Tokio’s AsyncWrite::poll_write “pending semantics” are: 1) What Pending means - If poll_write can’t make progress, it must return Poll::Pending. - When it returns Pending, it arranges (via cx.waker) for the current task to be woken when the object becomes writable (or is closed).
Citations:
- 1: https://docs.rs/tokio_wasi/latest/tokio/io/trait.AsyncWrite.html
- 2: https://github.com/tokio-rs/tokio/blob/master/tokio/src/io/async_write.rs
- 3: https://docs.rs/tokio/latest/aarch64-apple-darwin/src/tokio/io/async_write.rs.html
🌐 Web query:
Tokio AsyncWrite poll_write no data written from buffer Pending cancel safety
💡 Result:
In Tokio's AsyncWrite trait, poll_write returning Poll::Pending explicitly means no data was written from the provided buffer. The I/O object is not currently writable, and the current task is registered via cx.waker to be notified when it becomes writable again. On the next poll after waking (with potentially a different buf), it will attempt to write from that new buffer. This behavior ensures that poll_write itself is cancel-safe: if the future is dropped/cancelled while pending (e.g., in tokio::select! where another branch completes), no partial data has been written or buffered from the caller's buffer, avoiding data loss or corruption. The contract guarantees atomic writes—no partial writes without Ready(Ok(n)), making it safe to retry with the same or different data on resumption. Higher-level methods like AsyncWriteExt::write have documented cancel safety: "This method is cancellation safe in the sense that if it is used as the event in a tokio::select! statement and some other branch completes first, then it is guaranteed that no data was written to this AsyncWrite." Similar for write_vectored. However, methods like write_all are not fully cancel-safe without using write_all_buf, which tracks progress via bytes::Buf. Sources confirm this from official Tokio source/docs: Pending = no data written, waker registered; used in implementations like AsyncFd/TcpStream with poll_write_ready loops that respect WouldBlock without partial writes.
Citations:
- 1: https://docs.rs/tokio_wasi/latest/tokio/io/trait.AsyncWrite.html
- 2: https://github.com/tokio-rs/tokio/blob/master/tokio/src/io/async_write.rs
- 3: https://docs.rs/tokio/latest/tokio/io/trait.AsyncWriteExt.html
Don't send wrapped bytes before poll_write reports them accepted.
This code wraps plaintext into write_buf, then immediately tries to drain it via ready!(drain_write_buf(this, cx)) on line 106. If the inner stream returns Pending after accepting part of the wrapped record, poll_write returns Pending to the caller. However, Tokio's AsyncWrite contract requires that Pending means zero bytes from the provided buffer were written or buffered. Here, the plaintext chunk has been wrapped and partially sent, violating that contract. If the caller's future is cancelled (e.g., in tokio::select!), they will retry assuming their bytes are still pending—but partial TLS records are already in the inner stream, causing protocol desynchronization and AEAD tag failures.
Accept the plaintext immediately in poll_write by returning Ok(end) without draining. Let poll_flush and subsequent calls (via write_committed) handle sending the wrapped bytes. This ensures poll_write never returns Pending after accepting source bytes, upholding the AsyncWrite contract.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@clash-lib/src/proxy/transport/simple_obfs/tls.rs` around lines 78 - 108,
poll_write currently wraps the plaintext into this.write_buf and then
immediately calls ready!(drain_write_buf(this, cx)), which can cause partial
inner writes while poll_write returns Pending — violating AsyncWrite. Change
poll_write so that after computing end and populating this.write_buf (using
make_client_hello_msg or the TLS record branch), you set this.write_pos = 0 and
this.write_committed = end and immediately return Poll::Ready(Ok(end)) without
calling drain_write_buf; leave the existing early check for this.write_committed
> 0 (and its drain_write_buf usage) intact so actual sending is performed later
by poll_flush / subsequent calls, and remove the post-population
ready!(drain_write_buf(this, cx)) invocation.
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Pull request overview
This PR addresses several runtime correctness and performance issues across transports and UDP/TCP plumbing, including a fix to the simple-obfs TLS framing state machine and some buffering/path-resolution improvements.
Changes:
- Reworked
simple_obfsTLS read/write handling to bepoll_*-correct (no losing bytes onPoll::Pending, handles partial writes). - Reduced per-poll UDP allocations by pre-allocating receive buffers and reusing a shared Shadowsocks
Context. - Adjusted dispatcher buffering/channel sizing and fixed
clash-bincompatibility-mode cwd handling for relative--directory.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| clash-lib/src/proxy/transport/simple_obfs/tls.rs | Replaces stack-pinned read_exact/single-call writes with persistent state for poll_read/poll_write correctness. |
| clash-lib/src/proxy/shadowsocks/outbound/mod.rs | Stores a shared Shadowsocks Context in the handler and reuses it for stream/datagram setup. |
| clash-lib/src/proxy/shadowsocks/outbound/datagram.rs | Changes UDP IO locking strategy (Tokio mutex → std mutex) around split sink/stream. |
| clash-lib/src/proxy/datagram.rs | Adds a reusable recv_buf to avoid repeated 64KiB allocations in poll_next. |
| clash-lib/src/app/dispatcher/dispatcher_impl.rs | Increases default TCP relay buffer size and enlarges UDP mpsc channel sizes. |
| clash-bin/src/main.rs | Canonicalizes --directory before set_current_dir and avoids passing a now-wrong relative cwd into start_scaffold. |
Comments suppressed due to low confidence (1)
clash-lib/src/proxy/shadowsocks/outbound/datagram.rs:233
std::sync::Mutex::lock()is used insidepoll_send_to(and other poll_* methods). If this ever contends it will block a Tokio worker thread inside a poll function, which can lead to stalls/deadlocks. Alsounwrap()will panic on a poisoned mutex. Consider switching back to a non-blocking strategy (e.g.,try_lockand returnPoll::Pending/WouldBlockwhen unavailable) and map poisoning to anio::Errorinstead of panicking.
let mut w = self.w.lock().unwrap();
match w.start_send_unpin(UdpPacket {
data: buf.to_vec(),
src_addr: SocksAddr::any_ipv4(),
dst_addr: target.into(),
| let mut g = self.r.lock().unwrap(); | ||
| let (r, remained) = &mut *g; | ||
|
|
There was a problem hiding this comment.
std::sync::Mutex::lock().unwrap() in poll_recv has the same issues as on the send side: potential blocking inside a poll function and a possible panic if the mutex is poisoned. Prefer a non-blocking lock acquisition (or redesign so locking isn’t needed) and convert lock errors into io::Error rather than panicking.
| this.remain = length; | ||
| // Subsequent records have a 3-byte header (type + version). | ||
| this.read_state = ReadState::SkippingHeader(vec![0u8; 3], 0); | ||
| // Loop back to Phase 1 to deliver payload bytes immediately. |
There was a problem hiding this comment.
ReadState::SkippingHeader allocates a new Vec for every TLS record (vec![0u8; 3] in the hot path). This creates per-record heap churn in the read loop. Since the skipped headers are fixed-size (105 for first response, then 3), consider representing these as fixed-size arrays in the state (or store the remaining skip count) to avoid repeated allocations.
| // SS2022 (AEAD-2022) MAX_PACKET_SIZE is 0xFFFF (65535 bytes). Using a relay | ||
| // buffer smaller than that forces the cipher to split every full packet into | ||
| // multiple smaller encrypted chunks, multiplying encrypt/decrypt overhead. |
There was a problem hiding this comment.
The comment says SS2022 max packet size is 0xFFFF (65535), but DEFAULT_BUFFER_SIZE is set to 64 * 1024 (= 65536). Either adjust the constant to match the stated limit, or update the comment to explain why a 64 KiB buffer (one byte larger) is chosen (e.g., power-of-two sizing).
| // SS2022 (AEAD-2022) MAX_PACKET_SIZE is 0xFFFF (65535 bytes). Using a relay | |
| // buffer smaller than that forces the cipher to split every full packet into | |
| // multiple smaller encrypted chunks, multiplying encrypt/decrypt overhead. | |
| // SS2022 (AEAD-2022) MAX_PACKET_SIZE is 0xFFFF (65535 bytes). We use a 64 KiB | |
| // relay buffer (65536 bytes) as a convenient power-of-two allocation, which is | |
| // still large enough to hold the largest SS2022 packet without forcing the | |
| // cipher to split a full packet into multiple smaller encrypted chunks. |
|
@copilot fix the PR title and description again it was overriden by some one. also mention the cli compatibility fix |
Restored the PR title and description. The CLI compatibility fix (canonicalizing the |
What does this PR do?
Fixes several bugs causing SS2022 outbound to be slow or fail (decrypt errors, low throughput) while DIRECT was full speed.
Bug fixes
simple-obfs TLS: data loss on
Poll::Pending(root cause of decrypt failures)The original
reading()helper createdread_exactfutures with stack-local buffers. Whenread_exactreturnedPoll::Pendingafter consuming M bytes from the inner stream, the future and its buffer were dropped. The nextpoll_readrestarted from byte 0 — M bytes were permanently lost. This desynchronised the TLS frame parser, causing AEAD tag mismatches and wildly fluctuating speeds.Fix: rewrote
poll_read/poll_writeas explicit state machines (ReadStateenum,write_buf/write_pos/write_committedfields) whose buffers live in the struct and survivePoll::Pending. Also replacedVec<u8>scratch buffers inSkippingHeaderwith a fixed-size[u8; 105]array to avoid per-record heap allocations.SS2022: undersized copy buffer
DEFAULT_BUFFER_SIZEwas 16 KB — 4× smaller than SS2022'sMAX_PACKET_SIZEof 65535 bytes. Increased to 64 KB.SS2022 UDP: per-packet
Contextallocation and recv bufferContext::new_shared()was called on every UDP send. Changed to a sharedArc<Context>perHandler. Also pre-allocated a 65535-byterecv_bufin the UDP datagram wrapper instead of allocating per packet.MMDB: path resolution after
set_current_dirset_current_dir(relative_path)changed the cwd, then the same relative path was used again, doubling it. Fix: canonicalize the path before callingset_current_dir.Type
Checklist
Summary by CodeRabbit
Bug Fixes
Performance Improvements
Refactor