refactor(test): run all docker integration tests in parallel#1335
Conversation
- Replace hardcoded PORT=10002 and serial_test::serial with dynamic port allocation via alloc_docker_port() (OS-assigned free ports) - ping_pong_test / ping_pong_udp_test: bind to port 0 internally so the echo server gets a TOCTOU-free OS-assigned port - DockerTestRunnerBuilder::host_port(host, container) for cases where the container internal port differs (e.g. SSH always on 2222) - smart/relay get_ss_runner: add .port(port) to avoid 10002 collisions - Fix crypto provider setup for watfaq-rustls and reality transport tests - Gate console_subscriber behind #[cfg(not(test))] to fix start_and_stop test when compiled with --all-features (telemetry feature enabled) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace fixed 1500ms sleep with polling loop (20 × 500ms) so the test catches the connection window without brittleness - Wait for mixed-port 8899 before spawning the request task - Read full response body to keep connection alive during polling - Store JoinHandle in ClashInstance and join on drop to prevent stale Tokio runtimes from holding ports across test runs Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
📝 WalkthroughWalkthroughWorkspace linting and telemetry cfg checks adjusted; watfaq_rustls crypto providers explicitly registered; extensive test infra refactor to allocate dynamic Docker host ports, thread them through runner builders, remove forced serial test attributes, improve test helpers, and make thread shutdown deterministic. Changes
Sequence Diagram(s)sequenceDiagram
participant TestRunner as Test Runner
participant Alloc as alloc_docker_port()
participant Builder as DockerTestRunnerBuilder
participant Docker as Docker Engine
participant Container as Test Container
participant Proxy as Clash Proxy
TestRunner->>Alloc: request free host port
Alloc-->>TestRunner: host_port
TestRunner->>Builder: host_port(host_port, container_port)
Builder->>Docker: create/run container with explicit port_bindings
Docker-->>Container: start (host_port ↔ container_port)
TestRunner->>Proxy: connect to host:host_port
Proxy->>Container: forward traffic via mapped port
Container-->>Proxy: respond
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
clash-lib/src/proxy/anytls/mod.rs (1)
824-842:⚠️ Potential issue | 🟠 MajorFallback path uses the wrong port after dynamic host-port mapping
With dynamic host mapping, the localhost fallback must use
host_port, not10002. Otherwise, tests fail whenevercontainer_ip()is unavailable.💡 Suggested fix
- let opts = HandlerOptions { + let container_ip = runner.container_ip(); + let port = if container_ip.is_some() { 10002 } else { host_port }; + let opts = HandlerOptions { name: "test-anytls".to_owned(), common_opts: Default::default(), - server: runner.container_ip().unwrap_or(LOCAL_ADDR.to_owned()), - port: 10002, + server: container_ip.unwrap_or(LOCAL_ADDR.to_owned()), + port, password: "example".to_owned(), udp: true, tls: Some(Box::new(tls)), transport: None, };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clash-lib/src/proxy/anytls/mod.rs` around lines 824 - 842, In test_anytls update the HandlerOptions initialization so the fallback localhost port uses the dynamically allocated host_port instead of the hardcoded 10002: change the port field on the HandlerOptions struct (in function test_anytls) to use host_port to match the dynamic mapping when runner.container_ip() is None.clash-lib/src/proxy/hysteria2/mod.rs (1)
744-750:⚠️ Potential issue | 🟠 MajorLocalhost fallback still points to container port
After dynamic mapping, fallback-to-localhost must use
host_port. Keepingport = 10002breaks environments wherecontainer_ip()is unavailable.💡 Suggested fix
- let container_ip = - container.container_ip().unwrap_or("127.0.0.1".to_owned()); - - let ip = IpAddr::from_str(&container_ip) - .unwrap_or(IpAddr::V4(Ipv4Addr::new(127, 0, 0, 1))); - let port = 10002; + let container_ip = container.container_ip(); + let ip = IpAddr::from_str( + container_ip.as_deref().unwrap_or("127.0.0.1"), + ) + .unwrap_or(IpAddr::V4(Ipv4Addr::new(127, 0, 0, 1))); + let port = if container_ip.is_some() { 10002 } else { host_port };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clash-lib/src/proxy/hysteria2/mod.rs` around lines 744 - 750, The fallback currently hardcodes port = 10002 after resolving container.container_ip(), which causes localhost fallbacks to still use the container port; change the fallback logic so when container.container_ip() is absent (or when IpAddr parsing falls back to 127.0.0.1) you use the host_port value instead of the hardcoded 10002. Update the section around container_ip, ip and port (replace the fixed port assignment with host_port when using localhost fallback) so the code sets port = host_port for local fallback scenarios.clash-lib/tests/common/mod.rs (1)
58-70:⚠️ Potential issue | 🟠 MajorHandle startup failure without leaking the spawned thread
If startup readiness fails,
start()returns early while the spawned thread may continue running. Please shut down and join the thread on that error path to avoid cross-test interference.💡 Suggested fix
- let handle = std::thread::spawn(move || { + let mut handle = Some(std::thread::spawn(move || { start_clash(options).expect("Failed to start clash"); - }); + })); @@ // Wait for the main port (usually API port) to be ready if let Some(&main_port) = ports.first() { - wait_port_ready(main_port)?; + if let Err(err) = wait_port_ready(main_port) { + clash_lib::shutdown(); + if let Some(h) = handle.take() { + let _ = h.join(); + } + return Err(err); + } } @@ - handle: Some(handle), + handle, })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clash-lib/tests/common/mod.rs` around lines 58 - 70, If startup readiness (the call to wait_port_ready) fails, ensure the spawned thread started with start_clash is cleanly shut down and joined before returning error to avoid leaking the thread; on the Err path send a shutdown signal or invoke the appropriate stopper (e.g., call a shutdown/stop helper or close any channels you use to signal start_clash to exit), then call handle.join() (or join the thread) and only then return the Err from start(); reference the spawned thread variable handle, the start_clash invocation, and the wait_port_ready call in the start() function to implement this cleanup.clash-lib/src/proxy/trojan/mod.rs (1)
302-307:⚠️ Potential issue | 🟠 MajorBoth Trojan tests still use container port on localhost fallback
When
container_ip()is absent, these tests targetLOCAL_ADDRbut still keepport: 10002. With dynamic mapping, that fallback must usehost_port.💡 Suggested fix
- let opts = HandlerOptions { + let container_ip = container.container_ip(); + let port = if container_ip.is_some() { 10002 } else { host_port }; + let opts = HandlerOptions { name: "test-trojan-ws".to_owned(), common_opts: Default::default(), - server: container.container_ip().unwrap_or(LOCAL_ADDR.to_owned()), - port: 10002, + server: container_ip.unwrap_or(LOCAL_ADDR.to_owned()), + port, password: "example".to_owned(), udp: true, tls: Some(Box::new(tls)), transport: Some(Box::new(transport)), }; @@ - let opts = HandlerOptions { + let container_ip = runner.container_ip(); + let port = if container_ip.is_some() { 10002 } else { host_port }; + let opts = HandlerOptions { name: "test-trojan-grpc".to_owned(), common_opts: Default::default(), - server: runner.container_ip().unwrap_or(LOCAL_ADDR.to_owned()), - port: 10002, + server: container_ip.unwrap_or(LOCAL_ADDR.to_owned()), + port, password: "example".to_owned(), udp: true, tls: Some(Box::new(tls)), transport: Some(Box::new(transport)), };Also applies to: 358-363
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clash-lib/src/proxy/trojan/mod.rs` around lines 302 - 307, The tests build HandlerOptions with server fallback to LOCAL_ADDR but leave port hardcoded to 10002, which breaks when the container IP is absent and ports are dynamically mapped; update the test setup (the HandlerOptions construction used in the trojan WS tests) to obtain the port from the test container (e.g., call container.host_port() or similar and unwrap_or(DEFAULT_PORT)) and use that value for the port field instead of the hardcoded 10002; apply the same change to both HandlerOptions instances used in the trojan tests so server and port both reflect the container mapping.
🧹 Nitpick comments (2)
clash-lib/src/proxy/shadowquic/mod.rs (1)
349-350: Redundant assignment ofover_stream.
gen_optionsis called withover_stream: trueas the third parameter (line 349), but thenopts.over_stream = trueis set again on line 350. This is harmless but redundant.♻️ Suggested cleanup
- let mut opts = gen_options(container_ip, host_port, true)?; - opts.over_stream = true; + let opts = gen_options(container_ip, host_port, true)?;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clash-lib/src/proxy/shadowquic/mod.rs` around lines 349 - 350, The code calls gen_options(container_ip, host_port, true) which already sets over_stream=true, then redundantly sets opts.over_stream = true; remove the unnecessary assignment to opts.over_stream (the second line) so the option relies on the value passed into gen_options (check the gen_options call site and the opts variable to confirm no other mutations are intended).clash-lib/src/proxy/utils/test_utils/docker_utils/mod.rs (1)
740-741: Consider optional per-test log capture for parallel readability.Line 740-Line 741 inheriting stdout/stderr from many parallel subprocesses can make CI logs hard to correlate. Optional: keep inherit for local runs and allow per-label log files in CI mode.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clash-lib/src/proxy/utils/test_utils/docker_utils/mod.rs` around lines 740 - 741, The process builder currently uses std::process::Stdio::inherit() for both stdout and stderr which interleaves logs from parallel subprocesses; change the code that sets .stdout(...) and .stderr(...) (the process spawn/command builder in this module) to be configurable: accept a parameter or read an env var (e.g., CI or CLASH_TEST_LOG_DIR) or per-test label so that in local runs it uses Stdio::inherit(), but in CI or when a log-dir/label is provided it opens per-test files (create a File for "<label>.out" / "<label>.err") and pass them as the process stdout/stderr (or use Stdio::piped() and write into those files); ensure the new option is passed through the function that constructs/spawns the docker process and keep the original inherit behavior as default.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@clash-lib/src/proxy/utils/test_utils/docker_utils/mod.rs`:
- Around line 25-33: The current alloc_port() drops the TcpListener immediately
creating a TOCTOU race; change its API to allocate-and-reserve the port by
returning the bound TcpListener (e.g., pub fn alloc_port() ->
std::net::TcpListener) so callers can read listener.local_addr().port() and keep
the listener open until they actually bind or start the Docker container; update
call sites to accept a TcpListener rather than just a u16 (or add a new
alloc_reserved_listener() that returns the listener and deprecate the old
alloc_port()).
---
Outside diff comments:
In `@clash-lib/src/proxy/anytls/mod.rs`:
- Around line 824-842: In test_anytls update the HandlerOptions initialization
so the fallback localhost port uses the dynamically allocated host_port instead
of the hardcoded 10002: change the port field on the HandlerOptions struct (in
function test_anytls) to use host_port to match the dynamic mapping when
runner.container_ip() is None.
In `@clash-lib/src/proxy/hysteria2/mod.rs`:
- Around line 744-750: The fallback currently hardcodes port = 10002 after
resolving container.container_ip(), which causes localhost fallbacks to still
use the container port; change the fallback logic so when
container.container_ip() is absent (or when IpAddr parsing falls back to
127.0.0.1) you use the host_port value instead of the hardcoded 10002. Update
the section around container_ip, ip and port (replace the fixed port assignment
with host_port when using localhost fallback) so the code sets port = host_port
for local fallback scenarios.
In `@clash-lib/src/proxy/trojan/mod.rs`:
- Around line 302-307: The tests build HandlerOptions with server fallback to
LOCAL_ADDR but leave port hardcoded to 10002, which breaks when the container IP
is absent and ports are dynamically mapped; update the test setup (the
HandlerOptions construction used in the trojan WS tests) to obtain the port from
the test container (e.g., call container.host_port() or similar and
unwrap_or(DEFAULT_PORT)) and use that value for the port field instead of the
hardcoded 10002; apply the same change to both HandlerOptions instances used in
the trojan tests so server and port both reflect the container mapping.
In `@clash-lib/tests/common/mod.rs`:
- Around line 58-70: If startup readiness (the call to wait_port_ready) fails,
ensure the spawned thread started with start_clash is cleanly shut down and
joined before returning error to avoid leaking the thread; on the Err path send
a shutdown signal or invoke the appropriate stopper (e.g., call a shutdown/stop
helper or close any channels you use to signal start_clash to exit), then call
handle.join() (or join the thread) and only then return the Err from start();
reference the spawned thread variable handle, the start_clash invocation, and
the wait_port_ready call in the start() function to implement this cleanup.
---
Nitpick comments:
In `@clash-lib/src/proxy/shadowquic/mod.rs`:
- Around line 349-350: The code calls gen_options(container_ip, host_port, true)
which already sets over_stream=true, then redundantly sets opts.over_stream =
true; remove the unnecessary assignment to opts.over_stream (the second line) so
the option relies on the value passed into gen_options (check the gen_options
call site and the opts variable to confirm no other mutations are intended).
In `@clash-lib/src/proxy/utils/test_utils/docker_utils/mod.rs`:
- Around line 740-741: The process builder currently uses
std::process::Stdio::inherit() for both stdout and stderr which interleaves logs
from parallel subprocesses; change the code that sets .stdout(...) and
.stderr(...) (the process spawn/command builder in this module) to be
configurable: accept a parameter or read an env var (e.g., CI or
CLASH_TEST_LOG_DIR) or per-test label so that in local runs it uses
Stdio::inherit(), but in CI or when a log-dir/label is provided it opens
per-test files (create a File for "<label>.out" / "<label>.err") and pass them
as the process stdout/stderr (or use Stdio::piped() and write into those files);
ensure the new option is passed through the function that constructs/spawns the
docker process and keep the original inherit behavior as default.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 620208c8-8f45-4c08-9366-0b9f013ca13a
📒 Files selected for processing (21)
Cargo.tomlclash-lib/src/app/logging.rsclash-lib/src/lib.rsclash-lib/src/proxy/anytls/mod.rsclash-lib/src/proxy/group/relay/mod.rsclash-lib/src/proxy/group/smart/mod.rsclash-lib/src/proxy/hysteria2/mod.rsclash-lib/src/proxy/shadowquic/mod.rsclash-lib/src/proxy/shadowsocks/outbound/mod.rsclash-lib/src/proxy/socks/outbound/mod.rsclash-lib/src/proxy/ssh/mod.rsclash-lib/src/proxy/transport/reality.rsclash-lib/src/proxy/trojan/mod.rsclash-lib/src/proxy/tuic/mod.rsclash-lib/src/proxy/utils/test_utils/docker_utils/docker_runner.rsclash-lib/src/proxy/utils/test_utils/docker_utils/mod.rsclash-lib/src/proxy/vless/mod.rsclash-lib/src/proxy/vmess/mod.rsclash-lib/src/proxy/wg/mod.rsclash-lib/tests/api_tests.rsclash-lib/tests/common/mod.rs
| /// Allocate a free TCP port by asking the OS. The listener is immediately | ||
| /// dropped so the caller can bind the same port. Used by throughput tests | ||
| /// where multiple ports must be reserved up-front. | ||
| #[cfg(throughput_test)] | ||
| pub fn alloc_port() -> u16 { | ||
| // Bind port 0 to let the OS pick a free port, then release it. | ||
| // TOCTOU race is acceptable in test environments. | ||
| let listener = | ||
| std::net::TcpListener::bind("127.0.0.1:0").expect("alloc_port: bind failed"); | ||
| listener | ||
| .local_addr() | ||
| .expect("alloc_port: local_addr") | ||
| .port() | ||
| let listener = std::net::TcpListener::bind("127.0.0.1:0") | ||
| .expect("failed to allocate a free port"); | ||
| listener.local_addr().unwrap().port() | ||
| } |
There was a problem hiding this comment.
alloc_port still leaves a TOCTOU race in parallel runs.
Line 30-Line 32 allocates a free port and immediately releases it, so another process/test can bind it before the caller does. This can still cause intermittent EADDRINUSE in parallel Docker CI.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@clash-lib/src/proxy/utils/test_utils/docker_utils/mod.rs` around lines 25 -
33, The current alloc_port() drops the TcpListener immediately creating a TOCTOU
race; change its API to allocate-and-reserve the port by returning the bound
TcpListener (e.g., pub fn alloc_port() -> std::net::TcpListener) so callers can
read listener.local_addr().port() and keep the listener open until they actually
bind or start the Docker container; update call sites to accept a TcpListener
rather than just a u16 (or add a new alloc_reserved_listener() that returns the
listener and deprecate the old alloc_port()).
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
When compiled with both ring and aws-lc-rs features (e.g. --all-features), watfaq-rustls treats the combination as ambiguous in get_default_or_install_from_crate_features() and returns None, causing a panic. RealityConfig also calls CryptoProvider::get_default().unwrap() directly. Explicit installation is required for both code paths. Add a comment explaining the reason instead of leaving it implicit. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
📊 Proxy Throughput Results
Tests ran 4 variant(s) in parallel; each direction transfers the full payload. Full test logDownload the |
- Restore #[cfg(all(feature = "telemetry", tokio_unstable))] on console_subscriber::spawn(): RUSTFLAGS env var overrides cargo config rustflags, stripping tokio_unstable when docker tests set RUSTFLAGS. The cfg gate excludes console_subscriber at compile time when tokio_unstable is absent, preventing a runtime panic. - Restore 'cfg(tokio_unstable)' to unexpected_cfgs allowlist in Cargo.toml. - Fix mmdb race condition under parallel tests: concurrent callers all use the same Country.mmdb path; when one caller removes the file for re-download, a concurrent caller's remove_file fails with ENOENT via ? before the re-download happens. Ignore the remove error instead. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
clash-lib/src/common/mmdb.rs (1)
122-124: Preserve the race fix, but log non-NotFounddelete failures.Line 124 currently suppresses every
remove_fileerror. It should still ignoreNotFound, but emit a warning for other error kinds so real filesystem problems are visible during MMDB recovery.Proposed adjustment
- let _ = fs::remove_file(&mmdb_file); + if let Err(err) = fs::remove_file(&mmdb_file) { + if err.kind() != std::io::ErrorKind::NotFound { + warn!( + "failed to remove corrupt mmdb `{}` before re-download: {}", + mmdb_file.to_string_lossy(), + err + ); + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clash-lib/src/common/mmdb.rs` around lines 122 - 124, Preserve the race-condition ignore for ENOENT but change the blanket suppression of fs::remove_file(&mmdb_file) errors: match the Result, ignore std::io::ErrorKind::NotFound, and emit a warning (including the error and the mmdb_file path) for any other Err so real filesystem failures surface during MMDB recovery; modify the code around the fs::remove_file(&mmdb_file) call (referencing mmdb_file and fs::remove_file) to use an if let / match and log with the project's warning logger (e.g., warn! or log::warn) including the error details.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@clash-lib/src/common/mmdb.rs`:
- Around line 122-124: Preserve the race-condition ignore for ENOENT but change
the blanket suppression of fs::remove_file(&mmdb_file) errors: match the Result,
ignore std::io::ErrorKind::NotFound, and emit a warning (including the error and
the mmdb_file path) for any other Err so real filesystem failures surface during
MMDB recovery; modify the code around the fs::remove_file(&mmdb_file) call
(referencing mmdb_file and fs::remove_file) to use an if let / match and log
with the project's warning logger (e.g., warn! or log::warn) including the error
details.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: c1ca9eb2-a030-4e70-b7f5-2204eaa19bd3
📒 Files selected for processing (2)
clash-lib/src/app/logging.rsclash-lib/src/common/mmdb.rs
What does this PR do?
Makes all
#[cfg(docker_test)]integration tests run concurrently instead of serially, dramatically reducing total CI test time.Key changes:
docker_utils/mod.rs—ping_pong_testandping_pong_udp_testnow bind0.0.0.0:0(OS-assigned port) instead of taking a hardcodedportargument; eliminates TOCTOU and collisiondocker_runner.rs— addalloc_docker_port()(OS-assigned port for Docker host mapping) andDockerTestRunnerBuilder::host_port(host, container)for containers with fixed internal ports (e.g. SSH on 2222)PORT=10002withalloc_docker_port(); remove#[serial_test::serial]guardslogging.rs— gateconsole_subscriber::spawn()on#[cfg(all(feature = "telemetry", tokio_unstable))]instead ofnot(test); fixes panic in api_tests subprocesses compiled withoutRUSTFLAGS=--cfg tokio_unstablelib.rs—setup_default_crypto_provider()initialises bothrustlsandwatfaq_rustlsproviders (separate crate forks with independent global state); fixestest_shadowtlsand reality unit teststransport/reality.rs— callsetup_default_crypto_provider()in unit tests that exercise TLS handshake pathsCargo.toml— addtokio_unstabletounexpected_cfgsallowlisttests/api_tests.rs— fixtest_connections_returns_proxy_chain_names: replace fixed 1500 ms sleep with a 10 s polling loop, wait for mixed-port 8899 before sending, read full response body to keep the connection alive during pollingtests/common/mod.rs—ClashInstancenow stores and joins itsJoinHandleon drop, preventing stale Tokio runtimes from holding ports across test runsType
Checklist
Summary by CodeRabbit
Tests
Chores
Bug Fixes