fix(#68): normalize Windows backslashes in surrealkv:// URLs#83
Conversation
Issue BicameralAI#68: ``urllib.parse.urlparse("surrealkv://C:\Users\...")`` treats everything after the scheme as a netloc and raises ``ValueError: Port could not be cast to integer value`` when ``parsed.port`` is read. The SurrealDB Python SDK's ``Url`` wrapper reads ``parsed.port`` on every connect, so passing an unmodified Windows backslash path crashes every embedded test that builds its URL from a ``tmp_path`` fixture (5 tests in test_schema_persistence.py). Fix: ``normalize_surrealkv_url()`` in ledger/client.py replaces backslashes with forward slashes inside ``surrealkv://``, ``surrealkv+versioned://``, and ``file://`` URLs. The forward-slash form parses cleanly through ``urllib.parse`` (netloc=``C:``, path=``/Users/...``, port=None — the path-after-colon doesn't trigger port parsing because it's empty at the port position) AND is accepted by the SurrealKV Rust backend on Windows. The triple-slash file-URI form was tested but rejected by the Rust backend with "invalid filename" — the simpler backslash-replacement is the only form that satisfies both layers. ``LedgerClient.__init__`` now normalizes the URL so callers don't have to. POSIX URLs, ``memory://``, and remote URLs (``ws://``, ``http://``) pass through unchanged because they contain no backslashes. Tests: - ``tests/test_surrealkv_url_normalization.py`` — 15 tests: - 11 unit tests for the normalizer (Windows backslash, already- normalized forward slash, lowercase drive, surrealkv+versioned, file:// scheme, POSIX, memory://, ws://, https://, empty, urllib round-trip) - 3 wiring tests confirming ``LedgerClient.__init__`` calls it - 1 e2e test exercising the original ``surrealkv://{tmp_path/'ledger.db'}`` repro that connects, queries, and closes successfully - All 5 ``tests/test_schema_persistence.py`` tests now pass on Windows (was 0/5 collectable, now 5/5 passing). Verified locally on Windows: 20/20 (15 normalization + 5 schema_persistence).
📝 WalkthroughWalkthroughThis pull request introduces URL normalization logic to handle Windows-style backslash paths in SurrealKV embedded database connections. A new Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ledger/client.py (1)
11-27:⚠️ Potential issue | 🟡 MinorHandle scheme matching case-insensitively to avoid missed normalization.
startswith(...)is case-sensitive, so valid mixed/upper-case schemes (e.g.,SURREALKV://...) skip normalization and can still trigger the Windows parse crash.💡 Proposed fix
-import re from typing import Any @@ -# Windows-drive-letter detector at the start of an embedded URL path. -# Matches "C:\..." or "C:/...". Used to spot URLs that contain a -# Windows-style file path which needs slash-normalization before -# urllib.parse can read them. -_WINDOWS_DRIVE_AT_PATH_START = re.compile(r"^([A-Za-z]):[\\/]") - - def normalize_surrealkv_url(url: str) -> str: @@ - if not url.startswith(("surrealkv://", "surrealkv+versioned://", "file://")): + scheme, sep, after_scheme = url.partition("://") + if not sep or scheme.lower() not in {"surrealkv", "surrealkv+versioned", "file"}: return url - # Find the path portion (everything after scheme://) - scheme_end = url.find("://") + len("://") - after_scheme = url[scheme_end:] - # Only rewrite if the URL contains a Windows-style backslash or a # bare drive-letter prefix that would confuse urllib. Pure POSIX # paths and already-normalized Windows paths pass through unchanged. if "\\" not in after_scheme: return url - if not _WINDOWS_DRIVE_AT_PATH_START.match(after_scheme): - # Has backslashes but no drive letter — likely a malformed URL, - # but we fix the slashes anyway to give urllib a fighting chance. - return url[:scheme_end] + after_scheme.replace("\\", "/") - - return url[:scheme_end] + after_scheme.replace("\\", "/") + return f"{scheme}://{after_scheme.replace('\\', '/')}"Also applies to: 57-75
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ledger/client.py` around lines 11 - 27, The scheme checks that call startswith(...) are case-sensitive and will miss mixed/upper-case schemes (e.g., "SURREALKV://"); update those checks to match case-insensitively by normalizing the string first (e.g., use url.lower().startswith(...) or url.casefold().startswith(...)) wherever scheme detection occurs (including the logic that uses _WINDOWS_DRIVE_AT_PATH_START and the startswith checks around lines 57-75). Ensure you apply the same normalization consistently for all scheme comparisons so Windows-drive-path normalization always runs.
🧹 Nitpick comments (1)
tests/test_surrealkv_url_normalization.py (1)
29-68: Add a regression test for mixed/upper-case schemes.Given URL schemes are case-insensitive, a test here would prevent reintroducing the Windows crash via casing variants.
✅ Suggested test case
class TestNormalizeSurrealKVURL: @@ def test_empty_string_unchanged(self) -> None: assert normalize_surrealkv_url("") == "" + + def test_uppercase_scheme_is_normalised(self) -> None: + out = normalize_surrealkv_url(r"SURREALKV://C:\Users\foo\bar.db") + assert out == "SURREALKV://C:/Users/foo/bar.db"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_surrealkv_url_normalization.py` around lines 29 - 68, Add a regression test ensuring normalize_surrealkv_url handles mixed- or upper-case schemes (since schemes are case-insensitive) to prevent Windows path crash on casing variants; add a new test method in TestNormalizeSurrealKVURL that calls normalize_surrealkv_url with inputs like "SuRrEaLkV://C:\path\file.db" and "FILE://C:\path\file.db" (and a mixed-case version for the versioned scheme) and asserts the returned URL normalises back to using forward slashes and preserves drive-letter casing as expected; reference normalize_surrealkv_url and add the test alongside the existing tests in tests/test_surrealkv_url_normalization.py.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@ledger/client.py`:
- Around line 11-27: The scheme checks that call startswith(...) are
case-sensitive and will miss mixed/upper-case schemes (e.g., "SURREALKV://");
update those checks to match case-insensitively by normalizing the string first
(e.g., use url.lower().startswith(...) or url.casefold().startswith(...))
wherever scheme detection occurs (including the logic that uses
_WINDOWS_DRIVE_AT_PATH_START and the startswith checks around lines 57-75).
Ensure you apply the same normalization consistently for all scheme comparisons
so Windows-drive-path normalization always runs.
---
Nitpick comments:
In `@tests/test_surrealkv_url_normalization.py`:
- Around line 29-68: Add a regression test ensuring normalize_surrealkv_url
handles mixed- or upper-case schemes (since schemes are case-insensitive) to
prevent Windows path crash on casing variants; add a new test method in
TestNormalizeSurrealKVURL that calls normalize_surrealkv_url with inputs like
"SuRrEaLkV://C:\path\file.db" and "FILE://C:\path\file.db" (and a mixed-case
version for the versioned scheme) and asserts the returned URL normalises back
to using forward slashes and preserves drive-letter casing as expected;
reference normalize_surrealkv_url and add the test alongside the existing tests
in tests/test_surrealkv_url_normalization.py.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 97494dda-b06f-47df-a268-ef9cc4ae8c48
📒 Files selected for processing (2)
ledger/client.pytests/test_surrealkv_url_normalization.py
Closes #68.
Summary
urllib.parse.urlparse("surrealkv://C:\Users\...")treats everything after the scheme as a netloc. Readingparsed.portthen raisesValueError: Port could not be cast to integer value. The SurrealDB Python SDK'sUrlwrapper readsparsed.porton every connect, so passing an unmodified Windows backslash path crashes every embedded test that builds its URL from atmp_pathfixture — all 5 tests intest_schema_persistence.pywere uncollectable.Fix
ledger/client.pyaddsnormalize_surrealkv_url(), called fromLedgerClient.__init__. It replaces backslashes with forward slashes insidesurrealkv://,surrealkv+versioned://, andfile://URLs:The forward-slash form parses cleanly through
urllib.parse(netloc=C:, path=/Users/foo/bar.db, port=None — the colon isn't followed by digits so port-parsing is a no-op) AND is accepted by the SurrealKV Rust backend on Windows.What I tried first that didn't work
A triple-slash file-URI form (
surrealkv:///C:/Users/...) was the obvious-looking fix. It satisfies urllib but the SurrealKV Rust backend rejects it withFailed to create datastore: invalid filename. The same applies tofile:///C:/.... Only the simplesurrealkv://C:/Users/...shape satisfies both layers, which is why the fix is just a backslash-replacement and not a full URL rewrite.Tests
tests/test_surrealkv_url_normalization.py— 15 tests:surrealkv+versioned://andfile://schemes also normalizedmemory://,ws://,https://, empty string all unchangedurllib.parse.urlparse(...).portLedgerClient.__init__calls the normalizersurrealkv://{tmp_path/'ledger.db'}repro from the issue — connects, queries, and closes successfullyVerification
test_schema_persistence.pytests now pass on Windows (was 0/5 collectable)Test plan
memory:///ws:///http://Summary by CodeRabbit
New Features
Tests