
fix(#68): normalize Windows backslashes in surrealkv:// URLs #83

Merged
Knapp-Kevin merged 1 commit into BicameralAI:dev from Knapp-Kevin:fix/68-surrealkv-windows-url
Apr 28, 2026


Knapp-Kevin (Collaborator) commented Apr 28, 2026

Closes #68.

Summary

urllib.parse.urlparse("surrealkv://C:\Users\...") treats everything after the scheme as the netloc, because the Windows path contains no forward slash to end it. Reading parsed.port then splits the netloc at the colon and tries to cast the remainder (\Users\...) to an integer, which raises ValueError: Port could not be cast to integer value. The SurrealDB Python SDK's Url wrapper reads parsed.port on every connect, so passing an unmodified Windows backslash path crashes every embedded test that builds its URL from a tmp_path fixture: all 5 tests in test_schema_persistence.py were uncollectable.
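A minimal standard-library repro of the failure, assuming an illustrative path (C:\Users\foo\bar.db) rather than a real tmp_path; the port_error helper is hypothetical and only mimics the SDK's read of parsed.port:

```python
from typing import Optional
from urllib.parse import urlparse

# Illustrative Windows path, shaped like what a tmp_path fixture produces.
BACKSLASH_URL = r"surrealkv://C:\Users\foo\bar.db"


def port_error(url: str) -> Optional[str]:
    """Return the ValueError message raised by reading .port, or None."""
    parsed = urlparse(url)
    try:
        _ = parsed.port  # the SDK's Url wrapper reads this on every connect
    except ValueError as exc:
        return str(exc)
    return None


parsed = urlparse(BACKSLASH_URL)
# No '/' follows "//", so the entire Windows path lands in the netloc.
print(repr(parsed.netloc))  # 'C:\\Users\\foo\\bar.db'
print(parsed.path == "")    # True
print(port_error(BACKSLASH_URL))
```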

Fix

ledger/client.py adds normalize_surrealkv_url(), called from LedgerClient.__init__. It replaces backslashes with forward slashes inside surrealkv://, surrealkv+versioned://, and file:// URLs:

surrealkv://C:\Users\foo\bar.db    →    surrealkv://C:/Users/foo/bar.db

The forward-slash form parses cleanly through urllib.parse (netloc=C:, path=/Users/foo/bar.db, port=None: nothing follows the colon, and urllib treats an empty port as no port) AND is accepted by the SurrealKV Rust backend on Windows.
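The parsed components of the normalized form can be checked directly with urllib. This sketch uses the same illustrative path. Note that urllib lower-cases hostname, so only netloc preserves the drive-letter case, and that a colon followed by non-numeric text would still raise:

```python
from urllib.parse import urlparse

NORMALIZED = "surrealkv://C:/Users/foo/bar.db"

parsed = urlparse(NORMALIZED)
print(parsed.netloc)    # C:
print(parsed.path)      # /Users/foo/bar.db
print(parsed.port)      # None: the text after ':' is empty
print(parsed.hostname)  # c  (urllib lower-cases hostname, not netloc)

# Contrast: non-empty, non-numeric text after the colon still raises.
try:
    _ = urlparse("surrealkv://C:x/Users").port
except ValueError:
    print("non-numeric port text raises ValueError")
```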

What I tried first that didn't work

A triple-slash file-URI form (surrealkv:///C:/Users/...) was the obvious-looking fix. It satisfies urllib but the SurrealKV Rust backend rejects it with Failed to create datastore: invalid filename. The same applies to file:///C:/.... Only the simple surrealkv://C:/Users/... shape satisfies both layers, which is why the fix is just a backslash-replacement and not a full URL rewrite.
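For comparison, a sketch of how urllib splits the triple-slash shape versus the shape the PR settled on. It covers only the Python side; the Rust backend's "invalid filename" rejection is as reported above and is not reproduced here:

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def split_parts(url: str) -> Tuple[str, str, Optional[int]]:
    """Return (netloc, path, port) as urllib sees them."""
    p = urlparse(url)
    return p.netloc, p.path, p.port


# Triple-slash shape: empty authority, drive letter moves into the path.
print(split_parts("surrealkv:///C:/Users/foo/bar.db"))  # ('', '/C:/Users/foo/bar.db', None)
# Shape the PR settled on: drive letter stays in the netloc.
print(split_parts("surrealkv://C:/Users/foo/bar.db"))   # ('C:', '/Users/foo/bar.db', None)
```

Both shapes satisfy urllib; they differ only in where the drive letter ends up, which is what the backend then has to interpret.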

Tests

tests/test_surrealkv_url_normalization.py — 15 tests:

  • 11 unit tests for the normalizer:
    • Windows backslash → forward slash conversion
    • Already-normalized forward-slash URL passes through unchanged
    • Lowercase drive letter preserved
    • surrealkv+versioned:// and file:// schemes also normalized
    • POSIX URLs, memory://, ws://, https://, empty string all unchanged
    • Output of normalizer parses cleanly through urllib.parse.urlparse(...).port
  • 3 wiring tests confirming LedgerClient.__init__ calls the normalizer
  • 1 e2e test exercising the original surrealkv://{tmp_path/'ledger.db'} repro from the issue — connects, queries, and closes successfully
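The unit and wiring cases listed above can be exercised against a minimal sketch. This is a hypothetical reconstruction from the PR text, not the code in ledger/client.py: the LedgerClient here is a stand-in that models only URL handling in __init__, and the prefix check is case-sensitive, matching the startswith(...) behaviour CodeRabbit flags later in this thread.

```python
# Prefixes of URL schemes that carry a local filesystem path.
_EMBEDDED_PREFIXES = ("surrealkv://", "surrealkv+versioned://", "file://")


def normalize_surrealkv_url(url: str) -> str:
    """Hypothetical sketch: forward-slash the path of embedded-store URLs."""
    if not url.startswith(_EMBEDDED_PREFIXES):
        return url  # memory://, ws://, https:// and "" pass through
    # The scheme prefixes contain no backslashes, so replacing across the
    # whole string only touches the part after "://".
    return url.replace("\\", "/")


class LedgerClient:
    """Hypothetical stand-in: models only the URL handling in __init__."""

    def __init__(self, url: str) -> None:
        # Normalize once so every later connect sees a parseable URL.
        self.url = normalize_surrealkv_url(url)


print(LedgerClient(r"surrealkv://C:\Users\foo\bar.db").url)  # surrealkv://C:/Users/foo/bar.db
print(LedgerClient("memory://").url)                          # memory://
```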

Verification

  • All 15 new tests pass on Windows
  • All 5 test_schema_persistence.py tests now pass on Windows (was 0/5 collectable)
  • CI Linux green (POSIX path is unchanged because there are no backslashes to replace)

Test plan

  • urllib.parse no longer raises ValueError on the normalized URL
  • SurrealKV Rust backend accepts the normalized URL on Windows
  • No behaviour change on POSIX
  • No behaviour change for memory:// / ws:// / http://

Summary by CodeRabbit

  • New Features

    • Added URL normalization for embedded database connections, enabling proper handling of Windows file paths with backslashes in connection strings.
  • Tests

    • Added comprehensive test coverage for URL normalization across Windows and POSIX path formats, verifying correct parsing and connection initialization.

Issue BicameralAI#68: ``urllib.parse.urlparse("surrealkv://C:\Users\...")``
treats everything after the scheme as a netloc and raises
``ValueError: Port could not be cast to integer value`` when
``parsed.port`` is read. The SurrealDB Python SDK's ``Url`` wrapper
reads ``parsed.port`` on every connect, so passing an unmodified
Windows backslash path crashes every embedded test that builds its
URL from a ``tmp_path`` fixture (5 tests in test_schema_persistence.py).

Fix: ``normalize_surrealkv_url()`` in ledger/client.py replaces
backslashes with forward slashes inside ``surrealkv://``,
``surrealkv+versioned://``, and ``file://`` URLs. The forward-slash
form parses cleanly through ``urllib.parse`` (netloc=``C:``,
path=``/Users/...``, port=None, because nothing follows the colon and
urllib treats an empty port as no port) AND is accepted
by the SurrealKV Rust backend on Windows. The triple-slash file-URI
form was tested but rejected by the Rust backend with "invalid
filename" — the simpler backslash-replacement is the only form that
satisfies both layers.

``LedgerClient.__init__`` now normalizes the URL so callers don't have
to. POSIX URLs, ``memory://``, and remote URLs (``ws://``, ``http://``)
pass through unchanged because they contain no backslashes.

Tests:

- ``tests/test_surrealkv_url_normalization.py`` — 15 tests:
  - 11 unit tests for the normalizer (Windows backslash, already-
    normalized forward slash, lowercase drive, surrealkv+versioned,
    file:// scheme, POSIX, memory://, ws://, https://, empty, urllib
    round-trip)
  - 3 wiring tests confirming ``LedgerClient.__init__`` calls it
  - 1 e2e test exercising the original ``surrealkv://{tmp_path/'ledger.db'}``
    repro that connects, queries, and closes successfully

- All 5 ``tests/test_schema_persistence.py`` tests now pass on
  Windows (was 0/5 collectable, now 5/5 passing).

Verified locally on Windows: 20/20 (15 normalization + 5 schema_persistence).

coderabbitai Bot commented Apr 28, 2026

📝 Walkthrough


This pull request introduces URL normalization logic to handle Windows-style backslash paths in SurrealKV embedded database connections. A new normalize_surrealkv_url() function detects and converts backslashes to forward slashes for surrealkv://, surrealkv+versioned://, and file:// URLs, preventing urllib.parse port-parsing errors on Windows platforms. LedgerClient.__init__ applies this normalization before storing the URL.

Changes

  • URL Normalization (ledger/client.py): Adds a normalize_surrealkv_url() helper that detects Windows drive-letter and backslash patterns in SurrealKV-family URLs and converts backslashes to forward slashes. Updates LedgerClient.__init__ to normalize the url parameter before storing it, preventing downstream parsing errors.
  • Test Coverage (tests/test_surrealkv_url_normalization.py): New test module validating that normalization converts Windows backslash paths to forward slashes in surrealkv:// URLs, leaves non-SurrealKV schemes unchanged, produces URLs that parse without ValueError, and that LedgerClient.__init__ applies normalization while passing memory:// and ws:// through unchanged.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Windows paths once caused our client to stumble and cry,
Backslashes twisted in URLs reaching too high,
A normalization spell cast, forward slashes in place,
Now SurrealKV dances on every platform with grace! 🚀

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 17.65%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title 'fix(#68): normalize Windows backslashes in surrealkv:// URLs' directly and concisely describes the main change: normalizing backslashes in Windows paths within surrealkv URLs to fix issue #68.
  • Linked Issues check: ✅ Passed. The PR addresses issue #68 by implementing URL normalization that prevents urllib.parse from raising ValueError when accessing parsed.port on Windows paths, enabling the tests to run on Windows.
  • Out of Scope Changes check: ✅ Passed. All changes are within scope: the normalize_surrealkv_url() function and its integration into LedgerClient.__init__ directly address the Windows backslash parsing issue, and the tests validate this fix without introducing unrelated changes.


coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ledger/client.py (1)

11-27: ⚠️ Potential issue | 🟡 Minor

Handle scheme matching case-insensitively to avoid missed normalization.

startswith(...) is case-sensitive, so valid mixed/upper-case schemes (e.g., SURREALKV://...) skip normalization and can still trigger the Windows parse crash.

💡 Proposed fix
-import re
 from typing import Any
@@
-# Windows-drive-letter detector at the start of an embedded URL path.
-# Matches "C:\..." or "C:/...". Used to spot URLs that contain a
-# Windows-style file path which needs slash-normalization before
-# urllib.parse can read them.
-_WINDOWS_DRIVE_AT_PATH_START = re.compile(r"^([A-Za-z]):[\\/]")
-
-
 def normalize_surrealkv_url(url: str) -> str:
@@
-    if not url.startswith(("surrealkv://", "surrealkv+versioned://", "file://")):
+    scheme, sep, after_scheme = url.partition("://")
+    if not sep or scheme.lower() not in {"surrealkv", "surrealkv+versioned", "file"}:
         return url
 
-    # Find the path portion (everything after scheme://)
-    scheme_end = url.find("://") + len("://")
-    after_scheme = url[scheme_end:]
-
     # Only rewrite if the URL contains a Windows-style backslash or a
     # bare drive-letter prefix that would confuse urllib. Pure POSIX
     # paths and already-normalized Windows paths pass through unchanged.
     if "\\" not in after_scheme:
         return url
 
-    if not _WINDOWS_DRIVE_AT_PATH_START.match(after_scheme):
-        # Has backslashes but no drive letter — likely a malformed URL,
-        # but we fix the slashes anyway to give urllib a fighting chance.
-        return url[:scheme_end] + after_scheme.replace("\\", "/")
-
-    return url[:scheme_end] + after_scheme.replace("\\", "/")
+    return f"{scheme}://" + after_scheme.replace("\\", "/")

Also applies to: 57-75
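A standalone sketch of the case-insensitive variant, assuming the same three schemes. It follows the partition-based approach in the suggestion but builds the result by concatenation, which keeps it valid on Python versions before 3.12 (PEP 701 is what first allowed a backslash inside an f-string replacement field):

```python
from urllib.parse import urlparse

# Schemes whose URLs carry a local filesystem path (compared case-insensitively).
_EMBEDDED_SCHEMES = frozenset({"surrealkv", "surrealkv+versioned", "file"})


def normalize_surrealkv_url(url: str) -> str:
    """Replace backslashes with forward slashes in embedded-store URLs.

    Scheme matching is case-insensitive because URL schemes are
    case-insensitive (RFC 3986, section 3.1).
    """
    scheme, sep, after_scheme = url.partition("://")
    if not sep or scheme.lower() not in _EMBEDDED_SCHEMES:
        return url  # remote, memory:// or scheme-less input: untouched
    if "\\" not in after_scheme:
        return url  # POSIX or already-normalized paths pass through
    # Concatenation instead of an f-string: a backslash inside an f-string
    # replacement field is a SyntaxError before Python 3.12.
    return scheme + "://" + after_scheme.replace("\\", "/")


fixed = normalize_surrealkv_url(r"SURREALKV://C:\Users\foo\bar.db")
print(fixed)                 # SURREALKV://C:/Users/foo/bar.db
print(urlparse(fixed).port)  # None
```

Scheme casing is preserved in the output, matching the expected value in the suggested test; urllib lower-cases the scheme when it parses the URL anyway.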

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ledger/client.py` around lines 11 - 27, The scheme checks that call
startswith(...) are case-sensitive and will miss mixed/upper-case schemes (e.g.,
"SURREALKV://"); update those checks to match case-insensitively by normalizing
the string first (e.g., use url.lower().startswith(...) or
url.casefold().startswith(...)) wherever scheme detection occurs (including the
logic that uses _WINDOWS_DRIVE_AT_PATH_START and the startswith checks around
lines 57-75). Ensure you apply the same normalization consistently for all
scheme comparisons so Windows-drive-path normalization always runs.
🧹 Nitpick comments (1)
tests/test_surrealkv_url_normalization.py (1)

29-68: Add a regression test for mixed/upper-case schemes.

Given URL schemes are case-insensitive, a test here would prevent reintroducing the Windows crash via casing variants.

✅ Suggested test case
 class TestNormalizeSurrealKVURL:
@@
     def test_empty_string_unchanged(self) -> None:
         assert normalize_surrealkv_url("") == ""
+
+    def test_uppercase_scheme_is_normalised(self) -> None:
+        out = normalize_surrealkv_url(r"SURREALKV://C:\Users\foo\bar.db")
+        assert out == "SURREALKV://C:/Users/foo/bar.db"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_surrealkv_url_normalization.py` around lines 29 - 68, Add a
regression test ensuring normalize_surrealkv_url handles mixed- or upper-case
schemes (since schemes are case-insensitive) to prevent Windows path crash on
casing variants; add a new test method in TestNormalizeSurrealKVURL that calls
normalize_surrealkv_url with inputs like "SuRrEaLkV://C:\path\file.db" and
"FILE://C:\path\file.db" (and a mixed-case version for the versioned scheme) and
asserts the returned URL normalises back to using forward slashes and preserves
drive-letter casing as expected; reference normalize_surrealkv_url and add the
test alongside the existing tests in tests/test_surrealkv_url_normalization.py.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 97494dda-b06f-47df-a268-ef9cc4ae8c48

📥 Commits

Reviewing files that changed from the base of the PR and between a340750 and 89e1d1b.

📒 Files selected for processing (2)
  • ledger/client.py
  • tests/test_surrealkv_url_normalization.py


Labels

bug Something isn't working


Development

Successfully merging this pull request may close these issues.

Test suite: 5 ValueError 'Port could not be cast' from urllib parsing surrealkv:// URL with Windows path
