Parallel Worker Pool for VlConverter + V8 Platform Init Safety by jonmmease · Pull Request #242 · vega/vl-convert

jonmmease · 2026-02-21T01:19:54Z

Parallel Worker Pool for VlConverter + V8 Platform Init Safety

Summary

Introduces a configurable parallel worker pool for VlConverter, Python worker-count APIs (set_num_workers / get_num_workers), opt-in warmup APIs (warm_up / warm_up_workers) to pre-initialize workers, and a proactive V8 platform initialization guard that prevents a class of sporadic SIGSEGV crashes when spawning multiple Deno isolates from a multithreaded process. Default behavior is unchanged (1 worker, lazy startup unless warmup is called).

Also adds exponential-backoff retry for remote image loading (image_loading.rs) — a pre-existing reliability gap surfaced by CI while testing this PR.

Motivation

Background: The #206 segfault and the #237 workaround

Issue #206 reported a segfault when creating multiple VlConverter instances — each instance was spawning its own OS thread with its own V8 isolate, and the second isolate crashed inside JsRuntime::new_inner. PR #237 fixed the crash by collapsing to a single shared global worker thread (VL_CONVERTER_RUNTIME), so only one V8 isolate ever existed in the process. That fixed the segfault — but it made all conversions from all instances serialize through the same thread, with no path to concurrency.

What was actually causing the crash

Since V8 11.6, V8 enforces W^X on JIT pages using Intel/AMD Memory Protection Keys for Userspace (PKU/MPK). PKU state is stored in the per-thread PKRU register, which is inherited from parent to child at pthread_create() time. V8 writes the correct PKRU state during platform initialization. Any thread not descended from the platform-initializing thread gets wrong PKRU state for V8's JIT pages — causing SIGSEGV not at startup, but sporadically later when the JIT compiler activates.

The original multi-instance code created new worker threads from arbitrary calling threads (not the platform-initializing thread), so V8's JIT pages were inaccessible from those workers. The global-singleton workaround (#237) was immune because only one worker thread ever existed, but it gave up concurrency to get there.

This PR: the proper fix enables multiple workers

By calling init_platform once on the parent thread (via ensure_v8_platform_initialized) before spawning any worker threads, all workers are descendants of the platform-initializing thread and inherit correct PKRU state. This is a proactive workaround, not runtime detection — it unconditionally calls init_platform once via a std::sync::Once guard. With this guard in place, N workers can safely coexist, which removes the architectural bottleneck from #237 and enables genuine parallel execution.

References:

deno_core::JsRuntime docs: "Since V8 11.6, all runtimes must have a common parent thread that initialized the V8 platform."
deno_core PR #471: Segfault fix on PKU-enabled CPUs
rusty_v8 issue #1381: Direct SIGSEGV report on Intel 13th-gen hardware
deno issue #20495: "Unless V8 platform is initialized on main thread the segfaults start appearing once JIT kicks in."

What Changed

Rust — `vl-convert-rs/src/converter.rs`

VlConverter is now Clone via Arc<VlConverterInner>. All conversion methods take &self instead of &mut self.

New API:

VlConverter::with_num_workers(n: usize) -> Result<Self, AnyError> — construct with a specific worker count (validated ≥ 1)
VlConverter::num_workers(&self) -> usize — query configured worker count
VlConverter::warm_up(&self) -> Result<(), AnyError> — optionally pre-spawn workers before first conversion request
VlConverter::new() — unchanged, defaults to 1 worker

Clone semantics: #[derive(Clone)] on VlConverter clones the Arc<VlConverterInner> — it increments the reference count, not the pool. The clone and the original share the same worker pool, bundle cache, and configuration. The pool is torn down only when all clones are dropped. To get an independent pool (separate workers, separate memory), construct a new VlConverter::with_num_workers(n) rather than cloning.

Per-instance pools (behavioral change from #237): Previously all VlConverter instances shared a single global VL_CONVERTER_RUNTIME thread. Now each instance owns its own worker pool — new() calls no longer share state. Cloning is the way to get multiple handles to one pool.

Worker pool model: One dedicated OS thread per worker, each with a new_current_thread() Tokio runtime + LocalSet (required because Deno's MainWorker uses !Send types). Per-worker bounded channels (capacity 32) provide backpressure. Dispatch uses a least-outstanding strategy: each worker owns an Arc<AtomicUsize> counter; an OutstandingTicket RAII guard increments the counter at dispatch time and decrements it when the worker finishes the command, covering both queue time and execution time. next_sender picks the worker with the lowest outstanding count; a rotating dispatch_cursor breaks ties without index-0 bias. Per-worker dispatch (vs. a single shared queue) is required because each Deno runtime is stateful and cannot have work redistributed mid-execution. Pool startup remains lazy by default, with a new explicit warm_up() path that triggers the same startup handshake ahead of first request.

V8 init guard:

fn ensure_v8_platform_initialized() {
    static V8_INIT: Once = Once::new();
    // V8 11.6+ PKU requirement: all worker threads must descend from the thread
    // that initialized the platform. PKRU register is inherited at pthread_create()
    // time; threads not in the lineage get wrong JIT page permissions → SIGSEGV.
    V8_INIT.call_once(|| deno_core::JsRuntime::init_platform(None, false));
}

Called at the start of spawn_worker_pool, before any worker thread is created.

Send retry: If a worker's channel is closed (worker died), send_command_with_retry resets the pool and retries once. If the retry also fails, the error is returned to the caller.

Worker-local transfer state: JSON args / MessagePack scenegraph payloads are now stored per worker in Deno OpState (WorkerTransferState), removing process-wide contention on JSON_ARGS / MSGPACK_RESULTS / NEXT_ID. JsonArgGuard and MsgpackResultGuard remain RAII-based, but now clean up worker-local state on all error paths.

Rust — `vl-convert-rs/src/image_loading.rs`

The existing custom_string_resolver fetched remote images with a single reqwest call and silently dropped the image on any error. CI exposed this gap: the test_pdf[remote_images] test makes two fetches to Wikimedia in the same run — one for Vega, one for Vega-Lite — and the second was being rate-limited (HTTP 429 / 503), causing the Firefox logo to vanish from the PDF output.

The fix wraps the fetch with backon exponential-backoff retry:

(|| async {
    let response = REQWEST_CLIENT.get(href).send().await?;
    match response.status() {
        StatusCode::OK => Ok((Some(response.bytes().await?), content_type)),
        s if s.is_server_error() || s == StatusCode::TOO_MANY_REQUESTS => {
            Err(response.error_for_status().unwrap_err()) // signal: retry
        }
        s => {
            error!("Failed to load image from url {} with status {:?}", href, s);
            Ok((None, None)) // signal: permanent failure, stop retrying
        }
    }
})
.retry(
    ExponentialBuilder::default()
        .with_min_delay(Duration::from_millis(500))
        .with_max_delay(Duration::from_secs(10))
        .with_max_times(4),
)
.when(|e| {
    e.status()
        .map(|s| s.is_server_error() || s == StatusCode::TOO_MANY_REQUESTS)
        .unwrap_or(true) // network errors (no status code) are always retried
})

Retry policy:

Retried: TCP/network errors, HTTP 429, HTTP 5xx — up to 4 times with 500ms–10s exponential backoff + jitter
Not retried: HTTP 4xx (except 429) — logged immediately, returns (None, None) via Ok so backon does not retry

Python — `vl-convert-python/src/lib.rs` + `vl_convert.pyi`

The global converter state changed from Mutex<VlConverterRs> to RwLock<Arc<VlConverterRs>>. The read lock is held only briefly to clone the Arc; conversion runs after the lock is released, so concurrent conversion calls don't block each other. set_num_workers write-locks briefly to swap in a new Arc; in-flight callers hold the old Arc and complete normally.

New public API:

vl_convert.set_num_workers(num_workers: int) -> None
vl_convert.get_num_workers() -> int
vl_convert.warm_up_workers() -> None

GIL release: Vega-Lite and Vega conversion functions (those backed by Deno/Tokio) now call py.allow_threads(|| ...) to release the Python GIL during the blocking Rust execution, enabling true concurrency when called from Python threads. (Note: svg_to_png, svg_to_jpeg, svg_to_pdf are synchronous and not affected.)

Not asyncio-compatible. These functions call block_on internally. Calling from an asyncio event loop without an executor will stall the loop. Use loop.run_in_executor(None, ...) from async contexts.

Tests and Docs

New Rust tests: test_with_num_workers_rejects_zero, test_num_workers_reports_configured_value, test_warm_up_spawns_pool_without_request, test_warm_up_is_idempotent, test_parallel_conversions_with_shared_converter.

New Python tests (test_workers.py): default count, zero-worker rejection, 16 parallel conversions with 4 workers, warmup-before-submit scenario, and set_num_workers during concurrent submissions.

Python image comparison tests now save failed images to tests/failed/ on assertion failure; CI uploads them as an artifact on job failure for post-mortem diagnosis.

README: added "Configure Worker Parallelism" section.

User-Facing Impact

No breaking changes. All existing Python and CLI APIs work identically.
Default is unchanged: 1 worker, same behavior as before.
Startup behavior: Worker pools start lazily by default, and callers can opt into eager startup via warm_up() / warm_up_workers() to avoid first-request initialization latency.
Rust API: Conversion methods changed from &mut self to &self (backwards-compatible; enables shared access without exclusive borrow).
Python API: set_num_workers / get_num_workers / warm_up_workers are additive.
No CLI flag changes. The CLI creates one converter per invocation.

Memory note: Each worker is a full Deno/V8 runtime with Vega-Lite loaded. Pool state is per VlConverter instance — reuse or clone one converter handle rather than constructing multiple instances unless independent pools are intentional. See the README's "Parallel Workers" section for usage guidance.

Review Tour

Core Change

vl-convert-rs/src/converter.rs

New types near top: WorkerPool (~line 60), ensure_v8_platform_initialized (~line 89). VlConverterInner/VlConverter are further down (~lines 1739–1750). Worker pool implementation: spawn_worker_pool, get_or_spawn_sender, warm_up, send_command_with_retry, request. RAII guards (JsonArgGuard, MsgpackResultGuard) at top of file.

vl-convert-rs/src/image_loading.rs — Retry logic with backon. No behavior change on the happy path; only affects failed/throttled fetches.

Python Integration

vl-convert-python/src/lib.rs

VL_CONVERTER declaration (global RwLock<Arc<>>), converter_read_handle, run_converter_future (clones the Arc, releases GIL, drives the future), set_num_workers / get_num_workers / warm_up_workers.

Tests and Types

vl-convert-python/tests/test_workers.py — New file. Five focused tests for the worker API.

vl-convert-python/vl_convert.pyi — Type stubs for set_num_workers, get_num_workers, and warm_up_workers.

Mechanical

Minor let mut → let changes in vl-convert/src/main.rs, test_specs.rs, test_themes.rs, and docs-only addition to README.md.

Replace the single-threaded VlConverterRuntime with a WorkerPool that supports configurable numbers of Deno workers. Each worker gets its own tokio LocalSet and JsRuntime, providing true parallelism for concurrent conversion requests. Key changes: - WorkerPool struct with round-robin sender selection via AtomicUsize - spawn_worker_pool(num_workers) creates N independent worker threads - VlConverter is now Clone (wraps Arc<VlConverterInner>) - with_num_workers() constructor and num_workers() accessor - V8 platform initialization is one-time via Once/call_once - handle_command() helper centralizes command dispatch in worker loop - Worker startup uses sync channel to propagate initialization errors Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

VlConverter is now Clone and all conversion methods take `&self` since internal state is managed via Arc<VlConverterInner>. Drop the `mut` binding in tests and CLI callers to resolve unused_mut warnings. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Refactor the Python bindings to use a thread-safe, parallel VlConverter: - Switch VL_CONVERTER from Mutex<VlConverterRs> to RwLock<Arc<VlConverterRs>> so multiple conversion futures can run concurrently - Add converter_read_handle() and run_converter_future() helpers to centralize the read-lock + block_on pattern across all conversion functions - Allow Python's GIL to be released during blocking Deno calls via py.allow_threads() inside run_converter_future - Add set_num_workers(n) / get_num_workers() to configure and inspect the worker pool at runtime; set_num_workers replaces the global converter with a freshly-spawned one - Expose the new functions in vl_convert.pyi with NumPy-style docstrings Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Verify set/get_num_workers behavior: - Default is 1 worker - set_num_workers rejects zero - Workers can be reconfigured while conversions are running - Parallel conversions succeed with multiple configured workers Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add a "Parallel Workers" section showing how to use set_num_workers() and get_num_workers() to scale throughput for batch workloads. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

On failure, check_png() now writes the actual and expected PNGs to vl-convert-python/tests/failed/ so rendering differences can be inspected. The macOS/Windows Python CI job uploads that directory as an artifact when tests fail. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Wraps the reqwest fetch in `custom_string_resolver` with `backon` exponential backoff (500ms–10s, up to 4 retries). Retries on network errors and transient HTTP failures (429, 5xx); permanent errors (404, 403, etc.) short-circuit immediately without retrying. Fixes intermittent CI failures where Wikimedia rate-limits a second image fetch within the same test run. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

We only need tokio-sleep + std; gloo-timers is the WASM timer backend and has no use in this crate. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…selection Replaces the AtomicBool dispatch_pending approach with a proper outstanding-request counter per worker. The OutstandingTicket RAII guard increments on selection and decrements on drop, so the counter accurately covers three lifecycle phases: waiting on a full channel, queued in the channel, and actively executing on the worker thread. Key design points: - QueuedCommand wraps VlConvertCommand + OutstandingTicket so the ticket travels with the command through the MPSC channel; the worker drops _ticket after handle_command completes, not when it dequeues - dispatch_cursor rotates the scan start to break ties without index-0 bias - Early-exit when outstanding == 0 avoids scanning remaining workers - Cancellation-safe: if a send future is dropped, the ticket drops with it Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

jonmmease

completed my review

jonmmease · 2026-02-21T13:42:22Z

.github/workflows/CI.yml

+        if: failure()
+        with:
+          name: failed-images-python-${{ matrix.options[0] }}
+          path: vl-convert-python/tests/failed/


working out intermittent image test failures was a side quest that lead to using the backon retry logic

jonmmease · 2026-02-21T13:43:28Z

vl-convert-rs/src/image_loading.rs

retry logic for fetching images from urls was helpful for improving ci reliability, and should help end user usage as well

jonmmease · 2026-02-21T13:44:13Z

vl-convert-rs/src/converter.rs

+fn ensure_v8_platform_initialized() {
+    static V8_INIT: Once = Once::new();
+    V8_INIT.call_once(|| deno_core::JsRuntime::init_platform(None, false));
+}


this is the key to avoiding the segfault

jonmmease and others added 13 commits February 20, 2026 18:26

docs: Document parallel worker configuration in Python README

fb628e3

Add a "Parallel Workers" section showing how to use set_num_workers() and get_num_workers() to scale throughput for batch workloads. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

converter: move JSON/msgpack transfer state to per-worker OpState

f8c6d9e

feat: add opt-in worker warmup APIs

48654f9

chore: update bundled licenses for backon + gloo-timers

6f7d4ab

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: disable backon default features to drop wasm-only gloo-timers dep

f666e33

We only need tokio-sleep + std; gloo-timers is the WASM timer backend and has no use in this crate. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: remove gloo-timers from bundled licenses

df00ee5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

jonmmease commented Feb 21, 2026

View reviewed changes

jonmmease marked this pull request as ready for review February 21, 2026 15:11

jonmmease merged commit 26ce1c7 into main Feb 21, 2026
12 checks passed

jonmmease deleted the jonmmease/worker-pool branch February 21, 2026 15:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parallel Worker Pool for VlConverter + V8 Platform Init Safety#242

Parallel Worker Pool for VlConverter + V8 Platform Init Safety#242
jonmmease merged 13 commits intomainfrom
jonmmease/worker-pool

jonmmease commented Feb 21, 2026 •

edited

Loading

Uh oh!

jonmmease left a comment

Uh oh!

jonmmease Feb 21, 2026

Uh oh!

jonmmease Feb 21, 2026

Uh oh!

jonmmease Feb 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jonmmease commented Feb 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Parallel Worker Pool for VlConverter + V8 Platform Init Safety

Summary

Motivation

Background: The #206 segfault and the #237 workaround

What was actually causing the crash

This PR: the proper fix enables multiple workers

What Changed

Rust — vl-convert-rs/src/converter.rs

Rust — vl-convert-rs/src/image_loading.rs

Python — vl-convert-python/src/lib.rs + vl_convert.pyi

Tests and Docs

User-Facing Impact

Review Tour

Core Change

Python Integration

Tests and Types

Mechanical

Uh oh!

jonmmease left a comment

Choose a reason for hiding this comment

Uh oh!

jonmmease Feb 21, 2026

Choose a reason for hiding this comment

Uh oh!

jonmmease Feb 21, 2026

Choose a reason for hiding this comment

Uh oh!

jonmmease Feb 21, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jonmmease commented Feb 21, 2026 •

edited

Loading

Rust — `vl-convert-rs/src/converter.rs`

Rust — `vl-convert-rs/src/image_loading.rs`

Python — `vl-convert-python/src/lib.rs` + `vl_convert.pyi`