Skip to content

Parallel Worker Pool for VlConverter + V8 Platform Init Safety#242

Merged
jonmmease merged 13 commits intomainfrom
jonmmease/worker-pool
Feb 21, 2026
Merged

Parallel Worker Pool for VlConverter + V8 Platform Init Safety#242
jonmmease merged 13 commits intomainfrom
jonmmease/worker-pool

Conversation

@jonmmease
Copy link
Collaborator

@jonmmease jonmmease commented Feb 21, 2026

Parallel Worker Pool for VlConverter + V8 Platform Init Safety

Summary

Introduces a configurable parallel worker pool for VlConverter, Python worker-count APIs (set_num_workers / get_num_workers), opt-in warmup APIs (warm_up / warm_up_workers) to pre-initialize workers, and a proactive V8 platform initialization guard that prevents a class of sporadic SIGSEGV crashes when spawning multiple Deno isolates from a multithreaded process. Default behavior is unchanged (1 worker, lazy startup unless warmup is called).

Also adds exponential-backoff retry for remote image loading (image_loading.rs) — a pre-existing reliability gap surfaced by CI while testing this PR.

Motivation

Background: The #206 segfault and the #237 workaround

Issue #206 reported a segfault when creating multiple VlConverter instances — each instance was spawning its own OS thread with its own V8 isolate, and the second isolate crashed inside JsRuntime::new_inner. PR #237 fixed the crash by collapsing to a single shared global worker thread (VL_CONVERTER_RUNTIME), so only one V8 isolate ever existed in the process. That fixed the segfault — but it made all conversions from all instances serialize through the same thread, with no path to concurrency.

What was actually causing the crash

Since V8 11.6, V8 enforces W^X on JIT pages using Intel/AMD Memory Protection Keys for Userspace (PKU/MPK). PKU state is stored in the per-thread PKRU register, which is inherited from parent to child at pthread_create() time. V8 writes the correct PKRU state during platform initialization. Any thread not descended from the platform-initializing thread gets wrong PKRU state for V8's JIT pages — causing SIGSEGV not at startup, but sporadically later when the JIT compiler activates.

The original multi-instance code created new worker threads from arbitrary calling threads (not the platform-initializing thread), so V8's JIT pages were inaccessible from those workers. The global-singleton workaround (#237) was immune because only one worker thread ever existed, but it gave up concurrency to get there.

This PR: the proper fix enables multiple workers

By calling init_platform once on the parent thread (via ensure_v8_platform_initialized) before spawning any worker threads, all workers are descendants of the platform-initializing thread and inherit correct PKRU state. This is a proactive workaround, not runtime detection — it unconditionally calls init_platform once via a std::sync::Once guard. With this guard in place, N workers can safely coexist, which removes the architectural bottleneck from #237 and enables genuine parallel execution.

References:

What Changed

Rust — vl-convert-rs/src/converter.rs

VlConverter is now Clone via Arc<VlConverterInner>. All conversion methods take &self instead of &mut self.

New API:

  • VlConverter::with_num_workers(n: usize) -> Result<Self, AnyError> — construct with a specific worker count (validated ≥ 1)
  • VlConverter::num_workers(&self) -> usize — query configured worker count
  • VlConverter::warm_up(&self) -> Result<(), AnyError> — optionally pre-spawn workers before first conversion request
  • VlConverter::new() — unchanged, defaults to 1 worker

Clone semantics: #[derive(Clone)] on VlConverter clones the Arc<VlConverterInner> — it increments the reference count, not the pool. The clone and the original share the same worker pool, bundle cache, and configuration. The pool is torn down only when all clones are dropped. To get an independent pool (separate workers, separate memory), construct a new VlConverter::with_num_workers(n) rather than cloning.

Per-instance pools (behavioral change from #237): Previously all VlConverter instances shared a single global VL_CONVERTER_RUNTIME thread. Now each instance owns its own worker pool — new() calls no longer share state. Cloning is the way to get multiple handles to one pool.

Worker pool model: One dedicated OS thread per worker, each with a new_current_thread() Tokio runtime + LocalSet (required because Deno's MainWorker uses !Send types). Per-worker bounded channels (capacity 32) provide backpressure. Dispatch uses a least-outstanding strategy: each worker owns an Arc<AtomicUsize> counter; an OutstandingTicket RAII guard increments the counter at dispatch time and decrements it when the worker finishes the command, covering both queue time and execution time. next_sender picks the worker with the lowest outstanding count; a rotating dispatch_cursor breaks ties without index-0 bias. Per-worker dispatch (vs. a single shared queue) is required because each Deno runtime is stateful and cannot have work redistributed mid-execution. Pool startup remains lazy by default, with a new explicit warm_up() path that triggers the same startup handshake ahead of first request.

V8 init guard:

fn ensure_v8_platform_initialized() {
    static V8_INIT: Once = Once::new();
    // V8 11.6+ PKU requirement: all worker threads must descend from the thread
    // that initialized the platform. PKRU register is inherited at pthread_create()
    // time; threads not in the lineage get wrong JIT page permissions → SIGSEGV.
    V8_INIT.call_once(|| deno_core::JsRuntime::init_platform(None, false));
}

Called at the start of spawn_worker_pool, before any worker thread is created.

Send retry: If a worker's channel is closed (worker died), send_command_with_retry resets the pool and retries once. If the retry also fails, the error is returned to the caller.

Worker-local transfer state: JSON args / MessagePack scenegraph payloads are now stored per worker in Deno OpState (WorkerTransferState), removing process-wide contention on JSON_ARGS / MSGPACK_RESULTS / NEXT_ID. JsonArgGuard and MsgpackResultGuard remain RAII-based, but now clean up worker-local state on all error paths.

Rust — vl-convert-rs/src/image_loading.rs

The existing custom_string_resolver fetched remote images with a single reqwest call and silently dropped the image on any error. CI exposed this gap: the test_pdf[remote_images] test makes two fetches to Wikimedia in the same run — one for Vega, one for Vega-Lite — and the second was being rate-limited (HTTP 429 / 503), causing the Firefox logo to vanish from the PDF output.

The fix wraps the fetch with backon exponential-backoff retry:

(|| async {
    let response = REQWEST_CLIENT.get(href).send().await?;
    match response.status() {
        StatusCode::OK => Ok((Some(response.bytes().await?), content_type)),
        s if s.is_server_error() || s == StatusCode::TOO_MANY_REQUESTS => {
            Err(response.error_for_status().unwrap_err()) // signal: retry
        }
        s => {
            error!("Failed to load image from url {} with status {:?}", href, s);
            Ok((None, None)) // signal: permanent failure, stop retrying
        }
    }
})
.retry(
    ExponentialBuilder::default()
        .with_min_delay(Duration::from_millis(500))
        .with_max_delay(Duration::from_secs(10))
        .with_max_times(4),
)
.when(|e| {
    e.status()
        .map(|s| s.is_server_error() || s == StatusCode::TOO_MANY_REQUESTS)
        .unwrap_or(true) // network errors (no status code) are always retried
})

Retry policy:

  • Retried: TCP/network errors, HTTP 429, HTTP 5xx — up to 4 times with 500ms–10s exponential backoff + jitter
  • Not retried: HTTP 4xx (except 429) — logged immediately, returns (None, None) via Ok so backon does not retry

Python — vl-convert-python/src/lib.rs + vl_convert.pyi

The global converter state changed from Mutex<VlConverterRs> to RwLock<Arc<VlConverterRs>>. The read lock is held only briefly to clone the Arc; conversion runs after the lock is released, so concurrent conversion calls don't block each other. set_num_workers write-locks briefly to swap in a new Arc; in-flight callers hold the old Arc and complete normally.

New public API:

vl_convert.set_num_workers(num_workers: int) -> None
vl_convert.get_num_workers() -> int
vl_convert.warm_up_workers() -> None

GIL release: Vega-Lite and Vega conversion functions (those backed by Deno/Tokio) now call py.allow_threads(|| ...) to release the Python GIL during the blocking Rust execution, enabling true concurrency when called from Python threads. (Note: svg_to_png, svg_to_jpeg, svg_to_pdf are synchronous and not affected.)

Not asyncio-compatible. These functions call block_on internally. Calling from an asyncio event loop without an executor will stall the loop. Use loop.run_in_executor(None, ...) from async contexts.

Tests and Docs

New Rust tests: test_with_num_workers_rejects_zero, test_num_workers_reports_configured_value, test_warm_up_spawns_pool_without_request, test_warm_up_is_idempotent, test_parallel_conversions_with_shared_converter.

New Python tests (test_workers.py): default count, zero-worker rejection, 16 parallel conversions with 4 workers, warmup-before-submit scenario, and set_num_workers during concurrent submissions.

Python image comparison tests now save failed images to tests/failed/ on assertion failure; CI uploads them as an artifact on job failure for post-mortem diagnosis.

README: added "Configure Worker Parallelism" section.

User-Facing Impact

  • No breaking changes. All existing Python and CLI APIs work identically.
  • Default is unchanged: 1 worker, same behavior as before.
  • Startup behavior: Worker pools start lazily by default, and callers can opt into eager startup via warm_up() / warm_up_workers() to avoid first-request initialization latency.
  • Rust API: Conversion methods changed from &mut self to &self (backwards-compatible; enables shared access without exclusive borrow).
  • Python API: set_num_workers / get_num_workers / warm_up_workers are additive.
  • No CLI flag changes. The CLI creates one converter per invocation.

Memory note: Each worker is a full Deno/V8 runtime with Vega-Lite loaded. Pool state is per VlConverter instance — reuse or clone one converter handle rather than constructing multiple instances unless independent pools are intentional. See the README's "Parallel Workers" section for usage guidance.

Review Tour

Core Change

vl-convert-rs/src/converter.rs

New types near top: WorkerPool (~line 60), ensure_v8_platform_initialized (~line 89). VlConverterInner/VlConverter are further down (~lines 1739–1750). Worker pool implementation: spawn_worker_pool, get_or_spawn_sender, warm_up, send_command_with_retry, request. RAII guards (JsonArgGuard, MsgpackResultGuard) at top of file.

vl-convert-rs/src/image_loading.rs — Retry logic with backon. No behavior change on the happy path; only affects failed/throttled fetches.

Python Integration

vl-convert-python/src/lib.rs

VL_CONVERTER declaration (global RwLock<Arc<>>), converter_read_handle, run_converter_future (clones the Arc, releases GIL, drives the future), set_num_workers / get_num_workers / warm_up_workers.

Tests and Types

vl-convert-python/tests/test_workers.py — New file. Five focused tests for the worker API.

vl-convert-python/vl_convert.pyi — Type stubs for set_num_workers, get_num_workers, and warm_up_workers.

Mechanical

Minor let mutlet changes in vl-convert/src/main.rs, test_specs.rs, test_themes.rs, and docs-only addition to README.md.

jonmmease and others added 13 commits February 20, 2026 18:26
Replace the single-threaded VlConverterRuntime with a WorkerPool that
supports configurable numbers of Deno workers. Each worker gets its own
tokio LocalSet and JsRuntime, providing true parallelism for concurrent
conversion requests.

Key changes:
- WorkerPool struct with round-robin sender selection via AtomicUsize
- spawn_worker_pool(num_workers) creates N independent worker threads
- VlConverter is now Clone (wraps Arc<VlConverterInner>)
- with_num_workers() constructor and num_workers() accessor
- V8 platform initialization is one-time via Once/call_once
- handle_command() helper centralizes command dispatch in worker loop
- Worker startup uses sync channel to propagate initialization errors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
VlConverter is now Clone and all conversion methods take `&self` since
internal state is managed via Arc<VlConverterInner>. Drop the `mut`
binding in tests and CLI callers to resolve unused_mut warnings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Refactor the Python bindings to use a thread-safe, parallel VlConverter:

- Switch VL_CONVERTER from Mutex<VlConverterRs> to RwLock<Arc<VlConverterRs>>
  so multiple conversion futures can run concurrently
- Add converter_read_handle() and run_converter_future() helpers to
  centralize the read-lock + block_on pattern across all conversion functions
- Allow Python's GIL to be released during blocking Deno calls via
  py.allow_threads() inside run_converter_future
- Add set_num_workers(n) / get_num_workers() to configure and inspect
  the worker pool at runtime; set_num_workers replaces the global
  converter with a freshly-spawned one
- Expose the new functions in vl_convert.pyi with NumPy-style docstrings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Verify set/get_num_workers behavior:
- Default is 1 worker
- set_num_workers rejects zero
- Workers can be reconfigured while conversions are running
- Parallel conversions succeed with multiple configured workers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a "Parallel Workers" section showing how to use set_num_workers()
and get_num_workers() to scale throughput for batch workloads.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On failure, check_png() now writes the actual and expected PNGs to
vl-convert-python/tests/failed/ so rendering differences can be
inspected. The macOS/Windows Python CI job uploads that directory
as an artifact when tests fail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wraps the reqwest fetch in `custom_string_resolver` with `backon`
exponential backoff (500ms–10s, up to 4 retries). Retries on network
errors and transient HTTP failures (429, 5xx); permanent errors (404,
403, etc.) short-circuit immediately without retrying.

Fixes intermittent CI failures where Wikimedia rate-limits a second
image fetch within the same test run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
We only need tokio-sleep + std; gloo-timers is the WASM timer backend
and has no use in this crate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…selection

Replaces the AtomicBool dispatch_pending approach with a proper
outstanding-request counter per worker. The OutstandingTicket RAII guard
increments on selection and decrements on drop, so the counter accurately
covers three lifecycle phases: waiting on a full channel, queued in the
channel, and actively executing on the worker thread.

Key design points:
- QueuedCommand wraps VlConvertCommand + OutstandingTicket so the ticket
  travels with the command through the MPSC channel; the worker drops
  _ticket after handle_command completes, not when it dequeues
- dispatch_cursor rotates the scan start to break ties without index-0 bias
- Early-exit when outstanding == 0 avoids scanning remaining workers
- Cancellation-safe: if a send future is dropped, the ticket drops with it

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Collaborator Author

@jonmmease jonmmease left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

completed my review

if: failure()
with:
name: failed-images-python-${{ matrix.options[0] }}
path: vl-convert-python/tests/failed/
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

working out intermittent image test failures was a side quest that lead to using the backon retry logic

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retry logic for fetching images from urls was helpful for improving ci reliability, and should help end user usage as well

fn ensure_v8_platform_initialized() {
static V8_INIT: Once = Once::new();
V8_INIT.call_once(|| deno_core::JsRuntime::init_platform(None, false));
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the key to avoiding the segfault

@jonmmease jonmmease marked this pull request as ready for review February 21, 2026 15:11
@jonmmease jonmmease merged commit 26ce1c7 into main Feb 21, 2026
12 checks passed
@jonmmease jonmmease deleted the jonmmease/worker-pool branch February 21, 2026 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant