
Conversation

@pschyska commented Oct 2, 2025

Proposed changes

As mentioned in #110, this is my work on making Scheduler.schedule() thread-safe.

1. Problem statement

In ngx-rust, the async executor is built on async-task.
async-task has strict safety requirements:

The Scheduler must either:

  1. be Send + Sync, or
  2. guarantee that all:
    • Runnable::run() calls,
    • Waker invocations,
    • Waker drops
      happen on the same thread.

The previous implementation violated this.

Previous behaviour

  • The Scheduler was declared:

    unsafe impl Send for Scheduler {}
    unsafe impl Sync for Scheduler {}
  • schedule() could be called from any thread:

    • tokio runtime threads,
    • I/O completion threads,
    • any foreign thread that holds a Waker.
  • And critically, schedule() would sometimes call runnable.run() directly on that foreign thread.

That breaks both async-task’s and nginx’s assumptions:

  • nginx objects like ngx_http_request_t are explicitly non-Send and must stay on the event loop thread,
  • yet the executor was polling futures that captured these objects from arbitrary threads.

This is undefined behaviour. In practice we observed:

  • segfaults,
  • memory corruption,
  • sporadic crashes under load.

This is not a theoretical concern; it’s a real safety bug.


2. What this PR changes

The fix is intentionally minimal and local to the executor, and keeps existing APIs intact.
The performance impact on Futures implemented in terms of native nginx I/O, such as
nginx-acme's PeerConnection, is low: since the Wakers are called from the event loop
thread, we stay on the fast path and simply .run() them, as before. The only visible
change is the ngx_thread_tid check, which should be cheap.

2.1. Scheduler design

The new design introduces:

  • a global Scheduler with a crossbeam_channel queue:

    struct Scheduler {
        rx: Receiver<Runnable>,
        tx: Sender<Runnable>,
    }
  • a MAIN_TID: AtomicI64 which stores the nginx event thread id the first time the event handler runs:

    static MAIN_TID: AtomicI64 = AtomicI64::new(-1);
    
    fn on_event_thread() -> bool {
        let main_tid = MAIN_TID.load(Ordering::Relaxed);
        let tid: i64 = unsafe { ngx_thread_tid().into() };
        main_tid == tid
    }
    
    extern "C" fn async_handler(ev: *mut ngx_event_t) {
        let tid = unsafe { ngx_thread_tid().into() };
        let _ = MAIN_TID.compare_exchange(-1, tid, Ordering::Relaxed, Ordering::Relaxed);
    
        // eventfd drain when enabled (see below) …
    
        // drain the queue and run tasks
        let scheduler = scheduler();
        while let Ok(r) = scheduler.rx.try_recv() {
            r.run();
        }
    }
  • a schedule() function that chooses between:

    • a fast path when already on the event loop thread, and
    • a queued wakeup when called from any other thread or during reentrancy:
    fn schedule(runnable: Runnable, info: ScheduleInfo) {
        let scheduler = scheduler();
        let on_evt = on_event_thread();
    
        // If we are already on the event loop thread and this is not a re-entrant
        // wakeup, we can safely run the task inline.
        //
        // Otherwise, we enqueue the task and notify the nginx event loop.
        if on_evt && !info.woken_while_running {
            runnable.run();
        } else {
            scheduler.tx.send(runnable).expect("send");
            notify();
        }
    }

Key properties:

  • Only the nginx event thread ever calls Runnable::run():

    • If schedule() is called on the event thread and this is not a woken_while_running re-entrant wakeup, we run inline.
      NOTE: the re-entrant case is mostly pathological (e.g. a future that calls yield_now() repeatedly), and the check exists only to prevent a stack overflow: a future that schedule()s itself on the main thread would otherwise promptly .run() itself in a new stack frame.
    • Otherwise we enqueue and let async_handler (on the event thread) drain the queue and run tasks.
  • When schedule() is called from foreign threads, they only:

    • push into the queue, and
    • trigger notify() to wake the event loop.

This PR contains two implementations of notify(): a kill(pid, SIGIO) approach for
compatibility, and an epoll-with-eventfd approach in the spirit of ngx_notify without
actually calling it. Calling ngx_notify itself is not an option: it clobbers the static
notify_event.data, and ngx_threads calls it outside of any lock, so any other usage,
"nginx internal" or not, would break it.

The eventfd-based mode is behind the optional feature "async_eventfd", which requires
cfg ngx_feature = "have_eventfd". It will be more efficient by integrating more tightly
with nginx's epoll loop, but the safety invariant is the same in both modes.
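
For illustration, a minimal sketch of what an eventfd-style notify() could look like (this is not the PR's exact code; it assumes Linux, the libc crate, and a hypothetical NOTIFY_FD registered with nginx's epoll loop, whose read handler drains the scheduler queue):

    use std::sync::atomic::{AtomicI32, Ordering};

    static NOTIFY_FD: AtomicI32 = AtomicI32::new(-1);

    fn notify() {
        let fd = NOTIFY_FD.load(Ordering::Relaxed);
        if fd >= 0 {
            let inc: u64 = 1;
            // Writing to an eventfd wakes any epoll instance watching it, which
            // interrupts the nginx event loop promptly so the handler can drain
            // the task queue.
            unsafe {
                libc::write(fd, &inc as *const u64 as *const libc::c_void, 8);
            }
        }
    }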

This satisfies async-task’s contract:

All Runnable::run() and waker traffic is confined to the nginx event loop thread. All Futures can safely use objects like ngx_http_request_t, regardless of which thread they might get woken up from.

Blanket unsafe impl {Send,Sync} … {} declarations are avoided, because they allowed the
previous Scheduler's UB (tasks unintentionally moving threads) to go unnoticed.


3. About async_compat and “moving runtimes”

The concern was that using async_compat might cause tasks to “run on tokio without us noticing”.

This design makes that impossible.

  • The executor is ngx_rust::async_. It owns and polls all tasks.
  • async_compat is only used so that tokio-based I/O types can be awaited.
  • When a tokio resource becomes ready:
    • tokio calls the Waker,
    • the Waker calls our schedule() function,
    • schedule() either:
      • runs the task inline (if already on the nginx event thread and non-reentrant), or
      • enqueues it and calls notify() (foreign thread / re-entrant case).

Tokio never:

  • polls our futures,
  • runs our tasks,
  • or moves ngx_http_request_t to its own threads.

The only cross-thread boundary is the wakeup itself, which this PR makes strictly enqueue-and-notify, never run.


4. Why this matters

Without this PR:

  • The executor runs tasks from arbitrary threads.
  • async-task’s safety guarantees are violated.
  • nginx’s non-Send objects are used from threads they don’t belong to.
  • This results in undefined behaviour and has already caused real crashes.

With this PR:

  • All async tasks run on the nginx event loop thread.
  • schedule() is thread-safe and main-thread-affine.
  • async-task’s invariants are respected.
  • nginx’s threading model is respected.
  • The change is local, preserves public APIs, and removes a real production crash class.

TL;DR

  • Old executor: sometimes runs tasks on foreign threads → UB.
  • New executor:
    • uses a queue + notify() to hop from foreign threads to nginx’s event loop,
    • optionally uses eventfd for wakeups instead of SIGIO,
    • and only runs Runnable::run() on the nginx event thread (with an inline fast path when already there).
  • This aligns with both async-task’s model and nginx’s threading constraints, fixing a real safety bug with a minimal change.

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have written my commit messages in the Conventional Commits format.
  • I have read the CONTRIBUTING doc
  • I have added tests (when possible) that prove my fix is effective or that my feature works (don't think it's possible)
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

@pschyska force-pushed the main branch 2 times, most recently from e327e07 to e1c9191 on October 6, 2025 13:06
@pschyska changed the title from "RFC: thread-safe spawn with ngx_notify" to "thread-safe spawn with ngx_notify" on Oct 6, 2025
@pschyska force-pushed the main branch 5 times, most recently from 53f60c3 to 4a95650 on October 8, 2025 13:25
@bavshin-f5 (Member)

  • ngx_notify is "thread-safe" under a very narrow set of conditions. One of those is that nobody outside of the nginx internal code is allowed to call it.
    Check ngx_epoll_module.c:769 and consider what would happen if multiple modules start invoking ngx_notify() with different handler methods.
  • I don't want to allow mixing internal and external async runtimes or encourage use of threads. Both seem to be fragile and dangerous.
    I don't even believe you need to mix both runtimes: if you intend to use tokio, just run all the asynchronous code in the tokio task.
  • This change would break or make significantly slower any IO implementation that properly integrates with the nginx event loop (such as hyper client in nginx-acme).

@pschyska (Author) commented Nov 7, 2025

  • ngx_notify is "thread-safe" under a very narrow set of conditions. One of those is that nobody outside of the nginx internal code is allowed to call it.
    Check ngx_epoll_module.c:769 and consider what would happen if multiple modules start invoking ngx_notify() with different handler methods.

I see it now.

  • I don't want to allow mixing internal and external async runtimes or encourage use of threads. Both seem to be fragile and dangerous.

If it's guaranteed that all tasks run on the main thread, I don't think it's dangerous. This change only allows scheduling from other threads. It's not uncommon that libraries start their own helper threads, for instance. async-compat starts a transparent tokio runtime in a thread for IO completion handlers, while still using our executor for the tasks.

I can also imagine situations where you'd want to run non-I/O compute in a thread pool so as not to block nginx - in our case, for example, crypto. You'd want to be able to notify the request handler's async task of completion by writing to a channel or a similar mechanism. This, in turn, would call the waker from that thread (AFAIK), which calls schedule() for the task from that thread, but the woken task would be scheduled to run on the main thread via ngx_notify.
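
A hedged sketch of that pattern, assuming the futures crate's oneshot channel and a placeholder expensive_crypto() helper (both illustrative, not from this PR); the task itself stays on the module's executor and never leaves the event loop thread:

    use futures::channel::oneshot;

    fn expensive_crypto(input: Vec<u8>) -> Vec<u8> {
        input // placeholder for the real CPU-heavy work
    }

    async fn sign_off_thread(input: Vec<u8>) -> Vec<u8> {
        let (tx, rx) = oneshot::channel();
        std::thread::spawn(move || {
            let out = expensive_crypto(input);
            // Completing the oneshot calls the task's Waker from this helper
            // thread; schedule() then enqueues the task back onto the event loop.
            let _ = tx.send(out);
        });
        rx.await.expect("crypto thread dropped the sender")
    }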

I don't even believe you need to mix both runtimes: if you intend to use tokio, just run all the asynchronous code in the tokio task.

We need to work with the request heavily (mutate headers_in and headers_out, read client bodies, produce response bodies) in response to I/O (external requests, database queries, custom crypto/tunneling), which can only be done safely on the main thread. If all our code is running in a completely separate engine, it all becomes extremely hard. In addition, we need a way to interrupt nginx's epoll loop when reacting to I/O events, which aren't all bound to a request (OpenID shared signals, e.g.).
async-compat seemed like a good compromise to me: use the tokio "runtime" (I/O setup, ...), but with the ngx-rust scheduler/executor.

  • This change would break or make significantly slower any IO implementation that properly integrates with the nginx event loop (such as hyper client in nginx-acme).

I don't think it would do that. If the waker is invoked from the main thread, schedule in my branch would simply .run() the runnable, and everything stays on the main thread. ngx_notify would not be called (except once during the lifetime of a worker process because it's not known which tid is main). I have to admit I didn't test with nginx-acme yet though.

To recap, I'd still like the following:

  • A way to interrupt epoll
  • A way to move tasks to the main thread
  • Safe to call schedule from other threads

Given ngx_epoll_module.c:769, ngx_notify from other threads is indeed inherently unsafe.

However, what if we do this:

  • ngx_post_event a custom event, its handler being notify_handler
  • write(notify_fd, &inc, sizeof(uint64_t)) to interrupt epoll. The event loop would then find our custom event promptly.

Would this work for you?

@pschyska force-pushed the main branch 2 times, most recently from 7a736f8 to 35f97ff on November 7, 2025 16:59
@pschyska (Author) commented Nov 7, 2025

@bavshin-f5 I've rewritten the code to not rely on ngx_notify. Instead, I'm using ngx_post_event, followed by pthread_kill(main_thread, SIGIO) as I had a hard time getting the notify_fd from within ngx-rust. Does that address your concern?
schedule() can still be called from other threads, e.g. from a waker, and moves the tasks to the main thread. The SIGIO ensures a prompt reaction.

@bavshin-f5 (Member)

If it's guaranteed that all tasks run on the main thread, I don't think it's dangerous. This change only allows scheduling from other threads. It's not uncommon that libraries start their own helper threads, for instance. async-compat starts a transparent tokio runtime in a thread for IO completion handlers, while still using our executor for the tasks.

Ah. I got why you assume that this is safe. I don't believe it is, and I expect that some of your code is quietly being scheduled on a tokio executor in another thread. async-compat is not the kind of magic that can override tokio scheduling, it merely allows creating and polling certain tokio types outside of the runtime-owned thread.
I also suspect that tokio is not quite prepared for deallocation of seemingly exclusively owned objects from a thread outside of the runtime.

The only approach I would consider safe is where nothing owned by a request or a cycle pool is allowed to move to another runtime, either accidentally or intentionally. Many things we do are lacking such protection because we assume single-threaded environment.

@pschyska (Author) commented Nov 7, 2025

If it's guaranteed that all tasks run on the main thread, I don't think it's dangerous. This change only allows scheduling from other threads. It's not uncommon that libraries start their own helper threads, for instance. async-compat starts a transparent tokio runtime in a thread for IO completion handlers, while still using our executor for the tasks.

Ah. I got why you assume that this is safe. I don't believe it is, and I expect that some of your code is quietly being scheduled on a tokio executor in another thread. async-compat is not the kind of magic that can override tokio scheduling, it merely allows creating and polling certain tokio types outside of the runtime-owned thread. I also suspect that tokio is not quite prepared for deallocation of seemingly exclusively owned objects from a thread outside of the runtime.
The only approach I would consider safe is where nothing owned by a request or a cycle pool is allowed to move to another runtime, either accidentally or intentionally. Many things we do are lacking such protection because we assume single-threaded environment.

I don't claim to fully understand it, but they state:

"Otherwise, a new single-threaded runtime will be created on demand. That does not mean the future is polled by the tokio runtime ."

Tokio could spawn its own tasks into that runtime, sure, e.g. some kind of helper task. But I don't see how my task could end up there. If my task's Runnable::schedule() arranges for it to be scheduled on the event thread, which is precisely what my PR does, that is exactly where it will run.

I'm not an expert, but I think what happens is this:

  • async_::spawn(my_task)
  • event handler starts running it (part1) until await:
    • reqwest.get(...).await
      • reqwest.get(...) is polled -> Pending, waker is set to my_task(part2).schedule()
        • tokio runtime thread things happen, ..., eventually waker is called (from that thread! which is why I want schedule() to work from other threads)
        • my_task(part2).schedule() is our Scheduler.schedule(), will post an event and push the Runnable to the queue
        • my_task(part2) runs on the main thread

This is what I see right now, using the code from the PR. This is also what I'd expect to happen with a "sidecar"-tokio-runtime that I started myself (no async-compat).
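
For illustration, a hedged sketch of the kind of await that produces this pattern (assuming async_compat's Compat wrapper and reqwest, as in the updated async example; the function name is illustrative and the surrounding task is spawned on the ngx-rust executor):

    use async_compat::Compat;

    async fn fetch_status(url: &str) -> Result<u16, reqwest::Error> {
        Compat::new(async {
            // reqwest's I/O runs on Compat's tokio helper threads; when the
            // response is ready they call our Waker, which calls schedule()
            // and queues this task back onto the nginx event loop thread.
            let resp = reqwest::get(url).await?;
            Ok(resp.status().as_u16())
        })
        .await
    }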

@pschyska changed the title from "thread-safe spawn with ngx_notify" to "async: thread-safe schedule()" on Nov 7, 2025
@pschyska (Author) commented Nov 7, 2025

I just pushed an experiment with a sidecar tokio runtime and added tid debug logging here: https://github.com/pschyska/ngx-rust/blob/a5ff1bb0cc3e6d5bb15f46e24348a1d2fa694f18/examples/async.rs#L115
What I see is this:

2025/11/07 23:27:02 [debug] 494044#494044: async: spawning new task
!!! schedule tid=494044
!!! run eager tid=494044
!!! async entry, tid=494044
!!! external task entry, tid=494047
!!! schedule tid=494047
!!! run handler tid=494044
!!! async resume, tid=494044, result=42
!!! schedule tid=494046
!!! run handler tid=494044
!!! after await tid=494044

This supports my theory: my task is never moved to the tokio runtime. schedule() is called from tokio's own threads, though - when using tokio::spawn, from the runtime thread (494047); when awaiting tokio::time::sleep directly, presumably from the sleep thread. However, code in my task always runs on the event thread.

I've also pushed a change to main to switch to kill and ngx_thread_tid. It also works fine.

@pschyska (Author) commented Nov 8, 2025

If it's guaranteed that all tasks run on the main thread, I don't think it's dangerous. This change only allows scheduling from other threads. It's not uncommon that libraries start their own helper threads, for instance. async-compat starts a transparent tokio runtime in a thread for IO completion handlers, while still using our executor for the tasks.

Ah. I got why you assume that this is safe. I don't believe it is, and I expect that some of your code is quietly being scheduled on a tokio executor in another thread. async-compat is not the kind of magic that can override tokio scheduling, it merely allows creating and polling certain tokio types outside of the runtime-owned thread. I also suspect that tokio is not quite prepared for deallocation of seemingly exclusively owned objects from a thread outside of the runtime.

The only approach I would consider safe is where nothing owned by a request or a cycle pool is allowed to move to another runtime, either accidentally or intentionally. Many things we do are lacking such protection because we assume single-threaded environment.

I just had another idea that helped me visualize this:

If !Send Futures could move executors at will, they could end up in an executor that requires Send (and/or Sync).

E.g.: if the "part-2" future of my task, after awaiting a future from a tokio runtime, somehow magically ran in a tokio executor using threads, it would have to be Send. But if I used e.g. async_task::spawn_local, it could be just 'static. The compiler would not accept that code. (Of course, crucial parts of an executor are unsafe, but this would still make such behaviour wildly illegal in Rust.)

I don't know of any method of making a task move executors. If I wanted to connect futures of different executors beyond their output for some reason (e.g. to be able to cancel the other task), I would use a remote_handle. But AFAIK this doesn't change the Context (which ties back to schedule() and the task); it establishes a oneshot between the tasks.

We could use spawn_local instead of spawn_unchecked (spawn_local stores Rust's thread id and checks that it is the same on .run()), but that is unnecessary overhead in this case; the move simply can't happen. The example code I wrote, which leads to waking from other threads all the time, still runs fine with spawn_unchecked.
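
A hedged sketch of what that spawn_local variant could look like (names are illustrative; `schedule` stands in for this PR's queue-and-notify function). async_task::spawn_local records the spawning thread and panics if the Runnable is polled or dropped on another one, trading a runtime check for part of the unsafe contract:

    use async_task::{Runnable, Task};

    fn schedule(runnable: Runnable) {
        // Placeholder: the real scheduler would enqueue the Runnable and notify().
        let _ = runnable;
    }

    fn spawn_checked<F>(future: F) -> Task<F::Output>
    where
        F: std::future::Future + 'static,
        F::Output: 'static,
    {
        let (runnable, task) = async_task::spawn_local(future, schedule);
        runnable.schedule(); // the first poll also goes through the scheduler
        task
    }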

Another angle on this - the spawn_unchecked docs state:

Safety

  • If Fut is not 'static, borrowed non-metadata variables must outlive its Runnable. and: If schedule is not 'static, borrowed variables must outlive all instances of the Runnable's Waker.
    ✅ doesn't apply: we require 'static for the Future and Scheduler is 'static (current and my PR)
  • If Fut is not Send, its Runnable must be used and dropped on the original thread
    ✅ run() is only called on the event thread (current and my PR), which is what "used and dropped" implies, I believe, according to the language used in the introduction.
  • If schedule is not Send and Sync, all instances of the Runnable's Waker must be used and dropped on the original thread.
    currently: ❗schedule is claimed to be Send + Sync, but it is not. It must not be called from another thread (and, by extension, neither must Wakers, which call Runnable::schedule()). The fact that I'm even able to do it (e.g. accidentally by using async-compat, or manually by polling myself and calling Wakers, etc.) indicates an issue. Currently though, the Runnable will be .run() on an arbitrary thread. As there is no way to communicate that requirement in the type system, a runtime check would have been required (e.g. spawn_local).
    PR: ✅ (IMHO :-) schedule is Send + Sync. The event is only mutated to update the log to the current ngx_cycle_log, and that is guarded by the RwLock. If not for that, the event could actually be 'static (we only use it to communicate the static callback address) and there would be no need for the unsafe impls.

I think I have now fully convinced myself, let me know if this helps to convince you as well 🙂

@pschyska (Author)

@bavshin-f5 Did you have a chance to take a look?
Cheers

@pschyska force-pushed the main branch 3 times, most recently from f7fdbeb to ce8ba80 on November 21, 2025 14:44
schedule() can now be called from any thread, but will move tasks to the event loop
thread. pthread_kill(main_thread, SIGIO) is used to ensure a prompt response if needed.

This enables receiving I/O notification from "sidecar runtimes" like async-compat, for
instance.

The async example has been rewritten to use async_::spawn, demonstrating usage of
reqwest and hyper clients wrapped in Compat to provide a tokio runtime environment while
using the async_ Scheduler as executor.
@pschyska force-pushed the main branch 3 times, most recently from 1f53296 to c8b35bc on December 2, 2025 22:18
@pschyska (Author) commented Dec 3, 2025

@bavshin-f5 I've addressed your concerns in the updated summary. I also added an eventfd-based approach as an alternative to the SIGIO method, behind the optional feature async_eventfd, requiring ngx_feature = "have_eventfd". Both approaches fix the UB and work well with the updated async example and in our project.
Can we get this PR moved along soon? We are relying heavily on it, and our module will become public code and be used as a reference implementation. We need to know if we will be getting this fix upstream, or have to maintain a fork.

Cheers,
Paul

@bavshin-f5 (Member)

@pschyska I'm still very much not interested in allowing the use of threads in the library code. I'm convinced that it is not possible to make such an interface both safe for use within nginx and general enough, so I would prioritize single-threaded scenarios. No threads, no problems.
See also https://nginx.org/en/docs/dev/development_guide.html#threads_pitfalls.

I would accept certain independent building blocks that make a thread-aware scheduler in your module easier to build, such as a good notify implementation. The one in this PR is unfortunately not good enough: the sigio implementation is dangerously unsafe (note that ngx_post_event is not atomic), and eventfd is non-portable.

Given the list of platforms we care about, the prospective notify implementation must support kqueue, eventfd with epoll/poll/select event methods, socketpair fallback with any event method on Unix-like systems, and something else for Windows.


I don't have enough time to cover everything, but I still want to address some points:

The performance impact on Futures implemented in terms of native nginx I/O, such as
nginx-acme's PeerConnection, is low: since the Wakers are called from the event loop
thread, we stay on the fast path and simply .run() them, as before.

schedule most certainly regresses the single-threaded use case and makes it less resilient under low-memory conditions. You even mentioned the specific scenario, but dismissed it as "pathological".
The woken_while_running() branch is there because it is important, and we have an internal project that intentionally hits this branch far more often than nginx-acme does. We managed to reduce the cost of this path to a single allocation, and plan to get rid of it eventually.

Tokio never:

  • polls our futures,
  • runs our tasks,
  • or moves ngx_http_request_t to its own threads.

examples/async.rs in this PR spawns new tokio tasks on the tokio runtime thread. It does not share any request-owned data with these tasks at the moment, but the only thing that prevents this at the type-system level is a Send requirement, which is both easy to bypass and sometimes has to be bypassed, simply because Rust's type and ownership system is a bad match for callback-based FFI with objects owned by the caller and passed by pointer.

To reiterate: any library that assumes tokio environment is able to spawn tasks in another thread and move anything Sendable to these tasks. async-compat will not protect you from that.

@pschyska (Author) commented Dec 4, 2025

I would accept certain independent building blocks that make a thread-aware scheduler in your module easier to build, such as a good notify implementation. The one in this PR is unfortunately not good enough: the sigio implementation is dangerously unsafe (note that ngx_post_event is not atomic), and eventfd is non-portable.

What makes ngx_post_event not atomic? It looks like it does this:

if (!(ev)->posted) { 
  (ev)->posted = 1;
  ngx_queue_insert_tail(q, &(ev)->queue);
  // logging
}

Is the ngx_queue_insert_tail call alone atomic? We could allocate a fresh ngx_event_t each time instead of using a static one to post, if the issue is the posted=1. I had this variant in one of my earlier commits (force-pushed over by now...) and it worked without issues for me also.

The performance impact on Futures implemented in terms of native nginx I/O, such as
nginx-acme's PeerConnection, is low: since the Wakers are called from the event loop
thread, we stay on the fast path and simply .run() them, as before.

schedule most certainly regresses the single-threaded use case and makes it less resilient under low-memory conditions. You even mentioned the specific scenario, but dismissed it as "pathological". The woken_while_running() branch is there because it is important, and we have an internal project that intentionally hits this branch far more often than nginx-acme does. We managed to reduce the cost of this path to a single allocation, and plan to get rid of it eventually.

In my mind, Futures poll Ready until they can't because something external needs to happen (i.e. a series of run()s), at which point they install the waker and suspend. Then the external thing happens, calls .schedule() once, and the state machine keeps moving again. A future must deliberately suspend in a state where it's not actually pending for it to run again immediately. I actually test this in the updated example: yield_now() produces such a state. The example does this 1000 times, and the async code logs that it has processed 1001 items afterwards (because the channel batches this). Can you share the snippet of your future that hits this case? What makes you think this is expensive memory-wise? The Runnable should be a light object, containing just a pointer to the Future, serving as a token to schedule the Future to be polled again.
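
For illustration, a hedged sketch of such a cooperative yield_now() loop (using futures_lite::future::yield_now(); the function name and batch size are illustrative): each yield wakes the task while it is still being polled, so schedule() sees woken_while_running and enqueues instead of running inline.

    async fn drain_batched(items: Vec<u32>) -> u64 {
        let mut sum = 0u64;
        for (i, item) in items.iter().enumerate() {
            sum += u64::from(*item);
            if i % 64 == 63 {
                // Give other tasks and the nginx event loop a chance to run.
                futures_lite::future::yield_now().await;
            }
        }
        sum
    }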

But anyway, this is beside the point, as there is no difference between the old and new behavior when scheduled on the main thread (e.g. the PeerConnection future):

Old; not woken while running

  • run()

Old, woken while running:

  • push Runnable to queue
  • post event
  • event handler run()s

New; not woken while running

  • ngx_thread_tid: on main thread?
  • yes; run()

New; woken while running

  • push Runnable to channel
  • post event or write eventfd
  • event handler run()

The only difference is self.queue.push_back vs self.tx.send(runnable), which are roughly equivalent, and an additional ngx_thread_tid() == MAIN_TID.load(Ordering::Relaxed), which shouldn't break the bank.

Tokio never:

  • polls our futures,
  • runs our tasks,
  • or moves ngx_http_request_t to its own threads.

examples/async.rs in this PR spawns new tokio tasks on the tokio runtime thread. It does not share any request-owned data with these tasks at the moment, but the only thing that prevents this at the type-system level is a Send requirement, which is both easy to bypass and sometimes has to be bypassed, simply because Rust's type and ownership system is a bad match for callback-based FFI with objects owned by the caller and passed by pointer.

To reiterate: any library that assumes tokio environment is able to spawn tasks in another thread and move anything Sendable to these tasks. async-compat will not protect you from that.

I think this is a misunderstanding: I do not spawn Tasks using tokio. The only Tasks spawned are on the ngx-rust executor. What async-compat does is start background helper threads for I/O, which might schedule() Tasks on our executor, hence the importance of schedule() being thread-safe.
If you are thinking about this spawn, it runs on the ngx-rust executor. It cannot, "physically", run on another. It is perfectly safe to use ngx_http_request_t and other items here. I even posted a debug session earlier showing that while wakeups come from other threads, each invocation of my Future happens on the main thread.

To get a Task onto the tokio runtime, you have to call tokio::spawn(). We actually do this in our product, and Send is precisely the guard the language uses to prevent you from referring to e.g. &mut Request or ngx_http_request_t, because referring to them makes the Future !Send. You could use tokio::spawn_local, which uses a thread-local scheduler, as this does not schedule to other threads. It would still be safe to use main-thread owned data, as we stay on the main thread. spawn_local doesn't require Send. We never have to force Send for ngx_http_request_t. What we have to unsafely assert is that it is 'static, but that is an entirely orthogonal concern.

The language just works here, but the current Scheduler breaks this by allowing run() to move threads.

In our product, there is a way to establish an encrypted channel on top of TLS using custom cryptography, to implement a healthcare system requirement. After a two-round-trip handshake, you can encrypt a raw HTTP request and send it through the tunnel. Our nginx module decrypts and unwraps it, and sends the inner request to itself. It then awaits the response, encrypts it, and sends it back.
One wrapped-request interaction looks like this, async-wise:

  • first handshake message comes in (M1)
  • access phase handler checks bearer JWT and DPOP data. This involves getting an async lock for a worker-local JWKS cache, and a potential reqwest to refresh the cache.
  • content phase handler reads the client request body by async awaiting a native Future (i.e. wrapping ngx callbacks). This actually involves ngx threads because we use "aio threads", when the body is too large.
  • content phase handler takes a ngx rwlock to a crypto session cache backed by a ngx shm_zone, shared by all workers. As this uses ngx_spinlock, I spawn it (explicitly!) in a tokio thread to not introduce a deadlock potential. I await this foreign future in the main future, so the rest runs on the main thread again.
  • M2 is produced and sent back, and the step is repeated once more (including the access phase checks) to receive Message3 and produce Message4, which concludes the handshake
  • The client then sends an encrypted request, which goes through the access phase steps, body reading, shm locking, and crypto. It then sends itself the wrapped request, which also goes at least through the access phase steps, and ultimately hits a native proxy_pass upstream. We currently use reqwest to make this inner request and await it in the outer one, then update GCM counters, apply the AEAD etc. to the response, and send it back.

As you can see, there are dozens of async calls of different types for one "round-trip", and I don't believe it could have been done without a thread-safe scheduler.
I'm running a load test on my local laptop:
I make 32 tasks in the client, multiplexed over 8 threads, which do the 2 handshake requests and a wrapped HTTP request in a loop. nginx is configured to "return 200;" on the location that the wrapped requests target for this test, so there is no networking done in the upstream, and the scheduler is hit as hard as possible. This achieves ~7000 requests per second with 8 nginx workers, while the workers stay below 20 MiB RSS. There are no segfaults or other issues. It even involves ngx threads, so there doesn't seem to be an issue with those interactions either.

This holds for both the SIGIO- and the eventfd-based notification methods, so while the SIGIO approach might be technically unsafe as you say, the problem doesn't appear in this test. Interestingly, the eventfd-based approach doesn't seem to be faster. Both notification methods work flawlessly, and the bottleneck is somewhere else (almost certainly the cryptography).

So I'm pretty confident that this can be made to work, though I experienced many surprises, i.e. segfaults, along the way. Hence the rapid iterations on this PR 🙂, but in my estimation this is at least very close.

I can't share the full code we are working on yet, unfortunately, but it will be published to https://github.com/gematik/zeta-guard-ngx-pep soon. Right now the repo only contains an early version of the access phase code, but even that already broke with the current Scheduler, and the lack of a notify method led requests to hang indefinitely until nginx got woken up by something else happening.

The code doesn't have to live in ngx-rust though; it could be an add-on crate. I just thought it would be a good addition to the core because, in my opinion, spawn() in its current state is extremely dangerous for newcomers to use. We tried to use it as soon as we hit the first use case where we didn't want to block the main thread, and ran into hard-to-debug segfaults because our task was moved to some helper thread by the current Scheduler. In our case it was via async-compat, but it will happen any time any async code, e.g. from libraries, relies on the fact that Schedulers must be thread-safe and calls schedule() from some helper thread. As the Scheduler must be Send (and is invalidly asserted to be by your Scheduler right now), this is a problem. And of course it would be better if every I/O primitive could be natively implemented with nginx primitives, but we don't have that, so I don't share your opinion that threads are bad per se. Even nginx uses them sometimes to unblock things, the native-Future approach is only feasible if the kind of I/O can be represented with nginx primitives and callbacks in the first place, and helper threads aren't as scary in Rust as in C, since Rust is a safe language.

So,

  1. I'll grant you that a "native" approach without helper threads would be preferable, if possible, but it's not a correctness issue - Tasks don't move threads, and .run() is guaranteed to run on the main thread in the PR. The tokio runtime (i.e. an I/O completion thread) could not even .run() them, because it doesn't get the Runnable; it only calls the Wakers, which call .schedule(), not .run(). A !Send future cannot move threads in this PR, where it could previously.
  2. We need a reliable notify method, so maybe we can focus on that here and put our spawn implementation somewhere else (although they are related, and I don't know how you'd do this).

@bavshin-f5 (Member)

Is the ngx_queue_insert_tail call alone atomic?

...

No, of course it is not. Modification of a doubly linked list cannot be atomic. It's two pointers and two memory write operations.
Allocating a fresh event does not help here, because in the end you still modify a global doubly linked list in a non-thread-safe way.

Can you share the snippet of your future that hits this case?

I'm afraid I'm not allowed to. But yield_now() plays an important role in this code.

Rust's async is essentially a cooperative multitasking implementation. If you let a single task run for a while without interruption, it will start affecting other tasks. In a very degraded case, it may lead to accept queue overflow and loss of incoming connections. That is why certain tasks, such as script or bytecode interpreters, must limit their uninterrupted execution time and yield control periodically. And the time quantum allocated here directly affects the responsiveness of the server.

What makes you think this is expensive memory-wise?
The only difference is self.queue.push_back vs self.tx.send(runnable), which are roughly equivalent.

We want to have zero allocations on this path. Or one, but fallible with correct handling. The current implementation of task queuing is suboptimal and temporary.

crossbeam-channel is actually quite efficient. I mixed it up with another implementation that made multiple allocations per message.
It still allocates a Block, and does not handle allocation failures. If we switch to this library, we will lose means to handle this gracefully. It also has several strong atomic operations per message, and these are relatively cheap but not free.

I think this is a misunderstanding: I do not spawn Tasks using tokio. The only Tasks spawned are on the ngx-rust executor.

Even if you don't, reqwest does. It spawns a new tokio::task internally to poll the Connection. Now, I'm 98% sure that in this specific case no data from the current thread reaches the tokio task, and the cancellation is propagated correctly, so the tokio task will not call a deleted Waker from the main task.
Cannot be fully sure, because I don't remember how tokio handles cancellation, and because I know that async-task may detach() under certain conditions when the task handle is dropped.

You could use tokio::spawn_local, which uses a thread-local scheduler, as this does not schedule to other threads. It would still be safe to use main-thread owned data, as we stay on the main thread. spawn_local doesn't require Send.

I need to remind you that tokio::spawn_local requires a LocalRuntime, which is a full tokio event loop running in the current thread. I do hope you realize how bad that sounds.

content phase handler takes a ngx rwlock to a crypto session cache backed by a ngx shm_zone, shared by all workers. As this uses ngx_spinlock, I spawn it (explicitly!) in a tokio thread to not introduce a deadlock potential. I await this foreign future in the main future, so the rest runs on the main thread again.

Hm... This step actually looks suspicious. If you had to do this, you seem to have an issue with lock scope and contention. Offloading the locking to a thread only adds increased latency to the list of issues.

Our SSL session cache is shared between worker processes and protected with a spinlock, and it's not a bottleneck.

The code doesn't have to live in ngx-rust though; it could be an add-on crate. I just thought it would be a good addition to the core because, in my opinion, spawn() in its current state is extremely dangerous for newcomers to use. We tried to use it as soon as we hit the first use case where we didn't want to block the main thread, and ran into hard-to-debug segfaults because our task was moved to some helper thread by the current Scheduler. In our case it was via async-compat, but it will happen any time any async code, e.g. from libraries, relies on the fact that Schedulers must be thread-safe and calls schedule() from some helper thread. As the Scheduler must be Send (and is invalidly asserted to be by your Scheduler right now), this is a problem.

A good addition would be to ensure that the existing spawn() can only be used in a single-threaded environment. It just cannot be enforced without thread-local storage and a std dependency. And thread-local variables are expensive, as it's not possible to generate an efficient implementation for non-executable compilation targets.
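
One possible shape of such a TLS-based check, as a hedged sketch (names are illustrative, not from the crate):

    use std::cell::Cell;

    thread_local! {
        // Flipped to true by the event loop thread during module init.
        static IS_EVENT_LOOP_THREAD: Cell<bool> = Cell::new(false);
    }

    fn mark_event_loop_thread() {
        IS_EVENT_LOOP_THREAD.with(|flag| flag.set(true));
    }

    fn assert_on_event_loop_thread() {
        IS_EVENT_LOOP_THREAD.with(|flag| {
            assert!(
                flag.get(),
                "spawn() may only be called from the nginx event loop thread"
            );
        });
    }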

Sometimes I wish we could just mark this project as explicitly incompatible with std and enforce no_global_oom_handling to prevent people from doing a wide variety of dangerous things. The overall Rust ecosystem is not mature enough for that, though, and many foundational crates are not compatible with such a configuration.

And of course it would be better if every I/O primitive could be natively implemented with nginx primitives, but we don't have that, so I don't share your opinion that threads are bad per se. Even nginx uses them sometimes to unblock things

And that's exactly how we know that threads are bad, not compatible with nginx architecture and should be avoided if possible :(


We are (slowly) working on native async wrappers for sockets, subrequests, request bodies, etc., which hopefully should address some of the usability concerns you raise.
It is unfortunate that the timelines don't match and I can't offer you these tools right now. But I still believe that native primitives are the right direction.
