Conversation
|
Is there a way to run the throughput test using non-durable queues?
|
😱 Didn't get a core dump unfortunately, will try to reproduce again built with …
|
Ah, we're not getting a core dump because we don't reset the SEGV signal handler in …
|
Maybe this is a Crystal 1.16.0 issue; this is the coredump backtrace:
|
With debug symbols compiled in: |
|
Hmm, not good.. I forgot to take notes here last week, so doing it now: last week @spuun and I saw multi-threading issues with the throughput tool, and we think we have it down to mqtt-client.
I narrowed that down to crystal-lang/crystal#15647
|
This might be caused by crystal-lang/crystal#15647, with the fix from crystal-lang/crystal#15650. As explained in crystal-lang/crystal#15658 (comment), this means that one thread's evloop is trying to read or write while another thread's evloop already has a pending read or write on the same fd.
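A minimal sketch of that failure mode, assuming a locally reachable broker on port 1883 (the context name, socket, and writes below are illustrative, not code from this PR):

```crystal
require "socket"

# Two fibers scheduled on different threads of the same MT context both do
# blocking IO on one socket. With the "lifetime" evloop design, each thread's
# event loop can end up with a pending operation on the same fd, which is the
# situation crystal-lang/crystal#15647 / #15650 deal with.
mt = Fiber::ExecutionContext::MultiThreaded.new("demo", maximum: 2)
socket = TCPSocket.new("localhost", 1883) # assumed broker address

mt.spawn do
  socket.read_byte # thread A's evloop registers a pending read on the fd
end
mt.spawn do
  socket.write Bytes[0xC0, 0x00] # thread B's evloop may touch the same fd
end

sleep 1.seconds # give both fibers time to run (sketch only)
```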
|
Or maybe there's an issue with the fix. I'm investigating to understand 🕵️ |
|
The fix exposes a limitation of the "lifetime event loop": we can't share a … Problem is, those two fibers can't end up in two different event loops at the same time. This is a known limitation of the "lifetime" design. Due to …, the situation can be avoided for … It can also be avoided with a …
|
I started an MT execution context plus the patch from crystal-lang/crystal#15647 and I can't reproduce the "can't transfer fd" bug nor the SEGFAULT.

```diff
diff --git a/src/lavinmqperf/mqtt/throughput.cr b/src/lavinmqperf/mqtt/throughput.cr
index 5bff1522..e5363a5f 100644
--- a/src/lavinmqperf/mqtt/throughput.cr
+++ b/src/lavinmqperf/mqtt/throughput.cr
@@ -100,16 +100,18 @@ def initialize
   def run
     super
+    mt = Fiber::ExecutionContext::MultiThreaded.new("MQTT", maximum: 4)
+
     done = WaitGroup.new(@consumers + @publishers)
     @consumers.times do |i|
-      spawn { reconnect_on_disconnect(done) { consume(i) } }
+      mt.spawn { reconnect_on_disconnect(done) { consume(i) } }
     end
     sleep 1.seconds # Give consumers time to connect
     @publishers.times do |i|
-      spawn { reconnect_on_disconnect(done) { pub(i) } }
+      mt.spawn { reconnect_on_disconnect(done) { pub(i) } }
     end
     if @timeout != Time::Span.zero
```
|
Ok, we "fixed" mqtt-client.cr and spawned the …
|
Alright. I'll try to reproduce. It sounds like a …
|
Oh, I missed that the socket is opened on another thread from the …
|
But then you're using a single thread for everything, so you're back to ST. You should be able to open in a thread/context, do some things (like authentication), then pass the … Having the …
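A sketch of that suggestion (the broker address and handshake step are assumptions, not code from this PR): do the connect and authentication in the current fiber, and only hand the socket over to the MT context afterwards, so the fd is never driven from two contexts at once.

```crystal
require "socket"

# Open + authenticate in the current fiber/context first.
mt = Fiber::ExecutionContext::MultiThreaded.new("MQTT", maximum: 4)
socket = TCPSocket.new("localhost", 1883) # assumed broker address
# ... send CONNECT / await CONNACK (authentication) here, still in this fiber ...

# Only after the handshake does the MT context take over the socket,
# so all subsequent reads happen inside `mt`.
mt.spawn do
  loop { socket.read_byte || break } # stop on EOF
end
```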
|
I patched mqtt-client/connection, so both fibers would be on the same thread:

```crystal
spawn same_thread: false do
  spawn read_loop, name: "mqtt-client:read_loop", same_thread: true
  spawn message_loop, name: "mqtt-client:message_loop", same_thread: true
end
```

But I still get … EDIT: there's a 3rd unnamed fiber
|
Yes, isn't it lavinmqperf that calls …
|
Maybe we should have a …
|
No, it was me not thinking correctly. I took a break for dinner and realized we need the fix (to be released in 1.16.1) for the evloop timers to work correctly with MT; otherwise we get canceled timeouts on the wrong event loop, which leads to the above segfaults and a nil `wake_at`.
|
For this PR's sake, and because it's good to be close to the packets, I'll go with using mqtt-protocol instead of mqtt-client. This change is interesting though, running …
Copilot wasn't able to review any files in this pull request.
Files not reviewed (19)
- src/lavinmqperf.cr: Language not supported
- src/lavinmqperf/amqp/bind_churn.cr: Language not supported
- src/lavinmqperf/amqp/channel_churn.cr: Language not supported
- src/lavinmqperf/amqp/connection_churn.cr: Language not supported
- src/lavinmqperf/amqp/connection_count.cr: Language not supported
- src/lavinmqperf/amqp/consumer_churn.cr: Language not supported
- src/lavinmqperf/amqp/queue_churn.cr: Language not supported
- src/lavinmqperf/amqp/queue_count.cr: Language not supported
- src/lavinmqperf/amqp/throughput.cr: Language not supported
- src/lavinmqperf/bind_churn.cr: Language not supported
- src/lavinmqperf/channel_churn.cr: Language not supported
- src/lavinmqperf/connection_churn.cr: Language not supported
- src/lavinmqperf/connection_count.cr: Language not supported
- src/lavinmqperf/consumer_churn.cr: Language not supported
- src/lavinmqperf/mqtt/throughput.cr: Language not supported
- src/lavinmqperf/perf.cr: Language not supported
- src/lavinmqperf/queue_churn.cr: Language not supported
- src/lavinmqperf/queue_count.cr: Language not supported
- src/lavinmqperf/throughput.cr: Language not supported
snichme
left a comment
Has anything changed in the amqp perf tools or are they just moved?
they are just moved 👍
|
I got some strange errors after rebasing and adopting execution contexts: the consuming client would randomly disconnect itself without catching any errors, and LavinMQ would either log "connection reset by peer.." or nothing at all. @spuun and I looked at this today, and it seems not to happen when I use two separate execution contexts for the consumer and the publisher. One execution context should be OK here (?), so we're a bit confused that using one causes the errors (or at least makes them more frequent; with 2 ECs we don't see errors even if we keep it running for minutes)
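For reference, the two-context workaround described here looks roughly like this (context names and thread counts are assumptions, not the actual patch):

```crystal
# Give consumers and publishers their own execution contexts so a consumer
# socket and a publisher socket are never driven from the same context.
consumers_ctx = Fiber::ExecutionContext::MultiThreaded.new("consumers", maximum: 2)
publishers_ctx = Fiber::ExecutionContext::MultiThreaded.new("publishers", maximum: 2)

done = WaitGroup.new(@consumers + @publishers)
@consumers.times do |i|
  consumers_ctx.spawn { reconnect_on_disconnect(done) { consume(i) } }
end
@publishers.times do |i|
  publishers_ctx.spawn { reconnect_on_disconnect(done) { pub(i) } }
end
done.wait
```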
|
Did you try to use a …
Looking at that now; got some bumps with …
|
Ok, yes, that's why I did …
I get this for the amqp throughput that's merged to main as well. If I start lavinmq and lavinmqperf and then stop lavinmq and start it again, lavinmqperf crashes
Hmm, scratch that.. now I caught an EOF error even when using … I'll still add the waitgroup because that's better than just arbitrarily waiting with a yield, but I don't think it solves the execution context problem
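The waitgroup replacement for the arbitrary `sleep 1.seconds` could look like this sketch (`connect_consumer` is a hypothetical helper standing in for the real connect logic):

```crystal
# Each consumer signals the group once it is connected; publishers only
# start after every consumer has checked in, instead of after a fixed sleep.
connected = WaitGroup.new(@consumers)
@consumers.times do |i|
  spawn do
    consumer = connect_consumer(i) # hypothetical helper: connect + subscribe
    connected.done                 # signal: this consumer is ready
    consumer.run
  end
end
connected.wait # replaces `sleep 1.seconds # Give consumers time to connect`
@publishers.times do |i|
  spawn { pub(i) }
end
```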
WHAT is this pull request doing?
This PR abstracts LavinMQPerf into AMQP and MQTT modules.
lavinmqperf can now be run for both AMQP and MQTT by specifying the protocol in the CLI, e.g.:

```
lavinmqperf mqtt throughput
```

It adds the throughput test for the MQTT module, largely similar to the AMQP throughput test.
MQTT-specific features:
addresses #971
HOW can this pull request be tested?