Fix: consumer: snapshot: assertion on subsequent snapshot
Observed issue
==============
While a snapshot is being taken, the containing folder can disappear
unexpectedly. This can lead to the following errors, which are expected
and mostly handled fine:
PERROR - 14:47:32.002564464 [2922498/2922507]: Failed to open file relative to trace chunk file_path = "channel0_0", flags = 577, mode = 432: No such file or directory (in _lttng_trace_chunk_open_fs_handle_locked() at trace-chunk.cpp:1411)
Error: Failed to open stream file "channel0_0"
Error: Snapshot channel failed
The problem happens on the subsequent snapshot for the session:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007fbbdadb3859 in __GI_abort () at abort.c:79
#2 0x00007fbbdadb3729 in __assert_fail_base (fmt=0x7fbbdaf49588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55c4212cfbb5 "!stream->trace_chunk", file=0x55c4212cf820 "kernel-co
#3 0x00007fbbdadc5006 in __GI___assert_fail (assertion=0x55c4212cfbb5 "!stream->trace_chunk", file=0x55c4212cf820 "kernel-consumer/kernel-consumer.cpp", line=188, function=0x55c4212cfb00 "
#4 0x000055c421268cc6 in lttng_kconsumer_snapshot_channel (channel=0x7fbbc4000b60, key=1, path=0x7fbbd37f8fd4 "", relayd_id=18446744073709551615, nb_packets_per_stream=0) at kernel-consume
#5 0x000055c42126b39d in lttng_kconsumer_recv_cmd (ctx=0x55c421b80a90, sock=31, consumer_sockpoll=0x7fbbd37fd280) at kernel-consumer/kernel-consumer.cpp:986
#6 0x000055c4212546d1 in lttng_consumer_recv_cmd (ctx=0x55c421b80a90, sock=31, consumer_sockpoll=0x7fbbd37fd280) at consumer/consumer.cpp:2090
#7 0x000055c421259963 in consumer_thread_sessiond_poll (data=0x55c421b80a90) at consumer/consumer.cpp:3281
#8 0x00007fbbdaf8b609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#9 0x00007fbbdaeb0163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
How to reproduce:
1. Set a breakpoint on snapshot_channel() inside
src/common/ust-consumer/ust-consumer.cpp
2. When the breakpoint hits, remove the complete lttng directory
containing the session data.
3. Continue the lttng_consumerd process from gdb.
4. consumer_stream_create_output_files() now fails and returns -1
inside snapshot_channel().
5. Take another snapshot: lttng_consumerd crashes because
of the `assert(!stream->trace_chunk)` in snapshot_channel().
This last step does not require any breakpoint intervention.
Cause
=====
During the snapshot, the stream is assigned the channel's current chunk.
It is expected that the stream does not have a chunk at this point.
The error handling is faulty here: on error, the stream chunk is left
set. It must be invalidated/reset to allow the stream's reuse later on.
The problem exists for both consumer domains (user/kernel).
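The failure mode can be illustrated with a minimal model. This is only a sketch and not the lttng-tools code: `TraceChunk`, `Stream`, `ready_for_snapshot()` and `snapshot_stream()` are hypothetical stand-ins, and the `reset_on_error` flag exists purely to compare the faulty and fixed error paths side by side.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the consumer's types; not the lttng-tools definitions.
struct TraceChunk {};

struct Stream {
	// Expected to be empty whenever a snapshot starts.
	std::shared_ptr<TraceChunk> trace_chunk;
};

// Models the precondition checked by `assert(!stream->trace_chunk)`.
bool ready_for_snapshot(const Stream &stream)
{
	return !stream.trace_chunk;
}

// Models one snapshot of a single stream.
// `output_available` is false when the output directory has disappeared.
// `reset_on_error` selects the faulty (false) or fixed (true) error path.
// Returns 0 on success and -1 on failure.
int snapshot_stream(Stream &stream, const std::shared_ptr<TraceChunk> &channel_chunk,
		bool output_available, bool reset_on_error)
{
	// The stream takes the channel's current chunk for this snapshot.
	stream.trace_chunk = channel_chunk;

	if (!output_available) {
		if (reset_on_error) {
			// Fixed path: drop the chunk so the stream can be reused.
			stream.trace_chunk.reset();
		}
		// Faulty path: the chunk stays set and the next snapshot
		// would trip the precondition.
		return -1;
	}

	// Success path in this model: release the chunk once done.
	stream.trace_chunk.reset();
	return 0;
}
```

With `reset_on_error` set to false, a failed call leaves `ready_for_snapshot()` returning false, which corresponds to the assertion firing on the subsequent snapshot.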
Solution
========
For the ust consumer, we can directly use the `error_close_stream`
label.
For the kernel consumer, the code path is slightly different since it
does not use `consumer_stream_close`. Note that `consumer_stream_close`
cannot be used as is for the kernel consumer. The existing code at the
end of each iteration partially resembles `consumer_stream_close`.
It is extracted to its own function for easier reuse from the new
`error_finalize_stream` label.
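The shape of the kernel consumer change can be sketched as follows. This is an illustration under assumptions, not the actual patch: only the `error_finalize_stream` label name comes from this commit, while `finalize_snapshot_stream()`, `create_output_files()`, the `Stream` fields and the `fail_on_open` knob are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct TraceChunk {};

// Hypothetical stream model; `fail_on_open` simulates a missing output folder.
struct Stream {
	std::shared_ptr<TraceChunk> trace_chunk;
	bool output_open = false;
	bool fail_on_open = false;
};

// Hypothetical counterpart of the extracted function: releases the
// per-snapshot state of a stream, on success and on error alike.
static void finalize_snapshot_stream(Stream &stream)
{
	stream.output_open = false;
	stream.trace_chunk.reset();
}

// Hypothetical stand-in for creating the stream's output files.
static int create_output_files(Stream &stream)
{
	if (stream.fail_on_open) {
		return -1;
	}
	stream.output_open = true;
	return 0;
}

int snapshot_channel(std::vector<Stream> &streams,
		const std::shared_ptr<TraceChunk> &channel_chunk)
{
	int ret = 0;
	Stream *stream = nullptr;

	for (std::size_t i = 0; i < streams.size(); ++i) {
		stream = &streams[i];
		assert(!stream->trace_chunk);
		stream->trace_chunk = channel_chunk;

		ret = create_output_files(*stream);
		if (ret < 0) {
			goto error_finalize_stream;
		}

		/* ... the stream's data would be copied here ... */

		// Normal end of iteration: same helper as the error path.
		finalize_snapshot_stream(*stream);
	}
	return 0;

error_finalize_stream:
	// Error path: reset the stream so a later snapshot can reuse it.
	finalize_snapshot_stream(*stream);
	return ret;
}
```

Sharing one helper between the end of the iteration and the error label keeps the two paths from drifting apart, which is the kind of divergence that caused the stale chunk in the first place.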
Known drawbacks
===============
None.
Fixes: #1352
Signed-off-by: Marcel Hamer <[email protected]>
Signed-off-by: Jonathan Rajotte <[email protected]>
Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I9fc81917b19aa436ed8e8679672648f2d5baf41a