Staging test fixes v1 #21
Closed
Add a deleted flag within the ust app session which is raised (with ust app session lock held) at delete, and checked within each RCU traversal, again with ust app session lock held. This takes care of races between teardown of an application (unregister) and execution of commands which are accessing the app session concurrently. Signed-off-by: Mathieu Desnoyers <[email protected]>
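The flag-under-lock pattern described in this commit can be sketched as follows. This is a minimal illustration, not the lttng-tools code: `struct app_session`, `app_session_mark_deleted()` and `app_session_add_channel()` are hypothetical names, and the RCU traversal itself is left out so that only the check performed on each visited session is shown.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the UST app session object. */
struct app_session {
	pthread_mutex_t lock;
	bool deleted;		/* Raised at delete, with the lock held. */
	int channel_count;	/* Example state modified by commands. */
};

/* Teardown path (application unregister): flag the session as deleted. */
static void app_session_mark_deleted(struct app_session *s)
{
	pthread_mutex_lock(&s->lock);
	s->deleted = true;
	pthread_mutex_unlock(&s->lock);
}

/*
 * Command path: called on each session reached during a traversal.
 * Returns 0 if the command was applied, -1 if the session is being
 * torn down and was therefore skipped.
 */
static int app_session_add_channel(struct app_session *s)
{
	int ret = 0;

	pthread_mutex_lock(&s->lock);
	if (s->deleted) {
		/* The application is exiting: do not touch the session. */
		ret = -1;
		goto end;
	}
	s->channel_count++;
end:
	pthread_mutex_unlock(&s->lock);
	return ret;
}
```

Because the flag is both raised and checked with the same lock held, a command either completes before teardown flags the session or observes the flag and backs off.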
We should not report an error when creating a channel if the application is exiting concurrently. Also, remove an inappropriate assert() in ust_app_create_event_glb: it is possible to have a channel lookup fail if channel/event creation occurs concurrently with an application exit. Signed-off-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Mathieu Desnoyers <[email protected]>
The UST app session has a reference on the consumer output object, but it belongs to the UST session. Implement a refcounting scheme to ensure it is not freed before all users are done using it. Signed-off-by: Mathieu Desnoyers <[email protected]>
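A reference counting scheme of this kind could look like the sketch below. It assumes C11 atomics rather than whatever primitive the actual patch uses, and `struct consumer_output` here is a reduced, hypothetical version of the real object; the `_create`, `_get` and `_put` helper names are illustrative only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, reduced consumer output object. */
struct consumer_output {
	atomic_int refcount;
	int net_seq_index;	/* Placeholder payload. */
};

/* The creator (here, the UST session) holds the initial reference. */
static struct consumer_output *consumer_output_create(void)
{
	struct consumer_output *obj = calloc(1, sizeof(*obj));

	if (!obj) {
		return NULL;
	}
	atomic_init(&obj->refcount, 1);
	return obj;
}

/* Taken by each additional user, e.g. a UST app session. */
static void consumer_output_get(struct consumer_output *obj)
{
	int old = atomic_fetch_add(&obj->refcount, 1);

	/* Getting a reference on an already released object is a bug. */
	assert(old > 0);
	(void) old;
}

/* Drops one reference. Returns 1 if the object was freed, 0 otherwise. */
static int consumer_output_put(struct consumer_output *obj)
{
	int old = atomic_fetch_sub(&obj->refcount, 1);

	assert(old > 0);
	if (old == 1) {
		free(obj);
		return 1;
	}
	return 0;
}
```

With this scheme the owner can drop its reference at any time; the object is only freed once the last user has called the put helper.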
Signed-off-by: Mathieu Desnoyers <[email protected]>
The ownership and reference counting of the relay daemon is unclear and buggy in many ways. It causes memory corruption, double frees, leaks, and segmentation faults, observed under various conditions. Fix this situation by introducing a clear ownership and reference counting scheme for this daemon. See doc/relayd-architecture.txt for details. Signed-off-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Mathieu Desnoyers <[email protected]>
We need to unlock the registry while we push metadata to break a circular dependency between the consumerd metadata lock and the sessiond registry lock. Indeed, pushing metadata to the consumerd waits until it has been pushed all the way to relayd, and doing so requires grabbing the metadata lock. If consumerd is performing a concurrent metadata request, that request can try to grab the registry lock on the sessiond while holding the metadata lock on the consumer daemon. These push and pull schemes are performed on two different bidirectional communication sockets. Signed-off-by: Mathieu Desnoyers <[email protected]>
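The unlock-around-push idea can be sketched as below. Everything here is an assumption made for illustration: `struct registry`, `registry_push_metadata()` and the `push_fn` callback are hypothetical, and `probe_push()` merely simulates a concurrent metadata request that needs the registry lock while the push is in flight.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced registry: metadata buffer plus a sent offset. */
struct registry {
	pthread_mutex_t lock;
	const char *metadata;
	size_t metadata_len;
	size_t metadata_sent;
};

/* Hypothetical blocking push; stands in for the wait on consumerd. */
typedef int (*push_fn)(const char *data, size_t len, void *ctx);

/* Must be called with reg->lock held; returns with reg->lock held. */
static int registry_push_metadata(struct registry *reg, push_fn push, void *ctx)
{
	size_t offset = reg->metadata_sent;
	size_t len = reg->metadata_len - offset;
	char *copy;
	int ret;

	if (len == 0) {
		return 0;
	}
	/* Snapshot the pending range while the lock is still held. */
	copy = malloc(len);
	if (!copy) {
		return -1;
	}
	memcpy(copy, reg->metadata + offset, len);

	/* Release the lock across the blocking push to break the cycle. */
	pthread_mutex_unlock(&reg->lock);
	ret = push(copy, len, ctx);
	pthread_mutex_lock(&reg->lock);
	free(copy);

	/* Another pusher may have advanced the offset in the meantime. */
	if (ret == 0 && reg->metadata_sent < offset + len) {
		reg->metadata_sent = offset + len;
	}
	return ret;
}

/* Example callback: simulates a request that needs the registry lock. */
struct push_probe {
	struct registry *reg;
	size_t pushed;
};

static int probe_push(const char *data, size_t len, void *ctx)
{
	struct push_probe *probe = ctx;

	(void) data;
	/* Would self-deadlock if the caller still held the registry lock. */
	pthread_mutex_lock(&probe->reg->lock);
	pthread_mutex_unlock(&probe->reg->lock);
	probe->pushed += len;
	return 0;
}
```

The offset is re-checked after the lock is taken again because the registry may have changed while it was unlocked.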
jgalar pushed a commit that referenced this pull request Apr 23, 2021
Observed issue
==============

A deadlock is observed during the start-stop test suite for triggers.

Cause
=====

A start session action is executed by the action executor: the `cmd_start_trace` function is called and effectively holds the `session_list_lock`.

During `cmd_start_trace`, a call to `notification_thread_command_add_channel` is performed to inform the notification thread of the new channel's presence.

At the same time, a tracer event notification is received by the notification thread. The actions are queued up, the sampling of the session id takes place, and a call to `session_lock_list` is performed and blocks on the lock operation.

The notification thread waits on the `session_list_lock`, and the `session_list_lock` holder, the action executor, waits on the completion of a command to be run by the notification thread: deadlock.

The backtrace:

Thread 6 (Thread 0x7f831c8a6700 (LWP 3046458)):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x000000000053b852 in futex (uaddr=0x7f831c8a45e0, op=0, val=0, timeout=0x0, uaddr2=0x0, val3=0) at /home/joraj/lttng/master/install/include/urcu/futex.h:72
#2 0x000000000053b4f9 in futex_noasync (uaddr=0x7f831c8a45e0, op=0, val=0, timeout=0x0, uaddr2=0x0, val3=0) at /home/joraj/lttng/master/install/include/urcu/futex.h:81
#3 0x000000000053af10 in lttng_waiter_wait (waiter=0x7f831c8a45d8) at waiter.c:55
#4 0x000000000046b0f2 in run_command_wait (handle=0xe60520, cmd=0x7f831c8a4588) at notification-thread-commands.c:49
#5 0x000000000046b270 in notification_thread_command_add_channel (handle=0xe60520, session_name=0x7f8300006c30 "my_triggered_session", uid=1000, gid=1000, channel_name=0x7f82dc00be04 "channel0", key=1, domain=LTTNG_DOMAIN_UST, capacity=2097152) at notification-thread-commands.c:184
#6 0x00000000004c7f65 in create_channel_per_uid (app=0x7f82d8000bf0, usess=0x7f8300000bb0, ua_sess=0x7f82dc002600, ua_chan=0x7f82dc00bde0) at ust-app.c:3360
#7 0x00000000004c6f98 in ust_app_channel_send (app=0x7f82d8000bf0, usess=0x7f8300000bb0, ua_sess=0x7f82dc002600, ua_chan=0x7f82dc00bde0) at ust-app.c:3514
#8 0x00000000004c6bde in ust_app_channel_create (usess=0x7f8300000bb0, ua_sess=0x7f82dc002600, uchan=0x7f8300005a90, app=0x7f82d8000bf0, _ua_chan=0x7f831c8a48b0) at ust-app.c:4771
#9 0x00000000004c6968 in find_or_create_ust_app_channel (usess=0x7f8300000bb0, ua_sess=0x7f82dc002600, app=0x7f82d8000bf0, uchan=0x7f8300005a90, ua_chan=0x7f831c8a48b0) at ust-app.c:5610
#10 0x00000000004c4f09 in ust_app_synchronize_all_channels (usess=0x7f8300000bb0, ua_sess=0x7f82dc002600, app=0x7f82d8000bf0) at ust-app.c:5820
#11 0x00000000004b958c in ust_app_synchronize (usess=0x7f8300000bb0, app=0x7f82d8000bf0) at ust-app.c:5886
#12 0x00000000004b8500 in ust_app_global_update (usess=0x7f8300000bb0, app=0x7f82d8000bf0) at ust-app.c:5960
#13 0x00000000004b7ec2 in ust_app_start_trace_all (usess=0x7f8300000bb0) at ust-app.c:5520
#14 0x0000000000444e86 in cmd_start_trace (session=0x7f8300006c30) at cmd.c:2707
#15 0x00000000004a5af9 in action_executor_start_session_handler (executor=0x7f8314004410, work_item=0x7f8314005100, item=0x7f83140050b0) at action-executor.c:342
#16 0x00000000004a537f in action_executor_generic_handler (executor=0x7f8314004410, work_item=0x7f8314005100, item=0x7f83140050b0) at action-executor.c:696
#17 0x00000000004a4dbc in action_work_item_execute (executor=0x7f8314004410, work_item=0x7f8314005100) at action-executor.c:715
#18 0x00000000004a37e6 in action_executor_thread (_data=0x7f8314004410) at action-executor.c:797
#19 0x0000000000486193 in launch_thread (data=0x7f83140044b0) at thread.c:66
#20 0x00007f8320b60609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#21 0x00007f8320a87293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7f831d0a7700 (LWP 3046457)):
#0 __lll_lock_wait (futex=futex@entry=0x5e1c10 <ltt_session_list>, private=0) at lowlevellock.c:52
#1 0x00007f8320b630a3 in __GI___pthread_mutex_lock (mutex=0x5e1c10 <ltt_session_list>) at ../nptl/pthread_mutex_lock.c:80
#2 0x00000000004378c3 in session_lock_list () at session.c:156
#3 0x00000000004a871c in add_action_to_subitem_array (action=0x7f830001a730, subitems=0x7f83140051d0) at action-executor.c:1081
#4 0x00000000004a8578 in add_action_to_subitem_array (action=0x7f830001a620, subitems=0x7f83140051d0) at action-executor.c:1025
#5 0x00000000004a4922 in populate_subitem_array_from_trigger (trigger=0x7f830001a950, subitems=0x7f83140051d0) at action-executor.c:1116
#6 0x00000000004a416e in action_executor_enqueue_trigger (executor=0x7f8314004410, trigger=0x7f830001a950, evaluation=0x7f8314005190, object_creds=0x0, client_list=0x7f8314004980) at action-executor.c:924
#7 0x0000000000479481 in dispatch_one_event_notifier_notification (state=0x7f831d0a63e8, notification=0x7f8314005160) at notification-thread-events.c:4613
#8 0x0000000000472324 in handle_one_event_notifier_notification (state=0x7f831d0a63e8, pipe=65, domain=LTTNG_DOMAIN_UST) at notification-thread-events.c:4702
#9 0x0000000000472271 in handle_notification_thread_event_notification (state=0x7f831d0a63e8, pipe=65, domain=LTTNG_DOMAIN_UST) at notification-thread-events.c:4717
#10 0x00000000004695a3 in handle_event_notification_pipe (event_source_fd=65, domain=LTTNG_DOMAIN_UST, revents=1, state=0x7f831d0a63e8) at notification-thread.c:591
#11 0x000000000046849b in thread_notification (data=0xe60520) at notification-thread.c:727
#12 0x0000000000486193 in launch_thread (data=0xe60610) at thread.c:66
#13 0x00007f8320b60609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#14 0x00007f8320a87293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Solution
========

Instead of using `session_find_by_name()`, which requires the `session_list_lock`, we introduce `sample_session_id_by_name`, which uses a urcu-backed data structure. This allows the sampling of the session id without holding the session list lock.

We accept the small window where a session object is still accessible but concretely not valid, since the actual execution context will be validated at the moment of execution. The execution side already handles the possibility that the session is removed at that point or is not the same session. The execution side acquires the `session_list_lock` for validation.

Known drawbacks
=========

None

Signed-off-by: Jonathan Rajotte <[email protected]>
Signed-off-by: Jérémie Galarneau <[email protected]>
Change-Id: I5ad2c57acc0d03d2814dda59f8ecf2d831fd961e
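
The sample-then-validate approach from the Solution section can be sketched as follows. This is not the lttng-tools implementation: the fixed-size table, `session_create()`, `session_destroy()` and `execute_on_session()` are hypothetical, and a dedicated `index_lock` is used purely as a stand-in for the RCU read-side critical section of the real urcu-backed structure. The point illustrated is that the sampling path never takes `session_list_lock`, while the execution path takes it and re-checks the sampled id.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_SESSIONS 8

struct session {
	bool in_use;
	uint64_t id;
	char name[64];
};

static struct session sessions[MAX_SESSIONS];
static uint64_t next_id = 1;
static pthread_mutex_t session_list_lock = PTHREAD_MUTEX_INITIALIZER;
/* Assumption: stands in for the RCU read side of the real index. */
static pthread_mutex_t index_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold session_list_lock or index_lock. */
static struct session *find_session(const char *name)
{
	for (int i = 0; i < MAX_SESSIONS; i++) {
		if (sessions[i].in_use && strcmp(sessions[i].name, name) == 0) {
			return &sessions[i];
		}
	}
	return NULL;
}

/* Writers hold both locks. Returns the new id, or 0 if the table is full. */
static uint64_t session_create(const char *name)
{
	uint64_t id = 0;

	pthread_mutex_lock(&session_list_lock);
	pthread_mutex_lock(&index_lock);
	for (int i = 0; i < MAX_SESSIONS; i++) {
		if (!sessions[i].in_use) {
			sessions[i].in_use = true;
			sessions[i].id = id = next_id++;
			strncpy(sessions[i].name, name, sizeof(sessions[i].name) - 1);
			sessions[i].name[sizeof(sessions[i].name) - 1] = '\0';
			break;
		}
	}
	pthread_mutex_unlock(&index_lock);
	pthread_mutex_unlock(&session_list_lock);
	return id;
}

static void session_destroy(const char *name)
{
	struct session *s;

	pthread_mutex_lock(&session_list_lock);
	pthread_mutex_lock(&index_lock);
	s = find_session(name);
	if (s) {
		s->in_use = false;
	}
	pthread_mutex_unlock(&index_lock);
	pthread_mutex_unlock(&session_list_lock);
}

/* Sampling path: does NOT take session_list_lock. */
static bool sample_session_id_by_name(const char *name, uint64_t *id)
{
	struct session *s;
	bool found = false;

	pthread_mutex_lock(&index_lock);
	s = find_session(name);
	if (s) {
		*id = s->id;
		found = true;
	}
	pthread_mutex_unlock(&index_lock);
	return found;
}

/* Execution path: validates the sampled id under session_list_lock. */
static int execute_on_session(const char *name, uint64_t sampled_id)
{
	struct session *s;
	int ret = -1;

	pthread_mutex_lock(&session_list_lock);
	s = find_session(name);
	if (s && s->id == sampled_id) {
		ret = 0;	/* Same session: the action would run here. */
	}
	pthread_mutex_unlock(&session_list_lock);
	return ret;
}
```

If a session is destroyed and re-created under the same name between sampling and execution, the id comparison fails and the action is skipped, which is the small window the commit message accepts.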
No description provided.