Skip to content

Commit ed41e57

Browse files
PSRCodejgalar
authored andcommitted
Fix: sessiond: assert on lttng_ht_add_unique_str on ltt_sessions_ht_by_name
Observed issue ============== The lttng-sessiond asserts with the following backtrace on lttng create: #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007ffff7ab5859 in __GI_abort () at abort.c:79 #2 0x00007ffff7ab5729 in __assert_fail_base (fmt=0x7ffff7c4b588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5555556ab0a6 "node_ptr == &node->node", file=0x5555556ab085 "hashtable.c", line=298, function=<optimized out>) at a #3 0x00007ffff7ac6f36 in __GI___assert_fail (assertion=assertion@entry=0x5555556ab0a6 "node_ptr == &node->node", file=file@entry=0x5555556ab085 "hashtable.c", line=line@entry=298, function=function@entry=0x5555556ab380 <__PRETTY_FUNCTIO #4 0x000055555560be44 in lttng_ht_add_unique_str (ht=<optimized out>, node=0x7fffe0026c58) at hashtable.c:298 #5 0x000055555558fb6a in add_session_ht (ls=0x7fffe0024970) at session.c:372 #6 session_create (name=<optimized out>, uid=1000, gid=1000, out_session=out_session@entry=0x7fffedfddbd8) at session.c:1308 #7 0x000055555559b219 in cmd_create_session_from_descriptor (creds=<optimized out>, creds=<optimized out>, home_path=<optimized out>, descriptor=<optimized out>) at cmd.c:3040 #8 cmd_create_session (cmd_ctx=cmd_ctx@entry=0x7fffedfe5fa0, sock=<optimized out>, return_descriptor=return_descriptor@entry=0x7fffedfddd68) at cmd.c:3176 lttng#9 0x00005555555cc341 in process_client_msg (sock_error=0x7fffedfddd10, sock=0x7fffedfddd0c, cmd_ctx=0x7fffedfe5fa0) at client.c:2177 lttng#10 thread_manage_clients (data=<optimized out>) at client.c:2742 lttng#11 0x00005555555c5fff in launch_thread (data=0x55555571b780) at thread.c:66 lttng#12 0x00007ffff7c8b609 in start_thread (arg=<optimized out>) at pthread_create.c:477 lttng#13 0x00007ffff7bb2293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 The issue can be reproduced with modifications to the rotation thread code and the following scenario: $ lttng create test $ lttng enable-event -u -a $ lttng start run any app just so that we have a complete valid session. (might not be necessary) $ lttng destroy --no-wait $ lttng create test ^ Should assert here. The diff to be applied: diff --git a/src/bin/lttng-sessiond/rotation-thread.cpp b/src/bin/lttng-sessiond/rotation-thread.cpp index ac149c845..c11f068ed 100644 --- a/src/bin/lttng-sessiond/rotation-thread.cpp +++ b/src/bin/lttng-sessiond/rotation-thread.cpp @@ -565,6 +565,8 @@ int handle_job_queue(struct rotation_thread_handle *handle, { int ret = 0; + sleep(5); + for (;;) { struct ltt_session *session; struct rotation_thread_job *job; Note that the initial report for this issue was on a system under load for which the `lttng destroy` completion check failed and a `lttng create` was performed. As of today the exact reason why the completion check failed is not known. Still we can "fix" the race leading to the lttng-sessiond assertion considering a user might use the `--no-wait` variant of `lttng destroy` and could easily end up in this situation. Cause ===== Note: all `lttng create` commands have the same session name passed as argument. On `lttng destroy` the ltt_session object is flagged as destroyed (ltt_session::destroyed). The removal of the object from the hash table (ltt_sessions_ht_by_name) will be performed during the `session_release` which is driven by the session refcount. A reference on the `ltt_session` object is held for the rotation initiated by the `lttng destroy` command. The rotation is enqueued by the rotation thread. At this point the system is busy and the rotation thread does not run. We simulate this with a `sleep(5)` during the `handle_job_queue`. The `lttng destroy --no-wait` returns. If the `--no-wait` option is not passed the `lttng destroy` command will work as expected and wait for completion. We can SIGINT the `lttng destroy` command and perform a `lttng create` yielding the same backtrace. On `lttng create`, `session_create` validates that the name does not conflict with an existing session using `session_find_by_name`. It is important to note that `session_find_by_name` discriminates also on the `session->destroyed` flag (introduced by [1]). The `ltt_sessions_ht_by_name` hash table was introduced by [2] to remove the need to lock the session list to sample a session id during the queueing of actions to be executed related to a trigger. The assumption was made that, during the creation phase, the session would always be unique in that hash table based on its name. This is simply not true since multiple sessions with the same name can coexist as long as only a single one is marked as "not destroyed". This is an important concept due to the refcounting of the object and the feature relying on the lifetime of the object (i.e rotation). This is mostly valid when talking about the global session list. Solution ======== Move the hash table removal earlier during the release of the session object. Move the removal from `del_session_ht`, which is done during the `session_release` function, to the `lttng_session_destroy` function. It is safe to do so since currently the only user of that hash table (the action executor) does not care much about destroyed session at that point. This ensures that we maintain the uniqueness property of the key (name) for that hash table on insertion. The alternative was to expose an hash table that could contain duplicates and force the handling of a set on all lookups. Known drawbacks ========= None. References ========== [1] https://git.lttng.org/?p=lttng-tools.git;a=commit;h=e32d7f274604b77bcd83c24994e88df3761ed658 [2] https://git.lttng.org/?p=lttng-tools.git;a=commit;h=e1bbf98908a6399f39a9a8bc95bd8b59cecaa816 Signed-off-by: Jonathan Rajotte <[email protected]> Signed-off-by: Jérémie Galarneau <[email protected]> Change-Id: I2f1d0d6c04ee7210166e9847a850afbe6eaa7609
1 parent a49a921 commit ed41e57

File tree

1 file changed

+16
-5
lines changed

1 file changed

+16
-5
lines changed

src/bin/lttng-sessiond/session.cpp

+16-5
Original file line numberDiff line numberDiff line change
@@ -398,7 +398,7 @@ static int ltt_sessions_ht_empty(void)
398398
}
399399

400400
/*
401-
* Remove a ltt_session from the ltt_sessions_ht_by_id/name.
401+
* Remove a ltt_session from the ltt_sessions_ht_by_id.
402402
* If empty, the ltt_sessions_ht_by_id/name HTs are freed.
403403
* The session list lock must be held.
404404
*/
@@ -415,10 +415,6 @@ static void del_session_ht(struct ltt_session *ls)
415415
ret = lttng_ht_del(ltt_sessions_ht_by_id, &iter);
416416
LTTNG_ASSERT(!ret);
417417

418-
iter.iter.node = &ls->node_by_name.node;
419-
ret = lttng_ht_del(ltt_sessions_ht_by_name, &iter);
420-
LTTNG_ASSERT(!ret);
421-
422418
if (ltt_sessions_ht_empty()) {
423419
DBG("Empty ltt_sessions_ht_by_id/name, destroying hast tables");
424420
ltt_sessions_ht_destroy();
@@ -1065,8 +1061,23 @@ void session_put(struct ltt_session *session)
10651061
*/
10661062
void session_destroy(struct ltt_session *session)
10671063
{
1064+
int ret;
1065+
struct lttng_ht_iter iter;
1066+
10681067
LTTNG_ASSERT(!session->destroyed);
10691068
session->destroyed = true;
1069+
1070+
/*
1071+
* Remove immediately from the "session by name" hash table. Only one
1072+
* session is expected to exist with a given name for at any given time.
1073+
*
1074+
* Even if a session still technically exists for a little while longer,
1075+
* there is no point in performing action on a "destroyed" session.
1076+
*/
1077+
iter.iter.node = &session->node_by_name.node;
1078+
ret = lttng_ht_del(ltt_sessions_ht_by_name, &iter);
1079+
LTTNG_ASSERT(!ret);
1080+
10701081
session_put(session);
10711082
}
10721083

0 commit comments

Comments
 (0)