feat(diagnostics): Add thread dump capturing to diagnostics endpoint#639

Merged
andrewazores merged 3 commits into cryostatio:main from Josh-Matsuoka:thread-dumps on Aug 28, 2025

@Josh-Matsuoka
Contributor

Related to cryostatio/cryostat#135

Adds thread dump capture to the Agent's mbean-invoke endpoint. The endpoint's existing implementation is already fairly flexible, essentially acting as a wrapper around the MBean invoke method, so we only need to extend the check for valid/supported operations and add a further check for thread dumps so the result can be written to the response body as-is.
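For reference, here is a minimal sketch of what invoking `threadPrint` over JMX looks like on a HotSpot JVM. It assumes the operation is the one exposed by the standard `com.sun.management:type=DiagnosticCommand` MBean. The `SUPPORTED_OPERATIONS` allow-list and the `invokeDiagnosticCommand` helper are hypothetical stand-ins for the endpoint's operation check, not the Agent's actual code.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ThreadPrintSketch {
    // Hypothetical allow-list; the Agent's real validation may differ.
    // threadDumpToFile could be added here later if it is ever needed.
    static final Set<String> SUPPORTED_OPERATIONS = Set.of("threadPrint");

    static String invokeDiagnosticCommand(String operation) throws Exception {
        if (!SUPPORTED_OPERATIONS.contains(operation)) {
            throw new IllegalArgumentException("Unsupported operation: " + operation);
        }
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // HotSpot-specific MBean that exposes jcmd-style diagnostic commands.
        ObjectName name = new ObjectName("com.sun.management:type=DiagnosticCommand");
        // Diagnostic command operations take a single String[] of arguments;
        // an empty array means "use the defaults".
        Object result = server.invoke(
                name,
                operation,
                new Object[] {new String[0]},
                new String[] {String[].class.getName()});
        // threadPrint returns the same plain-text dump jcmd Thread.print prints.
        return (String) result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(invokeDiagnosticCommand("threadPrint"));
    }
}
```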

Currently only the threadPrint operation is supported, but this leaves room to add the threadDumpToFile operation later if necessary. Note that the threadDumpToFile output format isn't supported by existing support tooling, so it won't serve the purposes of the linked issue; only threadPrint does.

Opening as Draft until the other pieces of the feature are in place.

Josh-Matsuoka marked this pull request as ready for review on June 2, 2025 20:56
@andrewazores
Member

Dumps via JMX look like this:

2025-07-31 18:13:05
Full thread dump OpenJDK 64-Bit Server VM (21.0.7+6-LTS mixed mode, sharing):

Threads class SMR info:
_java_thread_list=0x00007f5e90005fc0, length=61, elements={
0x00007f5f4402b760, 0x00007f5f440b0e50, 0x00007f5f440b2380, 0x00007f5f440b3ad0,
0x00007f5f440b50d0, 0x00007f5f440b6630, 0x00007f5f440b8130, 0x00007f5f440b97b0,
0x00007f5f440c9b50, 0x00007f5f4416d1a0, 0x00007f5f444741a0, 0x00007f5ea4001660,
0x00007f5f44440320, 0x00007f5f44554e20, 0x00007f5f445e58c0, 0x00007f5f445f6650,
0x00007f5f445fe7d0, 0x00007f5f46851310, 0x00007f5f46a274a0, 0x00007f5f46a28320,
0x00007f5f46ba93d0, 0x00007f5e7c0f8840, 0x00007f5f46d8f630, 0x00007f5f46d95bf0,
0x00007f5f46d96ac0, 0x00007f5f46d97b90, 0x00007f5f46d98d40, 0x00007f5f46d99f30,
0x00007f5f46d9b470, 0x00007f5f46d9c9b0, 0x00007f5f46d9dde0, 0x00007f5f46d9f210,
0x00007f5f46dbb670, 0x00007f5f46e27730, 0x00007f5f46e2a070, 0x00007f5f472a8d50,
0x00007f5f472b6a50, 0x00007f5e3835b800, 0x00007f5f4778b840, 0x00007f5f4778cd40,
0x00007f5e840a30f0, 0x00007f5e840c76e0, 0x00007f5e280ed060, 0x00007f5e801bc940,
0x00007f5e802188e0, 0x00007f5e8021a910, 0x00007f5e8032a2b0, 0x00007f5e043a5db0,
0x00007f5e783044a0, 0x00007f5e84131390, 0x00007f5e780054d0, 0x00007f5e8414cef0,
0x00007f5de4009e80, 0x00007f5de400e250, 0x00007f5e90003ea0, 0x00007f5e90006830,
0x00007f5e140bc740, 0x00007f5dd0006d60, 0x00007f5dd0007d90, 0x00007f5e90005000,
0x00007f5e90007b70
}

"main" #1 [222] prio=5 os_prio=0 cpu=1779.51ms elapsed=609.91s tid=0x00007f5f4402b760 nid=222 waiting on condition  [0x00007f5f4b156000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@21.0.7/Native Method)
...

but dumps via Agent HTTP look like this:

"2025-07-31 18:31:23\nFull thread dump OpenJDK 64-Bit Server VM (17.0.16+8-LTS mixed mode, sharing):\n\nThreads class SMR info:\n_java_thread_list=0x00007f94d401a5f0, length=62, elements={\n0x00007fa3a802b8f0, 0x00007fa3a8380b60, 0x00007fa3a8381f90, 0x00007fa3a838d450,\n0x00007fa3a838e7d0, 0x00007fa3a838fbb0, 0x00007fa3a8391510, 0x00007fa3a8392a10,\n0x00007fa3a8393e50, 0x00007fa3a83b2b10, 0x00007fa3a8521780, 0x00007fa3a8576270,\n0x00007fa3a8661ab0, 0x00007fa3a8665430, 0x00007f96303e8a60, 0x00007f96303f1e50,\n0x00007f96303f4990, 0x00007f95ac002e70, 0x00007f95ac00d4e0, 0x00007fa3a8dc1980,\n0x0000

as a single line, so I suspect the ObjectMapper is serializing this incorrectly.
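Note that the HTTP body above begins with a `"`, which is what a JSON string literal looks like. The sketch below approximates what a JSON serializer does when handed a plain `String`: it wraps the value in quotes and escapes control characters, so every newline in the dump becomes the two characters `\n` and the whole dump ends up on one line. The `toJsonStringLiteral` helper is hypothetical and simplified; it is not Jackson's implementation, although `ObjectMapper.writeValueAsString` should produce equivalent output for a dump like this one.

```java
public class JsonStringEscapeDemo {
    // Simplified stand-in for serializing a String to JSON: wrap it in quotes
    // and escape quotes, backslashes and control characters.
    static String toJsonStringLiteral(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        sb.append('"');
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        // Other control characters use the generic hex escape form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String dump = "2025-07-31 18:13:05\nFull thread dump OpenJDK 64-Bit Server VM:\n\n\"main\" #1 prio=5\n";
        // Prints the multi-line dump as one quoted line with literal \n sequences.
        System.out.println(toJsonStringLiteral(dump));
    }
}
```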

@andrewazores
Member

@Josh-Matsuoka ping - any idea on the issue above?

@Josh-Matsuoka
Contributor Author

@andrewazores

This is indeed caused by the ObjectMapper: it serializes the thread dump string as JSON, which escapes the newlines and other special characters and produces that output. We could go back to the old approach of a separate code path that bypasses the ObjectMapper, but I think the cleanest solution is to add handling in the server that deserializes the response back into a plain string.
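The server-side fix described here amounts to decoding the JSON string literal back into raw text. With Jackson that step would typically be `objectMapper.readValue(body, String.class)`. The dependency-free decoder below is a hypothetical illustration of the same step, not the code in cryostatio/cryostat@ccbea34.

```java
public class JsonStringUnescapeDemo {
    // Hypothetical helper: decodes one JSON string literal (quotes included)
    // back into the raw multi-line text. It handles the standard escapes but
    // does not validate every JSON rule (e.g. unescaped inner quotes).
    static String fromJsonStringLiteral(String literal) {
        String s = literal.strip();
        int end = s.length() - 1; // index of the closing quote
        if (s.length() < 2 || s.charAt(0) != '"' || s.charAt(end) != '"') {
            throw new IllegalArgumentException("Not a JSON string literal");
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 1; i < end; i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                sb.append(c);
                continue;
            }
            if (i + 1 >= end) {
                throw new IllegalArgumentException("Dangling escape at index " + i);
            }
            char e = s.charAt(++i);
            switch (e) {
                case '"' -> sb.append('"');
                case '\\' -> sb.append('\\');
                case '/' -> sb.append('/');
                case 'b' -> sb.append('\b');
                case 'f' -> sb.append('\f');
                case 'n' -> sb.append('\n');
                case 'r' -> sb.append('\r');
                case 't' -> sb.append('\t');
                case 'u' -> {
                    // Four hex digits must follow before the closing quote.
                    if (i + 4 >= end) {
                        throw new IllegalArgumentException("Truncated \\u escape");
                    }
                    sb.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                    i += 4;
                }
                default -> throw new IllegalArgumentException("Bad escape: \\" + e);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String body = "\"2025-07-31 18:31:23\\nFull thread dump OpenJDK 64-Bit Server VM:\\n\"";
        // Prints the dump with real line breaks restored.
        System.out.print(fromJsonStringLiteral(body));
    }
}
```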

I've updated the server side here to fix it:

cryostatio/cryostat@ccbea34

@andrewazores
Member

Nice, that works.

@andrewazores
Member

I'm not sure whether this is a problem on the server side or is caused by the data the Agent is sending back, but now when I request a thread dump from an http:// Agent instance, it eventually times out and I see the following in the server logs:

cryostat-1                | 2025-08-28 19:20:45,173 INFO  [org.jbo.res.rea.cli.log.DefaultClientLogger] (vert.x-eventloop-thread-0) Response: POST https://quarkus-cryostat-agent:9977/mbean-invoke/, Status[200 OK], Headers[Date=Thu, 28 Aug 2025 19:20:45 GMT Transfer-encoding=chunked], Body:
cryostat-1                | "2025-08-28 19:20:45\nFull thread dump OpenJDK 64-Bit Server VM (17.0.16+8-LTS mixed mode, sharing):
cryostat-1                | 2025-08-28 19:20:45,173 WARN  [io.cry.tar.TargetConnectionManager] (executor-thread-12) com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'org': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
cryostat-1                |  at [Source: REDACTED (`StreamReadFeature.INCLUDE_SOURCE_IN_LOCATION` disabled); line: 1, column: 4]

cryostat-1                | 2025-08-28 19:21:14,108 WARN  [io.cry.rec.LongRunningRequestGenerator] (executor-thread-8) Failed to dump threads
cryostat-1                | 2025-08-28 19:21:14,108 ERROR [io.qua.ver.cor.run.VertxCoreRecorder] (vert.x-eventloop-thread-0) Uncaught exception received by Vert.x: java.util.concurrent.CompletionException: io.smallrye.mutiny.TimeoutException
cryostat-1                |     at io.cryostat.recordings.LongRunningRequestGenerator.onMessage(LongRunningRequestGenerator.java:120)
cryostat-1                |     at io.cryostat.recordings.LongRunningRequestGenerator_Subclass.onMessage$$superforward(Unknown Source)
cryostat-1                |     at io.cryostat.recordings.LongRunningRequestGenerator_Subclass$$function$$2.apply(Unknown Source)
cryostat-1                |     at io.quarkus.arc.impl.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:73)
cryostat-1                |     at io.quarkus.arc.impl.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:62)
cryostat-1                |     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.invokeInOurTx(TransactionalInterceptorBase.java:136)
cryostat-1                |     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.invokeInOurTx(TransactionalInterceptorBase.java:107)
cryostat-1                |     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired.doIntercept(TransactionalInterceptorRequired.java:38)
cryostat-1                |     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.intercept(TransactionalInterceptorBase.java:61)
cryostat-1                |     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired.intercept(TransactionalInterceptorRequired.java:32)
cryostat-1                |     at io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired_Bean.intercept(Unknown Source)
cryostat-1                |     at io.quarkus.arc.impl.InterceptorInvocation.invoke(InterceptorInvocation.java:42)
cryostat-1                |     at io.quarkus.arc.impl.AroundInvokeInvocationContext.perform(AroundInvokeInvocationContext.java:30)
cryostat-1                |     at io.quarkus.arc.impl.InvocationContexts.performAroundInvoke(InvocationContexts.java:27)
cryostat-1                |     at io.cryostat.recordings.LongRunningRequestGenerator_Subclass.onMessage(Unknown Source)
cryostat-1                |     at io.cryostat.recordings.LongRunningRequestGenerator_ClientProxy.onMessage(Unknown Source)
cryostat-1                |     at io.cryostat.recordings.LongRunningRequestGenerator_onMessage_Invoker_fzfV1ef26asmIvObqHSxwDcj6NY.invoke(Unknown Source)
cryostat-1                |     at io.cryostat.recordings.LongRunningRequestGenerator_onMessage_LazyInvoker_fzfV1ef26asmIvObqHSxwDcj6NY.invoke(Unknown Source)
cryostat-1                |     at io.quarkus.vertx.runtime.EventConsumerInvoker.invokeBean(EventConsumerInvoker.java:79)
cryostat-1                |     at io.quarkus.vertx.runtime.EventConsumerInvoker.invoke(EventConsumerInvoker.java:51)
cryostat-1                |     at io.quarkus.vertx.runtime.VertxEventBusConsumerRecorder$3$1$2.call(VertxEventBusConsumerRecorder.java:150)
cryostat-1                |     at io.quarkus.vertx.runtime.VertxEventBusConsumerRecorder$3$1$2.call(VertxEventBusConsumerRecorder.java:146)
cryostat-1                |     at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$4(ContextImpl.java:192)
cryostat-1                |     at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:270)
cryostat-1                |     at io.vertx.core.impl.ContextImpl$1.execute(ContextImpl.java:221)
cryostat-1                |     at io.vertx.core.impl.WorkerTask.run(WorkerTask.java:56)
cryostat-1                |     at io.quarkus.vertx.core.runtime.VertxCoreRecorder$15.runWith(VertxCoreRecorder.java:645)
cryostat-1                |     at org.jboss.threads.EnhancedQueueExecutor$Task.doRunWith(EnhancedQueueExecutor.java:2651)
cryostat-1                |     at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2630)
cryostat-1                |     at org.jboss.threads.EnhancedQueueExecutor.runThreadBody(EnhancedQueueExecutor.java:1622)
cryostat-1                |     at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1589)
cryostat-1                |     at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:11)
cryostat-1                |     at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:11)
cryostat-1                |     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
cryostat-1                |     at java.base/java.lang.Thread.run(Thread.java:1583)
cryostat-1                | Caused by: io.smallrye.mutiny.TimeoutException
cryostat-1                |     at io.smallrye.mutiny.operators.uni.UniBlockingAwait.await(UniBlockingAwait.java:65)
cryostat-1                |     at io.smallrye.mutiny.groups.UniAwait.atMost(UniAwait.java:65)
cryostat-1                |     at io.cryostat.targets.TargetConnectionManager.executeConnectedTask(TargetConnectionManager.java:222)
cryostat-1                |     at io.cryostat.targets.TargetConnectionManager.executeConnectedTask(TargetConnectionManager.java:218)
cryostat-1                |     at io.cryostat.targets.TargetConnectionManager_ClientProxy.executeConnectedTask(Unknown Source)
cryostat-1                |     at io.cryostat.diagnostic.DiagnosticsHelper.dumpThreads(DiagnosticsHelper.java:101)
cryostat-1                |     at io.cryostat.diagnostic.DiagnosticsHelper_ClientProxy.dumpThreads(Unknown Source)
cryostat-1                |     at io.cryostat.recordings.LongRunningRequestGenerator.onMessage(LongRunningRequestGenerator.java:103)
cryostat-1                |     ... 34 more
cryostat-1                | 

@Josh-Matsuoka
Contributor Author

This was caused by the rebase on the server side: it changed the type of Response that gets returned to invokeMbean, so an extra step was needed to get the JSON body. Fixed now.

andrewazores merged commit c930865 into cryostatio:main on Aug 28, 2025
9 checks passed

Labels

feat (New feature or request), safe-to-test

2 participants