[SPARK-54115][CORE][UI] Escalate display ordering priority of connect server operation threads in thread dump page #52816
Conversation
cc @sarutak @LuciferYang @HyukjinKwon, could you please take a look?
dongjoon-hyun left a comment:
Could you revise the following section of the PR description? The AS-IS sentences seem to advertise the current Spark executor behavior, @pan3793.
Why are the changes needed?
Currently, Spark executor displays the task threads first on the thread dump page, this does improve the user experience in troubleshooting "task stuck" issues. There are a lot of similar stuck issues on the driver's side too, e.g. driver may be stuck at the HMS RPC call
@dongjoon-hyun, I mean the executor's behavior is good; this PR replicates a similar behavior on the driver side for the Connect server case. I modified the PR description to explain more; I hope it is clear now.
Thank you for the revisions. I got your intention.
Could you revise it a little more to mention the full ordering priority, e.g., task threads > connect operation threads > normal threads?
I mean the executor's behavior is good; this PR replicates a similar behavior on the driver side for the Connect server case.
As you mentioned, the current title is misleading because it's not true in some cases.
Note: since it runs in local mode, tasks also execute on the driver side, so the page displays task threads first, then connect operation threads.
For instance, Apache Spark 3.4+ officially supports local mode for small production jobs as a single-pod job, to eliminate the overhead of executor pod resource allocation and coordination.
In short, I hope the PR title can be worded more accurately instead of using the "connect server execution first" phrasing.
@dongjoon-hyun I updated the PR title and description, could you please take another look?
dongjoon-hyun left a comment:
+1, LGTM. Thank you.
Closes #52816 from pan3793/SPARK-54115.
Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit bdac5cd)
Merged to master/4.1. Thank you again, @pan3793 and @LuciferYang.
What changes were proposed in this pull request?
Escalate the display ordering priority of Connect server operation threads in the thread dump page by defining a custom `threadInfoOrdering`.
When the Connect server runs in local deploy mode, tasks also run in the driver, so the driver thread dump page displays task threads first, then Connect operation threads.
When the Connect server runs in other deploy modes (YARN, K8s, Standalone), the driver thread dump page displays Connect operation threads first. A sketch of such an ordering follows.
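For illustration, here is a minimal self-contained sketch of what such an ordering could look like. `ThreadInfoLite` is a hypothetical stand-in for Spark's internal thread info type, and the thread name prefixes are assumptions based on Spark's usual thread naming, not necessarily the exact patterns the patch matches on:

```scala
// Hypothetical sketch, not the actual patch. ThreadInfoLite stands in for
// Spark's internal thread info type; the name prefixes are assumptions.
case class ThreadInfoLite(threadId: Long, threadName: String)

val threadInfoOrdering: Ordering[ThreadInfoLite] =
  Ordering.by { t: ThreadInfoLite =>
    val group =
      if (t.threadName.startsWith("Executor task launch worker")) 0 // task threads first
      else if (t.threadName.startsWith("SparkConnectExecuteThread")) 1 // then Connect operation threads
      else 2 // then all remaining threads
    (group, t.threadId) // stable tie-break within each group
  }

// Expected display order: task thread, Connect operation thread, normal thread.
val threads = Seq(
  ThreadInfoLite(1, "main"),
  ThreadInfoLite(2, "SparkConnectExecuteThread_opId=4fa1"),
  ThreadInfoLite(3, "Executor task launch worker for task 0")
)
threads.sorted(threadInfoOrdering).foreach(t => println(t.threadName))
```

Sorting by a `(group, tie-breaker)` tuple keeps the grouping logic in one place and yields the full priority dongjoon-hyun asked about: task threads > Connect operation threads > normal threads.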
Why are the changes needed?
Currently, the Spark executor displays task threads first on the thread dump page, which improves the user experience when troubleshooting "task stuck" issues.
There are many similar stuck issues on the driver side too; for example, the driver may be stuck at an HMS/HDFS RPC call during the query planning phase. Displaying the Connect operation threads at the top of the driver thread dump page makes it easier for users to diagnose driver stuck issues on the Connect server.
Does this PR introduce any user-facing change?
Yes, it affects the live UI thread dump page.
How was this patch tested?
Added a unit test. Also manually tested:
Started a Connect server in local mode and used a client to run some queries; a hedged sketch of this flow is shown below.
Note: since it runs in local mode, tasks also execute on the driver side, so the page displays task threads first, then Connect operation threads.
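For reference, a sketch of the manual verification flow using the Spark Connect JVM client; the start script and default port come from standard Spark distributions, while the specific query is illustrative:

```scala
// Hypothetical repro of the manual test. The Connect server is started
// separately, e.g. with the script shipped in Spark distributions
// (default Connect port 15002):
//   ./sbin/start-connect-server.sh --master "local[*]"
import org.apache.spark.sql.SparkSession

object ThreadDumpRepro {
  def main(args: Array[String]): Unit = {
    // Connect to the local server via the Spark Connect JVM client.
    val spark = SparkSession.builder()
      .remote("sc://localhost:15002")
      .getOrCreate()

    // Run a query that takes long enough for its operation thread to be
    // visible on the driver's thread dump page while it executes.
    spark.range(1000000000L).selectExpr("sum(id)").show()

    spark.stop()
  }
}
```

While the query runs, the driver's thread dump page can be checked to confirm that task threads appear first, followed by the Connect operation thread.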
Was this patch authored or co-authored using generative AI tooling?
No.