[SPARK-38550][SQL][CORE] Use a disk-based store to save more debug information for live UI #35856
Conversation
core/src/main/scala/org/apache/spark/internal/config/Status.scala
We can use this even when the app status fits in memory, can't we?
Do we have multiple configurations under the spark.appStatusStore.diskStore. prefix?
For now, no other config starts with spark.appStatusStore.diskStore
If there is no other config, the Apache Spark community's configuration naming guide is not to introduce a new namespace; fold the last segment in by removing the dot. In this case:
- spark.appStatusStore.diskStore.dir
+ spark.appStatusStore.diskStoreDir
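For reference, a hedged sketch of how the renamed entry might be declared in Status.scala, following Spark's ConfigBuilder pattern; the doc text and the choice of an optional string are assumptions, not this PR's actual definition.

```scala
import org.apache.spark.internal.config.ConfigBuilder

// Hypothetical declaration inside the Status config object.
private[spark] val APP_STATUS_DISK_STORE_DIR =
  ConfigBuilder("spark.appStatusStore.diskStoreDir")
    .doc("Local directory where the diagnostic disk store keeps its data.")
    .version("3.4.0")
    .stringConf
    .createOptional
```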
Done, thanks for the guidance.
core/src/main/scala/org/apache/spark/internal/config/Status.scala
I guess this should be 3.4.0, because today is the feature-freeze date for Apache Spark 3.3.0 and this PR arrives a little late for review.
@dongjoon-hyun thanks for the review!
I'm wondering if it's possible to include this in 3.3.0. Here are my two cents:
- the community hasn't paid attention to Spark's troubleshooting ability for a while. If we can deliver this feature earlier, it signals that the community is starting to improve debuggability, and it can attract others to contribute (earlier).
- I know the timing is not good, but as you may see, this PR aims to reduce the impact on the driver (separate listener, separate event queue, disk store instead of memory, REST API instead of UI) and introduces useful features (i.e. showing AQE plan changes). I know you have concerns about the disk space; I think that's something we can resolve.
Hence, I think it's worth considering.
sql/core/src/main/scala/org/apache/spark/sql/diagnostic/DiagnosticStore.scala
Could you add some class description, please?
will do
Is this a dedicated queue for this listener?
Yes. Because collecting/saving the diagnostics could be slow, I don't want it to impact other critical listeners, e.g. the UI listener.
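For illustration, a minimal sketch of the dedicated-queue registration being described; listenerBus and addToQueue are private[spark], so this shape only compiles inside Spark's own packages, and the queue name is an assumption.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.SparkListener

// A named queue gets its own dispatch thread, so slow diagnostic writes
// cannot back up the queue that serves the UI listeners.
def register(sc: SparkContext, listener: SparkListener): Unit = {
  sc.listenerBus.addToQueue(listener, "diagnostics")
}
```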
Rename to sqlDiagnostics?
@mridulm
I put this listener in the SQL folder because I need to capture some SQL events. But in the future, this listener can capture other events to provide diagnostics for other components (e.g. executors), so I think a general name may be better.
But sqlDiagnostics is also fine with me.
sql/core/src/main/scala/org/apache/spark/sql/diagnostic/DiagnosticListener.scala
Do you know the required disk size with Int.MaxValue? It could kill the driver pod due to OutOfDisk.
Good point. It might be dangerous to save the untruncated plan description of a very large plan, but it's also hard to estimate the upper bound because, theoretically, there is no limit on the query plan size.
How about we add a flag to control plan truncation for the disk store? For example: spark.appStatusStore.diskStore.saveUntruncatedPlan=true
I tried very large plans locally with a 2GB-memory Spark driver. It turns out Spark itself will OOM far before the query plan string becomes too large (~10MB). In addition, there is a retained count for diagnostic data (1000 by default), so the memory/disk consumption should be fine.
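For reference, a hedged sketch of how such a retained-count limit is typically enforced with Spark's ElementTrackingStore (these classes are private[spark]; the entity and field names are stand-ins, not this PR's actual code):

```scala
import org.apache.spark.status.ElementTrackingStore
import org.apache.spark.status.KVUtils.KVIndexParam

// Hypothetical entity standing in for this PR's diagnostic data class.
class ExecutionDiagnosticData(@KVIndexParam val executionId: Long)

// Once more than `retained` entries exist, delete the oldest ones.
def registerCleanup(store: ElementTrackingStore, retained: Long): Unit = {
  store.addTrigger(classOf[ExecutionDiagnosticData], retained) { count =>
    val it = store.view(classOf[ExecutionDiagnosticData]).closeableIterator()
    try {
      var toDelete = count - retained
      while (toDelete > 0 && it.hasNext) {
        store.delete(classOf[ExecutionDiagnosticData], it.next().executionId)
        toDelete -= 1
      }
    } finally {
      it.close()
    }
  }
}
```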
Agree with @dongjoon-hyun, we should impose a limit here.
@shardulm94 can comment more on his observations with something similar in terms of the increase in cost; he did a streaming serialization implementation to get around the issue (he was writing to HDFS, so that solution won't directly apply here).
done
FYI: currently, all disk stores are broken on Apple Silicon.
Thanks for letting me know. If the failure happens during initialization, I think we are safe here.
core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
If you want to add this in this PR, the REST API should be documented here.
will do
sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/ApiSqlRootResource.scala
Hi, @linhongliu-db. Thank you for making a PR. I left some comments.
I'd like to recommend considering this as an Apache Spark 3.4 feature.
Hi @linhongliu-db, thank you for the feature! I wonder if it is possible to give a more accurate time for the plan changes, in particular the re-optimization phase and the query stage optimization in AQE. We have encountered some performance issues, like SPARK-38406 and SPARK-38401. I see this may be out of the scope of this PR, but it would be very helpful if we could show more details. And as you mentioned, it would be friendly if we could also put some summary into the UI.
+CC @shardulm94, @thejdeep, since you worked on something similar recently.
Force-pushed from 0a0f1ca to 0af72d1.
@dongjoon-hyun, I addressed all the comments; could you please review this PR one more time? I also changed the version number to 3.4.
@ulysses-you sure, I'll consider it. But in this PR, I'd like to minimize the changes in catalyst and execution in order to make the PR easy to review. Once it's accepted, we can keep improving it or adding more diagnostic information.
Thank you for your update, @linhongliu-db.
@dongjoon-hyun, just a soft ping. Do we have anything else to do to move this PR forward? |
If we have enabled the disk store, thoughts on using it for everything at the driver?
For now, we can't. The SQL UI needs to maintain the task metrics in memory in order to render the UI quickly. But in the future, I think we can build a 2-layer store.
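Purely as an illustration of the 2-layer idea (nothing like this exists in the PR): a read-through store that serves hot UI data from memory and falls back to the disk store.

```scala
import java.util.NoSuchElementException
import org.apache.spark.util.kvstore.KVStore

// Hypothetical read-through wrapper over two KVStore instances.
class TwoLayerStore(memory: KVStore, disk: KVStore) {
  def read[T](klass: Class[T], naturalKey: Any): T =
    try memory.read(klass, naturalKey)
    catch { case _: NoSuchElementException => disk.read(klass, naturalKey) }
}
```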
docs/monitoring.md
Move this under sql?
How about:
/applications/[app-id]/diagnostics/sql/[execution-id]
Then, in the future, we can have:
diagnostics/executor/
diagnostics/streaming/
diagnostics/environment/
diagnostics/xxx
nit: ${MAX_TO_STRING_FIELDS.key}
done
Int.MaxValue -> event.qe.sparkSession.sessionState.conf.maxToStringFieldsForDiagnostic
done
We should use event.qe.sparkSession.sessionState.conf rather than SQLConf.get, since the listener runs in another thread.
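A small sketch of the point being made, assuming (as in this PR) the event carries the QueryExecution:

```scala
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.internal.SQLConf

// Correct on any thread: read through the session that ran the query.
def maxFields(qe: QueryExecution): Int =
  qe.sparkSession.sessionState.conf.maxToStringFields

// Risky on the listener-bus thread: SQLConf.get resolves a thread-local
// that is not set there, so it can silently fall back to default values.
def maxFieldsWrong(): Int = SQLConf.get.maxToStringFields
```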
done
event.physicalPlanDescription uses the old maxToStringFields to build the explain string; do you want to do the same thing and re-explain it?
Just a small concern: there is a cost to running explain if the plan is large, but I don't have a better idea than re-explaining it.
I'm not too worried about the cost, because a separate listener with a separate event queue won't slow down other listeners.
But I do want to make sure everything we add is necessary. I mean, usually it's enough to output full fields only in the final plan, and the plan change history can keep truncated plans, because, IIUC, AQE aims to change the operators while the expressions usually stay unchanged.
And if we need more fields in the plan change history, it's always easier to add more later than to remove something.
Force-pushed from faae926 to d0b6123.
Here is an update: after the following commits, RocksDB can now be used on all OSes for this PR.
Why do we need to share the same kvStore?
@dongjoon-hyun
I used the same approach as the in-memory kvStore, which is (see the sketch after this list):
- define a shared kvStore in SparkContext
- share the kvStore with the listeners that need to update the store (Jobs UI Listener, SQL UI Listener)
- share the kvStore with the components that need to read the store (web UI, rest API)
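A minimal sketch of that sharing pattern, with hypothetical names standing in for this PR's classes (ElementTrackingStore and InMemoryStore are Spark's own, but private[spark]):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.status.ElementTrackingStore
import org.apache.spark.status.KVUtils.KVIndexParam
import org.apache.spark.util.kvstore.InMemoryStore

object SharedStoreSketch {
  // Hypothetical entity standing in for this PR's diagnostic data.
  class ExecutionDiagnosticData(@KVIndexParam val id: Long, val plan: String)

  // 1. One store instance, created once (in SparkContext in the real code).
  val store = new ElementTrackingStore(new InMemoryStore(), new SparkConf())

  // 2. Writers (the listeners) receive the same reference and add entities.
  def onExecutionEnd(id: Long, plan: String): Unit =
    store.write(new ExecutionDiagnosticData(id, plan))

  // 3. Readers (web UI / REST API) query through the same reference.
  def lookup(id: Long): ExecutionDiagnosticData =
    store.read(classOf[ExecutionDiagnosticData], id)
}
```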
cc @cloud-fan to review as well.
Force-pushed from b072aef to b2efff6.
@dongjoon-hyun I updated the code based on the latest master. Thanks for the comments.
import org.apache.spark.sql.internal.StaticSQLConf.UI_RETAINED_EXECUTIONS
import org.apache.spark.status.{ElementTrackingStore, KVUtils}

class DiagnosticListener(
can we add some classdoc?
LGTM, @dongjoon-hyun do you have more comments?
Sorry for the delay, @linhongliu-db and @cloud-fan. I dismissed my previous review because my review went stale.
thanks, merging to master!
What changes were proposed in this pull request?
In Spark, the UI lacks troubleshooting abilities. For example:
This is because the live UI depends on an in-memory KV store: we should always worry about stability issues when adding more information to that store. Therefore, it's better to add a disk-based store to save more information.
This PR includes:
Why are the changes needed?
The troubleshooting ability is highly needed, because without it, it's hard to debug AQE-related issues. Once we solve the blockers, we can make a long-term plan to improve observability.
Does this PR introduce any user-facing change?
Yes, a new REST API to expose more information about the application.
REST API endpoint: http://localhost:4040/api/v1/applications/local-1647312132944/diagnostics/sql/0
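For context, a hedged sketch of what the JAX-RS resource behind that endpoint could look like, modeled on Spark's existing api/v1 resources; DiagnosticStore, diagnostic(), and the class name are hypothetical stand-ins, not this PR's actual code.

```scala
import javax.ws.rs.{GET, NotFoundException, Path, PathParam, Produces}
import javax.ws.rs.core.MediaType

// Hypothetical lookup interface and payload type.
trait DiagnosticStore {
  def diagnostic(executionId: Long): Option[ExecutionDiagnosticData]
}
case class ExecutionDiagnosticData(id: Long, physicalPlan: String)

@Produces(Array(MediaType.APPLICATION_JSON))
class SqlDiagnosticsResource(store: DiagnosticStore) {
  @GET
  @Path("sql/{executionId:\\d+}")
  def one(@PathParam("executionId") executionId: Long): ExecutionDiagnosticData =
    store.diagnostic(executionId).getOrElse(
      throw new NotFoundException(s"unknown execution: $executionId"))
}
```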
Example output:
{
  "id" : 0,
  "physicalPlan" : "<plan description string>",
  "submissionTime" : "2022-03-15T03:41:42.226GMT",
  "completionTime" : "2022-03-15T03:41:43.387GMT",
  "errorMessage" : "",
  "planChanges" : [ {
    "physicalPlan" : "<plan description string>",
    "updateTime" : "2022-03-15T03:41:42.268GMT"
  }, {
    "physicalPlan" : "<plan description string>",
    "updateTime" : "2022-03-15T03:41:43.262GMT"
  } ]
}
How was this patch tested?
Manually tested.