Skip to content

[native pos] Propagate taskInfo updates to Spark metrics#20295

Merged
shrinidhijoshi merged 1 commit intoprestodb:masterfrom
shrinidhijoshi:updateSparkMetrics
Jul 12, 2023
Merged

[native pos] Propagate taskInfo updates to Spark metrics#20295
shrinidhijoshi merged 1 commit intoprestodb:masterfrom
shrinidhijoshi:updateSparkMetrics

Conversation

@shrinidhijoshi
Copy link
Collaborator

Currently, we update the spark metrics accumulators only after the task is complete. This means that spark sees no progress or activity on the task until the task is complete. This also means that we can't effectively use the stalled tasks killer in spark scheduler as it relies on spark accumulators being updated regularly to measure if a task is making progress.

This commit fixes this by ensuring that everytime we receive TaskInfo from CPP process we immediately update Spark accumulators to let spark know that we are making progress

== NO RELEASE NOTE ==

Currently, we update the spark metrics accumulators
only after the task is complete. This means that spark
sees no progress or activity on the task until the task
is complete. This also means that we can't effectively use
the stalled tasks killer in spark scheduler as it relies on
spark accumulators being updated regularly to measure if a
task is making progress.

This commit fixes this by ensuring that everytime we receive
TaskInfo from CPP process we immediately update Spark accumulators
to let spark know that we are making progress
@shrinidhijoshi shrinidhijoshi requested a review from a team as a code owner July 12, 2023 19:15
Copy link
Member

@vermapratyush vermapratyush left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@AdeleoM
Copy link
Contributor

AdeleoM commented Jul 12, 2023

LGTM, but do you know which event trigger the check on Spark Side?

@miaoever
Copy link
Contributor

As offline discussion, this probably won't work (not enough) as IIUC, Presto-on-spark in general only pass back the taskInfo to driver/coordinator when task finished.

@shrinidhijoshi
Copy link
Collaborator Author

shrinidhijoshi commented Jul 12, 2023

LGTM, but do you know which event trigger the check on Spark Side?

@AdeleoM Spark's scheduler runs a loop to watch the task metrics. So we don't need a trigger.

@shrinidhijoshi
Copy link
Collaborator Author

As offline discussion, this probably won't work (not enough) as IIUC, Presto-on-spark in general only pass back the taskInfo to driver/coordinator when task finished

Clarified with @miaoever that we are updating the spark's TaskMetrics objects through this which should ideally trigger the hung task monitoring behavior

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@shrinidhijoshi shrinidhijoshi merged commit 76863c1 into prestodb:master Jul 12, 2023
shrinidhijoshi added a commit to shrinidhijoshi/presto that referenced this pull request Jul 28, 2023
Reverting (prestodb#20295) until
we figure out why metrics aren't updating.

Current investigation shows that TaskContext object, which is
needed during metrics update, is not available in the
HttpNativeExecutionTaskInfoFetcher thread. So reverting to doing
the task metrics update in the PrestoNativeTaskExecutorFactory
until we investigate and come with a fix.
shrinidhijoshi added a commit that referenced this pull request Jul 29, 2023
Reverting (#20295) until
we figure out why metrics aren't updating.

Current investigation shows that TaskContext object, which is
needed during metrics update, is not available in the
HttpNativeExecutionTaskInfoFetcher thread. So reverting to doing
the task metrics update in the PrestoNativeTaskExecutorFactory
until we investigate and come with a fix.
SeanIFitch pushed a commit to SeanIFitch/presto that referenced this pull request Aug 2, 2023
Reverting (prestodb#20295) until
we figure out why metrics aren't updating.

Current investigation shows that TaskContext object, which is
needed during metrics update, is not available in the
HttpNativeExecutionTaskInfoFetcher thread. So reverting to doing
the task metrics update in the PrestoNativeTaskExecutorFactory
until we investigate and come with a fix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants