Send periodic query progress events#23195

Merged
pgupta2 merged 1 commit into prestodb:master from pgupta2:periodic_update_event
Aug 7, 2024

Conversation

@pgupta2
Contributor

@pgupta2 pgupta2 commented Jul 12, 2024

Description

Add support for sending periodic progress events for Queued/Running queries using EventListener interface.

Motivation and Context

This change enables coordinators to push the current state of queued/running queries as progress events, which can be logged to different sinks for analysis. A progress event is sent every minute for each queued/running query.

Impact

None

Test Plan

Added a unit test to check for queryProgress events.
Existing tests

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
== RELEASE NOTES ==
General Changes
* Add a new configuration property ``event.query-progress-publish-interval`` to specify the time interval at which progress events should be generated. Default is every 1 minute. :pr:`23195`

@pgupta2 pgupta2 marked this pull request as ready for review July 12, 2024 20:20
@pgupta2 pgupta2 requested a review from a team as a code owner July 12, 2024 20:20
Contributor

@prithvip prithvip left a comment

It might be better (single responsibility principle) if there was a new singleton class that calls DispatchManager::getQueries periodically and sends the heartbeat, and we can wire in any heartbeat-specific configuration there. "Query heartbeat" is overloaded in the DispatchManager context, since DispatchQuery has recordHeartbeat(), which is used for purging abandoned queries, and that's a totally different purpose from this "heartbeat". "sendQueryStateInfos" might be better terminology and would match the QueryStateInfoResource::getQueryStateInfos API.
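A minimal sketch of the suggested single-responsibility design (all names here are illustrative, not Presto's actual classes): a small dedicated class owns the timer and pulls the query list, so DispatchManager never learns about progress publication. String stands in for BasicQueryInfo to keep the sketch self-contained.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Illustrative single-responsibility publisher, per the review suggestion.
// The supplier stands in for DispatchManager::getQueries and the sink for
// an EventListener call; neither name is from the actual PR.
public class QueryProgressPublisher
{
    private final Supplier<List<String>> queriesSupplier;
    private final Consumer<String> eventSink;
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    public QueryProgressPublisher(Supplier<List<String>> queriesSupplier, Consumer<String> eventSink)
    {
        this.queriesSupplier = queriesSupplier;
        this.eventSink = eventSink;
    }

    // Schedule one publication round per interval.
    public void start(long intervalMillis)
    {
        executor.scheduleWithFixedDelay(this::publishOnce, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // One publication round; extracted so it can be exercised directly in tests.
    void publishOnce()
    {
        for (String query : queriesSupplier.get()) {
            eventSink.accept(query);
        }
    }

    public void stop()
    {
        executor.shutdownNow();
    }
}
```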

this.securityConfig = requireNonNull(securityConfig, "securityConfig is null");

this.queryMonitor = requireNonNull(queryMonitor, "queryMonitor is null");
this.queryHeartbeatExecutor = Executors.newSingleThreadScheduledExecutor(threadsNamed("query-heartbeat-management-%s"));
Contributor

Cleaner to instantiate this in CoordinatorModule and bind it to an annotated ScheduledExecutorService here; see CoordinatorModule.java for how we do this in other cases.
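A hedged sketch of that wiring, assuming Guice (which CoordinatorModule uses); the module and annotation names are hypothetical, not Presto's actual code. The coordinator module provides the annotated executor once, and DispatchManager injects it instead of constructing its own.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

import com.google.inject.Binder;
import com.google.inject.BindingAnnotation;
import com.google.inject.Module;
import com.google.inject.Provides;
import com.google.inject.Singleton;

// Illustrative module: binds a dedicated, annotated ScheduledExecutorService
// so consumers can inject it rather than creating their own thread pools.
public class QueryHeartbeatModule
        implements Module
{
    // Custom binding annotation to distinguish this executor from others.
    @Retention(RetentionPolicy.RUNTIME)
    @BindingAnnotation
    public @interface ForQueryHeartbeat {}

    @Override
    public void configure(Binder binder) {}

    @Provides
    @Singleton
    @ForQueryHeartbeat
    public ScheduledExecutorService createQueryHeartbeatExecutor()
    {
        // Presto would name the thread via threadsNamed("query-heartbeat-management-%s").
        return Executors.newSingleThreadScheduledExecutor();
    }
}
```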

catch (Throwable e) {
log.error(e, "Error sending Heartbeat Event");
}
}, 1, 1, TimeUnit.MINUTES);
Contributor

Make this configurable

sendQueryHeartbeats();
}
catch (Throwable e) {
log.error(e, "Error sending Heartbeat Event");
Contributor

Issues in sending the heartbeat should be caught and handled by QueryMonitor, DispatchManager shouldn't worry about that or log anything.


private void sendQueryHeartbeats()
{
List<BasicQueryInfo> basicQueryInfoList = queryTracker.getAllQueries().stream().map(q -> q.getBasicQueryInfo()).collect(Collectors.toList());
Contributor

DispatchManager::getQueries already does this, let's use that.

@pgupta2 pgupta2 changed the title Send periodic query heartbeat events Send periodic query progress events Jul 16, 2024
@pgupta2 pgupta2 force-pushed the periodic_update_event branch 2 times, most recently from f0e0fd6 to 3b20785 Compare July 16, 2024 18:52
@tdcmeehan tdcmeehan self-assigned this Jul 16, 2024
@pgupta2 pgupta2 requested a review from prithvip July 16, 2024 20:01
prithvip
prithvip previously approved these changes Jul 18, 2024
@pgupta2
Contributor Author

pgupta2 commented Jul 18, 2024

@prithvip @tdcmeehan : Gentle reminder for the review! Thanks.

ofEpochMilli(queryInfo.getQueryStats().getCreateTime().getMillis())));
}
catch (Throwable e) {
log.error(e, "Error sending Query Progress Event");
Contributor

I don't think we need a try-catch here, because any issues should be handled by the specific EventListener implementation, which does not throw checked exceptions. The try-catch should happen there.

@jaystarshot
Member

Can you add how progress events can potentially be analyzed?
I feel that there is more value in adding whole cluster level events instead of individual query events

Contributor

@tdcmeehan tdcmeehan left a comment

Can you please add some unit tests?

@pgupta2
Contributor Author

pgupta2 commented Jul 20, 2024

Can you add how progress events can potentially be analyzed? I feel that there is more value in adding whole cluster level events instead of individual query events

Query progress events can be used to figure out the workload distribution by user or workload type, which enables more real-time quota enforcement. Another use case is getting insights into the queuing state of the cluster in real time and identifying the workloads/users contributing to it. Query progress events allow us to aggregate data across different dimensions as needed. Cluster-level events are very coarse and will not expose such detailed information.

@pgupta2
Contributor Author

pgupta2 commented Jul 20, 2024

Can you please add some unit tests?

@tdcmeehan : Modified existing test to check for progress events and updated the PR. Thanks.

@pgupta2 pgupta2 requested a review from tdcmeehan July 20, 2024 01:15
@pgupta2 pgupta2 force-pushed the periodic_update_event branch from 48b34c6 to d1d0747 Compare July 20, 2024 06:47
@pgupta2 pgupta2 requested a review from prithvip July 20, 2024 06:47
@pgupta2 pgupta2 force-pushed the periodic_update_event branch from d1d0747 to 9460ae2 Compare July 21, 2024 19:01
@PostConstruct
public void start()
{
queryProgressMonitorExecutor.scheduleWithFixedDelay(
Contributor

For a heavily contended cluster, I think this would be pretty wasteful, because most queries will publish identical information each time even though they're queued, and it won't change until they dequeue. What do you think about making it based on query state transitions? I.e., it publishes once per state transition, and when a query transitions into EXECUTING, we start a periodic timer?

Contributor Author

I feel that the decision to process or ignore events for QUEUED queries should be left to the consumer itself. It also depends on the use case for which these events are being utilized. For instance, within Meta, these events will be used for two different purposes:

  1. Resource utilization of in-progress queries, to perform quota enforcement and throttling. This consumer simply ignores events for QUEUED queries and only logs events for running queries.

  2. Getting the current queuing state of the cluster, to power certain alerts/dashboards. This requires getting events for queued queries at a periodic interval in order to generate a recent snapshot. Another problem I see is that if a query fails after queueing and before logging a query-completed event (in case of a coordinator crash or cluster restart), then we will not be able to figure out whether the query was in the QUEUED state because it was waiting for resources or whether it died for some other reason. So, having periodic samples will surface a more accurate snapshot of the cluster state.


import static java.util.Objects.requireNonNull;

public class QueryProgressEvent
Contributor

I think what makes this different from similar classes in this package is that it is designed to publish multiple events, instead of an event per state transition or at query start/finish. If you imagine that someone might put this into a queue that has at-least-once semantics, then it might be possible for messages to arrive out of order or duplicated. That's not a problem for peer classes in this package, because you know the state transition order, or you know that you only need to look at one query-completed event even if there's a duplicate, but for this class, it's impossible to infer correct ordering. This may lead to strange queries or incorrect analysis of the events. I'd recommend you include a monotonically increasing identifier that downstream systems can use to infer ordering and to de-duplicate at the reader side.
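A hedged sketch of the suggested scheme (class and method names are hypothetical, not the PR's code): the producer stamps each progress event with a per-query monotonically increasing id, and a reader keeps the highest id seen per query to drop duplicates and out-of-order deliveries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative producer/consumer pair for ordering progress events.
public class ProgressEventSequencer
{
    private final Map<String, AtomicLong> nextIdByQuery = new ConcurrentHashMap<>();

    // Producer side: stamp the next event id for a query (0, 1, 2, ...).
    public long nextEventId(String queryId)
    {
        return nextIdByQuery.computeIfAbsent(queryId, key -> new AtomicLong()).getAndIncrement();
    }

    // Consumer side: a single-threaded reader that drops duplicated or
    // out-of-order events by tracking the highest id seen per query.
    public static final class Deduplicator
    {
        private final Map<String, Long> maxSeen = new HashMap<>();

        public boolean shouldProcess(String queryId, long eventId)
        {
            Long previous = maxSeen.get(queryId);
            if (previous != null && eventId <= previous) {
                return false; // duplicate or stale event
            }
            maxSeen.put(queryId, eventId);
            return true;
        }
    }
}
```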

Contributor Author

@tdcmeehan : Added an increasing event Id field in the QueryProgressEvent class.

@tdcmeehan
Contributor

Can you add how progress events can potentially be analyzed? I feel that there is more value in adding whole cluster level events instead of individual query events

Query progress events can be used to figure out the workload distribution by user or workload type, which enables more real-time quota enforcement. Another use case is getting insights into the queuing state of the cluster in real time and identifying the workloads/users contributing to it. Query progress events allow us to aggregate data across different dimensions as needed. Cluster-level events are very coarse and will not expose such detailed information.

Shouldn't higher level quota enforcement work hand-in-hand with local resource management in the resource group? If so, wouldn't it be better to publish these events at the granularity of resource group IDs? If not, how do you ensure that local resources are in line with global quota?

Likewise, wouldn't events at the resource group level give you the insights needed to identify the workloads and users contributing to the queueing state?

@pgupta2
Contributor Author

pgupta2 commented Jul 22, 2024

Shouldn't higher level quota enforcement work hand-in-hand with local resource management in the resource group? If so, wouldn't it be better to publish these events at the granularity of resource group IDs? If not, how do you ensure that local resources are in line with global quota?

My understanding might be wrong, but I feel that higher-level quota enforcement can be different from the local resource management provided by resource groups. AFAIK, RGs provide quota management on concurrency and memory share. So, if a resource group is configured with a 50% memory share and a concurrency of 100, it can run 100 queries in parallel, which can consume a total of 50% of cluster memory. Global quota management could happen at the team, org, or product level, wherein a team might have a quota to use X memory resources per day; once they have consumed these resources, their workload will get blocked/throttled. This enforcement cannot be achieved at the RG level IMO. Also, an org might have multiple teams using different resource groups for their workload, and we need to aggregate resource usage at the org level to perform such quota enforcement.

Likewise, wouldn't events at the resource group level give you the insights needed to identify the workloads and users contributing to the queueing state?

Yes, we should be able to collect this data at the RG level. However, even at the RG level, we will be collecting BasicQueryInfos for all running queries and publishing them at a periodic interval. In terms of information published, it will most likely still be the same, just grouped at the RG level.

@pgupta2 pgupta2 force-pushed the periodic_update_event branch from 9460ae2 to ec4ad6a Compare July 22, 2024 23:59
@pgupta2 pgupta2 requested a review from tdcmeehan July 23, 2024 16:13
@pgupta2
Contributor Author

pgupta2 commented Jul 25, 2024

@tdcmeehan : Gentle reminder for the review! Thanks.

@pgupta2
Contributor Author

pgupta2 commented Jul 29, 2024

@tdcmeehan : Another gentle reminder for the review!! :)

@steveburnett
Contributor

Suggest adding the PR number to the release note entry per the Release Notes Guidelines.

== RELEASE NOTES ==
General Changes
* Add a new configuration property ``event.query-progress-publish-interval`` to specify the time interval at which progress events should be generated. Default is every 1 minute. :pr:`23195`

@pgupta2
Contributor Author

pgupta2 commented Aug 5, 2024

@tdcmeehan : Ping.

prithvip
prithvip previously approved these changes Aug 5, 2024
private void publishQueryProgressEvent()
{
for (BasicQueryInfo basicQueryInfo : dispatchManager.getQueries()) {
if (basicQueryInfo.getState() != FINISHED && basicQueryInfo.getState() != FAILED) {
Contributor

Suggested change
if (basicQueryInfo.getState() != FINISHED && basicQueryInfo.getState() != FAILED) {
if (!basicQueryInfo.getState().isDone()) {

public class QueryMonitorConfig
{
private DataSize maxOutputStageJsonSize = new DataSize(16, Unit.MEGABYTE);
private Duration queryProgressPublishInterval = new Duration(1, MINUTES);
Contributor

Can this be defaulted to 0 (don't publish events)?

@PostConstruct
public void start()
{
queryProgressMonitorExecutor.scheduleWithFixedDelay(
Contributor

Can you check if the duration is 0, and if so, make this a no-op? And also make the executor loaded lazily?
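A sketch of the requested behavior under hypothetical names (not the PR's actual QueryProgressMonitor): a zero interval means publishing is disabled, and the executor is only created once start() decides it is needed.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative monitor with a zero-interval "disabled" mode and a lazily
// created executor.
public class LazyProgressMonitor
{
    private final long intervalMillis;
    private final Runnable publishTask;
    private ScheduledExecutorService executor; // created lazily in start()

    public LazyProgressMonitor(long intervalMillis, Runnable publishTask)
    {
        this.intervalMillis = intervalMillis;
        this.publishTask = publishTask;
    }

    // Returns false when publishing is disabled; no executor is ever created then.
    public synchronized boolean start()
    {
        if (intervalMillis <= 0) {
            return false;
        }
        if (executor == null) {
            executor = Executors.newSingleThreadScheduledExecutor();
            executor.scheduleWithFixedDelay(publishTask, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
        }
        return true;
    }

    public synchronized void stop()
    {
        if (executor != null) {
            executor.shutdownNow();
        }
    }
}
```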

@tdcmeehan
Contributor

Yes, we should be able to collect this data at the RG level. However, even at the RG level, we will be collecting BasicQueryInfos for all running queries and publishing them at a periodic interval. In terms of information published, it will most likely still be the same, just grouped at the RG level.

From your message earlier, I read two use cases: more real-time quota enforcement, and identifying sources of queueing. For both of these, I'm wondering if you get sufficient information from resource group nodes. Instead of publishing BasicQueryInfos, I'm wondering if you can publish ResourceGroupInfos, perhaps with an additional set of QueryIds if you want to drill down into particular queries. Overall, I'm just wondering how practical this change is for most people, since it seems like a ton of information, and presumably this sort of information is spooled somewhere, which would require potentially costly online storage to provide that spooling. And if the main thing you need is to aggregate it to the resource group level, why not just publish at the resource group level?

@pgupta2
Contributor Author

pgupta2 commented Aug 6, 2024

Instead of publishing BasicQueryInfos, I'm wondering if you can publish ResourceGroupInfos, perhaps with additional set of QueryIds if you want to drill down into particular queries.

Just to confirm my understanding: are you suggesting that we publish ResourceGroupInfos, each containing information about all running queries in the given resource group, as a single event instead of publishing information about each query as a separate event?

ResourceGroupInfo only contains information about running queries, so we might have to extend it to expose queued-query information as well. It can be done, but first I want to make sure the understanding above is correct.

perhaps with additional set of QueryIds if you want to drill down into particular queries.

Drilling down is not possible for the real-time quota enforcement use case, given that the enforcement system will consume these raw events to make quota decisions in real time. There is no offline processing that will happen later to get more information about these queries using queryIds. So, we need to expose detailed information in the event itself.

And if the main things you need are all for the purposes of aggregating it to the resource group level, why not just publish at the resource group level?

I think I mentioned above that aggregation will happen at the level of teams, products, and orgs, which can map to several RGs in a cluster. So, we need to log detailed information at the query level and perform aggregation based on these dimensions.

I feel that if we really want to publish events at the RG level, we can still do that and push the logic for publishing more detailed information (at the query level, perhaps) to the event consumer itself. Given that ResourceGroupInfo contains information about queued/running queries, they can be iterated over and published as needed.

@pgupta2 pgupta2 force-pushed the periodic_update_event branch 2 times, most recently from 8acbcd1 to 4eef95e Compare August 6, 2024 16:02
@pgupta2 pgupta2 requested a review from tdcmeehan August 6, 2024 16:03
Contributor

@tdcmeehan tdcmeehan left a comment

I think I understand better now the intended use case, and I'm fine with the direction of these changes. Thanks @pgupta2. Just one final nit.

Comment on lines +40 to +56
private ScheduledExecutorService queryProgressMonitorExecutor;

@Inject
public QueryProgressMonitor(
QueryMonitor queryMonitor,
DispatchManager dispatchManager,
QueryMonitorConfig queryMonitorConfig,
ResourceGroupManager resourceGroupManager)
{
resourceGroupManager.getRootResourceGroups();
this.queryMonitor = requireNonNull(queryMonitor, "queryMonitor is null");
this.dispatchManager = requireNonNull(dispatchManager, "dispatchManager is null");
this.queryProgressPublishInterval = requireNonNull(queryMonitorConfig, "queryMonitorConfig is null").getQueryProgressPublishInterval();
}

@PostConstruct
public void start()
Contributor

Suggested change
private ScheduledExecutorService queryProgressMonitorExecutor;
@Inject
public QueryProgressMonitor(
QueryMonitor queryMonitor,
DispatchManager dispatchManager,
QueryMonitorConfig queryMonitorConfig,
ResourceGroupManager resourceGroupManager)
{
resourceGroupManager.getRootResourceGroups();
this.queryMonitor = requireNonNull(queryMonitor, "queryMonitor is null");
this.dispatchManager = requireNonNull(dispatchManager, "dispatchManager is null");
this.queryProgressPublishInterval = requireNonNull(queryMonitorConfig, "queryMonitorConfig is null").getQueryProgressPublishInterval();
}
@PostConstruct
public void start()
@GuardedBy("this")
private ScheduledExecutorService queryProgressMonitorExecutor;
@Inject
public QueryProgressMonitor(
QueryMonitor queryMonitor,
DispatchManager dispatchManager,
QueryMonitorConfig queryMonitorConfig,
ResourceGroupManager resourceGroupManager)
{
resourceGroupManager.getRootResourceGroups();
this.queryMonitor = requireNonNull(queryMonitor, "queryMonitor is null");
this.dispatchManager = requireNonNull(dispatchManager, "dispatchManager is null");
this.queryProgressPublishInterval = requireNonNull(queryMonitorConfig, "queryMonitorConfig is null").getQueryProgressPublishInterval();
}
@PostConstruct
public synchronized void start()

}

@Config("event.query-progress-publish-interval")
@ConfigDescription("How frequently to publish query progress events")
Contributor

Suggested change
@ConfigDescription("How frequently to publish query progress events")
@ConfigDescription("How frequently to publish query progress events. 0ms disables the publication of these events.")

This change enables coordinators to push the current
state of queued/running queries as progress events,
which can be logged to different sinks for analysis.
A progress event is sent every minute for each
queued/running query.
@pgupta2 pgupta2 force-pushed the periodic_update_event branch from 4eef95e to 1285c88 Compare August 6, 2024 23:31
@pgupta2 pgupta2 requested a review from tdcmeehan August 7, 2024 01:52
@steveburnett
Contributor

Please add documentation for this new configuration property to https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/admin/properties.rst .
