Skip to content

Support multi coordinator while serving resource group info#16205

Merged
tdcmeehan merged 1 commit intoprestodb:masterfrom
abhiseksaikia:disagg_pqs_resourcegrp
Jun 28, 2021
Merged

Support multi coordinator while serving resource group info#16205
tdcmeehan merged 1 commit intoprestodb:masterfrom
abhiseksaikia:disagg_pqs_resourcegrp

Conversation

@abhiseksaikia
Copy link
Contributor

@abhiseksaikia abhiseksaikia commented Jun 4, 2021

Currently v1/resourceGroupState/* endpoint serves resource group info available within a particular coordinator. The change here is to enable these endpoints to retrieve cluster wide resource group info.

This PR depends on #16163

Test plan - unit test

== RELEASE NOTES ==

General Changes
* Support multi coordinator for v1/resourceGroupState/* endpoint

@abhiseksaikia abhiseksaikia force-pushed the disagg_pqs_resourcegrp branch 13 times, most recently from cf5f3fd to c73f641 Compare June 8, 2021 23:24
@abhiseksaikia abhiseksaikia changed the title [WIP] Support multi coordinator while serving resource group info Support multi coordinator while serving resource group info Jun 9, 2021
@abhiseksaikia abhiseksaikia force-pushed the disagg_pqs_resourcegrp branch from c73f641 to bbdd003 Compare June 9, 2021 05:47
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DistributedQueryStateInfoResource to make the name consistent with the QueryInfoStateResource

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getQueryStateInfos to make the name consistent with the non distributed version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

user info needs to be passed to the coordinator and only return queries that matches the given user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make the error message generic as we don't know if it's one or more Error getting query info from coordinators.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of going to all coordinators we can rely on the nodeQueryStates to filter coordinators which have queries running/queued.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have any use case where coordinator endpoint is called to pull entire cluster information? If not there we don't need to have this branching and just return local queries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, if not needed, remove branching logic and just return local query info.

Copy link
Contributor

@swapsmagic swapsmagic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor comments, rest looks good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename includeFromLocal to includeLocalInfoOnly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't see us supporting xForwardedProto anywhere else. Do we know if it's been used or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its not currently used in the resource group flow

@abhiseksaikia abhiseksaikia force-pushed the disagg_pqs_resourcegrp branch from bbdd003 to 9c5fefc Compare June 11, 2021 17:26
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the not null message here.

Comment on lines 98 to 111
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would write this more along these lines:

            ListenableFuture<List<ResourceGroupInfo>> resourceGroupInfosFuture = Futures.allAsList(queryStateInfoFutureList);
            addSuccessCallback(resourceGroupInfosFuture, resourceGroupInfos -> {
                ResourceGroupInfo aggregatedResourceGroupInfo = buildResourceGroupInfo(resourceGroupInfos);
                if (aggregatedResourceGroupInfo == null) {
                    asyncResponse.resume(Response.status(NOT_FOUND).build());
                    return;
                }
                asyncResponse.resume(Response.ok(aggregatedResourceGroupInfo).build());
            });
            addExceptionCallback(resourceGroupInfosFuture, e -> {
                log.error(e, "Error in getting resource group info from one of the coordinators");
                asyncResponse.resume(Response.serverError().entity(e.getMessage()).build());
            });

Where buildResourceGroupInfo is a new method that takes in a list of ResourceGroupInfo and aggregates it in one shot. We can also remove AggregatedResourceGroupInfoBuilder this way.

Copy link
Contributor Author

@abhiseksaikia abhiseksaikia Jun 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem I see with this is, it does not handle the case where one coordinator throws 404 and others return successful response. In this case, we still want to aggregate the successful response and return.

I think with moving the aggregation to inline method will complicate aggregation for subgroups, having a builder will avoids this complication as well as need of creating more objects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comment: I think this class can be inlined into a method.

@abhiseksaikia abhiseksaikia force-pushed the disagg_pqs_resourcegrp branch from 9c5fefc to a38c1bf Compare June 18, 2021 15:08
@tdcmeehan
Copy link
Contributor

Looks like this needs a rebase?

@abhiseksaikia abhiseksaikia force-pushed the disagg_pqs_resourcegrp branch 5 times, most recently from 1967993 to 3446024 Compare June 22, 2021 22:03
Currently v1/resourceGroupState/* endpoint serves resource group info available within a particular coordinator. The change here is to enable these endpoints to retrieve cluster wide resource group info.
@abhiseksaikia abhiseksaikia force-pushed the disagg_pqs_resourcegrp branch from 3446024 to 878b8fd Compare June 25, 2021 19:25
@tdcmeehan tdcmeehan merged commit f50e888 into prestodb:master Jun 28, 2021
@ajaygeorge ajaygeorge mentioned this pull request Jul 7, 2021
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants