Skip to content

[Verifier] Support reuse of source query output table#22169

Merged
gggrace14 merged 1 commit intoprestodb:masterfrom
gggrace14:scaledverifierbase
Jun 13, 2024
Merged

[Verifier] Support reuse of source query output table#22169
gggrace14 merged 1 commit intoprestodb:masterfrom
gggrace14:scaledverifierbase

Conversation

@gggrace14
Copy link
Contributor

@gggrace14 gggrace14 commented Mar 12, 2024

Add support to Presto Verifier to reuse the output tables of the source query for control and test. Add configs --control-reuse-table and --test-reuse-table to turn the feature on and off. If reuse-table is turned on, skip the execution of main query as well as the setup and teardown queries. Compute checksums on the output table of main query directly.

== NO RELEASE NOTE ==

@gggrace14 gggrace14 requested review from a team as code owners March 12, 2024 16:22
@gggrace14 gggrace14 force-pushed the scaledverifierbase branch 2 times, most recently from b9385cd to 6746754 Compare March 12, 2024 20:33
@mbasmanova
Copy link
Contributor

@gggrace14 Ge, how much faster verifier runs with this optimization?

@spershin
Copy link
Contributor

@gggrace14 Ge, how much faster verifier runs with this optimization?

@mbasmanova

This needs to be thoroughly tested to get some numbers and also depends on workflow and number of queries.
We can run this for a while in Native and compare the difference.

But, obviously it is the time without optimization minus time it takes to run setup, main and teardown queries.
Note, that it looks like (I noticed) the verifier runs control query 1st and then runs the test one, so both main queries are SERIALIZED.
Not sure if there is a verifier option to run them in parallel.

@gggrace14
Copy link
Contributor Author

class QueryRewriter, class AbstractVerification and class VerifierDao have the main changes. Others are about passing the required info around.

@gggrace14
Copy link
Contributor Author

Not sure if there is a verifier option to run them in parallel.

For control & test checksum queries, current implementation is to run them in sequence. For main queries, param --concurrent-control-and-test can run them in parallel.

spershin
spershin previously approved these changes Mar 13, 2024
Copy link
Contributor

@spershin spershin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gggrace14
Thanks for this change (and others) to enabled re-use of the tables and reduce verification time and compute.

The change looks good to me, few questions.

Maybe worth @kewang1024, @ajaygeorge or @aweisberg to take a quick look at it too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, why is this required in this PR?

Copy link
Contributor Author

@gggrace14 gggrace14 Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the very tricky part of mvn dependencies that I need to check with java experts. No one knows about it so far. The above is the only one solution I find online.

In general, when packaging, this jdbc/pom.xml changes the path or namespace of presto.client

<relocation>
  <pattern>com.facebook.presto.client</pattern>
  <shadedPattern>${shadeBase}.client</shadedPattern>
</relocation>

This causes compiling error for the following line called by my change
In presto-jdbc

import com.facebook.presto.client.StatementStats;
...
public static QueryStats create(String queryId, StatementStats stats) {}

The above is the only one solution I find online. I will separate this pom change into a standalone PR, for people to find the change easily. Need to check what it means to the java build.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Figured out a way to avoid this attribute and changing the packaging.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange, didn't we put control and test query Ids into the table before this change?
The ones that you click on, like "Control Query ID".

Copy link
Contributor Author

@gggrace14 gggrace14 Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the "Control Query ID" link on UI is populated with the query_id returned by the actual control query run triggered by Verifier today. If we reuse the shadow/prod query output and don't trigger the control/test query runs, we have to pass the query_ids all the way to Verifer through xdb test suite table, so that the UI can display them.

Otherwise the debugging experience is bad. Found it out from testing.

Comment on lines 48 to 52
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we need from the presto-client in the verifier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One place is the following lines of change. The change is about passing the query_id from xdb to the UI by setting them in the related data structures.

In presto-verifier

import com.facebook.presto.client.StatementStats;
...
public static QueryActionStats queryIdStats(String queryId)
    {
        return new QueryActionStats(Optional.of(QueryStats.create(queryId, StatementStats.builder().setState(QueryState.FINISHED.toString()).build())), Optional.empty());
    }

@gggrace14 gggrace14 force-pushed the scaledverifierbase branch 3 times, most recently from ea03b3b to df80863 Compare March 13, 2024 22:09
Comment on lines +208 to +213
controlReuseTable = control.isPresent() && control.get() instanceof QueryObjectBundle && ((QueryObjectBundle) control.get()).isReuseTable();
if (controlReuseTable) {
controlQueryContext.setState(QueryState.REUSE);
controlQueryContext.setMainQueryStats(QueryActionStats.queryIdStats(sourceQuery.getQueryId(CONTROL).get()));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be clearer to put the logic inside the getQueryRewrite()

Comment on lines +216 to +221
testReuseTable = test.isPresent() && test.get() instanceof QueryObjectBundle && ((QueryObjectBundle) test.get()).isReuseTable();
if (testReuseTable) {
testQueryContext.setState(QueryState.REUSE);
testQueryContext.setMainQueryStats(QueryActionStats.queryIdStats(sourceQuery.getQueryId(TEST).get()));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

return reuseTable;
}

@ConfigDescription("If true, reuse the output table of the source query. Otherwise, run the query and write to a temporary shadow table.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.Is it the best effort?
if it is set to true, but the output table is deleted, does the verification fail or follow original routine?

2.Does it only apply to INSERT and CTAS queries? if so, we should specify in the config description

Copy link
Contributor Author

@gggrace14 gggrace14 Jun 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. If the output table is deleted, the verification will fail with the table not exists error. The user is responsible to ensure the output table exists.
  2. Revised.

@gggrace14 gggrace14 force-pushed the scaledverifierbase branch 2 times, most recently from 9f54569 to e1c2eb6 Compare March 21, 2024 04:44
@gggrace14 gggrace14 force-pushed the scaledverifierbase branch 2 times, most recently from ce525d2 to b41a863 Compare April 3, 2024 21:45
@gggrace14 gggrace14 force-pushed the scaledverifierbase branch from b41a863 to cd55682 Compare April 9, 2024 17:21
@gggrace14 gggrace14 force-pushed the scaledverifierbase branch from cd55682 to d1c9520 Compare April 26, 2024 15:58
@steveburnett
Copy link
Contributor

Suggest adding

== RELEASE NOTES ==

Verifier Changes
* Add  `control-reuse-table` and `test-reuse-table` configuration properties for the Presto Verifier to reuse the output tables of the source query for control and test :pr:`22169`

Optional<VerifierQueryEvent> event = result.getEvent();
completed++;
if (!event.isPresent()) {
statusCount.compute(SKIPPED, (status, count) -> count == null ? 1 : count + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this no longer necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!event.isPresent() is when the test case is to be resubmitted. From our experience, we find it confusing if we also add the resubmission count to the counter SKIPPED. This problem becomes more apparent when reuse-table is enabled, as we resubmit more to be safe.

The main condition of SKIPPED (line 361) is when the control query run fails, ex., due to syntax error, which is how ppl usually assume the counter SKIPPED. When SKIPPED includes resubmissions as today, we find it confusing that the counters SUCCEED, FAILED and SKIPPED don't add up to the total test cases, as one test case can be resubmitted multiple times. So I split it in this change and add one more counter RESUBMITTED (line 366).

QualifiedName originalTableName = insert.getTarget();
if (shouldReuseTable) {
return new QueryObjectBundle(
originalTableName,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the original table contains some other data then the data inserted by the query being verifier. How do we know what partitions of the original table to compute the checksum over?

Copy link
Contributor Author

@gggrace14 gggrace14 Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good question! I have a next PR on top of this one that assembles a partition predicate and applies it to the checksum query. That needs a separate PR. #22965


private final QueryActions queryActions;
private final SourceQuery sourceQuery;
private SourceQuery sourceQuery;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider keeping it final and return a new instance in VerificationResult (instead of this)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good catch. Also thought it's better if I can make this class immutable. However, I can't figure out how the constructor can look like, esp. since this class AbstractVerification is an abstract class. Please let me know if you have a good suggestion.

I thought I could add an abstract method to this class

public abstract AbstractVerification withSourceQuery(SourceQuery sourceQuery);

Then its subclass DataVerification overrides this method, creates and returns a DataVerification instance. However, we can't access those private fields, ex. queryActions, of the base abstract class in the subclass when calling the DataVerification constructor.

Copy link
Contributor Author

@gggrace14 gggrace14 Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @arhimondr , I figured out a different way that doesn't need to make this sourceQuery field mutable. The original objective is to pass a signal so that the "resubmitted" verification workflow will not reuse output tables. Previously I constructed a new SourceQuery instance. A different way I figured is to check the VerificationContext.getResubmissionCount() as below. I figured I'm able to access VerificationContext when calling queryRewriter.rewriteQuery()

    @Override
    protected QueryObjectBundle getQueryRewrite(ClusterType clusterType)
    {
        return queryRewriter.rewriteQuery(getSourceQuery().getQuery(clusterType), getSourceQuery().getQueryConfiguration(clusterType), clusterType,
                getVerificationContext().getResubmissionCount() == 0);
    }

@gggrace14 gggrace14 force-pushed the scaledverifierbase branch from d1c9520 to eef0472 Compare June 7, 2024 14:35
@gggrace14
Copy link
Contributor Author

gggrace14 commented Jun 9, 2024

Suggest adding

== RELEASE NOTES ==

Verifier Changes
* Add  `control-reuse-table` and `test-reuse-table` configuration properties for the Presto Verifier to reuse the output tables of the source query for control and test :pr:`22169`

Adding the corresponding documentation to the verifier.rst.

For the release note, I have a next PR #22965 on top of this for the complete feature. I'd prefer to add release note when the complete feature is in, to expose it to the user.

@gggrace14 gggrace14 force-pushed the scaledverifierbase branch from eef0472 to ece9d5e Compare June 9, 2024 19:02
@gggrace14 gggrace14 requested a review from steveburnett as a code owner June 9, 2024 19:02
@gggrace14 gggrace14 requested a review from arhimondr June 10, 2024 15:43
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the doc! Nit of formatting, no other concerns.

are emitted to ``stdout``.
``control.table-prefix`` The table prefix to be appended to the control target table.
``test.table-prefix`` The table prefix to be appended to the test target table.
``control.reuse-table`` If true, reuse the output table of the control source Insert and CreateTableAsSelect
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
``control.reuse-table`` If true, reuse the output table of the control source Insert and CreateTableAsSelect
``control.reuse-table`` If ``true``, reuse the output table of the control source Insert and CreateTableAsSelect

Nit for formatting, same as #22965.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised

``test.table-prefix`` The table prefix to be appended to the test target table.
``control.reuse-table`` If true, reuse the output table of the control source Insert and CreateTableAsSelect
query. Otherwise, run the control source query and write to a temporary table.
``test.reuse-table`` If true, reuse the output table of the test source Insert and CreateTableAsSelect
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
``test.reuse-table`` If true, reuse the output table of the test source Insert and CreateTableAsSelect
``test.reuse-table`` If ``true``, reuse the output table of the test source Insert and CreateTableAsSelect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised

@gggrace14 gggrace14 force-pushed the scaledverifierbase branch 2 times, most recently from 3f915f8 to 7f4ee12 Compare June 12, 2024 21:45
Add support to Presto Verifier to reuse the output tables of the source query
for control and test. Add configs --control-reuse-table and --test-reuse-table
to turn the feature on and off. If reuse-table is turned on, skip the execution
of main query as well as the setup and teardown queries. Compute checksums
on the output table of main query directly.
@gggrace14 gggrace14 force-pushed the scaledverifierbase branch from 7f4ee12 to 417472a Compare June 12, 2024 23:34
@gggrace14 gggrace14 requested a review from steveburnett June 13, 2024 00:19
@gggrace14
Copy link
Contributor Author

Thanks for the doc! Nit of formatting, no other concerns.

Hi @steveburnett , revised the format of verifier.rst document. Would you be able to help review again?

@gggrace14
Copy link
Contributor Author

Thanks for the doc! Nit of formatting, no other concerns.

Hi @steveburnett , revised the format of verifier.rst document. Would you be able to help review again?

Hi @steveburnett , thanks for the review! It looks to need your approval to be able to merge.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull updated branch, new local build, looks good. Thanks!

@steveburnett
Copy link
Contributor

Not a required test so not blocking merge, but the test

singlestore tests / singlestore-dockerized-tests (pull_request)

failed here. I believe I've seen other mentions of that test being flaky lately, wanted to mention it in case people are looking for examples of it failing.

@gggrace14
Copy link
Contributor Author

LGTM! (docs)

Pull updated branch, new local build, looks good. Thanks!

Thank you, @steveburnett !

@gggrace14 gggrace14 merged commit 2ef2ba8 into prestodb:master Jun 13, 2024
@gggrace14 gggrace14 deleted the scaledverifierbase branch June 13, 2024 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants