Skip to content

Fix trino-cli to display plans for EXPLAIN ANALYZE for update statements#13907

Merged
findepi merged 1 commit intotrinodb:masterfrom
homar:homar/fix_explain_analyze_tri_cli_output
Sep 27, 2022
Merged

Fix trino-cli to display plans for EXPLAIN ANALYZE for update statements#13907
findepi merged 1 commit intotrinodb:masterfrom
homar:homar/fix_explain_analyze_tri_cli_output

Conversation

@homar
Copy link
Copy Markdown
Member

@homar homar commented Aug 29, 2022

Description

Previously trino-cli didn't display plans for EXPLAIN ANALYZe

Is this change a fix, improvement, new feature, refactoring, or other?
a fix
Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)
a client library
How would you describe this change to a non-technical end user or system administrator?
Gives user proper output for EXPLAIN ANALYZe

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# cli
* trino-cli displays query plan for EXPLAIN ANALYZE queries ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Aug 29, 2022
@homar homar requested a review from findepi August 29, 2022 15:58
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the server not set the update type in results if it's an explain query?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this, see my comment below with link to a pr that has links to 2 others :D I did 3 different approaches but none of them seems good enough

Copy link
Copy Markdown
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bug on the server side. We shouldn't set update type if there is a real result.

The client discards results for an update because the server sends a fake result in that case, for compatibility with older clients (many years ago). The client should not be guessing as to the type of result based on the name/shape of the results.

@electrum
Copy link
Copy Markdown
Member

Actually, I think we still want to send update type and count, since an EXPLAIN ANALYZE can still have an update count, and the client might want to see or access it.

We can remove the code that produces the fake result in the server. We should verify that all of our clients support this properly.

For the JDBC driver, it needs to properly implement Statement.getMoreResults() so that it returns the result set rows if they exist (i.e. if the server returns a columns field in the query results).

@homar
Copy link
Copy Markdown
Member Author

homar commented Aug 29, 2022

Actually, I think we still want to send update type and count, since an EXPLAIN ANALYZE can still have an update count, and the client might want to see or access it.

We can remove the code that produces the fake result in the server. We should verify that all of our clients support this properly.

For the JDBC driver, it needs to properly implement Statement.getMoreResults() so that it returns the result set rows if they exist (i.e. if the server returns a columns field in the query results).

@electrum Currently we don't send updateCount, it is set to null for EXPLAIN ANALYZE, we only send updateType.
Should I try to fix sending updateCount for EXPLAIN ANALYZE? If so, assuming updateCound and updateType are present how should I fix this if not by using hacky solution like this one ?

@electrum
Copy link
Copy Markdown
Member

EXPLAIN ANALYZE is special in that it runs the statement. Its update type should be the update type of the underlying statement (if present). It should also return the update count (if present).

The reason we have the "discard results" hack in the CLI and JDBC driver is due to the server adding a fake result set for update statements. If we remove that on the server side, then there doesn't need to be any special handling in the clients.

With this change, the JDBC driver will need to support the multiple result sets: the update count and the explain analyze output. JDBC explicitly supports multiple results as explained in the Javadoc. We should have the update count be returned first (the first result set) since that will work correctly with older servers that return the fake result set.

@homar homar force-pushed the homar/fix_explain_analyze_tri_cli_output branch from e0beb8d to 4e3e483 Compare September 6, 2022 12:54
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should still be called but after the call to both renderUpdate and renderResults, on line 168. If results are not consumed, queries can get stuck in the FINISHING phase. Also, there are no other calls to discardResults.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm ok, I will give it a look thanks

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nineinchnick after renderResults completes, we shouldn't need to do discardResults, should we?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nineinchnick thoughts?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

@homar homar force-pushed the homar/fix_explain_analyze_tri_cli_output branch from 4e3e483 to 89a2ad2 Compare September 6, 2022 20:26
@homar homar changed the title Fix trino-cli to display plans for EXPLAIN ANALYZE [WIP] Fix trino-cli to display plans for EXPLAIN ANALYZE Sep 6, 2022
@github-actions github-actions bot added the jdbc Relates to Trino JDBC driver label Sep 6, 2022
@homar homar force-pushed the homar/fix_explain_analyze_tri_cli_output branch from c09b74c to 666663b Compare September 7, 2022 08:21
@homar homar force-pushed the homar/fix_explain_analyze_tri_cli_output branch 3 times, most recently from d5cc678 to 4be06bd Compare September 9, 2022 15:02
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this is a good change.

It looks we're going to call client.advance(); until we drain the results.
I don't know what are the assumptions for the code above (it originates a while back from 8ccea3a and doesn't carry much comments), but it feels wrong to me.

I think the intention here is

  • call client.advance() until we know the columns of the result, since this information is one of the first send back by the server

if the server now doesn't return columns in some cases, maybe it needs to indicate no columns will ever be provided, so that clients don't need to drain / buffer all the results?

eg some related code in the python client
https://github.com/trinodb/trino-python-client/blob/cffd2b232d9298321c34430f0ac829f6b6ebea8f/trino/client.py#L652-L655

cc. @electrum

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the server now doesn't return columns in some cases, maybe it needs to indicate no columns will ever be provided, so that clients don't need to drain / buffer all the results?

It always returns the columns, eventually, unless the query fails before it gets analyzed. But it may not happen in the first (or first few) requests, depending on how long the query gets queued, etc.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it may not happen in the first (or first few) requests

that's true

It always returns the columns, eventually

@martint Do we always need to produce columns, also for INSERT, CALL, SET or GRANT statements?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, for update queries this results in updated count being printed twice

trino:tpch> UPDATE nation SET nationkey = 4;
UPDATE: 25 rows
 rows
------
   25
(1 row)

@homar homar force-pushed the homar/fix_explain_analyze_tri_cli_output branch from 4be06bd to cf245ef Compare September 13, 2022 20:39
@homar homar changed the title [WIP] Fix trino-cli to display plans for EXPLAIN ANALYZE Fix trino-cli to display plans for EXPLAIN ANALYZE Sep 15, 2022
@homar
Copy link
Copy Markdown
Member Author

homar commented Sep 16, 2022

@electrum Could you please take a look now?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why go here only when updateCount is null?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example -- after this change -- for CTAS we print number of rows written:

trino:default> CREATE TABLE t AS TABLE tpch.tiny.nation;
CREATE TABLE: 25 rows

but for EXPLAIN ANALYZE CTAS we print only the operation name "CREATE TABLE", without the row count:

trino:default> EXPLAIN ANALYZE CREATE TABLE t AS TABLE tpch.tiny.nation;
CREATE TABLE
<query plan>

The query plan is correctly printed, but row count should be too

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a code comment, mainly to avoid printing same information twice as in your comment above #13907 (comment)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nineinchnick after renderResults completes, we shouldn't need to do discardResults, should we?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assertThat(lines).contains("CREATE TABLE", "Query Plan");
assertThat(lines).contains("CREATE TABLE: 10 rows", "Query Plan");

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is hard to do it now due to #14253, but that could be a good enhancement for the future

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assertThat(lines).contains("INSERT", "Query Plan");
assertThat(lines).contains("INSERT: 1 row", "Query Plan");

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is hard to do it now due to #14253, but that could be a good enhancement for the future

@homar
Copy link
Copy Markdown
Member Author

homar commented Sep 20, 2022

Pushing a version without all of the changes to make sure changes so far are correct

@homar homar force-pushed the homar/fix_explain_analyze_tri_cli_output branch from cf245ef to 49769c7 Compare September 20, 2022 14:18
@homar homar force-pushed the homar/fix_explain_analyze_tri_cli_output branch from 49769c7 to 931f750 Compare September 22, 2022 12:00
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nineinchnick thoughts?

@findepi
Copy link
Copy Markdown
Member

findepi commented Sep 23, 2022

will merge tomorrow

if (!resultsConsumed && queryInfo.getOutputStage().isEmpty()) {
return queryResultRowsBuilder(session)
.withSingleBooleanValue(createColumn("result", BooleanType.BOOLEAN, supportsParametricDateTime), true)
.withColumnsAndTypes(ImmutableList.of(), ImmutableList.of())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure, could this affect older versions of Trino CLI or other clients?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the changes in the Trino CLI don't require these, should they be in separate commits? If they do, would the Trino CLI still work correctly with older Trino servers?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we run tests against older versions and they work so I think we are fine

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seems to affect other clients though probably - see trinodb/trino-python-client#212 (comment)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, PR title could be probably better worded (and release notes as well) to mention we no longer return result sets for queries which don't return output (which is correct behaviour but older code could be relying on this).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem. The comment was meant more to the reviewers. Thanks for the offer - if you want to and feel comfortable with that sure but don't feel responsible for it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc: @findepi probably

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel responsible but not comfortable, what should I do in such a case ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hashhar you know i strive to make PR titles as informative as I can be, but it's also not possible to fit the whole story in that line
the biggest miss is that we broke the Python client, unknowingly.

Would it make sense to have some Python client comaptibility smoke tests within this repo?
This would supplement the ones maintained in the Python client repo. That ones cover current Py client ws current and old servers.
Those new ones would cover current server against older clients.

Copy link
Copy Markdown
Member

@hashhar hashhar Sep 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel responsible but not comfortable, what should I do in such a case ?

No problem. I'd request a review to make sure you can verify the functional changes there match what we did for the JDBC client here. @homar

the biggest miss is that we broke the Python client, unknowingly.

@findepi The Python client isn't my biggest concern - my main concern is this is a user-facing change where if someone wrote code relying on DDL queries as well returning results then they'd be affected (even though it'd be wrong to rely on that in the first place).

We can consider adding a smoke test here that tests current server against current client only so that at least we can ensure that there's at-least 1 working Python (and Go) client for each Trino server release.

I'll create a separate issue about such tests and let's continue here.

@findepi findepi merged commit bc794cd into trinodb:master Sep 27, 2022
@findepi findepi mentioned this pull request Sep 27, 2022
@github-actions github-actions bot added this to the 398 milestone Sep 27, 2022
@martint
Copy link
Copy Markdown
Member

martint commented Sep 28, 2022

What was actually fixed with this PR?

The current version (397) shows query plans for explain analyze:

trino> select version();
 _col0
-------
 397
(1 row)

Query 20220925_231758_00003_r6mch, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
0.06 [0 rows, 0B] [0 rows/s, 0B/s]

trino> EXPLAIN ANALYZE select * from tpch.tiny.orders;
                                                                                              Query Plan
-------------------------------------------------------------------------------------------------------------------------------------------
 Fragment 1 [tpch:orders:15000]
     CPU: 194.57ms, Scheduled: 194.66ms, Blocked 0.00ns (Input: 0.00ns, Output: 0.00ns), Input: 15000 rows (1.86MB); per task: avg.: 15000.
     Output layout: [orderkey, custkey, orderstatus, totalprice, orderdate, orderpriority, clerk, shippriority, comment]
     Output partitioning: SINGLE []
     TableScan[table = tpch:tiny:orders]
         Layout: [orderkey:bigint, custkey:bigint, orderstatus:varchar(1), totalprice:double, orderdate:date, orderpriority:varchar(15), cl
         Estimates: {rows: 15000 (1.52MB), cpu: 1.52M, memory: 0B, network: 0B}
         CPU: 194.00ms (100.00%), Scheduled: 194.00ms (100.00%), Blocked: 0.00ns (?%), Output: 15000 rows (1.86MB)
         Input avg.: 3750.00 rows, Input std.dev.: 0.00%
         clerk := tpch:clerk
         orderkey := tpch:orderkey
         orderstatus := tpch:orderstatus
             :: [[F], [O], [P]]
         custkey := tpch:custkey
         totalprice := tpch:totalprice
         comment := tpch:comment
         orderdate := tpch:orderdate
         orderpriority := tpch:orderpriority
         shippriority := tpch:shippriority


(1 row)

@findepi
Copy link
Copy Markdown
Member

findepi commented Sep 29, 2022

What was actually fixed with this PR?

EXPLAIN ANALYZE for update statements. Will fix the PR title

@findepi findepi changed the title Fix trino-cli to display plans for EXPLAIN ANALYZE Fix trino-cli to display plans for EXPLAIN ANALYZE for update statements Sep 29, 2022
@findepi
Copy link
Copy Markdown
Member

findepi commented Sep 29, 2022

for posterity, i've updated #14245 (comment) too
thanks for catching this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed jdbc Relates to Trino JDBC driver

Development

Successfully merging this pull request may close these issues.

6 participants