
Add support for ANALYZE FOR QUERY statement#12019

Closed
tangjiangling wants to merge 1 commit into trinodb:master from tangjiangling:add-support-for-analyze-for-query-statement

Conversation

@tangjiangling
Member

@tangjiangling tangjiangling commented Apr 19, 2022

Description

Is this change a fix, improvement, new feature, refactoring, or other?

New SQL syntax.

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Core query engine.

How would you describe this change to a non-technical end user or system administrator?

Allows users to run the ANALYZE FOR QUERY statement to collect statistics at query granularity, that is, for the tables the query reads.

Related issues, pull requests, and links

Fixes #11517

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@findepi findepi added the enhancement and syntax-needs-review labels Apr 19, 2022
@findepi
Member

findepi commented Apr 19, 2022

@tangjiangling
Member Author

cc @wubiaoi

@tangjiangling tangjiangling marked this pull request as ready for review April 19, 2022 21:04
@tangjiangling
Member Author

Once the code has been reviewed, I'll file a PR for the documentation.

@tangjiangling tangjiangling removed the WIP label Apr 19, 2022
@tangjiangling
Member Author

I ran the following command to check whether I need to add more tests:

git --no-pager grep -n '"ANALYZE ' | awk '{print $1}' | awk -F':' '{print $1}' | sort | uniq

Results:

core/trino-main/src/test/java/io/trino/sql/analyzer/TestAnalyzer.java
core/trino-main/src/test/java/io/trino/sql/planner/TestLogicalPlanner.java
core/trino-parser/src/main/java/io/trino/sql/SqlFormatter.java
core/trino-parser/src/test/java/io/trino/sql/parser/TestSqlParser.java
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/BaseDeltaLakeConnectorSmokeTest.java
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeAnalyze.java
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseHiveConnectorTest.java
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestAnalyzeForQuery.java
plugin/trino-mysql/src/test/java/io/trino/plugin/mysql/BaseMySqlTableStatisticsIndexStatisticsTest.java
plugin/trino-mysql/src/test/java/io/trino/plugin/mysql/TestMySqlAutomaticJoinPushdown.java
plugin/trino-mysql/src/test/java/io/trino/plugin/mysql/TestMySqlTableStatisticsMySql8Histograms.java
plugin/trino-postgresql/src/test/java/io/trino/plugin/postgresql/TestJoinReorderingWithJoinPushdown.java
plugin/trino-postgresql/src/test/java/io/trino/plugin/postgresql/TestPostgreSqlAutomaticJoinPushdown.java
plugin/trino-postgresql/src/test/java/io/trino/plugin/postgresql/TestPostgreSqlTableStatistics.java
testing/trino-product-tests/src/main/java/io/trino/tests/product/TestTwoHives.java
testing/trino-product-tests/src/main/java/io/trino/tests/product/hive/TestExternalHiveTable.java
testing/trino-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveBasicTableStatistics.java
testing/trino-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveTableStatistics.java
testing/trino-testing/src/main/java/io/trino/testing/BaseFailureRecoveryTest.java
testing/trino-tests/src/test/java/io/trino/security/TestAccessControl.java
testing/trino-tests/src/test/java/io/trino/tests/tpch/TestTpchConnectorTest.java
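For reference, the `awk`/`sort`/`uniq` stages of the command above keep only the file path (the part before the first `:`) of each `path:line:match` line printed by `git grep -n`, then sort and deduplicate the paths; `git grep -l` should list the same files directly. The Java sketch that follows reproduces that post-processing step on lines already captured as strings. `GrepPaths` and `uniquePaths` are hypothetical names, the sample input is made up, and the sketch assumes, like the original pipeline, that paths contain no whitespace or colons.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class GrepPaths {
    // Extracts the unique, sorted file paths from `git grep -n` output lines
    // of the form "path:lineNumber:matchedText".
    // Assumption: paths contain neither ':' nor whitespace.
    public static SortedSet<String> uniquePaths(List<String> grepLines) {
        SortedSet<String> paths = new TreeSet<>();
        for (String line : grepLines) {
            int colon = line.indexOf(':');
            // Lines without a leading "path:" prefix are not matches; skip them.
            if (colon > 0) {
                paths.add(line.substring(0, colon));
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not real output from the Trino repository.
        List<String> sample = List.of(
                "example/B.java:12:foo",
                "example/A.java:3:bar",
                "example/B.java:40:baz");
        System.out.println(uniquePaths(sample)); // prints [example/A.java, example/B.java]
    }
}
```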

So it looks like I still need to add product tests (PTs) for Hive and Delta Lake. I'm not sure whether Delta Lake needs to support this at the moment; see the current method signature:

public ConnectorTableHandle getTableHandleForStatisticsCollection(ConnectorSession session, SchemaTableName tableName, Map<String, Object> analyzeProperties)

@tangjiangling tangjiangling added the needs-docs label Apr 20, 2022
@tangjiangling tangjiangling force-pushed the add-support-for-analyze-for-query-statement branch 2 times, most recently from 6f32546 to 31711db April 20, 2022 06:25
@tangjiangling
Member Author

(Rebased from master to resolve conflicts)

@tangjiangling
Member Author

(Rebased from master to resolve conflicts)

@tangjiangling
Member Author

@findepi If you are available, please take a look at this PR.

@tangjiangling tangjiangling force-pushed the add-support-for-analyze-for-query-statement branch from d0b314c to bf585ab May 11, 2022 08:56
@tangjiangling tangjiangling force-pushed the add-support-for-analyze-for-query-statement branch from bf585ab to 5fffdac May 11, 2022 09:13
@findepi findepi marked this pull request as draft May 11, 2022 10:50
@findepi
Member

findepi commented May 11, 2022

Marked this as a draft until #11517 (comment) is addressed.


// verify that the target tables exist and are not all views
if (tableNames.isEmpty()) {
throw semanticException(NOT_SUPPORTED, node, "Analyzing views is not supported");
Member

I think this situation also occurs if you run a query like

ANALYZE FOR SELECT 123

So this error message should probably be something like "No tables to analyze".
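A minimal sketch of the suggested check, assuming the analyzer has already collected the set of tables the query reads. A query such as `ANALYZE FOR SELECT 123` reads no tables, so that set is empty even though no view is involved, which is why a view-specific message would be misleading. `AnalyzeForQueryCheck`, `verifyHasTables`, and `NoTablesToAnalyzeException` are hypothetical stand-ins; the code under review throws via `semanticException(NOT_SUPPORTED, node, ...)` instead.

```java
import java.util.Set;

public class AnalyzeForQueryCheck {
    // Hypothetical stand-in for the semantic exception thrown by the analyzer.
    public static class NoTablesToAnalyzeException extends RuntimeException {
        public NoTablesToAnalyzeException(String message) {
            super(message);
        }
    }

    // Rejects queries that read no tables at all (e.g. ANALYZE FOR SELECT 123),
    // using the wording suggested in the review.
    public static void verifyHasTables(Set<String> tableNames) {
        if (tableNames.isEmpty()) {
            throw new NoTablesToAnalyzeException("No tables to analyze");
        }
    }

    public static void main(String[] args) {
        verifyHasTables(Set.of("hive.tpch.orders")); // hypothetical table name; no exception
        try {
            verifyHasTables(Set.of()); // what ANALYZE FOR SELECT 123 would produce
        }
        catch (NoTablesToAnalyzeException e) {
            System.out.println(e.getMessage()); // prints No tables to analyze
        }
    }
}
```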

.map(TableScanNode::getTable)
.forEach(tableHandle -> {
QualifiedObjectName tableName = metadata.getTableMetadata(session, tableHandle).getQualifiedName();
if (!viewNames.contains(tableName)) {
Member

Why do we need this logic? A TableScanNode contains a TableHandle for the underlying table of a view; it should never itself be a view.
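To illustrate the reviewer's point, here is a simplified, hypothetical model in which every table scan already refers to a base table, because views are expanded into their underlying queries before planning. Under that assumption, collecting the tables to analyze needs no view filter, only deduplication (for example, a self-join scans the same table twice). `TableScan` is a stand-in record, not Trino's `TableScanNode`, and the table names are made up.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TableScanTargets {
    // Hypothetical, simplified plan node: in this model a scan always refers to
    // a base table, never to a view.
    public record TableScan(String qualifiedTableName) {}

    // Collects the distinct tables read by the plan, in first-seen order.
    // No view check is needed because scans never point at views.
    public static Set<String> tablesToAnalyze(List<TableScan> scans) {
        Set<String> tables = new LinkedHashSet<>();
        for (TableScan scan : scans) {
            tables.add(scan.qualifiedTableName());
        }
        return tables;
    }

    public static void main(String[] args) {
        // A self-join plus a join to a second table: three scans, two tables.
        List<TableScan> scans = List.of(
                new TableScan("hive.tpch.orders"),
                new TableScan("hive.tpch.orders"),
                new TableScan("hive.tpch.customer"));
        System.out.println(tablesToAnalyze(scans)); // prints [hive.tpch.orders, hive.tpch.customer]
    }
}
```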

* The returned table handle can contain information in tupleDomain.
*/
@Nullable
default ConnectorTableHandle getTableHandleForStatisticsCollection(ConnectorSession session, SchemaTableName tableName, TupleDomain<ColumnHandle> tupleDomain)
Member

I have a PR #12388 that changes the analyze API to return ConnectorAnalyzeMetadata. We should make this new method use the new return type. cc @findepi

@colebow
Member

colebow commented Mar 30, 2023

👋 @tangjiangling - this PR is inactive and doesn't seem to be under development. If you'd like to continue work on this at any point in the future, feel free to re-open.

@colebow colebow closed this Mar 30, 2023

Labels

cla-signed, enhancement (New feature or request), needs-docs (This pull request requires changes to the documentation), syntax-needs-review

Development

Successfully merging this pull request may close these issues.

Add support for ANALYZE FOR SELECT ...

4 participants