Make HiveQueryRunner#main more user friendly #11151
If we converge everywhere to standard, this is useful.
Until then, it's a no-go as the default in the query runner's main.
The query runner's main must allow running test queries from the tests themselves, and tests currently rely on the simplified names.
That's a good point. Yeah, let's keep the defaults. What do you think about commenting out these two lines so the standard naming can be easily enabled?
I am fine with commented out lines.
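For context on the two naming schemes discussed above, here is a minimal sketch of how simplified column names relate to standard ones. The table prefixes come from the TPC-H specification; the class `TpchColumnNames` and the method `toStandard` are hypothetical names used only for illustration and are not part of the tpch connector.

```java
import java.util.Map;

// Illustrative only: not the connector's implementation.
// Standard TPC-H column names carry a per-table prefix (for example l_orderkey),
// while the simplified names used by the tests drop it (orderkey).
final class TpchColumnNames {
    // Table prefixes as defined by the TPC-H specification.
    private static final Map<String, String> PREFIXES = Map.of(
            "lineitem", "l_",
            "orders", "o_",
            "customer", "c_",
            "part", "p_",
            "partsupp", "ps_",
            "supplier", "s_",
            "nation", "n_",
            "region", "r_");

    private TpchColumnNames() {}

    // Hypothetical helper: maps a simplified column name to its standard form.
    static String toStandard(String table, String simplifiedColumn) {
        String prefix = PREFIXES.get(table);
        if (prefix == null) {
            throw new IllegalArgumentException("Unknown TPC-H table: " + table);
        }
        return prefix + simplifiedColumn;
    }

    public static void main(String[] args) {
        System.out.println(toStandard("lineitem", "orderkey"));
        System.out.println(toStandard("orders", "totalprice"));
    }
}
```

A query written against the specification refers to the prefixed names, which is why tests that rely on the simplified names would break if standard naming became the default.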
Why do we think it's generally useful?
HiveQueryRunner#main is a convenient way to start a pseudo-distributed cluster for testing. Having the tpcds connector deployed makes it possible to run tpcds queries and play with the tpcds dataset. Is the concern here that tpcds will use unnecessary resources when running integration tests? If that's the case, and if we think the resources used are significant, I can make this conditional and only enable it in HiveQueryRunner#main.
I am more concerned about startup time, but it probably doesn't cost much.
(I almost never use the tpcds data set, so I am surprised this is useful outside of benchmarks.)
We are planning to use the tpcds dataset for project Tardigrade testing.
0591a49 to 8671cd0
Updated
findepi left a comment
LGTM except "Support decimal columns in Tpch statistics provider"
That's a weird change. The commit says "Support decimal columns in Tpch statistics provider", but it looks like it removes support for them.
Without this line it fails because DecimalType is not supported. Perhaps a better solution would be to try to return values, but that requires a bigger refactor. I need this change just to unblock running tpch queries with decimal columns enabled.
> Perhaps a better solution would be to try to return values

Agreed.

> but that requires a bigger refactor.

I hoped not. The actual ranges should be the same, and io.trino.spi.statistics.ColumnStatistics is already "normalized" into doubles, so we should be able to use the min/max values we have for both doubles and decimals.
Hmm, you are probably right, let me take a closer look.
Actually yeah, the stats for decimal columns are available and it was super easy to propagate them. Thanks for the suggestion. Updated.
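The suggestion in this thread, reusing the same min/max values for decimal columns because the statistics are expressed as doubles anyway, can be sketched roughly as below. This is not the code from the PR and does not use the Trino SPI: `DecimalRangeSketch`, `DoubleRange`, and `toDoubleRange` are hypothetical names, and the sketch only assumes that the raw min/max values are available as `Number` instances (`BigDecimal` for decimal columns).

```java
import java.math.BigDecimal;
import java.util.Optional;

// Illustrative only: a stand-in for statistics that are normalized into doubles.
final class DecimalRangeSketch {
    // Hypothetical value holder for a double-valued column range.
    static final class DoubleRange {
        final double min;
        final double max;

        DoubleRange(double min, double max) {
            this.min = min;
            this.max = max;
        }
    }

    private DecimalRangeSketch() {}

    // Converts raw min/max values to a double range. Because BigDecimal is a Number,
    // decimal columns take the same path as double columns instead of being rejected.
    static Optional<DoubleRange> toDoubleRange(Number min, Number max) {
        if (min == null || max == null) {
            // No recorded values: report the range as unknown.
            return Optional.empty();
        }
        return Optional.of(new DoubleRange(min.doubleValue(), max.doubleValue()));
    }

    public static void main(String[] args) {
        Optional<DoubleRange> range = toDoubleRange(new BigDecimal("0.25"), new BigDecimal("104949.5"));
        range.ifPresent(r -> System.out.println(r.min + " .. " + r.max));
    }
}
```

The conversion through `doubleValue()` can lose precision for very large or very precise decimals, which is usually acceptable for statistics that only guide estimates.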
It is useful to have tpcds connector enabled when trying to debug tpcds queries locally
To avoid dealing with roles and permissions when debugging
To allow running unmodified tpch queries when debugging
8671cd0 to 69a6841
Description
Change HiveQueryRunner#main configuration
tpcds connector by default
tpch schema to allow running unmodified tpch queries
Improvement
Tests
Non end user visible change
Related issues, pull requests, and links
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(x) No release notes entries required.
( ) Release notes entries required with the following suggested text: