Allow users to estimate query cost before executing it #8172
betodealmeida merged 12 commits into apache:master
etr2460
left a comment
a bunch of comments, also @john-bodley or @villebro should probably double check the db engine specs stuff
superset/migrations/versions/8786d6374caa_add_column_for_query_estimate.py
@etr2460, I addressed all the comments. I removed the DB migration, and the feature is enabled per DB in extras. Also, I use the DB version to determine if it's supported.
prefixes = ["K", "M", "G", "T", "P", "E", "Z", "Y"]
prefix = ""
to_next_prefix = 1000
commenting again, shouldn't this be 1024? And we should make it a const
Sorry, I replied to your comment but I think I resolved the conversation. This is used not just for bytes, but also for cpu and network cost, so 1000 is the correct unit. Also, 1000 is the correct unit for the prefixes K, M, G, etc. For 1024 the prefixes are Ki, Mi, Gi.
Eg, 1024 B = 1 KiB = 1.024 KB.
I had the same concern as @etr2460 , learned something new here (had to google to double check) 👍
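To make the unit discussion above concrete, here is a minimal sketch of a decimal (SI) humanizer. It is not the code from this PR: the constant names and the one-decimal output format are assumptions made for illustration; only the prefix list, the factor of 1000, and the `humanize(value, suffix)` signature come from the diff.

```python
# Hypothetical constants; the diff only shows local variables with these values.
SI_PREFIXES = ["K", "M", "G", "T", "P", "E", "Z", "Y"]
TO_NEXT_PREFIX = 1000  # decimal step, matching K/M/G (not Ki/Mi/Gi)


def humanize(value: float, suffix: str) -> str:
    """Scale value down by 1000 until it fits, then attach the SI prefix."""
    prefix = ""
    for candidate in SI_PREFIXES:
        if value < TO_NEXT_PREFIX:
            break
        prefix = candidate
        value /= TO_NEXT_PREFIX
    # One decimal place is an arbitrary formatting choice for this sketch.
    return f"{value:.1f} {prefix}{suffix}"
```

Because the same helper is meant to cover bytes as well as CPU and network cost, a single decimal step keeps the prefixes consistent across all three.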
superset/db_engine_specs/presto.py
Outdated
db_engine_spec.execute(cursor, sql)
polled = cursor.poll()
while polled:
Is this the only way to tell if the query is finished? This seems a little sketchy, can we not pass a callback or something on success?
Yeah, let me simplify this.
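One way the simplification might look, as a sketch only: rely on a blocking DB-API `fetchone()` instead of an explicit poll loop. This assumes the driver's cursor blocks until the first row is available and that the statement is wrapped in Presto's `EXPLAIN (TYPE IO, FORMAT JSON)`; the function name and the stand-in cursor are hypothetical, and only the `json.loads` / `["estimate"]` steps appear in the diff.

```python
import json


def estimate_statement_cost(cursor, statement: str) -> dict:
    """Run an IO explain for one statement and return its 'estimate' object."""
    # Assumption: the estimate is exposed through EXPLAIN (TYPE IO, FORMAT JSON).
    cursor.execute(f"EXPLAIN (TYPE IO, FORMAT JSON) {statement}")
    # Assumption: fetchone() blocks until the explain result is ready,
    # so no manual polling is needed here.
    first = cursor.fetchone()[0]
    result = json.loads(first)
    return result["estimate"]


class FakeCursor:
    """Stand-in cursor for illustration; not a real driver class."""

    def __init__(self, payload: dict):
        self._payload = payload
        self.executed = []

    def execute(self, sql: str) -> None:
        self.executed.append(sql)

    def fetchone(self):
        return (json.dumps(self._payload),)


cursor = FakeCursor({"estimate": {"outputSizeInBytes": 2048}})
estimate = estimate_statement_cost(cursor, "SELECT 1")
```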
superset/db_engine_specs/presto.py
Outdated
result = json.loads(first)
estimate = result["estimate"]

def humanize(value, suffix):
superset/models/core.py
Outdated
return self.db_engine_spec.allows_subqueries

@property
def allows_cost_estimate(self):
superset/views/core.py
Outdated
@expose("/estimate_query_cost/<database_id>/", methods=["POST"])
@expose("/estimate_query_cost/<database_id>/<schema>/", methods=["POST"])
@event_logger.log_this
def estimate_query_cost(self, database_id, schema=None):
superset/db_engine_specs/base.py
Outdated
try_remove_schema_from_table_name = True

@classmethod
def get_allow_cost_estimate(cls, version=None):
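A rough sketch of how a version-gated `get_allow_cost_estimate` could behave, for readers following the thread. The class names, the `parse_version` helper, and the 0.319 threshold (taken from the version used in the test plan) are assumptions for illustration, not the merged implementation.

```python
from typing import Optional, Tuple

# Assumption: 0.319 is the first supporting version (it is the one in the test plan).
MIN_PRESTO_VERSION = (0, 319)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '0.319' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


class BaseEngineSpecSketch:
    """Hypothetical stand-in for the base engine spec: no support by default."""

    @classmethod
    def get_allow_cost_estimate(cls, version: Optional[str] = None) -> bool:
        return False


class PrestoEngineSpecSketch(BaseEngineSpecSketch):
    """Hypothetical stand-in for the Presto spec: support depends on version."""

    @classmethod
    def get_allow_cost_estimate(cls, version: Optional[str] = None) -> bool:
        # An unknown version is treated as unsupported.
        return version is not None and parse_version(version) >= MIN_PRESTO_VERSION
```

Comparing integer tuples avoids the string-comparison pitfall where "0.1000" would sort before "0.319".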
@etr2460, I added types and cleaned up the query execution.
etr2460
left a comment
Would you mind making the test plan a little more robust? Test with the feature flag both enabled and disabled, with presto dbs that are configured at a passing version and prior version? With non presto dbs? I'm sure you've tested other cases, but right now the test plan only references the happy path, so a bit more detail would be great.
other than that and my 2 comments here, this lgtm! I'll approve to unblock
CATEGORY
Choose one
SUMMARY
We recently added support to Presto for estimating the number of bytes scanned (trinodb/trino#806), and we'd like to surface that information to SQL Lab users before they actually run a query.
This PR extends the DB specs with an allows_cost_estimate attribute and associated methods, allowing pre-execution costs to be computed from DBs that support it.
In order to use it, the feature flag ESTIMATE_QUERY_COST must be enabled, and it needs to be explicitly turned on for each database that supports query cost estimation. When all those conditions are met, a new button will show up in SQL Lab, allowing users to run cost estimates for the whole query or for the selected SQL.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
DBs where the feature is not supported or not enabled are unmodified:
Here's a Presto DB:
Waiting for results:
The result:
And how errors (timeout, syntax errors) are surfaced:
TEST PLAN
Tested with a Presto cluster that supports query cost estimation, running version 0.319 and with the feature enabled via extra:
{
  "version": "0.319",
  "cost_estimate_enabled": true,
  "metadata_params": {},
  "engine_params": {
    "connect_args": {
      "protocol": "https",
      "source": "superset"
    }
  },
  "metadata_cache_timeout": {
    "schema_cache_timeout": 86400,
    "table_cache_timeout": 86400
  },
  "default_schemas": ["core", "default"]
}
Additionally, I tested:
ADDITIONAL INFORMATION
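For reference, the three conditions from the summary (feature flag on, per-database opt-in via extra, engine support for the configured version) could be combined roughly as below. This is a sketch under assumptions: the function name, the plain-dict feature flags, and the `engine_supports` callable are hypothetical; only the `ESTIMATE_QUERY_COST`, `cost_estimate_enabled`, and `version` keys come from this PR.

```python
import json
from typing import Callable, Dict, Optional


def cost_estimate_available(
    feature_flags: Dict[str, bool],
    extra_json: Optional[str],
    engine_supports: Callable[[Optional[str]], bool],
) -> bool:
    """Return True only when all three conditions from the summary hold."""
    extra = json.loads(extra_json) if extra_json else {}
    return bool(
        feature_flags.get("ESTIMATE_QUERY_COST", False)      # global feature flag
        and extra.get("cost_estimate_enabled", False)        # per-database opt-in
        and engine_supports(extra.get("version"))            # engine/version check
    )


# Usage with a stand-in engine check that accepts any configured version.
extra = '{"version": "0.319", "cost_estimate_enabled": true}'
enabled = cost_estimate_available(
    {"ESTIMATE_QUERY_COST": True}, extra, lambda version: version is not None
)
```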
REVIEWERS
@etr2460 @mistercrunch