-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(sql lab): display presto and trino tracking url #20799
Conversation
67595f3
to
fe967b8
Compare
Codecov Report
@@ Coverage Diff @@
## master #20799 +/- ##
==========================================
+ Coverage 66.23% 66.24% +0.01%
==========================================
Files 1757 1757
Lines 66978 67031 +53
Branches 7117 7118 +1
==========================================
+ Hits 44360 44408 +48
- Misses 20801 20806 +5
Partials 1817 1817
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. |
6068fee
to
6f74d4a
Compare
@@ -279,6 +284,27 @@ def default_endpoint(self) -> str: | |||
def get_extra_cache_keys(query_obj: Dict[str, Any]) -> List[str]: | |||
return [] | |||
|
|||
@property | |||
def tracking_url(self) -> Optional[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this just be a method called get_tracking_url
and then you wouldn't need to rename the column virtually or have a getter/setter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want the FAB-based API to always return the transformed url but I couldn't find a clean way to do it without adding an (unnecessary) new field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dpgaspar @betodealmeida do you have any recommendation on what's the best practice for such use case?
def handle_cursor(cls, cursor: Any, query: Query, session: Session) -> None: | ||
def get_tracking_url(cls, cursor: "Cursor") -> Optional[str]: | ||
try: | ||
return cursor.info_uri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info_uri
is only available in the latest version of the Trino client, which we haven't upgrade to yet.
@@ -279,6 +284,27 @@ def default_endpoint(self) -> str: | |||
def get_extra_cache_keys(query_obj: Dict[str, Any]) -> List[str]: | |||
return [] | |||
|
|||
@property | |||
def tracking_url(self) -> Optional[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want the FAB-based API to always return the transformed url but I couldn't find a clean way to do it without adding an (unnecessary) new field.
afe9f3e
to
07f3687
Compare
@@ -25,7 +25,7 @@ | |||
MSG_FORMAT = "Failed to execute {}" | |||
|
|||
if TYPE_CHECKING: | |||
from superset.utils.sqllab_execution_context import SqlJsonExecutionContext | |||
from superset.sqllab.sqllab_execution_context import SqlJsonExecutionContext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bycatch: wrong import
07f3687
to
a7cb2fd
Compare
@john-bodley I addressed your comments and added some integration tests. Can you take another look? |
a7cb2fd
to
83d165f
Compare
# pylint: disable=protected-access, line-too-long | ||
return f"{cursor._protocol}://{cursor._host}:{cursor._port}/ui/query.html?{cursor.last_query_id}" | ||
except AttributeError: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pass | |
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Final return is expected to be explicit if you have a return statement earlier. I think there is a pylint or mypy warning for this.
return f"{cursor._protocol}://{cursor._host}:{cursor._port}/ui/query.html?{cursor.last_query_id}" | ||
except AttributeError: | ||
pass | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return None |
# pylint: disable=protected-access, line-too-long | ||
return f"{conn.http_scheme}://{conn.host}:{conn.port}/ui/query.html?{cursor._query.query_id}" | ||
except AttributeError: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See previous comment.
if tracking_url: | ||
job_id = tracking_url.split("/")[-2] | ||
logger.info( | ||
"Query %s: Found the tracking url: %s", | ||
str(query_id), | ||
tracking_url, | ||
) | ||
transformer = current_app.config["TRACKING_URL_TRANSFORMER"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ktmud per you PR description this logic has changed. Can we confirmed that the previous behavior is preserved? I just want to confirm this isn't a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
83d165f
to
11a0658
Compare
).start_time | ||
# Test search queries on time filter | ||
from_time = "from={}".format(int(first_query_time)) | ||
to_time = "to={}".format(int(second_query_time)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bycatch: this test is flaky. [QUERY_1, QUERY_2, QUERY_3]
are executed consequently, with supposedly increasing start_time
.
Since the original code int(...)
basically rounded down the float epoch, the actual query results returned [QUERY_1, QUERY_2]
---even though we filtered second_query_time
by QUERY_3
. But this still sometimes fail because the rounded down time can be anywhere between QUERY_1 start time and QUERY_3 start time, so if QUERY_1 and QUERY_2 are executed within the same second, we wouldn't have QUERY_2 in the filtered results. E.g. when the start time for the queries are 1.1, 1.5, 2
---we'd be filtered to between [1, 1]
.
84ded7a
to
d400ac4
Compare
resp = self.get_resp("/superset/search_queries?" + "&".join(params)) | ||
data = json.loads(resp) | ||
self.assertEqual(2, len(data)) | ||
# pylint: disable=line-too-long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add this at the end of line #420, otherwise I think it disables the error for the entire block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reformatted it so pylint disable isn't necessary.
SUMMARY
Enable tracking URL for Presto and Trino (previously it is only available for Hive queries) and also change
TRACKING_URL_TRANSFORMER
to run at runtime (as opposed to when queries are fetched) so that we can display different tracking URL based on query age (most tracking URLs will expire after a certain amount of time---in Trino/Preso, this is configured byquery.min-expire-age
.Had to update query execution logics to add end time for failed executions, too.
While debugging, I also realized the TIMED_OUT status isn't actually in use---and now the frontend also cannot handle that status. We should probably add it back, but let's save it for another PR.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
When queries are still running:
We will now display the tracking url even for failed jobs, this is useful for users to debug their failed queries:
TESTING INSTRUCTIONS
TRACKING_URL_TRANSFORMER
. For example, following setting will only display tracking urls for queries finished within the last 5 minutes:ADDITIONAL INFORMATION