feat(tracing) Add a UI and backend for trace data #2309
Conversation
Add a (very rudimentary) UI that allows a query to be run and displays the trace for that query. It will also display any error that may result from running the query.
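For context, a rough sketch of the shape such a backend endpoint could take, assuming a Flask-style admin API. The route name, payload fields, and response keys are illustrative assumptions, and the run_query_and_get_trace stub stands in for the helper shown in the diff excerpt further down; none of this is taken verbatim from the PR.

from flask import Flask, jsonify, request

application = Flask(__name__)


def run_query_and_get_trace(storage_name: str, query: str) -> str:
    # Stand-in for the helper added in this PR (see the diff excerpt below),
    # which validates the query and executes it with trace capture enabled.
    raise NotImplementedError


@application.route("/clickhouse_trace_query", methods=["POST"])
def clickhouse_trace_query() -> tuple:
    body = request.get_json() or {}
    try:
        trace = run_query_and_get_trace(body.get("storage", ""), body.get("sql", ""))
        return jsonify({"trace_output": trace}), 200
    except Exception as err:
        # Any error is returned to the UI so it can be displayed alongside the trace.
        return jsonify({"error": str(err)}), 400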
Just a couple of questions
Looks good to me, aside from a couple of comments.
Were you able to test this locally, e.g. with the curl command in the comment?
This PR is quite large and involves a lot of refactoring that touches other areas, including core ClickHouse driver functionality. It's also hard to tell what has changed and what has just moved around. Please consider splitting it up.
Currently you must run make build-admin manually before pushing the bundle.js file, as we do not build it in CI for prod and rely on the checked-in version of this file. Right now you have pushed the dev version of the bundle.
Left some minor comments
snuba/admin/clickhouse/common.py
if storage_key in CONNECTIONS:
    return CONNECTIONS[storage_key]
I don't think this will work.
The issue is that there are two separate use cases: you cannot use the same logic to retrieve a connection for the tracing tool (which needs to talk to a query node through Envoy) and for the system tables tool (which needs to access individual nodes and uses the storage key only to fetch the right cluster to validate the host name/port).
Specifically:
Here the system tables tool will create a connection to the required host the first time and cache it. Subsequently it will always return that same connection for the same storage, no matter which host/port the user asks for, which is wrong. If you want to cache connections for the system tables tool, the key must contain at least the host name as well.
I think you want two methods: one that returns a query connection to a cluster given a storage key, and another that takes a host/port, validates that they belong to a storage, and then returns a connection to that node.
The tracing tool would use the first; the system tables tool would use the second (or the tracing tool could use both, if you want a trace from a query run on a storage node).
I split the connection functions into two: one for system queries, which caches by hostname, and one for tracing queries, which caches by cluster.
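A minimal, self-contained sketch of that split, just to make the cache keys concrete. ClickhouseNode, ClickhousePool, and get_cluster_nodes below are stand-ins for the real snuba objects, not the actual API:

from typing import Dict, List, NamedTuple, Tuple


class ClickhouseNode(NamedTuple):
    host_name: str
    port: int


class ClickhousePool:
    # Placeholder for the real connection pool; only host/port matter here.
    def __init__(self, host: str, port: int) -> None:
        self.host = host
        self.port = port


def get_cluster_nodes(storage_name: str) -> List[ClickhouseNode]:
    # Placeholder: in snuba this would come from the storage's cluster definition.
    return [
        ClickhouseNode("query-node.envoy.local", 9000),
        ClickhouseNode("storage-node-0", 9000),
    ]


# Tracing queries: one connection per storage, always to the query node.
_cluster_connections: Dict[str, ClickhousePool] = {}


def get_ro_cluster_connection(storage_name: str) -> ClickhousePool:
    if storage_name not in _cluster_connections:
        query_node = get_cluster_nodes(storage_name)[0]
        _cluster_connections[storage_name] = ClickhousePool(
            query_node.host_name, query_node.port
        )
    return _cluster_connections[storage_name]


# System queries: validate host/port against the storage, cache per node.
_node_connections: Dict[Tuple[str, str, int], ClickhousePool] = {}


def get_ro_node_connection(storage_name: str, host: str, port: int) -> ClickhousePool:
    if not any(
        n.host_name == host and n.port == port for n in get_cluster_nodes(storage_name)
    ):
        raise ValueError(f"{host}:{port} is not part of storage {storage_name}")
    key = (storage_name, host, port)
    if key not in _node_connections:
        _node_connections[key] = ClickhousePool(host, port)
    return _node_connections[key]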
Codecov Report

@@            Coverage Diff             @@
##           master    #2309      +/-   ##
==========================================
+ Coverage   92.76%   92.93%   +0.17%
==========================================
  Files         561      563       +2
  Lines       25790    25888      +98
==========================================
+ Hits        23924    24059     +135
+ Misses       1866     1829      -37

Continue to review full report at Codecov.
LGTM
FYI: the readonly user is not present on the query nodes, so you will need to send a request to create it. @nikhars could help you with this.
def run_query_and_get_trace(storage_name: str, query: str) -> ClickhouseResult:
    # Validate the query before running it.
    validate_trace_query(query)
    # Get a read-only connection to the cluster backing this storage.
    connection = get_ro_cluster_connection(storage_name)
    # Run the query with trace capture enabled so the ClickHouse trace is returned.
    query_result = connection.execute(query=query, capture_trace=True)
This will run the query on a single thread, which is good to start with.
Later, though, we will want to add a switch to run with 1, 2, or 4 threads to test how the query parallelizes.
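One way that switch could eventually look, assuming the connection's execute method can forward ClickHouse settings (the settings keyword and the threads parameter are assumptions about the internal API; max_threads itself is a standard ClickHouse setting):

def run_query_and_get_trace(
    storage_name: str, query: str, threads: int = 1
) -> ClickhouseResult:
    validate_trace_query(query)
    connection = get_ro_cluster_connection(storage_name)
    # Hypothetical: forward max_threads so the caller can compare 1 vs 2 vs 4 threads.
    return connection.execute(
        query=query,
        capture_trace=True,
        settings={"max_threads": threads},
    )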