Add query clause information to function analysis #13885
assaf2 wants to merge 1 commit into trinodb:master
This is used
- in CallTask -- add explicit CALL entry here
- in the deprecated RoutineInfo constructor, so let's deprecate the UNSPECIFIED as well (or maybe we can remove it, see below)
I do think that RoutineInfo should be meant for consumption by the plugins, so constructor changes should be exempt from backward compat checks. However, we also annotated the class with JSON annotations, so we made the JSON form part of the contract.
If we take backward compat seriously here, we need to allow this class to deserialize previous forms, those missing clauseInfo.
Therefore make clauseInfo an Optional field.
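A minimal sketch of what this suggestion could look like. `RoutineInfoSketch` and the `String`-typed `clauseInfo` are hypothetical stand-ins rather than the actual SPI class, and the JSON annotations are left out. The point is only that an Optional field lets an older form, which lacks the clause, map to an empty value instead of failing.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-in for the SPI class; the real class also carries JSON annotations.
final class RoutineInfoSketch
{
    private final String routine;
    private final String authorization;
    // Assumption: the clause is modelled as a plain String for illustration.
    private final Optional<String> clauseInfo;

    // New form: the clause is supplied when the analyzer knows it.
    RoutineInfoSketch(String routine, String authorization, Optional<String> clauseInfo)
    {
        this.routine = Objects.requireNonNull(routine, "routine is null");
        this.authorization = Objects.requireNonNull(authorization, "authorization is null");
        this.clauseInfo = Objects.requireNonNull(clauseInfo, "clauseInfo is null");
    }

    // Previous form: no clause information, so the field is left empty.
    @Deprecated
    RoutineInfoSketch(String routine, String authorization)
    {
        this(routine, authorization, Optional.empty());
    }

    String getRoutine()
    {
        return routine;
    }

    String getAuthorization()
    {
        return authorization;
    }

    Optional<String> getClauseInfo()
    {
        return clauseInfo;
    }
}
```

With this shape, callers of the old two-argument form keep working and consumers treat a missing clause as "unknown" rather than as an error.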
The RoutineInfo is part of QueryCompletedEvent->QueryMetadata which can be used by plugins
make clauseInfo an Optional field.
Done
If you're not filling in the PR template, please remove it from the PR description. Thanks
Maybe add HAVING sum(linenumber) > 1 to make sure the same function would be correctly reported twice with a different clauseInfo.
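To illustrate the scenario this comment describes, here is a rough sketch. The `Usage` type, the clause labels `SELECT` and `HAVING`, and the `routine@clause` key format are all assumptions made for illustration, not the PR's actual types. It models one function reported twice with a different clause and counts the entries per (routine, clause) pair.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ClauseUsageSketch
{
    // Hypothetical (routine, clause) pair, standing in for one reported routine entry.
    static final class Usage
    {
        final String routine;
        final String clause;

        Usage(String routine, String clause)
        {
            this.routine = routine;
            this.clause = clause;
        }
    }

    // Counts how often each routine appears in each clause, keyed as "routine@clause".
    static Map<String, Integer> countByRoutineAndClause(List<Usage> usages)
    {
        Map<String, Integer> counts = new TreeMap<>();
        for (Usage usage : usages) {
            counts.merge(usage.routine + "@" + usage.clause, 1, Integer::sum);
        }
        return counts;
    }
}
```

For a query that uses sum(linenumber) both in the select list and in HAVING, a test along these lines would expect two distinct keys, one per clause, each with a count of one.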
Neither the PR description nor the commit description says anything about the rationale for the change.
Added
If you look at the event listener, we now have capabilities to get anonymized query plans (cc @gaurav8297). Would that suit your needs?
I want to allow statistics calculation (for example how many times function X appeared in the
IMO your specific question Why you don't want to parse plans?
It's a natural extension to
I thought
@sopel39 can you please elaborate on the lineage use case?
@assaf2 there is a conflict, can you please rebase?
Lineage and stats collection are very different problems. Lineage allows for an audit trail (e.g. who used which functions) and tracing as data progresses through the system. I would flip the question: why is collecting clause information for routines that important? It seems to be one of the many questions that we can ask about a plan. Are other questions (e.g. projections used, types of joins, etc.) less important? If we figure we need to collect another important metric, will we extend the event listener SPI (and add complexity) again?
Why did we come down to function granularity? Why do we want to audit who used which function? Is there a concept of authorization for running functions? Why is a user+function combination "traceable data"? Who traces it, through which path, and for what purpose?
My claim here is that
Note that this change would help disambiguate repeated entries in the
True, but we'll still see duplications in cases like
If you do that, and you later proceed with this PR, it would be a backward incompat change.
What's the difference between changing
It's not intentional. Also, BTW, the purpose of RoutineInfo is to track what functions (and with what user/role) a query uses, in the same way we track which tables and with what user/role they are read. This is for auditing purposes, and is especially important when queries contain complex row filters, column masks, or depend on views.
What's the purpose of tracking in which SQL-level clause a function shows up? SQL clauses have very little relevance as to whether a function will be pushed down, where it will evaluate in the query plan, etc. It would be more appropriate to track which operator they are associated with in the query plan, but then we're just talking about duplicating information that already exists in the event (namely, the query plan itself).
I think the context is my own question -- which functions occur more often in predicates? What are the predicate pushdown capabilities we should improve next?
Seems like one of the hypotheses we could answer by looking at the plan itself.
Description
Add the information from which query clause each function was called. This information can be valuable to subscribers of the event listener because each query clause has different characteristics - some are pushed down, some affect performance more than others, and so on.
Improvement
(RoutineInfo, which is part of QueryMetadata that is exposed through the event listener)
Trino exposes some telemetry about queries, in particular which functions were used. This change adds the information from which query clause each function was called.
Documentation
(X) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(?) Release notes entries required with the following suggested text: