Add anonymized query plan in json format to QueryCompletedEvent #12968
sopel39 merged 7 commits into trinodb:master from gaurav8297:gaurav8297/anonymized_plan_new
raunaqmorarka
left a comment
Please add examples of anonymised plans to the PR
ping
Test failures are related, PTAL
Query: |
This should always exist, right?
Ideally yes, but I don't know if we should assume that. In future if catalogs can be removed on the fly, I'm not sure if that assumption would continue to hold.
> In future if catalogs can be removed on the fly, I'm not sure if that assumption would continue to hold.
I don't think they can be removed for an active query.
> This should always exist, right?
I tried making it not optional, but there were many tests which started failing. IIRC these errors were coming in the case of an information schema.
OK. Could you explain why the connector name is missing for the information schema?
Why is this an interface? There should only be one implementation of the anonymizer, so an interface is overkill.
One implementation performs anonymisation, the other doesn't. Could you take a look at its usage in PlanPrinter and suggest a different approach if this can be improved?
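The two-implementation design described in this reply could look roughly like the sketch below. It is only an illustration: the names `PlanAnonymizer`, `PassThroughAnonymizer`, `DigestAnonymizer` and `printTableScan` are hypothetical, and the truncated SHA-256 digest is an assumed scheme, not necessarily what the PR does.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; names and signatures are illustrative, not the PR's actual API.
class AnonymizerInterfaceSketch {
    interface PlanAnonymizer {
        String anonymize(String value);
    }

    // Returns the input unchanged; used when the regular plan is requested.
    static final class PassThroughAnonymizer implements PlanAnonymizer {
        @Override
        public String anonymize(String value) {
            return value;
        }
    }

    // Replaces the input with the first 4 bytes of its SHA-256 digest (an assumed scheme).
    static final class DigestAnonymizer implements PlanAnonymizer {
        @Override
        public String anonymize(String value) {
            try {
                byte[] digest = MessageDigest.getInstance("SHA-256")
                        .digest(value.getBytes(StandardCharsets.UTF_8));
                StringBuilder hex = new StringBuilder();
                for (int i = 0; i < 4; i++) {
                    hex.append(String.format("%02x", digest[i]));
                }
                return hex.toString();
            }
            catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // The printing code is written once against the interface.
    static String printTableScan(String tableName, PlanAnonymizer anonymizer) {
        return "TableScan[table = " + anonymizer.anonymize(tableName) + "]";
    }

    public static void main(String[] args) {
        System.out.println(printTableScan("orders", new PassThroughAnonymizer()));
        System.out.println(printTableScan("orders", new DigestAnonymizer()));
    }
}
```

With this shape the printer never branches on whether anonymization is enabled; the caller picks the implementation.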
What's the use case for including both the regular plan and the anonymized plan? Most event listeners will just forward and store the event as is, which defeats the purpose of having an anonymized plan.
The regular plan and the anonymised plan can serve different use cases. For example, the regular plan can be used to display the query plan for past queries to a user in a UI. This is user-specific data that might not be made easily accessible to everyone, and it's formatted in a way that is not ideal for offline analysis.
The anonymized plan could be sent to a different downstream system that is more widely accessible and easier to work with for offline analysis.
Currently we send the same event to all event listeners, so we also can't use different event listeners to get different versions of the plan.
@raunaqmorarka @sopel39 PTAL
For some reason, we don't register the system or information schema's catalog name to
PS: GitHub is not letting me add a comment in the open thread 😞 cc @sopel39
Why not keep using the formatHash method here?
We've made formatHash a non-static method.
nit: consider making it non-static so that you don't have to pass the anonymizer
It would be redundant to make formatAggregation non-static because it is also used in GraphVizPlanPrinter.
Anyway, in PlanPrinter, formatAggregation is only used in 2-3 places.
Is the failure related?
@martint @raunaqmorarka @sopel39 Implemented counter-based anonymization
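As a rough illustration of what counter-based anonymization means, the sketch below hands out a numbered placeholder the first time a value is seen and reuses it afterwards. The class name `CounterAnonymizerSketch`, the `kind` parameter and the `kind_N` placeholder format are assumptions for illustration; the PR's CounterBasedAnonymizer may differ in all of these.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of counter-based anonymization; not the PR's actual class.
class CounterAnonymizerSketch {
    // Remembers the placeholder assigned to each (kind, value) pair.
    private final Map<String, String> assigned = new HashMap<>();
    // Single shared counter (an assumption; per-kind counters would also work).
    private int nextId;

    String anonymize(String kind, String value) {
        // The same input always maps to the same placeholder within one plan.
        return assigned.computeIfAbsent(kind + ":" + value, key -> kind + "_" + (++nextId));
    }

    public static void main(String[] args) {
        CounterAnonymizerSketch anonymizer = new CounterAnonymizerSketch();
        System.out.println(anonymizer.anonymize("column", "customer_name"));
        System.out.println(anonymizer.anonymize("table", "orders"));
        System.out.println(anonymizer.anonymize("column", "customer_name"));
    }
}
```

Unlike a hash of the value, a counter reveals nothing about the original name, and the placeholders stay stable only within a single anonymizer instance.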
What's the purpose of this?
The AST is meant to be a closed hierarchy and not to be extended by third parties. There's a lot of infrastructure that depends on knowing exactly which classes are part of the AST. Eventually, we'll update it to use Java 17's sealed types, at which point this won't be possible at all and will be enforced at compile time.
The idea here is to test the scenario that a new sub-class of Literal is added but CounterBasedAnonymizer#anonymizeExpression isn't updated to handle that new class. In this scenario, we should still anonymise the literal.
It's not strictly necessary though, let me know if you want this removed or handled differently.
Yes, let’s remove it. It’s not a correct usage of the AST classes and will break in the future.
As long as the anonymizer fails if a new class is added (very unlikely), we’ll be able to catch that very quickly and add the relevant code, so we don’t even have to handle anonymization for the general case.
I've removed this now and modified CounterBasedAnonymizer#anonymizeExpression to throw UnsupportedOperationException for an unhandled Literal sub-class.
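The fail-fast behaviour agreed on in this thread can be sketched as follows. The literal classes here are stand-ins defined only for this example (they are not Trino's AST classes), and the placeholder strings and the method name `anonymizeLiteral` are assumptions.

```java
// Hypothetical sketch; the literal hierarchy below is a stand-in, not Trino's AST.
class LiteralAnonymizationSketch {
    static abstract class Literal {}

    static final class LongLiteral extends Literal {
        final long value;

        LongLiteral(long value) {
            this.value = value;
        }
    }

    static final class StringLiteral extends Literal {
        final String value;

        StringLiteral(String value) {
            this.value = value;
        }
    }

    // Simulates a literal type added later without updating the anonymizer.
    static final class IntervalLiteral extends Literal {}

    static String anonymizeLiteral(Literal literal) {
        if (literal instanceof LongLiteral) {
            return "long_literal";
        }
        if (literal instanceof StringLiteral) {
            return "string_literal";
        }
        // Fail loudly rather than risk printing an unanonymized value.
        throw new UnsupportedOperationException(
                "Anonymization is not supported for literal type " + literal.getClass().getSimpleName());
    }

    public static void main(String[] args) {
        System.out.println(anonymizeLiteral(new LongLiteral(42)));
        System.out.println(anonymizeLiteral(new StringLiteral("secret")));
        try {
            anonymizeLiteral(new IntervalLiteral());
        }
        catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Throwing on the unknown case means a newly added literal type surfaces as a test failure instead of silently leaking its value into the plan.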
I still have concerns about this, which I mentioned before in a related context. Anonymized expressions are not valid SQL, so we should not be trying to construct an AST out of them (which is effectively what this method does).
The API of Anonymizer is String anonymize(Expression expression), so we don't really need an anonymized Expression; an anonymised string representation of the Expression will do.
This method creates an Expression because it was convenient to use an ExpressionRewriter to anonymise literals and then call Expression#toString on the result.
I think an alternative could be to write something similar to ExpressionFormatter#Formatter that delegates to the existing formatter for all methods except the ones we want to anonymise (visitXXXLiteral).
Would that be better, or is there another way that you would recommend instead?
That would be better. Alternatively, we should consider and explore making the expression formatter itself anonymizer-aware.
I've updated the expression formatter so that we can use it directly instead of creating an anonymized AST. PTAL @martint
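One way to picture an anonymizer-aware formatter is a formatter that accepts an optional hook for literals and otherwise formats as usual, so no anonymized AST is ever built. The sketch below uses a tiny stand-in expression model; `FormatterSketch`, its nested classes and the `literalFormatter` parameter are hypothetical and do not reflect the actual ExpressionFormatter signature.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch; the expression classes are stand-ins, not Trino's AST.
class FormatterSketch {
    interface Expression {}

    static final class Identifier implements Expression {
        final String name;

        Identifier(String name) {
            this.name = name;
        }
    }

    static final class StringLiteral implements Expression {
        final String value;

        StringLiteral(String value) {
            this.value = value;
        }
    }

    static final class Comparison implements Expression {
        final Expression left;
        final String operator;
        final Expression right;

        Comparison(Expression left, String operator, Expression right) {
            this.left = left;
            this.operator = operator;
            this.right = right;
        }
    }

    // Formats straight to a string; the optional hook overrides literal output only.
    static String format(Expression expression, Optional<Function<StringLiteral, String>> literalFormatter) {
        if (expression instanceof StringLiteral) {
            StringLiteral literal = (StringLiteral) expression;
            return literalFormatter
                    .map(formatter -> formatter.apply(literal))
                    .orElseGet(() -> "'" + literal.value.replace("'", "''") + "'");
        }
        if (expression instanceof Identifier) {
            return ((Identifier) expression).name;
        }
        if (expression instanceof Comparison) {
            Comparison comparison = (Comparison) expression;
            return "(" + format(comparison.left, literalFormatter)
                    + " " + comparison.operator + " "
                    + format(comparison.right, literalFormatter) + ")";
        }
        throw new UnsupportedOperationException("Unsupported expression: " + expression.getClass().getSimpleName());
    }

    public static void main(String[] args) {
        Expression predicate = new Comparison(new Identifier("name"), "=", new StringLiteral("alice"));
        Function<StringLiteral, String> masker = literal -> "'varchar_literal'";
        System.out.println(format(predicate, Optional.empty()));
        System.out.println(format(predicate, Optional.of(masker)));
    }
}
```

Because the hook returns a string rather than a new node, the anonymized output never has to be valid SQL or a valid AST.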
This will be used in PlanPrinter to print the connector name as part of the table scan node when the plan is anonymized.
The general approach looks good now. I'll leave it up to @sopel39 to do the final review.
We should anonymize both jsonPlan and plan. Could you create an issue + add TODO?
We are anonymizing both plan and jsonPlan. It's just that there's no test for plan in TestEventListenerBasic. In general, there's no good testing of the text plan.
This can be used to collect anonymised plans through query event listeners and to print anonymized plans from EXPLAIN.
@sopel39 I've addressed comments
    }

    /**
     * Specify whether the plan included in QueryCompletedEvent should be anonymized or not
nit: this should mention that both plan and jsonPlan are anonymized
Description
Add anonymized query plan in json format to QueryCompletedEvent
Implement EXPLAIN (TYPE DISTRIBUTED, FORMAT JSON)
New feature
Event listener SPI, EXPLAIN
Provides anonymised query plan in json format to event listener to enable offline analysis without leaking sensitive info.
Implements EXPLAIN (TYPE DISTRIBUTED, FORMAT JSON)
Documentation
( ) No documentation is needed.
(x) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: