feat: Preserve selectedUser (identity) in Write Queries (#27360)#27360
feat: Preserve selectedUser (identity) in Write Queries (#27360)#27360shelton408 merged 1 commit intoprestodb:masterfrom
Conversation
Reviewer's GuideExtends SessionRepresentation across Java, C++ protocol, Thrift, and Spark integration to carry selectedUser and reasonForSelect so identity is preserved through serialization/deserialization and into native/Spark paths, and adjusts the thrift2json preprocessing script to skip structured annotations that were breaking codegen. Sequence diagram for preserving identity through SessionRepresentation and ThriftsequenceDiagram
participant Client
participant Coordinator
participant Session
participant SessionRepresentation
participant Thrift
participant NativeWorker
participant Metastore
Client->>Coordinator: Submit write query
Coordinator->>Session: Create Session with Identity
Session-->>Session: Identity holds selectedUser,reasonForSelect
Coordinator->>Session: toSessionRepresentation()
Session->>SessionRepresentation: construct with selectedUser,reasonForSelect
SessionRepresentation-->>Coordinator: SessionRepresentation (selectedUser,reasonForSelect set)
Coordinator->>Thrift: Serialize SessionRepresentation
Thrift-->>Coordinator: Thrift SessionRepresentation (fields selectedUser,reasonForSelect)
Coordinator->>NativeWorker: Send Thrift SessionRepresentation
NativeWorker->>Thrift: Deserialize Thrift to protocol SessionRepresentation
Thrift-->>NativeWorker: Protocol SessionRepresentation(selectedUser,reasonForSelect)
NativeWorker->>SessionRepresentation: toSession(sessionPropertyManager,extraCredentials,extraAuthenticators)
SessionRepresentation->>Session: construct Session with Identity(selectedUser,reasonForSelect)
Session-->>NativeWorker: Session with preserved Identity
NativeWorker->>Metastore: Authenticate using Identity.selectedUser and Identity.reasonForSelect
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The
thrift2json.pychange currently drops all structured annotations and any include underthrift/annotation; consider tightening the regex / include filter or scoping it to the specific unsupported annotation patterns so that future annotations or unrelated includes aren’t silently discarded. - The regex
r"\s*@\w+\.\w+\{"inthrift2json.pyassumes a simplens.name{form with no spaces or additional namespace segments; if newer thrift annotations use different formatting (e.g.ns.sub.name {), this may fail to strip them and reintroduce the original generation issue—worth making the pattern more robust or documenting the exact expected shapes.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `thrift2json.py` change currently drops all structured annotations and any include under `thrift/annotation`; consider tightening the regex / include filter or scoping it to the specific unsupported annotation patterns so that future annotations or unrelated includes aren’t silently discarded.
- The regex `r"\s*@\w+\.\w+\{"` in `thrift2json.py` assumes a simple `ns.name{` form with no spaces or additional namespace segments; if newer thrift annotations use different formatting (e.g. `ns.sub.name {`), this may fail to strip them and reintroduce the original generation issue—worth making the pattern more robust or documenting the exact expected shapes.
## Individual Comments
### Comment 1
<location path="presto-native-execution/presto_cpp/main/thrift/thrift2json.py" line_range="41-42" />
<code_context>
lines = file.readlines()
modified_lines = []
for line in lines:
+ # Skip structured annotations and their includes that ptsd_jbroll cannot parse.
+ if re.match(r"\s*@\w+\.\w+\{", line):
+ continue
+ if re.match(r'\s*include\s+"thrift/annotation/', line):
</code_context>
<issue_to_address>
**issue (bug_risk):** Structured annotation skipping only removes the first line, leaving the body which may still break the parser.
To avoid leaving invalid lines for `ptsd_jbroll`, update the logic to skip the entire structured annotation block—for example, continue reading lines until the matching closing `}` is found, or at least skip subsequent indented lines until a blank or non-indented line is reached.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| # Skip structured annotations and their includes that ptsd_jbroll cannot parse. | ||
| if re.match(r"\s*@\w+\.\w+\{", line): |
There was a problem hiding this comment.
issue (bug_risk): Structured annotation skipping only removes the first line, leaving the body which may still break the parser.
To avoid leaving invalid lines for ptsd_jbroll, update the logic to skip the entire structured annotation block—for example, continue reading lines until the matching closing } is found, or at least skip subsequent indented lines until a blank or non-indented line is reached.
Summary: Pull Request resolved: prestodb#27360 WARNING: i ignored structured annotations using thrift2json.py for presto-native in this PR. Thrift generation was having trouble reading certain annotations without this, but I'm not sure if this is will cause any issues in protocol generation. If anyone has context please let me know. When plumbing identity to metastore for write/coordinator related queries (e.g. Insert/Delete), we convert Session to a SessionRepresentation for serialization, however this representation uses Optional.Empty as placeholders for identity. The identity is lost in the following steps 1.session is created with user identity 2. serialize session representation, drops identity in this serialization 3. deserialize the session representation and use this as the session for the query 4. uses deserialized session representation in metastore This poses an issue in cases where we want to pass the selected identity to an external metastore service for authentication. Changes: Update Session/Session Representation to include selectedUser and reasonForSelect in the conversion methods. Include both as thrift fields in SessionRepresentation CPP protocol and thrift updates to match the change to session representation thrift fields. thrift2json.py change to ignore certain annotations *** not sure if this should be committed. Included the fields in conversion back to Session for Spark as well for consistency. Differential Revision: D97002318
d33244c to
03a0ed2
Compare
Summary: Pull Request resolved: prestodb#27360 WARNING: i ignored structured annotations using thrift2json.py for presto-native in this PR. Thrift generation was having trouble reading certain annotations without this, but I'm not sure if this is will cause any issues in protocol generation. If anyone has context please let me know. When plumbing identity to metastore for write/coordinator related queries (e.g. Insert/Delete), we convert Session to a SessionRepresentation for serialization, however this representation uses Optional.Empty as placeholders for identity. The identity is lost in the following steps 1.session is created with user identity 2. serialize session representation, drops identity in this serialization 3. deserialize the session representation and use this as the session for the query 4. uses deserialized session representation in metastore This poses an issue in cases where we want to pass the selected identity to an external metastore service for authentication. Changes: Update Session/Session Representation to include selectedUser and reasonForSelect in the conversion methods. Include both as thrift fields in SessionRepresentation CPP protocol and thrift updates to match the change to session representation thrift fields. thrift2json.py change to ignore certain annotations *** not sure if this should be committed. Included the fields in conversion back to Session for Spark as well for consistency. Differential Revision: D97002318
03a0ed2 to
8b77f99
Compare
Description
WARNING: I modified thrift2json.py for presto-native thrift generation. Thrift generation was having trouble reading structured annotations so I had it ignore them, but I'm not sure if this is a correct change. Didn't include the thrift2json change in this though
The session identity fields selectedUser and reasonForSelect are dropped in session representation, causing them to be lost for queries like Insert and Delete which use the SessionRepresentation as their session.
Meta Internal Differential Revision: D97002318
Meta Internal Review by: gmh
Changes:
Update Session/Session Representation to include selectedUser and reasonForSelect in the conversion methods.
Include both as thrift fields in SessionRepresentation
2.CPP protocol and thrift updates to match the change to session representation thrift fields.
thrift2json.py change to ignore certain annotations *** not sure if this should be committed.
Included the fields in conversion back to Session for Spark as well for consistency.
Meta Internal review by: spershin
Meta Internal Differential Revision: D92632990
Motivation and Context
When plumbing identity to metastore for write/coordinator related queries (e.g. Insert/Delete), we convert Session to a SessionRepresentation for serialization, however this representation uses Optional.Empty as placeholders for identity.
The identity is lost in the following steps
1.session is created with user identity
2. serialize session representation, drops identity in this serialization
3. deserialize the session representation and use this as the session for the query
4. uses deserialized session representation in metastore
This poses an issue in cases where we want to pass the selected identity to an external metastore service for authentication.
Impact
selectedUser identity is now properly contained in Session during Insert and Delete queries, allowing it to be propogated e.g. through metastore context to metastore api calls.
Test Plan
Tested in Meta internal systems with user identity propogation showing that identity is correctly passed for Insert and Delete queries (previously only propogated for CREATE TABLE, SELECT, etc).
presto test snapshots INSERT INTO test shelton dolete (coll col2) VALUES (6 6) Timit 100
Pasted Graphic 1
Contributor checklist
Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
Documented new properties (with its default value), SQL syntax, functions, or other functionality.
If release notes are required, they follow the release notes guidelines.
Adequate tests were added if applicable.
CI passed.
Summary by Sourcery
Preserve the selected user identity and selection reason across session serialization, including thrift and JSON representations, so that identity is available to downstream components and native execution paths.
New Features:
Add selectedUser and reasonForSelect fields to SessionRepresentation and propagate them through Java, C++ protocol, thrift, and Spark session creation.
Enhancements:
Extend thrift2json preprocessing to skip unsupported structured annotations and related includes to allow thrift generation to succeed on newer annotated IDL files.
Release Notes
Please follow release notes guidelines and fill in the release notes below.
Release Notes
Please follow release notes guidelines and fill in the release notes below.