
feat: Preserve selectedUser (identity) in Write Queries (#27360)

Merged
shelton408 merged 1 commit into prestodb:master from shelton408:export-D97002318 on Mar 20, 2026


shelton408 (Contributor) commented Mar 17, 2026

Description

WARNING: I modified thrift2json.py for presto-native Thrift generation. Thrift generation was having trouble reading structured annotations, so I made it ignore them, but I'm not sure whether this is a correct change. I didn't include the thrift2json change in this PR, though.

The session identity fields selectedUser and reasonForSelect are dropped when a Session is converted to a SessionRepresentation. As a result, they are lost for queries such as INSERT and DELETE, which use the SessionRepresentation as their session.

Meta Internal Differential Revision: D97002318
Meta Internal Review by: gmh

Changes:

  • Update Session/SessionRepresentation to include selectedUser and reasonForSelect in the conversion methods.
  • Include both as Thrift fields in SessionRepresentation.
  • Update the C++ protocol and Thrift definitions to match the new SessionRepresentation Thrift fields.
  • Change thrift2json.py to ignore certain annotations (*** not sure whether this should be committed).
  • Include the fields in the conversion back to Session for Spark as well, for consistency.

Meta Internal review by: spershin
Meta Internal Differential Revision: D92632990
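
To make the conversion change concrete, here is a minimal Python model of it. It is a sketch under stated assumptions, not the Java code from this PR: `Identity`, `SessionRepresentation`, `to_session_representation`, and `to_identity` are hypothetical, simplified stand-ins for Presto's `Session.Identity`, `SessionRepresentation`, `toSessionRepresentation()`, and `toSession(...)`, and only the user plus the two identity fields are modeled. The sample values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identity:
    """Hypothetical stand-in for Session.Identity (only the fields this PR touches)."""
    user: str
    selected_user: Optional[str] = None
    reason_for_select: Optional[str] = None


@dataclass(frozen=True)
class SessionRepresentation:
    """Hypothetical stand-in for the serializable SessionRepresentation."""
    user: str
    selected_user: Optional[str] = None      # new Thrift field id 24 in the PR
    reason_for_select: Optional[str] = None  # new Thrift field id 25 in the PR


def to_session_representation(identity: Identity) -> SessionRepresentation:
    # After the change: copy selectedUser/reasonForSelect instead of
    # writing empty placeholders.
    return SessionRepresentation(
        user=identity.user,
        selected_user=identity.selected_user,
        reason_for_select=identity.reason_for_select,
    )


def to_identity(rep: SessionRepresentation) -> Identity:
    # The reverse conversion (the PR also does this on the Spark path)
    # rebuilds the identity from the carried fields.
    return Identity(
        user=rep.user,
        selected_user=rep.selected_user,
        reason_for_select=rep.reason_for_select,
    )


original = Identity("alice", selected_user="etl_writer", reason_for_select="scheduled INSERT")
restored = to_identity(to_session_representation(original))
```

With both directions copying the fields, `restored` compares equal to `original`, which is the property the PR restores for INSERT and DELETE sessions.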

Motivation and Context

When plumbing identity to the metastore for write/coordinator-related queries (e.g. INSERT/DELETE), we convert the Session to a SessionRepresentation for serialization. However, this representation uses Optional.empty() as a placeholder for the identity fields.

The identity is lost in the following steps:
1. The session is created with the user identity.
2. The session representation is serialized; this serialization drops the identity.
3. The session representation is deserialized and used as the session for the query.
4. The deserialized session representation is used in the metastore.
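
The four steps above can be modeled in a few lines of Python. This is a hypothetical sketch, not Presto code: JSON stands in for the real serialization, `Identity`, `serialize_before_fix`, and `deserialize` are illustrative names, and `None` plays the role of Java's `Optional.empty()`.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identity:
    """Hypothetical, simplified stand-in for the session identity."""
    user: str
    selected_user: Optional[str] = None
    reason_for_select: Optional[str] = None


def serialize_before_fix(identity: Identity) -> str:
    # Step 2: only the user is written; the identity fields are replaced
    # by empty placeholders (None here, Optional.empty() in Java).
    return json.dumps({"user": identity.user, "selectedUser": None, "reasonForSelect": None})


def deserialize(payload: str) -> Identity:
    # Step 3: the deserialized representation becomes the query's session.
    data = json.loads(payload)
    return Identity(data["user"], data.get("selectedUser"), data.get("reasonForSelect"))


# Step 1: the session is created with a selected identity.
original = Identity("alice", selected_user="etl_writer", reason_for_select="scheduled INSERT")
# Step 4: this is the identity the metastore would see for an INSERT/DELETE.
seen_by_metastore = deserialize(serialize_before_fix(original))
print(seen_by_metastore.selected_user)  # None: the selected identity was lost
```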

This poses an issue in cases where we want to pass the selected identity to an external metastore service for authentication.

Impact

The selectedUser identity is now properly contained in the Session during INSERT and DELETE queries, allowing it to be propagated, e.g., through the metastore context to metastore API calls.

Test Plan

Tested in Meta internal systems with user identity propagation, showing that identity is correctly passed for INSERT and DELETE queries (previously it was only propagated for CREATE TABLE, SELECT, etc.).
[Screenshot: a test INSERT INTO query run from the Presto CLI]

Contributor checklist

Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
Documented new properties (with its default value), SQL syntax, functions, or other functionality.
If release notes are required, they follow the release notes guidelines.
Adequate tests were added if applicable.
CI passed.
Summary by Sourcery
Preserve the selected user identity and selection reason across session serialization, including thrift and JSON representations, so that identity is available to downstream components and native execution paths.

New Features:

Add selectedUser and reasonForSelect fields to SessionRepresentation and propagate them through Java, C++ protocol, thrift, and Spark session creation.

Enhancements:

Extend thrift2json preprocessing to skip unsupported structured annotations and related includes to allow thrift generation to succeed on newer annotated IDL files.
Release Notes
Please follow release notes guidelines and fill in the release notes below.


== RELEASE NOTES ==

General Changes
* Update Session to include selectedUser and reasonForSelect when converting to and from SessionRepresentation, so that sessions for INSERT and DELETE queries retain these fields.

sourcery-ai bot (Contributor) commented Mar 17, 2026

Reviewer's Guide

Extends SessionRepresentation across Java, C++ protocol, Thrift, and Spark integration to carry selectedUser and reasonForSelect so identity is preserved through serialization/deserialization and into native/Spark paths, and adjusts the thrift2json preprocessing script to skip structured annotations that were breaking codegen.

Sequence diagram for preserving identity through SessionRepresentation and Thrift

sequenceDiagram
  participant Client
  participant Coordinator
  participant Session
  participant SessionRepresentation
  participant Thrift
  participant NativeWorker
  participant Metastore

  Client->>Coordinator: Submit write query
  Coordinator->>Session: Create Session with Identity
  Session-->>Session: Identity holds selectedUser,reasonForSelect

  Coordinator->>Session: toSessionRepresentation()
  Session->>SessionRepresentation: construct with selectedUser,reasonForSelect
  SessionRepresentation-->>Coordinator: SessionRepresentation (selectedUser,reasonForSelect set)

  Coordinator->>Thrift: Serialize SessionRepresentation
  Thrift-->>Coordinator: Thrift SessionRepresentation (fields selectedUser,reasonForSelect)

  Coordinator->>NativeWorker: Send Thrift SessionRepresentation
  NativeWorker->>Thrift: Deserialize Thrift to protocol SessionRepresentation
  Thrift-->>NativeWorker: Protocol SessionRepresentation(selectedUser,reasonForSelect)

  NativeWorker->>SessionRepresentation: toSession(sessionPropertyManager,extraCredentials,extraAuthenticators)
  SessionRepresentation->>Session: construct Session with Identity(selectedUser,reasonForSelect)
  Session-->>NativeWorker: Session with preserved Identity

  NativeWorker->>Metastore: Authenticate using Identity.selectedUser and Identity.reasonForSelect

File-Level Changes

Change Details Files
Extend Java SessionRepresentation to carry selectedUser and reasonForSelect and wire them through Session conversion.
  • Add Optional selectedUser and Optional reasonForSelect fields to SessionRepresentation with Thrift/JSON annotations and null-safe initialization.
  • Expose new fields via @ThriftField/@JsonProperty getters with new field ids 24 and 25.
  • Include selectedUser and reasonForSelect when converting Session to SessionRepresentation using identity.getSelectedUser()/getReasonForSelect().
  • Propagate selectedUser and reasonForSelect back into Session.Identity when reconstructing Session from SessionRepresentation, including Spark-specific session creation.
presto-main-base/src/main/java/com/facebook/presto/SessionRepresentation.java
presto-main-base/src/main/java/com/facebook/presto/Session.java
presto-spark-base/src/main/java/com/facebook/presto/spark/execution/task/PrestoSparkTaskExecutorFactory.java
Update Thrift schema and native C++ protocol bridges so SessionRepresentation thrift/json includes selectedUser and reasonForSelect.
  • Add optional selectedUser and reasonForSelect fields to Thrift SessionRepresentation struct with ids 24 and 25.
  • Extend C++ SessionRepresentation struct to include selectedUser and reasonForSelect as shared_ptr.
  • Serialize/deserialize selectedUser and reasonForSelect in C++ JSON marshalling helpers to_json/from_json for SessionRepresentation.
  • Map selectedUser and reasonForSelect in ProtocolToThrift::toThrift/fromThrift between protocol SessionRepresentation and thrift SessionRepresentation.
presto-native-execution/presto_cpp/main/thrift/presto_thrift.thrift
presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.h
presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.cpp
presto-native-execution/presto_cpp/main/thrift/ProtocolToThrift.cpp
Relax thrift2json preprocessing to skip structured annotations and includes that the generator cannot parse.
  • Extend thrift2json.py preprocess step to skip lines containing structured annotations of the form @ns.Name{...}.
  • Skip includes of thrift/annotation/* files which were causing the ptsd_jbroll tool to fail while leaving existing drift.recursive_reference stripping logic intact.
presto-native-execution/presto_cpp/main/thrift/thrift2json.py
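
The changes above mention null-safe initialization on the Java side and `shared_ptr` members in the C++ struct. One reason optional handling matters is that a payload produced by a component that predates the new fields may simply not contain them. The Python sketch below models tolerant decoding of such payloads; `SessionIdentityFields`, `from_json`, and `to_json` are hypothetical, and omitting absent keys on output is a choice of this sketch, not a description of what Presto's serializers emit.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionIdentityFields:
    """Hypothetical subset of a SessionRepresentation JSON payload."""
    user: str
    selected_user: Optional[str]
    reason_for_select: Optional[str]


def from_json(payload: str) -> SessionIdentityFields:
    data = json.loads(payload)
    # The new fields are optional: a payload without "selectedUser" or
    # "reasonForSelect" decodes to None instead of raising KeyError.
    return SessionIdentityFields(
        user=data["user"],
        selected_user=data.get("selectedUser"),
        reason_for_select=data.get("reasonForSelect"),
    )


def to_json(fields: SessionIdentityFields) -> str:
    out = {"user": fields.user}
    # Only emit the optional keys when a value is present.
    if fields.selected_user is not None:
        out["selectedUser"] = fields.selected_user
    if fields.reason_for_select is not None:
        out["reasonForSelect"] = fields.reason_for_select
    return json.dumps(out)


legacy = from_json('{"user": "alice"}')
current = from_json(to_json(SessionIdentityFields("alice", "etl_writer", "scheduled INSERT")))
```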

sourcery-ai bot left a comment

Hey - I've found 1 issue and left some high-level feedback:

  • The thrift2json.py change currently drops all structured annotations and any include under thrift/annotation; consider tightening the regex / include filter or scoping it to the specific unsupported annotation patterns so that future annotations or unrelated includes aren’t silently discarded.
  • The regex r"\s*@\w+\.\w+\{" in thrift2json.py assumes a simple ns.name{ form with no spaces or additional namespace segments; if newer thrift annotations use different formatting (e.g. ns.sub.name {), this may fail to strip them and reintroduce the original generation issue—worth making the pattern more robust or documenting the exact expected shapes.
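
The pattern concern in the second bullet can be checked with a small Python sketch. `ORIGINAL` is the regex quoted from the diff; `STRUCTURED_ANNOTATION` is a hypothetical, more permissive alternative, not what thrift2json.py uses. It assumes each structured annotation starts on its own line, and it deliberately does not match an annotation followed by other tokens on the same line, since dropping that whole line could also drop a field definition.

```python
import re

# Pattern quoted in the review: exactly one "ns.Name" pair immediately followed by "{".
ORIGINAL = re.compile(r"\s*@\w+\.\w+\{")

# Hypothetical, more permissive pattern: one or more dotted segments after "@",
# optional whitespace, then either "{" or end of line.
STRUCTURED_ANNOTATION = re.compile(r"^\s*@\w+(?:\.\w+)+\s*(?:\{|$)")

samples = [
    '@cpp.Type{name = "x"}',     # matched by both patterns
    "@ns.sub.name {",            # extra segment and a space: missed by ORIGINAL
    "@thrift.Experimental",      # no braces at all: missed by ORIGINAL
    "1: optional string note;",  # an ordinary field line: matched by neither
]
for s in samples:
    print(f"{s!r}: original={bool(ORIGINAL.match(s))} permissive={bool(STRUCTURED_ANNOTATION.match(s))}")
```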
## Individual Comments

### Comment 1
<location path="presto-native-execution/presto_cpp/main/thrift/thrift2json.py" line_range="41-42" />
<code_context>
         lines = file.readlines()
     modified_lines = []
     for line in lines:
+        # Skip structured annotations and their includes that ptsd_jbroll cannot parse.
+        if re.match(r"\s*@\w+\.\w+\{", line):
+            continue
+        if re.match(r'\s*include\s+"thrift/annotation/', line):
</code_context>
<issue_to_address>
**issue (bug_risk):** Structured annotation skipping only removes the first line, leaving the body which may still break the parser.

To avoid leaving invalid lines for `ptsd_jbroll`, update the logic to skip the entire structured annotation block—for example, continue reading lines until the matching closing `}` is found, or at least skip subsequent indented lines until a blank or non-indented line is reached.
</issue_to_address>
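
One way to address this comment is to track brace depth so that the whole annotation body is skipped, not just its first line. The sketch below is hypothetical (`strip_structured_annotations` is not a function in thrift2json.py, and its regexes are illustrative). It assumes annotation bodies contain no `{` or `}` inside string literals, because a plain character count would miscount those.

```python
import re
from typing import Iterable, List

# Illustrative patterns: a line that starts a structured annotation, and an
# include of the annotation IDL files.
STRUCTURED_ANNOTATION = re.compile(r"^\s*@\w+(?:\.\w+)+\s*(?:\{|$)")
ANNOTATION_INCLUDE = re.compile(r'^\s*include\s+"thrift/annotation/')


def strip_structured_annotations(lines: Iterable[str]) -> List[str]:
    kept: List[str] = []
    depth = 0  # unmatched "{" count inside an annotation currently being skipped
    for line in lines:
        if depth > 0:
            # Still inside a multi-line annotation body: update depth, drop the line.
            depth = max(depth + line.count("{") - line.count("}"), 0)
            continue
        if ANNOTATION_INCLUDE.match(line):
            continue
        if STRUCTURED_ANNOTATION.match(line):
            # Drop the annotation line; if its braces are unbalanced, keep
            # skipping until the matching "}" closes the block.
            depth = max(line.count("{") - line.count("}"), 0)
            continue
        kept.append(line)
    return kept


idl = [
    'include "thrift/annotation/cpp.thrift"\n',
    "@cpp.Type{\n",
    '  name = "folly::F14FastMap<int, int>",\n',
    "}\n",
    "struct Example {\n",
    "  1: optional string note;\n",
    "}\n",
]
cleaned = strip_structured_annotations(idl)
```

Here `cleaned` keeps only the three `struct Example` lines; the include, the annotation header, its body, and its closing brace are all dropped.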

shelton408 changed the title from "Preserve Identity in Session Representation" to "feat: Preserve Identity in Session Representation" on Mar 17, 2026
shelton408 changed the title from "feat: Preserve Identity in Session Representation" to "feat: Preserve selectedUser (identity) in Session Representation" on Mar 17, 2026
shelton408 changed the title from "feat: Preserve selectedUser (identity) in Session Representation" to "feat: Preserve selectedUser (identity) in Write Queries" on Mar 17, 2026
feilong-liu previously approved these changes Mar 17, 2026
shelton408 added a commit to shelton408/presto that referenced this pull request Mar 18, 2026
Summary:
Pull Request resolved: prestodb#27360

WARNING: I ignored structured annotations using thrift2json.py for presto-native in this PR. Thrift generation was having trouble reading certain annotations without this, but I'm not sure whether this will cause any issues in protocol generation. If anyone has context, please let me know.

When plumbing identity to the metastore for write/coordinator-related queries (e.g. INSERT/DELETE), we convert the Session to a SessionRepresentation for serialization. However, this representation uses Optional.empty() as a placeholder for the identity fields.

The identity is lost in the following steps:
1. The session is created with the user identity.
2. The session representation is serialized; this serialization drops the identity.
3. The session representation is deserialized and used as the session for the query.
4. The deserialized session representation is used in the metastore.

This poses an issue in cases where we want to pass the selected identity to an external metastore service for authentication.

Changes:
  • Update Session/SessionRepresentation to include selectedUser and reasonForSelect in the conversion methods.
  • Include both as Thrift fields in SessionRepresentation.
  • Update the C++ protocol and Thrift definitions to match the new SessionRepresentation Thrift fields.
  • Change thrift2json.py to ignore certain annotations (*** not sure whether this should be committed).
  • Include the fields in the conversion back to Session for Spark as well, for consistency.

Differential Revision: D97002318
meta-codesync bot changed the title from "feat: Preserve selectedUser (identity) in Write Queries" to "Preserve Identity in Session Representation (#27360)" on Mar 18, 2026
shelton408 requested review from a team and shrinidhijoshi as code owners on March 18, 2026 at 23:57
shelton408 changed the title from "Preserve Identity in Session Representation (#27360)" to "feat: Preserve selectedUser (identity) in Write Queries (#27360)" on Mar 19, 2026
shelton408 merged commit 81ea0f1 into prestodb:master on Mar 20, 2026
115 of 127 checks passed