@cookiedough77
Contributor

What changes were proposed in this pull request?

This PR fixes the bug where quoted names were being set in the proto response fields for catalog, database, and identifier.

Changes include:

  • Removing the use of quoteIdentifier when populating resolvedCatalog, resolvedDb, and resolvedId in the proto response.
  • Updating the code to return the raw catalog, database, and identifier values from CatalystIdentifier instead.
  • Adjusting unit tests in SparkDeclarativePipelinesServerSuite to validate that proto fields contain unquoted strings, even when names contain special characters like backticks.

Why are the changes needed?

Quoted names are only required when constructing fully qualified string identifiers (e.g., `cat`.`db`.`table`). Proto messages keep catalog, database, and identifier as separate fields and should store the raw values. Returning quoted names in proto fields was a bug and could cause inconsistencies in client usage.
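
One way such an inconsistency could show up is a client that composes a fully qualified name from the separate fields and applies quoting itself. The sketch below is a hypothetical client-side helper in Python, not part of Spark; `fully_qualified_name` is an assumed name, and `quote_identifier` again assumes the backtick-doubling rule.

```python
def quote_identifier(name: str) -> str:
    """Wrap a name in backticks, doubling any embedded backtick (assumed rule)."""
    return "`" + name.replace("`", "``") + "`"


def fully_qualified_name(catalog: str, database: str, identifier: str) -> str:
    # Quoting happens exactly once, at the point where the parts are joined.
    return ".".join(quote_identifier(part) for part in (catalog, database, identifier))


if __name__ == "__main__":
    # Raw fields (behaviour after the fix) produce a correctly quoted name.
    print(fully_qualified_name("a`b", "db", "tbl"))
    # Pre-quoted fields (behaviour before the fix) end up quoted twice.
    print(fully_qualified_name("`a``b`", "`db`", "`tbl`"))
```

Keeping the fields raw leaves the decision of when to quote to whoever builds the string form.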

Does this PR introduce any user-facing change?

Yes. The proto responses for dataset and flow definitions will now return raw catalog, database, and identifier values instead of quoted versions. For example, a catalog named a`b will be returned as a`b in the proto field, not `a``b`.

How was this patch tested?

Updated SparkDeclarativePipelinesServerSuite to verify that proto fields contain raw, unquoted names.

Was this patch authored or co-authored using generative AI tooling?

No.

@cookiedough77 changed the title from "Use unquoted for response fields" to "[SPARK-53593][SDP] Fix: Use unquoted for response fields" on Sep 29, 2025
Contributor

@sryza sryza left a comment

LGTM!

@sryza sryza closed this in 5dc061c Sep 30, 2025
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
Closes apache#52483 from cookiedough77/jessie.luo-data/use-unquoted-for-resolvedidentifier.

Authored-by: Jessie Luo <[email protected]>
Signed-off-by: Sandy Ryza <[email protected]>