fix: preserve qualifiers when rewriting expressions #12341

Merged: Dandandan merged 3 commits into apache:main on Sep 6, 2024

Conversation

Member

@jonahgao jonahgao commented Sep 5, 2024

Which issue does this PR close?

A more general fix for #12183

Rationale for this change

The two expressions in the following code have the same schema_name, so NamePreserver does not take effect when expr1 is rewritten to expr2. However, they produce different fields (different qualifier and name), so the rewrite changes the schema.

Therefore, the NamePreserver needs to preserve both the qualifier and the schema name to ensure that the rewritten expression has the same to_field() result as before.

    #[test]
    fn test_expr_schema_name() {
        let schema = Schema::new(vec![
            Field::new("a", DataType::Int32, true),
            Field::new("test.a", DataType::Utf8, true),
        ]);
        let dfschema = DFSchema::try_from_qualified_schema("test", &schema).unwrap();
        let expr1 = Expr::Column(Column::new_unqualified("test.a"));
        let expr2 = Expr::Column(Column::new(Some("test"), "a"));

        assert_eq!(
            expr1.schema_name().to_string(),
            expr2.schema_name().to_string()
        );

        let field1 = expr1.to_field(&dfschema).unwrap();
        let field2 = expr2.to_field(&dfschema).unwrap();

        // field1: qualifier: None, name: "test.a"
        println!(
            "field1: qualifier: {:?}, name: {:?}",
            field1.0,
            field1.1.name()
        );
        // field2: qualifier: Some(Bare { table: "test" }), name: "a"
        println!(
            "field2: qualifier: {:?}, name: {:?}",
            field2.0,
            field2.1.name()
        );
    }
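The collision can also be reproduced without DataFusion. The following is a minimal sketch using a hypothetical stand-in `Column` type (not DataFusion's real `Column`), assuming the flat schema name is simply the qualifier and the name joined by a dot:

```rust
// Simplified stand-in for illustration only; this is NOT DataFusion's real type.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    relation: Option<String>,
    name: String,
}

impl Column {
    /// Flat name: "relation.name" when qualified, otherwise just "name".
    fn schema_name(&self) -> String {
        match &self.relation {
            Some(r) => format!("{}.{}", r, self.name),
            None => self.name.clone(),
        }
    }

    /// Structured name: the qualifier and the bare name kept separate.
    fn qualified_name(&self) -> (Option<String>, String) {
        (self.relation.clone(), self.name.clone())
    }
}

fn main() {
    // Unqualified column whose name happens to contain a dot.
    let expr1 = Column { relation: None, name: "test.a".to_string() };
    // Column "a" qualified by table "test".
    let expr2 = Column { relation: Some("test".to_string()), name: "a".to_string() };

    // The flat names collide, so comparing them cannot detect the rewrite...
    assert_eq!(expr1.schema_name(), expr2.schema_name());
    // ...while the qualified names still tell the two columns apart.
    assert_ne!(expr1.qualified_name(), expr2.qualified_name());

    println!("expr1: {:?}", expr1.qualified_name());
    println!("expr2: {:?}", expr2.qualified_name());
}
```

Comparing the `(qualifier, name)` pair instead of the flat string is what lets a preserver notice that the rewrite changed the resulting field.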

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added the logical-expr (Logical plan and expressions) and sqllogictest (SQL Logic Tests (.slt)) labels on Sep 5, 2024
@@ -336,12 +338,14 @@ impl NamePreserver {
 }

 impl SavedName {
-    /// Ensures the name of the rewritten expression is preserved
+    /// Ensures the qualified name of the rewritten expression is preserved
     pub fn restore(self, expr: Expr) -> Result<Expr> {
Member Author

The return value no longer needs to be a Result, but removing it breaks many optimizer rules that call restore. Maybe we can handle it in a follow-up PR.

Contributor

@alamb alamb left a comment

Looks like a very nice fix to me -- thank you @jonahgao

cc @JasonLi-cn

@@ -303,9 +303,11 @@ pub struct NamePreserver {
     use_alias: bool,
 }

 type QualifiedName = (Option<TableReference>, String);
Contributor

I would personally suggest creating an enum to make this more explicit (rather than two levels of options), perhaps something like

pub enum SavedName {
    /// Name is not preserved
    None,
    /// Qualified name is preserved
    Saved {
        relation: Option<TableReference>,
        name: String,
    },
}

Member Author

Done.
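For readers following along, here is a rough sketch of how an enum-based `SavedName` might save and restore a qualified name. It uses hypothetical, simplified `Expr` and `SavedName` types rather than DataFusion's real ones, represents the qualifier as a plain `Option<String>` instead of `Option<TableReference>`, and returns `Expr` directly instead of `Result<Expr>` for brevity:

```rust
// Simplified stand-ins for illustration only; these are NOT DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { relation: Option<String>, name: String },
    Alias { expr: Box<Expr>, relation: Option<String>, name: String },
}

impl Expr {
    /// The (qualifier, name) pair this expression would produce as a field.
    fn qualified_name(&self) -> (Option<String>, String) {
        match self {
            Expr::Column { relation, name } | Expr::Alias { relation, name, .. } => {
                (relation.clone(), name.clone())
            }
        }
    }
}

enum SavedName {
    /// Qualified name is preserved
    Saved { relation: Option<String>, name: String },
    /// Name is not preserved
    None,
}

impl SavedName {
    /// Record the qualified name of `expr` when preservation is enabled.
    fn save(expr: &Expr, enabled: bool) -> Self {
        if enabled {
            let (relation, name) = expr.qualified_name();
            SavedName::Saved { relation, name }
        } else {
            SavedName::None
        }
    }

    /// Wrap `expr` in an alias if its qualifier or name differs from the saved one.
    fn restore(self, expr: Expr) -> Expr {
        match self {
            SavedName::Saved { relation, name } => {
                let (new_relation, new_name) = expr.qualified_name();
                if new_relation != relation || new_name != name {
                    Expr::Alias { expr: Box::new(expr), relation, name }
                } else {
                    expr
                }
            }
            SavedName::None => expr,
        }
    }
}

fn main() {
    let original = Expr::Column { relation: None, name: "test.a".to_string() };
    let rewritten = Expr::Column { relation: Some("test".to_string()), name: "a".to_string() };

    let saved = SavedName::save(&original, true);
    let restored = saved.restore(rewritten);

    // The flat names matched, but the qualifier changed, so an alias is added.
    assert_eq!(restored.qualified_name(), original.qualified_name());
    println!("{:?}", restored);
}
```

Compared with wrapping a tuple in an `Option`, the two variants make the "not preserved" case explicit at the type level, which is the point of the suggestion above.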

pub fn restore(self, expr: Expr) -> Result<Expr> {
    let Self(original_name) = self;
    match original_name {
        Some(name) => expr.alias_if_changed(name),
Contributor

It seems like we could potentially also remove Expr::alias_if_changed, which doesn't properly account for qualifiers 🤔

The only other use of it seems to be in

pub fn rewrite_preserving_name<R>(expr: Expr, rewriter: &mut R) -> Result<Expr>
which itself is only used in tests 🤔

Member Author

alias_if_changed is being used by substrait. Maybe we can review this later.

rewrite_preserving_name has been removed.


let original_name = expr_from.schema_name().to_string();
let new_name = expr.schema_name().to_string();
let saved_name = NamePreserver { use_alias: true }.save(&expr_from).unwrap();
Member Author

Tests use NamePreserver instead of rewrite_preserving_name.

@Dandandan Dandandan merged commit a444528 into apache:main Sep 6, 2024
24 checks passed
Member Author

jonahgao commented Sep 6, 2024

Thanks @JasonLi-cn @alamb @Dandandan
