Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade sqlparser-rs to 0.51.0, support new interval logic from sqlparse-rs #12222

Merged
merged 9 commits into from
Sep 15, 2024

Conversation

samuelcolvin
Copy link
Contributor

@samuelcolvin samuelcolvin commented Aug 28, 2024

Which issue does this PR close?

WIP adopt sqlparser-rs/sqlparser-rs#1398

fixes #12190

Rationale for this change

Interval parsing is a bit of a mess, this is fixed I think in sqlparser-rs/sqlparser-rs#1398, this adopts that change and removes some hacks

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

Not AFAIK.

Cargo.toml Outdated Show resolved Hide resolved
@findepi
Copy link
Member

findepi commented Sep 12, 2024

nit: We could have "fixes #12190" in the PR description to let the issue auto-close when PR is merged.

@github-actions github-actions bot added the core Core DataFusion crate label Sep 12, 2024
@samuelcolvin samuelcolvin marked this pull request as ready for review September 12, 2024 15:48
@github-actions github-actions bot removed the core Core DataFusion crate label Sep 12, 2024
@samuelcolvin
Copy link
Contributor Author

@alamb I believe this is ready to review/merge.

@alamb alamb changed the title Support new interval logic from sqlparse-rs Upgrade sqlparser-rs to 0.51.0, support new interval logic from sqlparse-rs Sep 12, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @samuelcolvin -- this looks great to me. ❤️

I think we should merge this after the 42.0.0 release is cut (scheduled for today) to give it "maximum bake time" on main (aka so we can test with early adopters before releasing to crates.io)

I think technically this PR also introduces a regression/change in parsing certain interval arithmetic (see comments inline) but I also think we can support the syntax again with some changing of the coercion rules.

Let's file a follow on ticket to track that work before we merge this pr (I can do so if no one beats me to it)

@@ -196,127 +195,42 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
);
}

// Only handle string exprs for now
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is great to remove all this stuff from the sql planner (as it is now in the parser)

@@ -495,11 +495,17 @@ fn test_table_references_in_plan_to_sql() {
assert_eq!(format!("{}", sql), expected_sql)
}

test("catalog.schema.table", "SELECT catalog.\"schema\".\"table\".id, catalog.\"schema\".\"table\".\"value\" FROM catalog.\"schema\".\"table\"");
test("schema.table", "SELECT \"schema\".\"table\".id, \"schema\".\"table\".\"value\" FROM \"schema\".\"table\"");
test(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a driveby (unrelated) cleanup, right? (this is fine I am just verifying my understanding)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was necessary to get tests to pass, I assume it came from some other change in sqlparser, I didn't look too hard.

But it's not just cosmetic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this mostly changes catalog to "catalog"
a driveby is usage or raw literals (which is a nice thing on its own)

@@ -1381,6 +1391,11 @@ SELECT extract(microsecond from arrow_cast('23:32:50.123456789'::time, 'Time64(N
----
50123456.789000005

query R
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

# Use `interval` SQL literal syntax with MySQL dialect

# this should fail with the generic dialect
query error DataFusion error: Error during planning: Cannot coerce arithmetic expression Interval\(MonthDayNano\) \+ Utf8 to valid types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe technically this is a regression in that this query used to work.

However, as this PR shows, it works in a different dialect

I think it would be possible to add coercion rules to automatically coerce the Utf8 to Interval to make this query continue work. I can file a follow on ticket

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked and this works today in datafusion-cli so this is technically a regression in functionality

andrewlamb@Andrews-MacBook-Pro-2:~/Software/influxdb_iox$ datafusion-cli
DataFusion CLI v41.0.0
> select interval '1' + '1' month;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 1, days: 0, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 1, days: 0, nanoseconds: 0 }") |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2 mons                                                                                                                                                                    |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.008 seconds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well it "works" but it leads to many very confusing scenarios and doesn't match the behaviour of ANSI SQL or any other SQL dialect.

It's less permissive but also less ambiguos, I wouldn't call it a regression.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. There is a workaround as well (use the mysql dialect).

@alamb
Copy link
Contributor

alamb commented Sep 12, 2024

Thank you also @findepi for the review

@alamb
Copy link
Contributor

alamb commented Sep 15, 2024

The 42.0.0 release has been cut -- let's start the code flowing!

@alamb alamb merged commit 3f0fb4a into apache:main Sep 15, 2024
29 of 30 checks passed
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
functions sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect behavior of arithmetic operations between time values
3 participants