Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect behavior of arithmetic operations between time values #12190

Open
Abdullahsab3 opened this issue Aug 27, 2024 · 10 comments · May be fixed by #12222
Open

Incorrect behavior of arithmetic operations between time values #12190

Abdullahsab3 opened this issue Aug 27, 2024 · 10 comments · May be fixed by #12222
Labels
bug Something isn't working

Comments

@Abdullahsab3
Copy link
Contributor

Describe the bug

Consider the following query:

DataFusion CLI v41.0.0
> select ('2024-08-27T08:21:27Z'::timestamp + interval '1 day' -  interval '1 second' - interval '1 day');
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Utf8("2024-08-27T08:21:27Z") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 1000000000 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2024-08-29T08:21:26                                                                                                                                                                                                                                                                                     |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.005 seconds.

While the expected result should be 2024-08-27 08:21:26.000000 .

When I add parentheses to the query, the results are what I expect:

> select ('2024-08-27T08:21:27Z'::timestamp + interval '1 day' -  interval '1 second') - interval '1 day';
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Utf8("2024-08-27T08:21:27Z") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 1000000000 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2024-08-27T08:21:26                                                                                                                                                                                                                                                                                     |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.002 seconds.

The issue seems to happening when you have multiple (more than 2?) operands in arithmetic. e.g.:

> select ('2024-08-27T08:21:27Z'::timestamp + interval '1 day' -  interval '1 day' - interval '2 day');
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Utf8("2024-08-27T08:21:27Z") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2024-08-29T08:21:27                          -- Should be     2024-08-25 08:21:27.000000                                                                                                                                                                                                                              |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

When you have two operands, the results seem to match what you'd expect:

> select ('2024-08-27T08:21:27Z'::timestamp + interval '1 day' - interval '2 day');
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Utf8("2024-08-27T08:21:27Z") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2024-08-26T08:21:27                                                                                                                                                                                      |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.002 seconds.

To Reproduce

You can use the same queries mentioned above.

Expected behavior

Mentioned above as well

Additional context

No response

@Abdullahsab3 Abdullahsab3 added the bug Something isn't working label Aug 27, 2024
@tshauck
Copy link
Contributor

tshauck commented Aug 29, 2024

I don't know the root cause, but I'd just add it seems to to where there's subtraction. E.g.

> select interval '1 day' + interval '2 day' + interval '1 day' as i;
+-------------------------------------------------------+
| i                                                     |
+-------------------------------------------------------+
| 0 years 0 mons 4 days 0 hours 0 mins 0.000000000 secs |
+-------------------------------------------------------+

works, but

> select interval '1 day' - interval '2 day' - interval '1 day' as i;
+-------------------------------------------------------+
| i                                                     |
+-------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs |
+-------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.007 seconds.

doesn't. I wasn't able to find a purely add case that doesn't produce the correct answer (though doesn't necessarily rule it out).

@Abdullahsab3
Copy link
Contributor Author

Just an educated guess, but could it be that the arithmetic operators are not passed correctly from the parsed AST, or that the AST is not constructed correctly? Not entirely sure, but it seems that the first arithmetic operator is always the one that is applied on the entire expression. For example:

> select interval '2 day' - interval '1 day' + interval '1 day';
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Minus is applied on all of them. I think this is the result of select interval '2 day' - interval '1 day' - interval '1 day';.

Another example:

> select interval '2 day' - interval '1 day' + interval '1 day' + interval '5 day' - interval '3 day';
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 5, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 3, nanoseconds: 0 }") |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons -2 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                                                                                                                                                                                                      |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.002 seconds.

which seems to be of similar behavior. The minus is applied on all of them, even transforming the latest minus into a plus. I think that's the result of select interval '2 day' - interval '1 day' - interval '1 day' - interval '5 day' + interval '3 day';

Another weird example with a plus:

> select interval '5 day' + interval '1 day' - interval '2 day' - interval '4 day';
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 5, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 4, nanoseconds: 0 }") |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 8 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                                                                                                                 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

The results seem to correspond with select interval '5 day' + interval '1 day' - interval '2 day' + interval '4 day';

@Abdullahsab3
Copy link
Contributor Author

Abdullahsab3 commented Aug 30, 2024

Just found something that could be interesting. I think it's a bug in the AST and the precedence order of operators.

I tried explicitly defining the precedence/order with parentheses for the same queries above (trying to mimick how the precedence between the operators is preserved):
Precedence from right to left (which is not correct)

select interval '2 day' - (interval '1 day' + (interval '1 day' + (interval '5 day' - ( interval '3 day'))));
select interval '2 day' - (interval '1 day' + (interval '1 day'));
select interval '5 day' + (interval '1 day' - (interval '2 day' - (interval '4 day')));

This corresponds with the wrong results that I get from Datafusion:

DataFusion CLI v41.0.0
> select interval '2 day' - (interval '1 day' + (interval '1 day' + (interval '5 day' - ( interval '3 day'))));
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 5, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 3, nanoseconds: 0 }") |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons -2 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                                                                                                                                                                                                      |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.003 seconds.

> select interval '2 day' - (interval '1 day' + (interval '1 day'));
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

> select interval '5 day' + (interval '1 day' - (interval '2 day' - (interval '4 day')));
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 5, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 4, nanoseconds: 0 }") |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 8 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                                                                                                                 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

Precdence from left to right (which is what I think is what should be happening):

select (((interval '2 day' - interval '1 day') + interval '1 day') + interval '5 day') -  interval '3 day';
select (interval '2 day' - interval '1 day') + interval '1 day';
select ((interval '5 day' + interval '1 day') - interval '2 day') - interval '4 day';

The results of these queries match the expected results:

> select (((interval '2 day' - interval '1 day') + interval '1 day') + interval '5 day') -  interval '3 day';
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 5, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 3, nanoseconds: 0 }") |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 4 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                                                                                                                                                                                                       |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.003 seconds.

> select (interval '2 day' - interval '1 day') + interval '1 day';
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 2 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

> select ((interval '5 day' + interval '1 day') - interval '2 day') - interval '4 day';
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 5, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 4, nanoseconds: 0 }") |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                                                                                                                 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

@tshauck
Copy link
Contributor

tshauck commented Aug 30, 2024

I think the examples have select (interval '2 day' - interval '1 day') + interval '1 day'; backwards?

> select interval '2 day' - (interval '1 day' + (interval '1 day'));
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.003 seconds.

> select (interval '2 day' - interval '1 day') + interval '1 day';
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 2 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.007 seconds.

Otherwise, there's a potentially related example: sqlparser-rs/sqlparser-rs#1345

@findepi
Copy link
Member

findepi commented Sep 5, 2024

Just found something that could be interesting. I think it's a bug in the AST and the precedence order of operators.

Indeed, this is parsing problem.

I modified datafusion/sql/tests/cases/plan_to_sql.rs to include this:

--- datafusion/sql/tests/cases/plan_to_sql.rs
+++ datafusion/sql/tests/cases/plan_to_sql.rs
@@ -54,6 +54,12 @@ fn roundtrip_expr() {
             "sum((age * 2))",
             r#"sum((age * 2))"#,
         ),
+        (
+            TableReference::bare("person"),
+            "'2024-01-10 01:23:45'::timestamp - interval '1 day' - interval '2 day'",
+            // FIXME this is incorrect!
+            "(CAST('2024-01-10 01:23:45' AS TIMESTAMP) - (INTERVAL '0 YEARS 0 MONS 1 DAYS 0 HOURS 0 MINS 0.000000000 SECS' - INTERVAL '0 YEARS 0 MONS 2 DAYS 0 HOURS 0 MINS 0.000000000 SECS'))",
+        ),
     ];

     let roundtrip = |table, sql: &str| -> Result<String> {

DFParser just wraps sqlparser::parser::Parser, just we will need to fix this in https://github.com/sqlparser-rs/sqlparser-rs

@Abdullahsab3
Copy link
Contributor Author

I think sqlparser-rs/sqlparser-rs#1398 fixed this issue.

@findepi
Copy link
Member

findepi commented Sep 6, 2024

thank you @samuelcolvin and @alamb

@alamb
Copy link
Contributor

alamb commented Sep 9, 2024

Nice! I expect the next release of sqlparser to be released sometime this week and thus the change will be available in DataFusion as well: sqlparser-rs/sqlparser-rs#1384

@alamb
Copy link
Contributor

alamb commented Sep 11, 2024

sqlparser-rs 0.51.0 is now available on crates.io: https://crates.io/crates/sqlparser/0.51.0 🎉

@alamb
Copy link
Contributor

alamb commented Sep 12, 2024

I believe this will be fixed by #12222 from @samuelcolvin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
6 participants
@findepi @tshauck @alamb @Abdullahsab3 and others