INTERVAL bug (wrong endian?) #1696
Comments
Why does the problem then not occur in Python?
>>> import duckdb
>>> duckdb.sql("SELECT INTERVAL '2 year'").arrow()
# pyarrow.Table
# CAST('2 year' AS INTERVAL): month_day_nano_interval
# ----
# CAST('2 year' AS INTERVAL): [[24M0d0ns]] |
Actually since both |
That's possible, yeah. I'd assumed that wasm used arrow-js, but it does seem to use arrow-rs. Max commented in Discord:
|
Looks like they did have issues here at one point too: https://github.com/apache/arrow-rs/pull/2235/files |
While possible, a lifetime issue seems unlikely to me here: the problem seems to be that 24 months are interpreted as 24 seconds, and the issue appears to be consistent. |
Seems like it is wrong for all time units. See here. |
Looks like wasm uses a very old version of arrow-rs. @carlopi perhaps we can try to upgrade this? |
Using arrow-js (basically when not using the shell infrastructure but passing Arrow results directly to JS), the results are parsed correctly. I will look at bumping arrow-rs. |
Sorry, I said I was going to look into this, but then I went on vacation and have been distracted with other things to get in before the release since I got back. I'll investigate more tomorrow. |
This occurs in the rust client as well, even with latest |
My guess at this point is that this is a bug within |
If @Maxxen is up for it, I think duckdb-rs would work better as a reproducer: it's easier to test in isolation (also for the Arrow people), and it's simpler to trust since there are fewer layers that could go wrong than with duckdb-wasm's shell (and it's not on me!). |
Actually, I think we might be in the wrong here.
But our
Which we just reinterpret straight up when writing out the arrow buffer.
But that's not the right order when reinterpreting as an i128 (which is what Rust is doing on the other side):
pub fn to_parts(
i: <IntervalMonthDayNanoType as ArrowPrimitiveType>::Native,
) -> (i32, i32, i64) {
let months = (i >> 96) as i32;
let days = (i >> 64) as i32;
let nanos = i as i64;
(months, days, nanos)
}
I'm not sure why this works in Python, though... |
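One plausible reason Python is unaffected: pyarrow sits on top of Arrow C++, which reads each value through its MonthDayNanos struct (months, days, nanoseconds) rather than through 128-bit shifts, so it sees the fields exactly where the writer put them. The Python sketch below models that round trip. write_interval assumes a DuckDB-style interval holding microseconds that are scaled to nanoseconds on export; both function names are hypothetical, not DuckDB or pyarrow APIs.

```python
import struct

def write_interval(months: int, days: int, micros: int) -> bytes:
    # Hypothetical writer modelled on an interval of (months, days, micros):
    # scale micros to nanos, then lay the fields out like Arrow C++'s
    # MonthDayNanos struct: int32, int32, int64, little-endian, no padding.
    return struct.pack("<iiq", months, days, micros * 1000)

def read_interval_by_field(buf: bytes) -> tuple:
    # Read the value back field by field, the way a struct-based reader does.
    return struct.unpack("<iiq", buf)

print(read_interval_by_field(write_interval(24, 0, 0)))  # (24, 0, 0)
```

Because writer and reader agree on the field offsets, no shift order is involved at all; the mismatch only appears once a reader treats the 16 bytes as a single integer.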
The Rust code looks wrong to me. The definition is little-endian. It also doesn't really make sense for the Rust driver to be correct when it is the one that disagrees with everything else. Either DuckDB, Arrow-C++, Arrow-Python, Arrow-Go, Arrow-R and Arrow-JS are all wrong, or Arrow-RS is wrong. The latter seems more likely to me. |
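To make the little-endian definition concrete: months occupy bytes 0..4, days bytes 4..8 and nanoseconds bytes 8..16, so when the 16 bytes are loaded as a little-endian i128, months are the low 32 bits and nanoseconds the high 64 bits. A small Python sketch (helper names are hypothetical; wrap emulates Rust's "as i32"/"as i64" truncation) showing that field-by-field decoding and the corrected shifts agree, including for negative values:

```python
import struct

def wrap(value: int, bits: int) -> int:
    # Keep the low `bits` bits and reinterpret them as two's complement,
    # like a Rust `as i32` / `as i64` cast.
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value

def parts_from_bytes(buf: bytes) -> tuple:
    # Ground truth: fixed little-endian field offsets.
    months = int.from_bytes(buf[0:4], "little", signed=True)
    days = int.from_bytes(buf[4:8], "little", signed=True)
    nanos = int.from_bytes(buf[8:16], "little", signed=True)
    return months, days, nanos

def parts_from_i128(buf: bytes) -> tuple:
    # Same fields via one little-endian i128: months are the LOW 32 bits,
    # days the next 32 bits, nanoseconds the HIGH 64 bits.
    i = int.from_bytes(buf, "little", signed=True)
    return wrap(i, 32), wrap(i >> 32, 32), wrap(i >> 64, 64)

buf = struct.pack("<iiq", -1, 2, -3)  # months=-1, days=2, nanos=-3
print(parts_from_bytes(buf), parts_from_i128(buf))  # (-1, 2, -3) (-1, 2, -3)
```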
Are you saying it was correct before this change? apache/arrow-rs#2235 |
EDIT: Disregard, it's the same as https://github.com/apache/arrow-rs/pull/2235/files#diff-c47b2e57ea72cdad6427c3149de134be3437e403984d367e3604cd0943a9a963R288, just backwards
|
@Mause The diagram is from here (https://github.com/apache/arrow-rs/blob/36a6e515f99866eae5332dfc887c6cb5f8135064/arrow-array/src/types.rs#L265-L269). I agree, it seems like the Rust definition is wrong. So what now? |
I would say we should double-check that the rust version is wrong and try to create a reproducer separate from DuckDB, then file a bug report in the arrow-rs repository. |
Here's a stand-alone C++ program showcasing the problem:
#include <stdint.h>
#include <stdio.h>
#include <string.h>
using uint128_t = __uint128_t;  // GCC/Clang extension
// Same field order as Arrow C++'s MonthDayNanos
struct MonthDayNanos {
  int32_t months;
  int32_t days;
  int64_t nanoseconds;
};
int main() {
  MonthDayNanos ival;
  ival.months = 32;
  ival.days = 64;
  ival.nanoseconds = 128;
  // Load the 16 bytes as one 128-bit integer (little-endian on hosts such as x86-64).
  // memcpy avoids the aliasing/alignment problems of reinterpret_cast.
  uint128_t u128val;
  memcpy(&u128val, &ival, sizeof(u128val));
  printf("Rust code (incorrect)\n");
  printf("months %u\n", uint32_t(u128val >> 96));
  printf("days %u\n", uint32_t(u128val >> 64));
  printf("nanos %llu\n", (unsigned long long)uint64_t(u128val));
  printf("\n");
  printf("Correct shifts\n");
  printf("months %u\n", uint32_t(u128val));
  printf("days %u\n", uint32_t(u128val >> 32));
  printf("nanos %llu\n", (unsigned long long)uint64_t(u128val >> 64));
}
Prints:
|
Hi, is there any update on this? Should I file a bug report in arrow-rs, or what is needed to get this resolved? |
As mentioned above, the Arrow C++ layout is:
struct MonthDayNanos {
int32_t months;
int32_t days;
int64_t nanoseconds;
};
The Rust code for parsing that as a
pub fn to_parts(
i: <IntervalMonthDayNanoType as ArrowPrimitiveType>::Native,
) -> (i32, i32, i64) {
let months = (i >> 96) as i32;
let days = (i >> 64) as i32;
let nanos = i as i64;
(months, days, nanos)
}
As shown in my stand-alone program here - #1696 (comment) - those shifts produce the wrong results given the Arrow C++ layout. |
I noticed the integration tests linked involve files - perhaps the byte order for Arrow files is different from the in-memory byte order (big-endian vs little-endian)? |
Yes, sorry, I was just posting this for reference. |
No, it's probably just that arrow-rs has the same misinterpretation bug in all code paths. So when you ask it to roundtrip the data from JSON to IPC, for example, it will produce the intended IPC results. |
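The round-trip argument can be checked with a sketch: if the encoder is the mirror image of the old to_parts (months in the high bits, nanoseconds in the low bits), a round trip through it comes back unchanged, yet the bytes it produces do not match the C++ MonthDayNanos layout. encode_mirrored is a hypothetical stand-in, not the actual arrow-rs encoder; decode_old applies the to_parts shifts quoted above to a little-endian i128.

```python
import struct

M32, M64 = (1 << 32) - 1, (1 << 64) - 1

def wrap(value: int, bits: int) -> int:
    # Emulate a Rust `as iN` cast: low `bits` bits, two's complement.
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value

def encode_mirrored(months: int, days: int, nanos: int) -> bytes:
    # Hypothetical encoder that is the exact inverse of the old to_parts.
    i = ((months & M32) << 96) | ((days & M32) << 64) | (nanos & M64)
    return i.to_bytes(16, "little")

def decode_old(buf: bytes) -> tuple:
    # The old to_parts shifts, applied to the bytes read as a little-endian i128.
    i = int.from_bytes(buf, "little", signed=True)
    return wrap(i >> 96, 32), wrap(i >> 64, 32), wrap(i, 64)

parts = (24, 0, 0)
print(decode_old(encode_mirrored(*parts)))  # (24, 0, 0): self-consistent
print(encode_mirrored(*parts) == struct.pack("<iiq", *parts))  # False: layout differs
```

A bug that is symmetric on the read and write paths is invisible to round-trip tests; it only shows up when the data is exchanged with an implementation that follows the C++ layout.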
Makes sense, thanks for picking this up! |
Well, you'll have to thank @tustvold for that :-) |
@Mytherin brief update, about ~two weeks away: apache/arrow-rs#5654 (comment) |
Thanks for driving this @david542542! |
It looks like this has been released (or is awaiting release): apache/arrow-rs#5688. Are we able to add that into the upcoming DuckDB bug-fix release? |
Thanks everyone for pushing forward on this. I have a local branch that bumps the arrow-rs version; I'll need to double-check a few things, but ideally this can be fixed upstream soon. |
@carlopi cool, will this be in the next DuckDB release (1.1.0), or when will this fix be added in? |
What happens?
Running
SELECT INTERVAL '2 year'
(or really any other INTERVAL) produces the wrong output in the wasm shell. It seems to have to do with endianness: 2 years is 24 months, and here 24 is put into fractional seconds, i.e. at the other end of the number.
To Reproduce
You can run this query here (click to open in wasm shell).
SELECT INTERVAL '2 day'
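For reference, the arithmetic behind the symptom: packing the values in the Arrow C++ layout and decoding them with the shifts from the old arrow-rs to_parts moves every field into the nanoseconds slot. A Python sketch, assuming a little-endian buffer; to_parts_old and wrap are hypothetical helpers that emulate the Rust code quoted in the comments, not the shell's actual code path.

```python
import struct

def wrap(value: int, bits: int) -> int:
    # Emulate a Rust `as iN` cast: low `bits` bits, two's complement.
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value

def to_parts_old(buf: bytes) -> tuple:
    # Old arrow-rs shifts on the 16 bytes read as a little-endian i128.
    i = int.from_bytes(buf, "little", signed=True)
    return wrap(i >> 96, 32), wrap(i >> 64, 32), wrap(i, 64)

# Arrow C++ layout: int32 months, int32 days, int64 nanoseconds.
two_years = struct.pack("<iiq", 24, 0, 0)  # INTERVAL '2 year' = 24 months
two_days = struct.pack("<iiq", 0, 2, 0)    # INTERVAL '2 day'
print(to_parts_old(two_years))  # (0, 0, 24): 24 ns
print(to_parts_old(two_days))   # (0, 0, 8589934592): 2 * 2**32 ns, about 8.59 s
```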
OS:
Mac > Chrome > DuckDB Wasm Shell
DuckDB Version:
v0.10.1
DuckDB Client:
Wasm
Full Name:
David Litwin
Affiliation:
None
Have you tried this on the latest nightly build?
I have tested with a nightly build
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?