Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INTERVAL bug (wrong endian?) #1696

Closed
1 task done
david542542 opened this issue Apr 9, 2024 · 33 comments · Fixed by #1769
Closed
1 task done

INTERVAL bug (wrong endian?) #1696

david542542 opened this issue Apr 9, 2024 · 33 comments · Fixed by #1769

Comments

@david542542
Copy link

david542542 commented Apr 9, 2024

What happens?

Running SELECT INTERVAL '2 year' (or really any other INTERVAL) produces the wrong output in the wasm shell. It seems like it has to do with endianness: 2 years is 24 months and here 24 is put into fractional seconds, i.e at the other end of the number.

image

To Reproduce

You can run this query here (click to open in wasm shell).

SELECT INTERVAL '2 day'

OS:

Mac > Chrome > DuckDB Wasm Shell

DuckDB Version:

v0.10.1

DuckDB Client:

Wasm

Full Name:

David Litwin

Affiliation:

None

Have you tried this on the latest nightly build?

I have tested with a nightly build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
@david542542 david542542 changed the title INTERVAL bug INTERVAL bug (wrong endian?) Apr 9, 2024
@Mytherin Mytherin transferred this issue from duckdb/duckdb Apr 9, 2024
@Mause
Copy link
Member

Mause commented Apr 9, 2024

@Mytherin I think this is actually an issue with DuckDB core's arrow layer, as I see the same issue with duckdb-rs. I did briefly discuss this with @Maxxen as well

@Mytherin
Copy link
Contributor

Mytherin commented Apr 9, 2024

Why does the problem then not occur in Python?

>>> import duckdb
>>> duckdb.sql("SELECT INTERVAL '2 year'").arrow()
# pyarrow.Table
# CAST('2 year' AS INTERVAL): month_day_nano_interval
# ----
# CAST('2 year' AS INTERVAL): [[24M0d0ns]]

@Mytherin
Copy link
Contributor

Mytherin commented Apr 9, 2024

Actually since both duckdb-rs and duckdb-wasm use the Rust arrow library, is it perhaps possible the problem is in the Rust Arrow library?

@Mause
Copy link
Member

Mause commented Apr 9, 2024

That's possible yeah, I'd assumed that wasm used arrow-js, but it does seem to use arrow-rs yes

Max commented in Discord:

Sounds like a lifetime issue with the arrow return values then, might be some arrow array destructor we set too early or something. Will have a look

@Mause
Copy link
Member

Mause commented Apr 9, 2024

Looks like they did have issues here at one point too: https://github.com/apache/arrow-rs/pull/2235/files

@Mytherin
Copy link
Contributor

Mytherin commented Apr 9, 2024

While possible a lifetime issue seems unlikely to me here - the problem here seems to be that 24 months are interpreted as 24 seconds, and the issue appears to be consistent.

@david542542
Copy link
Author

david542542 commented Apr 9, 2024

Seems like it is wrong for all time units. See here.

image

@Mytherin
Copy link
Contributor

Mytherin commented Apr 9, 2024

Looks like wasm uses a very old version of arrow-rs. @carlopi perhaps we can try to upgrade this?

@carlopi
Copy link
Collaborator

carlopi commented Apr 9, 2024

Using arrow-js (basically when not using the shell infrastructure but passing arrow results directly to JS) the results are correctly parsed generated.

I will look at bumping arrow-rs.

@Maxxen
Copy link
Member

Maxxen commented Apr 9, 2024

Sorry, I said I was going to look into this but then went on vacation and been distracted with other things to get in before the release since I got back. Ill investigate more tomorrow.

@Maxxen
Copy link
Member

Maxxen commented Apr 10, 2024

This occurs in the rust client as well, even with latest arrow.

@Mytherin
Copy link
Contributor

Mytherin commented Apr 10, 2024

My guess at this point is that this is a bug within arrow-rs related to ingestion of intervals through the Arrow C API. Perhaps we can try to create a minimal example and file an issue in their repo?

@carlopi
Copy link
Collaborator

carlopi commented Apr 10, 2024

If @Maxxen would be up to it, I think given duckdb-rs is easier to test in isolation (also for the arrow people) and it's simpler to trust since there are less layers of things going wrong wrt duckdb-wasm's shell, I think that would work better as reproducer (and it's not on me!).

@Maxxen
Copy link
Member

Maxxen commented Apr 10, 2024

Actually, I think we might be in the wrong here.
Arrow stores the interval in a 128bit integer, interpreted as multiple fields;

┌──────────────────────────────┬─────────────┬──────────────┐
│            Nanos             │    Days     │    Months    │
│          (64 bits)           │ (32 bits)   │  (32 bits)   │
└──────────────────────────────┴─────────────┴──────────────┘
  0                            63            95           127 bit offset

But our ArrowInterval type looks like this:

struct ArrowInterval {
   int32_t months;
   int32_t days;
   int64_t nanoseconds;
}

Which we just reinterpret straight up when writing out the arrow buffer.

  68                                   continue;
  69                           }
  70                           result_data[result_idx] = OP::template Operation<TGT, SRC>(data[source_idx]);
  71                   }
  72                   append_data.row_count += size;
  73           }
  74   };
(lldb) p result_data[result_idx]
(duckdb::ArrowInterval)  (months = 24, days = 0, nanoseconds = 0)

But thats not the right order when reinterpreting as a i128 (which is what rust is doing on the other side)

(lldb) p *(__int128*)&result_data[result_idx]
(__int128) 24
    pub fn to_parts(
        i: <IntervalMonthDayNanoType as ArrowPrimitiveType>::Native,
    ) -> (i32, i32, i64) {
        let months = (i >> 96) as i32;
        let days = (i >> 64) as i32;
        let nanos = i as i64;
        (months, days, nanos)
    }

Im not sure why this works in python though...

@Mytherin
Copy link
Contributor

The rust code looks wrong to me. The definition is in little-endian. i >> 96 means we take the most significant bits for the month, when it should be the least-significant bits. Here are some other definitions in the official Arrow project:

It also doesn't really make sense for the Rust driver to be correct when it is the one that disagrees with everything else. Either all DuckDB, Arrow-C++, Arrow-Python, Arrow-Go, Arrow-R and Arrow-JS are wrong, or Arrow-RS is wrong. The latter seems more logical to me.

@Mause
Copy link
Member

Mause commented Apr 10, 2024

Are you saying it was correct before this change? apache/arrow-rs#2235

@Mause
Copy link
Member

Mause commented Apr 10, 2024

Where's this diagram from @Maxxen ?

┌──────────────────────────────┬────────────┬──────────────┐
│            Nanos             │    Days     │    Months    │
│          (64 bits)           │ (32 bits)   │  (32 bits)   │
└──────────────────────────────┴────────────┴──────────────┘
  0                            63            95           127 bit offset

EDIT: Disregard, it's the same as https://github.com/apache/arrow-rs/pull/2235/files#diff-c47b2e57ea72cdad6427c3149de134be3437e403984d367e3604cd0943a9a963R288, just backwards

All integers in the types below are stored in the endianness indicated by the schema.

@Maxxen
Copy link
Member

Maxxen commented Apr 10, 2024

@Mause The diagram is from here: https://github.com/apache/arrow-rs/blob/36a6e515f99866eae5332dfc887c6cb5f8135064/arrow-array/src/types.rs#L265-L269

I agree seems like the rust def is wrong. So what now?

@Mytherin
Copy link
Contributor

I would say we should double-check that the rust version is wrong and try to create a reproducer separate from DuckDB, then file a bug report in the arrow-rs repository.

@Mytherin
Copy link
Contributor

Here's a stand-alone C++ program showcasing the problem:

#include <stdint.h>
#include <stdio.h>

using uint128_t = __uint128_t;

struct MonthDayNanos {
    int32_t months;
    int32_t days;
    int64_t nanoseconds;
};

int main() {
	MonthDayNanos ival;
	ival.months = 32;
	ival.days = 64;
	ival.nanoseconds = 128;

	uint128_t u128val = *reinterpret_cast<uint128_t *>(&ival);

	printf("Rust code (incorrect)\n");
	printf("months %u\n", uint32_t(u128val >> 96));
	printf("days %u\n", uint32_t(u128val >> 64));
	printf("nanos %llu\n", uint64_t(u128val));

	printf("\n");

	printf("Correct shifts\n");
	printf("months %u\n", uint32_t(u128val));
	printf("days %u\n", uint32_t(u128val >> 32));
	printf("nanos %llu\n", uint64_t(u128val >> 64));
}

Prints:

Rust code (incorrect)
months 0
days 128
nanos 274877906976

Correct shifts
months 32
days 64
nanos 128

@david542542
Copy link
Author

Hi, is there any update on this? Should I file a bug report in arrow-rs, or what is necessary in getting this resolved?

@Mytherin
Copy link
Contributor

Arrow C++ has https://github.com/apache/arrow/blob/7003e90c113cec58a620deddf56af71eb305af2a/cpp/src/arrow/type.h#L1775-L1784

Arrow Java uses the same layout: https://github.com/apache/arrow/blob/7003e90c113cec58a620deddf56af71eb305af2a/java/vector/src/main/java/org/apache/arrow/vector/IntervalMonthDayNanoVector.java#L171-L181

That's mentioned above, the Arrow C++ layout is:

struct MonthDayNanos {
    int32_t months;
    int32_t days;
    int64_t nanoseconds;
};

The Rust code for parsing that as a uint128_t is as follows:

    pub fn to_parts(
        i: <IntervalMonthDayNanoType as ArrowPrimitiveType>::Native,
    ) -> (i32, i32, i64) {
        let months = (i >> 96) as i32;
        let days = (i >> 64) as i32;
        let nanos = i as i64;
        (months, days, nanos)
    }

As shown in my stand-alone program here - #1696 (comment) - those shifts produce the wrong results given the Arrow C++ layout.

@Mytherin
Copy link
Contributor

I noticed the integration tests linked involve files - perhaps the byte order for Arrow files is different from the in-memory byte order (big-endian vs little-endian)?

@pitrou
Copy link

pitrou commented Apr 17, 2024

Yes, sorry, I was just posting this for reference.

@pitrou
Copy link

pitrou commented Apr 17, 2024

I noticed the integration tests linked involve files - perhaps the byte order for Arrow files is different from the in-memory byte order (big-endian vs little-endian)?

No, it's probably just that arrow-rs has the same misinterpretation bug in all code paths. So when you ask it to roundtrip the data from JSON to IPC, for example, it will produce the intended IPC results.

@Mytherin
Copy link
Contributor

Makes sense, thanks for picking this up!

@pitrou
Copy link

pitrou commented Apr 17, 2024

Well, you'll have to thank @tustvold for that :-)

@david542542
Copy link
Author

@Mytherin brief update, about ~two weeks away: apache/arrow-rs#5654 (comment)

@Maxxen
Copy link
Member

Maxxen commented May 13, 2024

Thanks for driving this @david542542!

@david542542
Copy link
Author

Hi @Mytherin + @Maxxen,

It looks like this has been released (or awaiting release): apache/arrow-rs#5688.

Are we able to add that item into the upcoming DuckDB bug-fix release?

@carlopi
Copy link
Collaborator

carlopi commented Jun 3, 2024

Thanks everyone for pushing forward on this.

I have a local branch that bumps arrow-rs version, will need to double check a few things, but ideally this can be soon fixed uptream.

@david542542
Copy link
Author

david542542 commented Jun 13, 2024

@carlopi cool, will this be in the next DuckDB release (1.1.0), or when will this fix be added in?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants