Add Read support for Apache Iceberg #6227
Discussion on the Rust Iceberg SDK: apache/iceberg#5122
Is this feature already on the Polars team's roadmap?
Could someone please advise where we can access the roadmap and check whether Iceberg has been included in it?
I'd love to have it!
FYI: I'm looking at further generalising our database support in Python (ref: #10121). Are you intending to use it directly from Rust, or through the usual Python API? (If the latter, which driver do you usually use?)
@alexander-beedie, I noticed Delta is implemented in Python + Arrow. Iceberg committers are working on improving Arrow compatibility, but I wonder whether you would prefer Rust support, which is also almost complete. Since both options are available, what's the best way forward in your opinion?
Also related: apache/iceberg#7067
@alexander-beedie In my case, Python.
My 2 cents on the topic: we can mimic the way we implemented things for Delta and keep this on the Python side of things via the…
IMO, it would be preferable to do this on the Rust side. That way we can support it in SQL, Python, and so on. GlareDB, a DataFusion-based project, recently added Iceberg support, and it looks like it may be easy to port their DataFusion logic over to Polars. The dependencies also seem pretty lightweight, only…
I got a working version using PyIceberg here: #10375
Unfortunately, a lot of the Rust implementations out there are far from complete. Looking at the implementation at GlareDB, a couple of things are missing:
The implementation only does file discovery, which is a pity since Iceberg has so much more to offer.
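To illustrate what a scan planner can do beyond plain file discovery, here is a minimal pure-Python sketch of statistics-based file pruning. It uses a date-range and passenger-count filter like the one discussed in this thread. The sketch assumes each data file carries per-column lower and upper bounds, which Iceberg manifests do record. The `DataFileStats` class, the `may_match` and `plan_scan` helpers, and the file paths are hypothetical illustrations, not part of PyIceberg, GlareDB, or Polars.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any


# Hypothetical stand-in for a manifest entry: a data file path plus
# per-column lower/upper bounds (min/max values observed in that file).
@dataclass(frozen=True)
class DataFileStats:
    path: str
    lower: dict[str, Any]
    upper: dict[str, Any]


# One conjunct of a filter such as "dt >= '2023-01-01'": (column, operator, literal).
Conjunct = tuple[str, str, Any]


def may_match(stats: DataFileStats, conjunct: Conjunct) -> bool:
    """Return False only if the bounds prove no row in the file can satisfy the conjunct."""
    column, op, value = conjunct
    lo, hi = stats.lower[column], stats.upper[column]
    if op == ">=":
        return hi >= value
    if op == ">":
        return hi > value
    if op == "<=":
        return lo <= value
    if op == "<":
        return lo < value
    if op == "==":
        return lo <= value <= hi
    raise ValueError(f"unsupported operator: {op}")


def plan_scan(files: list[DataFileStats], predicate: list[Conjunct]) -> list[str]:
    """Keep only files that might contain matching rows (AND of all conjuncts)."""
    return [f.path for f in files if all(may_match(f, c) for c in predicate)]


# Hypothetical table layout: one data file per month.
FILES = [
    DataFileStats("data/2022-12.parquet",
                  {"dt": date(2022, 12, 1), "passenger_count": 1},
                  {"dt": date(2022, 12, 31), "passenger_count": 6}),
    DataFileStats("data/2023-02.parquet",
                  {"dt": date(2023, 2, 1), "passenger_count": 1},
                  {"dt": date(2023, 2, 28), "passenger_count": 3}),
    DataFileStats("data/2023-03.parquet",
                  {"dt": date(2023, 3, 1), "passenger_count": 1},
                  {"dt": date(2023, 3, 31), "passenger_count": 8}),
    DataFileStats("data/2023-04.parquet",
                  {"dt": date(2023, 4, 1), "passenger_count": 1},
                  {"dt": date(2023, 4, 30), "passenger_count": 9}),
]

# dt >= '2023-01-01' and dt < '2023-04-01' and passenger_count > 4
PREDICATE = [
    ("dt", ">=", date(2023, 1, 1)),
    ("dt", "<", date(2023, 4, 1)),
    ("passenger_count", ">", 4),
]

print(plan_scan(FILES, PREDICATE))  # ['data/2023-03.parquet']
```

Pruning like this is conservative. Bounds can only prove that a file has no matching rows, so the files that survive still need a row-level filter after they are read.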
Apologies, and thanks for reaching out to us! This seems to have slipped past my radar; it looks like @Fokko is well underway building a…
Hey @alexander-beedie, thanks for jumping in here. What kind of SQL interface are you thinking of? We support SQL-like syntax for the expressions:

`large_rides_in_march = tbl.scan().filter("dt >= '2023-01-01' and dt < '2023-04-01' and passenger_count > 4").to_arrow()`

We could also accept this kind of expression in Polars. Also, DuckDB has recently opened up its Iceberg support; however, this is also still experimental, and many of the features that make Iceberg shine are still missing.
Great, I see it now, thanks for the clarification. I think I was conflating a quickly scanned article involving DuckDB and Iceberg with there actually being a fully fledged SQL query interface (and therefore also Python-side DBAPI, or equivalent, drivers). In that case things would "just work" once we start accepting user-created connection objects and their associated queries. It looks to me like the…
Problem description
I would like to see read support for Apache Iceberg, similar to the existing support for Delta. Cc @asheeshgarg