feat: Implement partial "lazy" support for DuckDB (even with this PR, DuckDB support is work-in-progress!) #1725
can't wait for this to be merged 💪 thanks @MarcoGorelli that's gonna provide a great DX
    if subset is not None and any(x not in self.columns for x in subset):
        msg = f"Column(s) {subset} not found in {self.columns}"
        raise ColumnNotFoundError(msg)
should we create a check_columns_exist function in narwhals.utils so we can reuse everywhere else? :)
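A minimal sketch of what such a helper might look like. The name check_columns_exist comes from the comment above, but the signature, the `available` parameter, and the stand-in exception class are assumptions for illustration, not code from this PR:

```python
from __future__ import annotations

from typing import Sequence


class ColumnNotFoundError(Exception):
    """Stand-in for the exception used in the diff (assumption: defined locally here)."""


def check_columns_exist(
    subset: Sequence[str] | None, *, available: Sequence[str]
) -> None:
    """Raise if any name in `subset` is absent from `available` (hypothetical helper)."""
    if subset is None:
        # Nothing was requested, so there is nothing to validate.
        return
    missing = [name for name in subset if name not in available]
    if missing:
        msg = f"Column(s) {missing} not found in {list(available)}"
        raise ColumnNotFoundError(msg)
```

Reporting only the missing names, rather than the whole subset as the diff does, is a design choice of this sketch; each backend could then call the helper with its own column list.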
    constructor: Constructor,
    request: pytest.FixtureRequest,
) -> None:
    if "duckdb" in str(constructor) and attr == "__floordiv__":
__floordiv__ should be implemented, or am I looking at the wrong thing?
it behaves differently though, so i think we need a separate discussion for how to deal with it, e.g.
In [3]: duckdb.sql('select 1.5 // 2.5')
Out[3]:
┌──────────────┐
│ (1.5 // 2.5) │
│ double │
├──────────────┤
│ 0.6 │
└──────────────┘
In [4]: 1.5 // 2.5
Out[4]: 0.0
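To make the mismatch concrete on the Python side: the DuckDB session above returns 0.6 for these operands, which matches Python's true division, while Python's `//` floors the quotient. The sketch below is plain Python only; the `emulated` line is one possible way to express floor division in terms of true division, shown as an assumption rather than what this PR does:

```python
import math

a, b = 1.5, 2.5

true_div = a / b    # the value the DuckDB session above printed for 1.5 // 2.5
floor_div = a // b  # Python's float floor division

# One possible emulation: floor the true quotient (assumption, not the PR's approach).
emulated = math.floor(a / b)

print(true_div, floor_div, emulated)
```

Flooring the true quotient agrees with `//` for these operands, but whether that is the right semantics for every dtype is exactly the separate discussion mentioned above.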
def test_cast_series(
    constructor: Constructor,
    constructor_eager: ConstructorEager,
FBruzzesi
left a comment
I will need to take a crash course in DuckDB. I tried to leave a couple of comments.
The main point is how we want to design the collect method in the main LazyFrame class.
    return ArrowDataFrame(
        native_dataframe=self._native_frame.arrow(),
        backend_version=parse_version(pa.__version__),
        version=self._version,
    )
My understanding was that duckdb is dependency free.
Should we jump to the discussion in #1479 before deciding how to collect for duckdb?
I think even in that one the likely default for duckdb would still be pyarrow though, right? i've added a try-except anyway to show a less surprising error message
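A rough sketch of the try-except idea: wrap the optional import so that a missing dependency produces a message naming the feature that needs it. The helper name `import_optional` and its `needed_for` parameter are hypothetical and not taken from the PR:

```python
from __future__ import annotations

import importlib
from types import ModuleType


def import_optional(module_name: str, *, needed_for: str) -> ModuleType:
    """Import `module_name`, re-raising with a clearer message if it is missing.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        msg = f"{module_name} is required for {needed_for}, please install it"
        raise ModuleNotFoundError(msg) from exc
```

In a collect method this could be called as `import_optional("pyarrow", needed_for="collecting a DuckDB relation")` before converting the native frame, so the user sees the hint instead of a bare import failure.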
I will try to make a draft/RFC tomorrow to follow up on my comment in the thread ;)
narwhals/_duckdb/dataframe.py
Outdated
        return self._native_frame.columns  # type: ignore[no-any-return]

    def to_pandas(self: Self) -> pd.DataFrame:
        # only is version if v1, keep around for backcompat
TODO: implement version check? Same for to_arrow?
-        # only is version if v1, keep around for backcompat
+        # only if version is v1, keep around for backcompat
to_pandas wouldn't be available on nw.LazyFrame anyway so this wouldn't be reachable for non-v1
    assert left_on is not None  # noqa: S101
    assert right_on is not None  # noqa: S101
This should never be the case after all the checks in BaseFrame.join, given that the how argument is in {"inner", "left"}
true but then mypy complains 😭
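For context on why the asserts stay: when a parameter is typed as optional, a type checker cannot see that earlier runtime validation rules out None, so an assert is a common way to narrow the type. A small self-contained sketch, where the function name `join_keys` and its signature are made up for illustration:

```python
from __future__ import annotations


def join_keys(
    left_on: list[str] | None, right_on: list[str] | None
) -> list[tuple[str, str]]:
    # Upstream validation is assumed to guarantee non-None values here,
    # but the annotations still allow None, so the asserts narrow the types.
    assert left_on is not None
    assert right_on is not None
    # Without the asserts, a type checker would flag zip() on Optional arguments.
    return list(zip(left_on, right_on))
```

Calling `join_keys(["a"], ["b"])` returns the paired key names; the asserts cost nothing in the validated path and keep the type checker satisfied.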
thanks both for your reviews and comments, much appreciated! any objections to merging as-is before merge conflicts, and then we iterate on it until it's complete?
thanks all for comments! doesn't look like there have been objections, and this PR is quite self-standing (it doesn't affect existing core backends) so I'll go ahead and ship it so we can release, then we can fill it out bit-by-bit and it can become truly incredible

@choucavalier thanks for your interest - please note that this is really work-in-progress so you'll likely run into quite a few missing methods if you try it out. Nonetheless, i'd be curious to hear how you find it if you do
thanks @MarcoGorelli i'll try it out and report any issue with proper failing tests :) thanks for your amazing work. you're a MACHINE. like @EdAbati said: exciting times!! 🚀