Skip to content

fix: Disallow lazy backend in read_parquet#2980

Merged
MarcoGorelli merged 3 commits intomainfrom
fix/read-parquet-with-lazy
Aug 13, 2025
Merged

fix: Disallow lazy backend in read_parquet#2980
MarcoGorelli merged 3 commits intomainfrom
fix/read-parquet-with-lazy

Conversation

@FBruzzesi
Copy link
Member

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

@FBruzzesi FBruzzesi added the fix label Aug 12, 2025
Comment on lines +608 to +620
elif impl in {
Implementation.PYSPARK,
Implementation.DASK,
Implementation.DUCKDB,
Implementation.IBIS,
Implementation.SQLFRAME,
Implementation.PYSPARK_CONNECT,
}:
msg = (
f"Expected eager backend, found {impl}.\n\n"
f"Hint: use nw.scan_csv(source={source}, backend={backend})"
)
raise ValueError(msg)
Copy link
Member Author

@FBruzzesi FBruzzesi Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For read_csv I simply wanted to improve on the error message

@FBruzzesi FBruzzesi changed the title fix: Disallow lazy backend in read_parquet fix: Disallow lazy backend in read_parquet Aug 12, 2025
Copy link
Member

@dangotbanned dangotbanned left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spotted @FBruzzesi, I also thought this was strange 😄

Just one question from me

squeeze cheeks narwhal

Comment on lines -745 to -746
Implementation.DUCKDB,
Implementation.IBIS,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible these were supposed to be supported for v1 only?

They stand out as the impls that had an interchange support for DataFrame

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thaaaat's totally possible! However I find it weird that:

  • read_parquet allows it, but read_csv doesn't
  • from_native in the return statement has the eager_only=True flag, not the eager_or_interchange_only

🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah true, I guess you could investigate 🧐 the blame?

Copy link
Member Author

@FBruzzesi FBruzzesi Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1725 introduced full support for duckdb, and added duckdb in read_parquet
#2000 introduced full support for ibis, and added ibis in read_parquet

These are way after interchange protocol support, so it seems completely accidental, and we just missed it. Both PRs were 2k+ lines changes and we had no test to prevent that

@MarcoGorelli
Copy link
Member

nice one thanks both!

@MarcoGorelli MarcoGorelli merged commit b14d880 into main Aug 13, 2025
44 of 45 checks passed
@MarcoGorelli MarcoGorelli deleted the fix/read-parquet-with-lazy branch August 13, 2025 11:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: read_parquet allow lazy backend

3 participants