@MoritzPotthoffQC MoritzPotthoffQC commented Sep 11, 2025

Motivation

We would like to clarify in the docs that the write_parquet methods use the mkdir kwarg to control whether directories should be created.

Changes

  • Bumped to polars 1.33 (needed for mkdir)
  • Updated docs
  • Added tests

@MoritzPotthoffQC MoritzPotthoffQC self-assigned this Sep 11, 2025
@github-actions github-actions bot added the fix label Sep 11, 2025
codecov bot commented Sep 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (22f4b7d) to head (ff60351).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #142   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           49        50    +1     
  Lines         2818      2851   +33     
=========================================
+ Hits          2818      2851   +33     

@MoritzPotthoffQC MoritzPotthoffQC marked this pull request as ready for review September 11, 2025 13:37
@MoritzPotthoffQC
Contributor Author

@AndreasAlbertQC I did not consider delta storage in this PR. If there is a more elegant way to fix this, feel free to amend/close this PR.

Member

@borchero borchero left a comment

Mh, I thought that write_parquet takes care of this, unfortunately not... 😅 I think we should be consistent with write_parquet from polars though and not create parents (and document it accordingly).

) -> None:
    file = kwargs.pop("file")
    metadata = kwargs.pop("metadata", {})
    file.parent.mkdir(parents=True, exist_ok=True)
Member

I think that we shouldn't do this here but rather in the specialized functions for collections. Also, I wouldn't set parents=True (see main comment)
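For reference, a minimal standard-library sketch of what the parents flag changes in pathlib's Path.mkdir. The directory names and the data.parquet file name are made up for illustration and are not taken from the PR.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical target two levels below an existing directory;
    # neither "a" nor "a/b" exists yet.
    target = Path(tmp) / "a" / "b" / "data.parquet"

    # Without parents=True, mkdir only creates the last path component,
    # so it fails because the intermediate directory "a" is missing.
    try:
        target.parent.mkdir(exist_ok=True)
        created_without_parents = True
    except FileNotFoundError:
        created_without_parents = False

    # With parents=True, every missing ancestor is created as well.
    target.parent.mkdir(parents=True, exist_ok=True)
    created_with_parents = target.parent.is_dir()

print(created_without_parents, created_with_parents)
```

Dropping parents=True therefore only helps when the direct parent is the sole missing directory; deeper missing ancestors still raise FileNotFoundError.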

Contributor Author

Hm, fair point. I looked into this some more: polars' write_parquet has a parameter mkdir (default False), so to get exactly the same behavior we would have to add that parameter as well. I agree that the additional complexity is not worth it, though, and we can simply never create directories ourselves.

We currently rely on the fact that write_parquet sets mkdir to True for partitioned writes and therefore creates all parent directories: we add an extra layer of directories, which polars then creates.

I think if we do not want an mkdir argument, we should not create anything anywhere, except for partitioned writes. That would not require any code changes, just updates to the docs. Is this what you have in mind?
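The two options under discussion (never create directories versus an opt-in flag) can be sketched with the standard library alone. Note that write_payload is a hypothetical stand-in, not the project's or polars' API; it only assumes that an mkdir flag defaulting to False means "fail if the parent directory is missing".

```python
import tempfile
from pathlib import Path


def write_payload(file: Path, payload: bytes, *, mkdir: bool = False) -> None:
    """Hypothetical writer mirroring the flag semantics discussed above."""
    if mkdir:
        # Opt-in: create the missing parent directories before writing.
        file.parent.mkdir(parents=True, exist_ok=True)
    # With mkdir=False, a missing parent surfaces as FileNotFoundError.
    file.write_bytes(payload)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical target whose parent directory does not exist yet.
    target = Path(tmp) / "missing" / "data.bin"

    try:
        write_payload(target, b"x")
        default_write_succeeded = True
    except FileNotFoundError:
        default_write_succeeded = False

    write_payload(target, b"x", mkdir=True)
    opt_in_write_succeeded = target.is_file()

print(default_write_succeeded, opt_in_write_succeeded)
```

With the flag left at its default, callers are responsible for creating the target directory themselves, which is the behavior the docs would need to spell out.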

Contributor Author

@MoritzPotthoffQC MoritzPotthoffQC Sep 16, 2025

We want to just delegate this to the mkdir parameter in polars.

@borchero I just realized that the mkdir parameter is only exposed from polars 1.33 onwards, so I think we should hold this PR until we bump to that version anyway.

Member

@borchero borchero Sep 16, 2025

We will bump the version requirement as part of #139, do you want to wait until then?

Contributor Author

Yes, that works for me. I will move this PR back to draft for now.

@MoritzPotthoffQC MoritzPotthoffQC changed the title from "fix: Create the target path when writing to parquet storage" to "docs: Document that the write_parquet methods use mkdir in polars" Sep 16, 2025
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Sep 16, 2025
@MoritzPotthoffQC MoritzPotthoffQC marked this pull request as draft September 16, 2025 21:09