Using write_parquet with partition_by breaks writing to S3 #20502
Labels
A-io-cloud (Area: reading/writing to cloud storage)
accepted (Ready for implementation)
enhancement (New feature or an improvement of an existing feature)
P-medium (Priority: medium)
python (Related to Python Polars)
Reproducible example
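A minimal sketch of the scenario described in the issue; the DataFrame contents, the partition column b, and the bucket name my-bucket are placeholder assumptions, not taken from the original report.

```python
import polars as pl

df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "x", "y"]})

# Without partition_by, a single file is uploaded to S3 as expected.
df.write_parquet("s3://my-bucket/test/a.parquet")

# With partition_by, the data ends up in local files instead of being
# uploaded under s3://my-bucket/hive/.
df.write_parquet("s3://my-bucket/hive/", partition_by=["b"])
```

With partition_by set, the expected outcome would be hive-style keys on S3, such as s3://my-bucket/hive/b=x/..., rather than files in the local working directory.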
Log output
Auto-selected credential provider: CredentialProviderAWS
try_get_writeable: cloud: s3://my-bucket/test/a.parquet
Async thread count: 4
object store cache key: s3://my-bucket/test/a.parquet S { url_base: "s3://my-bucket", cloud_options: Some(C { max_retries: 2, file_cache_ttl: 3600, config: Some(Aws([])), credential_provider: 140141317067952 }) }
async upload_chunk_size: 67108864
[FetchedCredentialsCache]: Call update_func: current_time = 1735556996, last_fetched_expiry = 0
[FetchedCredentialsCache]: Finish update_func: new expiry = (never expires)
Issue description
Calling write_parquet with an S3 URI writes to a local file instead of S3 if partition_by is specified. It works as expected if partition_by is not specified.

Expected behavior
df.write_parquet('s3://my-bucket/hive/', partition_by=['b'])

should write a partitioned dataset to S3.

Installed versions