@simonelbaz simonelbaz commented Sep 18, 2025

…d if ListBucket is not allowed for the user

Thanks for opening a pull request!

Rationale for this change

This PR lets the user choose not to create directories in the bucket before writing a dataset.
When the create_directory option is set to FALSE, R arrow performs no verification.
The S3 storage itself checks whether the directory exists and whether the user has the right to modify it.
This way, no ListBucket or HeadBucket calls are needed to complete the write operation.

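# Assumes `df` is a data frame with a `qualitative` column and `minio` is an
# S3 filesystem object (e.g. created with arrow::S3FileSystem$create()).
df |> arrow::write_dataset(
  minio$path("smartsla-bucket/rarrow/"),
  partitioning = "qualitative",
  create_directory = FALSE,  # skip directory creation, so no ListBucket/HeadBucket call
  format = "parquet"
)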

What changes are included in this PR?

The write_dataset function now exposes a create_directory argument to the user.
Before this PR, the option could not be set by the user and was always TRUE.
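
As a rough illustration of the two modes described above, the sketch below wraps `write_dataset()` in a small helper. The helper name `write_partitioned`, its `check_directories` argument, and the bucket name in the usage comment are hypothetical; only `create_directory`, `partitioning` and `format` come from this PR's description. It assumes an arrow build that includes this PR.

```r
# Hypothetical helper, not part of arrow: it only forwards to write_dataset().
# Assumes an arrow build that includes this PR's create_directory argument.
write_partitioned <- function(df, destination, check_directories = TRUE) {
  arrow::write_dataset(
    df,
    destination,
    partitioning = "qualitative",
    format = "parquet",
    # TRUE (the default) keeps the pre-PR behaviour;
    # FALSE asks the writer not to create directories first.
    create_directory = check_directories
  )
}

# Usage sketch (needs S3 credentials, so it is left commented out):
# bucket <- arrow::s3_bucket("my-bucket")  # hypothetical bucket name
# write_partitioned(df, bucket$path("rarrow/"), check_directories = FALSE)
```

Keeping `TRUE` as the helper's default mirrors the PR's choice to leave existing behaviour unchanged, so only callers with restricted S3 permissions need to opt out.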

Are these changes tested?

Yes

Are there any user-facing changes?

No, the default value for create_directory is still TRUE.

This PR includes breaking changes to public APIs. (If there are any breaking changes to public APIs, please explain which changes are breaking. If not, you can remove this.)

N/A

This PR contains a "Critical Fix". (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)

N/A

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

@simonelbaz simonelbaz changed the title from "[ISSUE 42173][R] Writing partitionned dataset with Rarrow on S3 failed if ListBucket is not allowed for the user" to "[GH-42173][R] Writing partitionned dataset with Rarrow on S3 failed if ListBucket is not allowed for the user" Sep 18, 2025
@simonelbaz simonelbaz changed the title from "[GH-42173][R] Writing partitionned dataset with Rarrow on S3 failed if ListBucket is not allowed for the user" to "GH-42173: [R] Writing partitionned dataset with Rarrow on S3 failed if ListBucket is not allowed for the user" Sep 18, 2025

⚠️ GitHub issue #42173 has been automatically assigned in GitHub to PR creator.

@simonelbaz simonelbaz changed the title from "GH-42173: [R] Writing partitionned dataset with Rarrow on S3 failed if ListBucket is not allowed for the user" to "GH-42173: [R][S3] Writing partitionned dataset with Rarrow on S3 failed if ListBucket is not allowed for the user" Sep 18, 2025
@simonelbaz simonelbaz marked this pull request as draft September 18, 2025 21:08
@simonelbaz simonelbaz marked this pull request as ready for review September 25, 2025 08:07
@simonelbaz
Author

Hi,

@jonkeane @thisisnic thanks in advance for any comments or review.

@thisisnic thisisnic changed the title from "GH-42173: [R][S3] Writing partitionned dataset with Rarrow on S3 failed if ListBucket is not allowed for the user" to "GH-42173: [R][C++] Writing partitioned dataset with S3 fails if ListBucket is not allowed for the user" Oct 13, 2025

Member

@thisisnic thisisnic left a comment

Thanks for making this PR @simonelbaz! I've given it a look over, and it looks like there's an existing option for this scenario that might be a better solution, though I haven't tried it out myself - let me know what you think.

@thisisnic thisisnic changed the title from "GH-42173: [R][C++] Writing partitioned dataset with S3 fails if ListBucket is not allowed for the user" to "GH-42173: [R][C++] Writing partitioned dataset on S3 fails if ListBucket is not allowed for the user" Oct 13, 2025

…3 failed if ListBucket is not allowed for the user
…3 failed if ListBucket is not allowed for the user
@github-actions github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label Oct 16, 2025
Co-authored-by: Antoine Pitrou <[email protected]>
@simonelbaz simonelbaz requested a review from pitrou October 16, 2025 09:32
Member

@pitrou pitrou left a comment

+1 from me. @thisisnic Are you ok with the new argument name and docstring?
