@jorisvandenbossche
Member

No description provided.

@github-actions

    the schema will be inferred from the file path (and a
    PartitioningFactory is returned).
field_names : list of str, default None
    A list of strings (field names). If specified, the schema's types are
Member Author


@bkietz @kszucs following up on #6022 (comment), I went with splitting the keyword into two separate ones (schema and field_names).
But I can also rename it to schema_or_field_names and keep it as a single keyword (it's long, but you can pass it positionally), if that is preferred.

I think having two separate keywords is a bit cleaner and easier to explain.
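A minimal sketch of the two-keyword design, in plain Python so it runs without pyarrow: a hypothetical make_partitioning(schema=None, field_names=None) rejects calls that pass both. With a schema it returns a partitioning whose types are fixed; with only field names it returns a factory that infers the types from observed paths, in the spirit of the docstring above. All class and function names here are illustrative assumptions, not the pyarrow API, and a schema is modelled as a dict mapping field name to a type name.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


def _segments(path: str) -> List[str]:
    # Split a directory path into its non-empty components.
    return [s for s in path.split("/") if s]


@dataclass
class KnownSchemaPartitioning:
    """Hypothetical: field names and types are fixed up front."""
    schema: Dict[str, str]  # field name -> type name, e.g. {"year": "int32"}

    def parse(self, path: str) -> Dict[str, object]:
        # Match leading path segments to schema fields in order; cast int types.
        values: Dict[str, object] = {}
        for (name, typ), raw in zip(self.schema.items(), _segments(path)):
            values[name] = int(raw) if typ.startswith("int") else raw
        return values


@dataclass
class InferringPartitioningFactory:
    """Hypothetical: only field names are known; types are inferred later."""
    field_names: List[str]

    def finish(self, paths: List[str]) -> KnownSchemaPartitioning:
        # A field becomes "int32" if every observed value is all decimal
        # digits, otherwise "string". Assumes each path has enough segments.
        split_paths = [_segments(p) for p in paths]
        schema: Dict[str, str] = {}
        for i, name in enumerate(self.field_names):
            observed = [segs[i] for segs in split_paths]
            schema[name] = "int32" if all(v.isdecimal() for v in observed) else "string"
        return KnownSchemaPartitioning(schema)


def make_partitioning(
    schema: Optional[Dict[str, str]] = None,
    field_names: Optional[List[str]] = None,
):
    # Two separate keywords: each call site states which one it means.
    if schema is not None and field_names is not None:
        raise ValueError("Specify either 'schema' or 'field_names', not both")
    if schema is not None:
        return KnownSchemaPartitioning(dict(schema))
    if field_names is not None:
        return InferringPartitioningFactory(list(field_names))
    raise ValueError("One of 'schema' or 'field_names' is required")


# Usage: only field names are known, so types are inferred from the paths.
factory = make_partitioning(field_names=["year", "month"])
part = factory.finish(["2009/11", "2010/01"])
print(part.schema)            # {'year': 'int32', 'month': 'int32'}
print(part.parse("2009/11"))  # {'year': 2009, 'month': 11}
```

With separate keywords, validation is a plain "is not None" check on each argument; a single schema_or_field_names keyword would instead need an isinstance check to decide which branch to take.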

Member


And it is easier to merge two keywords into one later than to split one keyword into two.

if isinstance(schema, pa.Schema):
    return HivePartitioning(schema)
else:
    raise ValueError(
Member


This is why I found the ability to restrict HivePartitioning to certain field names useful. Perhaps I'm only interested in a subset of the partitioning fields, and it would be easier to define them here.
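The restriction suggested here can be sketched in plain Python: a hypothetical helper parse_hive_path reads "key=value" directory segments and, when field_names is given, keeps only those keys. Values are left as strings; casting them against a schema (as HivePartitioning(schema) would do) is out of scope, and this is an illustration of the idea, not the pyarrow implementation.

```python
from typing import Dict, Iterable, Optional


def parse_hive_path(
    path: str, field_names: Optional[Iterable[str]] = None
) -> Dict[str, str]:
    """Collect Hive-style "key=value" directory segments from a path.

    If field_names is given, only those keys are kept; otherwise every
    key=value segment is returned. Values stay as strings.
    """
    wanted = set(field_names) if field_names is not None else None
    result: Dict[str, str] = {}
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if not sep:
            continue  # not a key=value segment, e.g. "data" or the file name
        if wanted is None or key in wanted:
            result[key] = value
    return result


path = "/data/year=2009/month=11/day=03/part-0.parquet"
print(parse_hive_path(path))
# {'year': '2009', 'month': '11', 'day': '03'}
print(parse_hive_path(path, field_names=["year", "month"]))
# {'year': '2009', 'month': '11'}
```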

@bkietz created an issue where we can discuss this: https://issues.apache.org/jira/browse/ARROW-7646

Member

@kszucs left a comment


LGTM
