ARROW-7636: [Python] Clean-up the pyarrow.dataset.partitioning() API #6244

jorisvandenbossche · 2020-01-21T15:59:03Z

No description provided.

github-actions · 2020-01-21T16:01:35Z

https://issues.apache.org/jira/browse/ARROW-7636

jorisvandenbossche · 2020-01-21T16:04:06Z

python/pyarrow/dataset.py

+        the schema will be inferred from the file path (and a
+        PartitioningFactory is returned).
+    field_names :  list of str, default None
+        A list of strings (field names). If specified, the schema's types are


@bkietz @kszucs following up on #6022 (comment), I went with splitting the keyword into two separate ones.
But I can also rename it to schema_or_field_names and keep it as a single keyword (it's long, but you can pass it positionally), if there is a stronger preference for that.

I think having separate ones is a bit cleaner, and easier to explain.

And it is easier to merge the two later than to split them later.
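
For illustration, a minimal sketch of what dispatching on two separate keywords could look like. The classes below are stand-ins defined only for this example, not the real pyarrow classes, and the `flavor` handling is left out; the actual code in python/pyarrow/dataset.py may well differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Stand-ins for illustration only; these are NOT the real pyarrow classes.
@dataclass
class Schema:
    fields: Tuple[Tuple[str, str], ...]  # (name, type) pairs


@dataclass
class DirectoryPartitioning:
    schema: Schema  # names and types are fully specified up front


@dataclass
class PartitioningFactory:
    field_names: List[str]  # only names are known; types are discovered later


def partitioning(schema: Optional[Schema] = None,
                 field_names: Optional[List[str]] = None):
    """Hypothetical dispatch with two mutually exclusive keywords."""
    if schema is not None and field_names is not None:
        raise ValueError("Cannot specify both 'schema' and 'field_names'")
    if schema is not None:
        if not isinstance(schema, Schema):
            raise ValueError(
                "Expected a Schema for 'schema', got {}".format(type(schema)))
        return DirectoryPartitioning(schema)
    if field_names is not None:
        if not isinstance(field_names, list):
            raise ValueError(
                "Expected a list for 'field_names', got {}".format(
                    type(field_names)))
        return PartitioningFactory(field_names)
    raise ValueError("Specify either 'schema' or 'field_names'")
```

With separate keywords each argument has a single accepted type, so each branch needs only one `isinstance` check and the docstring can describe each parameter on its own.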

kszucs · 2020-01-22T12:06:21Z

python/pyarrow/dataset.py

+            if isinstance(schema, pa.Schema):
+                return HivePartitioning(schema)
+            else:
+                raise ValueError(


This is why I found the ability to restrict the HivePartitioning to certain field names useful. Perhaps I'm only interested in a subset of the partitioning fields, and it would be easier to define them here.

@bkietz created an issue where we can discuss https://issues.apache.org/jira/browse/ARROW-7646
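
To make the use case concrete, here is a pure-Python sketch of parsing hive-style `key=value` path segments while keeping only a chosen subset of fields. The function `parse_hive_path` and its `field_names` parameter are hypothetical names for this example and are not part of pyarrow; it only illustrates the idea, not how HivePartitioning works internally.

```python
from typing import Dict, Iterable, Optional


def parse_hive_path(path: str,
                    field_names: Optional[Iterable[str]] = None
                    ) -> Dict[str, str]:
    """Extract key=value pairs from a path, optionally restricted to
    the given field names (hypothetical helper, not a pyarrow API)."""
    wanted = None if field_names is None else set(field_names)
    result = {}
    for segment in path.strip("/").split("/"):
        key, sep, value = segment.partition("=")
        if not sep:
            # Not a key=value segment (for example, the file name).
            continue
        if wanted is None or key in wanted:
            result[key] = value
    return result


path = "year=2020/month=1/day=21/part-0.parquet"
print(parse_hive_path(path))                     # all three fields
print(parse_hive_path(path, ["year", "month"]))  # only the chosen subset
```

Values stay as strings here; attaching types to them is exactly what passing a schema would add.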

kszucs

LGTM

split field_names in separate options for schema and field_names

3e5709c

pass throug list of field names in source/dataset()

3e85fd0

kszucs approved these changes Jan 22, 2020

kszucs closed this in 047f87a Jan 22, 2020

jorisvandenbossche deleted the ARROW-7636 branch January 22, 2020 12:09

asfimport mentioned this pull request Jan 22, 2020

[Python] Clean-up the pyarrow.dataset.partitioning() API #23886

Closed
