@jun-he
Collaborator

@jun-he jun-he commented Jun 9, 2021

Support custom target name in partition spec builder and also fix some other issues. Additionally, add unit tests for partition spec.

@jun-he jun-he changed the title from "support custom target name in partition spec builder" to "[Python] support custom target name in partition spec builder" Jun 9, 2021
@github-actions github-actions bot added the python label Jun 9, 2021
@TGooch44
Contributor

TGooch44 commented Jun 9, 2021

Hi Jun, I think I have a very similar PR queued up on this. Let me review what you have here and see what else needs to be added (I can rebase my PR). There is a lot of work in the partition spec that needs to be rolled out, and I've been lagging on getting the PRs out to open source.

@jun-he
Collaborator Author

jun-he commented Jun 10, 2021

@TGooch44 Thanks for the comment. Please let me know about any other related changes and I can add them. Also, we could ship them separately in multiple PRs.

Contributor

@TGooch44 TGooch44 left a comment

Overall looks good. I'm not sure about the tox mypy issue for py38: when I run tox -epy38 locally on a fresh env, I don't get any errors.

self.partition_names.add(name)
return self

def check_for_redundant_partitions(self, field):
Contributor

Do you think it's worthwhile introducing a function that checks for redundant partitions and adds the field? Each transform method does the same block creating the partition field and then adding it. Something like this:

def check_redundant_and_add_field(self, field_id: int, name: str, transform: 'Transform') -> None:
    part_field = PartitionField(field_id, self.__next_field_id(), name, transform)
    self.check_for_redundant_partitions(part_field)
    self.fields.append(part_field)

Collaborator Author

Yep, good idea. We can actually update the check_for_redundant_partitions method so that it also adds the field.
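
For reference, a minimal sketch of what a combined check-and-add helper could look like. It is not the code that was merged: `SketchBuilder`, `_check_and_add_field`, the simplified `PartitionField` stand-in, the starting field id of 1000, and the redundancy rule (same source column with the same transform) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class PartitionField:
    # Simplified, hypothetical stand-in for the real PartitionField.
    source_id: int
    field_id: int
    name: str
    transform: str


class SketchBuilder:
    """Hypothetical builder with one helper that validates and appends."""

    def __init__(self) -> None:
        self.fields: List[PartitionField] = []
        self.partition_names: Set[str] = set()
        self._dedup: Set[Tuple[int, str]] = set()
        self._last_field_id = 999  # assumption: partition field ids start at 1000

    def _next_field_id(self) -> int:
        self._last_field_id += 1
        return self._last_field_id

    def _check_and_add_field(self, source_id: int, name: str, transform: str) -> "SketchBuilder":
        # Reject a target name that is already in use.
        if name in self.partition_names:
            raise ValueError(f"Cannot use partition name more than once: {name}")
        # Reject the same transform applied twice to the same source column.
        key = (source_id, transform)
        if key in self._dedup:
            raise ValueError(f"Redundant partition: {transform}({source_id})")
        # Only allocate a field id once both checks have passed.
        self.fields.append(PartitionField(source_id, self._next_field_id(), name, transform))
        self.partition_names.add(name)
        self._dedup.add(key)
        return self

    # Each transform method now reduces to a single call to the helper.
    def identity(self, source_id: int, name: str) -> "SketchBuilder":
        return self._check_and_add_field(source_id, name, "identity")

    def truncate(self, source_id: int, width: int, name: str) -> "SketchBuilder":
        return self._check_and_add_field(source_id, name, f"truncate[{width}]")
```

With this shape, `SketchBuilder().identity(1, "id").truncate(2, 10, "s_trunc")` chains as before, and a second `truncate(2, 10, ...)` raises `ValueError` before any field id is consumed.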

PartitionSpec.builder_for(schema).truncate("dec", 10).build(),
PartitionSpec.builder_for(schema).truncate("s", 10).build(),
# todo support them
# PartitionSpec.builder_for(schema).add_without_field_id(6, "dec_unsupported", "unsupported").build(),
Contributor

Is this not supported?

Collaborator Author

@jun-he jun-he Jun 14, 2021

Yep, I put TODOs in for the unimplemented transforms, e.g. BucketUUID. from_string also needs to be updated to support unrecognized cases. In Java, it falls back to: return new UnknownTransform<>(type, transform);
So I think it might be better to handle them in a different PR.
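
As a rough illustration of the fallback described above, here is a minimal sketch of a parser that returns an unknown-transform placeholder instead of failing on unrecognized strings. Every name in it (`KnownTransform`, `UnknownTransform`, this `from_string` signature, and the sets of recognized names) is hypothetical and only loosely mirrors the Java behavior quoted in the comment; it is not the library's API.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class KnownTransform:
    # Hypothetical stand-in for a recognized transform.
    name: str
    width: int = 0


@dataclass(frozen=True)
class UnknownTransform:
    # Hypothetical placeholder that keeps the raw string for later inspection.
    source_type: str
    transform: str


_HAS_WIDTH = re.compile(r"(\w+)\[(\d+)\]")
_SIMPLE = {"identity", "year", "month", "day", "hour"}  # assumed set of names
_WITH_WIDTH = {"bucket", "truncate"}  # assumed set of names


def from_string(source_type: str, transform: str) -> Union[KnownTransform, UnknownTransform]:
    # Transforms written as name[width], e.g. "truncate[10]".
    match = _HAS_WIDTH.fullmatch(transform)
    if match and match.group(1).lower() in _WITH_WIDTH:
        return KnownTransform(match.group(1).lower(), int(match.group(2)))
    # Transforms without a parameter, e.g. "day".
    if transform.lower() in _SIMPLE:
        return KnownTransform(transform.lower())
    # Fallback: keep the unrecognized transform rather than raising.
    return UnknownTransform(source_type, transform)
```

Returning a placeholder lets a spec that contains a transform this version does not understand still be loaded, which is the case the commented-out test above would exercise.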

@jun-he
Collaborator Author

jun-he commented Jun 14, 2021

It seems the failures come from the linters:

  linters run-test: commands[2] | mypy --ignore-missing-imports iceberg/
  iceberg/api/transforms/transform_util.py:20: error: Library stubs not installed for "pytz" (or incompatible with Python 3.8)
  iceberg/api/expressions/literals.py:23: error: Library stubs not installed for "pytz" (or incompatible with Python 3.8)
  iceberg/api/expressions/literals.py:23: note: Hint: "python3 -m pip install types-pytz"
  iceberg/api/expressions/literals.py:23: note: (or run "mypy --install-types" to install all missing stub packages)
  iceberg/api/expressions/literals.py:23: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
  iceberg/api/expressions/literals.py:385: error: Library stubs not installed for "dateutil.parser" (or incompatible with Python 3.8)
  iceberg/api/expressions/literals.py:385: note: Hint: "python3 -m pip install types-python-dateutil"
  iceberg/api/expressions/literals.py:385: error: Library stubs not installed for "dateutil" (or incompatible with Python 3.8)

Contributor

@TGooch44 TGooch44 left a comment

Looks good. I'll get my merge access sorted out today, and then I can get this merged. We may need to do something to address the linting issue, although I'm still able to run tox on your branch without getting any linting errors.

@TGooch44 TGooch44 merged commit 4c013a8 into apache:master Jun 17, 2021