[Python] add identity transform #4908
61b9690 to
47820bf
Compare
python/src/iceberg/transforms.py
Outdated
    @_human_string.register(bytes)
    def _(self, value: bytes) -> str:
        if type(self._type) in {FixedType, BinaryType}:
There is no need for an if here. The only way to handle binary is to base64 encode it. It doesn't depend on the type.
👌
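For illustration, a minimal sketch of the suggestion above, assuming the human-readable form of any binary value is simply its base64 text. The helper name `bytes_to_human_string` is hypothetical and does not come from the PR.

```python
import base64


def bytes_to_human_string(value: bytes) -> str:
    """Hypothetical helper: base64-encode binary data, regardless of the Iceberg type."""
    # b64encode returns bytes; decode to get a printable str.
    return base64.b64encode(value).decode("ascii")


print(bytes_to_human_string(b"\x00\x01\x02"))  # AAEC
```

Because the encoding depends only on the Python value being `bytes`, no check against `FixedType` or `BinaryType` is needed.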
python/tests/test_transforms.py
Outdated
    (LongType(), None, "null"),
    (DateType(), 17501, "2017-12-01"),
    (TimeType(), 36775038194, "10:12:55.038194"),
    (TimestamptzType(), 1512151975038194, "2017-12-01T18:12:55.038194Z"),
Timestamptz literals must always include a valid zone offset that matches this regex: [+-]\d\d:\d\d$
👌 Fixed.
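A sketch of one way to produce a string that satisfies the offset requirement above, assuming the value is microseconds from the Unix epoch in UTC. The function name `to_human_timestamptz` and the constant `EPOCH_UTC` are hypothetical, not necessarily what the PR ended up with.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: timestamptz values count microseconds from 1970-01-01T00:00:00 UTC.
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_human_timestamptz(micros: int) -> str:
    """Hypothetical helper: render an ISO-8601 string with an explicit zone offset."""
    # isoformat() on a timezone-aware datetime appends the offset, e.g. "+00:00".
    return (EPOCH_UTC + timedelta(microseconds=micros)).isoformat()


text = to_human_timestamptz(1512151975038194)
print(text)  # 2017-12-01T18:12:55.038194+00:00
assert re.search(r"[+-]\d\d:\d\d$", text) is not None
```

The "Z" suffix used in the outdated test row does not match the regex, whereas "+00:00" does.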
python/src/iceberg/utils/datetime.py
Outdated
def to_human_time(micros_from_midnight: int) -> str:
    """Converts a TimeType value to human string"""
    to_day = EPOCH_TIMESTAMP + timedelta(microseconds=micros_from_midnight)
Time shouldn't use epoch. There's no need to mix dates into this.
Getting a time is actually straightforward:
from datetime import time

def time_from_micros(micros: int) -> time:
    # Split microseconds from midnight into hour, minute, second, and microsecond fields.
    seconds = micros // 1_000_000
    minutes = seconds // 60
    hours = minutes // 60
    return time(hour=hours, minute=minutes % 60, second=seconds % 60, microsecond=micros % 1_000_000)
This should use the code provided here, rather than creating a datetime and converting it to a time.
👌 Updated.
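The conversion suggested in this thread can be checked against the `TimeType` row of the test table (36775038194 microseconds from midnight). This is only a sketch: it restates the reviewer's function with an explicit `return` so that it runs on its own.

```python
from datetime import time


def time_from_micros(micros: int) -> time:
    """Build a datetime.time from microseconds since midnight, without involving dates."""
    seconds = micros // 1_000_000
    minutes = seconds // 60
    hours = minutes // 60
    return time(
        hour=hours,
        minute=minutes % 60,
        second=seconds % 60,
        microsecond=micros % 1_000_000,
    )


# 36775038194 us -> 36775 s -> 612 min 55 s -> 10 h 12 min 55 s, plus 38194 us
print(time_from_micros(36775038194).isoformat())  # 10:12:55.038194
```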
python/src/iceberg/utils/datetime.py
Outdated
def to_human_time(micros_from_midnight: int) -> str:
    """Converts a TimeType value to human string"""
    to_day = EPOCH_TIMESTAMP + timedelta(microseconds=micros_from_midnight)
    return f"{to_day.time()}"
Isn't this just str(to_day.time())?
I think this should also use .isoformat() after constructing a time.
Done.
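As a quick check of the two suggestions in this thread: for a `datetime.time` object, the f-string, `str()`, and `.isoformat()` all produce the same text, so the choice is mostly about making the intended format explicit. The values below are illustrative only.

```python
from datetime import time

t = time(hour=10, minute=12, second=55, microsecond=38194)

# All three spellings yield the same ISO-8601 text for a time object.
assert f"{t}" == str(t) == t.isoformat() == "10:12:55.038194"

# isoformat() drops the fractional part when microsecond == 0 ...
print(time(10, 12, 55).isoformat())  # 10:12:55
# ... unless a fixed precision is requested through the timespec argument.
print(time(10, 12, 55).isoformat(timespec="microseconds"))  # 10:12:55.000000
```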
@jun-he do you have time to update this? If not, we can probably find someone to pick it up.
47820bf to
3ef5cb6
Compare
@rdblue Yep, just addressed the comments accordingly. Can you take a look? Thanks.
        return self._human_string(value)

    @singledispatchmethod
    def _human_string(self, value: Optional[S]) -> str:
Minor: There's no need for these to be in the class, and it's actually faster if they are simple methods.
Do you mean moving them out of the class?
I wonder if we can drop _human_string and keep only _int_to_human_string, like:
    def to_human_string(self, value: Optional[S]) -> str:
        if value is None:
            return "null"
        elif isinstance(value, bytes):
            return _base64encode(value)
        elif isinstance(value, int):
            return self._int_to_human_string(self._type, value)
        else:
            return str(value)
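For comparison, a sketch of what "moving them out of the class" could look like with a module-level `functools.singledispatch` function. The name `human_string` is hypothetical, and the integer case is left out because it depends on the Iceberg type (date, time, timestamp) rather than on the Python value alone.

```python
import base64
from functools import singledispatch
from typing import Any


@singledispatch
def human_string(value: Any) -> str:
    """Hypothetical module-level dispatcher: fall back to str() for unregistered types."""
    return str(value)


@human_string.register(type(None))
def _none_to_human_string(value: None) -> str:
    # None is rendered as the literal text "null", as in the test table.
    return "null"


@human_string.register(bytes)
def _bytes_to_human_string(value: bytes) -> str:
    # Binary data is always shown base64-encoded.
    return base64.b64encode(value).decode("ascii")


print(human_string(None))    # null
print(human_string(b"foo"))  # Zm9v
print(human_string(1.5))     # 1.5
```

Whether this or the plain `isinstance` chain proposed above is preferable is a judgment call; both keep the per-type logic outside of dispatch methods bound to the class.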
python/tox.ini
Outdated
# specific language governing permissions and limitations
# under the License.

[tox]
Since we no longer use tox, can you remove this file?
👌
@jun-he, this is about ready to go. The main problem is that it reintroduces
rdblue left a comment
This looks correct, but needs to remove tox.ini.
8e8d849 to
3b63567
Compare
@rdblue updated the PR and removed tox.ini
Thanks, @jun-he!
To split up PR #3450, this new PR was opened for the identity transform.