Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for duration #79

Merged
merged 1 commit into from
Oct 14, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions .github/workflows/ci.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -15,10 +15,10 @@ jobs:
fail-fast: false
steps:
- name: Checkout sources
uses: actions/checkout@v3
uses: actions/checkout@v4

- name: Setup Python
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: "pip"
Expand All @@ -33,7 +33,7 @@ jobs:
run: tox

- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3
uses: codecov/codecov-action@v4
if: "matrix.python-version == '3.11'"
with:
fail_ci_if_error: true
Expand Down
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -131,3 +131,4 @@ dmypy.json

# Custom
/protarrow_protos
.idea
8 changes: 8 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,14 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

<!-- insertion marker -->
## [v0.7.0](https://github.com/tradewelltech/protarrow/releases/tag/v0.7.0) - 2024-10-14

<small>[Compare with v0.6.0](https://github.com/tradewelltech/protarrow/compare/v0.6.0...v0.7.0)</small>

### Added

- Add support for duration ([c9ab1e2](https://github.com/tradewelltech/protarrow/commit/c9ab1e203712a503dec6bbbd58d340ca37542de5) by aandres).

## [v0.6.0](https://github.com/tradewelltech/protarrow/releases/tag/v0.6.0) - 2024-09-12

<small>[Compare with v0.5.2](https://github.com/tradewelltech/protarrow/compare/v0.5.2...v0.6.0)</small>
Expand Down
2 changes: 1 addition & 1 deletion docs/contributing.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ git-changelog -io CHANGELOG.md

For new release, first prepare the change log, push and merge it.
```shell
git-changelog -bio CHANGELOG.md
git-changelog --bump=auto -io CHANGELOG.md
```

Then tag and push:
Expand Down
1 change: 1 addition & 0 deletions docs/types.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@
| google.protobuf.UInt64Value | uint64 | |
| google.type.Date | date32() | |
| google.type.TimeOfDay | **time64**/time32 | Unit and type are configurable |
| google.type.Duration | duration("ns") | Unit is configurable |

## Nullability

Expand Down
3 changes: 2 additions & 1 deletion docs/usage.md
Original file line number Diff line number Diff line change
Expand Up @@ -57,13 +57,14 @@ my_proto_1 = message_extractor.read_table_row(table, 1)

## Customize arrow type

The arrow type for `Enum`, `Timestamp` and `TimeOfDay` can be configured:
The arrow type for `Enum`, `Timestamp` and `TimeOfDay` and `Duration` can be configured:

```python
config = protarrow.ProtarrowConfig(
enum_type=pa.int32(),
timestamp_type=pa.timestamp("ms", "America/New_York"),
time_of_day_type=pa.time32("ms"),
duration_type=pa.duration("s"),
)
record_batch = protarrow.messages_to_record_batch(my_protos, MyProto, config)
```
Expand Down
32 changes: 32 additions & 0 deletions protarrow/arrow_to_proto.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@

import pyarrow as pa
from google.protobuf.descriptor import Descriptor, EnumDescriptor, FieldDescriptor
from google.protobuf.duration_pb2 import Duration
from google.protobuf.internal.containers import MessageMap
from google.protobuf.message import Message
from google.protobuf.timestamp_pb2 import Timestamp
Expand Down Expand Up @@ -75,6 +76,29 @@ def _time_64_ns_scalar_to_proto(scalar: pa.Time64Scalar) -> TimeOfDay:
)


def _duration_ns_scalar_to_proto(scalar: pa.DurationScalar) -> Duration:
total_nanos = scalar.value
return Duration(
nanos=total_nanos % 1_000_000_000, seconds=(total_nanos // 1_000_000_000)
)


def _duration_us_scalar_to_proto(scalar: pa.DurationScalar) -> Duration:
total_us = scalar.value
return Duration(
nanos=(total_us % 1_000_000) * 1_000, seconds=(total_us // 1_000_000)
)


def _duration_ms_scalar_to_proto(scalar: pa.DurationScalar) -> Duration:
total_us = scalar.value
return Duration(nanos=(total_us % 1_000) * 1_000_000, seconds=(total_us // 1_000))


def _duration_s_scalar_to_proto(scalar: pa.DurationScalar) -> Duration:
return Duration(seconds=scalar.value)


def _time_64_us_scalar_to_proto(scalar: pa.Time64Scalar) -> TimeOfDay:
total_us = scalar.value
return TimeOfDay(
Expand Down Expand Up @@ -112,6 +136,13 @@ def _time_32_s_scalar_to_proto(scalar: pa.Time32Scalar) -> TimeOfDay:
pa.time32("s"): _time_32_s_scalar_to_proto,
}

DURATION_CONVERTERS = {
"ns": _duration_ns_scalar_to_proto,
"us": _duration_us_scalar_to_proto,
"ms": _duration_ms_scalar_to_proto,
"s": _duration_s_scalar_to_proto,
}

TIMESTAMP_CONVERTERS = {
"ns": _timestamp_ns_scalar_to_proto,
"us": _timestamp_us_scalar_to_proto,
Expand All @@ -123,6 +154,7 @@ def _time_32_s_scalar_to_proto(scalar: pa.Time32Scalar) -> TimeOfDay:
Timestamp.DESCRIPTOR: lambda data_type: TIMESTAMP_CONVERTERS[data_type.unit],
Date.DESCRIPTOR: lambda _: _date_scalar_to_proto,
TimeOfDay.DESCRIPTOR: TIME_OF_DAY_CONVERTERS.__getitem__,
Duration.DESCRIPTOR: lambda data_type: DURATION_CONVERTERS[data_type.unit],
}

NULLABLE_TYPES = (
Expand Down
3 changes: 3 additions & 0 deletions protarrow/cast_to_proto.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
import pyarrow as pa
import pyarrow.compute as pc
from google.protobuf.descriptor import Descriptor, FieldDescriptor
from google.protobuf.duration_pb2 import Duration
from google.protobuf.message import Message
from google.protobuf.timestamp_pb2 import Timestamp
from google.type.timeofday_pb2 import TimeOfDay
Expand Down Expand Up @@ -52,6 +53,8 @@ def _cast_flat_array(
return array.cast(config.time_of_day_type)
elif field_descriptor.message_type == Timestamp.DESCRIPTOR:
return array.cast(config.timestamp_type)
elif field_descriptor.message_type == Duration.DESCRIPTOR:
return array.cast(config.duration_type)
elif field_descriptor.message_type in _PROTO_DESCRIPTOR_TO_PYARROW:
return array.cast(
_PROTO_DESCRIPTOR_TO_PYARROW[field_descriptor.message_type]
Expand Down
1 change: 1 addition & 0 deletions protarrow/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ class ProtarrowConfig:
enum_type: pa.DataType = pa.int32()
timestamp_type: pa.TimestampType = pa.timestamp("ns", "UTC")
time_of_day_type: Union[pa.Time64Type, pa.Time32Type] = pa.time64("ns")
duration_type: pa.DurationType = pa.duration("ns")
list_nullable: bool = False
map_nullable: bool = False
list_value_nullable: bool = False
Expand Down
12 changes: 12 additions & 0 deletions protarrow/proto_to_arrow.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
import pyarrow.compute as pc
from google.protobuf.descriptor import Descriptor, EnumDescriptor, FieldDescriptor
from google.protobuf.descriptor_pb2 import FieldDescriptorProto
from google.protobuf.duration_pb2 import Duration
from google.protobuf.internal.containers import MessageMap, RepeatedScalarFieldContainer
from google.protobuf.message import Message
from google.protobuf.timestamp_pb2 import Timestamp
Expand Down Expand Up @@ -127,6 +128,13 @@ def _proto_date_to_py_date(proto_date: Date) -> datetime.date:
"ns": _time_of_day_to_nanos,
}

_DURATION_CONVERTERS = {
"s": Duration.ToSeconds,
"ms": Duration.ToMilliseconds,
"us": Duration.ToMicroseconds,
"ns": Duration.ToNanoseconds,
}


@dataclasses.dataclass(frozen=True)
class FlattenedIterable(collections.abc.Iterable):
Expand Down Expand Up @@ -286,6 +294,8 @@ def field_descriptor_to_data_type(
return config.timestamp_type
elif field_descriptor.message_type == TimeOfDay.DESCRIPTOR:
return config.time_of_day_type
elif field_descriptor.message_type == Duration.DESCRIPTOR:
return config.duration_type
elif field_descriptor.type == FieldDescriptorProto.TYPE_MESSAGE:
try:
return _PROTO_DESCRIPTOR_TO_PYARROW[field_descriptor.message_type]
Expand Down Expand Up @@ -314,6 +324,8 @@ def _get_converter(
) -> Optional[Callable[[Any], Any]]:
if field_descriptor.message_type == Timestamp.DESCRIPTOR:
return _TIMESTAMP_CONVERTERS[config.timestamp_type.unit]
elif field_descriptor.message_type == Duration.DESCRIPTOR:
return _DURATION_CONVERTERS[config.duration_type.unit]
elif field_descriptor.message_type == TimeOfDay.DESCRIPTOR:
return _TIME_OF_DAY_CONVERTERS[config.time_of_day_type.unit]
elif field_descriptor.type == FieldDescriptorProto.TYPE_MESSAGE:
Expand Down
Loading