# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# substrait::{ExtensionTypeVariation, ExtensionType}s
# for wrapping types which appear in the Arrow type system but
# are not first-class in Substrait. These include:
# - null
# - unsigned integers
# - half-precision floating point numbers
# - 32-bit times and 64-bit dates
# - timestamps with units other than microseconds
# - timestamps with timezones other than UTC
# - 256-bit decimals
# - sparse and dense unions
# - dictionary encoded types
# - durations
# - string and binary with 64 bit offsets
# - list with 64-bit offsets
# - interval<months: i32>
# - interval<days: i32, millis: i32>
# - interval<months: i32, days: i32, nanos: i64>
# - arrow::ExtensionTypes
#
# These types fall into several categories of behavior:

# Certain Arrow data types are, from Substrait's point of view, encodings.
# These include dictionary, the view types (e.g. binary view, list view),
# and run-end encoded (REE).
#
# These types are not logically distinct from the type they are encoding.
# Specifically, the types meet the following criteria:
#  * There is no value in the decoded type that cannot be represented
#    as a value in the encoded type and vice versa.
#  * Functions have the same meaning when applied to the encoded type
#    as when applied to the decoded type.
#
# Note: if two types have a different range (e.g. string and large_string) then
# they do not satisfy the above criteria and are not encodings.
#
# These types will never have a Substrait equivalent. From Substrait's point
# of view these are execution details.

# The following types are encodings:

# binary_view
# list_view
# dictionary
# ree

# Arrow-cpp's Substrait serde does not yet handle parameterized UDTs. This means
# the following types are not yet supported but may be supported in the future.
# We define them below in case other implementations support them in the meantime.

# decimal256
# large_list
# fixed_size_list
# duration

# Other types are not encodings, but are not first-class in Substrait. These
# types are often similar to existing Substrait types but define a different range
# of values. For example, unsigned integer types are very similar to their integer
# counterparts, but have a different range of values. These types are defined here
# as extension types.
#
# A full description of the types, along with their specified range, can be found
# in Arrow's Schema.fbs.
#
# Consumers should take care when supporting the types below. If Substrait later
# adds first-class support for any of them, consumers must keep accepting the
# extension type names as aliases to remain backwards compatible.

types:
  - name: "null"
    structure: {}
  - name: interval_month
    structure:
      months: i32
  - name: interval_day_milli
    structure:
      days: i32
      millis: i32
  - name: interval_month_day_nano
    structure:
      months: i32
      days: i32
      nanos: i64
  # All unsigned integer literals are encoded as user defined literals with
  # a google.protobuf.UInt64Value message.
  - name: u8
    structure: {}
  - name: u16
    structure: {}
  - name: u32
    structure: {}
  - name: u64
    structure: {}
  # fp16 literals are encoded as user defined literals with
  # a google.protobuf.UInt32Value message where the lower 16 bits are
  # the fp16 value.
  - name: fp16
    structure: {}
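  # Worked example for the fp16 literal encoding above (illustrative only,
  # not taken from the Arrow sources): the half-precision value 1.0 has the
  # IEEE 754 binary16 bit pattern 0x3C00, so under this scheme it would be
  # carried as a UInt32Value holding 15360 (0x00003C00). Leaving the upper
  # 16 bits as zero is an assumption; the description above only fixes the
  # lower 16 bits.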
  # 64-bit integers are big. Even though date64 stores ms and not days it
  # can still represent about 50x more dates than date32. Since it has a
  # different range of values, it is an extension type.
  #
  # date64 literals are encoded as user defined literals with
  # a google.protobuf.Int64Value message.
  - name: date_millis
    structure: {}
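  # Rough arithmetic behind the "about 50x" figure for date_millis above
  # (a back-of-the-envelope check, not part of the type definition):
  #   date32 range: 2^32 days                    ~= 4.29e9 days
  #   date64 range: 2^64 ms / 86,400,000 ms/day  ~= 2.13e11 days
  #   ratio:        2.13e11 / 4.29e9             ~= 50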
  # time literals are encoded as user defined literals with
  # a google.protobuf.Int32Value message (for time_seconds/time_millis)
  # or a google.protobuf.Int64Value message (for time_nanos).
  - name: time_seconds
    structure: {}
  - name: time_millis
    structure: {}
  - name: time_nanos
    structure: {}
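  # Illustrative values for the time literal encoding above, assuming each
  # type counts its unit since midnight as Arrow's time32/time64 types do
  # (the mapping of values is an assumption, not stated in this file):
  #   12:34:56 as time_seconds -> Int32Value 45296
  #   12:34:56 as time_millis  -> Int32Value 45296000
  #   12:34:56 as time_nanos   -> Int64Value 45296000000000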
  # Large string literals are encoded using a
  # google.protobuf.StringValue message.
  - name: large_string
    structure: {}
  # Large binary literals are encoded using a
  # google.protobuf.BytesValue message.
  - name: large_binary
    structure: {}
  # We cannot generate these today because they are parameterized UDTs and
  # substrait-cpp does not yet support parameterized UDTs.
  - name: decimal256
    structure: {}
    parameters:
      - name: precision
        type: integer
        min: 0
        max: 76
      - name: scale
        type: integer
        min: 0
        max: 76
  - name: large_list
    structure: {}
    parameters:
      - name: value_type
        type: dataType
  - name: fixed_size_list
    structure: {}
    parameters:
      - name: value_type
        type: dataType
      - name: dimension
        type: integer
        min: 0
  - name: duration
    structure: {}
    parameters:
      - name: unit
        type: string