[#8946] improvement(lance): supports more dataTypes for lance table creation#8947
Conversation
| } | ||
| // since the table metadata will load from Gravitino storage directly, we don't need to | ||
| // implement this method for now. | ||
| throw new UnsupportedOperationException("toGravitino is not implemented yet."); |
There was a problem hiding this comment.
Do we need such a conversion when request is sending from Lance rest server to Gravitino?
There was a problem hiding this comment.
Yes, I will implement this method when I refine the Lance REST server type conversion.
| which allows you to define any Arrow data type by providing the JSON string of an Arrow `Field`. | ||
|
|
||
| The JSON string must conform to the Apache Arrow `Field` [specification](https://github.com/apache/arrow-java/blob/ed81e5981a2bee40584b3a411ed755cb4cc5b91f/vector/src/main/java/org/apache/arrow/vector/types/pojo/Field.java#L80C1-L86C68), | ||
| including details such as the field name, data type, and nullability. For example, you can define a `LargeUtf8` type field using its JSON representation. |
There was a problem hiding this comment.
Can you list the types that we don't support, and need to use External to represent?
|
@yuqi1129 can you please take a look? |
| public Field toArrowField(String name, Type type, boolean nullable) { | ||
| switch (type.name()) { | ||
| case LIST: | ||
| Types.ListType listType = (Types.ListType) type; |
There was a problem hiding this comment.
There is a type named FixedSizeList in Arrow, which Lance commonly uses. Do you handle it? This is how lance stores vector data.
There was a problem hiding this comment.
yes, plz see the doc in this PR
| | `Large Utf8` | `External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largeutf8\"},\"children\":[]}")` | | ||
| | `Large Binary` | `External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largebinary\"},\"children\":[]}")` | | ||
| | `Large List` | `External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largelist\"},\"children\":[{\"name\":\"element\",\"nullable\":true,\"type\":{\"name\":\"int\", \"bitWidth\":32, \"isSigned\": true},\"children\":[]}]}")` | | ||
| | `Fixed-Size List` | `External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"fixedsizelist\", \"listSize\":10},\"children\":[{\"name\":\"element\",\"nullable\":true,\"type\":{\"name\":\"int\", \"bitWidth\":32, \"isSigned\": true},\"children\":[]}]}")` | |
There was a problem hiding this comment.
Why is fixedsizelist not a camel-case string?
There was a problem hiding this comment.
It's the JSON spec from arrow lib
yuqi1129
left a comment
There was a problem hiding this comment.
Just a minor one, others LGTM
| name, | ||
| listField, | ||
| Lists.newArrayList( | ||
| toArrowField("element", listType.elementType(), listType.elementNullable()))); |
There was a problem hiding this comment.
Is the column name of subtype a constant value element or a random name?
There was a problem hiding this comment.
It's all okay, it didn't take effect in the actual parsing.
…able creation (apache#8947) ### What changes were proposed in this pull request? supports more dataTypes for lance table creation ### Why are the changes needed? Fix: apache#8946 ### Does this PR introduce _any_ user-facing change? yes, more column data types supports ### How was this patch tested? tests added
What changes were proposed in this pull request?
supports more dataTypes for lance table creation
Why are the changes needed?
Fix: #8946
Does this PR introduce any user-facing change?
yes, more column data types supports
How was this patch tested?
tests added