[#8946] improvement(lance): supports more dataTypes for lance table creation #8947

Merged
mchades merged 2 commits into apache:branch-lance-namepspace-dev from mchades:lance-type on Oct 29, 2025

@mchades mchades commented Oct 28, 2025

What changes were proposed in this pull request?

Support more data types for Lance table creation.

Why are the changes needed?

Fix: #8946

Does this PR introduce any user-facing change?

Yes, more column data types are supported.

How was this patch tested?

Tests added.

@mchades mchades requested review from jerryshao and yuqi1129 October 28, 2025 15:56
}
// since the table metadata will load from Gravitino storage directly, we don't need to
// implement this method for now.
throw new UnsupportedOperationException("toGravitino is not implemented yet.");
Contributor

Do we need such a conversion when a request is sent from the Lance REST server to Gravitino?

Contributor Author

Yes, I will implement this method when I refine the Lance REST server type conversion.

Comment thread docs/generic-lakehouse-catalog.md Outdated
which allows you to define any Arrow data type by providing the JSON string of an Arrow `Field`.

The JSON string must conform to the Apache Arrow `Field` [specification](https://github.com/apache/arrow-java/blob/ed81e5981a2bee40584b3a411ed755cb4cc5b91f/vector/src/main/java/org/apache/arrow/vector/types/pojo/Field.java#L80C1-L86C68),
including details such as the field name, data type, and nullability. For example, you can define a `LargeUtf8` type field using its JSON representation.
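As a rough illustration of the JSON described above, the following sketch assembles the string for a childless field such as `LargeUtf8`. The class `ExternalTypeJsonSketch` and the method `simpleFieldJson` are hypothetical names introduced here, not part of Gravitino or Arrow, and the sketch assumes the field name contains no characters that need JSON escaping. The layout mirrors the `Large Utf8` row of the table quoted later in this conversation.

```java
public class ExternalTypeJsonSketch {

  // Hypothetical helper: builds the JSON for an Arrow Field that has no children
  // and whose type is identified by its name alone (for example "largeutf8").
  // Assumption: fieldName and arrowTypeName need no JSON escaping.
  public static String simpleFieldJson(String fieldName, boolean nullable, String arrowTypeName) {
    return "{\"name\":\"" + fieldName + "\","
        + "\"nullable\":" + nullable + ","
        + "\"type\":{\"name\":\"" + arrowTypeName + "\"},"
        + "\"children\":[]}";
  }

  public static void main(String[] args) {
    // Prints the JSON that would be passed to the External type for a LargeUtf8 column.
    System.out.println(simpleFieldJson("col_name", true, "largeutf8"));
  }
}
```

In practice it is probably safer to serialize a real Arrow `Field` object than to concatenate strings; the sketch only shows the shape of the payload.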
Contributor

Can you list the types that we don't support natively and that must be represented with External?

Contributor Author

Added.

@mchades mchades requested a review from jerryshao October 29, 2025 09:47
@jerryshao
Contributor

@yuqi1129 can you please take a look?

public Field toArrowField(String name, Type type, boolean nullable) {
  switch (type.name()) {
    case LIST:
      Types.ListType listType = (Types.ListType) type;
Contributor

There is a type named FixedSizeList in Arrow, which Lance commonly uses. Do you handle it? This is how Lance stores vector data.

Contributor Author

Yes, please see the doc in this PR.

| `Large Utf8` | `External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largeutf8\"},\"children\":[]}")` |
| `Large Binary` | `External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largebinary\"},\"children\":[]}")` |
| `Large List` | `External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"largelist\"},\"children\":[{\"name\":\"element\",\"nullable\":true,\"type\":{\"name\":\"int\", \"bitWidth\":32, \"isSigned\": true},\"children\":[]}]}")` |
| `Fixed-Size List` | `External("{\"name\":\"col_name\",\"nullable\":true,\"type\":{\"name\":\"fixedsizelist\", \"listSize\":10},\"children\":[{\"name\":\"element\",\"nullable\":true,\"type\":{\"name\":\"int\", \"bitWidth\":32, \"isSigned\": true},\"children\":[]}]}")` |
Contributor

Why is fixedsizelist not a camel-case string?

Contributor Author

@mchades mchades Oct 29, 2025

It's the JSON spec from the Arrow library.
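To make the `Fixed-Size List` row discussed in this thread easier to read, here is a minimal sketch that assembles the same JSON piece by piece. `FixedSizeListJsonSketch`, `int32ElementJson`, and `fixedSizeListJson` are hypothetical names used only for illustration; the lowercase `fixedsizelist` type name and the `listSize` key are copied from the table row, and the sketch assumes field names need no JSON escaping.

```java
public class FixedSizeListJsonSketch {

  // Child field taken from the table row: a nullable 32-bit signed int named "element".
  public static String int32ElementJson() {
    return "{\"name\":\"element\",\"nullable\":true,"
        + "\"type\":{\"name\":\"int\",\"bitWidth\":32,\"isSigned\":true},"
        + "\"children\":[]}";
  }

  // Hypothetical helper: wraps one child field JSON in a fixed-size list field.
  // Assumption: fieldName needs no JSON escaping and childJson is already valid JSON.
  public static String fixedSizeListJson(
      String fieldName, boolean nullable, int listSize, String childJson) {
    return "{\"name\":\"" + fieldName + "\","
        + "\"nullable\":" + nullable + ","
        + "\"type\":{\"name\":\"fixedsizelist\",\"listSize\":" + listSize + "},"
        + "\"children\":[" + childJson + "]}";
  }

  public static void main(String[] args) {
    // A list of exactly 10 int32 values per row, as in the documented example.
    System.out.println(fixedSizeListJson("col_name", true, 10, int32ElementJson()));
  }
}
```

The resulting string is what would be wrapped in `External(...)`; apart from whitespace it matches the documented row.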

Contributor

@yuqi1129 yuqi1129 left a comment

Just one minor comment; everything else LGTM.

name,
listField,
Lists.newArrayList(
toArrowField("element", listType.elementType(), listType.elementNullable())));
Contributor

Is the child field's name the constant value `element`, or can it be an arbitrary name?

Contributor Author

Either is fine; the name has no effect in the actual parsing.

Contributor

I see.
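For readers following this thread, the recursive shape of the list conversion can be sketched without the Gravitino or Arrow classes. Everything below (`ListConversionSketch`, `SimpleType`, `SimpleField`, `toField`) is a hypothetical stand-in and not the code from this PR; it only shows a list type producing exactly one child field with the fixed name `element`, including for nested lists.

```java
import java.util.ArrayList;
import java.util.List;

public class ListConversionSketch {

  /** Hypothetical source type: a leaf (elementType == null) or a list of one element type. */
  public static final class SimpleType {
    public final String name;
    public final SimpleType elementType;
    public final boolean elementNullable;

    private SimpleType(String name, SimpleType elementType, boolean elementNullable) {
      this.name = name;
      this.elementType = elementType;
      this.elementNullable = elementNullable;
    }

    public static SimpleType leaf(String name) {
      return new SimpleType(name, null, false);
    }

    public static SimpleType listOf(SimpleType elementType, boolean elementNullable) {
      return new SimpleType("list", elementType, elementNullable);
    }
  }

  /** Hypothetical target field: a name, a type name, nullability, and child fields. */
  public static final class SimpleField {
    public final String name;
    public final String typeName;
    public final boolean nullable;
    public final List<SimpleField> children;

    SimpleField(String name, String typeName, boolean nullable, List<SimpleField> children) {
      this.name = name;
      this.typeName = typeName;
      this.nullable = nullable;
      this.children = children;
    }
  }

  // A list type yields one child named "element"; the recursion handles nested lists.
  public static SimpleField toField(String name, SimpleType type, boolean nullable) {
    List<SimpleField> children = new ArrayList<>();
    if (type.elementType != null) {
      children.add(toField("element", type.elementType, type.elementNullable));
    }
    return new SimpleField(name, type.name, nullable, children);
  }

  public static void main(String[] args) {
    SimpleType nested = SimpleType.listOf(SimpleType.listOf(SimpleType.leaf("int"), false), true);
    SimpleField field = toField("vectors", nested, false);
    System.out.println(field.children.get(0).children.get(0).typeName);
  }
}
```

Since the reply above says the child name has no effect in parsing, the constant `element` in this sketch is a convention rather than a requirement.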

@mchades mchades merged commit bc1b77a into apache:branch-lance-namepspace-dev Oct 29, 2025
26 checks passed
@mchades mchades deleted the lance-type branch October 30, 2025 02:38
jerryshao pushed a commit to jerryshao/gravitino that referenced this pull request Nov 11, 2025
…able creation (apache#8947)