Add support for Substrait List/EmptyList literals #10615

Merged 6 commits on May 22, 2024

Blizzara
Contributor

Which issue does this PR close?

Closes #.

Extracted from #10531. It is not a necessary part of that PR, but it is somewhat related.

Rationale for this change

What changes are included in this PR?

Adds support for converting from DataFusion List/LargeList ScalarValues into Substrait List/EmptyList Literals and back

Are these changes tested?

Adds a round-trip unit test
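
To make the round trip concrete, here is a minimal, self-contained sketch of the idea. It does not use the real DataFusion or Substrait types: `Scalar`, `Literal`, `ElemType`, `to_literal`, and `from_literal` are hypothetical stand-ins, and nested lists are left out on purpose. The point it illustrates is that a non-empty list can infer its element type from its values, while an empty list has to carry the type explicitly, which is why a separate EmptyList literal exists.

```rust
// Toy stand-ins for the real DataFusion and Substrait types.
// Every name in this block is hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum ElemType {
    I32,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    I32(i32),
    Utf8(String),
    // A list scalar carries its element type so that empty lists stay typed.
    List { element_type: ElemType, values: Vec<Scalar> },
}

#[derive(Debug, Clone, PartialEq)]
enum Literal {
    I32(i32),
    Str(String),
    // Non-empty list: the element type is implied by the values.
    List(Vec<Literal>),
    // Empty list: the element type must be spelled out.
    EmptyList(ElemType),
}

// Producer direction: scalar value to literal.
fn to_literal(value: &Scalar) -> Literal {
    match value {
        Scalar::I32(v) => Literal::I32(*v),
        Scalar::Utf8(s) => Literal::Str(s.clone()),
        Scalar::List { element_type, values } if values.is_empty() => {
            Literal::EmptyList(element_type.clone())
        }
        Scalar::List { values, .. } => Literal::List(values.iter().map(to_literal).collect()),
    }
}

// Element type of a non-list scalar; nested lists are out of scope here.
fn scalar_type(value: &Scalar) -> Result<ElemType, String> {
    match value {
        Scalar::I32(_) => Ok(ElemType::I32),
        Scalar::Utf8(_) => Ok(ElemType::Utf8),
        Scalar::List { .. } => Err("nested lists are not handled in this sketch".to_string()),
    }
}

// Consumer direction: literal back to scalar value.
fn from_literal(lit: &Literal) -> Result<Scalar, String> {
    match lit {
        Literal::I32(v) => Ok(Scalar::I32(*v)),
        Literal::Str(s) => Ok(Scalar::Utf8(s.clone())),
        Literal::EmptyList(element_type) => Ok(Scalar::List {
            element_type: element_type.clone(),
            values: Vec::new(),
        }),
        Literal::List(items) => {
            let values = items.iter().map(from_literal).collect::<Result<Vec<_>, _>>()?;
            // The element type is taken from the first value.
            let first = values
                .first()
                .ok_or_else(|| "List literal must not be empty".to_string())?;
            let element_type = scalar_type(first)?;
            Ok(Scalar::List { element_type, values })
        }
    }
}
```

A round-trip check then amounts to asserting that `from_literal(&to_literal(&value))` returns the original `value`, for both a populated and an empty list.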

Are there any user-facing changes?

More literal types are now supported, but I don't think the Substrait support status is currently covered by the documentation?

@@ -1138,7 +1139,7 @@ fn from_substrait_type(dt: &substrait::proto::Type) -> Result<DataType> {
from_substrait_type(list.r#type.as_ref().ok_or_else(|| {
substrait_datafusion_err!("List type must have inner type")
})?)?;
-let field = Arc::new(Field::new("list_item", inner_type, true));
+let field = Arc::new(Field::new_list_field(inner_type, true));
Contributor Author

This is a breaking change in the sense that the new field name is just "item", to align with the Arrow default.
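
A short sketch of why renaming the inner field can be breaking: in Arrow, a list type wraps a named child field, so two list types with the same element type but different child names do not compare equal. The `Field` and `DataType` definitions here are simplified stand-ins written for illustration, not the arrow-rs types, and `list_of` is a hypothetical helper.

```rust
// Simplified stand-ins for Arrow's Field and DataType (not the real arrow-rs types).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    // A list type is defined by its child field, name included.
    List(Box<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

// Hypothetical helper: build a nullable list type with the given child field name.
fn list_of(item_name: &str, inner: DataType) -> DataType {
    DataType::List(Box::new(Field {
        name: item_name.to_string(),
        data_type: inner,
        nullable: true,
    }))
}

// Compare the list type produced before the change with the one produced after it.
fn rename_changes_type() -> bool {
    let before = list_of("list_item", DataType::Int32);
    let after = list_of("item", DataType::Int32);
    before != after
}
```

Under this model, any consumer that compares schemas strictly would see the type before and after the change as different, even though the element type is unchanged.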

);
match l.type_variation_reference {
DEFAULT_CONTAINER_TYPE_REF => Ok(ScalarValue::List(Arc::new(
GenericListArray::new_null(field.into(), 1),
Contributor Author

Is this the correct way to create null lists, or is there something better? The list-of-lists structure that ScalarValue::List uses is a bit confusing to me.

Member

I think it is correct.
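
For readers puzzled by the same list-of-lists structure, here is a toy model of it. The assumption is that a list scalar is stored as a list array with exactly one row, so a null list is a one-row array whose single row is marked invalid, which is different from a one-row array holding an empty list. `ToyListArray` and its methods are hypothetical and only loosely follow Arrow's offsets-plus-validity layout.

```rust
// A toy list array of i32 values, loosely modeled on Arrow's layout.
// This is not the arrow-rs API; all names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
struct ToyListArray {
    values: Vec<i32>,    // flattened child values of all rows
    offsets: Vec<usize>, // row i spans values[offsets[i]..offsets[i + 1]]
    validity: Vec<bool>, // false marks a null row
}

impl ToyListArray {
    /// `len` rows, all of them null.
    fn new_null(len: usize) -> Self {
        Self {
            values: Vec::new(),
            offsets: vec![0; len + 1],
            validity: vec![false; len],
        }
    }

    /// A single valid row holding `items` (possibly empty, but not null).
    fn single_row(items: Vec<i32>) -> Self {
        let end = items.len();
        Self {
            values: items,
            offsets: vec![0, end],
            validity: vec![true],
        }
    }

    /// Number of rows in the array.
    fn len(&self) -> usize {
        self.validity.len()
    }

    /// `None` for a null row, otherwise the row's slice of child values.
    fn row(&self, i: usize) -> Option<&[i32]> {
        if self.validity[i] {
            Some(&self.values[self.offsets[i]..self.offsets[i + 1]])
        } else {
            None
        }
    }
}
```

In this model, `ToyListArray::new_null(1)` plays the role of the null list scalar: it has one row, and `row(0)` yields `None`, whereas `ToyListArray::single_row(vec![])` has one valid row of length zero.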

@Blizzara Blizzara marked this pull request as ready for review May 22, 2024 12:38
@Blizzara
Contributor Author

@jonahgao here's the first part of the split: adding support for List types

.iter()
.map(|el| from_substrait_literal(el))
.collect::<Result<Vec<_>>>()?;
let element_type = elements[0].data_type();
Member

Should we check if elements are empty and report an error? The literal input might come from systems other than DataFusion, and they might not be properly implemented.

Contributor Author

Yup, done: cdc525c
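
The concern is that indexing `elements[0]` panics when a producer sends a List literal with no elements. A minimal sketch of the defensive pattern follows, using hypothetical `Value` and `ValueType` types and an invented error message; it is not the code from the referenced commit.

```rust
// Hypothetical, simplified types; the real code works on DataFusion scalar values.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Text(String),
}

#[derive(Debug, Clone, PartialEq)]
enum ValueType {
    Int,
    Text,
}

impl Value {
    fn data_type(&self) -> ValueType {
        match self {
            Value::Int(_) => ValueType::Int,
            Value::Text(_) => ValueType::Text,
        }
    }
}

/// Returns the type of the first element, or an error for an empty slice,
/// instead of panicking the way `elements[0]` would.
fn list_element_type(elements: &[Value]) -> Result<ValueType, String> {
    elements
        .first()
        .map(Value::data_type)
        .ok_or_else(|| "List literal has no elements; use EmptyList instead".to_string())
}
```

Using `first()` with `ok_or_else` keeps malformed input from other systems on the error path, so the caller gets a plan-conversion error rather than a crash.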

}
}
Some(LiteralType::EmptyList(l)) => {
let element_type = from_substrait_type(l.r#type.clone().unwrap().as_ref())?;
Member

Removing the unwrap would make this more robust. 🤔

Contributor Author

Hm, without the type specified we don't know what it should be. I guess we could default to NullType (which I think is what DataFusion does if you just do a "SELECT [] FROM .."). Do you think that'd make sense?

I feel like Substrait probably intends this field to always exist, though I'm not sure. For example, the Java library has it as required: https://github.com/substrait-io/substrait-java/blob/79decd20e85d6a1a5623890042ebcf1415cf784a/core/src/main/java/io/substrait/expression/Expression.java#L451

Member

I think we can return an error like "invalid parameter", but it may not be necessary to do so. Let's keep it as it is for now until someone requests this behavior.
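
For reference, the two alternatives floated in this thread could look roughly like the sketch below. The thread settled on leaving the `unwrap` in place for now, so neither variant is what was merged. `ToyEmptyList`, `ToyType`, and both functions are hypothetical stand-ins for a protobuf message with an optional type field.

```rust
// Hypothetical stand-ins; not the generated Substrait protobuf types.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    I32,
    Null,
}

// Models a message whose element type may be absent on the wire.
#[derive(Debug, Clone, Default)]
struct ToyEmptyList {
    element_type: Option<ToyType>,
}

/// Strict variant: a missing type is reported as an error rather than panicking.
fn element_type_strict(l: &ToyEmptyList) -> Result<ToyType, String> {
    l.element_type
        .clone()
        .ok_or_else(|| "EmptyList literal is missing its element type".to_string())
}

/// Lenient variant: fall back to a null type when no type is given.
fn element_type_or_null(l: &ToyEmptyList) -> ToyType {
    l.element_type.clone().unwrap_or(ToyType::Null)
}
```

The strict variant matches the "invalid parameter" error idea, while the lenient one matches the suggestion to default to a null type; which is preferable depends on whether the type field is meant to be mandatory.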

Member

@jonahgao jonahgao left a comment

Looks good to me. Thank you for your contribution, @Blizzara.

@alamb alamb merged commit b2a31b9 into apache:main May 22, 2024
23 checks passed
@Blizzara Blizzara deleted the avo/substrait-literal-lists branch May 22, 2024 20:19
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* Add support for Substrait List/EmptyList literals

Adds support for converting from DataFusion List/LargeList ScalarValues into Substrait List/EmptyList Literals and back

* cleanup

* fix test, add literal roundtrip tests for lists, and fix creating null large lists

* add unit testing for type roundtrips

* fix clippy

* better error if a substrait literal list is empty