Skip to content

Conversation

@nastra
Copy link
Contributor

@nastra nastra commented Jan 5, 2024

This backports #9340 to Spark 3.4

@nastra nastra force-pushed the spark34-view-read-support branch from 7b92c36 to 45e2f84 Compare January 5, 2024 12:37
@nastra nastra added this to the Iceberg 1.5.0 milestone Jan 5, 2024
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nastra !

@amogh-jahagirdar
Copy link
Contributor

Since this was a clean backport. I'll go ahead and merge.

@amogh-jahagirdar amogh-jahagirdar merged commit 2101ac2 into apache:main Jan 5, 2024
@nastra nastra deleted the spark34-view-read-support branch January 5, 2024 16:23
geruh pushed a commit to geruh/iceberg that referenced this pull request Jan 26, 2024
adnanhemani pushed a commit to adnanhemani/iceberg that referenced this pull request Jan 30, 2024
Comment on lines +128 to +131
// there's no explicit view defined for spark, so it will fall back to the defined trino view
assertThat(sql("SELECT * FROM %s", viewName))
.hasSize(10)
.containsExactlyInAnyOrderElementsOf(expected);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we want to make this the default behavior? There are views that are equally parsable by both Trino and Spark but they mean different things. So just using a Trino view "as is" in Spark may be incorrect. Consider the case when the view accesses array elements, where array indexes start from 0 in Spark and 1 in Trino. FYI @rdblue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might want to consider having a strictness flag of some sort, that by default would only allow reading/modifying views that have been created by Spark. In a lenient mode this could then also fallback reading views that have been created by a different engine. @wmoustafa thoughts on that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not prefer it, but yes that is the least we could do, along with explicitly stating the caveats like array indexes, null handling, etc.

devangjhabakh pushed a commit to cdouglas/iceberg that referenced this pull request Apr 22, 2024
@nastra nastra self-assigned this Nov 25, 2024
zhongyujiang pushed a commit to zhongyujiang/iceberg that referenced this pull request Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants