Skip to content

Conversation

@kbendick
Copy link
Contributor

@kbendick kbendick commented Oct 9, 2021

Replaces a somewhat complicated set of calls to Preconditions.checkState in the Spark 3 MicroBatchStream shouldProcess method with a simpler switch statement for readability.

Also places the most common checks first (instead of doing a string comparison check twice against both "DELETE" and "APPEND").

This method was a bit confusing for me on first read and given that the APPEND check is the most common case, processing it first and reducing the number of string comparison checks will likely be more performant. It is more readable in my opinion.

I was going to allow for also skipping OVERWRITE if users set a flag, but there is already open discussion around this and some older PRs. So I closed my original PR in favor of just simplify the check and continuing the discussion elsewhere: #3267

@github-actions github-actions bot added the spark label Oct 9, 2021
@kbendick
Copy link
Contributor Author

kbendick commented Oct 9, 2021

There are a few corner cases that don't appear to be handled. For example if somebody runs a CreateOrReplaceTable operation on an existing table and somebody streams from the beginning it will either NPE or likely return to the beginning of the table entirely.

But given that the Spark streaming source is presently only able to handle a limited set of use cases, I don't think convoluting it for edge cases is the best idea. If anybody would like to discuss my findings, I'd be happy to do so as I spent some time looking into CDC in other systems and how it can be efficiently applied for the Spark stream (which doesn't have an in-built notion of CDC or deletes in general). There's a bit more of a break down on the other PR mentioned above.

@rdblue rdblue merged commit a32119b into apache:master Oct 11, 2021
@rdblue
Copy link
Contributor

rdblue commented Oct 11, 2021

Merged. Thanks, @kbendick!

RussellSpitzer pushed a commit to RussellSpitzer/iceberg that referenced this pull request Oct 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants