Skip to content

Spark3 structured streaming micro_batch read support #1

Merged: SreeramGarlapati merged 28 commits into spark3.stream.read.baseline from spark3.stream.read.1 on Jun 2, 2021

@SreeramGarlapati
Owner

This work extends the idea in issue apache#179 and the Spark2 work done in PR apache#2272, this time for Spark3.

In the current implementation:

  • An Iceberg snapshot is the upper bound for a micro-batch: a given micro-batch stays within a single snapshot and is never composed of multiple snapshots. BatchSize limits the number of files read from within a given snapshot.
  • The streaming reader errors out if it encounters any snapshot whose type is not APPEND.
  • Handling DELETE, REPLACE, and OVERWRITE snapshots is left for future work.
  • Columnar reads are not enabled yet; this is also left for future work.
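To make the first two rules concrete, here is a minimal, self-contained Java sketch of how a planner could carve micro-batches under them. Everything in it is hypothetical. SnapshotInfo, StreamOffset, MicroBatch and MicroBatchPlanner are illustrative stand-ins, not Iceberg or Spark APIs. The offset is modeled as a (snapshot id, file position) pair, and non-APPEND snapshots are detected by a lowercase "append" operation string. Both of these are assumptions made for illustration. In the sketch, batchSize caps the number of files per micro-batch, a batch never crosses a snapshot boundary, and any non-append snapshot raises an exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified snapshot model; NOT Iceberg's Snapshot interface.
record SnapshotInfo(long snapshotId, String operation, List<String> dataFiles) {}

// Hypothetical streaming offset: a snapshot id plus a file position inside that snapshot.
record StreamOffset(long snapshotId, int position) {}

// A planned micro-batch: the files to read and the offset to resume from next time.
record MicroBatch(List<String> files, StreamOffset endOffset) {}

final class MicroBatchPlanner {
  private final List<SnapshotInfo> snapshots; // ordered oldest to newest
  private final int batchSize;                // max number of files per micro-batch

  MicroBatchPlanner(List<SnapshotInfo> snapshots, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    this.snapshots = List.copyOf(snapshots);
    this.batchSize = batchSize;
  }

  /** Plans the next micro-batch starting at {@code start}; returns null when caught up. */
  MicroBatch planNext(StreamOffset start) {
    int idx = indexOf(start.snapshotId());
    int pos = start.position();
    while (idx < snapshots.size()) {
      SnapshotInfo snap = snapshots.get(idx);
      // Assumed rule: only append snapshots can be streamed; anything else is an error.
      if (!"append".equals(snap.operation())) {
        throw new UnsupportedOperationException(
            "Cannot stream snapshot " + snap.snapshotId() + " of type " + snap.operation());
      }
      if (pos < snap.dataFiles().size()) {
        // Take at most batchSize files, and never read past the end of this snapshot.
        int end = Math.min(pos + batchSize, snap.dataFiles().size());
        List<String> files = new ArrayList<>(snap.dataFiles().subList(pos, end));
        return new MicroBatch(files, new StreamOffset(snap.snapshotId(), end));
      }
      // This snapshot is fully consumed; move to the start of the next one.
      idx++;
      pos = 0;
    }
    return null;
  }

  private int indexOf(long snapshotId) {
    for (int i = 0; i < snapshots.size(); i++) {
      if (snapshots.get(i).snapshotId() == snapshotId) {
        return i;
      }
    }
    throw new IllegalArgumentException("Unknown snapshot " + snapshotId);
  }
}
```

In this example, batchSize is 2, snapshot 1 holds files a, b, c, and snapshot 2 holds file d. The planner yields [a, b], then [c], then [d]. It stops the second batch at the snapshot boundary rather than pulling d forward, so each micro-batch stays tied to exactly one snapshot.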

cc: @aokolnychyi @RussellSpitzer @holdenk @rdblue @rdsr

github-actions (bot) added the SPARK label on Jun 2, 2021
SreeramGarlapati merged commit 41041f3 into spark3.stream.read.baseline on Jun 2, 2021
SreeramGarlapati deleted the spark3.stream.read.1 branch on June 2, 2021 at 06:30
