Use Storage write API in BigQuery connector #18897

Merged
ebyhr merged 1 commit into master from ebi/bigquery-storage-write on Nov 22, 2023
@ebyhr ebyhr commented Sep 2, 2023

Release notes

(x) Release notes are required, with the following suggested text:

# BigQuery
* Improve performance when writing rows. ({issue}`18897`)

@cla-bot cla-bot bot added the cla-signed label Sep 2, 2023
@github-actions github-actions bot added the bigquery BigQuery connector label Sep 2, 2023
@ebyhr ebyhr force-pushed the ebi/bigquery-storage-write branch from 509ea04 to e92d7ed Compare September 2, 2023 04:05
@ebyhr ebyhr marked this pull request as draft September 4, 2023 02:05
  {
-     InsertAllRequest.Builder batch = InsertAllRequest.newBuilder(tableId);
+     JSONArray batch = new JSONArray();
      for (int position = 0; position < page.getPositionCount(); position++) {
Should the data be batched based on a configurable request size limit? See https://cloud.google.com/bigquery/quotas#write-api-limits

Good to do, but this is a pre-existing issue.
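
For illustration only, size-based batching as discussed in this thread could look roughly like the sketch below. It is not code from this PR: `RowBatcher`, `split`, and `maxBatchBytes` are hypothetical names, rows are modeled as already-serialized JSON strings, and the UTF-8 length of the JSON is only an approximation of the real request size. The actual limit should be taken from the quotas page linked above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the connector.
final class RowBatcher
{
    private RowBatcher() {}

    // Splits rows into batches whose summed UTF-8 size is at most maxBatchBytes.
    // A single row larger than the limit is emitted in a batch of its own,
    // so the caller can decide whether to fail or send it anyway.
    static List<List<String>> split(List<String> jsonRows, long maxBatchBytes)
    {
        if (maxBatchBytes <= 0) {
            throw new IllegalArgumentException("maxBatchBytes must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String row : jsonRows) {
            long rowBytes = row.getBytes(StandardCharsets.UTF_8).length;
            // Flush the current batch before it would exceed the limit
            if (!current.isEmpty() && currentBytes + rowBytes > maxBatchBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(row);
            currentBytes += rowBytes;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args)
    {
        List<String> rows = List.of("{\"a\":1}", "{\"a\":2}", "{\"a\":3}");
        // Tiny limit chosen only to force more than one batch in this demo
        System.out.println(split(rows, 14).size() + " batches");
    }
}
```

Each returned batch would then be appended as a separate request, so one large page no longer maps to one potentially oversized request.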

@ebyhr ebyhr force-pushed the ebi/bigquery-storage-write branch 5 times, most recently from 6d1e190 to 3b10a3f Compare November 12, 2023 23:05
@ebyhr ebyhr marked this pull request as ready for review November 13, 2023 00:43

private void insertWithCommitted(JSONArray batch)
{
WriteStream stream = WriteStream.newBuilder().setType(COMMITTED).build();
We are creating a new stream per Page.
BigQuery has a limit of 1k open streams at a time and 30k stream creations per 4 hours (7.5k creations per hour).

Let's either document this if we think it shouldn't be a problem, or consider creating a single stream per page sink / table writer task.

Also, should we consider using "pending" mode long term to provide proper isolation? With the current mode ("committed"), if a single stream fails, writes from other streams still succeed and remain visible, i.e. it's not ACID and there's no way to roll back. Note that this is the same behaviour we already had, so it's not a regression in that sense.
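
To make the "single stream per page sink" suggestion concrete, here is a minimal sketch. It deliberately avoids the BigQuery client library: `SingleStreamSink`, `streamFactory`, and `appender` are hypothetical stand-ins for whatever creates a write stream and appends rows to it, so this shows only the reuse pattern, not the connector's actual code.

```java
import java.util.List;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

// Hypothetical sink that opens one stream lazily and reuses it for every page.
final class SingleStreamSink<S>
{
    private final Supplier<S> streamFactory;            // stand-in for stream creation
    private final BiConsumer<S, List<String>> appender; // stand-in for appending rows
    private S stream;

    SingleStreamSink(Supplier<S> streamFactory, BiConsumer<S, List<String>> appender)
    {
        this.streamFactory = Objects.requireNonNull(streamFactory, "streamFactory is null");
        this.appender = Objects.requireNonNull(appender, "appender is null");
    }

    void appendPage(List<String> rows)
    {
        // Create the stream on first use only, so N pages cost one creation instead of N
        if (stream == null) {
            stream = Objects.requireNonNull(streamFactory.get(), "stream is null");
        }
        appender.accept(stream, rows);
    }

    public static void main(String[] args)
    {
        AtomicInteger created = new AtomicInteger();
        SingleStreamSink<String> sink = new SingleStreamSink<>(
                () -> "stream-" + created.incrementAndGet(),
                (name, rows) -> System.out.println(name + " <- " + rows.size() + " rows"));
        sink.appendPage(List.of("r1", "r2"));
        sink.appendPage(List.of("r3"));
        System.out.println("streams created: " + created.get());
    }
}
```

With this shape, the number of stream creations scales with the number of sinks rather than the number of pages, which is what matters for the creation quota mentioned above.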

@ebyhr ebyhr force-pushed the ebi/bigquery-storage-write branch from 3b10a3f to f38227b Compare November 21, 2023 23:44
@ebyhr ebyhr requested a review from hashhar November 22, 2023 02:02
@hashhar hashhar left a comment

looks good to me

nit: multi-line

Suggested change
- CreateWriteStreamRequest createWriteStreamRequest = CreateWriteStreamRequest.newBuilder().setParent(tableName.toString()).setWriteStream(stream).build();
+ CreateWriteStreamRequest createWriteStreamRequest = CreateWriteStreamRequest.newBuilder()
+         .setParent(tableName.toString())
+         .setWriteStream(stream)
+         .build();

@ebyhr ebyhr force-pushed the ebi/bigquery-storage-write branch from f38227b to 89d0540 Compare November 22, 2023 07:53
@ebyhr ebyhr merged commit deb8ae0 into master Nov 22, 2023
@ebyhr ebyhr deleted the ebi/bigquery-storage-write branch November 22, 2023 08:18
@github-actions github-actions bot added this to the 434 milestone Nov 22, 2023
hashhar commented Nov 22, 2023

@ebyhr Do the docs need updating for any new IAM permissions that are now required?
