Use Storage Write API in BigQuery connector #18897
Conversation
Force-pushed 509ea04 to e92d7ed
```diff
 {
-    InsertAllRequest.Builder batch = InsertAllRequest.newBuilder(tableId);
+    JSONArray batch = new JSONArray();
     for (int position = 0; position < page.getPositionCount(); position++) {
```
Should the data be batched based on a request size limit (config)? See https://cloud.google.com/bigquery/quotas#write-api-limits
Good to do, but it's a pre-existing issue.
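The batching idea above could be sketched as follows. This is a hypothetical, self-contained illustration (not the connector's actual code): rows already serialized to JSON strings are grouped into batches that each stay under a configurable request-size limit, mirroring the Storage Write API's per-request quota. The class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: group serialized rows into batches so that each
// append request stays under a configured byte limit (e.g. the Storage
// Write API's per-request quota).
class RequestBatcher
{
    static List<List<String>> batchBySize(List<String> serializedRows, int maxRequestBytes)
    {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentBytes = 0;
        for (String row : serializedRows) {
            int rowBytes = row.getBytes(StandardCharsets.UTF_8).length;
            // Flush the current batch when adding this row would exceed the limit.
            // An oversized single row still gets its own batch rather than being dropped.
            if (!current.isEmpty() && currentBytes + rowBytes > maxRequestBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(row);
            currentBytes += rowBytes;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

In the real connector the limit would come from a config property and each batch would become one append call; the sketch only shows the splitting logic.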
Force-pushed 6d1e190 to 3b10a3f
```diff
+private void insertWithCommitted(JSONArray batch)
+{
+    WriteStream stream = WriteStream.newBuilder().setType(COMMITTED).build();
```
We are creating a new stream per Page.
BigQuery has a limit of 1k streams open at a time and 30k per 4 hours (7.5k creations per hour).
Let's either document this, if we think it shouldn't be a problem, or consider creating a single stream per page-sink/table-writer task.
Also, should we consider using "pending mode" long term to provide proper isolation? With the current mode ("committed"), if a single stream fails, writes from other streams will still succeed and be visible, i.e. it's not ACID and there's no way to roll back. Note that this is the same behaviour we already had, so it's not a regression in that sense.
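The "single stream per page-sink" suggestion could look roughly like the sketch below. It is a hypothetical illustration, not the connector's code: the actual `BigQueryWriteClient.createWriteStream` call is hidden behind a `Supplier<String>` so the sketch stays self-contained, and the class and method names are invented. The point is simply that stream creation becomes lazy and happens at most once per sink, so the 7.5k-creations-per-hour quota scales with tasks rather than with pages.

```java
import java.util.function.Supplier;

// Hypothetical sketch: create at most one write stream per page sink and
// reuse it for every appended page, instead of one stream per Page.
class StreamPerTaskPageSink
{
    // In the real connector this would wrap BigQueryWriteClient.createWriteStream.
    private final Supplier<String> streamFactory;
    private String streamName; // lazily created, then reused across pages

    StreamPerTaskPageSink(Supplier<String> streamFactory)
    {
        this.streamFactory = streamFactory;
    }

    String streamForPage()
    {
        if (streamName == null) {
            // Single stream creation per task, regardless of page count.
            streamName = streamFactory.get();
        }
        return streamName;
    }
}
```

A pending-mode variant would additionally finalize the stream when the sink finishes and batch-commit all task streams at query completion, which is what would give roll-back semantics.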
Force-pushed 3b10a3f to f38227b
nit: multi-line
```diff
-CreateWriteStreamRequest createWriteStreamRequest = CreateWriteStreamRequest.newBuilder().setParent(tableName.toString()).setWriteStream(stream).build();
+CreateWriteStreamRequest createWriteStreamRequest = CreateWriteStreamRequest.newBuilder()
+        .setParent(tableName.toString())
+        .setWriteStream(stream)
+        .build();
```
Force-pushed f38227b to 89d0540
@ebyhr Do the docs need updating for any new IAM permissions that are needed?
Release notes
(x) Release notes are required, with the following suggested text: