[Task]: Break up single pending stream commit into multiple commits #31872

Open
2 of 16 tasks
ahmedabu98 opened this issue Jul 12, 2024 · 1 comment

ahmedabu98 commented Jul 12, 2024

What needs to happen?

In the BigQuery Storage API batch connector, we use Pending streams to write to BigQuery. The final step in the connector is to commit stream contents to the table.

Currently we perform a single batch commit for all streams. There is a quota on the number of bytes we can commit per operation: 1 TB for small regions, 10 TB for multi-regions. In effect, the size of any batch write job is capped at this limit. Would it be a good idea to break this up into multiple back-to-back commits?
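
To make the proposal concrete, here is a minimal sketch of how pending streams could be grouped so that each commit stays under a byte limit. It is not the connector's code: `plan_commit_batches`, the `(stream_name, byte_size)` pairs, and the idea that the connector knows each stream's size are all assumptions made for illustration. The limit is passed in by the caller because it depends on the region.

```python
from typing import Iterable, List, Tuple


def plan_commit_batches(
    streams: Iterable[Tuple[str, int]],
    max_bytes_per_commit: int,
) -> List[List[str]]:
    """Greedily group stream names so each group's total size fits the limit.

    `streams` is a hypothetical iterable of (stream_name, byte_size) pairs.
    Each returned group would be sent in its own commit request.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0

    for name, size in streams:
        if size > max_bytes_per_commit:
            # A single stream over the limit cannot be fixed by batching.
            raise ValueError(
                f"stream {name!r} has {size} bytes, over the per-commit limit"
            )
        # Close the current group if adding this stream would exceed the limit.
        if current and current_bytes + size > max_bytes_per_commit:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size

    if current:
        batches.append(current)
    return batches
```

With a limit of 1000 bytes and streams of 600, 500, 400 and 100 bytes, this yields two groups: the first stream alone, then the remaining three together. The greedy grouping keeps streams in their original order and does not try to minimize the number of commits.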

@Abacn raises a good point in this comment: the single commit may be intentional, to avoid partially written data in the rare case where the whole pipeline fails between commits (and is unable to retry).

However, limiting the connector to one commit places a hard restriction on the amount of data one can write with it.
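
The trade-off between the two positions can be illustrated with a small sketch of sequential commits that records how far it got. Everything here is hypothetical: `commit_batches_with_progress` and the `commit_fn` callable stand in for whatever actually issues the commit request, and whether a failed commit can safely be retried is an open question, not something this sketch settles.

```python
from typing import Callable, List, Sequence, Tuple


def commit_batches_with_progress(
    batches: Sequence[Sequence[str]],
    commit_fn: Callable[[Sequence[str]], None],
) -> Tuple[List[List[str]], List[List[str]]]:
    """Commit groups of streams one after another.

    Returns (committed, remaining). If `commit_fn` raises, the groups already
    committed stay in the table, which is the partial-write case discussed
    above; the failed group and all later ones are returned as `remaining`.
    """
    committed: List[List[str]] = []
    for index, batch in enumerate(batches):
        try:
            commit_fn(batch)  # hypothetical: one commit request per group
        except Exception:
            remaining = [list(b) for b in batches[index:]]
            return committed, remaining
        committed.append(list(batch))
    return committed, []
```

If the second of three groups fails, the caller gets back one committed group and two remaining ones. A pipeline that can retry could resubmit the remaining groups; one that cannot is left with a partially written table, which a single commit avoids.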

Issue Priority

Priority: 2 (default / most normal work should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner

ahmedabu98 (Contributor, Author) commented

@reuvenlax can you let us know if it would be bad practice to have multiple commit operations?
