Changes from 1 commit
19 commits
a975186  Fix: Update BigQuery Storage Arrow samples batching logic  (AlexZMLyu, Dec 11, 2025)
186d6a6  Chore: Apply manual formatting to Arrow samples  (AlexZMLyu, Dec 11, 2025)
e9a7007  Chore: Apply manual Black-style formatting  (AlexZMLyu, Dec 11, 2025)
5a91e4e  Chore: Manual formatting adjustments  (AlexZMLyu, Dec 11, 2025)
84d0fc1  Refactor: Extract AppendRowsRequest creation to helper  (AlexZMLyu, Dec 11, 2025)
eb97b8f  Fix AttributeError in append_rows_with_arrow.py  (AlexZMLyu, Dec 11, 2025)
593ac97  Update append_rows_with_arrow.py  (AlexZMLyu, Dec 12, 2025)
1eb8266  Update append_rows_with_arrow.py  (AlexZMLyu, Dec 12, 2025)
2239db8  Fix: Update verify_result in pyarrow sample  (AlexZMLyu, Dec 16, 2025)
b510cec  Fix: Update PyArrow serialization in append_rows_with_arrow.py  (AlexZMLyu, Dec 16, 2025)
f581b33  feat: Measure request generation time in pyarrow sample  (AlexZMLyu, Dec 17, 2025)
610f38e  Fix: Improve PyArrow batching and serialization in BigQuery Storage s…  (AlexZMLyu, Dec 18, 2025)
501e993  Refactor: Use generator for request creation and block on send  (AlexZMLyu, Dec 18, 2025)
ec49ec5  samples: reformat append_rows_with_arrow.py  (AlexZMLyu, Dec 18, 2025)
2464fd3  enhance unit test for write request generation  (AlexZMLyu, Jan 1, 2026)
fc14e4a  Merge branch 'googleapis:main' into main  (AlexZMLyu, Jan 2, 2026)
fa73c4b  fix: update comment to reference correct variable name  (AlexZMLyu, Jan 6, 2026)
8867ecd  fix: logic update in generate_write_requests and updated tests  (AlexZMLyu, Jan 7, 2026)
613e330  fix: update assertion to match new request generation logic  (AlexZMLyu, Jan 7, 2026)
```diff
@@ -160,17 +160,43 @@ def generate_pyarrow_table(num_rows=TABLE_LENGTH):
 
 
 def generate_write_requests(pyarrow_table):
-    # Determine max_chunksize of the record batches. Because max size of
-    # AppendRowsRequest is 10 MB, we need to split the table if it's too big.
-    # See: https://cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1#appendrowsrequest
-    max_request_bytes = 10 * 2**20  # 10 MB
-    chunk_num = int(pyarrow_table.nbytes / max_request_bytes) + 1
-    chunk_size = int(pyarrow_table.num_rows / chunk_num)
-
-    # Construct request(s).
-    for batch in pyarrow_table.to_batches(max_chunksize=chunk_size):
+    # Maximum size for a single AppendRowsRequest is 10 MB.
+    # To be safe, we'll aim for a soft limit of 7 MB.
+    max_request_bytes = 7 * 1024 * 1024  # 7 MB
+
+    batches_in_request = []
+    current_size = 0
+
+    # Split table into batches of one row.
+    for row_batch in pyarrow_table.to_batches(max_chunksize=1):
+        serialized_batch = row_batch.serialize().to_pybytes()
+        batch_size = len(serialized_batch)
+
+        if batch_size > max_request_bytes:
+            raise ValueError(
+                f"A single PyArrow batch of one row is larger than the maximum request size (batch size: {batch_size} > max request size: {max_request_bytes}). "
+                "Cannot proceed."
+            )
+
+        if current_size + batch_size > max_request_bytes and batches_in_request:
+            # pa.Table has no serialize(); merge the collected rows into one RecordBatch.
+            combined_batch = pa.Table.from_batches(batches_in_request).combine_chunks().to_batches()[0]
+            request = gapic_types.AppendRowsRequest()
+            request.arrow_rows.rows.serialized_record_batch = combined_batch.serialize().to_pybytes()
+            yield request
+
+            # Reset for next request.
+            batches_in_request = []
+            current_size = 0
+
+        batches_in_request.append(row_batch)
+        current_size += batch_size
+
+    # Yield any remaining batches.
+    if batches_in_request:
+        combined_batch = pa.Table.from_batches(batches_in_request).combine_chunks().to_batches()[0]
         request = gapic_types.AppendRowsRequest()
-        request.arrow_rows.rows.serialized_record_batch = batch.serialize().to_pybytes()
+        request.arrow_rows.rows.serialized_record_batch = combined_batch.serialize().to_pybytes()
         yield request
 
 
```
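The removed lines chose a chunk count from `pyarrow_table.nbytes` and then split rows evenly, which implicitly assumes every row is about the same size. The dependency-free sketch below replays that arithmetic on a hypothetical table whose bytes are concentrated in its first 100 rows. The helper names `old_chunk_size` and `chunk_byte_sizes` and the row sizes are illustrative assumptions, and the sum of row sizes stands in for `nbytes`.

```python
# 10 MB hard limit used by the removed code (10 * 2**20 bytes).
MAX_REQUEST_BYTES = 10 * 2**20


def old_chunk_size(table_nbytes: int, num_rows: int, max_request_bytes: int = MAX_REQUEST_BYTES) -> int:
    """Hypothetical helper replaying the removed arithmetic: pick a chunk
    count from total bytes, then split rows evenly across the chunks."""
    chunk_num = int(table_nbytes / max_request_bytes) + 1
    return int(num_rows / chunk_num)


def chunk_byte_sizes(row_sizes: list[int], chunk_size: int) -> list[int]:
    """Hypothetical helper: bytes carried by each consecutive run of `chunk_size` rows."""
    return [sum(row_sizes[i:i + chunk_size]) for i in range(0, len(row_sizes), chunk_size)]


# Illustrative sizes: 100 rows of 200 KiB followed by 900 rows of 1 KiB.
row_sizes = [200 * 1024] * 100 + [1024] * 900
size = old_chunk_size(sum(row_sizes), len(row_sizes))
chunks = chunk_byte_sizes(row_sizes, size)
print(size, chunks)  # 333 [20718592, 340992, 340992, 1024]
```

With these numbers the first 333-row chunk carries 20,718,592 bytes, nearly twice the 10 MB limit, while the remaining chunks are tiny; the truncating `int()` also leaves a fourth, one-row chunk.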
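The new loop replaces that estimate with greedy packing: measure each one-row batch, flush the open request when the next row would push it past the 7 MB soft limit, and fail fast if a single row is too large on its own. A minimal sketch of the same packing rule over plain integer sizes, assuming a hypothetical helper named `pack_by_size` and illustrative row sizes, could look like this:

```python
from typing import Iterator, List, Sequence

# Mirrors the 7 MB soft limit chosen in the new generate_write_requests.
SOFT_LIMIT_BYTES = 7 * 1024 * 1024


def pack_by_size(item_sizes: Sequence[int], limit: int = SOFT_LIMIT_BYTES) -> Iterator[List[int]]:
    """Hypothetical helper: greedily group consecutive item indices so that
    each group's summed size stays at or below `limit`."""
    group: List[int] = []
    current = 0
    for index, size in enumerate(item_sizes):
        # A single item that cannot fit on its own is an error, as in the sample.
        if size > limit:
            raise ValueError(f"item {index} is {size} bytes, above the {limit}-byte limit")
        # Flush the open group before it would overflow; never flush an empty group.
        if group and current + size > limit:
            yield group
            group, current = [], 0
        group.append(index)
        current += size
    # Emit whatever remains once the input is exhausted.
    if group:
        yield group


# Illustrative sizes: 100 rows of 200 KiB followed by 900 rows of 1 KiB.
row_sizes = [200 * 1024] * 100 + [1024] * 900
groups = list(pack_by_size(row_sizes))
print([len(g) for g in groups])  # [35, 35, 930]
```

On these sizes every group stays under 7,340,032 bytes. Because each serialized one-row batch likely carries its own Arrow IPC metadata and padding, summing them probably overstates the size of the merged batch, which errs on the safe side of the limit at the cost of serializing every row individually.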