Conversation

@antoineeripret
Contributor

This change lets the user run a dry-run query through the read_gbq function. When dry run is enabled, read_gbq returns the amount of data the query would process (in GB) instead of a pd.DataFrame.

@antoineeripret requested review from a team as code owners November 6, 2025 09:32
@product-auto-label (bot) added the "size: s" and "api: bigquery" labels Nov 6, 2025
@antoineeripret changed the title from "Add dry run" to "feat: add dry run to the read_gbq function" Nov 6, 2025
@GarrettWu assigned shuoweil and unassigned GarrettWu Nov 6, 2025
@shuoweil added the "kokoro:force-run" label Nov 7, 2025
@yoshi-kokoro removed the "kokoro:force-run" label Nov 7, 2025
@shuoweil

shuoweil commented Nov 7, 2025

@antoineeripret Could you please check the failed tests? Thanks a lot.

@product-auto-label (bot) added the "size: m" label and removed the "size: s" label Nov 10, 2025
@antoineeripret
Contributor Author

@shuoweil, I've added a new commit with some changes to fix the tests. I ran nox -s unit-3.10 and got 0 failures. Thank you!

@shuoweil

  • lint / lint (pull_request)

Hi @antoineeripret, could you please look at the failed check? It should be a quick fix. Thanks a lot.

@antoineeripret
Contributor Author

Hi @shuoweil, the last commit should fix it. Got the following on my local env:

python -m black --check docs pandas_gbq tests noxfile.py setup.py
All done! ✨ 🍰 ✨
45 files would be left unchanged.

# we need to get it from the query result
# For query_and_wait_via_client_library, the RowIterator should have job set
raise ValueError("Cannot access QueryJob from RowIterator for dry_run")
return query_job.total_bytes_processed / 1024**3
Contributor

Could we simply return query_job.total_bytes_processed without further processing?

Reasons:

  • total_bytes_processed is an integer, which is more precise than a float
  • For small tables (1-10 MB), converting the size to GB makes the result less readable
  • It aligns better with the BigQuery Python client, which returns sizes in bytes

Generally speaking, we want the caller of this function to perform unit conversions.
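
To illustrate what caller-side conversion could look like, here is a minimal sketch. It assumes the dry run hands back a plain integer byte count, as proposed above; `format_bytes` is a hypothetical helper written for this example and is not part of pandas-gbq or the BigQuery client.

```python
def format_bytes(num_bytes: int) -> str:
    """Render an integer byte count using binary units (KiB, MiB, GiB, ...).

    Hypothetical helper for illustration only; not a pandas-gbq API.
    """
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")

    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == units[-1]:
            break
        value /= 1024

    if unit == "B":
        # Keep exact integer output for plain byte counts.
        return f"{num_bytes} B"
    return f"{value:.2f} {unit}"


# A 5 MiB scan reads as "5.00 MiB" here, whereas dividing by 1024**3
# inside the library would have produced 0.0048828125.
print(format_bytes(5 * 1024**2))
```

Keeping the return value in bytes leaves the choice of unit, rounding, and binary versus decimal prefixes to the caller.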

Contributor Author

@sycai, good call! I was thinking about my own usage and didn't consider the bigger picture here. I'll commit the change. :)

@antoineeripret requested a review from sycai November 16, 2025 10:48
Contributor

@sycai left a comment

Thank you! I think we should be good to go once the doc and tests are updated.

@antoineeripret requested a review from sycai November 18, 2025 07:06
@antoineeripret
Contributor Author

@sycai: updated :)

@sycai
Contributor

sycai commented Nov 18, 2025

Looks like there's a lint error. Could you fix it? Thanks a lot!

-------
df: DataFrame
DataFrame representing results of query.
df: DataFrame or float
Contributor

doc nit: "DataFrame or int"

DataFrame representing results of query.
df: DataFrame or float
DataFrame representing results of query. If ``dry_run=True``, returns
a float representing the amount of data that would be processed (in bytes).
Contributor

doc nit: "returns an int representing ..."

@sycai requested a review from tswast November 18, 2025 18:36
@shuoweil

shuoweil commented Nov 18, 2025

@antoineeripret I believe lint fails. Could you please update it? It still fails with the new commit.
