Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(bigquery): Add session support #15699

Merged
merged 6 commits into from
Nov 16, 2021
Merged

Conversation

quartzmo
Copy link
Member

@quartzmo quartzmo commented Nov 4, 2021

  • Add create_session and session_id params to Project#query_job
  • Add session_id param to Project#query
  • Add create_session and session_id params to Dataset#query_job
  • Add session_id param to Dataset#query
  • Add Job#session_id
  • Add QueryJob::Updater#create_session=
  • Add QueryJob::Updater#session_id=

closes: #15553

* Add create_session and session_id params to Project#query_job
* Add session_id param to Project#query
* Add create_session and session_id params to Dataset#query_job
* Add session_id param to Dataset#query
* Add Job#session_id
* Add QueryJob::Updater#create_session=
* Add QueryJob::Updater#session_id=

closes: googleapis#15553
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Nov 4, 2021
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the BigQuery API. label Nov 4, 2021
@quartzmo quartzmo self-assigned this Nov 4, 2021
@quartzmo quartzmo marked this pull request as ready for review November 5, 2021 20:18
@quartzmo quartzmo requested a review from a team as a code owner November 5, 2021 20:18
@quartzmo
Copy link
Member Author

quartzmo commented Nov 5, 2021

acceptance test failure is a flake:


Google::Cloud::Bigquery::Dataset::ddl_dml::bigquery#test_0002_creates and populates and drops a table with ddl/dml queries:
Google::Cloud::PermissionDeniedError: Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas

@@ -1244,6 +1244,8 @@ def routines token: nil, max: nil, filter: nil
# Flattens all nested and repeated fields in the query results. The
# default value is `true`. `large_results` parameter must be `true` if
# this is set to `false`.
# @param [Integer] maximum_billing_tier Deprecated: Change the billing
# tier to allow high-compute queries.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is deprecated, can you include a note suggesting what to use instead?
(There's a similar place in project.rb.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will do, thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Copy link
Contributor

@shollyman shollyman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this change hides the connection properties and only exposes session_id as it's own property? How much more work is it to expose the connection properties in a more general fashion?

@quartzmo
Copy link
Member Author

@shollyman Thanks for asking about this. Typically in the handwritten Ruby clients we try to simplify and flatten the API surface to make things easier for users by managing intermediate value objects that only have a field or two. The current list of connection properties, which is described as having only two simple properties (time_zone and session_id) is a candidate for this. Compare:

data = dataset.query "SELECT * FROM temptable", session_id: job.session_id

with

data = dataset.query "SELECT * FROM temptable", connection_properties: [{ "session_id": job.session_id }]

Composing this connection_properties array isn't too bad, but you might not understand from the param docs how it should look and need to locate a sample from which to copy/paste it. Historically we've tried to save users the trouble. Also, if we add session_id now, it doesn't prevent us from adding connection_properties in the future. The duplication would be especially acceptable if session_id is a popular argument that is commonly passed without the other properties.

It really depends on whether more properties will be added, and how many? Are time_zone and other possible future properties as likely as session_id to be passed with each and every query call? If so, I suppose real-world usage might look like this:

conn_props = [
  { "session_id": job.session_id },
  { "time_zone": time_zone },
  { "new_prop": "abc" }
]

data_a = dataset.query "SELECT * FROM temptable_a", connection_properties: conn_props
data_b = dataset.query "SELECT * FROM temptable_b", connection_properties: conn_props
data_c = dataset.query "SELECT * FROM temptable_c", connection_properties: conn_props

Given the examples above, let me know your guesses about the future and your preference!

@shollyman
Copy link
Contributor

RE: other properties, I know of a third one coming soon for associating queries to specific workload profiles. So the number remains small, but is growing.

I think it's reasonable to just expose session now, and expand to a more general connectionproperties feature later, if there's no conflict/oddity with having both specific property and the general access to connection properties hung off the query config.

@quartzmo
Copy link
Member Author

@dazuma Any thoughts on this? Does it seem worth it to have a session_id keyword arg just for user convenience in addition to the harder-to-use array of hashes connection_properties param that will also be needed?

@dazuma
Copy link
Member

dazuma commented Nov 15, 2021

I think it makes sense to keep the separate session_id keyword argument, in keeping with our practice of flattening commonly used arguments like this. It doesn't preclude us from adding the more general argument in the future.

@quartzmo
Copy link
Member Author

@shollyman Does this sound good? If so, any other concerns on this PR?

@meredithslota
Copy link
Contributor

@quartzmo Seth is currently OOO but will reply shortly. :)

@quartzmo
Copy link
Member Author

Merging with guidance from @d-torres.

@quartzmo quartzmo merged commit 6416ee8 into googleapis:main Nov 16, 2021
@quartzmo quartzmo deleted the bigquery-session branch November 16, 2021 23:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat(bigquery): Add session mode support for queries
4 participants