Implement BigQuery Table Schema Update Operator #15367
The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.
@marcosmarxm How important would you say it is to move the functionality into a hook? Solving the use case was easy with two existing hooks, and I don't think I need to create an additional hook, given that I don't implement any new functionality in how we talk to BigQuery. Further, I think mutating a mutable object shouldn't be a problem in this case, and I am not too keen to deepcopy it for the sake of it. Lastly, on the naming of the schema_fields parameter: I wanted to be consistent with the naming and input format of that field between this and other operators. Do you think it's a problem? Changing it to updates_to_schema_fields or similar is an easy possibility.
The CI broke because of pylint. Can you run pre-commit locally to organize imports?
@marcosmarxm
The Workflow run is cancelling this PR. Building images for the PR has failed. Follow the workflow link to check the reason.
potiuk left a comment:
Very nice!
Thanks for the thorough reviews, @marcosmarxm and @tswast. Great job, @thejens!
With this change we implement a new operator that handles patching of table schemas in BigQuery.
This is needed because typing out an entire schema data structure just to set, for example, a description on a single field requires a lot of overhead. Also, the schema is often not known in advance or is very complex, since it may be the result of a query or be inferred automatically when importing files as tables.
This operator is useful for a workflow like:
Upstream: Create a BigQuery table as the output of a query or import operator. The writer of the job/operator knows the names of the fields, perhaps the types, but not necessarily how the other schema properties are defined.
Downstream (this operator): Supply a partial schema definition that contains only field names and description values, which are patched onto the BigQuery-generated schema from upstream.
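The patching step in the workflow above can be sketched in plain Python. This is a minimal illustration only, not the operator's actual implementation: the function name patch_schema_fields, the sample field names, and the choice to raise on unknown fields are all assumptions made for the example. It copies the existing schema before patching, so the caller's list is left untouched.

```python
from copy import deepcopy
from typing import Any, Dict, List

Field = Dict[str, Any]


def patch_schema_fields(existing: List[Field], updates: List[Field]) -> List[Field]:
    """Return a patched copy of `existing`; fields are matched by name.

    Hypothetical helper for illustration; not part of any library.
    """
    patched = deepcopy(existing)  # keep the caller's schema unmodified
    by_name = {field["name"]: field for field in patched}
    for update in updates:
        name = update["name"]
        if name not in by_name:
            # Assumption: a partial schema may only touch existing fields.
            raise KeyError(f"field {name!r} is not in the existing schema")
        target = by_name[name]
        for key, value in update.items():
            if key == "fields":
                # Nested RECORD: patch the sub-fields recursively.
                target["fields"] = patch_schema_fields(target.get("fields", []), value)
            else:
                target[key] = value
    return patched


# Schema as it might come back from an upstream query or import job.
existing_schema = [
    {"name": "emp_name", "type": "STRING", "mode": "REQUIRED"},
    {"name": "salary", "type": "INTEGER", "mode": "NULLABLE"},
    {
        "name": "address",
        "type": "RECORD",
        "mode": "NULLABLE",
        "fields": [{"name": "city", "type": "STRING", "mode": "NULLABLE"}],
    },
]

# Partial schema: only names and the descriptions to set.
updates = [
    {"name": "emp_name", "description": "Employee name"},
    {"name": "address", "fields": [{"name": "city", "description": "City of residence"}]},
]

patched_schema = patch_schema_fields(existing_schema, updates)
```

Fields that the partial schema does not mention (here, salary) pass through unchanged, and types and modes are preserved unless an update explicitly overrides them.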