-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: STRUCT and ARRAY support #318
Conversation
FTR, WRT superset, once I finally got it working :), it behaves the same with and without these changes. BTW, we have logic that tries to unpack sub-structs, I think so that there would eventually be scalars for superset to work with. If you have an array of structs, we still create columns for the fields of the struct in the array. This causes superset to error, because it has no way to get at structs in an array. We should probably not unpack structs in arrays. |
Otherwise, the BQ doesn't handle arrays of structs.
Here is the summary of changes. You are about to add 10 region tags.
This comment is generated by snippet-bot.
|
I want it narrow to avoid horizonal scrolling
Some notes for reviewers:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't quite digested everything in this PR yet, but I figured I'd share the feedback I have so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it! Just a few things I think we should clarify before merging.
sqlalchemy_bigquery/_struct.py
Outdated
global type_compiler | ||
|
||
try: | ||
process = type_compiler.process | ||
except AttributeError: | ||
type_compiler = base.dialect.type_compiler(base.dialect()) | ||
process = type_compiler.process |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we put this in a _get_type_compiler
/ _get_process
function? I don't see anywhere else we initialize type_compiler
, but I'd be more comfortable having this logic closer to the # We have to delay getting the type compiler, because of circular imports. :(
comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored so this is combined and isolated in one place using a new, better named _get_subtype_col_spec
function.
sqlalchemy_bigquery/_struct.py
Outdated
type_compiler = base.dialect.type_compiler(base.dialect()) | ||
process = type_compiler.process | ||
|
||
fields = ", ".join(f"{name} {process(type_)}" for name, type_ in self.__fields) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume process
is able to handle nested arrays/structs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
sqlalchemy_bigquery/_struct.py
Outdated
f"STRUCT fields can only be accessed with strings field names," | ||
f" not {name}." | ||
) | ||
subtype = self.expr.type._STRUCT__byname.get(name.lower()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does _STRUCT__byname
come from? I'm assuming somewhere from SQLAlchemy, but I'm not getting any results when searching for byname
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I think I figured it out: https://docs.python.org/3/tutorial/classes.html#private-variables
Any identifier of the form
__spam
(at least two leading underscores, at most one trailing underscore) is textually replaced with_classname__spam
, where classname is the current class name with leading underscore(s) stripped.
Can we comment about this? I assume we have to do it because we know self.expr.type
is a STRUCT
, but it's not self
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored this to make name mangling more explicit and consistent, so I don't think comments are needed anymore. See if you agree. :)
I mainly use "private" variables, which aren't :), to avoid namespace conflicts when subclassing across responsibility boundaries. Arguably, explicit naming is better.
sqlalchemy_bigquery/_struct.py
Outdated
return operator, index, subtype | ||
|
||
def __getattr__(self, name): | ||
if name.lower() in self.expr.type._STRUCT__byname: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused why self.__byname
doesn't work in this case.
Edit: I see now that it's part of the Comparator
class. Still probably worth a similar comment to the one I recommend in _setup_getitem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my response on name mangling
sqlalchemy_bigquery/_types.py
Outdated
for f in field.fields | ||
] | ||
results += _get_transitive_schema_fields(sub_fields, cur_fields) | ||
cur_fields.pop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we pop these off, does that mean we don't get the top-level struct field, just the leaf fields? I suspect this might hide some ARRAY columns if a parent node has mode REPEATED
, but is not included.
Edit: I see the top field is added to results on line 83. Might be worth a comment as to why we pop here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got rid of cur_fields. It wasn't needed (anymore).
sqlalchemy_bigquery/_types.py
Outdated
for field in fields: | ||
results += [field] | ||
if field.field_type == "RECORD": | ||
cur_fields.append(field) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand what cur_fields
is doing. Is there a better name we can pick for this? Maybe it's referring to ancestors
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, it's not doing anything.
This is based on
python-bigquery-sqlalchemy/pybigquery/sqlalchemy_bigquery.py
Lines 503 to 523 in db64424
def _get_columns_helper(self, columns, cur_columns): | |
""" | |
Recurse into record type and return all the nested field names. | |
As contributed by @sumedhsakdeo on issue #17 | |
""" | |
results = [] | |
for col in columns: | |
results += [ | |
SchemaField( | |
name=".".join(col.name for col in cur_columns + [col]), | |
field_type=col.field_type, | |
mode=col.mode, | |
description=col.description, | |
fields=col.fields, | |
) | |
] | |
if col.field_type == "RECORD": | |
cur_columns.append(col) | |
results += self._get_columns_helper(col.fields, cur_columns) | |
cur_columns.pop() | |
return results |
I've refactored it quite a bit and failed to notice that this wasn't needed any more. Fixed.
tests/unit/test__struct.py
Outdated
(_col().NAME, "`t`.`person`.NAME"), | ||
(_col().children, "`t`.`person`.children"), | ||
( | ||
_col().children[0].label("anon_1"), # SQLAlchemy doesn't add the label |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we file an issue for this to investigate later? If so, let's add TODO
and link to the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
…Alchemy 1.3 and 1/4
Also, check for both RECORD and STRUCT fild types, in case the API ever starts returning STRUCT.
…accessing array items
Co-authored-by: Tim Swast <[email protected]>
global _get_subtype_col_spec | ||
|
||
type_compiler = base.dialect.type_compiler(base.dialect()) | ||
_get_subtype_col_spec = type_compiler.process |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fancy! I didn't realize a function could replace itself. I like it.
🤖 I have created a release \*beep\* \*boop\* --- ## [1.2.0](https://www.github.com/googleapis/python-bigquery-sqlalchemy/compare/v1.1.0...v1.2.0) (2021-09-09) ### Features * STRUCT and ARRAY support ([#318](https://www.github.com/googleapis/python-bigquery-sqlalchemy/issues/318)) ([6624b10](https://www.github.com/googleapis/python-bigquery-sqlalchemy/commit/6624b10ded73bbca6f40af73aaeaceb95c381b63)) ### Bug Fixes * the unnest function lost needed type information ([#298](https://www.github.com/googleapis/python-bigquery-sqlalchemy/issues/298)) ([1233182](https://www.github.com/googleapis/python-bigquery-sqlalchemy/commit/123318269876e7f76c7f0f2daa5f5b365026cd3f)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
def visit_getitem_binary(self, binary, operator_, **kw): | ||
left = self.process(binary.left, **kw) | ||
right = self.process(binary.right, **kw) | ||
return f"{left}[OFFSET({right})]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SAFE_OFFSET support would be nice too
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #293
Fixes #314
Fixed #233
Fixes #37
🦕