Skip to content

Commit

Permalink
feat: STRUCT and ARRAY support (#318)
Browse files Browse the repository at this point in the history
  • Loading branch information
jimfulton authored Sep 9, 2021
1 parent a42283c commit 6624b10
Show file tree
Hide file tree
Showing 21 changed files with 903 additions and 159 deletions.
2 changes: 1 addition & 1 deletion docs/alembic.rst
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ Supported operations:
<https://alembic.sqlalchemy.org/en/latest/ops.html#alembic.operations.Operations.rename_table>`_

Note that some of the operations above have limited capability, again
do to `BigQuery limitations
due to `BigQuery limitations
<https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language>`_.

The `execute` operation allows access to BigQuery-specific
Expand Down
1 change: 1 addition & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
:maxdepth: 2

README
struct
geography
alembic
reference
Expand Down
69 changes: 69 additions & 0 deletions docs/struct.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
Working with BigQuery STRUCT data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The BigQuery `STRUCT data type
<https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#struct_type>`_
provided data that are collections of named fields.

`sqlalchemy-bigquery` provided a STRUCT type that can be used to
define tables with STRUCT columns:

.. literalinclude:: samples/snippets/STRUCT.py
:language: python
:dedent: 4
:start-after: [START bigquery_sqlalchemy_create_table_with_struct]
:end-before: [END bigquery_sqlalchemy_create_table_with_struct]

`STRUCT` types can be nested, as in this example. Struct fields can
be defined in two ways:

- Fields can be provided as keyword arguments, as in the `cylinder`
and `horsepower` fields in this example.

- Fields can be provided as name-type tuples provided as positional
arguments, as with the `count` and `compression` fields in this example.

STRUCT columns are automatically created when existing database tables
containing STRUCT columns are introspected.

Struct data are represented in Python as Python dictionaries:

.. literalinclude:: samples/snippets/STRUCT.py
:language: python
:dedent: 4
:start-after: [START bigquery_sqlalchemy_insert_struct]
:end-before: [END bigquery_sqlalchemy_insert_struct]

When querying struct fields, you can use attribute access syntax:

.. literalinclude:: samples/snippets/STRUCT.py
:language: python
:dedent: 4
:start-after: [START bigquery_sqlalchemy_query_struct]
:end-before: [END bigquery_sqlalchemy_query_struct]

or mapping access:

.. literalinclude:: samples/snippets/STRUCT.py
:language: python
:dedent: 4
:start-after: [START bigquery_sqlalchemy_query_getitem]
:end-before: [END bigquery_sqlalchemy_query_getitem]

and field names are case insensitive:

.. literalinclude:: samples/snippets/STRUCT.py
:language: python
:dedent: 4
:start-after: [START bigquery_sqlalchemy_query_STRUCT]
:end-before: [END bigquery_sqlalchemy_query_STRUCT]

When using attribute-access syntax, field names may conflict with
column attribute names. For example SQLAlchemy columns have `name`
and `type` attributes, among others. When accessing a field whose name
conflicts with a column attribute name, either use mapping access, or
spell the field name with upper-case letters.




90 changes: 90 additions & 0 deletions samples/snippets/STRUCT.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
# Copyright (c) 2021 The sqlalchemy-bigquery Authors
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of
# this software and associated documentation files (the "Software"), to deal in
# the Software without restriction, including without limitation the rights to
# use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
# the Software, and to permit persons to whom the Software is furnished to do so,
# subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
# FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
# COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
# IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


def example(engine):
# fmt: off
# [START bigquery_sqlalchemy_create_table_with_struct]
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, String, Integer, Float
from sqlalchemy_bigquery import STRUCT

Base = declarative_base()

class Car(Base):
__tablename__ = "Cars"

model = Column(String, primary_key=True)
engine = Column(
STRUCT(
cylinder=STRUCT(("count", Integer),
("compression", Float)),
horsepower=Integer)
)

# [END bigquery_sqlalchemy_create_table_with_struct]
Car.__table__.create(engine)

# [START bigquery_sqlalchemy_insert_struct]
from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine)
session = Session()

sebring = Car(model="Sebring",
engine=dict(
cylinder=dict(
count=6,
compression=18.0),
horsepower=235))
townc = Car(model="Town and Counttry",
engine=dict(
cylinder=dict(
count=6,
compression=16.0),
horsepower=251))
xj8 = Car(model="XJ8",
engine=dict(
cylinder=dict(
count=8,
compression=10.75),
horsepower=575))

session.add_all((sebring, townc, xj8))
session.commit()

# [END bigquery_sqlalchemy_insert_struct]

# [START bigquery_sqlalchemy_query_struct]
sixes = session.query(Car).filter(Car.engine.cylinder.count == 6)
# [END bigquery_sqlalchemy_query_struct]
sixes1 = list(sixes)

# [START bigquery_sqlalchemy_query_STRUCT]
sixes = session.query(Car).filter(Car.engine.CYLINDER.COUNT == 6)
# [END bigquery_sqlalchemy_query_STRUCT]
sixes2 = list(sixes)

# [START bigquery_sqlalchemy_query_getitem]
sixes = session.query(Car).filter(Car.engine["cylinder"]["count"] == 6)
# [END bigquery_sqlalchemy_query_getitem]
# fmt: on
sixes3 = list(sixes)

return sixes1, sixes2, sixes3
27 changes: 27 additions & 0 deletions samples/snippets/STRUCT_test.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# Copyright (c) 2021 The sqlalchemy-bigquery Authors
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of
# this software and associated documentation files (the "Software"), to deal in
# the Software without restriction, including without limitation the rights to
# use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
# the Software, and to permit persons to whom the Software is furnished to do so,
# subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
# FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
# COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
# IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


def test_struct(engine):
from . import STRUCT

sixeses = STRUCT.example(engine)

for sixes in sixeses:
assert sorted(car.model for car in sixes) == ["Sebring", "Town and Counttry"]
File renamed without changes.
2 changes: 1 addition & 1 deletion setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,7 @@ def readme():
# Until this issue is closed
# https://github.com/googleapis/google-cloud-python/issues/10566
"google-auth>=1.25.0,<3.0.0dev", # Work around pip wack.
"google-cloud-bigquery>=2.24.1",
"google-cloud-bigquery>=2.25.2,<3.0.0dev",
"sqlalchemy>=1.2.0,<1.5.0dev",
"future",
],
Expand Down
4 changes: 3 additions & 1 deletion sqlalchemy_bigquery/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@
from .version import __version__ # noqa

from .base import BigQueryDialect, dialect # noqa
from .base import (
from ._types import (
ARRAY,
BIGNUMERIC,
BOOL,
Expand All @@ -38,6 +38,7 @@
NUMERIC,
RECORD,
STRING,
STRUCT,
TIME,
TIMESTAMP,
)
Expand All @@ -58,6 +59,7 @@
"NUMERIC",
"RECORD",
"STRING",
"STRUCT",
"TIME",
"TIMESTAMP",
]
Expand Down
148 changes: 148 additions & 0 deletions sqlalchemy_bigquery/_struct.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,148 @@
# Copyright (c) 2021 The sqlalchemy-bigquery Authors
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of
# this software and associated documentation files (the "Software"), to deal in
# the Software without restriction, including without limitation the rights to
# use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
# the Software, and to permit persons to whom the Software is furnished to do so,
# subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
# FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
# COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
# IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

from typing import Mapping, Tuple

import packaging.version
import sqlalchemy.sql.default_comparator
import sqlalchemy.sql.sqltypes
import sqlalchemy.types

from . import base

sqlalchemy_1_4_or_more = packaging.version.parse(
sqlalchemy.__version__
) >= packaging.version.parse("1.4")

if sqlalchemy_1_4_or_more:
import sqlalchemy.sql.coercions
import sqlalchemy.sql.roles


def _get_subtype_col_spec(type_):
global _get_subtype_col_spec

type_compiler = base.dialect.type_compiler(base.dialect())
_get_subtype_col_spec = type_compiler.process
return _get_subtype_col_spec(type_)


class STRUCT(sqlalchemy.sql.sqltypes.Indexable, sqlalchemy.types.UserDefinedType):
"""
A type for BigQuery STRUCT/RECORD data
See https://googleapis.dev/python/sqlalchemy-bigquery/latest/struct.html
"""

# See https://docs.sqlalchemy.org/en/14/core/custom_types.html#creating-new-types

def __init__(
self,
*fields: Tuple[str, sqlalchemy.types.TypeEngine],
**kwfields: Mapping[str, sqlalchemy.types.TypeEngine],
):
# Note that because:
# https://docs.python.org/3/whatsnew/3.6.html#pep-468-preserving-keyword-argument-order
# We know that `kwfields` preserves order.
self._STRUCT_fields = tuple(
(
name,
type_ if isinstance(type_, sqlalchemy.types.TypeEngine) else type_(),
)
for (name, type_) in (fields + tuple(kwfields.items()))
)

self._STRUCT_byname = {
name.lower(): type_ for (name, type_) in self._STRUCT_fields
}

def __repr__(self):
fields = ", ".join(
f"{name}={repr(type_)}" for name, type_ in self._STRUCT_fields
)
return f"STRUCT({fields})"

def get_col_spec(self, **kw):
fields = ", ".join(
f"{name} {_get_subtype_col_spec(type_)}"
for name, type_ in self._STRUCT_fields
)
return f"STRUCT<{fields}>"

def bind_processor(self, dialect):
return dict

class Comparator(sqlalchemy.sql.sqltypes.Indexable.Comparator):
def _setup_getitem(self, name):
if not isinstance(name, str):
raise TypeError(
f"STRUCT fields can only be accessed with strings field names,"
f" not {repr(name)}."
)
subtype = self.expr.type._STRUCT_byname.get(name.lower())
if subtype is None:
raise KeyError(name)
operator = struct_getitem_op
index = _field_index(self, name, operator)
return operator, index, subtype

def __getattr__(self, name):
if name.lower() in self.expr.type._STRUCT_byname:
return self[name]

comparator_factory = Comparator


# In the implementations of _field_index below, we're stealing from
# the JSON type implementation, but the code to steal changed in
# 1.4. :/

if sqlalchemy_1_4_or_more:

def _field_index(self, name, operator):
return sqlalchemy.sql.coercions.expect(
sqlalchemy.sql.roles.BinaryElementRole,
name,
expr=self.expr,
operator=operator,
bindparam_type=sqlalchemy.types.String(),
)


else:

def _field_index(self, name, operator):
return sqlalchemy.sql.default_comparator._check_literal(
self.expr, operator, name, bindparam_type=sqlalchemy.types.String(),
)


def struct_getitem_op(a, b):
raise NotImplementedError()


sqlalchemy.sql.default_comparator.operator_lookup[
struct_getitem_op.__name__
] = sqlalchemy.sql.default_comparator.operator_lookup["json_getitem_op"]


class SQLCompiler:
def visit_struct_getitem_op_binary(self, binary, operator_, **kw):
left = self.process(binary.left, **kw)
return f"{left}.{binary.right.value}"
Loading

0 comments on commit 6624b10

Please sign in to comment.