feat: allow to set clustering and time partitioning options at table creation (#928)

* refactor: standardize bigquery options handling to manage more options

* feat: handle table partitioning, table clustering and more table options (expiration_timestamp, require_partition_filter, default_rounding_mode) via create_table dialect options

* fix: having clustering fields and partitioning exposed as table indexes leads to a bad autogenerated version file

def upgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_index('clustering', table_name='dataset.some_table')
    op.drop_index('partition', table_name='dataset.some_table')
    # ### end Alembic commands ###

def downgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_index('partition', 'dataset.some_table', ['createdAt'], unique=False)
    op.create_index('clustering', 'dataset.some_table', ['id', 'createdAt'], unique=False)
    # ### end Alembic commands ###

* docs: update README to describe how to create clustered and partitioned table as well as other newly supported table options

* test: adjust system tests since indexes are no longer populated from table partitions and clustering info

* test: alembic now supports creating partitioned tables

* test: run integration tests with all the new create_table options

* chore: rename variables to represent what they are a bit more clearly

* fix: assertions should not be used to validate user inputs

* refactor: extract process_option_value() from post_create_table() for improved readability

* docs: add docstring to post_create_table() and _process_option_value()

* test: increase code coverage by testing error cases

* refactor: better represent the distinction between the option value data type check and the transformation in SQL literal

* test: adding test cases for _validate_option_value_type() and _process_option_value()

* chore: coding style

* chore: reformat files with black

* test: typo in tests

* feat: change the option name for partitioning to leverage the TimePartitioning interface of the Python Client for Google BigQuery

* fix: TimePartitioning.field is optional

* chore: coding style

* test: fix system test with table option bigquery_require_partition_filter

* feat: add support for experimental range_partitioning option

* test: fix system test with new bigquery_time_partitioning table option

* docs: update README with time_partitioning and range_partitioning

* test: relevant comments in unit tests

* test: cover all error cases

* chore: no magic numbers

* chore: consistency in docstrings

* chore: no magic number

* chore: better error types

* chore: fix W605 invalid escape sequence
nlenepveu authored Jan 10, 2024
1 parent ac74a34 commit c2c2958
Showing 7 changed files with 799 additions and 67 deletions.
53 changes: 52 additions & 1 deletion README.rst
@@ -292,14 +292,65 @@ To add metadata to a table:

.. code-block:: python

    # Removed by this commit:
    table = Table('mytable', ..., bigquery_description='my table description', bigquery_friendly_name='my table friendly name')

    # Added by this commit:
    table = Table('mytable', ...,
        bigquery_description='my table description',
        bigquery_friendly_name='my table friendly name',
        bigquery_default_rounding_mode="ROUND_HALF_EVEN",
        bigquery_expiration_timestamp=datetime.datetime.fromisoformat("2038-01-01T00:00:00+00:00"),
    )

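As a rough illustration of what happens to keyword arguments like these, the sketch below checks an option value's type and then renders it as a SQL literal inside an ``OPTIONS(...)`` clause. The names ``render_option_value`` and ``render_options`` are hypothetical and are not the dialect's API; the dialect's own helpers (``_validate_option_value_type()`` and ``_process_option_value()``, per the commit message) may behave differently.

```python
import datetime


def render_option_value(value):
    """Render a Python value as a BigQuery SQL literal (illustrative only)."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    if isinstance(value, datetime.datetime):
        return f"TIMESTAMP '{value.isoformat(sep=' ')}'"
    # Reject anything else with an explicit error rather than an assert.
    raise TypeError(f"unsupported option value type: {type(value).__name__}")


def render_options(options):
    """Join rendered name=value pairs into an OPTIONS(...) clause."""
    rendered = ", ".join(
        f"{name}={render_option_value(value)}" for name, value in options.items()
    )
    return f"OPTIONS({rendered})"


clause = render_options(
    {
        "description": "my table description",
        "default_rounding_mode": "ROUND_HALF_EVEN",
        "expiration_timestamp": datetime.datetime.fromisoformat(
            "2038-01-01T00:00:00+00:00"
        ),
    }
)
print(clause)
```

Raising ``TypeError`` for unsupported values mirrors the commit's stated intent that assertions should not be used to validate user input.
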
To add metadata to a column:

.. code-block:: python

    Column('mycolumn', doc='my column description')

To create a clustered table:

.. code-block:: python

    table = Table('mytable', ..., bigquery_clustering_fields=["a", "b", "c"])

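For orientation, a clustering field list of this kind corresponds to a ``CLUSTER BY`` clause in BigQuery DDL. The following is a minimal sketch of that mapping, assuming a hypothetical helper ``render_cluster_by`` (not part of the dialect) and BigQuery's limit of four clustering columns per table.

```python
MAX_CLUSTERING_FIELDS = 4  # BigQuery allows at most four clustering columns


def render_cluster_by(fields):
    """Return a CLUSTER BY clause for the given column names (illustrative only)."""
    if not isinstance(fields, (list, tuple)) or not fields:
        raise TypeError("clustering fields must be a non-empty list of column names")
    if len(fields) > MAX_CLUSTERING_FIELDS:
        raise ValueError(
            f"at most {MAX_CLUSTERING_FIELDS} clustering fields are allowed, "
            f"got {len(fields)}"
        )
    # Backtick-quote each identifier, as BigQuery does for column names.
    quoted = ", ".join(f"`{name}`" for name in fields)
    return f"CLUSTER BY {quoted}"


print(render_cluster_by(["a", "b", "c"]))
```

The order of the fields matters: BigQuery sorts data by the first clustering column, then the second, and so on.
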
To create a time-unit column-partitioned table:

.. code-block:: python

    from google.cloud import bigquery

    table = Table('mytable', ...,
        bigquery_time_partitioning=bigquery.TimePartitioning(
            field="mytimestamp",
            type_="MONTH",
            expiration_ms=1000 * 60 * 60 * 24 * 30 * 6, # 6 months
        ),
        bigquery_require_partition_filter=True,
    )

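The ``expiration_ms`` value above is a hand-multiplied millisecond count. One way to make the unit explicit is to derive it from a ``datetime.timedelta``, as in this small sketch; ``to_milliseconds`` is a hypothetical helper, and "6 months" is approximated as 180 days, matching the arithmetic in the example.

```python
import datetime


def to_milliseconds(delta):
    """Convert a timedelta to a whole number of milliseconds."""
    return delta // datetime.timedelta(milliseconds=1)


# 6 months, approximated as 6 * 30 days.
six_months_ms = to_milliseconds(datetime.timedelta(days=30 * 6))
print(six_months_ms)
```

The result can be passed as ``expiration_ms`` in place of the inline product.
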
To create an ingestion-time partitioned table:

.. code-block:: python

    from google.cloud import bigquery

    table = Table('mytable', ...,
        bigquery_time_partitioning=bigquery.TimePartitioning(),
        bigquery_require_partition_filter=True,
    )

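The difference between column partitioning and ingestion-time partitioning shows up in the ``PARTITION BY`` expression of the resulting DDL. The sketch below maps a field, a partitioning type and a column type to the expression forms documented for BigQuery. ``render_partition_by`` and its ``column_type`` parameter are hypothetical; the clause the dialect actually emits may differ.

```python
VALID_TYPES = {"HOUR", "DAY", "MONTH", "YEAR"}


def render_partition_by(field=None, type_="DAY", column_type="TIMESTAMP"):
    """Return a PARTITION BY clause for time partitioning (illustrative only)."""
    if type_ not in VALID_TYPES:
        raise ValueError(f"unsupported partitioning type: {type_!r}")
    if field is None:
        # No field: partition on the ingestion-time pseudo-columns.
        if type_ == "DAY":
            return "PARTITION BY _PARTITIONDATE"
        return f"PARTITION BY TIMESTAMP_TRUNC(_PARTITIONTIME, {type_})"
    if column_type == "DATE":
        if type_ == "HOUR":
            raise ValueError("a DATE column cannot be partitioned by HOUR")
        if type_ == "DAY":
            return f"PARTITION BY {field}"
        return f"PARTITION BY DATE_TRUNC({field}, {type_})"
    trunc_functions = {"TIMESTAMP": "TIMESTAMP_TRUNC", "DATETIME": "DATETIME_TRUNC"}
    if column_type not in trunc_functions:
        raise ValueError(f"unsupported column type: {column_type!r}")
    return f"PARTITION BY {trunc_functions[column_type]}({field}, {type_})"


print(render_partition_by())                       # ingestion time, daily
print(render_partition_by("mytimestamp", "MONTH"))  # column partitioning
```

The default of ``type_="DAY"`` follows ``bigquery.TimePartitioning()``, which defaults to daily partitioning when no type is given.
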
To create an integer-range partitioned table:

.. code-block:: python

    from google.cloud import bigquery

    table = Table('mytable', ...,
        bigquery_range_partitioning=bigquery.RangePartitioning(
            field="zipcode",
            range_=bigquery.PartitionRange(start=0, end=100000, interval=10),
        ),
        bigquery_require_partition_filter=True,
    )

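To make the ``start``/``end``/``interval`` semantics concrete, here is a minimal sketch of how an integer value maps to a range partition, along with the ``RANGE_BUCKET`` expression BigQuery DDL uses for this kind of table. Both function names are hypothetical. The sketch assumes BigQuery's documented behaviour: ``end`` is exclusive, out-of-range values land in ``__UNPARTITIONED__``, and ``NULL`` values land in ``__NULL__``.

```python
def render_range_partition_by(field, start, end, interval):
    """Return a PARTITION BY RANGE_BUCKET clause (illustrative only)."""
    return (
        f"PARTITION BY RANGE_BUCKET({field}, "
        f"GENERATE_ARRAY({start}, {end}, {interval}))"
    )


def partition_for(value, start=0, end=100000, interval=10):
    """Return the partition a value would fall into (illustrative only)."""
    if value is None:
        return "__NULL__"
    if value < start or value >= end:  # end is exclusive
        return "__UNPARTITIONED__"
    # Each partition is identified by the lower bound of its range.
    return str(start + (value - start) // interval * interval)


print(render_range_partition_by("zipcode", 0, 100000, 10))
print(partition_for(94105))
```

With the values from the example above, zip codes 94100 through 94109 share one partition.
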
Threading and Multiprocessing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^