Commit 36fb4ef

Merge branch 'main' into b374307132-information_schema

2 parents fc194af + ccd7c07

604 files changed: 40,216 additions, 12,819 deletions


.gitignore

Lines changed: 1 addition & 0 deletions

@@ -62,3 +62,4 @@ system_tests/local_test_setup
 # Make sure a generated file isn't accidentally committed.
 pylintrc
 pylintrc.test
+dummy.pkl
.pre-commit-config.yaml

Lines changed: 2 additions & 2 deletions

@@ -20,7 +20,7 @@ repos:
     hooks:
       - id: trailing-whitespace
       - id: end-of-file-fixer
-        exclude: "^tests/unit/core/compile/sqlglot/snapshots"
+        exclude: "^tests/unit/core/compile/sqlglot/.*snapshots"
       - id: check-yaml
   - repo: https://github.com/pycqa/isort
     rev: 5.12.0
@@ -43,7 +43,7 @@ repos:
         exclude: "^third_party"
         args: ["--check-untyped-defs", "--explicit-package-bases", "--ignore-missing-imports"]
   - repo: https://github.com/biomejs/pre-commit
-    rev: v2.0.2
+    rev: v2.2.4
     hooks:
       - id: biome-check
         files: '\.(js|css)$'
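(The broadened `.*snapshots` pattern presumably lets the end-of-file fixer skip snapshot directories anywhere under the sqlglot test tree, not only the top-level `snapshots` folder.)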

CHANGELOG.md

Lines changed: 306 additions & 1 deletion
Large diffs are not rendered by default.

GEMINI.md

Lines changed: 147 additions & 0 deletions

New file:

# Contribution guidelines, tailored for LLM agents

## Testing

We use `nox` to instrument our tests.

- To test your changes, run unit tests with `nox`:

  ```bash
  nox -r -s unit
  ```

- To run a single unit test:

  ```bash
  nox -r -s unit-3.13 -- -k <name of test>
  ```

- To run system tests, you can execute:

  ```bash
  # Run all system tests
  nox -r -s system

  # Run a single system test
  nox -r -s system-3.13 -- -k <name of test>
  ```

- The codebase must have better coverage after each change than it had
  previously. You can test coverage via `nox -s unit system cover` (takes a
  long time).
## Code Style

- We use the automatic code formatter `black`. You can run it using
  the nox session `format`. This will eliminate many lint errors. Run via:

  ```bash
  nox -r -s format
  ```

- PEP8 compliance is required, with exceptions defined in the linter
  configuration. If you have `nox` installed, you can test that you have not
  introduced any non-compliant code via:

  ```bash
  nox -r -s lint
  ```

- When writing tests, use the idiomatic "pytest" style.
## Documentation

If a method or property implements the same interface as a third-party
package such as pandas or scikit-learn, place the relevant docstring in the
corresponding `third_party/bigframes_vendored/package_name` directory, not in
the `bigframes` directory. Implementations may be placed in the `bigframes`
directory, though.

### Testing code samples

Code samples are very important for accurate documentation. We use the
"doctest" framework to ensure the samples function as expected. After adding a
code sample, please verify it by running doctest. To run the doctests for just
a single method, refer to the following example:

```bash
pytest --doctest-modules bigframes/pandas/__init__.py::bigframes.pandas.cut
```
## Tips for implementing common BigFrames features

### Adding a scalar operator

For an example, see commit
[c5b7fdae74a22e581f7705bc0cf5390e928f4425](https://github.com/googleapis/python-bigquery-dataframes/commit/c5b7fdae74a22e581f7705bc0cf5390e928f4425).

To add a new scalar operator, follow these steps (a condensed sketch appears
after this list):

1. **Define the operation dataclass:**
   - In `bigframes/operations/`, find the relevant file (e.g., `geo_ops.py` for geography functions) or create a new one.
   - Create a new dataclass inheriting from `base_ops.UnaryOp` for unary
     operators, `base_ops.BinaryOp` for binary operators, `base_ops.TernaryOp`
     for ternary operators, or `base_ops.NaryOp` for operators with many
     arguments. Note that these operators count the number of column-like
     arguments. A function that takes only a single column but several literal
     values would still be a `UnaryOp`.
   - Define the `name` of the operation and any parameters it requires.
   - Implement the `output_type` method to specify the data type of the result.

2. **Export the new operation:**
   - In `bigframes/operations/__init__.py`, import your new operation dataclass and add it to the `__all__` list.

3. **Implement the user-facing function (pandas-like):**
   - Identify the canonical function from pandas, geopandas, awkward array, or
     another popular Python package that this operator implements.
   - Find the corresponding class in BigFrames. For example, the implementation
     for most geopandas.GeoSeries methods is in
     `bigframes/geopandas/geoseries.py`. Pandas Series methods are implemented
     in `bigframes/series.py` or one of the accessors, such as `StringMethods`
     in `bigframes/operations/strings.py`.
   - Create the user-facing function that will be called by users (e.g., `length`).
   - If the SQL method differs from pandas or geopandas in a way that can't be
     made the same, raise a `NotImplementedError` with an appropriate message
     and link to the feedback form.
   - Add the docstring to the corresponding file in
     `third_party/bigframes_vendored`, modeled after pandas / geopandas.

4. **Implement the user-facing function (SQL-like):**
   - In `bigframes/bigquery/_operations/`, find the relevant file (e.g., `geo.py`) or create a new one.
   - Create the user-facing function that will be called by users (e.g., `st_length`).
   - This function should take a `Series` for any column-like inputs, plus any other parameters.
   - Inside the function, call `series._apply_unary_op`,
     `series._apply_binary_op`, or similar, passing the operation dataclass you
     created.
   - Add a comprehensive docstring with examples.
   - In `bigframes/bigquery/__init__.py`, import your new user-facing function and add it to the `__all__` list.

5. **Implement the compilation logic:**
   - In `bigframes/core/compile/scalar_op_compiler.py`:
     - If the BigQuery function has a direct equivalent in Ibis, you can often reuse an existing Ibis method.
     - If not, define a new Ibis UDF using `@ibis_udf.scalar.builtin` to map to the specific BigQuery function signature.
     - Create a new compiler implementation function (e.g., `geo_length_op_impl`).
     - Register this function to your operation dataclass using `@scalar_op_compiler.register_unary_op` or `@scalar_op_compiler.register_binary_op`.
     - This implementation will translate the BigQuery DataFrames operation into the appropriate Ibis expression.

6. **Add tests:**
   - Add system tests in the `tests/system/` directory to verify the end-to-end
     functionality of the new operator. Test various inputs, including edge
     cases and `NULL` values.

     Where possible, run the same test code against pandas or GeoPandas and
     compare that the outputs are the same (except for dtypes if BigFrames
     differs from pandas).
   - If you are overriding a pandas or GeoPandas property, add a unit test to
     ensure the correct behavior (e.g., raising `NotImplementedError` if the
     functionality is not supported).
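A minimal, hypothetical sketch of steps 1, 4, and 5 for an imaginary unary
`geo_foo` operator. All names below (`GeoFooOp`, `st_foo`, `geo_foo_op_impl`,
`st_foo_udf`) are illustrative stand-ins, not real BigFrames code; only the
base classes, helpers, and decorators named in the steps above are assumed to
exist.

```python
import dataclasses

import bigframes.dtypes as dtypes
from bigframes.operations import base_ops


# Step 1 (bigframes/operations/geo_ops.py): the operation dataclass.
@dataclasses.dataclass(frozen=True)
class GeoFooOp(base_ops.UnaryOp):
    name = "geo_foo"

    def output_type(self, *input_types):
        # Hypothetical: ST_FOO returns FLOAT64 regardless of input type.
        return dtypes.FLOAT_DTYPE


# Step 4 (bigframes/bigquery/_operations/geo.py): the SQL-like entry point.
def st_foo(series):
    """Hypothetical wrapper around a BigQuery ST_FOO function."""
    return series._apply_unary_op(GeoFooOp())


# Step 5 (bigframes/core/compile/scalar_op_compiler.py), shown as comments
# because the registration decorator and Ibis UDF live in that module:
#
# @scalar_op_compiler.register_unary_op(GeoFooOp)
# def geo_foo_op_impl(x):
#     return st_foo_udf(x)  # st_foo_udf defined via @ibis_udf.scalar.builtin
```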
## Constraints

- Only add git commits. Do not change git history.
- Follow the spec file for development.
- Check off items in the "Acceptance criteria" and "Detailed steps" sections
  with `[x]` as they are completed.
- Refer back to the spec after each step.

bigframes/_config/auth.py

Lines changed: 57 additions & 0 deletions

New file:

# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import annotations

import threading
from typing import Optional

import google.auth.credentials
import google.auth.transport.requests
import pydata_google_auth

_SCOPES = ["https://www.googleapis.com/auth/cloud-platform"]

# Put the lock here rather than in BigQueryOptions so that BigQueryOptions
# remains deepcopy-able.
_AUTH_LOCK = threading.Lock()
_cached_credentials: Optional[google.auth.credentials.Credentials] = None
_cached_project_default: Optional[str] = None


def get_default_credentials_with_project() -> tuple[
    google.auth.credentials.Credentials, Optional[str]
]:
    global _AUTH_LOCK, _cached_credentials, _cached_project_default

    with _AUTH_LOCK:
        if _cached_credentials is not None:
            return _cached_credentials, _cached_project_default

        _cached_credentials, _cached_project_default = pydata_google_auth.default(
            scopes=_SCOPES, use_local_webserver=False
        )

        # Ensure an access token is available.
        _cached_credentials.refresh(google.auth.transport.requests.Request())

        return _cached_credentials, _cached_project_default


def reset_default_credentials_and_project():
    global _AUTH_LOCK, _cached_credentials, _cached_project_default

    with _AUTH_LOCK:
        _cached_credentials = None
        _cached_project_default = None
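A quick usage sketch of the cache above (behavior inferred from the new
module's code): the first call runs `pydata_google_auth.default` and refreshes
a token, later calls return the cached pair, and the reset function forces the
next call to re-authenticate.

```python
from bigframes._config import auth

creds, project = auth.get_default_credentials_with_project()  # auth flow + token refresh
creds_again, _ = auth.get_default_credentials_with_project()  # served from the cache
assert creds is creds_again

auth.reset_default_credentials_and_project()  # next call re-authenticates
```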

bigframes/_config/display_options.py

Lines changed: 10 additions & 1 deletion

@@ -26,11 +26,16 @@
 class DisplayOptions:
     __doc__ = vendored_pandas_config.display_options_doc

+    # Options borrowed from pandas.
     max_columns: int = 20
-    max_rows: int = 25
+    max_rows: int = 10
+    precision: int = 6
+
+    # Options unique to BigQuery DataFrames.
     progress_bar: Optional[str] = "auto"
     repr_mode: Literal["head", "deferred", "anywidget"] = "head"

+    max_colwidth: Optional[int] = 50
     max_info_columns: int = 100
     max_info_rows: Optional[int] = 200000
     memory_usage: bool = True
@@ -48,10 +53,14 @@ def pandas_repr(display_options: DisplayOptions):
     so that we don't override pandas behavior.
     """
     with pd.option_context(
+        "display.max_colwidth",
+        display_options.max_colwidth,
         "display.max_columns",
         display_options.max_columns,
         "display.max_rows",
         display_options.max_rows,
+        "display.precision",
+        display_options.precision,
         "display.show_dimensions",
         True,
     ) as pandas_context:
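A small sketch of what the new fields do (the `bigframes.pandas` options
accessor path is an assumption; the fields themselves are the ones added
above): each one is forwarded to the matching pandas display option whenever a
repr is rendered via `pandas_repr`.

```python
import bigframes.pandas as bpd

# Assumed accessor path for the DisplayOptions fields added in this commit.
bpd.options.display.max_rows = 10      # new default (was 25)
bpd.options.display.precision = 6      # forwarded to "display.precision"
bpd.options.display.max_colwidth = 50  # forwarded to "display.max_colwidth"
```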

bigframes/_importing.py

Lines changed: 8 additions & 3 deletions

@@ -14,6 +14,7 @@
 import importlib
 from types import ModuleType

+import numpy
 from packaging import version

 # Keep this in sync with setup.py
@@ -22,9 +23,13 @@

 def import_polars() -> ModuleType:
     polars_module = importlib.import_module("polars")
-    imported_version = version.Version(polars_module.build_info()["version"])
-    if imported_version < POLARS_MIN_VERSION:
+    # Check for necessary methods instead of the version number because we
+    # can't trust the polars version until
+    # https://github.com/pola-rs/polars/issues/23940 is fixed.
+    try:
+        polars_module.lit(numpy.int64(100), dtype=polars_module.Int64())
+    except TypeError:
         raise ImportError(
-            f"Imported polars version: {imported_version} is below the minimum version: {POLARS_MIN_VERSION}"
+            f"Imported polars version is likely below the minimum version: {POLARS_MIN_VERSION}"
         )
     return polars_module

bigframes/bigquery/__init__.py

Lines changed: 49 additions & 28 deletions

@@ -16,6 +16,9 @@
 such as array functions:
 https://cloud.google.com/bigquery/docs/reference/standard-sql/array_functions. """

+import sys
+
+from bigframes.bigquery._operations import ai
 from bigframes.bigquery._operations.approx_agg import approx_top_count
 from bigframes.bigquery._operations.array import (
     array_agg,
@@ -29,6 +32,9 @@
 )
 from bigframes.bigquery._operations.geo import (
     st_area,
+    st_buffer,
+    st_centroid,
+    st_convexhull,
     st_difference,
     st_distance,
     st_intersection,
@@ -45,44 +51,59 @@
     json_value,
     json_value_array,
     parse_json,
+    to_json,
+    to_json_string,
 )
 from bigframes.bigquery._operations.search import create_vector_index, vector_search
 from bigframes.bigquery._operations.sql import sql_scalar
 from bigframes.bigquery._operations.struct import struct
+from bigframes.core import log_adapter

-__all__ = [
+_functions = [
     # approximate aggregate ops
-    "approx_top_count",
+    approx_top_count,
     # array ops
-    "array_length",
-    "array_agg",
-    "array_to_string",
+    array_agg,
+    array_length,
+    array_to_string,
+    # datetime ops
+    unix_micros,
+    unix_millis,
+    unix_seconds,
     # geo ops
-    "st_area",
-    "st_difference",
-    "st_distance",
-    "st_intersection",
-    "st_isclosed",
-    "st_length",
+    st_area,
+    st_buffer,
+    st_centroid,
+    st_convexhull,
+    st_difference,
+    st_distance,
+    st_intersection,
+    st_isclosed,
+    st_length,
     # json ops
-    "json_extract",
-    "json_extract_array",
-    "json_extract_string_array",
-    "json_query",
-    "json_query_array",
-    "json_set",
-    "json_value",
-    "json_value_array",
-    "parse_json",
+    json_extract,
+    json_extract_array,
+    json_extract_string_array,
+    json_query,
+    json_query_array,
+    json_set,
+    json_value,
+    json_value_array,
+    parse_json,
+    to_json,
+    to_json_string,
     # search ops
-    "create_vector_index",
-    "vector_search",
+    create_vector_index,
+    vector_search,
     # sql ops
-    "sql_scalar",
+    sql_scalar,
     # struct ops
-    "struct",
-    # datetime ops
-    "unix_micros",
-    "unix_millis",
-    "unix_seconds",
+    struct,
 ]
+
+__all__ = [f.__name__ for f in _functions] + ["ai"]
+
+_module = sys.modules[__name__]
+for f in _functions:
+    _decorated_object = log_adapter.method_logger(f, custom_base_name="bigquery")
+    setattr(_module, f.__name__, _decorated_object)
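In short, `__all__` moves from a list of name strings to a list derived from
the function objects themselves, so each export can be wrapped in
`log_adapter.method_logger` and re-bound on the module. A stand-in sketch of
the pattern (the `print` and the placeholder `struct` are illustrative;
`method_logger` here mimics, but is not, the real BigFrames helper):

```python
import functools
import sys


def method_logger(func, custom_base_name: str):
    """Stand-in for bigframes.core.log_adapter.method_logger."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"{custom_base_name}.{func.__name__}")  # placeholder logging
        return func(*args, **kwargs)

    return wrapper


def struct(fields):
    ...  # placeholder for a real exported function


_functions = [struct]
__all__ = [f.__name__ for f in _functions]

# Re-bind each export on this module so callers get the wrapped version.
_module = sys.modules[__name__]
for f in _functions:
    setattr(_module, f.__name__, method_logger(f, custom_base_name="bigquery"))
```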
