|
| 1 | +# Contribution guidelines, tailored for LLM agents |
| 2 | + |
| 3 | +## Testing |
| 4 | + |
| 5 | +We use `nox` to instrument our tests. |
| 6 | + |
| 7 | +- To test your changes, run unit tests with `nox`: |
| 8 | + |
| 9 | + ```bash |
| 10 | + nox -r -s unit |
| 11 | + ``` |
| 12 | + |
| 13 | +- To run a single unit test: |
| 14 | + |
| 15 | + ```bash |
| 16 | + nox -r -s unit-3.13 -- -k <name of test> |
| 17 | + ``` |
| 18 | + |
| 19 | +- To run system tests, you can execute:: |
| 20 | + |
| 21 | + # Run all system tests |
| 22 | + $ nox -r -s system |
| 23 | + |
| 24 | + # Run a single system test |
| 25 | + $ nox -r -s system-3.13 -- -k <name of test> |
| 26 | + |
| 27 | +- The codebase must have better coverage than it had previously after each |
| 28 | + change. You can test coverage via `nox -s unit system cover` (takes a long |
| 29 | + time). |
| 30 | + |
| 31 | +## Code Style |
| 32 | + |
| 33 | +- We use the automatic code formatter `black`. You can run it using |
| 34 | + the nox session `format`. This will eliminate many lint errors. Run via: |
| 35 | + |
| 36 | + ```bash |
| 37 | + nox -r -s format |
| 38 | + ``` |
| 39 | + |
| 40 | +- PEP8 compliance is required, with exceptions defined in the linter configuration. |
| 41 | + If you have ``nox`` installed, you can test that you have not introduced |
| 42 | + any non-compliant code via: |
| 43 | + |
| 44 | + ``` |
| 45 | + nox -r -s lint |
| 46 | + ``` |
| 47 | + |
| 48 | +- When writing tests, use the idiomatic "pytest" style. |
| 49 | + |
| 50 | +## Documentation |
| 51 | + |
| 52 | +If a method or property is implementing the same interface as a third-party |
| 53 | +package such as pandas or scikit-learn, place the relevant docstring in the |
| 54 | +corresponding `third_party/bigframes_vendored/package_name` directory, not in |
| 55 | +the `bigframes` directory. Implementations may be placed in the `bigframes` |
| 56 | +directory, though. |
| 57 | + |
| 58 | +### Testing code samples |
| 59 | + |
| 60 | +Code samples are very important for accurate documentation. We use the "doctest" |
| 61 | +framework to ensure the samples are functioning as expected. After adding a code |
| 62 | +sample, please ensure it is correct by running doctest. To run the samples |
| 63 | +doctests for just a single method, refer to the following example: |
| 64 | + |
| 65 | +```bash |
| 66 | +pytest --doctest-modules bigframes/pandas/__init__.py::bigframes.pandas.cut |
| 67 | +``` |
| 68 | + |
| 69 | +## Tips for implementing common BigFrames features |
| 70 | + |
| 71 | +### Adding a scalar operator |
| 72 | + |
| 73 | +For an example, see commit |
| 74 | +[c5b7fdae74a22e581f7705bc0cf5390e928f4425](https://github.com/googleapis/python-bigquery-dataframes/commit/c5b7fdae74a22e581f7705bc0cf5390e928f4425). |
| 75 | + |
| 76 | +To add a new scalar operator, follow these steps: |
| 77 | + |
| 78 | +1. **Define the operation dataclass:** |
| 79 | + - In `bigframes/operations/`, find the relevant file (e.g., `geo_ops.py` for geography functions) or create a new one. |
| 80 | + - Create a new dataclass inheriting from `base_ops.UnaryOp` for unary |
| 81 | + operators, `base_ops.BinaryOp` for binary operators, `base_ops.TernaryOp` |
| 82 | + for ternary operators, or `base_ops.NaryOp for operators with many |
| 83 | + arguments. Note that these operators are counting the number column-like |
| 84 | + arguments. A function that takes only a single column but several literal |
| 85 | + values would still be a `UnaryOp`. |
| 86 | + - Define the `name` of the operation and any parameters it requires. |
| 87 | + - Implement the `output_type` method to specify the data type of the result. |
| 88 | + |
| 89 | +2. **Export the new operation:** |
| 90 | + - In `bigframes/operations/__init__.py`, import your new operation dataclass and add it to the `__all__` list. |
| 91 | + |
| 92 | +3. **Implement the user-facing function (pandas-like):** |
| 93 | + |
| 94 | + - Identify the canonical function from pandas / geopandas / awkward array / |
| 95 | + other popular Python package that this operator implements. |
| 96 | + - Find the corresponding class in BigFrames. For example, the implementation |
| 97 | + for most geopandas.GeoSeries methods is in |
| 98 | + `bigframes/geopandas/geoseries.py`. Pandas Series methods are implemented |
| 99 | + in `bigframes/series.py` or one of the accessors, such as `StringMethods` |
| 100 | + in `bigframes/operations/strings.py`. |
| 101 | + - Create the user-facing function that will be called by users (e.g., `length`). |
| 102 | + - If the SQL method differs from pandas or geopandas in a way that can't be |
| 103 | + made the same, raise a `NotImplementedError` with an appropriate message and |
| 104 | + link to the feedback form. |
| 105 | + - Add the docstring to the corresponding file in |
| 106 | + `third_party/bigframes_vendored`, modeled after pandas / geopandas. |
| 107 | + |
| 108 | +4. **Implement the user-facing function (SQL-like):** |
| 109 | + |
| 110 | + - In `bigframes/bigquery/_operations/`, find the relevant file (e.g., `geo.py`) or create a new one. |
| 111 | + - Create the user-facing function that will be called by users (e.g., `st_length`). |
| 112 | + - This function should take a `Series` for any column-like inputs, plus any other parameters. |
| 113 | + - Inside the function, call `series._apply_unary_op`, |
| 114 | + `series._apply_binary_op`, or similar passing the operation dataclass you |
| 115 | + created. |
| 116 | + - Add a comprehensive docstring with examples. |
| 117 | + - In `bigframes/bigquery/__init__.py`, import your new user-facing function and add it to the `__all__` list. |
| 118 | + |
| 119 | +5. **Implement the compilation logic:** |
| 120 | + - In `bigframes/core/compile/scalar_op_compiler.py`: |
| 121 | + - If the BigQuery function has a direct equivalent in Ibis, you can often reuse an existing Ibis method. |
| 122 | + - If not, define a new Ibis UDF using `@ibis_udf.scalar.builtin` to map to the specific BigQuery function signature. |
| 123 | + - Create a new compiler implementation function (e.g., `geo_length_op_impl`). |
| 124 | + - Register this function to your operation dataclass using `@scalar_op_compiler.register_unary_op` or `@scalar_op_compiler.register_binary_op`. |
| 125 | + - This implementation will translate the BigQuery DataFrames operation into the appropriate Ibis expression. |
| 126 | + |
| 127 | +6. **Add Tests:** |
| 128 | + - Add system tests in the `tests/system/` directory to verify the end-to-end |
| 129 | + functionality of the new operator. Test various inputs, including edge cases |
| 130 | + and `NULL` values. |
| 131 | + |
| 132 | + Where possible, run the same test code against pandas or GeoPandas and |
| 133 | + compare that the outputs are the same (except for dtypes if BigFrames |
| 134 | + differs from pandas). |
| 135 | + - If you are overriding a pandas or GeoPandas property, add a unit test to |
| 136 | + ensure the correct behavior (e.g., raising `NotImplementedError` if the |
| 137 | + functionality is not supported). |
| 138 | + |
| 139 | + |
| 140 | +## Constraints |
| 141 | + |
| 142 | +- Only add git commits. Do not change git history. |
| 143 | +- Follow the spec file for development. |
| 144 | + - Check off items in the "Acceptance |
| 145 | + criteria" and "Detailed steps" sections with `[x]`. |
| 146 | + - Please do this as they are completed. |
| 147 | + - Refer back to the spec after each step. |
0 commit comments