Conversation

@daniel-sanche
Contributor

This is the first PR for pipelines support, adding the base structure for the new feature:

  • added PipelineSource class to create a new pipeline pointed at a resource
    • added client.pipeline() to create a new PipelineSource object
  • added pipeline_expressions file to hold set of expressions
    • Currently just holding base Expr and Constant
  • added pipeline_stages file to hold set of stages.
    • Currently just holding base Stage, Collection, and GenericStage
  • added BasePipeline, Pipeline, and AsyncPipeline to chain together sequences of stages
    • implemented pipeline.execute() in sync and async classes
  • added PipelineResult to expose resulting data back to the user

This PR contains unit tests. System tests will come later, when we have more stages.
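The pieces listed above fit together roughly like this. A condensed sketch: the class names come from the PR description, but every body here is hypothetical, not the actual implementation:

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Any, Iterable


class Expr:
    """Base class for pipeline expressions."""


@dataclass
class Constant(Expr):
    """Wraps a literal value used inside an expression."""
    value: Any


class Stage:
    """Base class for pipeline stages."""


@dataclass
class Collection(Stage):
    """Source stage: read documents from a collection path."""
    path: str


@dataclass
class GenericStage(Stage):
    """Escape hatch for stages without a dedicated class yet."""
    name: str
    params: tuple = ()


@dataclass
class PipelineResult:
    """One document produced by pipeline execution."""
    data: dict


class Pipeline:
    def __init__(self, *stages: Stage):
        self.stages = list(stages)

    def generic_stage(self, name: str, *params: Any) -> Pipeline:
        # builder methods return a new Pipeline with the stage appended
        return Pipeline(*self.stages, GenericStage(name, params))

    def execute(self) -> Iterable[PipelineResult]:
        # the real method would send an ExecutePipeline RPC; omitted here
        return iter([])


class PipelineSource:
    """Returned by client.pipeline(); selects the initial source stage."""

    def collection(self, path: str) -> Pipeline:
        return Pipeline(Collection(path))
```

Each builder call returns a fresh `Pipeline`, so partial pipelines can be reused without mutation.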

@daniel-sanche daniel-sanche requested a review from a team as a code owner May 12, 2025 23:30
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. api: firestore Issues related to the googleapis/python-firestore API. labels May 12, 2025
@daniel-sanche daniel-sanche force-pushed the pipeline_queries_approved branch from b57424d to 3f9b65f Compare May 13, 2025 18:25
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: xl Pull request size is extra large. labels Jun 9, 2025
@daniel-sanche daniel-sanche force-pushed the pipeline_queries_1_stubs branch from 1f2bfa4 to 22b558c Compare June 9, 2025 21:17
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. and removed size: l Pull request size is large. labels Jun 11, 2025
@bhshkh

bhshkh commented Jun 12, 2025

LGTM

@MarkDuckworth MarkDuckworth left a comment

Looks good. I left some comments for your consideration; they may not be actionable, just FYIs.

async def execute(
self,
transaction: "AsyncTransaction" | None = None,
) -> AsyncIterable[PipelineResult]:

I'm not familiar with how Firestore for Python works, but within the SDKs my team works on, execute would buffer all of the PipelineResults into memory and make them available as an array. The execute method itself would be asynchronous.

However, our server SDKs, java-firestore and nodejs-firestore, will offer a streaming API, pipeline.stream(). Much like the AsyncIterable, this would allow users to stream PipelineResults as they are received rather than buffering them all into memory.

Also, FWIW, we will be adding a wrapper object PipelineSnapshot which contains all results and other metadata (like explain info and timestamps). I will be sending your team an updated nodejs-firestore PR with our latest changes as soon as I can.
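The wrapper Mark describes might look roughly like this in Python; the field names below are guesses based on his description ("explain info and timestamps"), not the real nodejs-firestore shape:

```python
import datetime
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class PipelineSnapshot:
    """Hypothetical wrapper bundling results with execution metadata."""
    results: List[Any]
    execution_time: Optional[datetime.datetime] = None
    explain_metrics: Dict[str, Any] = field(default_factory=dict)
```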

Contributor Author

Ok, that makes sense, good feedback.

I will create separate methods, stream() -> AsyncIterable[PipelineResult] and execute_pipeline() -> list[PipelineResult], and then we can convert that list into a PipelineSnapshot in the future
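That split might be sketched like this, with stand-in data in place of real RPC responses (method and class bodies are illustrative, not the SDK's implementation):

```python
import asyncio
from typing import AsyncIterator, List


class PipelineResult:
    def __init__(self, data: dict):
        self.data = data


class AsyncPipeline:
    async def stream(self) -> AsyncIterator[PipelineResult]:
        # yield results one at a time as server responses arrive;
        # stand-in data here instead of a real RPC stream
        for data in [{"id": 1}, {"id": 2}]:
            yield PipelineResult(data)

    async def execute(self) -> List[PipelineResult]:
        # buffer the full stream into memory and return it as a list
        return [result async for result in self.stream()]
```

stream() keeps memory bounded for large result sets, while the buffered variant is more convenient when results are small and will all be consumed anyway.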

def execute(
self,
transaction: "Transaction" | None = None,
) -> Iterable[PipelineResult]:

I see the synchronous iterable here. Makes sense. But I'm still wondering if there is an in-between implementation, where the execute is asynchronous but the results are available as a synchronous iterable.

Also, same comment about returning a PipelineSnapshot

Contributor Author

I don't think Python really provides a way to do this, because synchronous iterables are inherently blocking. But I'm interested to hear if you have an idea in mind.
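One in-between pattern does exist, at the cost of a worker thread: start the fetch in the background and hand back a plain generator that blocks per item rather than for the whole result set. A hedged sketch of the general pattern, not anything the SDK implements:

```python
import queue
import threading
from typing import Callable, Iterable, Iterator

_SENTINEL = object()  # marks end of stream on the queue


def background_execute(fetch: Callable[[], Iterable]) -> Iterator:
    """Run fetch() on a worker thread; return a sync iterator over its items."""
    q: "queue.Queue" = queue.Queue()

    def worker() -> None:
        for item in fetch():
            q.put(item)
        q.put(_SENTINEL)

    threading.Thread(target=worker, daemon=True).start()

    def results() -> Iterator:
        while True:
            item = q.get()  # blocks only until the next item is ready
            if item is _SENTINEL:
                return
            yield item

    return results()
```

The call returns immediately; each `next()` on the iterator blocks only for the next item, so the caller overlaps its own work with the fetch.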

GeoPoint,
Vector,
list,
Dict[str, Any],

might be a candidate for Dict[str, CONSTANT_TYPE]

Contributor Author

Unfortunately I don't think Python's type system is powerful enough for this kind of recursive TypeVar. But I'll take another look.
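For what it's worth, recent mypy releases do accept recursive type aliases (as opposed to recursive TypeVars), so something close to Mark's suggestion may type-check; whether it is worth the tooling risk is a separate question. A sketch, paired with a runtime validator since typing aliases can't be used with isinstance:

```python
from typing import Dict, List, Union

# Recursive alias; accepted by recent mypy releases, though older
# tools and some runtime checkers may not support it.
CONSTANT_TYPE = Union[
    str, int, float, bool, None,
    List["CONSTANT_TYPE"],
    Dict[str, "CONSTANT_TYPE"],
]


def is_constant(value: object) -> bool:
    """Runtime mirror of the alias above, checked recursively."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return True
    if isinstance(value, list):
        return all(is_constant(v) for v in value)
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_constant(v) for k, v in value.items())
    return False
```

Note this simplified alias omits Firestore-specific leaf types like GeoPoint and Vector, which the real union would need to include.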

Retrieves all fields in the result.

If a converter was provided to this `PipelineResult`, the result of the
converter's `from_firestore` method is returned.

I can't tell if converters are or will be supported in Python pipelines

Contributor Author

Good catch, I ended up cutting it from this PR. I'll remove the docstring

Let me know if you think converters would be important to have in Python for launch

return f"{self.__class__.__name__}({', '.join(items)})"


class Collection(Stage):

FWIW, in most SDKs we tried to make adding a stage a method on Pipeline; only in the Java SDKs do we expose the Stage classes. I guess it's too early to tell how you are planning to implement the API for adding stages to a Pipeline, and I guess you will do whatever is idiomatic for Python.

Contributor Author

Python currently exposes them both ways, but using the methods on Pipeline is more idiomatic and user-friendly. Maybe I'll mark this class as private and keep the pipeline methods as the official public way to add a stage.

@daniel-sanche daniel-sanche merged commit 17e71b9 into pipeline_queries_approved Jun 17, 2025
7 checks passed
@daniel-sanche daniel-sanche deleted the pipeline_queries_1_stubs branch June 17, 2025 22:54
daniel-sanche added a commit that referenced this pull request Jun 23, 2025
commit ad3e3df
Author: Daniel Sanche <[email protected]>
Date:   Fri Jun 20 17:44:37 2025 -0700

    added .pipeline() to aggregation query

commit 4ce1b91
Author: Daniel Sanche <[email protected]>
Date:   Fri Jun 20 15:17:20 2025 -0700

    added exprs and stages needed for aggregation pipelines

commit e7d8e52
Merge: 0d15355 17e71b9
Author: Daniel Sanche <[email protected]>
Date:   Tue Jun 17 16:57:45 2025 -0700

    Merge branch 'pipeline_queries_approved' into pipeline_queries_2_query_parity

commit 17e71b9
Author: Daniel Sanche <[email protected]>
Date:   Tue Jun 17 15:54:31 2025 -0700

    feat: add pipelines structure (#1046)

commit 0d15355
Author: Daniel Sanche <[email protected]>
Date:   Tue Jun 17 15:52:38 2025 -0700

    fixed tests

commit bc25930
Merge: 6351ae7 13389b8
Author: Daniel Sanche <[email protected]>
Date:   Tue Jun 17 15:38:05 2025 -0700

    Merge branch 'pipeline_queries_1_stubs' into pipeline_queries_2_query_parity

commit 6351ae7
Author: Daniel Sanche <[email protected]>
Date:   Tue Jun 17 15:35:15 2025 -0700

    merged PR #1

commit 13389b8
Author: Daniel Sanche <[email protected]>
Date:   Mon Jun 16 15:15:26 2025 -0700

    fixed mypy

commit 06a2084
Author: Daniel Sanche <[email protected]>
Date:   Mon Jun 16 14:57:54 2025 -0700

    added generic_stage method to base_pipeline

commit 8a9c3ec
Author: Daniel Sanche <[email protected]>
Date:   Mon Jun 16 14:55:50 2025 -0700

    made stages private

commit a818f52
Author: Daniel Sanche <[email protected]>
Date:   Mon Jun 16 14:28:20 2025 -0700

    removed converter reference

commit e74e04d
Author: Daniel Sanche <[email protected]>
Date:   Mon Jun 16 14:22:13 2025 -0700

    added separate stream/execute methods

commit 64cd4fb
Author: Daniel Sanche <[email protected]>
Date:   Thu Jun 12 11:22:25 2025 -0700

    fixed comment

commit 3432322
Author: Daniel Sanche <[email protected]>
Date:   Tue Jun 10 17:03:09 2025 -0700

    fixed lint