docs(proposal): edges, buffers and buckets (#704)
Signed-off-by: Derek Wang <[email protected]>
whynowy authored May 2, 2023
1 parent 714c803 commit 15a229b
Showing 7 changed files with 82 additions and 63 deletions.
Binary file added docs/assets/map-edges-buffers-buckets.png
1 change: 1 addition & 0 deletions docs/assets/proposal.svg
Binary file added docs/assets/reduce-edges-buffers-buckets.png
2 changes: 1 addition & 1 deletion docs/specifications/controllers.md
@@ -18,7 +18,7 @@ The source code of the controllers is located at `./pkg/reconciler/`.

Pipeline Controller is used to watch `Pipeline` objects, it does following major things when there's a pipeline object created.

-- Spawn a Kubernetes Job to create [buffers](./edges-and-buffers.md) in the [Inter-Step Buffer Services](../core-concepts/inter-step-buffer-service.md).
+- Spawn a Kubernetes Job to create [buffers and buckets](./edges-buffers-buckets.md) in the [Inter-Step Buffer Services](../core-concepts/inter-step-buffer-service.md).
- Create `Vertex` objects according to `.spec.vertices` defined in `Pipeline` object.
- Create some other Kubernetes objects used for the Pipeline, such as a Deployment and a Service for daemon service application.

61 changes: 0 additions & 61 deletions docs/specifications/edges-and-buffers.md

This file was deleted.

79 changes: 79 additions & 0 deletions docs/specifications/edges-buffers-buckets.md
@@ -0,0 +1,79 @@
# Edges, Buffers and Buckets

![Proposal](../assets/proposal.svg)

> This document describes the concepts of `Edge`, `Buffer` and `Bucket` in a pipeline.

## Edges

An `Edge` is the connection between two vertices; specifically, an edge is defined in the pipeline spec under `.spec.edges`. Whether the `to` vertex is a Map or a Reduce with multiple partitions, the connection is considered one edge.

In the following pipeline, there are 3 edges defined (`in` - `atoi`, `atoi` - `compute-sum`, `compute-sum` - `out`).

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
metadata:
  name: even-odd-sum
spec:
  vertices:
    - name: in
      source:
        http: {}
    - name: atoi
      scale:
        min: 1
      udf:
        container:
          image: quay.io/numaio/numaflow-go/map-even-odd
    - name: compute-sum
      udf:
        container:
          image: quay.io/numaio/numaflow-go/reduce-sum
        groupBy:
          window:
            fixed:
              length: 60s
          keyed: true
    - name: out
      scale:
        min: 1
      sink:
        log: {}
  edges:
    - from: in
      to: atoi
    - from: atoi
      to: compute-sum
      parallelism: 2
    - from: compute-sum
      to: out
```

Each `edge` can have a name for internal usage. The naming convention is `{pipeline-name}-{from-vertex-name}-{to-vertex-name}`.
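
As an illustration of the edge naming convention, the following minimal sketch derives the internal names for the three edges of the `even-odd-sum` pipeline. The function `edge_names` and the dictionary representation of `.spec.edges` are assumptions made for this example; they are not part of the Numaflow code base.

```python
def edge_names(pipeline_name, edges):
    """Return one internal name per entry in `.spec.edges`.

    Hypothetical helper: applies the documented convention
    {pipeline-name}-{from-vertex-name}-{to-vertex-name}.
    """
    return [f"{pipeline_name}-{e['from']}-{e['to']}" for e in edges]


# The edges of the example pipeline, written as plain dictionaries.
edges = [
    {"from": "in", "to": "atoi"},
    {"from": "atoi", "to": "compute-sum", "parallelism": 2},
    {"from": "compute-sum", "to": "out"},
]

for name in edge_names("even-odd-sum", edges):
    print(name)
```

Note that the edge leading to `compute-sum` still yields a single name, which matches the statement that a multi-partition `to` vertex is still reached by one edge.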

## Buffers

`Buffer` is `InterStepBuffer`. Each buffer has an owner, which is the vertex that reads from it. Each `udf` and `sink` vertex in a pipeline owns a group of partitioned buffers. Each buffer is named following the convention `{pipeline-name}-{vertex-name}-{index}`, where `index` is the partition index, starting from 0. This naming convention applies to the buffers of both map and reduce udf vertices.
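
The buffer naming convention can be sketched as follows. The helper `buffer_names` is hypothetical, and the number of partitions is passed in as a plain argument because this document does not state where that number comes from.

```python
def buffer_names(pipeline_name, vertex_name, partitions):
    """Return the names of the buffers owned by one udf or sink vertex.

    Hypothetical helper: applies the documented convention
    {pipeline-name}-{vertex-name}-{index}, with index starting from 0.
    `partitions` is an assumed input, not a field taken from the spec.
    """
    if partitions < 1:
        raise ValueError("a vertex that owns buffers needs at least one partition")
    return [f"{pipeline_name}-{vertex_name}-{i}" for i in range(partitions)]


# Assuming the reduce vertex `compute-sum` has two partitions.
print(buffer_names("even-odd-sum", "compute-sum", 2))
```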

When multiple vertices connect to the same vertex and the `to` vertex is a Map, the data from all the `from` vertices is forwarded to its group of partitioned buffers in a round-robin fashion. If the `to` vertex is a Reduce, the data from all the `from` vertices is forwarded to the group of partitioned buffers based on the partitioning key.

A Source vertex does not own any buffers. However, a pipeline may have multiple Source vertices followed by one vertex. As above, if the following vertex is a Map, the data from all the Source vertices is forwarded to its group of partitioned buffers in a round-robin fashion. If it is a Reduce, the data from all the Source vertices is forwarded to the group of partitioned buffers based on the partitioning key.
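
The two forwarding strategies can be illustrated with a small sketch. Everything here is an assumption made for the example: the function `make_router`, the use of a single shared rotation for round-robin, and the choice of CRC32 as the hash for the partitioning key. The actual selection logic used by Numaflow is not described in this document.

```python
import itertools
import zlib


def make_router(to_vertex_kind, buffers):
    """Return a function that maps a message key to one of `buffers`.

    Hypothetical sketch: "map" rotates through the buffers and ignores
    the key; "reduce" picks a buffer from a hash of the key, so equal
    keys always land in the same buffer.
    """
    if not buffers:
        raise ValueError("the to vertex must own at least one buffer")
    if to_vertex_kind == "map":
        rotation = itertools.cycle(buffers)
        return lambda key: next(rotation)
    if to_vertex_kind == "reduce":
        def by_key(key):
            # CRC32 is an assumed stand-in for the real partitioning hash.
            digest = zlib.crc32(key.encode("utf-8"))
            return buffers[digest % len(buffers)]
        return by_key
    raise ValueError(f"unknown vertex kind: {to_vertex_kind}")


# Round-robin across two assumed partitions of a Map vertex.
to_map = make_router("map", ["even-odd-sum-atoi-0", "even-odd-sum-atoi-1"])
print([to_map(k) for k in ("a", "b", "c")])

# Key-based selection across two assumed partitions of a Reduce vertex.
to_reduce = make_router(
    "reduce", ["even-odd-sum-compute-sum-0", "even-odd-sum-compute-sum-1"]
)
print(to_reduce("even") == to_reduce("even"))
```

The property that matters for a Reduce is that every message with the same key reaches the same partition, regardless of which upstream vertex produced it.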

## Buckets

`Bucket` is a K/V store (or a pair of stores) used for watermark propagation.

There are 3 types of buckets in a pipeline:

- `Edge Bucket`: Each edge has a bucket, used for edge watermark propagation, no matter if the vertex that the edge leads to is a Map or a Reduce. The naming convention of an edge bucket is `{pipeline-name}-{from-vertex-name}-{to-vertex-name}`.
- `Source Bucket`: Each Source vertex has a source bucket, used for source watermark propagation. The naming convention of a source bucket is `{pipeline-name}-{vertex-name}-SOURCE`.
- `Sink Bucket`: Sitting on the right side of a Sink vertex, used for sink watermark. The naming convention of a sink bucket is `{pipeline-name}-{vertex-name}-SINK`.
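
To make the three bucket types concrete, the sketch below lists the bucket names for a pipeline. The helper `bucket_names` is hypothetical, and vertices and edges are represented as plain dictionaries that mirror the YAML keys `source`, `udf`, `sink`, `from` and `to`.

```python
def bucket_names(pipeline_name, vertices, edges):
    """Return the edge, source and sink bucket names of a pipeline.

    Hypothetical helper applying the documented conventions:
      edge bucket:   {pipeline-name}-{from-vertex-name}-{to-vertex-name}
      source bucket: {pipeline-name}-{vertex-name}-SOURCE
      sink bucket:   {pipeline-name}-{vertex-name}-SINK
    """
    names = [f"{pipeline_name}-{e['from']}-{e['to']}" for e in edges]
    for v in vertices:
        if "source" in v:
            names.append(f"{pipeline_name}-{v['name']}-SOURCE")
        elif "sink" in v:
            names.append(f"{pipeline_name}-{v['name']}-SINK")
    return names


vertices = [
    {"name": "in", "source": {}},
    {"name": "atoi", "udf": {}},
    {"name": "compute-sum", "udf": {}},
    {"name": "out", "sink": {}},
]
edges = [
    {"from": "in", "to": "atoi"},
    {"from": "atoi", "to": "compute-sum"},
    {"from": "compute-sum", "to": "out"},
]

for name in bucket_names("even-odd-sum", vertices, edges):
    print(name)
```

For a pipeline shaped like `even-odd-sum`, this gives three edge buckets, one source bucket and one sink bucket.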

## Diagrams

**Map**
![Diagram](../assets/map-edges-buffers-buckets.png)

**Reduce**
![Diagram](../assets/reduce-edges-buffers-buckets.png)
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -106,7 +106,7 @@ nav:
- Overview: "specifications/overview.md"
- specifications/controllers.md
- specifications/autoscaling.md
-- specifications/edges-and-buffers.md
+- Edges, Buffers and Buckets: "specifications/edges-buffers-buckets.md"
- development/debugging.md
- development/static-code-analysis.md
- development/releasing.md