(Draft) Initial pipelines docs #18595

Draft · maheshwarip wants to merge 17 commits into `production` from `pipelines-docs`
Commits (17):

- `01564e4` Initial pipelines docs (maheshwarip)
- `e0cc62f` Fixed wrangler commands (maheshwarip)
- `1dfcc0d` Improved pipelines index page (maheshwarip)
- `4ab692a` Fixed broken links (maheshwarip)
- `4afaa0e` Fixed typos (maheshwarip)
- `72381fc` improved worker binding documentation (maheshwarip)
- `112140e` Fixed broken links (maheshwarip)
- `51ce611` PIPE-155 Add prefix to Pipelines naming (#18656) (oliy)
- `dedafff` Modified worker binding docs (maheshwarip)
- `8ffc4bc` added local development notes (maheshwarip)
- `9f6d180` Renamed worker bindings to .mdx (maheshwarip)
- `31bdfe1` Fixed filenames and broken comments (maheshwarip)
- `b4fcf86` updated dates (maheshwarip)
- `69d9dd5` Fixed render issues and page titles (maheshwarip)
- `59a1902` Updated prefix instructions (maheshwarip)
- `2631676` Updated heading (maheshwarip)
- `4511e58` Updated docs for Oauth flow (maheshwarip)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
@@ -0,0 +1,11 @@

```yaml
---
link: "/pipelines/reference/changelog/"
productName: Pipelines
productLink: "/pipelines/"
productArea: Developer Platform
productAreaLink: "/pipelines/"
entries:
  - publish_date: "2025-01-30"
    title: Pipelines is now in public beta.
    description: |-
      Pipelines, a new product to ingest and store real-time streaming data, is now in public beta. The public beta is available to any user with a [free or paid Workers plan](/workers/platform/pricing/). Create a Pipeline, and you'll be able to post data to it via HTTP or from a Cloudflare Worker. Pipelines handle batching, buffering, and partitioning the data before writing it to an R2 bucket of your choice. It's useful for collecting clickstream data, or for ingesting logs from a service. Start building with our [get started guide](/pipelines/getting-started/).
```
@@ -0,0 +1,28 @@

---
pcx_content_type: concept
title: Batching
sidebar:
  order: 10
---

Pipelines automatically batch requests received via HTTP or from a Worker. Batching reduces the number of output files written to your destination, which can make them more efficient to query.

There are three ways to define how requests are batched:

1. `batch-max-mb`: The maximum amount of data that will be batched, in megabytes. Default is 10 MB, maximum is 100 MB.
2. `batch-max-rows`: The maximum number of rows or events in a batch before data is written. Default, and maximum, is 10,000 rows.
3. `batch-max-seconds`: The maximum duration of a batch before data is written, in seconds. Default is 15 seconds, maximum is 600 seconds.

All three batch definitions work together: whichever limit is reached first triggers the delivery of a batch.

For example, with `batch-max-mb` = 100 MB and `batch-max-seconds` = 600, a batch is delivered as soon as 100 MB of events have been posted to the Pipeline. However, if it takes longer than 600 seconds for 100 MB of events to arrive, a batch of all the messages posted during those 600 seconds is created and delivered.

## Batch settings

You can configure the following batch-level settings to adjust how Pipelines create a batch:

| Setting                                   | Default     | Minimum   | Maximum     |
| ----------------------------------------- | ----------- | --------- | ----------- |
| Maximum Batch Size `batch-max-mb`         | 10 MB       | 0.001 MB  | 100 MB      |
| Maximum Batch Timeout `batch-max-seconds` | 15 seconds  | 0 seconds | 600 seconds |
| Maximum Batch Rows `batch-max-rows`       | 10,000 rows | 1 row     | 10,000 rows |
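The "whichever limit is reached first" semantics can be sketched as a simple buffer that flushes when any threshold trips. This is only an illustration of the behavior described above, not Cloudflare's actual implementation:

```python
import time

class BatchBuffer:
    """Minimal sketch of whichever-limit-first batching (illustration only)."""

    def __init__(self, max_mb=10, max_rows=10_000, max_seconds=15):
        self.max_bytes = int(max_mb * 1024 * 1024)
        self.max_rows = max_rows
        self.max_seconds = max_seconds
        self.rows, self.bytes, self.opened_at = [], 0, None

    def add(self, record: bytes):
        """Add a record; return the flushed batch if any limit was hit, else None."""
        if self.opened_at is None:
            self.opened_at = time.monotonic()  # batch timer starts on first record
        self.rows.append(record)
        self.bytes += len(record)
        if (self.bytes >= self.max_bytes
                or len(self.rows) >= self.max_rows
                or time.monotonic() - self.opened_at >= self.max_seconds):
            return self.flush()
        return None

    def flush(self):
        batch, self.rows, self.bytes, self.opened_at = self.rows, [], 0, None
        return batch
```

With `max_rows=2`, for example, the first `add` returns `None` and the second returns a batch of both records, even though the size and time limits were never reached.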
@@ -0,0 +1,12 @@

---
title: Configuration
pcx_content_type: navigation
sidebar:
  order: 4
  group:
    hideIndex: true
---

import { DirectoryListing } from "~/components"

<DirectoryListing />
src/content/docs/pipelines/configuration/partition-filenames.mdx (30 additions, 0 deletions)
@@ -0,0 +1,30 @@

---
pcx_content_type: concept
title: Partitions and Prefixes
sidebar:
  order: 11
---

## Partitions

Partitioning organizes data into directories based on specific fields to improve query performance. It reduces the amount of data scanned per query, enabling faster reads. By default, Pipelines partition data by event date. This will be customizable in the future.

For example, the output from a Pipeline in your R2 bucket might look like this:

```sh
- event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz
- event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz
```

## Prefix

You can specify an optional prefix for all the output files stored in your specified R2 bucket. The data remains partitioned by date.

To modify the prefix for a Pipeline using Wrangler:

```sh
wrangler pipelines update <pipeline-name> --prefix "test"
```

All the output records generated by your Pipeline will be stored under the prefix "test", and will look like this:

```sh
- test/event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz
- test/event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz
```
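Based on the layout shown above, an output object key composes as optional prefix, date partition, hour partition, and a unique file name. The exact naming scheme is internal to Pipelines; this small sketch only mirrors the examples:

```python
from datetime import datetime, timezone

def output_key(prefix, event_time, file_id):
    """Mirror the example layout: [prefix/]event_date=YYYY-MM-DD/hr=HH/<file>.json.gz"""
    parts = []
    if prefix:
        parts.append(prefix)
    parts.append(f"event_date={event_time:%Y-%m-%d}")
    parts.append(f"hr={event_time:%H}")
    parts.append(f"{file_id}.json.gz")
    return "/".join(parts)

ts = datetime(2024, 9, 6, 15, 30, tzinfo=timezone.utc)
print(output_key("test", ts, "37db9289-15ba-4e8b-9231-538dc7c72c1e-15"))
```

Queries that filter on `event_date` can then skip whole directories, which is why partitioning reduces the data scanned.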
@@ -0,0 +1,12 @@

---
title: Examples
pcx_content_type: navigation
sidebar:
  order: 4
  group:
    hideIndex: false
---

import { DirectoryListing } from "~/components"

<DirectoryListing />
@@ -0,0 +1,100 @@

---
title: Get started
pcx_content_type: get-started
sidebar:
  order: 2
head:
  - tag: title
    content: Get started
---

import { Render, PackageManagers } from "~/components";

Pipelines let you ingest real-time data streams, such as click events on a website or logs from a service. You can send data to a Pipeline from a Worker, or via HTTP. Pipelines handle batching requests and scale in response to your workload. Finally, Pipelines deliver the output into R2 as JSON files, automatically handling partitioning and compression for efficient querying.

By following this guide, you will:

1. Create your first Pipeline.
2. Connect it to your R2 bucket.
3. Post data to it via HTTP.
4. Verify the output file written to R2.

:::note
Pipelines is in **public beta**, and any developer with a [paid Workers plan](/workers/platform/pricing/#workers) can start using Pipelines immediately.
:::

## Prerequisites

To use Pipelines, you will need:

<Render file="prereqs" product="workers" />

## 1. Set up an R2 bucket

Pipelines let you ingest records in real time and load them into an R2 bucket. Create a bucket by following the [get started guide for R2](/r2/get-started/). Save the bucket name for the next step.

## 2. Create a Pipeline

To create a Pipeline using Wrangler, run the following command in the terminal, and specify:

- The name of your Pipeline
- The name of the R2 bucket you created in step 1

```sh
npx wrangler pipelines create [PIPELINE-NAME] --r2 [R2-BUCKET-NAME]
```

After running this command, you will be prompted to authorize Cloudflare Workers Pipelines to create R2 API tokens on your behalf. Your Pipeline requires these tokens to load data into your bucket. You can approve the request through the browser link, which opens automatically.

If you prefer not to authenticate this way, you may pass your [R2 API Tokens](/r2/api/s3/tokens/) to Wrangler:

```sh
npx wrangler pipelines create [PIPELINE-NAME] --r2 [R2-BUCKET-NAME] --access-key-id [ACCESS-KEY-ID] --secret-access-key [SECRET-ACCESS-KEY]
```

When choosing a name for your Pipeline:

1. Ensure it is descriptive and relevant to the type of events you intend to ingest. You cannot change the name of the Pipeline after creating it.
2. Pipeline names must be between 1 and 63 characters long.
3. The name cannot contain special characters other than dashes (`-`).
4. The name must start and end with a letter or a number.

Once you create your Pipeline, you will receive an HTTP endpoint to which you can post data. You should see output as shown below:

```sh output
🌀 Authorizing R2 bucket "[R2-BUCKET-NAME]"
🌀 Creating pipeline named "[PIPELINE-NAME]"
✅ Successfully created pipeline [PIPELINE-NAME] with ID [PIPELINE-ID]

You can now send data to your pipeline with:
curl "https://<PIPELINE-ID>.pipelines.cloudflare.com/" -d '[{ ...JSON_DATA... }]'
```

## 3. Post data to your Pipeline

Use a curl command in your terminal to post an array of JSON objects to the endpoint you received in step 2.

```sh
curl -H "Content-Type:application/json" \
  -d '[{"account_id":"test", "other_data": "test"},{"account_id":"test","other_data": "test2"}]' \
  <HTTP-endpoint>
```
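The same request can be issued from a script. In this minimal Python sketch the endpoint URL is a placeholder, and `build_request` only prepares the POST so you can inspect it before sending:

```python
import json
import urllib.request

def build_request(endpoint, events):
    """Prepare a POST of a JSON array of events to a Pipeline's HTTP endpoint."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(events).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Substitute the endpoint printed by `wrangler pipelines create`.
req = build_request(
    "https://example-id.pipelines.cloudflare.com/",  # placeholder endpoint
    [{"account_id": "test", "other_data": "test"},
     {"account_id": "test", "other_data": "test2"}],
)
# urllib.request.urlopen(req)  # uncomment to actually send the data
```

Note that the body is a JSON array, matching the curl example above: even a single event is sent as a one-element array.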
Once the Pipeline successfully accepts the data, you will receive a success message.

Pipelines handle batching the data, so you can continue posting data to the Pipeline. Once a batch fills up, the data is partitioned by date and written to your R2 bucket.

## 4. Verify in R2

Go to the R2 bucket you created in step 1 via [the Cloudflare dashboard](https://dash.cloudflare.com/). You should see a prefix for today's date. Click through, and you will see a file containing the JSON data you posted in step 3.

## Summary

By completing this guide, you have:

- Created a Pipeline.
- Connected the Pipeline to an R2 bucket as its destination.
- Posted data to the Pipeline via HTTP.
- Verified the output in the R2 bucket.
@@ -0,0 +1,61 @@

---
title: Overview
type: overview
pcx_content_type: overview
sidebar:
  order: 1
  badge:
    text: Beta
head:
  - tag: title
    content: Pipelines
---

import { CardGrid, Description, Feature, LinkTitleCard, Plan, RelatedProduct } from "~/components";

<Description>
Ingest and load real-time data streams to R2, using Cloudflare Pipelines.
</Description>

<Plan type="paid" />

Pipelines lets you ingest and load real-time data streams into R2, without managing any infrastructure. You can send data to a Pipeline via HTTP, or from a Worker. Your Pipeline will handle batching the data, generating compressed JSON files, and delivering the files to an R2 bucket.

Refer to the [get started guide](/pipelines/get-started) to start building with Pipelines.

***

## Features

<Feature header="Build your first Pipeline" href="/pipelines/get-started/">
Create your first Pipeline, and send data to it.
</Feature>

<Feature header="HTTP as a source" href="/pipelines/sources/http/">
Each Pipeline generates an HTTP endpoint to use for ingestion.
</Feature>

<Feature header="Batch and deliver records to R2" href="/pipelines/configuration/batching">
Pipelines buffer records before creating JSON files and delivering them to R2.
</Feature>

***

## More resources

<CardGrid>

<LinkTitleCard title="Limits" href="/pipelines/reference/limits/" icon="document">
Learn about Pipelines limits.
</LinkTitleCard>

<LinkTitleCard title="@CloudflareDev" href="https://x.com/cloudflaredev" icon="x.com">
Follow @CloudflareDev on Twitter to learn about product announcements, and what is new in Cloudflare Workers.
</LinkTitleCard>

<LinkTitleCard title="Developer Discord" href="https://discord.cloudflare.com" icon="discord">
Connect with the Workers community on Discord to ask questions, show what you are building, and discuss the platform with other developers.
</LinkTitleCard>

</CardGrid>
@@ -0,0 +1,12 @@

---
title: Observability
pcx_content_type: navigation
sidebar:
  order: 5
  group:
    hideIndex: true
---

import { DirectoryListing } from "~/components"

<DirectoryListing />
@@ -0,0 +1,66 @@

---
pcx_content_type: concept
title: Metrics
sidebar:
  order: 10
---

Pipelines metrics are split across three different nodes under `viewer` > `accounts`. Refer to [Explore the GraphQL schema](/analytics/graphql-api/getting-started/explore-graphql-schema/) to learn how to navigate a GraphQL schema and discover which data are available.

To learn more about the GraphQL Analytics API, refer to [GraphQL Analytics API](/analytics/graphql-api/).

You can use the GraphQL API to measure metrics for data ingested, as well as data delivered.

## Write GraphQL queries

Below are examples of how to explore your Pipelines metrics.

### Measure total bytes and records ingested over a time period

```graphql
query PipelineIngestion($accountTag: string!, $pipelineId: string!, $datetimeStart: Time!, $datetimeEnd: Time!) {
  viewer {
    accounts(filter: {accountTag: $accountTag}) {
      pipelinesIngestionAdaptiveGroups(
        limit: 10000
        filter: {
          pipelineId: $pipelineId
          datetime_geq: $datetimeStart
          datetime_leq: $datetimeEnd
        }
      ) {
        sum {
          ingestedBytes
          ingestedRecords
        }
      }
    }
  }
}
```

### Measure volume of data delivered

```graphql
query PipelineDelivery($accountTag: string!, $queueId: string!, $datetimeStart: Time!, $datetimeEnd: Time!) {
  viewer {
    accounts(filter: {accountTag: $accountTag}) {
      pipelinesDeliveryAdaptiveGroups(
        limit: 10000
        filter: {
          pipelineId: $queueId
          datetime_geq: $datetimeStart
          datetime_leq: $datetimeEnd
        }
      ) {
        sum {
          deliveredBytes
        }
      }
    }
  }
}
```
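A query like the ingestion example above can be executed by POSTing it, with its variables, to Cloudflare's GraphQL Analytics API endpoint. In this sketch `ACCOUNT_TAG`, `PIPELINE_ID`, and `API_TOKEN` are placeholders you would substitute with your own values:

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.cloudflare.com/client/v4/graphql"

# Same shape as the ingestion query above, compacted onto fewer lines.
QUERY = """
query PipelineIngestion($accountTag: string!, $pipelineId: string!, $datetimeStart: Time!, $datetimeEnd: Time!) {
  viewer {
    accounts(filter: {accountTag: $accountTag}) {
      pipelinesIngestionAdaptiveGroups(
        limit: 10000
        filter: {pipelineId: $pipelineId, datetime_geq: $datetimeStart, datetime_leq: $datetimeEnd}
      ) { sum { ingestedBytes ingestedRecords } }
    }
  }
}
"""

def build_query(account_tag, pipeline_id, start, end, token):
    """Prepare an authenticated GraphQL POST; variables fill the query's parameters."""
    payload = json.dumps({
        "query": QUERY,
        "variables": {
            "accountTag": account_tag,
            "pipelineId": pipeline_id,
            "datetimeStart": start,
            "datetimeEnd": end,
        },
    }).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=payload,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )

req = build_query("ACCOUNT_TAG", "PIPELINE_ID",
                  "2025-01-01T00:00:00Z", "2025-01-02T00:00:00Z", "API_TOKEN")
# urllib.request.urlopen(req)  # uncomment to send; requires a valid API token
```

The response's `data.viewer.accounts[0].pipelinesIngestionAdaptiveGroups` array then carries the summed `ingestedBytes` and `ingestedRecords` for the requested window.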
@@ -0,0 +1,7 @@

---
pcx_content_type: navigation
title: Pipelines REST API
sidebar:
  order: 10
---
@@ -0,0 +1,15 @@

---
pcx_content_type: changelog
title: Changelog
changelog_file_name:
  - pipelines
sidebar:
  order: 99
---

import { ProductChangelog } from "~/components"

{/* Actual content lives in /data/changelogs/pipelines.yaml. Update the file there for new entries to appear here. For more details, refer to https://developers.cloudflare.com/style-guide/documentation-content-strategy/content-types/changelog/#yaml-file */}

<ProductChangelog />
@@ -0,0 +1,12 @@

---
pcx_content_type: navigation
title: Platform
sidebar:
  order: 8
  group:
    hideIndex: true
---

import { DirectoryListing } from "~/components"

<DirectoryListing />
Review comment: Should this be [getting started guide]?