Prefect: Index page and usage guide #299
Conversation
Walkthrough
Replaces an ETL grid entry in docs/ingest/etl/index.md (nifi → prefect) and adds Prefect integration docs: a landing page and a detailed usage tutorial demonstrating a CSV-to-CrateDB ETL flow using Prefect tasks/flow with SQLAlchemy. No code or API changes.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
amotl:
@coderabbitai review and please convert all spots to active voice.
coderabbitai:
@amotl I'll review the PR and identify all instances that need conversion to active voice. Let me start by reviewing all the changes.
Actions performed: Review triggered.
Actionable comments posted: 1
🧹 Nitpick comments (10)
docs/integrate/prefect/tutorial.md (7)
16-19: Add prerequisites for the Python DB stack. Include SQLAlchemy and the CrateDB SQLAlchemy dialect so `to_sql` works out of the box. Apply this diff:
```diff
 * **Python 3.x**: Prefect is a Python-based workflow management system, so you'll need Python installed on your machine.
 * **CrateDB**: To work with CrateDB, create a new cluster in [CrateDB Cloud](https://console.cratedb.cloud/). You can choose the CRFEE tier cluster that does not require any payment information.
-* **Prefect**: Install Prefect using pip by running the following command in your terminal or command prompt: `pip install -U prefect`
+* **Prefect**: `pip install -U prefect`
+* **SQLAlchemy + CrateDB dialect**: `pip install -U sqlalchemy sqlalchemy-cratedb`
```
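To illustrate why the dialect package matters, here is a minimal connection sketch; the host, credentials, and port are placeholders, not values from the PR:

```python
import sqlalchemy as sa

# The "crate://" URL scheme is registered by the sqlalchemy-cratedb package;
# without it, create_engine() fails with NoSuchModuleError for "crate".
# Host and credentials are placeholders for a CrateDB Cloud cluster.
engine = sa.create_engine("crate://admin:<password>@<cluster>.cratedb.net:4200?ssl=true")

with engine.connect() as connection:
    # pandas.DataFrame.to_sql() drives this same engine under the hood.
    print(connection.execute(sa.text("SELECT 1")).scalar())
```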
20-25: Typo: “Perfect” → “Prefect”; simplify tracking link. Fix spelling in the heading and drop tracking parameters from the Cloud URL.
Apply this diff:
```diff
-## Getting started with Perfect
+## Getting started with Prefect
@@
-1. To get started with Prefect, you need to connect to Prefect’s API: the easiest way is to sign up for a free forever Cloud account at [https://app.prefect.cloud/](https://app.prefect.cloud/?deviceId=cfc80edd-a234-4911-a25e-ff0d6bb2c32a&deviceId=cfc80edd-a234-4911-a25e-ff0d6bb2c32a).
+1. To get started with Prefect, connect to Prefect’s API by signing up for a free Cloud account at [https://app.prefect.cloud/](https://app.prefect.cloud/).
```
6-10: Prefer active voice and tighten phrasing. Shift to active voice per PR objective.
Apply this diff:
```diff
-[Prefect](https://www.prefect.io/opensource/) is an open-source workflow automation and orchestration tool for data engineering, machine learning, and other data-related tasks. It allows you to define, schedule, and execute complex data workflows in a straightforward manner.
+[Prefect](https://www.prefect.io/opensource/) is an open-source workflow orchestration tool for data engineering, machine learning, and other data tasks. You define, schedule, and execute complex data workflows with straightforward Python code.
@@
-Prefect workflows are defined using *Python code*. Each step in the workflow is represented as a "task," and tasks can be connected to create a directed acyclic graph (DAG). The workflow defines the sequence of task execution and can include conditional logic and branching. Furthermore, Prefect provides built-in scheduling features that set up cron-like schedules for the flow. You can also parameterize your flow, allowing a run of the same flow with different input values.
+You define Prefect workflows in Python. Each step is a “task,” and tasks form a directed acyclic graph (DAG). Flows can branch and include conditional logic. Prefect also provides built‑in scheduling and flow parameters so you can run the same flow with different inputs.
@@
-This tutorial will explore how CrateDB and Prefect come together to streamline data ingestion, transformation, and loading (ETL) processes with a few lines of Python code.
+This tutorial shows how to combine CrateDB and Prefect to streamline ETL with a few lines of Python.
```
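To ground the task/DAG wording, a minimal illustrative flow; the names `say_hello` and `greeting_flow` are made up for this sketch and are not part of the tutorial:

```python
from prefect import flow, task

@task
def say_hello(name: str) -> str:
    # Each @task function becomes one node in the flow's DAG.
    return f"Hello, {name}!"

@flow(name="greeting-flow")
def greeting_flow(name: str = "CrateDB") -> None:
    # Flow parameters let you rerun the same flow with different inputs.
    print(say_hello(name))

if __name__ == "__main__":
    greeting_flow()
    greeting_flow(name="Prefect")  # same flow, different parameter
```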
29-30: Avoid first‑person “We'll dive …”; use neutral, direct phrasing. Matches the style used across docs.
Apply this diff:
```diff
-We'll dive into the basics of Prefect by creating a simple workflow with tasks that fetch data from a source, perform basic transformations, and load it into CrateDB. For this example, we will use [the yellow taxi trip data](https://github.com/DataTalksClub/nyc-tlc-data/releases/download/yellow/yellow_tripdata_2021-01.csv.gz), which includes pickup time, geo-coordinates, number of passengers, and several other variables. The goal is to create a workflow that does a basic transformation on this data and inserts it into a CrateDB table named `trip_data`:
+This section walks you through a simple workflow that fetches data, applies a basic transformation, and loads it into CrateDB. It uses the [yellow taxi trip dataset](https://github.com/DataTalksClub/nyc-tlc-data/releases/download/yellow/yellow_tripdata_2021-01.csv.gz), which includes pickup time, geo‑coordinates, passenger count, and other fields. The goal is to write transformed data to a CrateDB table named `trip_data`:
```
63-71: Fix function names, arguments, and list indentation (MD005). Align prose with the code (`extract_data`, not `read_data`; `load_data("trip_data", data)`) and indent the nested list consistently to satisfy markdownlint. Apply this diff:
```diff
-1. We start defining the flow by importing the necessary modules, including `prefect` for working with workflows, `pandas` for data manipulation, and `crate` for interacting with CrateDB.
-2. Next, we specify the connection parameters for CrateDB and the URL for a file containing the dataset. You should modify these values according to your CrateDB Cloud setup.
-3. We define three tasks using the `@task` decorator: `extract_data(url)`, `transform_data(data)`, and `load_data(table_name, transformed_data)`. Each task represents a unit of work in the workflow:
-  1. The `read_data()` task loads the data from the CSV file to a `pandas` data frame.
-  2. The `transform_data(data)` task takes the data frame and returns the data frame with entries where the `passenger_count` value is different than 0.
-  3. The `load_data(transformed_data)` task connects to the CrateDB and loads data into the `trip_data` table.
-4. We define the workflow, name it “ETL workflow“, and specify the sequence of tasks: `extract_data()`, `transform_data(data)`, and `load_data(table_name, transformed_data)`.
-5. Finally, we execute the flow by calling `main_flow()`. This runs the workflow, and each task is executed in the order defined.
+1. Start by importing the necessary modules: `prefect` for workflows, `pandas` for data manipulation, and SQLAlchemy for the database connection.
+2. Specify the CrateDB connection URI and the dataset URL. Modify these values for your CrateDB Cloud setup.
+3. Define three tasks with the `@task` decorator—`extract_data(url)`, `transform_data(df)`, and `load_data(table_name, df)`:
+
+   1. `extract_data()` reads the CSV into a pandas DataFrame.
+   2. `transform_data(df)` filters out rows where `passenger_count` is 0.
+   3. `load_data(table_name, df)` writes the data to the `trip_data` table in CrateDB.
+
+4. Define the flow, name it “ETL workflow,” and order the tasks: `extract_data()`, `transform_data()`, then `load_data()`.
+5. Execute the flow by calling `main_flow()`. Prefect runs each task in order.
```
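For orientation, a condensed sketch of the flow that this list describes, using the tutorial's task names and an SQLAlchemy-based `load_data` as suggested above. The connection URI is a placeholder:

```python
import pandas as pd
import sqlalchemy as sa
from prefect import flow, task

# Placeholder URI; the "crate://" scheme comes from the sqlalchemy-cratedb dialect.
DB_URI = "crate://admin:<password>@<cluster>.cratedb.net:4200?ssl=true"
CSV_URL = "https://github.com/DataTalksClub/nyc-tlc-data/releases/download/yellow/yellow_tripdata_2021-01.csv.gz"

@task
def extract_data(url: str) -> pd.DataFrame:
    # pandas reads gzipped CSVs directly from a URL.
    return pd.read_csv(url, compression="gzip")

@task
def transform_data(df: pd.DataFrame) -> pd.DataFrame:
    # Keep only trips that carried at least one passenger.
    return df[df["passenger_count"] != 0]

@task
def load_data(table_name: str, df: pd.DataFrame) -> None:
    # DataFrame.to_sql() delegates the writes to the SQLAlchemy engine.
    engine = sa.create_engine(DB_URI)
    df.to_sql(table_name, engine, index=False, if_exists="append", chunksize=5000)

@flow(name="ETL workflow")
def main_flow():
    df = extract_data(CSV_URL)
    transformed = transform_data(df)
    load_data("trip_data", transformed)

if __name__ == "__main__":
    main_flow()
```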
72-78: Minor copy edits: “Flow Runs”; tighten sentences. Prefer “CSV” capitalization.
Apply this diff:
```diff
-When you run this Python script, the workflow will read the trip data from a `csv` file, transform it, and load it into the CrateDB table. You can see the state of the flow run in the *Flows Runs* tab in Prefect UI:
+When you run the script, the workflow reads the trip data from a CSV file, transforms it, and loads it into CrateDB. You can see the state of the run in the *Flow Runs* tab in the Prefect UI:
@@
-You can enrich the ETL pipeline with many advanced features available in Prefect such as parameterization, error handling, retries, and more. Finally, after the successful execution of the workflow, you can query the data in the CrateDB:
+You can enrich the pipeline with Prefect features such as parameters, error handling, and retries. After a successful run, query the data in CrateDB:
```
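For example, a verification query could look like the following sketch, reusing the placeholder URI from the sketches above; the `REFRESH TABLE` statement makes freshly inserted rows visible to the query:

```python
import sqlalchemy as sa

# Same placeholder URI as in the earlier sketches.
engine = sa.create_engine("crate://admin:<password>@<cluster>.cratedb.net:4200?ssl=true")

with engine.connect() as conn:
    # CrateDB makes writes visible to reads after a table refresh.
    conn.execute(sa.text("REFRESH TABLE trip_data"))
    count = conn.execute(sa.text("SELECT COUNT(*) FROM trip_data")).scalar()
    print(f"trip_data rows: {count}")
```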
82-84: Active voice in wrap‑up. Small tweak for consistency.
Apply this diff:
```diff
-Throughout this tutorial, you made a simple Prefect workflow, defined tasks, and orchestrated data transformations and loading into CrateDB. Both tools offer extensive feature sets that you can use to optimize and scale your data workflows further.
+In this tutorial, you created a simple Prefect workflow, defined tasks, and orchestrated data transformations and loading into CrateDB. Both tools offer extensive features that help you optimize and scale your data workflows.
```
docs/integrate/prefect/index.md (3)
11-12: Optional: Sharpen tagline (active voice). Not required, but reads more crisply.
Apply this diff:
```diff
-Modern Workflow Orchestration.
+Orchestrate modern data workflows in Python.
```
17-25: Active voice and tighten copy in “About.” Shorten and switch to direct voice.
Apply this diff:
```diff
-[Prefect] is a workflow orchestration framework for building resilient data
-pipelines in Python.
+Use [Prefect] to orchestrate resilient data pipelines in Python.
@@
-Give your team the power to build reliable workflows without sacrificing
-development speed. Prefect Core combines the freedom of pure Python
-development with production-grade resilience, putting you in control of
-your data operations. Transform your code into scalable workflows that
-deliver consistent results.
+Build reliable workflows without sacrificing development speed. Prefect combines the freedom of pure Python with production‑grade resilience, putting you in control of your data operations. Turn code into scalable workflows that deliver consistent results.
```
4-6: Asset sourcing (optional). The logo hotlinks an external host. Consider adding the asset to the repo or using an official brand CDN to avoid broken images during docs builds.
If you want, I can open a follow‑up issue to track moving this asset in‑repo.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- docs/ingest/etl/index.md (1 hunks)
- docs/integrate/index.md (1 hunks)
- docs/integrate/prefect/index.md (1 hunks)
- docs/integrate/prefect/tutorial.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/integrate/prefect/tutorial.md
66-66: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2
(MD005, list-indent)
67-67: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2
(MD005, list-indent)
68-68: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2
(MD005, list-indent)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build docs
🔇 Additional comments (2)
docs/ingest/etl/index.md (1)
56-56: LGTM: Cross-link works and ordering looks correct.
`{ref}`prefect`` resolves to the new integration page, and placement under “Dataflow / Pipeline / Code-first” is consistent.
docs/integrate/index.md (1)
62-62: LGTM: Toctree inclusion is correct.
`prefect/index` appears in alphabetical position between “Power BI” and “prometheus.” No further changes needed.
Force-pushed from c679f36 to add4422.
About
Continue adding integration guides from the community forum.