[Docs] Getting Started - Table of Contents restructure #2408
| Cosmos Fundamentals | ||
| =================== | ||
|
|
||
| Information about important cosmos concepts go here |
| Information about important cosmos concepts go here | |
| Information about important Cosmos concepts go here |
| Run Cosmos <run-cosmos> | ||
| astro | ||
| aws-container-run-job | ||
| gcc | ||
| mwaa | ||
| open-source |
The structure makes sense, but the headings need tightening for clarity and consistency to look more professional.
Currently, this renders:
- Run Cosmos
- Getting Started on Astro
- Getting Started with Astronomer Cosmos on AWS ECS
- Getting Started on Google Cloud Composer (GCC)
- Getting Started on MWAA
- Getting Started on Open Source Airflow
A few issues:
- Inconsistent phrasing (“Run Cosmos” vs “Getting Started…”)
- Imprecise platform names (some use only the acronym, some use both the name and the acronym, some include the platform name and others don't)
- Mixed tone (imperative vs descriptive)
Would it make sense to bring Open Source to the top before commercial platforms, and then sort them alphabetically?
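As a rough illustration of the ordering suggested above, here is a minimal Python sketch that puts the open-source entry first and sorts the remaining entries alphabetically. The function name `order_platform_entries` is hypothetical; the entry names are the ones from the toctree under review.

```python
def order_platform_entries(slugs, first="open-source"):
    """Return the entries with `first` at the top and the rest sorted alphabetically."""
    rest = sorted(slug for slug in slugs if slug != first)
    # Only lead with `first` if it is actually present in the input.
    return ([first] if first in slugs else []) + rest


# Entry names taken from the toctree in this diff.
entries = ["astro", "aws-container-run-job", "gcc", "mwaa", "open-source"]
print(order_platform_entries(entries))
```

Sorting on the file names is only a stand-in here; the same rule would need to be applied to the rendered headings once their wording is settled.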
| :caption: Cosmos Fundamentals | ||
|
|
||
| Cosmos fundamentals <cosmos-fundamentals> | ||
| Similar dbt and Airflow <dbt-airflow-concepts> |
It absolutely makes sense to have a section like that in your documentation, but “Similar dbt and Airflow” sounds slightly awkward and could confuse readers.
Something similar to one of the options below may be clearer:
- dbt vs Airflow
- How dbt Compares to Airflow
| :caption: Cosmos Fundamentals | ||
|
|
||
| Cosmos fundamentals <cosmos-fundamentals> |
This currently renders:
Cosmos Fundamentals
- Cosmos Fundamentals
Could we avoid redundant naming of parents and children?
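If it helps to catch this kind of repetition elsewhere in the table of contents, a small check along these lines could flag it. This is a sketch only: `redundant_children` is a hypothetical helper, and the dictionary is a hand-written copy of part of the rendered output, not something read from Sphinx.

```python
def redundant_children(toc):
    """Return (caption, child) pairs where a child title repeats its parent caption."""

    def normalize(title):
        # Compare case-insensitively and ignore extra whitespace.
        return " ".join(title.lower().split())

    return [
        (caption, child)
        for caption, children in toc.items()
        for child in children
        if normalize(child) == normalize(caption)
    ]


# Captions and child titles as they currently render.
rendered = {
    "Cosmos Fundamentals": ["Cosmos Fundamentals"],
    "Run Cosmos": ["Getting Started on Astro", "Getting Started on MWAA"],
}
print(redundant_children(rendered))
```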
| .. toctree:: | ||
| :maxdepth: 1 | ||
| :hidden: | ||
| :caption: Execution Modes |
Please rename this section to something other than “Execution Modes.”
In several of our earlier conversations, we discussed that this term is not meaningful to users who are just getting started with running dbt in Airflow. For someone simply trying to run dbt in Airflow for the first time, “Execution Modes” feels abstract and doesn’t clearly communicate what the section is about.
Could we consider renaming this to something more user-centred and task-oriented? For example:
- Ways to Run dbt in Airflow
- Ways to Run dbt with Cosmos
- Running dbt in Airflow
- How Cosmos Executes dbt
- Choosing How to Run dbt
The goal would be to make the section immediately understandable to someone scanning the "Getting Started" guide, rather than introducing terminology that requires additional context.
| @@ -17,11 +35,14 @@ | |||
| Airflow Async Execution Mode <async-execution-mode> | |||
| Watcher Execution Mode <watcher-execution-mode> | |||
| Watcher Kubernetes Execution Mode <watcher-kubernetes-execution-mode> | |||
This currently renders:
Execution Modes
- Execution Modes
- Airflow and dbt dependencies conflicts
- Docker Execution Mode
- Azure Container instance execution mode
- AWS Container Run Job Execution Mode
- GCP Cloud Run Job Execution Mode
- Airflow Async Execution Mode
- Watcher Execution Mode
- Watcher Kubernetes Execution Mode
I think this part of the table of contents can cause confusion and reduce scannability for new users. A few concerns:
1. Redundant Parent and Child Naming
2. Overuse of the Phrase “Execution Mode”
3. Inconsistent Naming Conventions
4. Concept vs Implementation Mixing
5. Not Beginner-Friendly in a “Getting Started” Context
On (3), the inconsistent naming:
- There’s an inconsistency in how platforms and concepts are presented:
  - Some are infrastructure-based (Docker, Azure Container Instance)
  - Some are cloud-job based (AWS Container Run Job, GCP Cloud Run Job)
  - Some are behaviour-based (Airflow Async, Watcher)
  - Some are dependency-related (Airflow and dbt dependency conflicts)
- These are not all in the same conceptual category, yet they’re grouped under the same heading as if they were equivalent “modes.”
On (4), concept vs implementation, the list mixes:
- Conceptual topics (dependency conflicts)
- Execution strategies (async, watcher)
- Infrastructure backends (Docker, Cloud Run, etc.)
This makes the mental model unclear. A user might reasonably ask:
- Are these all mutually exclusive modes?
- Are some of these infrastructure options?
- Are some features layered on top of others?
Lastly, on (5), for someone just trying to run dbt in Airflow, seeing:
- Watcher Kubernetes Execution Mode
- AWS Container Run Job Execution Mode
…is overwhelming and abstract.
Instead of a smooth getting-started experience, it introduces internal terminology before grounding the user in the task.
Maybe we should break it down into two parts:
dbt Installation
- Installing dbt (that could cover: same virtualenv as Airflow via requirements.txt, dedicated user-managed virtualenv via Dockerfile, dedicated Cosmos-managed virtualenv using ExecutionMode.VIRTUALENV, or a container image that contains dbt)
- Installing dbt dependencies (ProjectConfig.install_dbt_deps)
Running dbt in Airflow
- Running dbt in the Airflow worker environment
  - Standard Execution
  - Watcher Execution
  - Async Execution
- Running dbt in a container
  - Kubernetes
  - Kubernetes Watcher
  - Docker (this section should highlight that it is not compatible with Astro due to Docker-in-Docker issues)
  - AWS ECS
  - AWS EKS
  - Azure Container Instance
  - GCP Cloud Run
| Watcher Execution Mode <watcher-execution-mode> | ||
| Watcher Kubernetes Execution Mode <watcher-kubernetes-execution-mode> | ||
| dbt and Airflow Similar Concepts <dbt-airflow-concepts> | ||
|
|
Lastly, I strongly believe we need a section in the "Getting Started" table of contents highlighting two topics that are very important:
Connecting to your database
- Using profiles.yml
- Using Airflow Connections
Bringing your dbt Project into Airflow
- Choosing task granularity
  - One task per dbt node # DbtDag
  - Combining dbt and non-dbt tasks # DbtTaskGroup
  - Running multiple dbt nodes in a single task # Instantiating operators
- Choosing a parsing strategy
  - Using manifest.json
  - Using dbt ls
  - Using
- Selecting what to run
  - Using dbt selectors in Cosmos
- Enabling Tests
  - After each model
  - At the end of the pipeline
  - Using dbt build
  - Disabling tests
- Managing Sources
  - Selecting sources to run
  - Running source freshness checks
Description
Restructures the existing Getting Started Section in docs to better organize the contents into expandable sections, without requiring redirects.
This PR does not address moving Execution Modes to another subdirectory, but focuses on adapting the docs currently visible in the production environment.
Related Issue(s)
N/A
Breaking Change?
Since no files are moving directories in the backend, no broken links are anticipated.
Checklist