pytask-dev · tobiasraabe · Jan 1, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,7 +7,9 @@ releases are available on [PyPI](https://pypi.org/project/pytask) and
 
 ## Unreleased
 
-- Nothing yet.
+- {issue}`735` adds the `pytask.lock` lockfile as the primary state backend with a
+  portable format, documentation, and a one-run SQLite fallback when no lockfile
+  exists.
 
 ## 0.5.8 - 2025-12-30
 

diff --git a/docs/source/how_to_guides/index.md b/docs/source/how_to_guides/index.md
@@ -13,6 +13,7 @@ maxdepth: 1
 ---
 migrating_from_scripts_to_pytask
 interfaces_for_dependencies_products
+portability
 remote_files
 functional_interface
 capture_warnings

diff --git a/docs/source/how_to_guides/portability.md b/docs/source/how_to_guides/portability.md
@@ -0,0 +1,47 @@
+# Portability
+
+This guide explains how to keep pytask state portable across machines.
+
+## Two Portability Concerns
+
+1. **Portable IDs**
+
+   - The lockfile stores task and node IDs.
+   - IDs must be project‑relative and stable across machines.
+   - pytask builds these IDs from the project root; no action required for most users.
+
+1. **Portable State Values**
+
+   - `state` is opaque and comes from `PNode.state()` / `PTask.state()`.
+   - Content hashes are portable; timestamps or absolute paths are not.
+   - Custom nodes should avoid machine‑specific paths in `state()`.
+
+## Tips
+
+- Commit `pytask.lock` to your repository. If you ship the repository together with the
+  build artifacts (for example, a zipped project folder including `pytask.lock` and the
+  produced files), you can move it to another machine and runs will skip recomputation.
+- Prefer file content hashes over timestamps for custom nodes.
+- For `PythonNode` values that are not natively stable, provide a custom hash function.
+- If inputs live outside the project root, IDs will include `..` segments to remain
+  relative; this is expected.
+
+## Cleaning Up the Lockfile
+
+`pytask.lock` is updated incrementally. Entries are only replaced when the corresponding
+tasks run. If tasks are removed or renamed, their old entries remain as stale data and
+are ignored.
+
+To clean up stale entries without deleting the file, run:
+
+```
+pytask build --clean-lockfile
+```
+
+This rewrites the lockfile after a successful build with only the currently collected
+tasks and their current state values.
+
+## Legacy SQLite
+
+SQLite is the old state format. It is used only when no lockfile exists, and the
+lockfile is written during that run. Subsequent runs rely on the lockfile.
diff --git a/docs/source/reference_guides/configuration.md b/docs/source/reference_guides/configuration.md
@@ -44,11 +44,12 @@ are welcome to also support macOS.
 
 ````{confval} database_url
 
-pytask uses a database to keep track of tasks, products, and dependencies over runs. By
-default, it will create an SQLite database in the project's root directory called
-`.pytask/pytask.sqlite3`. If you want to use a different name or a different dialect
-[supported by sqlalchemy](https://docs.sqlalchemy.org/en/latest/core/engines.html#backend-specific-urls),
-use either {option}`pytask build --database-url` or `database_url` in the config.
+SQLite is the legacy state format. pytask now uses `pytask.lock` as the primary state
+backend and only consults the database when no lockfile exists. During that first run,
+the lockfile is written and subsequent runs use the lockfile only.
+
+The `database_url` option remains for backwards compatibility and controls the legacy
+database location and dialect ([supported by sqlalchemy](https://docs.sqlalchemy.org/en/latest/core/engines.html#backend-specific-urls)).
 
 ```toml
 database_url = "sqlite:///.pytask/pytask.sqlite3"

diff --git a/docs/source/reference_guides/index.md b/docs/source/reference_guides/index.md
@@ -9,6 +9,7 @@ maxdepth: 1
 ---
 command_line_interface
 configuration
+lockfile
 hookspecs
 api
 ```
diff --git a/docs/source/reference_guides/lockfile.md b/docs/source/reference_guides/lockfile.md
@@ -0,0 +1,86 @@
+# The Lockfile
+
+`pytask.lock` is the default state backend. It stores task state in a portable,
+git-friendly format so runs can be resumed or shared across machines.
+
+```{note}
+SQLite is the legacy format. It is still read when no lockfile exists, and a lockfile
+is written during that first run. Subsequent runs use the lockfile only.
+```
+
+## Example
+
+```toml
+# This file is automatically @generated by pytask.
+# It is not intended for manual editing.
+
+lock-version = "1"
+
+[[task]]
+id = "src/tasks/data.py::task_clean_data"
+state = "f9e8d7c6..."
+
+[task.depends_on]
+"data/raw/input.csv" = "e5f6g7h8..."
+
+[task.produces]
+"data/processed/clean.parquet" = "m3n4o5p6..."
+```
+
+## Behavior
+
+On each run, pytask:
+
+1. Reads `pytask.lock` (if present).
+1. Compares current dependency/product/task `state()` to stored `state`.
+1. Skips tasks whose states match; runs the rest.
+1. Updates `pytask.lock` after each completed task (atomic write).
+
+`pytask-parallel` uses a single coordinator to write the lockfile, so writes are
+serialized even when tasks execute in parallel.
+
+## Portability
+
+There are two portability concerns:
+
+1. **IDs**: Lockfile IDs must be project‑relative and stable across machines.
+1. **State values**: `state` is opaque; portability depends on each node’s `state()`
+   implementation. Content hashes are portable; timestamps are not.
+
+## Maintenance
+
+Use `pytask build --clean-lockfile` to rewrite `pytask.lock` with only currently
+collected tasks. The rewrite happens after a successful build and recomputes current
+state values without executing tasks again.
+
+## File Format Reference
+
+### Top-Level
+
+| Field          | Required | Description                      |
+| -------------- | -------- | -------------------------------- |
+| `lock-version` | Yes      | Schema version (currently `"1"`) |
+
+### Task Entry
+
+| Field        | Required | Description                   |
+| ------------ | -------- | ----------------------------- |
+| `id`         | Yes      | Portable task identifier      |
+| `state`      | Yes      | Opaque state string           |
+| `depends_on` | No       | Mapping from node id to state |
+| `produces`   | No       | Mapping from node id to state |
+
+### Dependency/Product Entry
+
+Node entries are stored as key-value pairs inside `depends_on` and `produces`, where the
+key is the node id and the value is the node state string.
+
+## Version Compatibility
+
+Only lock-version `"1"` is supported. Reading a lockfile with any other version raises
+an error with a clear upgrade message.
+
+## Implementation Notes
+
+- The lockfile is encoded/decoded with `msgspec`’s TOML support.
+- Writes are atomic: pytask writes a temporary file and replaces `pytask.lock`.
diff --git a/docs/source/tutorials/making_tasks_persist.md b/docs/source/tutorials/making_tasks_persist.md
@@ -9,7 +9,7 @@ In this case, you can apply the {func}`@pytask.mark.persist <pytask.mark.persist
 decorator to the task, which will skip its execution as long as all products exist.
 
 Internally, the state of the dependencies, the source file, and the products are updated
-in the database such that the subsequent execution will skip the task successfully.
+in the lockfile such that the subsequent execution will skip the task successfully.
 
 ## When is this useful?
 

diff --git a/pyproject.toml b/pyproject.toml
@@ -30,6 +30,7 @@ dependencies = [
     "pluggy>=1.3.0",
     "rich>=13.8.0",
     "sqlalchemy>=2.0.31",
+    "msgspec[toml]>=0.18.6",
     'tomli>=1; python_version < "3.11"',
     'typing-extensions>=4.8.0; python_version < "3.11"',
     "universal-pathlib>=0.2.2",

diff --git a/src/_pytask/build.py b/src/_pytask/build.py
@@ -75,6 +75,7 @@ def build(  # noqa: C901, PLR0912, PLR0913, PLR0915
     debug_pytask: bool = False,
     disable_warnings: bool = False,
     dry_run: bool = False,
+    clean_lockfile: bool = False,
     editor_url_scheme: Literal["no_link", "file", "vscode", "pycharm"]  # noqa: PYI051
     | str = "file",
     explain: bool = False,
@@ -124,6 +125,8 @@ def build(  # noqa: C901, PLR0912, PLR0913, PLR0915
         Whether warnings should be disabled and not displayed.
     dry_run
         Whether a dry-run should be performed that shows which tasks need to be rerun.
+    clean_lockfile
+        Whether the lockfile should be rewritten to only include collected tasks.
     editor_url_scheme
         An url scheme that allows to click on task names, node names and filenames and
         jump right into you preferred editor to the right line.
@@ -192,6 +195,7 @@ def build(  # noqa: C901, PLR0912, PLR0913, PLR0915
             "debug_pytask": debug_pytask,
             "disable_warnings": disable_warnings,
             "dry_run": dry_run,
+            "clean_lockfile": clean_lockfile,
             "editor_url_scheme": editor_url_scheme,
             "explain": explain,
             "expression": expression,
@@ -341,6 +345,12 @@ def build(  # noqa: C901, PLR0912, PLR0913, PLR0915
     default=False,
     help="Execute a task even if it succeeded successfully before.",
 )
+@click.option(
+    "--clean-lockfile",
+    is_flag=True,
+    default=False,
+    help="Rewrite the lockfile with only currently collected tasks.",
+)
 @click.option(
     "--explain",
     is_flag=True,

diff --git a/src/_pytask/console.py b/src/_pytask/console.py
@@ -111,10 +111,26 @@ def render_to_string(
     example, render warnings with colors or text in exceptions.
 
     """
-    buffer = console.render(renderable)
+    render_console = console
+    if not strip_styles and console.no_color and console.color_system is not None:
+        theme: Theme | None
+        try:
+            theme = Theme(console._theme_stack._entries[-1])
+        except (AttributeError, IndexError, TypeError):
+            theme = None
+        render_console = Console(
+            color_system=console.color_system,  # type: ignore[invalid-argument-type]
+            force_terminal=True,
+            width=console.width,
+            no_color=False,
+            markup=getattr(console, "_markup", True),
+            theme=theme,
+        )
+
+    buffer = render_console.render(renderable)
     if strip_styles:
         buffer = Segment.strip_styles(buffer)
-    return console._render_buffer(buffer)
+    return render_console._render_buffer(buffer)
 
 
 def format_task_name(task: PTask, editor_url_scheme: str) -> Text:

diff --git a/src/_pytask/execute.py b/src/_pytask/execute.py
@@ -20,9 +20,6 @@
 from _pytask.dag_utils import TopologicalSorter
 from _pytask.dag_utils import descending_tasks
 from _pytask.dag_utils import node_and_neighbors
-from _pytask.database_utils import get_node_change_info
-from _pytask.database_utils import has_node_changed
-from _pytask.database_utils import update_states_in_database
 from _pytask.exceptions import ExecutionError
 from _pytask.exceptions import NodeLoadError
 from _pytask.exceptions import NodeNotFoundError
@@ -46,6 +43,9 @@
 from _pytask.pluginmanager import hookimpl
 from _pytask.provisional_utils import collect_provisional_products
 from _pytask.reports import ExecutionReport
+from _pytask.state import get_node_change_info
+from _pytask.state import has_node_changed
+from _pytask.state import update_states
 from _pytask.traceback import remove_traceback_from_exc_info
 from _pytask.tree_util import tree_leaves
 from _pytask.tree_util import tree_map
@@ -196,7 +196,7 @@ def pytask_execute_task_setup(session: Session, task: PTask) -> None:  # noqa: C
             # Check if node changed and collect detailed info if in explain mode
             if session.config["explain"]:
                 has_changed, reason, details = get_node_change_info(
-                    task=task, node=node, state=node_state
+                    session=session, task=task, node=node, state=node_state
                 )
                 if has_changed:
                     needs_to_be_executed = True
@@ -222,7 +222,9 @@ def pytask_execute_task_setup(session: Session, task: PTask) -> None:  # noqa: C
                         )
                     )
             else:
-                has_changed = has_node_changed(task=task, node=node, state=node_state)
+                has_changed = has_node_changed(
+                    session=session, task=task, node=node, state=node_state
+                )
                 if has_changed:
                     needs_to_be_executed = True
 
@@ -232,6 +234,8 @@ def pytask_execute_task_setup(session: Session, task: PTask) -> None:  # noqa: C
 
     if not needs_to_be_executed:
         collect_provisional_products(session, task)
+        if not session.config["dry_run"] and not session.config["explain"]:
+            update_states(session, task)
         raise SkippedUnchanged
 
     # Create directory for product if it does not exist. Maybe this should be a `setup`
@@ -326,7 +330,7 @@ def pytask_execute_task_process_report(
     task = report.task
 
     if report.outcome == TaskOutcome.SUCCESS:
-        update_states_in_database(session, task.signature)
+        update_states(session, task)
     elif report.exc_info and isinstance(report.exc_info[1], WouldBeExecuted):
         report.outcome = TaskOutcome.WOULD_BE_EXECUTED