4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -7,7 +7,9 @@ releases are available on [PyPI](https://pypi.org/project/pytask) and

## Unreleased

- - Nothing yet.
+ - {issue}`735` adds the `pytask.lock` lockfile as the primary state backend with a
+ portable format, documentation, and a one-run SQLite fallback when no lockfile
+ exists.

## 0.5.8 - 2025-12-30

1 change: 1 addition & 0 deletions docs/source/how_to_guides/index.md
@@ -13,6 +13,7 @@ maxdepth: 1
---
migrating_from_scripts_to_pytask
interfaces_for_dependencies_products
+ portability
remote_files
functional_interface
capture_warnings
47 changes: 47 additions & 0 deletions docs/source/how_to_guides/portability.md
@@ -0,0 +1,47 @@
# Portability

This guide explains how to keep pytask state portable across machines.

## Two Portability Concerns

1. **Portable IDs**

   - The lockfile stores task and node IDs.
   - IDs must be project-relative and stable across machines.
   - pytask builds these IDs from the project root, so most users do not need to do anything.

1. **Portable State Values**

   - `state` is opaque and comes from `PNode.state()` / `PTask.state()`.
   - Content hashes are portable; timestamps and absolute paths are not.
   - Custom nodes should avoid machine-specific paths in `state()`.

## Tips

- Commit `pytask.lock` to your repository. If you ship the repository together with the
  build artifacts (for example, a zipped project folder including `pytask.lock` and the
  produced files), you can move it to another machine, and pytask will skip tasks whose
  recorded states still match, as long as those state values are portable.
- Prefer file content hashes over timestamps for custom nodes.
- For `PythonNode` values that are not natively stable, provide a custom hash function.
- If inputs live outside the project root, IDs will include `..` segments to remain
relative; this is expected.
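
Python's built-in `hash()` of a `str` is salted per process (see `PYTHONHASHSEED`), so it is neither stable across runs nor across machines. A custom hash function can derive a digest from a canonical serialization instead. The sketch assumes the value is JSON-serializable; `stable_hash` is a hypothetical name, and how such a function is attached to a `PythonNode` (for example through its `hash` argument) should be checked against pytask's documentation on hashing task inputs.

```python
import hashlib
import json
from typing import Any


def stable_hash(value: Any) -> str:
    """Return a digest that is identical across processes and machines."""
    # sort_keys makes dict insertion order irrelevant; compact separators avoid
    # whitespace differences in the serialized form.
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


config_a = {"seed": 42, "model": "ols"}
config_b = {"model": "ols", "seed": 42}
print(stable_hash(config_a) == stable_hash(config_b))  # True
```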

## Cleaning Up the Lockfile

`pytask.lock` is updated incrementally. Entries are only replaced when the corresponding
tasks run. If tasks are removed or renamed, their old entries remain as stale data and
are ignored.

To clean up stale entries without deleting the file, run:

```
pytask build --clean-lockfile
```

After a successful build, this rewrites the lockfile so that it contains only the
currently collected tasks and their current state values.
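
Conceptually, the cleanup keeps only entries whose ID belongs to a collected task. The sketch below works on an already-parsed lockfile (a `dict` shaped like the TOML document, with a `task` array). `prune_lockfile` is hypothetical and only filters; it does not recompute state values, which the real rewrite also does.

```python
from typing import Any


def prune_lockfile(lock: dict[str, Any], collected_ids: set[str]) -> dict[str, Any]:
    """Return a copy of the lockfile without entries for tasks that no longer exist."""
    kept = [entry for entry in lock.get("task", []) if entry["id"] in collected_ids]
    return {**lock, "task": kept}


lock = {
    "lock-version": "1",
    "task": [
        {"id": "src/tasks/data.py::task_clean_data", "state": "a1"},
        {"id": "src/tasks/old.py::task_removed", "state": "b2"},
    ],
}
pruned = prune_lockfile(lock, {"src/tasks/data.py::task_clean_data"})
print([entry["id"] for entry in pruned["task"]])  # ['src/tasks/data.py::task_clean_data']
```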

## Legacy SQLite

SQLite is the old state format. It is used only when no lockfile exists, and the
lockfile is written during that run. Subsequent runs rely on the lockfile.
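
The selection rule above can be summarized in a few lines. This is a sketch only: `select_state_source` is a hypothetical helper that returns a label instead of opening either backend.

```python
import tempfile
from pathlib import Path


def select_state_source(project_root: Path) -> str:
    """Hypothetical helper mirroring the rule above: the lockfile wins if it exists."""
    if (project_root / "pytask.lock").exists():
        return "lockfile"
    # No lockfile yet: fall back to the legacy database for this one run;
    # the lockfile is written during that run, so the next run uses it.
    return "sqlite"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first_run = select_state_source(root)
    (root / "pytask.lock").write_text('lock-version = "1"\n')
    second_run = select_state_source(root)

print(first_run, second_run)  # sqlite lockfile
```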
11 changes: 6 additions & 5 deletions docs/source/reference_guides/configuration.md
@@ -44,11 +44,12 @@ are welcome to also support macOS.

````{confval} database_url

- pytask uses a database to keep track of tasks, products, and dependencies over runs. By
- default, it will create an SQLite database in the project's root directory called
- `.pytask/pytask.sqlite3`. If you want to use a different name or a different dialect
- [supported by sqlalchemy](https://docs.sqlalchemy.org/en/latest/core/engines.html#backend-specific-urls),
- use either {option}`pytask build --database-url` or `database_url` in the config.
+ SQLite is the legacy state format. pytask now uses `pytask.lock` as the primary state
+ backend and only consults the database when no lockfile exists. During that first run,
+ the lockfile is written and subsequent runs use the lockfile only.
+
+ The `database_url` option remains for backwards compatibility and controls the legacy
+ database location and dialect ([supported by sqlalchemy](https://docs.sqlalchemy.org/en/latest/core/engines.html#backend-specific-urls)).

```toml
database_url = "sqlite:///.pytask/pytask.sqlite3"
1 change: 1 addition & 0 deletions docs/source/reference_guides/index.md
@@ -9,6 +9,7 @@ maxdepth: 1
---
command_line_interface
configuration
+ lockfile
hookspecs
api
```
86 changes: 86 additions & 0 deletions docs/source/reference_guides/lockfile.md
@@ -0,0 +1,86 @@
# The Lock File

`pytask.lock` is the default state backend. It stores task state in a portable,
git-friendly format so runs can be resumed or shared across machines.

```{note}
SQLite is the legacy format. It is still read when no lockfile exists, and a lockfile
is written during that first run. Subsequent runs use the lockfile only.
```

## Example

```toml
# This file is automatically @generated by pytask.
# It is not intended for manual editing.

lock-version = "1"

[[task]]
id = "src/tasks/data.py::task_clean_data"
state = "f9e8d7c6..."

[task.depends_on]
"data/raw/input.csv" = "e5f6g7h8..."

[task.produces]
"data/processed/clean.parquet" = "m3n4o5p6..."
```

## Behavior

On each run, pytask:

1. Reads `pytask.lock` (if present).
1. Compares the current `state()` of each task and of its dependencies and products with
   the stored `state`.
1. Skips tasks whose states match; runs the rest.
1. Updates `pytask.lock` after each completed task (atomic write).

`pytask-parallel` uses a single coordinator to write the lock file, so writes are
serialized even when tasks execute in parallel.
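
The internals of `pytask-parallel` are not described here. The sketch below only illustrates the general single-writer pattern with threads and a queue: workers report results, and one coordinator applies them in order, so no two writes overlap.

```python
import queue
import threading


def coordinator(results: queue.Queue, lock_state: dict[str, str]) -> None:
    """The only code that mutates lockfile state; it handles one result at a time."""
    while True:
        item = results.get()
        if item is None:  # sentinel: every worker has finished
            return
        task_id, state = item
        lock_state[task_id] = state  # a real writer would also persist the file here


def worker(task_id: str, results: queue.Queue) -> None:
    """Executes a task (elided) and reports its new state instead of writing the file."""
    results.put((task_id, f"state-of-{task_id}"))


results: queue.Queue = queue.Queue()
lock_state: dict[str, str] = {}
writer = threading.Thread(target=coordinator, args=(results, lock_state))
writer.start()
workers = [threading.Thread(target=worker, args=(f"task_{i}", results)) for i in range(4)]
for thread in workers:
    thread.start()
for thread in workers:
    thread.join()
results.put(None)  # queued after all results, so the coordinator sees them first
writer.join()
print(sorted(lock_state))  # ['task_0', 'task_1', 'task_2', 'task_3']
```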

## Portability

There are two portability concerns:

1. **IDs**: Lockfile IDs must be project‑relative and stable across machines.
1. **State values**: `state` is opaque; portability depends on each node’s `state()`
implementation. Content hashes are portable; timestamps are not.
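
A short demonstration of the difference, assuming nothing about pytask's built-in nodes: two byte-identical files with different modification times share a content hash but not a timestamp-based state.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def content_state(path: Path) -> str:
    """Portable: depends only on the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def mtime_state(path: Path) -> str:
    """Not portable: depends on when the file was written on this machine."""
    return str(path.stat().st_mtime_ns)


with tempfile.TemporaryDirectory() as tmp:
    machine_a = Path(tmp, "machine_a", "input.csv")
    machine_b = Path(tmp, "machine_b", "input.csv")
    machine_a.parent.mkdir()
    machine_b.parent.mkdir()
    machine_a.write_text("a,b\n1,2\n")
    shutil.copyfile(machine_a, machine_b)  # same bytes, new file
    # Simulate a checkout on another machine with a different modification time.
    os.utime(machine_b, ns=(1_000_000_000, 1_000_000_000))
    same_content = content_state(machine_a) == content_state(machine_b)
    same_mtime = mtime_state(machine_a) == mtime_state(machine_b)

print(same_content, same_mtime)  # True False
```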

## Maintenance

Use `pytask build --clean-lockfile` to rewrite `pytask.lock` with only currently
collected tasks. The rewrite happens after a successful build and recomputes current
state values without executing tasks again.

## File Format Reference

### Top-Level

| Field | Required | Description |
| -------------- | -------- | -------------------------------- |
| `lock-version` | Yes | Schema version (currently `"1"`) |

### Task Entry

| Field | Required | Description |
| ------------ | -------- | ----------------------------- |
| `id` | Yes | Portable task identifier |
| `state` | Yes | Opaque state string |
| `depends_on` | No | Mapping from node id to state |
| `produces` | No | Mapping from node id to state |

### Dependency/Product Entry

Node entries are stored as key-value pairs inside `depends_on` and `produces`, where the
key is the node id and the value is the node state string.

## Version Compatibility

Only lock-version `"1"` is supported. Lockfiles with an older or newer version raise an
error whose message explains how to upgrade.
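
A sketch of such a check on a parsed lockfile. `LockfileVersionError`, `check_lock_version`, and the message text are hypothetical; pytask's actual exception type and wording may differ.

```python
from typing import Any

SUPPORTED_LOCK_VERSION = "1"


class LockfileVersionError(Exception):
    """Hypothetical error type for an unsupported lockfile schema."""


def check_lock_version(lock: dict[str, Any]) -> None:
    """Reject any lockfile whose lock-version is not exactly the supported one."""
    version = lock.get("lock-version")  # None if the field is missing
    if version != SUPPORTED_LOCK_VERSION:
        # The wording is illustrative only.
        raise LockfileVersionError(
            f"pytask.lock has lock-version {version!r}, but only "
            f"{SUPPORTED_LOCK_VERSION!r} is supported. Upgrade pytask or "
            "regenerate the lockfile with a compatible version."
        )


check_lock_version({"lock-version": "1"})  # passes silently
```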

## Implementation Notes

- The lockfile is encoded/decoded with `msgspec`’s TOML support.
- Writes are atomic: pytask writes a temporary file and replaces `pytask.lock`.
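
The temporary-file-then-replace pattern can be sketched with the standard library. This is the general technique, not pytask's code; on POSIX systems `os.replace` is atomic when source and destination are on the same filesystem, which is why the temporary file is created next to the target.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, content: str) -> None:
    """Write content to path so readers never observe a partially written file."""
    # Create the temporary file next to the target so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
            handle.flush()
            os.fsync(handle.fileno())  # flush the bytes to disk before the swap
        os.replace(tmp_name, path)  # swaps in the new file in a single step
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)
        raise


with tempfile.TemporaryDirectory() as tmp:
    lock_path = Path(tmp, "pytask.lock")
    atomic_write_text(lock_path, 'lock-version = "1"\n')
    atomic_write_text(lock_path, 'lock-version = "1"\n\n[[task]]\nid = "a"\nstate = "b"\n')
    final_text = lock_path.read_text(encoding="utf-8")
    leftovers = sorted(p.name for p in Path(tmp).iterdir())

print(leftovers)  # ['pytask.lock']
```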
2 changes: 1 addition & 1 deletion docs/source/tutorials/making_tasks_persist.md
@@ -9,7 +9,7 @@ In this case, you can apply the {func}`@pytask.mark.persist <pytask.mark.persist
decorator to the task, which will skip its execution as long as all products exist.

Internally, the state of the dependencies, the source file, and the products are updated
- in the database such that the subsequent execution will skip the task successfully.
+ in the lockfile such that the subsequent execution will skip the task successfully.

## When is this useful?

1 change: 1 addition & 0 deletions pyproject.toml
@@ -30,6 +30,7 @@ dependencies = [
"pluggy>=1.3.0",
"rich>=13.8.0",
"sqlalchemy>=2.0.31",
+ "msgspec[toml]>=0.18.6",
'tomli>=1; python_version < "3.11"',
'typing-extensions>=4.8.0; python_version < "3.11"',
"universal-pathlib>=0.2.2",
10 changes: 10 additions & 0 deletions src/_pytask/build.py
@@ -75,6 +75,7 @@ def build( # noqa: C901, PLR0912, PLR0913, PLR0915
debug_pytask: bool = False,
disable_warnings: bool = False,
dry_run: bool = False,
+ clean_lockfile: bool = False,
editor_url_scheme: Literal["no_link", "file", "vscode", "pycharm"] # noqa: PYI051
| str = "file",
explain: bool = False,
@@ -124,6 +125,8 @@ def build( # noqa: C901, PLR0912, PLR0913, PLR0915
Whether warnings should be disabled and not displayed.
dry_run
Whether a dry-run should be performed that shows which tasks need to be rerun.
+ clean_lockfile
+ Whether the lockfile should be rewritten to only include collected tasks.
editor_url_scheme
A URL scheme that allows clicking on task names, node names, and filenames to
jump right into your preferred editor at the right line.
@@ -192,6 +195,7 @@ def build( # noqa: C901, PLR0912, PLR0913, PLR0915
"debug_pytask": debug_pytask,
"disable_warnings": disable_warnings,
"dry_run": dry_run,
+ "clean_lockfile": clean_lockfile,
"editor_url_scheme": editor_url_scheme,
"explain": explain,
"expression": expression,
@@ -341,6 +345,12 @@ def build( # noqa: C901, PLR0912, PLR0913, PLR0915
default=False,
help="Execute a task even if it succeeded before.",
)
+ @click.option(
+ "--clean-lockfile",
+ is_flag=True,
+ default=False,
+ help="Rewrite the lockfile with only currently collected tasks.",
+ )
@click.option(
"--explain",
is_flag=True,
20 changes: 18 additions & 2 deletions src/_pytask/console.py
@@ -111,10 +111,26 @@ def render_to_string(
example, render warnings with colors or text in exceptions.

"""
- buffer = console.render(renderable)
+ render_console = console
+ if not strip_styles and console.no_color and console.color_system is not None:
+ theme: Theme | None
+ try:
+ theme = Theme(console._theme_stack._entries[-1])
+ except (AttributeError, IndexError, TypeError):
+ theme = None
+ render_console = Console(
+ color_system=console.color_system, # type: ignore[invalid-argument-type]
+ force_terminal=True,
+ width=console.width,
+ no_color=False,
+ markup=getattr(console, "_markup", True),
+ theme=theme,
+ )
+
+ buffer = render_console.render(renderable)
if strip_styles:
buffer = Segment.strip_styles(buffer)
- return console._render_buffer(buffer)
+ return render_console._render_buffer(buffer)


def format_task_name(task: PTask, editor_url_scheme: str) -> Text:
16 changes: 10 additions & 6 deletions src/_pytask/execute.py
@@ -20,9 +20,6 @@
from _pytask.dag_utils import TopologicalSorter
from _pytask.dag_utils import descending_tasks
from _pytask.dag_utils import node_and_neighbors
- from _pytask.database_utils import get_node_change_info
- from _pytask.database_utils import has_node_changed
- from _pytask.database_utils import update_states_in_database
from _pytask.exceptions import ExecutionError
from _pytask.exceptions import NodeLoadError
from _pytask.exceptions import NodeNotFoundError
@@ -46,6 +43,9 @@
from _pytask.pluginmanager import hookimpl
from _pytask.provisional_utils import collect_provisional_products
from _pytask.reports import ExecutionReport
+ from _pytask.state import get_node_change_info
+ from _pytask.state import has_node_changed
+ from _pytask.state import update_states
from _pytask.traceback import remove_traceback_from_exc_info
from _pytask.tree_util import tree_leaves
from _pytask.tree_util import tree_map
@@ -196,7 +196,7 @@ def pytask_execute_task_setup(session: Session, task: PTask) -> None: # noqa: C
# Check if node changed and collect detailed info if in explain mode
if session.config["explain"]:
has_changed, reason, details = get_node_change_info(
- task=task, node=node, state=node_state
+ session=session, task=task, node=node, state=node_state
)
if has_changed:
needs_to_be_executed = True
@@ -222,7 +222,9 @@ def pytask_execute_task_setup(session: Session, task: PTask) -> None: # noqa: C
)
)
else:
- has_changed = has_node_changed(task=task, node=node, state=node_state)
+ has_changed = has_node_changed(
+ session=session, task=task, node=node, state=node_state
+ )
if has_changed:
needs_to_be_executed = True

@@ -232,6 +234,8 @@ def pytask_execute_task_setup(session: Session, task: PTask) -> None: # noqa: C

if not needs_to_be_executed:
collect_provisional_products(session, task)
+ if not session.config["dry_run"] and not session.config["explain"]:
+ update_states(session, task)
raise SkippedUnchanged

# Create directory for product if it does not exist. Maybe this should be a `setup`
@@ -326,7 +330,7 @@ def pytask_execute_task_process_report(
task = report.task

if report.outcome == TaskOutcome.SUCCESS:
- update_states_in_database(session, task.signature)
+ update_states(session, task)
elif report.exc_info and isinstance(report.exc_info[1], WouldBeExecuted):
report.outcome = TaskOutcome.WOULD_BE_EXECUTED
