add non-retryable errors, and shutdown helpers #33
Conversation
Force-pushed f6f6a40 to de68786 (Compare)
    self._stub = stubs.TaskHubSidecarServiceStub(channel)
    self._logger = shared.get_logger("client", log_handler, log_formatter)

def __enter__(self):
add context manager option for clean closing
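For context, a minimal usage sketch of the new context-manager support, assuming the client class is TaskHubGrpcClient, that __exit__ closes the underlying gRPC channel, and that the orchestrator name and address below are placeholders:

```python
from durabletask import client

# Assumes the __enter__/__exit__ added in this PR close the channel on exit.
with client.TaskHubGrpcClient(host_address="localhost:4001") as c:
    instance_id = c.schedule_new_orchestration("my_orchestrator", input=42)
    state = c.wait_for_orchestration_completion(instance_id, timeout=60)
# No explicit close() needed; the with-block handles it.
```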
durabletask/client.py
Outdated
# gRPC timeout mapping (pytest unit tests may pass None explicitly)
grpc_timeout = None if (timeout is None or timeout == 0) else timeout

# If timeout is None or 0, skip pre-checks/polling and call server-side wait directly
Improves resource consumption on the server side, which might also lag behind the client side.
pass


class NonRetryableError(Exception):
This is a new helper, present in Temporal but not in our SDK, where we can define errors that are non-retryable, so activities don't attempt to retry when one is raised.
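A hedged sketch of how an activity could use it, assuming NonRetryableError is exported from durabletask.task; the activity and its input are illustrative:

```python
from durabletask import task


def charge_card(ctx: task.ActivityContext, amount: float) -> str:
    if amount <= 0:
        # A NonRetryableError should fail the task immediately, even when the
        # activity was scheduled with a RetryPolicy, instead of scheduling a retry.
        raise task.NonRetryableError("invalid amount; retrying will not help")
    return "charged"
```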
        next_delay_f, self._retry_policy.max_retry_interval.total_seconds()
    )
    return timedelta(seconds=next_delay_f)
return timedelta(seconds=next_delay_f)
This fixes a bug with retry: the logic in line 400 above, `if datetime.utcnow() < retry_expiration:`, means that we should retry, but because this return was badly indented, the retry was not working if for some reason max_retry_interval was not None.
This is also more or less mentioned in one of the gotchas in dapr/python-sdk#836. I found this bug beforehand; the other gotchas are either genuine gotchas or unexplained behavior.
Added some info in the README to cover the gotchas, but we might need to add it to python-sdk as well.
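To make the indentation fix above concrete, a standalone sketch of the next-delay computation (the attribute names follow the diff; the function shape and surrounding context are assumptions, not the actual worker code):

```python
from datetime import timedelta
from typing import Optional


def compute_next_delay(attempt: int, first_retry_interval_s: float,
                       backoff_coefficient: float,
                       max_retry_interval_s: Optional[float]) -> timedelta:
    next_delay_f = first_retry_interval_s * (backoff_coefficient ** attempt)
    if max_retry_interval_s is not None:
        next_delay_f = min(next_delay_f, max_retry_interval_s)
    # The return now sits outside the "if", so a retry delay is produced
    # regardless of whether a max_retry_interval is configured.
    return timedelta(seconds=next_delay_f)
```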
examples/components/statestore.yaml
Outdated
@@ -0,0 +1,16 @@
apiVersion: dapr.io/v1alpha1
Needed for e2e tests with Dapr, which should substitute the durabletask-go tests with a Dapr setup.
Force-pushed df310f8 to d9ed06e (Compare)
@acroca ptal
Force-pushed d9ed06e to 7321905 (Compare)
durabletask/client.py
Outdated
res: pb.GetInstanceResponse = self._stub.WaitForInstanceCompletion(
    req, timeout=grpc_timeout
)
I don't understand this. grpc_timeout is set to None in both 0 and None cases but if I understand correctly, when timeout is None we wait forever, but timeout 0 won't wait at all, right?
Well, the current behavior has not changed: a timeout of 0 still means wait forever. That kind of makes sense; why would you call this function just to not wait?
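Restating the mapping under discussion as a standalone sketch (the helper name is hypothetical); note that the gRPC Python API treats a None timeout as "no deadline":

```python
def to_grpc_timeout(timeout):
    # Both None and 0 are treated as "wait forever": gRPC interprets a None
    # timeout as no deadline. Any other value is passed through in seconds.
    return None if (timeout is None or timeout == 0) else timeout
```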
durabletask/client.py
Outdated
if current_state and current_state.runtime_status in [
    OrchestrationStatus.COMPLETED,
    OrchestrationStatus.FAILED,
    OrchestrationStatus.TERMINATED,
]:
From https://github.com/dapr/durabletask-go/blob/7f28b2408db77ed48b1b03ecc71624fc456ccca3/api/orchestration.go#L196-L201, CANCELLED is also a condition for a workflow to be considered in a terminal state.
But what's the reason for this check? Why not just call the WaitForInstanceCompletion? You are still sending a call to the runtime to get the current state.
It is maybe a premature optimization, but in Python, for instances that complete quickly, this uses a poll instead of tying up a longer-running server-side streaming call: https://grpc.io/docs/guides/performance/#python
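A rough sketch of the poll-then-wait idea, assuming the existing get_orchestration_state and wait_for_orchestration_completion client methods; the helper itself is hypothetical:

```python
from durabletask import client as dt_client
from durabletask.client import OrchestrationStatus

# Terminal statuses mirror the diff; per the review above, the canceled status
# should arguably be included here too.
TERMINAL = {
    OrchestrationStatus.COMPLETED,
    OrchestrationStatus.FAILED,
    OrchestrationStatus.TERMINATED,
}


def wait_for_completion(c: dt_client.TaskHubGrpcClient, instance_id: str, timeout=None):
    # Cheap unary poll first: if the orchestration already finished, we never
    # open the longer-running server-side wait at all.
    state = c.get_orchestration_state(instance_id)
    if state is not None and state.runtime_status in TERMINAL:
        return state
    # Fall back to the server-side wait for instances that are still running.
    return c.wait_for_orchestration_completion(instance_id, timeout=timeout)
```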
durabletask/task.py
Outdated
if isinstance(t, str):
    if t:
can't we check it all at once?
- if isinstance(t, str):
-     if t:
+ if isinstance(t, str) and len(t) > 0:
durabletask/worker.py
Outdated
self._channel_options = channel_options
self._stop_timeout = stop_timeout
# Track in-flight activity executions for graceful draining
import threading as _threading
Move this import to the top of the file 🙏
current_reader_thread.start()
loop = asyncio.get_running_loop()
while not self._shutdown.is_set():
    try:
I don't see why this try was removed. If I understand correctly, the exceptions that were captured here will now be captured outside of the while, right? Why is this preferred now?
Mainly to reduce extra logging/indentation (`self._logger.warning(f"Error in work item stream: {e}")`). I could put it back; I think it was just bothering me with extra duplicated messages that were not helping me.
There's not really much between this point and the other try, maybe the GetWorkItems call.
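For what it's worth, the shape being discussed looks roughly like this; the queue, the function name, and the error type are assumptions, not the actual worker code:

```python
import logging
import queue
import threading


def read_work_items(items: queue.Queue, shutdown: threading.Event,
                    logger: logging.Logger) -> None:
    try:
        while not shutdown.is_set():
            try:
                item = items.get(timeout=1)
            except queue.Empty:
                continue
            logger.debug("processing work item: %s", item)
    except Exception as exc:
        # With the try outside the while, a stream failure is logged once here
        # instead of producing a warning on every iteration.
        if not shutdown.is_set():
            logger.warning("Error in work item stream: %s", exc)
```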
durabletask/worker.py
Outdated
| """ | ||
| end: Optional[float] = None | ||
| if timeout is not None: | ||
| import time as _t |
Move all the imports to the top please
@acroca ptal
…noisy logs Signed-off-by: Filinto Duran <[email protected]>
Force-pushed 763ef39 to c513506 (Compare)
Signed-off-by: Filinto Duran <[email protected]>
tests/durabletask/test_client.py
Outdated
mock_channel.close.side_effect = Exception("close failed")
mock_get_channel.return_value = mock_channel

from durabletask import client
🙏
durabletask/worker.py
Outdated
except grpc.RpcError as rpc_error:  # type: ignore
    # Treat common shutdown/termination races as benign to avoid noisy logs
    code = rpc_error.code()  # type: ignore
    details = str(rpc_error)
    benign = code in {
        grpc.StatusCode.CANCELLED,
        grpc.StatusCode.UNAVAILABLE,
        grpc.StatusCode.UNKNOWN,
    } and (
        "unknown instance ID/task ID combo" in details
        or "Channel closed" in details
        or "Locally cancelled by application" in details
    )
    if self._shutdown.is_set() or benign:
        self._logger.debug(
            f"Ignoring activity completion delivery error during shutdown/benign condition: {rpc_error}"
        )
    else:
        self._logger.exception(
            f"Failed to deliver activity response for '{req.name}#{req.taskId}' of orchestration ID '{instance_id}' to sidecar: {rpc_error}"
        )
Can we combine this logic with the other one that looks very similar?
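One possible shape for combining them, sketched with a hypothetical helper name:

```python
import grpc

_BENIGN_CODES = {
    grpc.StatusCode.CANCELLED,
    grpc.StatusCode.UNAVAILABLE,
    grpc.StatusCode.UNKNOWN,
}

_BENIGN_DETAILS = (
    "unknown instance ID/task ID combo",
    "Channel closed",
    "Locally cancelled by application",
)


def _is_benign_rpc_error(rpc_error: grpc.RpcError, shutting_down: bool) -> bool:
    # Mirrors the classification used in both except blocks so they can share it.
    details = str(rpc_error)
    benign = rpc_error.code() in _BENIGN_CODES and any(  # type: ignore
        needle in details for needle in _BENIGN_DETAILS
    )
    return shutting_down or benign
```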
durabletask/worker.py
Outdated
self._async_worker_manager = _AsyncWorkerManager(self._concurrency_options)
# Readiness flag set once the worker has an active stream to the sidecar
self._ready = Event()
Is this _ready necessary?
actions = result.actions
complete_action = get_and_validate_single_complete_orchestration_action(actions)
assert complete_action.orchestrationStatus == pb.ORCHESTRATION_STATUS_FAILED
assert complete_action.failureDetails.errorMessage.__contains__("Activity task #1 failed: boom")
It'd be good to test that the activity has been called exactly once, to make sure it is not retrying.
Actually, this test only checks the event-processing pipeline, making sure that when you send a non-retryable error into the event loop for processing, it fails the workflow. There is a test after it that checks that when a retryable error is raised a new timer is created (line 1526, test_activity_generic_exception_is_retryable).
Signed-off-by: Filinto Duran <[email protected]>
@acroca ptal
Signed-off-by: Filinto Duran <[email protected]>
This is a split from the asyncio PR #13, removing the changes that are not related to the asyncio work from that PR.