fix(core): Ensure tasks timeout even if they don't receive settings #12431
Conversation
Force-pushed from 527a146 to efe3980
Force-pushed from efe3980 to 8d5fbe8
We were slowly moving to a state machine with so many booleans :)
```ts
await taskState.caseOf({
	// If the cancelled task hasn't received settings yet, we can finish it
	waitingForSettings: () => this.finishTask(taskState),

	// If the task has already timed out or is already cancelled, we can
	// ignore the cancellation
	'aborting:timeout': noOp,
	'aborting:cancelled': noOp,

	running: () => {
		taskState.status = 'aborting:cancelled';
		taskState.abortController.abort('cancelled');
		this.cancelTaskRequests(taskId, reason);
	},
});
```

```ts
for (const [requestId, request] of this.nodeTypesRequests.entries()) {
	if (request.taskId === taskId) {
		request.reject(new TaskCancelledError(reason));
		this.nodeTypesRequests.delete(requestId);
	}
}
```

```diff
-const controller = this.taskCancellations.get(taskId);
-if (controller) {
-	controller.abort();
-	this.taskCancellations.delete(taskId);
-}
```
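The diff excerpt doesn't include `caseOf` itself. A minimal sketch of how such a status-keyed dispatcher could work — the generic `StateHolder` class and its throw-on-missing-handler behavior are assumptions for illustration, not the PR's actual implementation:

```ts
// Sketch of a status-keyed dispatcher like caseOf. Handlers not provided for
// the current status make the call throw, which surfaces forgotten
// state/event combinations instead of silently ignoring them.
type Handlers<S extends string> = Partial<Record<S, () => void | Promise<void>>>;

class StateHolder<S extends string> {
	constructor(public status: S) {}

	async caseOf(handlers: Handlers<S>) {
		const handler = handlers[this.status];
		if (!handler) throw new Error(`Unhandled status: ${this.status}`);
		await handler();
	}
}
```

The appeal over a pile of booleans is that each event handler must say explicitly what happens in every state it expects, and anything unexpected fails loudly.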
If the broker cancels a task and the task was waiting for settings, then we clean up in `finishTask` without cancelling the task requests; but if the broker cancels a task and the task was already running, then we cancel the task requests without cleanup. Why is this? I'd expect we'd do cleanup and cancel task requests in both these transitions.
That is because when the task is in:

- `waitingForSettings` state: it hasn't been executed yet, so it can't have any in-flight requests
- `running` state: the task is currently executing, so we have to wait until control comes back from the task (i.e. the task execution promise resolves or rejects). We don't want to release a new "slot" for the next task until then. This happens concurrently (but not in parallel!). Hopefully this image clarifies it:
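The image itself didn't survive extraction; a rough sketch of the ordering described above (illustrative only — `execute` and `releaseSlot` are assumed names, not the PR's API):

```ts
// Aborting a running task only signals its AbortController; the worker slot
// is released after the task's execution promise settles, so cancellation and
// task teardown run concurrently but the slot is never double-booked.
async function runTask(
	signal: AbortSignal,
	execute: (signal: AbortSignal) => Promise<unknown>,
	releaseSlot: () => void,
) {
	try {
		// Control stays inside the task until this promise settles, even if
		// abort() fires concurrently on the cancellation path.
		await execute(signal);
	} finally {
		// Only now is a slot freed for the next task.
		releaseSlot();
	}
}
```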
Thank you, I misremembered and thought we requested node types as preparation rather than as part of the task.
```ts
constructor(opts: TaskStateOpts) {
	this.taskId = opts.taskId;
	this.timeoutTimer = setTimeout(opts.onTimeout, opts.timeoutInS * 1000);
```
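The excerpt shows only the timer being armed at construction. Presumably it is cleared once the task reaches a terminal state; a minimal self-contained sketch under that assumption (the class shape and the `cleanup` name are assumed, not necessarily the PR's exact code):

```ts
class TaskStateTimerSketch {
	private timeoutTimer: ReturnType<typeof setTimeout>;

	constructor(onTimeout: () => void, timeoutInS: number) {
		// Armed at creation: the timeout budget starts counting down before
		// the task has even received its settings.
		this.timeoutTimer = setTimeout(onTimeout, timeoutInS * 1000);
	}

	// Called once the task reaches a terminal state, so a stale timer
	// cannot fire onTimeout for a task that is already gone.
	cleanup() {
		clearTimeout(this.timeoutTimer);
	}
}
```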
In `master` we currently give task execution its full time budget of `taskTimeout`, but now we're allowing the broker to consume that budget. If a main is overloaded and the broker is slow, can this cause cascading failures (timeouts) of tasks because they're all receiving too little time to actually execute?
That is correct, and this was the bug here: if we never receive the task settings, the task would never time out and would wait forever. The other option would be to add a separate timeout for this case, but that would add more complexity. It shouldn't take seconds to receive the settings from the main, so it should be fine to take that out of the timeout budget. You can always increase the timeout if that is a concern.
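For comparison, the rejected "separate timeout" alternative would look roughly like this (purely illustrative — all names and the 10 s value are assumptions):

```ts
declare function abortTask(reason: string): void; // assumed helper

const SETTINGS_TIMEOUT_MS = 10_000; // assumed value

// Short timer covering only the settings phase...
let timer: ReturnType<typeof setTimeout> = setTimeout(
	() => abortTask('no settings received'),
	SETTINGS_TIMEOUT_MS,
);

// ...swapped for the full task timer once settings arrive. Two timers and an
// extra transition to keep correct — the complexity the PR avoids by letting
// a single timer cover both phases.
function onSettingsReceived(taskTimeoutInS: number) {
	clearTimeout(timer);
	timer = setTimeout(() => abortTask('timeout'), taskTimeoutInS * 1000);
}
```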
```ts
 *
 * The class only holds the state, and does not have any logic.
 *
 * The task has the following lifecycle:
```
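The lifecycle diagram referenced here didn't survive extraction. Based on the states used elsewhere in this diff, it is presumably along these lines (an inferred sketch, not the PR's documentation):

```ts
type TaskStatus =
	| 'waitingForSettings' // created, timeout timer armed, settings not yet received
	| 'running' // settings received, task code executing
	| 'aborting:timeout' // timeout fired before the task finished
	| 'aborting:cancelled'; // broker cancelled the task mid-run

// Inferred transitions: a task stuck in waitingForSettings can now reach
// aborting:timeout (the fix in this PR), while cancellation in that state
// finishes the task directly since it has no in-flight requests yet.
const nextStates: Record<TaskStatus, TaskStatus[]> = {
	waitingForSettings: ['running', 'aborting:timeout'],
	running: ['aborting:timeout', 'aborting:cancelled'],
	'aborting:timeout': [],
	'aborting:cancelled': [],
};
```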
Thanks for diagramming this!
Thank you for the comments @ivov 🙇 Addressed them
Tested manually and working well, thanks for the fix!
n8n Run #8552

Run Properties:

| Property | Value |
|---|---|
| Project | n8n |
| Branch Review | cat-459-timeout-task-when-no-settings-are-received |
| Run status | Passed #8552 |
| Run duration | 04m 39s |
| Commit | 773ad6faee: 🌳 🖥️ browsers:node18.12.0-chrome107 🤖 tomi 🗃️ e2e/* |
| Committer | Tomi Turtiainen |

Test results:

| Result | Count |
|---|---|
| Failures | 0 |
| Flaky | 3 |
| Pending | 0 |
| Skipped | 0 |
| Passing | 484 |

✅ All Cypress E2E specs passed
@ivov had to merge master to fix an e2e test, could you reapprove 🙏
✅ All Cypress E2E specs passed
Got released with
Summary
If the n8n instance happens to crash at a specific time, a task runner task might not receive a response to e.g. a settings or data request. In cases like this the task runner was left hanging forever. This PR makes sure such tasks get aborted correctly.

It also refactors the task execution lifecycle to make explicit which states a task can be in and how different events are handled in each state.
Related Linear tickets, Github issues, and Community forum posts
https://linear.app/n8n/issue/CAT-459/community-issue-code-node-stopped-working
https://community.n8n.io/t/code-nodes-stopped-working/67141
fixes #12354
Review / Merge checklist
- release/backport (if the PR is an urgent fix that needs to be backported)