[One Workflow] Task manager timeout conflicts with workflow timeout#240950

Merged
skynetigor merged 13 commits into elastic:main from
skynetigor:14367-Task-Manager-timeout-conflicts-with-workflow-timeout____
Oct 29, 2025

@skynetigor
Contributor

Summary

Closes: https://github.com/elastic/security-team/issues/14367

Checklist

Check that the PR satisfies the following conditions.

Reviewers should verify this PR satisfies this list as well.

  • Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
  • Documentation was added for features that require explanation or tutorials
  • Unit or functional tests were updated or added to match the most common scenarios
  • If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list
  • This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The release_note:breaking label should be applied in these situations.
  • Flaky Test Runner was used on any tests changed
  • The PR description includes the appropriate Release Notes section, and the correct release_note:* label is applied per the guidelines
  • Review the backport guidelines and apply applicable backport:* labels.

Identify risks

Does this PR introduce any risks? For example, consider risks like hard-to-test bugs, performance regressions, or potential data loss.

Describe the risk, its severity, and mitigation for each identified risk. Invite stakeholders and evaluate how to proceed before merging.

@skynetigor added the release_note:skip, backport:skip, and Team:One Workflow labels Oct 28, 2025
@skynetigor marked this pull request as ready for review October 28, 2025 09:34
@skynetigor requested a review from a team as a code owner October 28, 2025 09:34
@Kiryous requested a review from Copilot October 28, 2025 10:09
Contributor

Copilot AI left a comment

Pull Request Overview

This PR addresses a conflict between the Task Manager timeout and the Workflow timeout by increasing the Task Manager timeout from 5 minutes to 1 day across three task types (workflow:scheduled, workflow:run, and workflow:resume). The PR also introduces a default workflow-level timeout of 10 minutes to be applied when no explicit timeout is configured in the workflow definition.

Key Changes:

  • Increased Task Manager timeout from 5 minutes to 1 day for workflow-related tasks
  • Added default workflow settings with a 10-minute timeout
  • Updated workflow graph construction to accept and use default settings
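Why the two timeouts conflict can be illustrated with a small sketch: if the Task Manager aborts a task before the workflow-level timeout elapses, the workflow's own timeout handling never gets a chance to run. The helpers `parseDurationMs` and `workflowTimeoutIsReachable` are hypothetical, and the duration format (an integer followed by `s`, `m`, `h`, or `d`) is an assumption based on values like `5m`, `10m`, and `1d` quoted in this PR; they are not Kibana APIs.

```typescript
// Hypothetical helper: parse durations such as "30s", "10m", "6h", "1d" into milliseconds.
// Assumes a single integer followed by one unit suffix; Kibana's real parsing may differ.
const UNIT_MS: Record<string, number> = {
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
};

function parseDurationMs(duration: string): number {
  const match = /^(\d+)([smhd])$/.exec(duration.trim());
  if (!match) {
    throw new Error(`Unsupported duration: ${duration}`);
  }
  return Number(match[1]) * UNIT_MS[match[2]];
}

// The workflow-level timeout can only fire if the Task Manager does not abort the
// task first, so (in this sketch) the Task Manager timeout must be strictly longer.
function workflowTimeoutIsReachable(taskManagerTimeout: string, workflowTimeout: string): boolean {
  return parseDurationMs(taskManagerTimeout) > parseDurationMs(workflowTimeout);
}

// Before this PR: a 5m Task Manager timeout vs. a 10m workflow timeout.
console.log(workflowTimeoutIsReachable('5m', '10m')); // false
// After this PR: a 1d Task Manager timeout vs. a 10m default workflow timeout.
console.log(workflowTimeoutIsReachable('1d', '10m')); // true
```

Requiring a strictly longer Task Manager timeout is a choice made here to avoid a race when both limits are equal.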

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/platform/plugins/shared/workflows_management/server/plugin.ts | Increased Task Manager timeout for scheduled workflow execution from 5m to 1d |
| src/platform/plugins/shared/workflows_execution_engine/server/plugin.ts | Increased Task Manager timeout for run and resume workflow tasks from 5m to 1d |
| src/platform/plugins/shared/workflows_execution_engine/server/execution_functions/setup_dependencies.ts | Added default workflow settings with 10m timeout and passed them to workflow graph construction |
| src/platform/plugins/shared/workflows_execution_engine/integration_tests/tests/wait_step.test.ts | Updated test expectations to account for additional step execution from workflow-level timeout |
| src/platform/packages/shared/kbn-workflows/graph/workflow_graph/workflow_graph.ts | Modified fromWorkflowDefinition to accept optional default settings parameter |
| src/platform/packages/shared/kbn-workflows/graph/workflow_graph/tests/has_step.test.ts | Added type assertion to fix type compatibility |
| src/platform/packages/shared/kbn-workflows/graph/workflow_graph/tests/get_node_stack.test.ts | Added type assertions to fix type compatibility |
| src/platform/packages/shared/kbn-workflows/graph/build_execution_graph/tests/timeout_zone_graph.test.ts | Added tests for default timeout behavior and explicit timeout precedence |
| src/platform/packages/shared/kbn-workflows/graph/build_execution_graph/build_execution_graph.ts | Updated to accept default settings and merge them with the workflow-defined timeout |
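The precedence rule summarized for build_execution_graph.ts and timeout_zone_graph.test.ts (an explicit workflow timeout wins, otherwise the default applies) can be sketched as a merge. The `WorkflowSettings` shape and this body of `resolveWorkflowSettings` are assumptions for illustration; the PR's actual types and implementation may differ.

```typescript
// Assumed shape: the real WorkflowSettings type in kbn-workflows may have more fields.
interface WorkflowSettings {
  timeout?: string;
}

// Hypothetical sketch: values defined in the workflow override the defaults, but an
// undefined workflow timeout falls back to the default instead of clearing it
// (a plain object spread would let `timeout: undefined` overwrite the default).
function resolveWorkflowSettings(
  workflowSettings: WorkflowSettings | undefined,
  defaultSettings: WorkflowSettings | undefined
): WorkflowSettings {
  return {
    ...defaultSettings,
    ...workflowSettings,
    timeout: workflowSettings?.timeout ?? defaultSettings?.timeout,
  };
}

const defaults: WorkflowSettings = { timeout: '10m' };

console.log(resolveWorkflowSettings(undefined, defaults)); // { timeout: '10m' }
console.log(resolveWorkflowSettings({ timeout: '30s' }, defaults)); // { timeout: '30s' }
```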

Contributor

@rosomri left a comment

LGTM, just one minor comment. I also like Copilot’s comments.

```ts
import { WorkflowTaskManager } from '../workflow_task_manager/workflow_task_manager';

const defaultWorkflowSettings: WorkflowSettings = {
  timeout: '10m',
```
Contributor

I’d suggest setting the default timeout to 1d.

Contributor Author

@talboren @shahargl
What do you think? Should we align the default workflow timeout with the Task Manager task timeout?
I used the 10m value because we discussed it in this thread: https://elastic.slack.com/archives/C08U04SUN49/p1761314394268429

Contributor

@rosomri Oct 28, 2025

I wasn’t aware of the Task Manager’s default timeout. I’d assume that, by default, users would prefer their workflows not to fail due to a timeout. IMO, it should be opt-in behavior.
Anyway, it’s a tactical decision, so I’m fine with either approach.

Comment on lines +294 to +296
```ts
expect(WorkflowGraph.fromWorkflowDefinition).toHaveBeenCalledWith(expect.anything(), {
  timeout: '6h',
});
```
Contributor

Hey, thanks for adding tests, but I’m not sure these add much value: they mainly check that parameters are passed to the setup function rather than testing any actual logic.
The tests in timeout_zone_graph.test.ts, which indirectly cover resolveWorkflowSettings (the logic added in this PR), might be enough.

Also, a small nitpick: I noticed the defaultSettings parameter is a static constant object. Since it’s always the same value, maybe it could be defined inside the setup (or even in the resolveWorkflowSettings function) or imported from a shared location, instead of drilling an extra parameter through only to pass the same constant every time. That might simplify things a bit.

Contributor Author

@skynetigor Oct 29, 2025

Yes, it might not check the functionality, but it checks that the correct value (6h) is used, so if someone changes it unexpectedly, the test will catch it.

I wanted graph building not to depend on internal workflow execution logic or default settings.
I also wanted defaultWorkflowSettings to look like a configuration constant, which is why I put it at the top of the file.
Although it's always the same value, I don't think that's a problem.
Hope that makes sense.
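The two designs debated in this thread can be contrasted in a short sketch. In option A the graph builder receives default settings as a parameter (keeping graph construction independent of execution defaults); in option B it reads a module-level constant (the reviewer's suggestion). All names here are hypothetical, the builder is reduced to computing the effective timeout, and the `'6h'` value is illustrative.

```typescript
interface WorkflowSettings {
  timeout?: string;
}

interface WorkflowDefinition {
  name: string;
  settings?: WorkflowSettings;
}

// Option A (parameter injection): the builder knows nothing about defaults;
// the caller decides which defaults apply.
function effectiveTimeoutInjected(
  definition: WorkflowDefinition,
  defaultSettings?: WorkflowSettings
): string | undefined {
  return definition.settings?.timeout ?? defaultSettings?.timeout;
}

// Option B (shared constant): the default lives next to the logic that uses it,
// so no extra parameter has to be passed through every call site.
const DEFAULT_WORKFLOW_SETTINGS: WorkflowSettings = { timeout: '6h' };

function effectiveTimeoutFromConstant(definition: WorkflowDefinition): string | undefined {
  return definition.settings?.timeout ?? DEFAULT_WORKFLOW_SETTINGS.timeout;
}

const workflow: WorkflowDefinition = { name: 'example' };
console.log(effectiveTimeoutInjected(workflow, DEFAULT_WORKFLOW_SETTINGS)); // 6h
console.log(effectiveTimeoutFromConstant(workflow)); // 6h
console.log(effectiveTimeoutInjected(workflow)); // undefined (no default injected)
```

Option A keeps the graph code reusable with any defaults and easy to exercise with arbitrary values; option B removes the parameter drilling at the cost of coupling the builder to a single default.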

Contributor Author

In the future we might want a shared place for default values, but I don't see a need for it at the moment.

Contributor

I see your point about wanting to catch accidental changes 👍

The only thing I’d be careful about is that when we start testing specific config values (like 6h), the tests themselves become the “source of truth” for that configuration. In other words, the test ends up asserting what the configuration is, instead of verifying that the logic behaves correctly given the configuration.

If that constant is meant to be a declarative setting or a single source of truth (like a config file or top-level constant), then it’s usually better to treat it as data, not something we test against directly. Otherwise, every time we intentionally update the config, we’ll also have to update the test, which doesn’t add safety, just maintenance overhead.

So in this case, I’d lean toward trusting the constant itself as the source of truth, and testing only the behavior that depends on it, which I think the existing timeout_zone_graph.test.ts already does indirectly.
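The distinction drawn here can be sketched as two kinds of checks against a hypothetical resolver. The first pins the literal configuration value and must be edited whenever the default intentionally changes; the others assert behavior (the default applies when no timeout is set, and an explicit timeout wins) relative to the constant, so they stay valid across config changes. All names are hypothetical, and a plain `check` helper stands in for Jest's `expect`.

```typescript
interface WorkflowSettings {
  timeout?: string;
}

// Hypothetical single source of truth for the default.
const DEFAULT_WORKFLOW_SETTINGS: WorkflowSettings = { timeout: '6h' };

function resolveTimeout(workflowTimeout?: string): string | undefined {
  return workflowTimeout ?? DEFAULT_WORKFLOW_SETTINGS.timeout;
}

function check(condition: boolean, message: string): void {
  if (!condition) {
    throw new Error(`Check failed: ${message}`);
  }
}

// Config-pinning check: restates the configuration, so it breaks on every intentional change.
check(resolveTimeout() === '6h', 'default timeout is 6h');

// Behavior checks: reference the constant, so they only break if the resolution logic changes.
check(resolveTimeout() === DEFAULT_WORKFLOW_SETTINGS.timeout, 'default applies when unset');
check(resolveTimeout('30m') === '30m', 'explicit timeout takes precedence');
```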

Contributor

@semd left a comment

LGTM!
Added some minor comments, but the implementation looks fine 💯

@elasticmachine
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] Scout Test Run Builder / serverless-security - EUI testing wrapper: EuiDataGrid - data grid, run

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| workflowsManagement | 2.1MB | 2.1MB | +197.0B |

History

@skynetigor merged commit 1ce5c0f into elastic:main Oct 29, 2025
12 checks passed
tkajtoch pushed a commit to tkajtoch/kibana that referenced this pull request Oct 29, 2025
…lastic#240950)

## Summary

Closes: elastic/security-team#14367

### Checklist

Check that the PR satisfies the following conditions.

Reviewers should verify this PR satisfies this list as well.

- [ ] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [ ] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [ ] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.

### Identify risks

Does this PR introduce any risks? For example, consider risks like
hard-to-test bugs, performance regressions, or potential data loss.

Describe the risk, its severity, and mitigation for each identified
risk. Invite stakeholders and evaluate how to proceed before merging.

- [ ] [See some risk
examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)
- [ ] ...
qn895 pushed a commit to qn895/kibana that referenced this pull request Oct 30, 2025
ana-davydova pushed a commit to ana-davydova/kibana that referenced this pull request Nov 3, 2025
albertoblaz pushed a commit to albertoblaz/kibana that referenced this pull request Nov 4, 2025

Labels

backport:skip, release_note:skip, Team:One Workflow, v9.3.0


6 participants