🌊 [Feature identification] Run as background task #245728

Merged
miltonhultgren merged 33 commits into elastic:main from miltonhultgren:streams-sigevents-feature-identification-background-task
Jan 7, 2026

@miltonhultgren miltonhultgren commented Dec 9, 2025

Summary

This PR:

  • Adds a task called streams_feature_identification via the newly added task service which calls the existing identifyFeatures function and stores the result on the task document
  • Updates the POST /internal/streams/{name}/features/_identify route to schedule this task and check for the results
  • Adds the FeatureIdentificationControl component which manages all of the API interaction around Feature identification
  • Moves related telemetry reporting to the server
  • Adds a way to type and store parameters on the task document
  • Adds a way to cancel tasks (wrap your run function in cancellableTask)
  • Adds another task state (acknowledged) to mark that the user has taken action on the result of the task
  • Adds a hook to poll for task updates for in progress tasks (and tasks being cancelled)
Screen.Recording.2025-12-18.at.17.12.57.mov
Screen.Recording.2025-12-18.at.17.17.06.mov

Route changes and flags

The feature identification route now serves two roles:

  • Managing the task
  • Reporting the status of the task

The route accepts three flags, schedule, cancel and acknowledge, each of which has a side effect.
schedule tries to schedule the task with task manager (and is a no-op if the task is already running). It fails if the task is in the being_canceled state.
cancel moves the task document to the being_canceled state so that cancellableTask can engage the abort controller to stop ongoing work.
acknowledge moves a completed task to the acknowledged state, indicating that the user has reviewed the results of this task and taken some follow-up action, so it's safe to schedule this task again without losing results (this is not enforced).
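
The flag semantics above can be sketched as a small state transition function. This is only an illustration under stated assumptions: `nextStatus` and `Flags` are hypothetical names, the route's real handler is not shown in this description, and the choices that a scheduled task is reported as `in_progress` and that only a running task can be canceled are guesses.

```typescript
// Illustrative only: `nextStatus` and `Flags` are hypothetical names, not the PR's code.
type TaskStatus =
  | 'not_started'
  | 'in_progress'
  | 'being_canceled'
  | 'canceled'
  | 'failed'
  | 'completed'
  | 'acknowledged';

interface Flags {
  schedule?: boolean;
  cancel?: boolean;
  acknowledge?: boolean;
}

function nextStatus(current: TaskStatus, flags: Flags): TaskStatus {
  if (flags.schedule) {
    if (current === 'being_canceled') {
      // Scheduling fails while a cancellation is still in flight.
      throw new Error('Cannot schedule while the task is being canceled');
    }
    // Already running is a no-op; any other state (re)starts the task.
    // Assumption: a freshly scheduled task is reported as in_progress.
    return 'in_progress';
  }
  if (flags.cancel) {
    // Assumption: only a running task can be moved to being_canceled.
    return current === 'in_progress' ? 'being_canceled' : current;
  }
  if (flags.acknowledge) {
    // Only a completed task moves to acknowledged.
    return current === 'completed' ? 'acknowledged' : current;
  }
  // No flags set: the route only reports the current status.
  return current;
}
```

For example, `nextStatus('completed', { acknowledge: true })` yields `'acknowledged'`, while `nextStatus('being_canceled', { schedule: true })` throws.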

The route reports the following statuses:
'not_started' | 'in_progress' | 'stale' | 'being_canceled' | 'canceled' | 'failed' | 'completed' | 'acknowledged'

Most of them are the state of the task, but stale is a special route status that indicates that no updates were made to the task document for a while.
The failed result includes an error message while completed and acknowledged include the payload found on task.task.payload.
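
As a rough sketch of how the reported status could be derived, the snippet below maps a stored task status to a route status and models the result shapes. Everything not named above is an assumption: `toRouteStatus`, `STALE_AFTER_MS`, the `lastUpdatedMs` input and the `payload` field name are hypothetical, and the actual staleness threshold is only described as "a while".

```typescript
type TaskStatus =
  | 'not_started'
  | 'in_progress'
  | 'being_canceled'
  | 'canceled'
  | 'failed'
  | 'completed'
  | 'acknowledged';

// `stale` exists only at the route level, never on the stored task document.
type RouteStatus = TaskStatus | 'stale';

// Hypothetical result shape; the `payload` field name is an assumption.
type IdentifyResult<P> =
  | { status: Exclude<RouteStatus, 'failed' | 'completed' | 'acknowledged'> }
  | { status: 'failed'; error: string }
  | { status: 'completed' | 'acknowledged'; payload: P };

// Hypothetical threshold for "no updates for a while".
const STALE_AFTER_MS = 5 * 60 * 1000;

function toRouteStatus(
  status: TaskStatus,
  lastUpdatedMs: number,
  nowMs: number,
  staleAfterMs: number = STALE_AFTER_MS
): RouteStatus {
  // Assumption: only tasks that should still be making progress can go stale.
  const active = status === 'in_progress' || status === 'being_canceled';
  return active && nowMs - lastUpdatedMs > staleAfterMs ? 'stale' : status;
}

// Example of a failed result carrying its error message.
const failedExample: IdentifyResult<unknown> = { status: 'failed', error: 'timeout' };
```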

Task document schema

Follow up to #245725

The stored documents have the following shape:

{
  id: string;
  type: string;
  status: TaskStatus;
  stream: string;
  space: string;
  created_at: string;
  task: {
    params: TaskParams;
    payload?: any // Only for completed and acknowledged tasks
    error?: string // Only for failed tasks
  };
}

All fields except task are indexed, and we store things under task to avoid indexing them because of #245974
The tasks are stored in .kibana_streams_tasks

@github-actions github-actions bot added the author:actionable-obs PRs authored by the actionable obs team label Dec 9, 2025
@miltonhultgren miltonhultgren changed the title Streams sigevents feature identification background task 🌊 [Feature identification] Run as background task Dec 9, 2025
@miltonhultgren miltonhultgren force-pushed the streams-sigevents-feature-identification-background-task branch 7 times, most recently from 12ae06d to e892f45 Compare December 16, 2025 14:28
@miltonhultgren miltonhultgren force-pushed the streams-sigevents-feature-identification-background-task branch from e892f45 to 4a12926 Compare December 16, 2025 15:28
@miltonhultgren miltonhultgren force-pushed the streams-sigevents-feature-identification-background-task branch from 4a12926 to 781fead Compare December 16, 2025 15:37
@miltonhultgren miltonhultgren marked this pull request as ready for review December 16, 2025 15:42
@miltonhultgren miltonhultgren requested review from a team as code owners December 16, 2025 15:42
@miltonhultgren miltonhultgren added release_note:skip Skip the PR/issue when compiling release notes backport:skip This PR does not require backporting Feature:SigEvents Significant events feature, related to streams and rules/alerts (RnA) labels Dec 16, 2025
@miltonhultgren miltonhultgren requested a review from a team as a code owner December 17, 2025 12:42
@mykolaharmash mykolaharmash left a comment

Looks good overall, I added a few comments for a discussion.

I assume this would be the next iteration of this feature, but just in case: I think we need some kind of bulk variant of all endpoints to read, schedule, and update tasks, considering the UI we have in mind where the user selects all the streams they want to analyze at the same time.

Comment on lines +331 to +333
schedule: BooleanFromString.optional(),
cancel: BooleanFromString.optional(),
acknowledge: BooleanFromString.optional(),

Any reason to not have it as a single parameter? This would let us use switch in the handler. Also, multiple boolean params sort of suggest that they can be used simultaneously within a single request which is not the case.

yeah for sure i thought the same, it's not super polished but i think we can move ahead while doing a refactor.

@miltonhultgren miltonhultgren Jan 5, 2026

Yes, this was emergent code. I'm considering splitting this into two APIs, one for asking the current status (no side effects) and one for all side effects.

} & IdentifyFeaturesResult);

export const identifyFeaturesRoute = createServerRoute({
endpoint: 'POST /internal/streams/{name}/features/_identify',

Having a single endpoint for reading and updating the task feels odd tbh, any particular reason to do that instead of having a separate GET endpoint?

const pollInterval = 2000;

const intervalId = setInterval(async () => {
if (Date.now() - startTime > maxDuration) {

Should we rely on the stale task status here instead? This way, if a task becomes stale we could provide some feedback about this in the UI and stop polling at the same time. Otherwise UI and the backend sort of go out of sync and there is no way to communicate the up-to-date status to the user.
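
One way to read this suggestion, as a minimal sketch: let the reported status decide whether polling continues, so that `stale` stops the loop and can be surfaced in the UI. `shouldKeepPolling` is a hypothetical helper, not code from this PR; it assumes polling is only useful for in-progress tasks and tasks being canceled, as stated in the summary.

```typescript
type RouteStatus =
  | 'not_started'
  | 'in_progress'
  | 'stale'
  | 'being_canceled'
  | 'canceled'
  | 'failed'
  | 'completed'
  | 'acknowledged';

// Statuses for which another poll can still observe progress.
const POLLABLE = new Set<RouteStatus>(['in_progress', 'being_canceled']);

// Hypothetical helper: any other status, including `stale`, stops the polling loop.
function shouldKeepPolling(status: RouteStatus): boolean {
  return POLLABLE.has(status);
}
```

With this shape the client-side `maxDuration` check becomes a fallback rather than the main stop condition.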

@shahzad31 shahzad31 left a comment

LGTM !!

We can probably refine API format in a follow up.

@mykolaharmash mykolaharmash self-requested a review December 31, 2025 10:14
@mykolaharmash mykolaharmash left a comment

Agree with @shahzad31, we can merge and iterate on this 👍

@elasticmachine

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
datasetQuality 1092 1093 +1
streamsApp 1420 1423 +3
total +4

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/streams-schema 226 229 +3

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
streamsApp 1.5MB 1.5MB +3.7KB

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
streams 24 25 +1

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
streamsApp 24.1KB 23.5KB -656.0B

Unknown metric groups

API count

id before after diff
@kbn/streams-schema 233 236 +3

@miltonhultgren miltonhultgren requested review from a team and removed request for ersin-erdal January 6, 2026 12:50
@pmuellr pmuellr left a comment

In the cancellable_task, I'm not seeing the task actually being cancelled via the passed in AbortController. Perhaps it's elsewhere and I missed it, but it seems like the call to abort the AC should be in this code.

}, 5000);
});

const result = await Promise.race([run(), cancellationPromise]).finally(() => {

It's not clear to me that the actual task is going to be cancelled if the cancellation promise goes off. I would expect the AbortController sent into the task to get signalled, to indicate to the task itself that it's cancelled. And then the task has to actually USE that AbortController, as needed, to check to see if the task has been cancelled.

Contributor Author

As far as I understand, there are two sources that would call AbortController.abort:

  1. The task manager itself (when cancelling a task run for any reason)
  2. The cancellableTask wrapper (on line 32 here) if the task has been marked being_canceled by the Streams code

In either case, the actual tasks themselves should be using runContext.abortController to pass to their HTTP requests so that those can be cancelled in response to either of those two cases calling abort.
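
Roughly, the second source could look like the sketch below. It is a simplified stand-in, not the actual `cancellableTask`: `cancellableSketch`, `RunContext` and `getStatus` are hypothetical names, and only the 5000 ms poll interval and the `Promise.race` shape come from the diff under review.

```typescript
type TaskStatus = 'in_progress' | 'being_canceled';

interface RunContext {
  abortController: AbortController;
}

// Hypothetical wrapper: polls the task status and aborts the shared controller
// once the task has been marked being_canceled.
async function cancellableSketch<T>(
  run: (ctx: RunContext) => Promise<T>,
  ctx: RunContext,
  getStatus: () => Promise<TaskStatus>, // hypothetical lookup of the task document
  pollMs = 5000
): Promise<T | undefined> {
  let stop: () => void = () => {};

  const cancellation = new Promise<undefined>((resolve, reject) => {
    const timer = setInterval(() => {
      getStatus().then((status) => {
        if (status === 'being_canceled') {
          // Signal the task; its HTTP requests should listen to this controller.
          ctx.abortController.abort();
          resolve(undefined);
        }
      }, reject);
    }, pollMs);
    stop = () => clearInterval(timer);
  });

  try {
    // Whichever settles first wins: the task itself or the cancellation.
    return await Promise.race([run(ctx), cancellation]);
  } finally {
    stop();
  }
}
```

The wrapper only signals the abort; stopping real work still depends on the task passing `ctx.abortController.signal` to whatever it awaits.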

Does that address your concern?

Thanks for clearing that up. I think I must have missed the .abort() call in that code, somehow!

@pmuellr pmuellr left a comment

LGTM

@miltonhultgren miltonhultgren enabled auto-merge (squash) January 7, 2026 15:09
@miltonhultgren miltonhultgren merged commit fcbfce8 into elastic:main Jan 7, 2026
13 checks passed
devamanv pushed a commit to devamanv/kibana that referenced this pull request Jan 12, 2026
### Summary

This PR:

- Adds a task called `streams_feature_identification` via the newly
added task service which calls the existing `identifyFeatures` function
and stores the result on the task document
- Updates the `POST /internal/streams/{name}/features/_identify` route
to schedule this task and check for the results
- Adds the `FeatureIdentificationControl` component which manages all of
the API interaction around Feature identification
- Moves related telemetry reporting to the server
- Adds a way to type and store parameters on the task document
- Adds a way to cancel tasks (wrap your run function in
`cancellableTask`)
- Adds another task state (`acknowledged`) to mark that the user has
taken action on the result of the task
- Adds a hook to poll for task updates for in progress tasks (and tasks
being cancelled)


https://github.com/user-attachments/assets/7c667112-e0a1-426d-a958-55cf4f2e26bb


https://github.com/user-attachments/assets/61e2c079-53dd-4318-8075-fdce466de35d

### Route changes and flags

The feature identification route now serves two roles:

- Managing the task
- Reporting the status of the task

The route accepts three flags, `schedule`, `cancel` and `acknowledge`,
each of which has a side effect.
`schedule` tries to schedule the task with task manager (and is a
`no-op` if the task is already running). It fails if the task is in
the `being_canceled` state.
`cancel` moves the task document to the `being_canceled` state so that
`cancellableTask` can engage the abort controller to stop ongoing work.
`acknowledge` moves a `completed` task to the `acknowledged` state,
indicating that the user has reviewed the results of this task and taken
some follow-up action, so it's safe to schedule this task again without
losing results (this is not enforced).

The route reports the following statuses:
`'not_started' | 'in_progress' | 'stale' | 'being_canceled' | 'canceled'
| 'failed' | 'completed' | 'acknowledged'`

Most of them are the state of the task, but `stale` is a special route
status that indicates that no updates were made to the task document for
a while.
The `failed` result includes an error message while `completed` and
`acknowledged` include the payload found on `task.task.payload`.

### Task document schema
Follow up to elastic#245725

The stored documents have the following shape:

```typescript
{
  id: string;
  type: string;
  status: TaskStatus;
  stream: string;
  space: string;
  created_at: string;
  task: {
    params: TaskParams;
    payload?: any // Only for completed and acknowledged tasks
    error?: string // Only for failed tasks
  };
}
```

All fields except `task` are indexed, and we store things under `task`
to avoid indexing them because of
elastic#245974
The tasks are stored in `.kibana_streams_tasks`

### To do

- Fix failing tests
- Add test for `cancellableTask`
- Manually test for robustness

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Shahzad <shahzad31comp@gmail.com>
Co-authored-by: Mykola Harmash <mykola.harmash@gmail.com>
miltonhultgren added a commit that referenced this pull request Jan 16, 2026
Similar to #245728, this makes the
Stream description generation process a background task.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>

Labels

author:actionable-obs PRs authored by the actionable obs team backport:skip This PR does not require backporting Feature:SigEvents Significant events feature, related to streams and rules/alerts (RnA) release_note:skip Skip the PR/issue when compiling release notes v9.4.0

9 participants