
[ResponseOps][Alerting] Create a task to regenerate maintenance window events#219261

Merged
js-jankisalvi merged 29 commits into elastic:main from js-jankisalvi:recreate-maintenance-window-events
Jun 18, 2025

Conversation

@js-jankisalvi
Contributor

@js-jankisalvi js-jankisalvi commented Apr 25, 2025

Summary

Resolves #211534

This PR adds a recurring task that will:

  • run once every day
  • collect maintenance windows whose expiration date is within 1 week
  • update the expiration date to +1 year if the maintenance window is recurring
  • generate events for the next year
  • add the new events to the maintenance window
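
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `MaintenanceWindowSketch` shape, the `regenerateEvents` function, and the injected `generateEvents` callback are hypothetical names, a year is approximated as 365 days, non-recurring windows are simply dropped, and the one-week check is applied symmetrically around the current time.

```typescript
// Hypothetical, simplified shape of a maintenance window (not the real Kibana type).
interface MaintenanceWindowSketch {
  id: string;
  recurring: boolean;
  expirationDate: string; // ISO 8601 timestamp
  events: Array<{ gte: string; lte: string }>;
}

type EventSketch = MaintenanceWindowSketch['events'][number];

const DAY_MS = 24 * 60 * 60 * 1000;
const WEEK_MS = 7 * DAY_MS;
const YEAR_MS = 365 * DAY_MS; // assumption: approximate year, for illustration only

function regenerateEvents(
  windows: MaintenanceWindowSketch[],
  now: Date,
  // Hypothetical callback standing in for the real event generator.
  generateEvents: (mw: MaintenanceWindowSketch, from: Date, to: Date) => EventSketch[]
): MaintenanceWindowSketch[] {
  const newExpiration = new Date(now.getTime() + YEAR_MS);
  return windows
    // Collect recurring windows whose expiration date is within one week of now.
    .filter(
      (mw) =>
        mw.recurring &&
        Math.abs(new Date(mw.expirationDate).getTime() - now.getTime()) <= WEEK_MS
    )
    .map((mw) => {
      // Generate events for the next year.
      const newEvents = generateEvents(mw, now, newExpiration);
      return {
        ...mw,
        // Only extend the expiration date when new events were produced.
        expirationDate: newEvents.length ? newExpiration.toISOString() : mw.expirationDate,
        // Append the new events to the existing ones.
        events: [...mw.events, ...newEvents],
      };
    });
}
```

Passing the generator in as a callback keeps the sketch independent of how recurrence rules are actually expanded.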

Checklist

  • [x] Unit or functional tests were updated or added to match the most common scenarios

How to test

  • Set the expiration date to less than 1 week before creating maintenance windows: on line 70 of x-pack/platform/plugins/shared/alerting/server/application/maintenance_window/methods/create/create_maintenance_window.ts, change expirationDate = moment().utc().add(1, 'year').toISOString(); to expirationDate = moment().utc().add(5, 'days').toISOString();
  • Create maintenance windows with different scenarios (recurring, non-recurring, etc.)
  • Update the task schedule to run every five minutes for testing: change { interval: '1d' } to { interval: '5m' } in x-pack/platform/plugins/shared/alerting/server/maintenance_window_events/task.ts
  • Verify the task ran successfully
  • Verify maintenance windows are updated properly with a new expiration date and new events

Flaky test runner:

https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/8358

@js-jankisalvi js-jankisalvi self-assigned this Apr 25, 2025
@js-jankisalvi js-jankisalvi added Team:ResponseOps Platform ResponseOps team (formerly the Cases and Alerting teams) t// v9.1.0 v8.19.0 Feature:Alerting/RulesFramework Issues related to the Alerting Rules Framework labels Apr 29, 2025
@js-jankisalvi js-jankisalvi changed the title [ResponseOps][Alerting] Generate maintenance window events task [ResponseOps][Alerting] Task to generate maintenance window events Apr 29, 2025
@js-jankisalvi js-jankisalvi added the release_note:skip Skip the PR/issue when compiling release notes label Apr 29, 2025
@js-jankisalvi js-jankisalvi marked this pull request as ready for review April 29, 2025 10:17
@js-jankisalvi js-jankisalvi requested a review from a team as a code owner April 29, 2025 10:17
@elasticmachine
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@js-jankisalvi js-jankisalvi added the backport:version Backport to applied version labels label Apr 29, 2025
@js-jankisalvi js-jankisalvi changed the title [ResponseOps][Alerting] Task to generate maintenance window events [ResponseOps][Alerting] Create a task to generate maintenance window events Apr 29, 2025
if (rRule.interval || rRule.freq) {
  expirationDate = moment().utc().add(1, 'year').toISOString();
} else {
  expirationDate = moment(rRule.dtstart).utc().add(duration, 'ms').toISOString();
Contributor Author

If the maintenance window has a non-recurring schedule, we don't need to set the expiration date to +1 year.
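
The rule in the snippet above can be illustrated with a small standalone function. This is a sketch under assumptions: it uses the built-in `Date` rather than moment (so calendar edge cases may differ), and `RRuleSketch` and `computeExpirationDate` are hypothetical names, not identifiers from the PR.

```typescript
// Hypothetical subset of the recurrence rule fields used in the snippet above.
interface RRuleSketch {
  dtstart: string; // ISO 8601 start of the first occurrence
  freq?: number;
  interval?: number;
}

function computeExpirationDate(rRule: RRuleSketch, durationMs: number, now: Date): string {
  if (rRule.interval || rRule.freq) {
    // Recurring schedule: expire one calendar year from now.
    const oneYearLater = new Date(now.getTime());
    oneYearLater.setUTCFullYear(oneYearLater.getUTCFullYear() + 1);
    return oneYearLater.toISOString();
  }
  // Non-recurring schedule: expire when the single occurrence ends.
  return new Date(new Date(rRule.dtstart).getTime() + durationMs).toISOString();
}
```

For example, with `now` at 2025-01-01T00:00:00Z a recurring rule yields an expiration date of 2026-01-01T00:00:00Z, while a one-off window starting 2025-03-01T10:00:00Z with a one-hour duration expires at 2025-03-01T11:00:00Z.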

Member

@cnasikas cnasikas left a comment

LGTM! I left some comments.

return {
  ...filteredMaintenanceWindow,
  expirationDate: newEvents.length ? newExpirationDate : oldExpirationDate,
  events: [...oldEvents, ...newEvents],
Member

Do we need to keep the old events? I am wondering a) whether the old events could become so big that they cannot fit in memory, and b) whether the events array could become so big that it takes up too much space in ES. The new events can contain some past events to be sure we do not miss anything. We can do that by setting the startDate passed to generateMaintenanceWindowEvents to one week in the past.

Contributor Author

The startDateRange used in the filter is 1 week before the expiration date, so I used the same date to generate the new events.

@js-jankisalvi js-jankisalvi changed the title [ResponseOps][Alerting] Create a task to generate maintenance window events [ResponseOps][Alerting] Create a task to regenerate maintenance window events May 27, 2025
…t --include-path /api/status --include-path /api/alerting/rule/ --include-path /api/alerting/rules --include-path /api/actions --include-path /api/security/role --include-path /api/spaces --include-path /api/streams --include-path /api/fleet --include-path /api/dashboards --include-path /api/saved_objects/_import --include-path /api/saved_objects/_export --include-path /api/maintenance_window --update'
Contributor

@ymao1 ymao1 left a comment

Left a few comments. It would be good to add a functional test for this as well.

.map((filteredMaintenanceWindow) => {
  const { rRule, duration, expirationDate: oldExpirationDate } = filteredMaintenanceWindow;

  const newEvents = generateMaintenanceWindowEvents({
Contributor

It seems like generateMaintenanceWindowEvents can throw an error. Should we handle that in a try/catch here? Otherwise, an error in one maintenance window might prevent all of them from being updated.
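
One way to isolate failures per item, as this comment suggests, is to wrap each call in its own try/catch and log the error instead of rethrowing. The sketch below is illustrative only: `mapSafely` and `LoggerSketch` are hypothetical names, and the code actually merged may handle this differently.

```typescript
// Hypothetical minimal logger interface.
interface LoggerSketch {
  error(message: string): void;
}

// Applies `fn` to every item; a throwing item is logged and skipped
// so that the remaining items are still processed.
function mapSafely<T, R>(
  items: T[],
  describe: (item: T) => string,
  fn: (item: T) => R,
  logger: LoggerSketch
): R[] {
  const results: R[] = [];
  for (const item of items) {
    try {
      results.push(fn(item));
    } catch (e) {
      const message = e instanceof Error ? e.message : String(e);
      logger.error(`Skipping ${describe(item)}: ${message}`);
    }
  }
  return results;
}
```

With this shape, a single maintenance window whose recurrence rule cannot be expanded produces one log line, and the rest of the batch is still returned for updating.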

Contributor Author

@js-jankisalvi js-jankisalvi Jun 6, 2025

const fullQuery = [
  MaintenanceWindowStatus.Running,
  MaintenanceWindowStatus.Upcoming,
  MaintenanceWindowStatus.Finished,
Contributor

Are we including finished maintenance windows in the query to check which ones need events? Why?

Contributor Author

Yes. Suppose a maintenance window has a recurring schedule spanning 2 years and runs once a year. Once this year's event is done, the window is marked as finished, but its end date is next year. That means the task should generate an event for next year even though the window is finished as of now.

Comment on lines +233 to +235
for await (const findResults of soFinder.find()) {
  mwsWithNewEvents = await generateEvents({
    maintenanceWindowsSO: findResults.saved_objects,
Contributor

How does this work? soFinder.find() can return multiple findResults, each with their own saved_objects? Why doesn't it return a single findResult with everything?

Contributor Author

It calls savedObjectsClient.find and iterates over multiple pages. It opens a new Point-In-Time (PIT) and continues paging until it receives a set of results smaller than the designated perPage size. In our case, if there are more than 1000 results, they are returned in multiple batches. That's why we loop through findResults to find and update a maximum of 1000 SOs at a time.
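
The batching behaviour described here can be illustrated with a plain async generator. This is a simplified sketch, not the Kibana finder: it pages by offset over an in-memory array instead of using a PIT, and `findInPages`, `FindPageSketch`, and `countInBatches` are hypothetical names.

```typescript
// Hypothetical page shape, mirroring only the `saved_objects` field used in the PR.
interface FindPageSketch<T> {
  saved_objects: T[];
}

// Yields one page at a time until a page smaller than `perPage` is received.
async function* findInPages<T>(
  fetchPage: (offset: number, perPage: number) => Promise<T[]>,
  perPage: number
): AsyncGenerator<FindPageSketch<T>> {
  let offset = 0;
  while (true) {
    const batch = await fetchPage(offset, perPage);
    if (batch.length > 0) {
      yield { saved_objects: batch };
    }
    if (batch.length < perPage) {
      return; // last page reached
    }
    offset += batch.length;
  }
}

// Usage: process each batch as it arrives instead of loading everything at once.
async function countInBatches(total: number, perPage: number): Promise<number[]> {
  const all = Array.from({ length: total }, (_, i) => i);
  const fetchPage = async (offset: number, size: number) => all.slice(offset, offset + size);
  const batchSizes: number[] = [];
  for await (const page of findInPages(fetchPage, perPage)) {
    batchSizes.push(page.saved_objects.length);
  }
  return batchSizes;
}
```

With 2500 items and a page size of 1000, the loop body runs three times with batches of 1000, 1000, and 500 items, which is why the consumer iterates with `for await` rather than receiving a single result.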

@js-jankisalvi
Contributor Author

> Left a few comments. It would be good to add a functional test for this as well.

Added a functional test

@kibanamachine
Contributor

Flaky Test Runner Stats

🎉 All tests passed! - kibana-flaky-test-suite-runner#8358

[✅] x-pack/test/alerting_api_integration/security_and_spaces/group3/config.ts: 25/25 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group3/config_with_schedule_circuit_breaker.ts: 25/25 tests passed.


Member

@cnasikas cnasikas left a comment

LGTM! I left some comments.

};
});

const result = await savedObjectsClient.bulkUpdate<MaintenanceWindowAttributes>(
Member

Let's log the errors returned by savedObjectsClient.bulkUpdate. bulkUpdate will not throw an error if any of the SOs fail to update. Instead, the error property of that SO in the response will be set and will contain a message. We should iterate through the results and log the errors, as Patrick suggested.

Contributor Author

Updated.
Also subtracted the MW SOs with errors from totalUpdatedMaintenanceWindows.
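
The pattern discussed in this thread (inspect each result, log failures, and count only the successes) might look roughly like the sketch below. The response shape is a simplified assumption modelled on the description above, and `countUpdatedAndLogErrors` and the interface names are hypothetical.

```typescript
// Simplified, assumed shape of one bulk-update result: `error` is set when the update failed.
interface BulkUpdateItemSketch {
  id: string;
  error?: { message: string };
}

interface BulkUpdateResponseSketch {
  saved_objects: BulkUpdateItemSketch[];
}

// Hypothetical minimal logger interface.
interface ErrorLoggerSketch {
  error(message: string): void;
}

// Logs every failed item and returns how many items were actually updated.
function countUpdatedAndLogErrors(
  response: BulkUpdateResponseSketch,
  logger: ErrorLoggerSketch
): number {
  let updated = 0;
  for (const so of response.saved_objects) {
    if (so.error) {
      logger.error(`Failed to update maintenance window ${so.id}: ${so.error.message}`);
    } else {
      updated += 1;
    }
  }
  return updated;
}
```

Counting successes this way keeps the reported total of updated maintenance windows accurate even when some items in the batch fail.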

references: [],
};

const statusFilter = {
Member

It is hard to understand the filter and verify whether it is correct. What about writing it in KQL and then using fromKueryExpression to convert it to the format you need?

Contributor Author

Done. I still added a snapshot test for the whole object.

Member

Thanks for the great test coverage!

// verify 2 maintenance windows are updated
await retry.try(async () => {
  const maintenanceWindowsResult = await getUpdatedMaintenanceWindows();
  expect(maintenanceWindowsResult.length).to.eql(2);
Member

Could you please help me understand why two? In getUpdatedMaintenanceWindows, the number of "should be updated as..." is three, and I got confused.

Contributor Author

Two is the correct number. Sorry, my bad for the incorrect comment.
The running MW should not be updated, as its expiration date is 10 days in the future while our task looks at a window from 1 week in the past to 1 week in the future.

@elasticmachine
Contributor

💚 Build Succeeded

Metrics [docs]

Unknown metric groups

ESLint disabled line counts

id before after diff
@kbn/test-suites-xpack-platform 340 341 +1

Total ESLint disabled count

id before after diff
@kbn/test-suites-xpack-platform 344 345 +1


cc @js-jankisalvi

Member

@cnasikas cnasikas left a comment

LGTM! Great work!

describe('getStatusFilter', () => {
  test('should build status filter', () => {
    expect(getStatusFilter()).toEqual(statusFilter);
    expect(getStatusFilter()).toMatchInlineSnapshot(`
Member

super nit: I was thinking something like expect(getStatusFilter()).toEqual(fromKueryExpression(statusFilter))

}

logger.debug(
  `Cancelling maintenance windows events generator task - execution error due to timeout.`
Member

super nit: We are not sure why the task got cancelled.


await soFinder?.close();

return;
Member

super nit:

Suggested change
return;

@js-jankisalvi js-jankisalvi merged commit eaa9c1c into elastic:main Jun 18, 2025
10 checks passed
@js-jankisalvi js-jankisalvi deleted the recreate-maintenance-window-events branch June 18, 2025 11:46
@kibanamachine
Contributor

Starting backport for target branches: 8.19

https://github.com/elastic/kibana/actions/runs/15731950402

@kibanamachine kibanamachine added the backport missing Added to PRs automatically when the are determined to be missing a backport. label Jun 20, 2025
@kibanamachine
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add a backport:* label or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 219261 locally
cc: @js-jankisalvi

@cnasikas
Member

💚 All backports created successfully

Status Branch Result
8.19

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

cnasikas added a commit that referenced this pull request Jun 23, 2025
…e window events (#219261) (#224773)

# Backport

This will backport the following commits from `main` to `8.19`:
- [[ResponseOps][Alerting] Create a task to regenerate maintenance
window events (#219261)](#219261)

<!--- Backport version: 10.0.1 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)


Co-authored-by: Janki Salvi <117571355+js-jankisalvi@users.noreply.github.com>
@kibanamachine kibanamachine removed the backport missing Added to PRs automatically when the are determined to be missing a backport. label Jun 23, 2025

Labels

  • backport:version Backport to applied version labels
  • Feature:Alerting/RulesFramework Issues related to the Alerting Rules Framework
  • release_note:skip Skip the PR/issue when compiling release notes
  • Team:ResponseOps Platform ResponseOps team (formerly the Cases and Alerting teams) t//
  • v8.19.0
  • v9.1.0


Development

Successfully merging this pull request may close these issues.

[ResponseOps][MW] Recreate maintenance window events after expiration date.
