Skip to content

[Task Manager] Adding list of explicitly de-registered task types#123963

Merged
ymao1 merged 16 commits intoelastic:mainfrom
ymao1:task-manager/unregistered-task-types
Feb 15, 2022
Merged

[Task Manager] Adding list of explicitly de-registered task types#123963
ymao1 merged 16 commits intoelastic:mainfrom
ymao1:task-manager/unregistered-task-types

Conversation

@ymao1
Copy link
Copy Markdown
Contributor

@ymao1 ymao1 commented Jan 27, 2022

Resolves #122389

Summary

Adds a list of explicitly de-registered task types and updates claim script to only mark a task as unrecognized if the task type exists in that list.

In order to explicitly stop registering a task, plugin authors will need to remove the task registration code as well as add the task type to the REMOVED_TYPES list in x-pack/plugins/task_manager/server/task_type_dictionary.ts

  • If the task is not registered but not added to REMOVED_TYPES, it will not be marked as unrecognized. It will just not be claimed.
  • If a task is added to REMOVED_TYPES but still registered, an error will be thrown. These scenarios should be caught in development.

Todo

  • Add old siem rule type to list.

To Verify

  1. Create a rule that runs frequently and make sure it runs (I used the People in Space rule). Query for the rule in the task manager index and see that status is either idle or running.
  2. Update the code so that this rule type is no longer registered with alerting
--- a/x-pack/examples/alerting_example/server/plugin.ts
+++ b/x-pack/examples/alerting_example/server/plugin.ts
@@ -28,7 +28,7 @@ export interface AlertingExampleDeps {
 export class AlertingExamplePlugin implements Plugin<void, void, AlertingExampleDeps> {
   public setup(core: CoreSetup, { alerting, features }: AlertingExampleDeps) {
     alerting.registerType(alwaysFiringAlert);
-    alerting.registerType(peopleInSpaceAlert);
+    // alerting.registerType(peopleInSpaceAlert);
  1. Verify that the rule no longer runs and the status stays idle
  2. Revert the update made in step 2. Verify that the rule starts running again.
  3. Update the code to add this rule type to the list of removed types
--- a/x-pack/plugins/task_manager/server/task_type_dictionary.ts
+++ b/x-pack/plugins/task_manager/server/task_type_dictionary.ts
@@ -14,6 +14,7 @@ import { Logger } from '../../../../src/core/server';
 export const REMOVED_TYPES: string[] = [
   // for testing
   'sampleTaskRemovedType',
+  'alerting:example.people-in-space',
 ];
  1. Verify that the rule status is updated to unrecognized.

Checklist

@ymao1
Copy link
Copy Markdown
Contributor Author

ymao1 commented Feb 2, 2022

@elasticmachine merge upstream

@ymao1
Copy link
Copy Markdown
Contributor Author

ymao1 commented Feb 3, 2022

@elasticmachine merge upstream

@ymao1 ymao1 changed the title Adding REMOVED_TYPES to task manager and only marking those types as … [Task Manager] Adding list of explicitly de-registered task types Feb 3, 2022
@ymao1 ymao1 self-assigned this Feb 3, 2022
@ymao1 ymao1 added backport:skip This PR does not require backporting Feature:Task Manager release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Platform ResponseOps team (formerly the Cases and Alerting teams) t// v8.2.0 labels Feb 3, 2022
@ymao1 ymao1 marked this pull request as ready for review February 3, 2022 17:06
@ymao1 ymao1 requested a review from a team as a code owner February 3, 2022 17:06
@elasticmachine
Copy link
Copy Markdown
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

*/
export const REMOVED_TYPES: string[] = [
// for testing
'sampleTaskRemovedType',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once this PR is merged will we not actually have any type in here?
It raises the question of whether we need the unusedTypes feature at all, no? 🤔

@pmuellr
Copy link
Copy Markdown
Contributor

pmuellr commented Feb 9, 2022

This reminds me of an issue I forgot to open up, when looking to disable apm telemetry. I believe when I looked into this, if apm was configured to disable telemetry, the existing telemetry task would eventually get marked as unrecognized, with no way to reverse that without explicitly deleting the task doc from the TM index. I think if we merge something like this, the existing code would then work fine.

Start here, where the apm plugin sets up telemetry stuff, including registering the telemetry task, in the plugin setup(), by calling createApmTelemetry() - but that call is made conditionally:

public setup(
core: CoreSetup<APMPluginStartDependencies>,
plugins: APMPluginSetupDependencies
) {
this.logger = this.initContext.logger.get();
const config$ = this.initContext.config.create<APMConfig>();
core.savedObjects.registerType(apmIndices);
core.savedObjects.registerType(apmTelemetry);
core.savedObjects.registerType(apmServerSettings);
const currentConfig = this.initContext.config.get<APMConfig>();
this.currentConfig = currentConfig;
if (
plugins.taskManager &&
plugins.usageCollection &&
currentConfig.telemetryCollectionEnabled
) {
createApmTelemetry({
core,
config$,
usageCollector: plugins.usageCollection,
taskManager: plugins.taskManager,
logger: this.logger,
kibanaVersion: this.initContext.env.packageInfo.version,
});
}

The actual task registration is in that createApmTelemetry() function:

export async function createApmTelemetry({
core,
config$,
usageCollector,
taskManager,
logger,
kibanaVersion,
}: {
core: CoreSetup;
config$: Observable<APMConfig>;
usageCollector: UsageCollectionSetup;
taskManager: TaskManagerSetupContract;
logger: Logger;
kibanaVersion: string;
}) {
taskManager.registerTaskDefinitions({
[APM_TELEMETRY_TASK_NAME]: {
title: 'Collect APM usage',
createTaskRunner: () => {
return {
run: async () => {
await collectAndStore();
},
cancel: async () => {},
};
},
},
});

It wouldn't surprise me to find other task registrations that are conditional like this. Again, I >think< with the change in this PR, such tasks would be ignored, until Kibana was later restarted and when down a conditional path to register the task again. And wouldn't end up in the unrecognized task black hole. Am I thinking about that right?

@ymao1 ymao1 requested review from mikecote and pmuellr February 10, 2022 13:19
@ymao1 ymao1 requested a review from ersin-erdal February 10, 2022 13:19
@ymao1
Copy link
Copy Markdown
Contributor Author

ymao1 commented Feb 10, 2022

It wouldn't surprise me to find other task registrations that are conditional like this. Again, I >think< with the change in this PR, such tasks would be ignored, until Kibana was later restarted and when down a conditional path to register the task again. And wouldn't end up in the unrecognized task black hole. Am I thinking about that right?

@pmuellr Yes, that's correct. With the change in this PR, those tasks would not get marked as unrecognized unless they are specifically added to the UNUSED_TYPES list. Instead task manager would just ignore them so they stay idle until a Kibana that does register the task type tries to claim them.

ctx._source.task.status = "claiming"; ${Object.keys(fieldUpdates)
.map((field) => `ctx._source.task.${field}=params.fieldUpdates.${field};`)
.join(' ')}
} else if (!params.skippedTaskTypes.contains(ctx._source.task.taskType)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to test the skipped tasks?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we maintained a list of claimableTaskTypes and skippedTaskTypes. These are all the types that this task manager knows about so if a task is not claimable or skipped, we mark is as unrecognized. Now we are just using claimableTaskTypes to claim and unusedTaskTypes to determine what to mark as unrecognized. Skipped task types will go into the else bucket and get a no-op.

Copy link
Copy Markdown
Contributor

@mikecote mikecote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM, tested locally the following scenarios:

  • Alerting rule with a task in "unrecognized" status returning to idle after upgrade
  • Alerting rule with a task definition missing did not make the task have an unrecognized status
  • Alerting rule with a task definition missing and part of REMOVED_TYPES caused the task status to become unrecognized

👍

@ymao1
Copy link
Copy Markdown
Contributor Author

ymao1 commented Feb 14, 2022

@elasticmachine merge upstream

@ymao1 ymao1 requested a review from ersin-erdal February 14, 2022 13:06
updateFieldsAndMarkAsFailed(fieldUpdates, claimTasksById, ['foo', 'bar'], [], {
foo: 5,
bar: 2,
updateFieldsAndMarkAsFailed({
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean Code, group the arguments, love it :)

Copy link
Copy Markdown
Contributor

@ersin-erdal ersin-erdal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ymao1
Copy link
Copy Markdown
Contributor Author

ymao1 commented Feb 15, 2022

@elasticmachine merge upstream

'sampleTaskRemovedType',

// deprecated in https://github.com/elastic/kibana/pull/121442
'alerting:siem.signals',
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@madirey @marshallmain

Tagging you on this PR for awareness. We adding an explicit registry of removed task types that will be marked as unrecognized. Adding alerting:siem.signals due to this PR that removed that rule type. There should be no change to the outcome of that PR (meaning old siem.signals rule types will still be marked as unrecognized)

@kibana-ci
Copy link
Copy Markdown

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ymao1

@ymao1 ymao1 merged commit c2a0103 into elastic:main Feb 15, 2022
@tsullivan
Copy link
Copy Markdown
Member

I just found out about this PR by investigating #120995 and finding I could no longer reproduce the Reporting issue. Thank you @ymao1!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport:skip This PR does not require backporting Feature:Task Manager release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Platform ResponseOps team (formerly the Cases and Alerting teams) t// v8.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Task Manager] Investigate better handling of unregistered task types

9 participants