[Task Manager] Ensure putTemplate will only create/update the index template #28540

tsullivan · 2019-01-10T22:55:39Z

Summary

This PR implements changes per feedback from the original Task Manager PR, that slipped through the cracks during review #24356 (comment):

@pickypg on Dec 7, 2018 Member
...
But also we should not be setting this unless we need to set it. If it already exists, then we're unnecessary causing a cluster state change on the ES side every time that Kibana starts up.

As an aside, the ES Template API supports version values and we should be using it based on the version of Kibana so that we do not replace a newer version of Kibana's template with our own. And, given a newer template, we should consider not marking the task manager as initialized.

Checklist

Use ~~strikethroughs~~ to remove checklist items you don't feel are applicable to this PR.

~~This was checked for cross-browser compatibility, including a check against IE11~~
~~Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support~~
~~Documentation was added for features that require explanation or tutorials~~
Unit or functional tests were updated or added to match the most common scenarios
~~This was checked for keyboard-only and screenreader accessibility~~

For maintainers

This was checked for breaking API changes and was labeled appropriately
This includes a feature addition or change that requires a release note and was labeled appropriately

elasticmachine · 2019-01-10T22:55:41Z

Pinging @elastic/kibana-platform

x-pack/plugins/task_manager/lib/get_template_version.ts

chrisdavies

The semver checking seems flawed to me. Have you considered using the semver library?

chrisdavies · 2019-01-11T12:46:07Z

x-pack/plugins/task_manager/lib/get_template_version.ts

+ * https://github.com/elastic/elasticsearch/blob/de962b2/server/src/main/java/org/elasticsearch/Version.java
+ */
+export function getTemplateVersion(versionStr: string): number {
+  // break up the string parts


I tested this, and it doesn't seem to handle some edge cases.

The following expression evaluates to true, but should be false (e.g. 1.32 is greater than 1.3).

getTemplateVersion('1.32.0') < getTemplateVersion('1.3.223423')

Is there a reason we aren't using the semver library to handle semvers? It's what we're using in migrations, for example.

He matched the version behavior that Elasticsearch uses, which simply doesn't allow for 3-digit, or greater, minor and patch releases. If we get into a situation where we have Kibana 6.2.100, then I think we have other issues.

We could add a test that pulls in the packaged version and asserts that it isn't >= 99 so that we can be aware that the next minor / patch release will have that issue though.

I would love using semver here, but the value for the version field needs to be an integer per the ES API. I took a stab at looking for whether using semver would help this code more-or-less as it is. It can be used to replace versionStr.split but I don't see much other benefit.

But as this function needs to produce an integer, I think it still has the same problem with 3-digit version parts that you bring up

This thread is relevant to a call we had with @njd5475 and @chrisdavies and others

What actually "upgrades" existing documents in the task manager index?
Since that isn't a concern to worry about for this version, I haven't been addressing that very much. I think this PR will create a logical space in the code where upgrade logic will go in the future.

Use semver? Or integer?
In this PR, I'm setting up a way to define what version of Kibana a document will be compatible with, and doing that with the kibana.apiVersion field. That lets me create a filter in the search query to ES. However, if the documents are going to be compatible with the Kibana Migrations service, their versioning will be done with semver, and the version should not necessarily update with each release. Also, if the documents were stored using the Saved Objects client, this would happen automatically.

If I understand things correctly, Kibana Migrations and Saved Objects client mean we don't need to worry about BWC in our own code? That would be great! Although, for now there seem to be no features that use those capabilities other than core Kibana for the .kibana index. Given that, I think the more hands-on versioning that I'm doing here is OK for now (Task Manager has no consumers yet, and is a beta service). In the future, maintainers can go over TaskStore and fashion it to use the Saved Objects client.

Does this sound ok?

cc @cwurm, was on the call and had some thoughts about Saved Object client as well

sorry - I meant @weltenwort

pickypg

I really like the direction of this PR. But it opened up some other thoughts.

pickypg · 2019-01-10T23:53:28Z

x-pack/plugins/task_manager/task_store.ts

+    }
+
+    if (existingVersion > templateVersion) {
+      throw new Error(


We have two options here:

Throw the error and prevent ourselves from initializing the task manager in such cases (what happens with this code currently).

Ignore it, just like we would with existingVersion === templateVersion.

Both options are valid, but worth discussing:

If we throw the error, then we're saying "we do not support backward compatibility within the task manager and you must disable the task manager for this Kibana instance to be fully functional, or upgrade it."

If we do not throw the error, then we're saying "we do support backward compatibility within the task manager and you can expect this version to coexist with the other version(s) of Kibana(s) running against Elasticsearch [right now]."

I think all of us want to be in the second bucket, but we aren't currently prepared to be there and for the beta that's probably acceptable (to the point that maybe we should even say that we do not support BWC during the beta). That said, I think this leads to a few thoughts about the existing documents / mappings that we should probably prepare for:

We should add the Kibana version that created the task, to every document (and thus the mappings). This should allow us to upgrade documents later and allow us to get to BWC versions.

Older versions of Kibana's task manager should then ignore tasks from newer versions.

Newer versions can continue to consume older versions (note that this would likely lead to starving an older Kibana's task manager as the new version(s) rewrite older tasks).

We should consider using an ingest pipeline to upgrade documents from older versions on-the-fly.

We should consider using dynamic: false instead of dynamic: strict to allow us to add/rename, then remove/rename fields in the future.

It's safe to remove a field from a strict mapping, but if we wanted to add one, then remove/rename it later, then incoming documents would have that added field "incorrectly" and fail to be indexed even though, theoretically, the newer version could properly handle the older version.

TL;DR: Keep this as-is for now, but plan to change it by adding some other logic.

What do you think?

Yes, it's great to think about these things now.

In the 3 preparation steps, number 2 and 3 will apply to future versions of the stack. For now, I really like the point about indexing the Kibana version in the document of the scheduled task. It will have to be the integer / template version number, so the numeric comparison can be made that will let a "new" instance claim a task scheduled by an "old" instance. I like how you wrote it in our chat conversation: "[we] want it to so that Kibana can query for tasks with a version <= their version"

That means that the "old" instance can still start and complete Task Manager initialization, but it will be aware that there are going to be tasks that it can not claim. That definitely calls for a warning log message.

x-pack/plugins/task_manager/lib/get_template_version.test.ts

x-pack/plugins/task_manager/task_store.ts

pickypg

I really like the direction of it, but it opened up some other thoughts.

tsullivan · 2019-01-11T23:32:36Z

With this change, if you run a version 7.0.0 and a version 7.0.1 Kibana instance in the same cluster, you will see this warning log in the 7.0.0 instance's logs:

server log [23:26:50.970] [warning][task_manager] This Kibana instance defines an older template version (7000099) than is currently in Elasticsearch (7000199). Because of the potential for non-backwards compatible changes, this Kibana instance will only be able to claim scheduled tasks with "kibana.apiVersion": 1 in the task metadata.

tsullivan · 2019-01-11T23:37:20Z

I pushed some changes to address #28540 (comment). There are still a few TODOs:

Provide the server UUID and scheduled ts in the Kibana metadata
Incorporate API version into the search filters.

elasticmachine · 2019-01-30T17:19:38Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request

elasticmachine · 2019-02-05T14:32:43Z

💔 Build Failed

continuous-integration/kibana-ci/pull-request

njd5475 · 2019-02-04T17:48:46Z

x-pack/plugins/task_manager/task_manager.ts

      maxAttempts: config.get('xpack.task_manager.max_attempts'),
      supportedTypes: Object.keys(this.definitions),
+      logger,
+      getKibanaUuid: () => config.get('server.uuid'),


It doesn't look like we are using the kibana uuid. Why exactly would we want to add it if we are not using it?

This is for troubleshooting. With any Kibana instance in the cluster able to claim a task, it'll be great being able to track problematic instances doing stuff to the tasks. We're doing this in the Reporting index as well, because we saw benefit in being able to track it

njd5475 · 2019-02-05T18:35:12Z

x-pack/plugins/task_manager/task_store.ts

-      this._isInitialized = true;
-      return templateResult;
+      // extract the existing version
+      existingVersion = get(templateCheck[templateName], 'version') || 0;


You could just use the default parameter in lodashes get method.

Suggested change

existingVersion = get(templateCheck[templateName], 'version') || 0;

existingVersion = get(templateCheck[templateName], 'version', 0);

Otherwise why use get at all?

Suggested change

existingVersion = get(templateCheck[templateName], 'version') || 0;

existingVersion = templateCheck[templateName].version || 0;

But if we expect that templateCheck[templateName] might not exist then we'd need to do

Suggested change

existingVersion = get(templateCheck[templateName], 'version') || 0;

existingVersion = (templateCheck[templateName] || {version: 0}).version;

Good thing you brought this up. Does #29906 apply to template versions as well? @spalger do you know?

Actually I think this is different. It's not for optimistic concurrency for document updates.

c4d1611 5efaa93

njd5475 · 2019-02-05T18:43:49Z

x-pack/plugins/task_manager/task_store.ts


  return {
    _id: doc.id,
    _index: index,


I think you wanted to change this

Suggested change

_index: index,

_index: store.index,

Though I don't know why taskDocToRaw is a first-class function if this is just being passed to it. Why not make it a private member function to the TaskStore class?

yeah that was a bad merge resolution: 36dd212

The taskDocToRaw thing is pre-existing code, if first-class functions are a problem then it would be +1 from me to do a cleanup PR. That's not for 6.7 though.

Yeah, I think a cleanup PR would make sense, not sure the code is very clear on the usage of taskDocToRaw and rawSource. But not essential for 6.7

njd5475

So this LGTM it solves the original purpose but will leave the mappings if they exist as is, something that will have to be addressed in a followup PR, among other things.

tsullivan · 2019-02-05T22:24:44Z

Added blocker-ff as Task Manager needs this change before it can be launched in 6.7.0

Working on fixing a failing unit test.

elasticmachine · 2019-02-05T22:32:51Z

💔 Build Failed

continuous-integration/kibana-ci/pull-request

elasticmachine · 2019-02-05T23:43:51Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request

…emplate (elastic#28540) * get template version * ensure putTemplate only creates/updates the template * fix tests * new test for throwing error re old template * note comment * clean up test * test clarification * store kibana metadata in scheduled task doc * map `dynamic: false` * logging * add kibana uuid * fix tests * last todo * fetching available task uses apiVersion in the query * scheduledAt * ts fix * ts fix * fix snapshot * await to fail test if snapshot does not match * fix bad merge * remove _.get * fix typeError happening in tests

…ndex template (#28540) (#30163) * [Task Manager] Ensure putTemplate will only create/update the index template (#28540) * get template version * ensure putTemplate only creates/updates the template * fix tests * new test for throwing error re old template * note comment * clean up test * test clarification * store kibana metadata in scheduled task doc * map `dynamic: false` * logging * add kibana uuid * fix tests * last todo * fetching available task uses apiVersion in the query * scheduledAt * ts fix * ts fix * fix snapshot * await to fail test if snapshot does not match * fix bad merge * remove _.get * fix typeError happening in tests * undo merged 7.0 changes

tsullivan · 2019-02-06T04:30:41Z

We should consider using dynamic: false instead of dynamic: strict to allow us to add/rename, then remove/rename fields in the future.

It's safe to remove a field from a strict mapping, but if we wanted to add one, then remove/rename it later, then incoming documents would have that added field "incorrectly" and fail to be indexed even though, theoretically, the newer version could properly handle the older version.

Just to be explicit about this, this change wasn't taken into feedback, so the mapping is still dynamic: 'strict'. This impacts developers that retain their Elasticsearch data as they update the source code from master.

Kibana will log a server error if it tries to schedule tasks and the mapping hasn't gotten updated.

Since dynamic is set to strict, taking in the new changes that add scheduledAt, etc will require clearing out the old Task Manager data first:

curl -X DELETE http://localhost:9200/_template/.kibana_task_manager -u username:password
curl -X DELETE http://localhost:9200/.kibana_task_manager -u username:password

Then restart Kibana.

This is a 6.7.0-6.7.0 breaking change :)

tsullivan added 5 commits January 10, 2019 13:21

get template version

fdcd142

ensure putTemplate only creates/updates the template

0a8a195

fix tests

f01f89f

new test for throwing error re old template

e55750a

note comment

640fa48

tsullivan added review Team:Core Platform Core services: plugins, logging, config, saved objects, http, ES client, i18n, etc t// v7.0.0 v6.7.0 labels Jan 10, 2019

tsullivan requested review from chrisdavies, epixa, njd5475 and pickypg January 10, 2019 22:55

tsullivan commented Jan 10, 2019

View reviewed changes

x-pack/plugins/task_manager/lib/get_template_version.ts Show resolved Hide resolved

clean up test

5e18822

chrisdavies suggested changes Jan 11, 2019

View reviewed changes

njd5475 mentioned this pull request Jan 11, 2019

[WIP] Store should not update template on every restart. #28611

Closed

7 tasks

pickypg reviewed Jan 11, 2019

View reviewed changes

tsullivan added 4 commits January 11, 2019 14:08

Merge branch 'master' into taskmanager/template-safety

3d39746

test clarification

1196d88

store kibana metadata in scheduled task doc

a0845b1

map dynamic: false

d7b1891

tsullivan added 4 commits January 14, 2019 12:05

Merge branch 'master' into taskmanager/template-safety

1ca259b

logging

1ccafe5

add kibana uuid

4f2b61a

fix tests

a367be4

elastic deleted a comment from elasticmachine Jan 30, 2019

tsullivan added 3 commits January 30, 2019 09:07

Merge branch 'master' into taskmanager/template-safety

578bc29

fix snapshot

b659944

await to fail test if snapshot does not match

dcd90d8

elastic deleted a comment from elasticmachine Jan 30, 2019

Merge branch 'master' into taskmanager/template-safety

0e5d17c

tsullivan added 2 commits February 5, 2019 09:02

Merge branch 'master' into taskmanager/template-safety

3e4a3fe

fix bad merge

36dd212

njd5475 reviewed Feb 5, 2019

View reviewed changes

remove _.get

c4d1611

njd5475 approved these changes Feb 5, 2019

View reviewed changes

tsullivan added the blocker-ff label Feb 5, 2019

fix typeError happening in tests

5efaa93

tsullivan merged commit 7bec93f into elastic:master Feb 5, 2019

tsullivan deleted the taskmanager/template-safety branch February 5, 2019 23:56

tsullivan mentioned this pull request Feb 6, 2019

[6.x] [Task Manager] Ensure putTemplate will only create/update the index template (#28540) #30163

Merged

	existingVersion = get(templateCheck[templateName], 'version') \|\| 0;
	existingVersion = get(templateCheck[templateName], 'version', 0);

	existingVersion = get(templateCheck[templateName], 'version') \|\| 0;
	existingVersion = templateCheck[templateName].version \|\| 0;

	existingVersion = get(templateCheck[templateName], 'version') \|\| 0;
	existingVersion = (templateCheck[templateName] \|\| {version: 0}).version;

[Task Manager] Ensure putTemplate will only create/update the index template #28540

[Task Manager] Ensure putTemplate will only create/update the index template #28540

Uh oh!

Conversation

tsullivan commented Jan 10, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Checklist

For maintainers

Uh oh!

elasticmachine commented Jan 10, 2019

Uh oh!

Uh oh!

chrisdavies left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pickypg left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pickypg left a comment

Choose a reason for hiding this comment

Uh oh!

tsullivan commented Jan 11, 2019

Uh oh!

tsullivan commented Jan 11, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

elasticmachine commented Jan 30, 2019

💚 Build Succeeded

Uh oh!

elasticmachine commented Feb 5, 2019

💔 Build Failed

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

njd5475 left a comment

Choose a reason for hiding this comment

Uh oh!

tsullivan commented Feb 5, 2019

Uh oh!

elasticmachine commented Feb 5, 2019

💔 Build Failed

Uh oh!

tsullivan commented Jan 10, 2019 •

edited

Loading

tsullivan commented Jan 11, 2019 •

edited

Loading