Skip to content

Conversation

@tsullivan
Copy link
Member

@tsullivan tsullivan commented Jan 10, 2019

Summary

This PR implements changes per feedback from the original Task Manager PR, that slipped through the cracks during review #24356 (comment):

@pickypg on Dec 7, 2018 Member
...
But also we should not be setting this unless we need to set it. If it already exists, then we're unnecessary causing a cluster state change on the ES side every time that Kibana starts up.

As an aside, the ES Template API supports version values and we should be using it based on the version of Kibana so that we do not replace a newer version of Kibana's template with our own. And, given a newer template, we should consider not marking the task manager as initialized.

Checklist

Use strikethroughs to remove checklist items you don't feel are applicable to this PR.

For maintainers

@tsullivan tsullivan added review Team:Core Platform Core services: plugins, logging, config, saved objects, http, ES client, i18n, etc t// v7.0.0 v6.7.0 labels Jan 10, 2019
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-platform

Copy link
Contributor

@chrisdavies chrisdavies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semver checking seems flawed to me. Have you considered using the semver library?

* https://github.com/elastic/elasticsearch/blob/de962b2/server/src/main/java/org/elasticsearch/Version.java
*/
export function getTemplateVersion(versionStr: string): number {
// break up the string parts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this, and it doesn't seem to handle some edge cases.

The following expression evaluates to true, but should be false (e.g. 1.32 is greater than 1.3).

getTemplateVersion('1.32.0') < getTemplateVersion('1.3.223423')

Is there a reason we aren't using the semver library to handle semvers? It's what we're using in migrations, for example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

He matched the version behavior that Elasticsearch uses, which simply doesn't allow for 3-digit, or greater, minor and patch releases. If we get into a situation where we have Kibana 6.2.100, then I think we have other issues.

We could add a test that pulls in the packaged version and asserts that it isn't >= 99 so that we can be aware that the next minor / patch release will have that issue though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love using semver here, but the value for the version field needs to be an integer per the ES API. I took a stab at looking for whether using semver would help this code more-or-less as it is. It can be used to replace versionStr.split but I don't see much other benefit.

But as this function needs to produce an integer, I think it still has the same problem with 3-digit version parts that you bring up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This thread is relevant to a call we had with @njd5475 and @chrisdavies and others

  1. What actually "upgrades" existing documents in the task manager index?
    Since that isn't a concern to worry about for this version, I haven't been addressing that very much. I think this PR will create a logical space in the code where upgrade logic will go in the future.
  2. Use semver? Or integer?
    In this PR, I'm setting up a way to define what version of Kibana a document will be compatible with, and doing that with the kibana.apiVersion field. That lets me create a filter in the search query to ES. However, if the documents are going to be compatible with the Kibana Migrations service, their versioning will be done with semver, and the version should not necessarily update with each release. Also, if the documents were stored using the Saved Objects client, this would happen automatically.

If I understand things correctly, Kibana Migrations and Saved Objects client mean we don't need to worry about BWC in our own code? That would be great! Although, for now there seem to be no features that use those capabilities other than core Kibana for the .kibana index. Given that, I think the more hands-on versioning that I'm doing here is OK for now (Task Manager has no consumers yet, and is a beta service). In the future, maintainers can go over TaskStore and fashion it to use the Saved Objects client.

Does this sound ok?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @cwurm, was on the call and had some thoughts about Saved Object client as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not me :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry - I meant @weltenwort

Copy link
Member

@pickypg pickypg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the direction of this PR. But it opened up some other thoughts.

}

if (existingVersion > templateVersion) {
throw new Error(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have two options here:

  1. Throw the error and prevent ourselves from initializing the task manager in such cases (what happens with this code currently).
  2. Ignore it, just like we would with existingVersion === templateVersion.

Both options are valid, but worth discussing:

  1. If we throw the error, then we're saying "we do not support backward compatibility within the task manager and you must disable the task manager for this Kibana instance to be fully functional, or upgrade it."
  2. If we do not throw the error, then we're saying "we do support backward compatibility within the task manager and you can expect this version to coexist with the other version(s) of Kibana(s) running against Elasticsearch [right now]."

I think all of us want to be in the second bucket, but we aren't currently prepared to be there and for the beta that's probably acceptable (to the point that maybe we should even say that we do not support BWC during the beta). That said, I think this leads to a few thoughts about the existing documents / mappings that we should probably prepare for:

  1. We should add the Kibana version that created the task, to every document (and thus the mappings). This should allow us to upgrade documents later and allow us to get to BWC versions.
    1. Older versions of Kibana's task manager should then ignore tasks from newer versions.
    2. Newer versions can continue to consume older versions (note that this would likely lead to starving an older Kibana's task manager as the new version(s) rewrite older tasks).
  2. We should consider using an ingest pipeline to upgrade documents from older versions on-the-fly.
  3. We should consider using dynamic: false instead of dynamic: strict to allow us to add/rename, then remove/rename fields in the future.
    • It's safe to remove a field from a strict mapping, but if we wanted to add one, then remove/rename it later, then incoming documents would have that added field "incorrectly" and fail to be indexed even though, theoretically, the newer version could properly handle the older version.

TL;DR: Keep this as-is for now, but plan to change it by adding some other logic.

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's great to think about these things now.

In the 3 preparation steps, number 2 and 3 will apply to future versions of the stack. For now, I really like the point about indexing the Kibana version in the document of the scheduled task. It will have to be the integer / template version number, so the numeric comparison can be made that will let a "new" instance claim a task scheduled by an "old" instance. I like how you wrote it in our chat conversation: "[we] want it to so that Kibana can query for tasks with a version <= their version"

That means that the "old" instance can still start and complete Task Manager initialization, but it will be aware that there are going to be tasks that it can not claim. That definitely calls for a warning log message.

Copy link
Member

@pickypg pickypg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the direction of it, but it opened up some other thoughts.

@tsullivan
Copy link
Member Author

With this change, if you run a version 7.0.0 and a version 7.0.1 Kibana instance in the same cluster, you will see this warning log in the 7.0.0 instance's logs:

server log [23:26:50.970] [warning][task_manager] This Kibana instance defines an older template version (7000099) than is currently in Elasticsearch (7000199). Because of the potential for non-backwards compatible changes, this Kibana instance will only be able to claim scheduled tasks with "kibana.apiVersion": 1 in the task metadata.

@tsullivan
Copy link
Member Author

tsullivan commented Jan 11, 2019

I pushed some changes to address #28540 (comment). There are still a few TODOs:

  1. Provide the server UUID and scheduled ts in the Kibana metadata
  2. Incorporate API version into the search filters.

@elastic elastic deleted a comment from elasticmachine Jan 30, 2019
@elastic elastic deleted a comment from elasticmachine Jan 30, 2019
@elastic elastic deleted a comment from elasticmachine Jan 30, 2019
@elastic elastic deleted a comment from elasticmachine Jan 30, 2019
@elastic elastic deleted a comment from elasticmachine Jan 30, 2019
@elastic elastic deleted a comment from elasticmachine Jan 30, 2019
@elastic elastic deleted a comment from elasticmachine Jan 30, 2019
@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@elasticmachine
Copy link
Contributor

💔 Build Failed

maxAttempts: config.get('xpack.task_manager.max_attempts'),
supportedTypes: Object.keys(this.definitions),
logger,
getKibanaUuid: () => config.get('server.uuid'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like we are using the kibana uuid. Why exactly would we want to add it if we are not using it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for troubleshooting. With any Kibana instance in the cluster able to claim a task, it'll be great being able to track problematic instances doing stuff to the tasks. We're doing this in the Reporting index as well, because we saw benefit in being able to track it

this._isInitialized = true;
return templateResult;
// extract the existing version
existingVersion = get(templateCheck[templateName], 'version') || 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just use the default parameter in lodashes get method.

Suggested change
existingVersion = get(templateCheck[templateName], 'version') || 0;
existingVersion = get(templateCheck[templateName], 'version', 0);

Otherwise why use get at all?

Suggested change
existingVersion = get(templateCheck[templateName], 'version') || 0;
existingVersion = templateCheck[templateName].version || 0;

But if we expect that templateCheck[templateName] might not exist then we'd need to do

Suggested change
existingVersion = get(templateCheck[templateName], 'version') || 0;
existingVersion = (templateCheck[templateName] || {version: 0}).version;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thing you brought this up. Does #29906 apply to template versions as well? @spalger do you know?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think this is different. It's not for optimistic concurrency for document updates.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


return {
_id: doc.id,
_index: index,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you wanted to change this

Suggested change
_index: index,
_index: store.index,

Though I don't know why taskDocToRaw is a first-class function if this is just being passed to it. Why not make it a private member function to the TaskStore class?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that was a bad merge resolution: 36dd212

The taskDocToRaw thing is pre-existing code, if first-class functions are a problem then it would be +1 from me to do a cleanup PR. That's not for 6.7 though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think a cleanup PR would make sense, not sure the code is very clear on the usage of taskDocToRaw and rawSource. But not essential for 6.7

Copy link
Contributor

@njd5475 njd5475 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this LGTM it solves the original purpose but will leave the mappings if they exist as is, something that will have to be addressed in a followup PR, among other things.

@tsullivan
Copy link
Member Author

Added blocker-ff as Task Manager needs this change before it can be launched in 6.7.0

Working on fixing a failing unit test.

@elasticmachine
Copy link
Contributor

💔 Build Failed

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@tsullivan tsullivan merged commit 7bec93f into elastic:master Feb 5, 2019
@tsullivan tsullivan deleted the taskmanager/template-safety branch February 5, 2019 23:56
tsullivan added a commit to tsullivan/kibana that referenced this pull request Feb 6, 2019
…emplate (elastic#28540)

* get template version

* ensure putTemplate only creates/updates the template

* fix tests

* new test for throwing error re old template

* note comment

* clean up test

* test clarification

* store kibana metadata in scheduled task doc

* map `dynamic: false`

* logging

* add kibana uuid

* fix tests

* last todo

* fetching available task uses apiVersion in the query

* scheduledAt

* ts fix

* ts fix

* fix snapshot

* await to fail test if snapshot does not match

* fix bad merge

* remove _.get

* fix typeError happening in tests
tsullivan added a commit that referenced this pull request Feb 6, 2019
…ndex template (#28540) (#30163)

* [Task Manager] Ensure putTemplate will only create/update the index template (#28540)

* get template version

* ensure putTemplate only creates/updates the template

* fix tests

* new test for throwing error re old template

* note comment

* clean up test

* test clarification

* store kibana metadata in scheduled task doc

* map `dynamic: false`

* logging

* add kibana uuid

* fix tests

* last todo

* fetching available task uses apiVersion in the query

* scheduledAt

* ts fix

* ts fix

* fix snapshot

* await to fail test if snapshot does not match

* fix bad merge

* remove _.get

* fix typeError happening in tests

* undo merged 7.0 changes
@tsullivan
Copy link
Member Author

We should consider using dynamic: false instead of dynamic: strict to allow us to add/rename, then remove/rename fields in the future.

  • It's safe to remove a field from a strict mapping, but if we wanted to add one, then remove/rename it later, then incoming documents would have that added field "incorrectly" and fail to be indexed even though, theoretically, the newer version could properly handle the older version.

Just to be explicit about this, this change wasn't taken into feedback, so the mapping is still dynamic: 'strict'. This impacts developers that retain their Elasticsearch data as they update the source code from master.

Kibana will log a server error if it tries to schedule tasks and the mapping hasn't gotten updated.

Since dynamic is set to strict, taking in the new changes that add scheduledAt, etc will require clearing out the old Task Manager data first:

curl -X DELETE http://localhost:9200/_template/.kibana_task_manager -u username:password
curl -X DELETE http://localhost:9200/.kibana_task_manager -u username:password

Then restart Kibana.

This is a 6.7.0-6.7.0 breaking change :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review Team:Core Platform Core services: plugins, logging, config, saved objects, http, ES client, i18n, etc t// v6.7.0 v7.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants