-
Notifications
You must be signed in to change notification settings - Fork 8.5k
[Task Manager] Ensure putTemplate will only create/update the index template #28540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Task Manager] Ensure putTemplate will only create/update the index template #28540
Conversation
|
Pinging @elastic/kibana-platform |
chrisdavies
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The semver checking seems flawed to me. Have you considered using the semver library?
| * https://github.com/elastic/elasticsearch/blob/de962b2/server/src/main/java/org/elasticsearch/Version.java | ||
| */ | ||
| export function getTemplateVersion(versionStr: string): number { | ||
| // break up the string parts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this, and it doesn't seem to handle some edge cases.
The following expression evaluates to true, but should be false (e.g. 1.32 is greater than 1.3).
getTemplateVersion('1.32.0') < getTemplateVersion('1.3.223423')
Is there a reason we aren't using the semver library to handle semvers? It's what we're using in migrations, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
He matched the version behavior that Elasticsearch uses, which simply doesn't allow for 3-digit, or greater, minor and patch releases. If we get into a situation where we have Kibana 6.2.100, then I think we have other issues.
We could add a test that pulls in the packaged version and asserts that it isn't >= 99 so that we can be aware that the next minor / patch release will have that issue though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love using semver here, but the value for the version field needs to be an integer per the ES API. I took a stab at looking for whether using semver would help this code more-or-less as it is. It can be used to replace versionStr.split but I don't see much other benefit.
But as this function needs to produce an integer, I think it still has the same problem with 3-digit version parts that you bring up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This thread is relevant to a call we had with @njd5475 and @chrisdavies and others
- What actually "upgrades" existing documents in the task manager index?
Since that isn't a concern to worry about for this version, I haven't been addressing that very much. I think this PR will create a logical space in the code where upgrade logic will go in the future. - Use semver? Or integer?
In this PR, I'm setting up a way to define what version of Kibana a document will be compatible with, and doing that with thekibana.apiVersionfield. That lets me create a filter in the search query to ES. However, if the documents are going to be compatible with the Kibana Migrations service, their versioning will be done with semver, and the version should not necessarily update with each release. Also, if the documents were stored using the Saved Objects client, this would happen automatically.
If I understand things correctly, Kibana Migrations and Saved Objects client mean we don't need to worry about BWC in our own code? That would be great! Although, for now there seem to be no features that use those capabilities other than core Kibana for the .kibana index. Given that, I think the more hands-on versioning that I'm doing here is OK for now (Task Manager has no consumers yet, and is a beta service). In the future, maintainers can go over TaskStore and fashion it to use the Saved Objects client.
Does this sound ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @cwurm, was on the call and had some thoughts about Saved Object client as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not me :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry - I meant @weltenwort
pickypg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the direction of this PR. But it opened up some other thoughts.
| } | ||
|
|
||
| if (existingVersion > templateVersion) { | ||
| throw new Error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have two options here:
- Throw the error and prevent ourselves from initializing the task manager in such cases (what happens with this code currently).
- Ignore it, just like we would with
existingVersion === templateVersion.
Both options are valid, but worth discussing:
- If we throw the error, then we're saying "we do not support backward compatibility within the task manager and you must disable the task manager for this Kibana instance to be fully functional, or upgrade it."
- If we do not throw the error, then we're saying "we do support backward compatibility within the task manager and you can expect this version to coexist with the other version(s) of Kibana(s) running against Elasticsearch [right now]."
I think all of us want to be in the second bucket, but we aren't currently prepared to be there and for the beta that's probably acceptable (to the point that maybe we should even say that we do not support BWC during the beta). That said, I think this leads to a few thoughts about the existing documents / mappings that we should probably prepare for:
- We should add the Kibana version that created the task, to every document (and thus the mappings). This should allow us to upgrade documents later and allow us to get to BWC versions.
- Older versions of Kibana's task manager should then ignore tasks from newer versions.
- Newer versions can continue to consume older versions (note that this would likely lead to starving an older Kibana's task manager as the new version(s) rewrite older tasks).
- We should consider using an ingest pipeline to upgrade documents from older versions on-the-fly.
- We should consider using
dynamic: falseinstead ofdynamic: strictto allow us to add/rename, then remove/rename fields in the future.- It's safe to remove a field from a strict mapping, but if we wanted to add one, then remove/rename it later, then incoming documents would have that added field "incorrectly" and fail to be indexed even though, theoretically, the newer version could properly handle the older version.
TL;DR: Keep this as-is for now, but plan to change it by adding some other logic.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's great to think about these things now.
In the 3 preparation steps, number 2 and 3 will apply to future versions of the stack. For now, I really like the point about indexing the Kibana version in the document of the scheduled task. It will have to be the integer / template version number, so the numeric comparison can be made that will let a "new" instance claim a task scheduled by an "old" instance. I like how you wrote it in our chat conversation: "[we] want it to so that Kibana can query for tasks with a version <= their version"
That means that the "old" instance can still start and complete Task Manager initialization, but it will be aware that there are going to be tasks that it can not claim. That definitely calls for a warning log message.
pickypg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the direction of it, but it opened up some other thoughts.
|
With this change, if you run a version 7.0.0 and a version 7.0.1 Kibana instance in the same cluster, you will see this warning log in the 7.0.0 instance's logs:
|
|
I pushed some changes to address #28540 (comment). There are still a few TODOs:
|
💚 Build Succeeded |
💔 Build Failed |
| maxAttempts: config.get('xpack.task_manager.max_attempts'), | ||
| supportedTypes: Object.keys(this.definitions), | ||
| logger, | ||
| getKibanaUuid: () => config.get('server.uuid'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like we are using the kibana uuid. Why exactly would we want to add it if we are not using it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for troubleshooting. With any Kibana instance in the cluster able to claim a task, it'll be great being able to track problematic instances doing stuff to the tasks. We're doing this in the Reporting index as well, because we saw benefit in being able to track it
| this._isInitialized = true; | ||
| return templateResult; | ||
| // extract the existing version | ||
| existingVersion = get(templateCheck[templateName], 'version') || 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just use the default parameter in lodashes get method.
| existingVersion = get(templateCheck[templateName], 'version') || 0; | |
| existingVersion = get(templateCheck[templateName], 'version', 0); |
Otherwise why use get at all?
| existingVersion = get(templateCheck[templateName], 'version') || 0; | |
| existingVersion = templateCheck[templateName].version || 0; |
But if we expect that templateCheck[templateName] might not exist then we'd need to do
| existingVersion = get(templateCheck[templateName], 'version') || 0; | |
| existingVersion = (templateCheck[templateName] || {version: 0}).version; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think this is different. It's not for optimistic concurrency for document updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
| return { | ||
| _id: doc.id, | ||
| _index: index, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you wanted to change this
| _index: index, | |
| _index: store.index, |
Though I don't know why taskDocToRaw is a first-class function if this is just being passed to it. Why not make it a private member function to the TaskStore class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that was a bad merge resolution: 36dd212
The taskDocToRaw thing is pre-existing code, if first-class functions are a problem then it would be +1 from me to do a cleanup PR. That's not for 6.7 though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think a cleanup PR would make sense, not sure the code is very clear on the usage of taskDocToRaw and rawSource. But not essential for 6.7
njd5475
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this LGTM it solves the original purpose but will leave the mappings if they exist as is, something that will have to be addressed in a followup PR, among other things.
|
Added blocker-ff as Task Manager needs this change before it can be launched in 6.7.0 Working on fixing a failing unit test. |
💔 Build Failed |
💚 Build Succeeded |
…emplate (elastic#28540) * get template version * ensure putTemplate only creates/updates the template * fix tests * new test for throwing error re old template * note comment * clean up test * test clarification * store kibana metadata in scheduled task doc * map `dynamic: false` * logging * add kibana uuid * fix tests * last todo * fetching available task uses apiVersion in the query * scheduledAt * ts fix * ts fix * fix snapshot * await to fail test if snapshot does not match * fix bad merge * remove _.get * fix typeError happening in tests
…ndex template (#28540) (#30163) * [Task Manager] Ensure putTemplate will only create/update the index template (#28540) * get template version * ensure putTemplate only creates/updates the template * fix tests * new test for throwing error re old template * note comment * clean up test * test clarification * store kibana metadata in scheduled task doc * map `dynamic: false` * logging * add kibana uuid * fix tests * last todo * fetching available task uses apiVersion in the query * scheduledAt * ts fix * ts fix * fix snapshot * await to fail test if snapshot does not match * fix bad merge * remove _.get * fix typeError happening in tests * undo merged 7.0 changes
Just to be explicit about this, this change wasn't taken into feedback, so the mapping is still Kibana will log a server error if it tries to schedule tasks and the mapping hasn't gotten updated. Since dynamic is set to strict, taking in the new changes that add Then restart Kibana. This is a 6.7.0-6.7.0 breaking change :) |
Summary
This PR implements changes per feedback from the original Task Manager PR, that slipped through the cracks during review #24356 (comment):
Checklist
Use
strikethroughsto remove checklist items you don't feel are applicable to this PR.This was checked for cross-browser compatibility, including a check against IE11Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n supportDocumentation was added for features that require explanation or tutorialsThis was checked for keyboard-only and screenreader accessibilityFor maintainers