|
|
Excited to see this! I will review more next week, but first I wanted to test it out, and I'm having some trouble. After building metricbeat with this PR, I'm running setup against a Serverless instance on production and the process is hanging on the initial ping. I tried running regular setup and with the I also tried completely omitting the API key to see if I got an auth error and adding the port number explicitly to the ES host URL but it all hangs on this initial ping. Anything I might be missing? |
|
@joshdover So, I've run into this problem a few times. The URL that beats is trying to connect to is and I did a curl myself, no response. I'm guessing you entered |
|
Yah, agreed, we need to set aside time and decide what various edge cases are, since for some things like template name versus policy name I'm not entirely sure what the correct behavior should be. |
@fearful-symmetry since you are closest to this you can create this list :) In particular create issues for any parts of this that are unclear to you so that we can discuss them. |
This error message should be special cased to either ILM or DSL depending on which type we detect we are connected to. Remember this error is going to be seen by users who have never used Filebeat before, if we don't tell them exactly what to do they likely won't know what the solution is themselves. |
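A minimal sketch of the special-casing suggested above, written in Python purely for illustration (Beats itself is Go). It assumes that a serverless connection maps to DSL and a stateful one to ILM, as the testing notes in this PR suggest. The function name `lifecycle_hint` and the message wording are hypothetical, not taken from the Beats code.

```python
def lifecycle_hint(is_serverless: bool) -> str:
    """Return a user-facing hint naming the lifecycle settings that apply.

    Hypothetical helper: the mapping (serverless -> DSL, stateful -> ILM)
    is an assumption based on this PR's description.
    """
    if is_serverless:
        return (
            "Use the setup.dsl.* settings: the connected Elasticsearch is a "
            "serverless project. Set setup.ilm.enabled: false."
        )
    return (
        "Use the setup.ilm.* settings: the connected Elasticsearch is a "
        "stateful cluster. Set setup.dsl.enabled: false."
    )


if __name__ == "__main__":
    print(lifecycle_hint(is_serverless=True))
```

The point of the sketch is that the message names the exact setting to change, so a first-time user does not have to work out which of the two lifecycle systems applies.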
#setup.dsl.check_exists: true

# Overwrite the lifecycle policy at startup. The default is false.
#setup.dsl.overwrite: false
No newline at end of file
You need a newline at the end of the file otherwise there is a missing space between the end of this section and the next one:
# Overwrite the lifecycle policy at startup. The default is false.
#setup.dsl.overwrite: false
# =================================== Kibana ===================================
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.|
Similarly for the error: We should be able to suggest which one of ILM or DSL should be used because we know which ES type we are connected to. |
|
If I set the following DSL configuration I get an error. If I remove the policy_name setting and use the default it works:
setup.dsl.enabled: true
# Set the lifecycle policy name. The default policy name is
# 'filebeat'.
setup.dsl.policy_name: mypolicy
setup.dsl.policy_file: "dsl_policy.json"
setup.dsl.overwrite: true
❯ cat dsl_policy.json
{"data_retention": "71d"}
❯ ./filebeat setup --index-management
Exiting: error loading template: error updating lifecycle policy: error creating policy from config: error submitting policy: error creating lifecycle policy: got 404 from elasticsearch: {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [mypolicy]","resource.type":"index_or_alias","resource.id":"mypolicy","index_uuid":"_na_","index":"mypolicy"}],"type":"index_not_found_exception","reason":"no such index [mypolicy]","resource.type":"index_or_alias","resource.id":"mypolicy","index_uuid":"_na_","index":"mypolicy"},"status":404}
If I set It seems like the |
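The 404 is what one would expect if the configured policy name ends up as the data stream target of the lifecycle request. Elasticsearch's data stream lifecycle API is addressed by data stream name or pattern (`PUT _data_stream/<name>/_lifecycle`), so a name that matches no data stream yields `index_not_found_exception`. The sketch below only builds that path; the helper name is hypothetical, and whether Beats constructs the request exactly this way is an assumption.

```python
from urllib.parse import quote


def dsl_lifecycle_path(data_stream: str) -> str:
    """Build the Elasticsearch path for a data stream's lifecycle.

    Hypothetical helper for illustration. The target must be an existing
    data stream name or a pattern; '*' and ',' are left unescaped so that
    patterns such as 'filebeat-*' pass through unchanged.
    """
    return f"/_data_stream/{quote(data_stream, safe='*,')}/_lifecycle"


if __name__ == "__main__":
    # 'mypolicy' is not a data stream, so a PUT to this path returns 404.
    print(dsl_lifecycle_path("mypolicy"))
    # A pattern that matches the Beat's data streams is a valid target.
    print(dsl_lifecycle_path("filebeat-*"))
```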
|
@cmacknz Yah, the My initial idea was just to keep the DSL and ILM config the same, but you're probably right, |
It seems like it can be a pattern so |
#setup.dsl.enabled: true

# Set the lifecycle policy name or pattern. For DSL, this name must match the data stream that the lifecycle is for.
# The default data stream pattern is %{[beat.name]}-%{[beat.version]}"
Did you intend for %{[beat.name]}-%{[beat.version]} to actually be templated, or did you want this literal text in the docs?
I don't know that I've seen this format used elsewhere but I might be missing it.
Ah, you're right, that's probably not the right way to do that...
So, it turns out there's no template variable for .VersionName or anything similar, so the next best thing is metricbeat-%{[agent.version]} and such.
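To make the placeholder discussion concrete, here is a toy expansion of `%{[field]}` references against an event-like dictionary. It is a sketch in Python for illustration only and is not the format-string implementation Beats uses; the function name `expand` and the sample field values are assumptions.

```python
import re


def expand(pattern: str, fields: dict) -> str:
    """Replace each %{[dotted.key]} placeholder with a value from fields.

    Toy illustration only: raises KeyError if a referenced field is missing.
    """
    def lookup(match: re.Match) -> str:
        value = fields
        # Walk the nested dictionary one dotted segment at a time.
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return re.sub(r"%\{\[([^\]]+)\]\}", lookup, pattern)


if __name__ == "__main__":
    event = {"agent": {"version": "8.12.0"}}
    print(expand("metricbeat-%{[agent.version]}", event))
```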
|
Kinda baffled by the tar error in the packaging step, let's try that again... |
|
/test |
|
Alright, think we're at the point where we can force-merge? |
|
Yes, one more test and I'll merge it. I still think some of the error messages can be improved but that doesn't need to block this PR, will file a follow up for that. |
cmacknz
left a comment
Re-tested the latest DSL changes with the data_stream_pattern config, still works. Thanks!
Merging this one.
|
@elastic/fleet-qasource-external please test that ILM continues to work. You will need to add manual test cases for the new data stream lifecycle (DSL) configuration when using a serverless project. The instructions are in the PR description. It would also be a good idea to do some exploratory testing around ILM and DSL. |
|
@elastic/fleet-qasource-external we will want to test this for each Beat individually to confirm that:
|
|
@elastic/fleet-qasource-external The reference documentation for configuring ILM can be found in each Beat's reference configuration file:
ILM
beats/filebeat/filebeat.reference.yml Lines 2413 to 2435 in e322104
A sample ILM configuration is below which can be saved to a file for testing:
{
"policy": {
"phases": {
"hot": {
"min_age": "0ms",
"actions": {
"rollover": {
"max_age": "10d",
"max_primary_shard_size": "50gb"
}
}
}
}
}
}
A sample ILM configuration for use with
# ====================== Index Lifecycle Management (ILM) ======================
setup.ilm.enabled: false
setup.ilm.policy_name: "mypolicy"
setup.ilm.policy_file: "ilm_template.json"
#setup.ilm.check_exists: true
setup.ilm.overwrite: true
To set up ILM you will need to first generate an API key, which can be done from Stack Management. Note that the API key has to have the Beats format.
The loaded lifecycle policies can be viewed in Stack Management. The default ILM policy for filebeat will be named
DSL
beats/filebeat/filebeat.reference.yml Lines 2437 to 2462 in e322104
To set up DSL you will need to generate an API key, which can be done from the Serverless project security page:
A sample DSL configuration is below which can be saved to a file for testing:
{"data_retention": "5d"}
If you run
# ======================== Data Stream Lifecycle (DSL) =========================
setup.dsl.enabled: true
setup.dsl.data_stream_pattern: "filebeat-*"
setup.dsl.policy_file: "dsl_policy.json"
# setup.dsl.check_exists: true
setup.dsl.overwrite: true
|
|
Hi @cmacknz We have completed the testing by installing all Beats on a stateful 8.12.0 SNAPSHOT Kibana cloud environment and an 8.12.0 SNAPSHOT serverless environment.
Beats Installed:
Build details:
Please let us know if we are missing any scenario that needs to be covered here. Thanks!! |
|
Thanks for those tests, looks great! |
* make serverless integration tests run
* update deps
* linter, error handling
* still fixing error handling
* fixing old formatting verbs
* still finding format verbs
* add docs, fix typos
* initial functional pass
* fix setup, config
* fix naming of config section
* add headers
* make linter happy
* still making linter happy
* tinkering with tests
* still fixing tests
* revert file
* tinker with export
* fix logging in tests
* fix load checking in setup
* fix url in integration test
* fix commented out test line
* stil tinkering with integraton test
* fix bad init in tests, add more check to ES handler
* add init checks for client handler, add more unit tests
* make template loader serverless aware
* change naming, error handling, rework config system
* fix up integration tests
* clean up load tests
* stil making linter happy
* simplify manager init, fix tests, update docs
* minor test fixes
* clean up tests
* clean up typos, remove legacy error handling
* expand logging
* logging, error handling changes
* change error messages
* update lifetimes for serverless elasticsearch
* fix integration tests
* change error handling, clean up log messages
* tinker with DSL config name
* update docs
* fix name example







This is a rather large PR that updates beats to provide DSL support and setup on serverless. Some of the parts are:
setup.dsl.* section
As an added note: the index management code is pretty complex and labyrinthine, and while I tried to simplify some things, it's still (needlessly) complex. I'm fairly certain that it would be possible to significantly simplify a lot of this code, but due to the tight deadlines of serverless support, it probably won't happen as part of this PR.
Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
How to test this PR locally
When I started working on this, my hope was that I could just glue on DSL support without touching the existing ILM code, but that wasn't the case. This means we'll need to test new DSL support, and existing ILM support.
Here's a high-level overview of suggested tests, which will need to be performed against both serverless and stateful instances:
* [beat] setup --index-management against a fresh ES instance, with all default settings
* [beat] setup --index-management against a fresh ES instance, with a custom policy file
* [beat] setup --index-management against a fresh ES instance, with both ILM and DSL enabled in the config, ensure beats fails
* [beat] setup --index-management against an existing ES instance with existing datastream config, all default settings, ensure we don't overwrite
* [beat] setup --index-management against an existing ES instance with existing datastream config, with overwrite enabled, ensure we have overwritten data.
* [beat] setup --index-management with no setup.* flags set in the config, ensure defaults are properly set up.
* overwrite enabled and a custom policy on the second setup. Ensure the policy is successfully updated.
Tests to run over serverless only:
* Set setup.template.name and setup.dsl.policy_name to different values, run setup --index-management. A warning message should be printed.
* Set setup.template.name and setup.dsl.policy_name to the same value, run setup --index-management, then run again with a custom policy, and setup.dsl.overwrite set to true
To create a custom policy for testing:
For DSL: create a file with {"data_retention": "71d"}, point to it with the setup.dsl.policy_file value
For ILM: ILM policies are a little more complex, see here for an example, point to it with the setup.ilm.policy_file value
Validating setup
To validate the setup --index-management, there are two ES endpoints to check: _data_stream points to the data stream, and contains a lifecycle section that should contain a valid lifecycle policy (see the above section for an example). The other endpoint is _index_template, which should contain a matching index template that links to the data stream.
Known issues
config library in elastic-agent-libs (issue forthcoming), if a user has both setup.ilm.* and setup.dsl.* config values, one will need to be explicitly set to disabled, even if the other is explicitly set to enabled
Config principles
Beats now has to care about both ILM and DSL config, as well as the upstream elasticsearch. Beats should behave based on these principles:
Undefined behavior
Right now, there are some edge cases where I'm not sure about the correct behavior.
overwrite enabled will fail. This is because the initial index management setup injects the lifecycle policy directly into the template policy. However, to update the DSL policy, we must make a separate REST call to an endpoint that includes the template name. If the user has set a custom template name but not a policy name, the code will revert to a default (and incorrect) template name. Should the user be expected to set template.name and policy_name? Should the code silently defer to template.name? Should the initial setup fail and tell the user to correct their config if one is set but not the other?
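For the checks described under "Validating setup", a small script can compare the two responses instead of reading them by eye. The sketch below takes the already-parsed JSON bodies of `GET _data_stream/<name>` and `GET _index_template/<name>` and reports anything that looks wrong. The field layout (`data_streams[].lifecycle.data_retention`, `data_streams[].template`, `index_templates[].name`) is my reading of the Elasticsearch response format and should be treated as an assumption; the function name is hypothetical.

```python
def find_setup_problems(data_streams_resp: dict,
                        index_templates_resp: dict,
                        expected_retention: str) -> list:
    """Return a list of human-readable problems; an empty list means OK.

    Assumes the response shapes described in the lead-in; adjust the keys
    if the Elasticsearch version under test returns something different.
    """
    template_names = {
        t.get("name") for t in index_templates_resp.get("index_templates", [])
    }
    problems = []
    streams = data_streams_resp.get("data_streams", [])
    if not streams:
        problems.append("no data streams returned")
    for ds in streams:
        name = ds.get("name", "<unnamed>")
        lifecycle = ds.get("lifecycle")
        if not lifecycle:
            problems.append(f"{name}: no lifecycle section")
        elif lifecycle.get("data_retention") != expected_retention:
            actual = lifecycle.get("data_retention")
            problems.append(
                f"{name}: data_retention is {actual!r}, expected {expected_retention!r}"
            )
        # The data stream should link to an index template that exists.
        if ds.get("template") not in template_names:
            problems.append(f"{name}: index template {ds.get('template')!r} not found")
    return problems
```

With a custom policy file containing `{"data_retention": "71d"}`, calling the function with `expected_retention="71d"` should return an empty list after a successful `setup --index-management` run.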