[Ingest-Manager] Implementation for the Deploy scenario of the stand-alone mode#140
Conversation
    return err
}

esQuery := map[string]interface{}{

cfg := es.Config{
    Addresses: []string{fmt.Sprintf("http://%s:%d", host, port)},
    Username:  "elastic",
I ran other tests locally using the ES client, and they pass!
// queryMaxAttempts is the number of attempts to query elasticsearch before aborting.
// It can be overridden by the OP_QUERY_MAX_ATTEMPTS env var.
var queryMaxAttempts = 5

// queryRetryTimeout is the number of seconds between elasticsearch retry queries.
// It can be overridden by the OP_RETRY_TIMEOUT env var.
var queryRetryTimeout = 3
As a follow-up improvement, I'm considering converting this basic retry into a backoff strategy, so it's in the TODO list.
💔 Tests Failed
Force-pushed d13da16 to c00ef94
This new method would replace the existing retry+search calls in follow-up commits.
We want to start querying right after that moment, for the same hostname.
parameters {
    choice(name: 'runTestsSuite', choices: ['all', 'helm', 'ingest-manager', 'metricbeat'], description: 'Choose which test suite to run (default: all)')
    choice(name: 'LOG_LEVEL', choices: ['INFO', 'DEBUG'], description: 'Log level to be used')
    choice(name: 'QUERY_MAX_ATTEMPTS', choices: ['5', '10', '20'], description: 'Number of attempts to create the connection to Elasticsearch')
Removed this argument, as the code no longer uses a fixed retry option.
}).Error("The Kibana instance could not get the healthy status")
}

imts.StandAlone.RuntimeDependenciesStartDate = time.Now()
Marking when the runtime deps are started. We want a lower time bound for queries here.
maxTimeout := time.Duration(30) * time.Second
minimumHitsCount := 1
If even a single hit is found within 30 seconds, we should fail the assertion.
EricDavisX left a comment:
Go for it - this is great. Thanks Manu
// WaitForNumberOfHits waits for an elasticsearch query to return more than a number of hits,
// returning an error if the query does not reach that number within the given timeout.
func WaitForNumberOfHits(indexName string, query map[string]interface{}, desiredHits int, maxTimeout time.Duration) (SearchResult, error) {
This is fabulous. I wonder if (in a coming PR) the next step will be to parametrize this further and make it even more reusable than it is. I see we have the index, the number of hits, the timeout, and the query.
It may not be easy or quite appropriate, but I expect that the fewer copies of the big 'query' we keep in code, the better. We can chat about it, and it's certainly mergeable as is; I'm just thinking long term (the first thing someone will do is another query check for something different in ES, I'm betting!)
maxTimeout := time.Duration(queryRetryTimeout) * time.Minute
minimumHitsCount := 100

result, err := searchAgentData(sats.Hostname, sats.RuntimeDependenciesStartDate, minimumHitsCount, maxTimeout)
We use the moment the runtime deps (kibana + ES) started to check for documents
maxTimeout := time.Duration(30) * time.Second
minimumHitsCount := 1

result, err := searchAgentData(sats.Hostname, sats.AgentStoppedDate, minimumHitsCount, maxTimeout)
We use the moment the agent stopped to check for documents
I should have caught this earlier, but indeed - we should add some minimal padding (1 or 2 seconds?) on the chance that a doc was sent right as the agent is stopping and lands in ES microseconds after the agent stop time is recorded. @mdelapenya what do you think?
Flakiness in sight!!
TBH I'm not sure; we could add a few seconds to that time.
Let's merge as is, so we can observe how it behaves
What does this PR do?
It adds the code for checking that there are documents in the index. To achieve it, we use the query provided here: #126 (comment)
To perform this query, we need the hostname, which in a Docker container matches the container ID. To obtain it, we execute the hostname command in the agent container.
To query Elasticsearch when security is enabled, we had to refactor the existing helper methods for querying a bit, adding authentication to the creation of the ES client. Besides that, we refactored the helper method to check for hits in an ES response.
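The authentication refactor boils down to the client sending basic-auth credentials with every request. A stdlib-only sketch of what that looks like on the wire (the PR itself configures the official Go ES client instead); the `buildSearchRequest` helper and the "changeme" password are illustrative placeholders, not values from the PR.

```go
package main

import (
	"fmt"
	"net/http"
)

// buildSearchRequest builds a basic-auth search request against an
// Elasticsearch index, mirroring what the ES client does when Username and
// Password are set in its config.
func buildSearchRequest(host string, port int, index, user, pass string) (*http.Request, error) {
	url := fmt.Sprintf("http://%s:%d/%s/_search", host, port, index)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// Security enabled: attach the credentials as an Authorization header.
	req.SetBasicAuth(user, pass)
	return req, nil
}

func main() {
	req, _ := buildSearchRequest("localhost", 9200, "logs-agent-default", "elastic", "changeme")
	fmt.Println(req.URL.String())
}
```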
Why is it important?
It makes the three BDD scenarios for the stand-alone mode of the Elastic Agent pass (go green).
Follow-up considerations
I noticed in Kibana that the hits always belong to the "logs-agent-default" index; that's why I'm hardcoding it.