
Conversation

@mark-vieira
Contributor

@mark-vieira mark-vieira commented Jan 18, 2023

This PR converts all existing full cluster restart tests (except the ingest-geoip module) to use the new JUnit rule-based test clusters framework. The geoip module relies on an existing docker-compose test fixture, and we want to move away from those to using TestContainers. That'll get done in a follow-up.

The main takeaway here is that the logic in the build scripts is drastically reduced. This boilerplate is no longer needed since there's now only a single test task per BWC version and the upgrade is done inside the test execution itself.
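As a rough illustration of the new shape of these tests: the cluster is owned by a JUnit rule and the upgrade happens mid-test. This is only a sketch; the builder calls (`version`, `nodes`) and the `upgradeToVersion` handle method below are placeholders I'm assuming for illustration, not necessarily the framework's actual API.

```java
import org.elasticsearch.test.cluster.ElasticsearchCluster;
import org.elasticsearch.test.rest.ESRestTestCase;
import org.junit.ClassRule;

public class FullClusterRestartIT extends ESRestTestCase {

    // The cluster lifecycle is owned by a JUnit rule rather than by Gradle
    // task wiring. The builder calls below are placeholders for whatever the
    // real spec builder exposes.
    @ClassRule
    public static ElasticsearchCluster cluster = ElasticsearchCluster.local()
        .version(System.getProperty("tests.old_cluster_version")) // placeholder
        .nodes(2)                                                 // placeholder
        .build();

    public void testDataSurvivesUpgrade() throws Exception {
        // ... index documents against the old cluster via client() ...

        // Hypothetical handle method: the upgrade happens inside the test
        // itself, so only a single test task per BWC version is needed.
        cluster.upgradeToVersion(System.getProperty("tests.upgrade_to_version"));

        // ... assert the documents are still there after the restart ...
    }
}
```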

@mark-vieira mark-vieira added >test Issues or PRs that are addressing/adding tests :Delivery/Build Build or test infrastructure test-full-bwc Trigger full BWC version matrix tests auto-backport-and-merge v8.6.1 v7.17.9 labels Jan 18, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-delivery (Team:Delivery)

@elasticsearchmachine elasticsearchmachine added Team:Delivery Meta label for Delivery team v8.7.0 labels Jan 18, 2023
@mark-vieira
Contributor Author

I'm currently blocked on an issue where the tests assert that the request will result in a warning about trying to access a system index, but no such warning is returned.

My knowledge of system indices is basically nil, but as I understand it there was no such thing prior to 7.10, so when upgrading from those versions we also upgrade those old indices to "system" indices. As far as I can tell this is done asynchronously after startup, so my hunch is that this hasn't completed before we go to run our next test, causing it to fail. I believe the reason this is only happening now is that we're simply running our tests much sooner after an upgrade, because there is no longer the overhead of starting a new test task, forking a JVM, etc. after the upgrade completes. I've "confirmed" this by adding a sleep before the failing test, which causes the failures to stop.

So what's the best way forward here? Should we add a small delay before these tests? Is there a way to ask ES whether the system indices upgrade has completed? Should I just retry the assertion some number of times?

cc @williamrandolph @rjernst

@mark-vieira
Contributor Author

@droberts195 I'm hitting this error occasionally.

I'm thinking this is similar to the above: on "upgrade" there are some processes that happen asynchronously, and one side effect of the refactoring of how the upgrade tests work is that they now run much sooner after the upgrade, as there's less overhead between cluster orchestration and test execution. Does this sound right? If so, would wrapping those hidden index assertions in an assertBusy() be a reasonable solution?

# Conflicts:
#	test/test-clusters/src/main/java/org/elasticsearch/test/cluster/local/AbstractLocalSpecBuilder.java
#	test/test-clusters/src/main/java/org/elasticsearch/test/cluster/local/DefaultLocalClusterSpecBuilder.java
#	test/test-clusters/src/main/java/org/elasticsearch/test/cluster/local/LocalClusterFactory.java
#	test/test-clusters/src/main/java/org/elasticsearch/test/cluster/local/LocalClusterSpec.java
#	test/test-clusters/src/main/java/org/elasticsearch/test/cluster/local/LocalSpecBuilder.java
droberts195 pushed a commit to droberts195/elasticsearch that referenced this pull request Jan 23, 2023
When a cluster is upgraded from a version that didn't have hidden
indices/aliases to one that does, we make the relevant ML indices
and aliases hidden. However, the steps to do this are split over
5 actions that run sequentially, so the work might take a second or
two after the upgraded cluster starts up. This can cause the test
that asserts on the final state to fail if it runs very quickly
after the upgraded cluster startup. The solution is to retry the
test for a few seconds to give a chance for all the upgrade actions
to complete.

Relates elastic#93062
@droberts195

I'm thinking this is similar to the above: on "upgrade" there are some processes that happen asynchronously, and one side effect of the refactoring of how the upgrade tests work is that they now run much sooner after the upgrade, as there's less overhead between cluster orchestration and test execution. Does this sound right?

Yes, definitely. There are 5 steps that happen in sequence, and it's failing on the 3rd one in that build scan you linked:

// Step 3: Once indices are hidden, fetch ML internal aliases to find out whether the aliases are hidden or not.

If so, would wrapping those hidden index assertions in an assertBusy() be a reasonable solution?

Yes, I have opened #93148 to do this.
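For reference, a minimal sketch of what that looks like, assuming the standard `assertBusy` helper from `ESTestCase` (the ML index pattern, timeout, and assertion details below are illustrative, not taken from #93148):

```java
import java.util.Map;
import java.util.concurrent.TimeUnit;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;

// Inside an ESRestTestCase subclass: retry the hidden-index assertion for up
// to 30 seconds instead of asserting once immediately after the upgrade.
public void testMlIndicesBecomeHidden() throws Exception {
    assertBusy(() -> {
        Request request = new Request("GET", "/.ml-state*/_settings");
        request.addParameter("expand_wildcards", "all");
        Response response = client().performRequest(request);
        Map<String, Object> indices = entityAsMap(response);
        assertFalse("expected at least one ML state index", indices.isEmpty());
        for (Map.Entry<String, Object> entry : indices.entrySet()) {
            @SuppressWarnings("unchecked")
            Map<String, Object> settings = (Map<String, Object>) ((Map<String, Object>) entry.getValue()).get("settings");
            @SuppressWarnings("unchecked")
            Map<String, Object> index = (Map<String, Object>) settings.get("index");
            assertEquals("index " + entry.getKey() + " should be hidden", "true", index.get("hidden"));
        }
    }, 30, TimeUnit.SECONDS);
}
```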

mark-vieira and others added 16 commits January 31, 2023 11:07
The issue with this test failure is actually that we were silently
failing to install the plugin under test into the cluster. The root
cause was that the FIPS security policy file was not copied into the
cluster config directory before attempting to run the plugin installer.
Since we pass the FIPS JVM arguments to all CLI tools as well, this
caused plugin installation to fail. We now ensure that these files are
copied before we attempt to run _any_ ES tools.

Closes elastic#93303
It's pretty common to run a block of code in a `try ... catch` block
that just passes exceptions off to a listener's `onFailure` method. This
commit adds a small utility to encapsulate this, enabling some
one-liners.
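A minimal sketch of the kind of utility being described, with a hypothetical name and signature rather than the one the commit actually adds:

```java
import org.elasticsearch.action.ActionListener;

// Hypothetical utility (names and signatures are illustrative): run a block
// that may throw, and route any exception to the listener's onFailure method.
public final class ListenerUtils {

    @FunctionalInterface
    public interface CheckedRunnable<E extends Exception> {
        void run() throws E;
    }

    public static <T> void run(ActionListener<T> listener, CheckedRunnable<Exception> action) {
        try {
            action.run();
        } catch (Exception e) {
            listener.onFailure(e);
        }
    }
}
```

At call sites this turns the usual `try { ... } catch (Exception e) { listener.onFailure(e); }` dance into a single line.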
For skipping replication:
* ReplicationTracker and Group filter shards that are promotable to primary
* Remove unpromotable shards from in sync allocations in metadata
* There is a new Refresh action for unpromotable replica shards

Fixes ES-4861

For skipping peer recovery:
* Unpromotable shards pass directly to STARTED skipping some intermediate peer recovery stages and messages

Fixes ES-5257
* Add a section about token-based authentication

It took me considerable time to figure out the syntax for token-based authentication, so I figured: why not add it to the documentation.

* Update x-pack/docs/en/watcher/input/http.asciidoc

* Update x-pack/docs/en/watcher/input/http.asciidoc

---------

Co-authored-by: Abdon Pijpelink <[email protected]>
This PR adds support for building roles for remote_access
authentication instances, under the new remote cluster security model.

This change is stand-alone and not wired up to active code flows yet. A
proof of concept in elastic#92089 highlights how the model change in this PR
fits into the broader context of the fulfilling cluster processing
cross-cluster requests.
Change ilm and data streams dependencies to be test dependencies.
…lastic#93383)

In elastic#93160, we never set the forced_refresh flag in the response. With
this change, the bulk response now correctly reflects what happened. It
also unblocks a bunch of YAML tests for Stateless.

Relates ES-5292
Cache potentially duplicated values in the `DateProcessor`, avoiding the creation of disposable objects across executions.
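Purely as an illustration of the pattern (not the actual `DateProcessor` change): a small cache keyed by the string values that are effectively constant across documents, so repeated executions reuse parsed objects instead of allocating new ones.

```java
import java.time.ZoneId;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: cache ZoneId/Locale instances resolved from the
// (often constant) string values a processor sees on every document, instead
// of re-parsing them on each execution.
final class ValueCache {
    private static final int MAX_ENTRIES = 16;
    private final Map<String, ZoneId> zoneIds = new ConcurrentHashMap<>();
    private final Map<String, Locale> locales = new ConcurrentHashMap<>();

    ZoneId zoneId(String value) {
        if (zoneIds.size() > MAX_ENTRIES) {
            zoneIds.clear(); // crude bound; fine when values are effectively constant
        }
        return zoneIds.computeIfAbsent(value, ZoneId::of);
    }

    Locale locale(String value) {
        if (locales.size() > MAX_ENTRIES) {
            locales.clear();
        }
        return locales.computeIfAbsent(value, Locale::forLanguageTag);
    }
}
```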
This change introduces the configuration option `ignore_missing_component_templates`, as discussed in elastic#92426. [Option 6](elastic#92426 (comment)) of the proposed implementations was picked, with a slight adjustment: no patterns are allowed.

## Implementation

During the creation of an index template, the list of component templates is checked to verify that all component templates exist. This check is extended to skip any component templates that are listed under `ignore_missing_component_templates`. An index template that skips the check for the component template `logs-foo@custom` looks as follows:


```
PUT _index_template/logs-foo
{
  "index_patterns": ["logs-foo-*"],
  "data_stream": { },
  "composed_of": ["logs-foo@package", "logs-foo@custom"],
  "ignore_missing_component_templates": ["logs-foo@custom"],
  "priority": 500
}
```

The component template `logs-foo@package` has to exist before the index template is created. It can be created with:

```
PUT _component_template/logs-foo@package
{
  "template": {
    "mappings": {
      "properties": {
        "host.name": {
          "type": "keyword"
        }
      }
    }
  }
}
```

## Testing

For manual testing, different scenarios can be tested. To simplify testing, the commands are provided in `.http` file format. Before each test run, a clean cluster is expected.

### New behaviour, missing component template

With the new config option, it must be possible to create an index template with a missing component template without getting an error:

```
### Add logs-foo@package component template

PUT http://localhost:9200/
    _component_template/logs-foo@package
Authorization: Basic elastic password
Content-Type: application/json

{
  "template": {
    "mappings": {
      "properties": {
        "host.name": {
          "type": "keyword"
        }
      }
    }
  }
}

### Add logs-foo index template

PUT http://localhost:9200/
    _index_template/logs-foo
Authorization: Basic elastic password
Content-Type: application/json

{
  "index_patterns": ["logs-foo-*"],
  "data_stream": { },
  "composed_of": ["logs-foo@package", "logs-foo@custom"],
  "ignore_missing_component_templates": ["logs-foo@custom"],
  "priority": 500
}

### Create data stream

PUT http://localhost:9200/
    _data_stream/logs-foo-bar
Authorization: Basic elastic password
Content-Type: application/json

### Check if mappings exist

GET http://localhost:9200/
    logs-foo-bar
Authorization: Basic elastic password
Content-Type: application/json
```

Check that all templates were created and that the data stream mappings are correct.

### Old behaviour, with all component templates

In the following, a component template is made optional but already exists. Check that it shows up in the mappings:

```
### Add logs-foo@package component template

PUT http://localhost:9200/
    _component_template/logs-foo@package
Authorization: Basic elastic password
Content-Type: application/json

{
  "template": {
    "mappings": {
      "properties": {
        "host.name": {
          "type": "keyword"
        }
      }
    }
  }
}

### Add logs-foo@custom component template

PUT http://localhost:9200/
    _component_template/logs-foo@custom
Authorization: Basic elastic password
Content-Type: application/json

{
  "template": {
    "mappings": {
      "properties": {
        "host.ip": {
          "type": "ip"
        }
      }
    }
  }
}

### Add logs-foo index template

PUT http://localhost:9200/
    _index_template/logs-foo
Authorization: Basic elastic password
Content-Type: application/json

{
  "index_patterns": ["logs-foo-*"],
  "data_stream": { },
  "composed_of": ["logs-foo@package", "logs-foo@custom"],
  "ignore_missing_component_templates": ["logs-foo@custom"],
  "priority": 500
}

### Create data stream

PUT http://localhost:9200/
    _data_stream/logs-foo-bar
Authorization: Basic elastic password
Content-Type: application/json

### Check if mappings exist

GET http://localhost:9200/
    logs-foo-bar
Authorization: Basic elastic password
Content-Type: application/json
```

### Check old behaviour

Ensure that the old behaviour still applies when a component template is used that is not listed under `ignore_missing_component_templates`:

```
### Add logs-foo index template

PUT http://localhost:9200/
    _index_template/logs-foo
Authorization: Basic elastic password
Content-Type: application/json

{
  "index_patterns": ["logs-foo-*"],
  "data_stream": { },
  "composed_of": ["logs-foo@package", "logs-foo@custom"],
  "ignore_missing_component_templates": ["logs-foo@custom"],
  "priority": 500
}
```

Co-authored-by: Lee Hinman <[email protected]>
# Conflicts:
#	test/test-clusters/src/main/java/org/elasticsearch/test/cluster/local/LocalClusterFactory.java
@mark-vieira mark-vieira merged commit 013b2e5 into elastic:main Jan 31, 2023
@mark-vieira mark-vieira deleted the test_containers_bwc_testing branch January 31, 2023 20:26
@elasticsearchmachine
Collaborator

💔 Backport failed

| Status | Branch | Result |
|--------|--------|--------|
| ❌ | 8.6 | Commit could not be cherrypicked due to conflicts |
| ❌ | 7.17 | Commit could not be cherrypicked due to conflicts |

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 93062`


Labels

:Delivery/Build Build or test infrastructure Team:Delivery Meta label for Delivery team >test Issues or PRs that are addressing/adding tests test-full-bwc Trigger full BWC version matrix tests v7.17.10 v8.6.2 v8.7.0
