Single-node clusters use non-trivial discovery config when they shouldn't #1644

danielmitterdorfer · 2023-01-04T09:16:30Z

Rally version (get with esrally --version): esrally 2.7.1.dev0 (git revision: 4e1335ee07e16a4af0de1a0b543a86595ca37803) (current master branch)

Description of the problem including expected versus actual behavior:

When running a benchmark with Rally against a single node cluster, the Elasticsearch logs contain repeated warning messages like the following:

[2023-01-04T09:39:10,994][WARN ][o.e.c.c.Coordinator      ] [rally-node-0] This node is a fully-formed single-node cluster with cluster UUID [o1zdVpJiSMOVfeSz9nye_A], but it is configured as if to discover other nodes and form a multi-node cluster via the [discovery.seed_hosts=[127.0.0.1]] setting. Fully-formed clusters do not attempt to discover other nodes, and nodes with different cluster UUIDs cannot belong to the same cluster. The cluster UUID persists across restarts and can only be changed by deleting the contents of the node's data path(s). Remove the discovery configuration to suppress this message.

This indicates that the discovery configuration is not correct, see elastic/elasticsearch#85222 for full details.

Steps to reproduce:

esrally race --track=geonames --challenge=append-no-conflicts-index-only --distribution-version=8.5.3
Inspect the corresponding Elasticsearch log

This can be fixed by not rendering the following line in rally-teams when there is only one node:

https://github.com/elastic/rally-teams/blob/74f96e3fab247e0c31d253408e4affd696a50eff/cars/v1/vanilla/templates/config/elasticsearch.yml#L72

However, because the provisioner passes the node IPs to the template as a pre-rendered string, we cannot use Jinja's length filter to determine the number of items (it would return the string length instead):

In rally/esrally/mechanic/provisioner.py, line 320 at commit 4c7141a:

"all_node_ips": '["%s"]' % '","'.join(self.all_node_ips),
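To see why the `length` filter does not help: Jinja's built-in `length` filter is Python's `len()`, so on the pre-rendered string it counts characters, not nodes. Below is a minimal plain-Python sketch of the quoted provisioner expression, assuming a single node on 127.0.0.1 as an example value:

```python
# Same rendering as the provisioner expression quoted above.
all_node_ips = ["127.0.0.1"]  # example value: a single-node cluster
rendered = '["%s"]' % '","'.join(all_node_ips)

print(rendered)           # ["127.0.0.1"]
print(len(rendered))      # 13: characters in the rendered string
print(len(all_node_ips))  # 1: the actual number of nodes
```

A check like `all_node_ips | length > 1` in the template would therefore be true even for one node.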

We could either provide the node IPs as a list or add a variable to the provisioner that contains the node count, which the template in rally-teams could then use. The latter is even backwards-compatible if the template checks that the variable exists before evaluating it.
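A minimal sketch of the second option, assuming a hypothetical template variable named `all_node_count` (the name is illustrative, not taken from Rally). The provisioner keeps passing `all_node_ips` unchanged and adds the count; the Jinja guard is shown as a string, next to a plain-Python function that mirrors its condition, so a provisioner that does not set the variable still renders the seed hosts line. Note that a later comment in this thread found the node count alone to be the wrong criterion.

```python
# Hypothetical Jinja guard for the discovery.seed_hosts line in elasticsearch.yml.
# "is not defined" keeps older provisioners (without the variable) working.
SEED_HOSTS_TEMPLATE = (
    "{% if all_node_count is not defined or all_node_count > 1 %}"
    "discovery.seed_hosts: {{all_node_ips}}"
    "{% endif %}"
)


def template_vars(all_node_ips):
    """Variables the provisioner could pass to the template (sketch)."""
    return {
        # Existing pre-rendered string, unchanged for backwards compatibility.
        "all_node_ips": '["%s"]' % '","'.join(all_node_ips),
        # Hypothetical new variable: the number of nodes as an integer.
        "all_node_count": len(all_node_ips),
    }


def renders_seed_hosts(variables):
    """Plain-Python mirror of the condition in SEED_HOSTS_TEMPLATE."""
    return "all_node_count" not in variables or variables["all_node_count"] > 1


print(renders_seed_hosts(template_vars(["127.0.0.1"])))              # False
print(renders_seed_hosts(template_vars(["10.0.0.1", "10.0.0.2"])))  # True
print(renders_seed_hosts({"all_node_ips": '["127.0.0.1"]'}))         # True (old provisioner)
```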


inqueue · 2023-01-04T14:35:16Z

Thanks, @danielmitterdorfer. I will submit a PR proposal soon.

pquentin · 2023-01-18T12:36:14Z

> This can be fixed by not rendering the following line in rally-teams when there is only one node:

We discovered the hard way that it's not about the number of nodes: elastic/rally-teams#77 broke our nightly benchmarks with this error:

[2023-01-18T11:56:56,877][INFO ][o.e.t.TransportService ] [rally-node-0] publish_address {192.168.20.29:9300}, bound_addresses {192.168.20.29:9300}
[2023-01-18T11:56:56,985][INFO ][o.e.b.BootstrapChecks ] [rally-node-0] bound or publishing to a non-loopback address, enforcing bootstrap checks
    bootstrap check failure [1] of [1]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured

It took me a long time to realize that this was an error, since it is logged at INFO level. I reverted the change in elastic/rally-teams#78 until we find a better solution.

Indeed, Elasticsearch distinguishes development from production mode by the address it is bound to, not by the number of nodes in the cluster: https://www.elastic.co/guide/en/elasticsearch/reference/8.6/bootstrap-checks.html#dev-vs-prod-mode. In the nightly benchmarks, that address is not localhost. So I suppose we could look at the IPs and check whether they are loopback addresses, but we would have to get that check right (including 127.0.0.1, ::1, localhost and probably others? I'm not sure what Elasticsearch does).
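A rough sketch of the loopback check suggested above, using Python's standard `ipaddress` module. It treats 127.0.0.0/8 and ::1 as loopback and special-cases the hostname `localhost`; it does not reproduce Elasticsearch's own logic (which, as noted, is unclear), and both function names are hypothetical. Anything it cannot classify counts as non-loopback, so the seed hosts line is kept: in the worst case that produces the warning from the original report rather than a failed bootstrap check.

```python
import ipaddress


def is_loopback(host):
    """Best-effort loopback test for a host string (sketch, not Elasticsearch's logic)."""
    if host.lower() == "localhost":
        return True
    try:
        # Strip brackets that IPv6 literals sometimes carry, e.g. "[::1]".
        return ipaddress.ip_address(host.strip("[]")).is_loopback
    except ValueError:
        # Other hostnames are not resolved here; assume non-loopback to stay safe.
        return False


def needs_seed_hosts(all_node_ips):
    """Hypothetical rule: omit discovery.seed_hosts only for a single loopback node."""
    return len(all_node_ips) > 1 or not all(is_loopback(ip) for ip in all_node_ips)


print(needs_seed_hosts(["127.0.0.1"]))      # False: local single-node cluster
print(needs_seed_hosts(["192.168.20.29"]))  # True: non-loopback, bootstrap checks apply
```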

Or we could live with this warning.

danielmitterdorfer added the labels bug, :Benchmark Candidate Management and good first issue on Jan 4, 2023

inqueue self-assigned this Jan 4, 2023

This was referenced Jan 4, 2023:

- Only set discovery seed hosts if greater than 1 (elastic/rally-teams#77, merged)
- Add a variable in the provisioner for the seed node count (#1647, merged)

pquentin mentioned this issue Mar 2, 2023:

- Revert "Add a variable in the provisioner for the seed node count" (#1680, merged)
