
ARS implementation inverts the priority of servicetime #65838

@robertclancy

Description


Elasticsearch version (bin/elasticsearch --version): commit c3ab6f7

Plugins installed: []

JVM version (java -version):

openjdk version "12.0.2" 2019-07-16
OpenJDK Runtime Environment AdoptOpenJDK (build 12.0.2+10)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 12.0.2+10, mixed mode)

OS version (uname -a if on a Unix-like system):

Darwin Roberts-MacBook-Pro.local 19.6.0 Darwin Kernel Version 19.6.0: Thu Oct 29 22:56:45 PDT 2020; root:xnu-6153.141.2.2~1/RELEASE_X86_64 x86_64

Description of the problem including expected versus actual behavior:

The implementation of the C3 algorithm uses 1/serviceTime in place of serviceTime.

i.e. the line that currently computes muBarS

should be

double muBarS = FACTOR / serviceTime;

The confusion stems from the fact that the paper in question uses the notation mu_s to denote the service rate, which is the inverse of the service time.

This results in incorrect choices of the fastest node.
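To make the inversion concrete, here is a minimal, self-contained sketch of the C3 rank formula from the paper, rank = responseTime - 1/muBar + qHat^b / muBar, with muBar as the service rate and b = 3 (the default queue adjustment factor). The class and parameter values below are illustrative, not the actual Elasticsearch implementation:

```java
// Minimal sketch of the C3 rank formula (illustrative, not the actual
// Elasticsearch code). muBar must be the service *rate* (1 / serviceTime);
// if it were set proportional to serviceTime instead, slower nodes would
// receive lower (better) ranks, which is the inversion this issue describes.
public class C3RankSketch {
    static double rank(double responseTime, double serviceTime,
                       long outstandingRequests, double queueSize) {
        double qHat = 1 + outstandingRequests + queueSize; // estimated queue size
        double muBar = 1.0 / serviceTime;                  // service rate (the fix)
        return responseTime - 1.0 / muBar + Math.pow(qHat, 3) / muBar;
    }

    public static void main(String[] args) {
        // Same response time and queue pressure; only the service time differs.
        double fastNode = rank(200, 50, 1, 1);  // like node_1: low service time
        double slowNode = rank(200, 200, 1, 1); // like node_2: high service time
        // With muBar as a rate, the faster node gets the lower (better) rank.
        System.out.println(fastNode < slowNode); // prints "true"
    }
}
```

Replacing the muBar line with a value proportional to serviceTime flips the comparison, matching the failing test below.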

Steps to reproduce:

For example, apply this change to the tests:

diff --git a/server/src/test/java/org/elasticsearch/cluster/routing/OperationRoutingTests.java b/server/src/test/java/org/elasticsearch/cluster/routing/OperationRoutingTests.java
index c05d0a551fe..f2bf47e49b2 100644
--- a/server/src/test/java/org/elasticsearch/cluster/routing/OperationRoutingTests.java
+++ b/server/src/test/java/org/elasticsearch/cluster/routing/OperationRoutingTests.java
@@ -571,8 +571,8 @@ public class OperationRoutingTests extends ESTestCase{
         assertThat(searchedShards.size(), equalTo(3));
 
         // Now let's start adding node metrics, since that will affect which node is chosen
-        collector.addNodeStatistics("node_0", 2, TimeValue.timeValueMillis(200).nanos(), TimeValue.timeValueMillis(150).nanos());
-        collector.addNodeStatistics("node_1", 1, TimeValue.timeValueMillis(100).nanos(), TimeValue.timeValueMillis(50).nanos());
+        collector.addNodeStatistics("node_0", 1, TimeValue.timeValueMillis(200).nanos(), TimeValue.timeValueMillis(150).nanos());
+        collector.addNodeStatistics("node_1", 1, TimeValue.timeValueMillis(200).nanos(), TimeValue.timeValueMillis(50).nanos());
         collector.addNodeStatistics("node_2", 1, TimeValue.timeValueMillis(200).nanos(), TimeValue.timeValueMillis(200).nanos());
         outstandingRequests.put("node_0", 1L);
         outstandingRequests.put("node_1", 1L);

You can see that with the same number of queued requests and the same client response time for each node, the algorithm should choose the node with the lowest service time (i.e. node_1); however, the test fails because the current implementation chooses node_2 (the slowest).



Labels

:Search/Search (Search-related issues that do not fall into other categories), >bug, Team:Search (Meta label for search team)
