Skip to content

Commit 699a9c7

Browse files
authored
Track query stats bug (#70273)
The query cache stats can currently report negative memory usage, which breaks REST responses for indices stats, node stats, etc, see #55434 as one such example. The reason why negative memory usage is reported is because of the following trappy calculation: https://github.com/elastic/elasticsearch/blob/1de0b616ebb92de6060510e92269ef87fc6a8649/server/src/main/java/org/elasticsearch/indices/IndicesQueryCache.java#L98-L101 - weight can be `Double.POSITIVE_INFINITY` when `totalSize == 0` and `stats.size() == 0` - if we then also have `sharedRamBytesUsed > 0`, then `additionalRamBytesUsed` will be `Long.MAX_VALUE` - if you then sum up multiple of these `Long.MAX_VALUE` values (one for each shard), this leads to an overflow, resulting in negative numbers as seen in #55434. The reason sharedRamBytesUsed can be > 0 when there are no shard stats at all is because of the memory occupied by `LRUQueryCache.uniqueQueries`, where the lifetime can extend even closing of shards (where they are removed from shardStats). Note that a workaround to the above bug is to [clear the cache](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-clearcache.html), as it makes `sharedRamBytesUsed == 0`, see https://github.com/elastic/elasticsearch/blob/1de0b616ebb92de6060510e92269ef87fc6a8649/server/src/main/java/org/elasticsearch/indices/IndicesQueryCache.java#L168-L174 Closes #55434
1 parent 800414e commit 699a9c7

File tree

2 files changed

+16
-9
lines changed

2 files changed

+16
-9
lines changed

server/src/main/java/org/elasticsearch/indices/IndicesQueryCache.java

Lines changed: 14 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -89,17 +89,22 @@ public QueryCacheStats getStats(ShardId shard) {
8989
}
9090
shardStats.add(info);
9191

92-
// We also have some shared ram usage that we try to distribute to
93-
// proportionally to their number of cache entries of each shard
94-
long totalSize = 0;
95-
for (QueryCacheStats s : stats.values()) {
96-
totalSize += s.getCacheSize();
97-
}
98-
final double weight = totalSize == 0
92+
// We also have some shared ram usage that we try to distribute
93+
// proportionally to their number of cache entries of each shard.
94+
// Sometimes it's not possible to do this when there are no shard entries at all,
95+
// which can happen as the shared ram usage can extend beyond the closing of all shards.
96+
if (stats.isEmpty() == false) {
97+
long totalSize = 0;
98+
for (QueryCacheStats s : stats.values()) {
99+
totalSize += s.getCacheSize();
100+
}
101+
final double weight = totalSize == 0
99102
? 1d / stats.size()
100103
: ((double) shardStats.getCacheSize()) / totalSize;
101-
final long additionalRamBytesUsed = Math.round(weight * sharedRamBytesUsed);
102-
shardStats.add(new QueryCacheStats(additionalRamBytesUsed, 0, 0, 0, 0));
104+
final long additionalRamBytesUsed = Math.round(weight * sharedRamBytesUsed);
105+
assert additionalRamBytesUsed >= 0L : additionalRamBytesUsed;
106+
shardStats.add(new QueryCacheStats(additionalRamBytesUsed, 0, 0, 0, 0));
107+
}
103108
return shardStats;
104109
}
105110

server/src/test/java/org/elasticsearch/indices/IndicesQueryCacheTests.java

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -275,12 +275,14 @@ public void testTwoShards() throws IOException {
275275
assertEquals(0L, stats1.getCacheCount());
276276
assertEquals(0L, stats1.getHitCount());
277277
assertEquals(0L, stats1.getMissCount());
278+
assertEquals(0L, stats1.getMemorySizeInBytes());
278279

279280
stats2 = cache.getStats(shard2);
280281
assertEquals(0L, stats2.getCacheSize());
281282
assertEquals(0L, stats2.getCacheCount());
282283
assertEquals(0L, stats2.getHitCount());
283284
assertEquals(0L, stats2.getMissCount());
285+
assertEquals(0L, stats2.getMemorySizeInBytes());
284286

285287
cache.close(); // this triggers some assertions
286288
}

0 commit comments

Comments
 (0)