Framed thread pool utilization benchmark hacking #2

Merged
mhl-b merged 2 commits into mhl-b:framed-thread-pool-utilization from nicktindall:framed-thread-pool-utilization_bm
Jul 29, 2025

Conversation

@nicktindall commented Jul 28, 2025

I micro-ized the benchmark to see what effect concurrency has on the time taken to call startTask, endTask, and previousFrameTime from multiple threads.
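For context, the API shape the benchmark exercises could look something like the sketch below. This is a hedged illustration only: the class name, the LongAdder-based accumulation, and the rotateFrame hook are assumptions for the sake of the example, not the actual implementation on the framed-thread-pool-utilization branch.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of a framed utilization tracker: startTask/endTask
// accumulate per-task execution time into the current frame, and
// previousFrameTime reads the last completed frame. Names and structure
// are assumptions for illustration.
class FramedUtilizationTracker {
    private final LongAdder currentFrameNanos = new LongAdder();
    private volatile long previousFrameNanos;

    // Called when a task starts; returns the start timestamp.
    long startTask() {
        return System.nanoTime();
    }

    // Called when a task ends; adds its execution time to the open frame.
    void endTask(long startNanos) {
        currentFrameNanos.add(System.nanoTime() - startNanos);
    }

    // Total task time recorded in the last completed frame.
    long previousFrameTime() {
        return previousFrameNanos;
    }

    // Would be driven by a scheduler once per utilizationIntervalMs,
    // closing the current frame and starting a new one.
    void rotateFrame() {
        previousFrameNanos = currentFrameNanos.sumThenReset();
    }
}
```

Under this shape, a LongAdder keeps concurrent endTask calls cheap under contention (the ReadAndWrite case below), while previousFrameTime is a single volatile read.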

Play around with:

```shell
./gradlew -p benchmarks run --args 'ThreadPoolUtilizationBenchmark -t $NUM_THREADS'
```

I think it looks very cheap. If you turn on sampling you see some outliers on the order of 10ms with 12 threads.

Also, the amount of contention is probably much higher than what we'd see in the real world. I tried to add some "work" between calls; that's what callIntervalTicks does. But even in the worst case we're only adding ~0.2ms of work between calls.

I'm not sure what the largest core counts we see are, but it probably makes sense to check what happens on a bigger machine, perhaps increasing callIntervalTicks to something representative (the baseline results show how long a given tick count takes).

(12-thread run on my machine; deduct the baseline for the corresponding callIntervalTicks)

```
Benchmark                                                      (callIntervalTicks)  (utilizationIntervalMs)  Mode  Cnt    Score    Error  Units
ThreadPoolUtilizationBenchmark.JustWrite                                         0                       10  avgt    5    1.819 ±  0.126  us/op
ThreadPoolUtilizationBenchmark.JustWrite                                     10000                       10  avgt    5   20.762 ±  0.927  us/op
ThreadPoolUtilizationBenchmark.JustWrite                                    100000                       10  avgt    5  202.935 ± 10.831  us/op
ThreadPoolUtilizationBenchmark.ReadAndWrite                                      0                       10  avgt    5    0.939 ±  0.252  us/op
ThreadPoolUtilizationBenchmark.ReadAndWrite:readPrevious                         0                       10  avgt    5    1.097 ±  0.387  us/op
ThreadPoolUtilizationBenchmark.ReadAndWrite:startAndStopTasks                    0                       10  avgt    5    0.780 ±  0.160  us/op
ThreadPoolUtilizationBenchmark.ReadAndWrite                                  10000                       10  avgt    5   20.605 ±  0.838  us/op
ThreadPoolUtilizationBenchmark.ReadAndWrite:readPrevious                     10000                       10  avgt    5   20.518 ±  1.397  us/op
ThreadPoolUtilizationBenchmark.ReadAndWrite:startAndStopTasks                10000                       10  avgt    5   20.693 ±  0.632  us/op
ThreadPoolUtilizationBenchmark.ReadAndWrite                                 100000                       10  avgt    5  201.309 ±  8.873  us/op
ThreadPoolUtilizationBenchmark.ReadAndWrite:readPrevious                    100000                       10  avgt    5  200.402 ±  9.548  us/op
ThreadPoolUtilizationBenchmark.ReadAndWrite:startAndStopTasks               100000                       10  avgt    5  202.216 ± 11.474  us/op
ThreadPoolUtilizationBenchmark.baseline                                          0                       10  avgt    5    0.002 ±  0.001  us/op
ThreadPoolUtilizationBenchmark.baseline                                      10000                       10  avgt    5   20.078 ±  0.845  us/op
ThreadPoolUtilizationBenchmark.baseline                                     100000                       10  avgt    5  201.026 ± 11.471  us/op
```
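As a worked example of the baseline deduction, subtracting the baseline score from the JustWrite score at each callIntervalTicks isolates the overhead attributable to the utilization calls themselves (numbers copied from the table above; OverheadCalc is just a throwaway name for this sketch):

```java
// Worked example: overhead = measured score minus baseline "work".
// Scores (us/op) are copied from the JustWrite and baseline rows above.
class OverheadCalc {
    static double overhead(double score, double baseline) {
        return score - baseline;
    }

    public static void main(String[] args) {
        long[] ticks       = {0, 10_000, 100_000};
        double[] justWrite = {1.819, 20.762, 202.935};
        double[] baseline  = {0.002, 20.078, 201.026};
        for (int i = 0; i < ticks.length; i++) {
            System.out.printf("callIntervalTicks=%d overhead=%.3f us/op%n",
                    ticks[i], overhead(justWrite[i], baseline[i]));
        }
        // Overheads come out to roughly 1.817, 0.684, and 1.909 us/op,
        // i.e. the utilization calls add at most ~2us per operation here.
    }
}
```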

You can also run for a specific callIntervalTicks, e.g.:

```shell
./gradlew -p benchmarks run --args 'ThreadPoolUtilizationBenchmark -t 12 -pcallIntervalTicks=1000000'
```

@nicktindall nicktindall changed the base branch from main to framed-thread-pool-utilization July 28, 2025 09:00
@mhl-b mhl-b merged commit 872d7cd into mhl-b:framed-thread-pool-utilization Jul 29, 2025
4 checks passed
mhl-b pushed a commit that referenced this pull request Jul 29, 2025
…UpdateIT testDenseVectorMappingUpdate {initialType=flat updateType=bbq_disk #2} elastic#132130
mhl-b pushed a commit that referenced this pull request Jul 30, 2025
…UpdateIT testDenseVectorMappingUpdate {initialType=bbq_hnsw updateType=bbq_disk #2} elastic#132152
mhl-b pushed a commit that referenced this pull request Jul 30, 2025
…UpdateIT testDenseVectorMappingUpdate {initialType=bbq_flat updateType=bbq_disk #2} elastic#132184
mhl-b pushed a commit that referenced this pull request Jul 30, 2025
…UpdateIT testDenseVectorMappingUpdate {initialType=int8_flat updateType=bbq_disk #2} elastic#132189
mhl-b pushed a commit that referenced this pull request Jul 30, 2025
…UpdateIT testDenseVectorMappingUpdate {initialType=int8_hnsw updateType=bbq_disk #2} elastic#132213
mhl-b pushed a commit that referenced this pull request Jul 31, 2025
…UpdateIT testDenseVectorMappingUpdate {initialType=int4_hnsw updateType=bbq_disk #2} elastic#132228
mhl-b pushed a commit that referenced this pull request Aug 6, 2025
…UpdateIT testDenseVectorMappingUpdate {initialType=int4_flat updateType=bbq_disk #2} elastic#132234
@nicktindall nicktindall deleted the framed-thread-pool-utilization_bm branch September 3, 2025 04:20
mhl-b pushed a commit that referenced this pull request Jan 22, 2026
…tic#140027)

This PR fixes the issue where `INLINE STATS GROUP BY null` was being
incorrectly pruned by `PruneLeftJoinOnNullMatchingField`.

Fixes elastic#139887

## Problem

For the query:

```
FROM employees
| INLINE STATS c = COUNT(*) BY n = null
| KEEP c, n
| LIMIT 3
```

During `LogicalPlanOptimizer`:

```
Limit[3[INTEGER],false,false]
\_EsqlProject[[c{r}#2, n{r}elastic#4]]
  \_InlineJoin[LEFT,[n{r}elastic#4],[n{r}elastic#4]]
    |_Eval[[null[NULL] AS n#4]]
    | \_EsRelation[employees][<no-fields>{r$}elastic#7]
    \_Aggregate[[n{r}elastic#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}elastic#4]]
      \_StubRelation[[<no-fields>{r$}elastic#7, n{r}elastic#4]]
```

The following join node:

```
InlineJoin[LEFT,[n{r}elastic#4],[n{r}elastic#4]]
|_Eval[[null[NULL] AS n#4]]
| \_EsRelation[employees][<no-fields>{r$}elastic#7]
\_Aggregate[[n{r}elastic#4],[COUNT(*[KEYWORD],true[BOOLEAN],PT0S[TIME_DURATION]) AS c#2, n{r}elastic#4]]
  \_StubRelation[[<no-fields>{r$}elastic#7, n{r}elastic#4]]
```

should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the
right side is an `Aggregate` (originating from `INLINE STATS`). Since
`STATS` supports `GROUP BY null`, the join key being null is a valid use
case. Pruning this join would incorrectly eliminate the aggregation
results, changing the query semantics.

During `LocalLogicalPlanOptimizer`:

```
ProjectExec[[c{r}#2, n{r}elastic#4]]
\_LimitExec[3[INTEGER],null]
  \_ExchangeExec[[c{r}#2, n{r}elastic#4],false]
    \_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<>
Project[[c{r}#2, n{r}elastic#4]]
\_Limit[3[INTEGER],false,false]
  \_InlineJoin[LEFT,[n{r}elastic#4],[n{r}elastic#4]]
    |_Eval[[null[NULL] AS n#4]]
    | \_EsRelation[employees][<no-fields>{r$}elastic#7]
    \_LocalRelation[[c{r}#2, n{r}elastic#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]<>]]
```

The following join node:

```
InlineJoin[LEFT,[n{r}elastic#4],[n{r}elastic#4]]
|_Eval[[null[NULL] AS n#4]]
| \_EsRelation[employees][<no-fields>{r$}elastic#7]
\_LocalRelation[[c{r}#2, n{r}elastic#4],Page{blocks=[LongVectorBlock[vector=ConstantLongVector[positions=1, value=100]], ConstantNullBlock[positions=1]]}]
```

should NOT have `PruneLeftJoinOnNullMatchingField` applied, because the
right side is a `LocalRelation` (the `Aggregate` was optimized into a
`LocalRelation` containing the pre-computed aggregation results).
Pruning this join when the join key is null would discard the valid
aggregation results stored in the `LocalRelation`, incorrectly producing
null values instead of the expected count.

## Solution

The fix ensures that `PruneLeftJoinOnNullMatchingField` only
applies to `LOOKUP JOIN` nodes, where `join.right()` is an `EsRelation`.
For `INLINE STATS` joins, the right side can be:

 - `Aggregate` (before optimization), or
 - `LocalRelation` (after the aggregate is optimized)

By checking `join.right() instanceof EsRelation`, we correctly skip the
pruning optimization for `INLINE STATS` joins, preserving the expected
query results when grouping by null.
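The guard described above can be illustrated with a minimal self-contained sketch. The stub types below only mirror the names used in this description (LogicalPlan, EsRelation, Aggregate, LocalRelation, Join, and the PruningGuard wrapper); they are placeholders for illustration, not the real Elasticsearch classes.

```java
// Stub plan node types mirroring the names used above; these are
// illustrative placeholders, not the actual Elasticsearch classes.
interface LogicalPlan {}

class EsRelation implements LogicalPlan {}     // right side of LOOKUP JOIN
class Aggregate implements LogicalPlan {}      // INLINE STATS, pre-optimization
class LocalRelation implements LogicalPlan {}  // INLINE STATS, post-optimization

class Join implements LogicalPlan {
    private final LogicalPlan right;
    Join(LogicalPlan right) { this.right = right; }
    LogicalPlan right() { return right; }
}

class PruningGuard {
    // Pruning on a null join key is only safe for LOOKUP JOIN, whose right
    // side is an EsRelation. INLINE STATS joins (Aggregate or LocalRelation
    // on the right) must be skipped, since GROUP BY null is valid there.
    static boolean mayPrune(Join join) {
        return join.right() instanceof EsRelation;
    }
}
```

The single instanceof check covers both problem cases from above, since neither `Aggregate` nor `LocalRelation` is an `EsRelation`.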
mhl-b pushed a commit that referenced this pull request Feb 2, 2026
…tic#140027) (elastic#141095)

(cherry picked from commit f3ccb70)

Co-authored-by: kanoshiou <uiaao@tuta.io>