Skip to content

Improve reindex rethrottle API in stateless#143771

Merged
PeteGillinElastic merged 8 commits intoelastic:mainfrom
PeteGillinElastic:reindex-rethrottle-stateless-response
Mar 10, 2026
Merged

Improve reindex rethrottle API in stateless#143771
PeteGillinElastic merged 8 commits intoelastic:mainfrom
PeteGillinElastic:reindex-rethrottle-stateless-response

Conversation

@PeteGillinElastic
Copy link
Copy Markdown
Member

@PeteGillinElastic PeteGillinElastic commented Mar 6, 2026

This makes a number of improvements to the reindex rethrottle API, in stateless only, in preparation for making it public in serverless as part of the reindex managament API work.

The changes are:

  1. The group_by request parameter is no longer supported. This was never very useful in this API, since the ListTasksResponse will only every contain one task.

  2. The API never groups the tasks in the response, i.e. it acts as though group_by=none (contrast with stateful, which defaults to group_by=nodes). Again, grouping is not useful in this case. This also means that it omits the node information which would be present with group_by=nodes, and which we do not want to expose in serverless.

The API is unchanged in stateful, for backwards compatiblity reasons.

Implementation note: This change is done in the REST layer rather, because the group_by parameter is only entirely implemented in that layer. To implement the changes in the transport layer would mean passing the requested group_by from the REST layer to the transport layer for validation, and passing the group_by to use from the transport layer back to the REST layer.

Testing note: Adding a YAML REST test for this would require a whole new base class, and a whole new cluster with the stateless setting enabled. This seems unnecessarily heavyweight. Instead, a unit test for the REST action is added. This uses a real RestController, a fake RestChannel, and a fake NodeClient (i.e. the transport layer is faked out).

@PeteGillinElastic PeteGillinElastic force-pushed the reindex-rethrottle-stateless-response branch 3 times, most recently from b67599e to aeb370b Compare March 7, 2026 11:28
PeteGillinElastic added a commit to elastic/elasticsearch-specification that referenced this pull request Mar 9, 2026
The API spec failed to reflect the fact that this API accepts the
`group_by` parameter (for historical reasons: it doesn't actually make
much sense), and doesn't include all the possible elements which could
be present in the response.

The `group_by` parameter is marked as stack-only, as
elastic/elasticsearch#143771 plans to block it
in serverless (which is okay, as it is currently internal-only).

The changes to the response follow the pattern used by the list tasks
API (which is what this is using under the hood):
https://github.com/elastic/elasticsearch-specification/blob/main/specification/tasks/_types/TaskListResponseBase.ts
.
@PeteGillinElastic PeteGillinElastic added >non-issue :Distributed/Reindex Issues relating to reindex that are not caused by issues further down labels Mar 9, 2026
This makes a number of improvements to the reindex rethrottle API, in
stateless only, in preparation for making it public in serverless as
part of the reindex managament API work.

The changes are:

 1. The `group_by` request parameter is no longer supported. This was
    never very useful in this API, since the `ListTasksResponse` will
    only every contain one task.

 2. The API never groups the tasks in the response, i.e. it acts as
    though `group_by=none` (contrast with stateful, which defaults to
    `group_by=nodes`). Again, grouping is not useful in this
    case. This also means that it omits the node information which
    would be present with `group_by=nodes`, and which we do not want
    to expose in serverless.

 3. The `node` property of the task in the response is redacted with
    `stateless` instead of giving the node ID. (The get task API
    behaves similarly in serverless.)

The API is unchanged in stateful, for backwards compatiblity reasons.

Implementation note: This change is done in the REST layer rather,
because the `group_by` parameter is only entirely implemented in that
layer. To implement changes 1 and 2 in the transport layer would mean
passing the requested `group_by` from the REST layer to the transport
layer for validation, and passing the `group_by` to use from the
transport layer back to the REST layer. Although change 3 could be
done in the transport layer, it seems neater to keep the three changes
together.

Testing note: Adding a YAML REST test for this would require a whole
new base class, and a whole new cluster with the stateless setting
enabled. This seems unnecessarily heavyweight. Instead, a unit test
for the REST action is added. This uses a real `RestController`, a
fake `RestChannel`, and a fake `NodeClient` (i.e. the transport layer
is faked out).
@PeteGillinElastic PeteGillinElastic force-pushed the reindex-rethrottle-stateless-response branch from aeb370b to dd2707b Compare March 9, 2026 13:57
@PeteGillinElastic PeteGillinElastic marked this pull request as ready for review March 9, 2026 13:57
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team. label Mar 9, 2026
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@ServerlessScope(Scope.INTERNAL)
public class RestReindexRethrottleAction extends BaseRestHandler {

static final String REDACTED_NODE_ID_IN_STATELESS = "stateless";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we generally redact node IDs for stateless, stateless could also be on-prem, right? I thought stateless is more about decoupling storage and compute with external object store, and hiding node IDs is only relevant to serverless.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. I don't actually know which bits of the redaction that we do over in the serverless repo will get applied to stateless-on-prem. I certainly don't think we need to include the node ID in the reindex rethrottle response (why should the user care?) but it would be better to be consistent.

So I guess I can move the redaction of the node ID into serverless, where the equivalent for get tasks lives.

The change to not force group_by to "none" has to be done here, because the serverless filtering only happens at in transport layer (where there is no concept of group_by) but I think that's fine. That change is actually a better API (group_by makes no sense for this API which can only ever return a single task) and we're only not making the change in stateful for BWC reasons, and stateless-on-prem doesn't need to be BWC.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving the redaction to serverless sounds good to me.

Copy link
Copy Markdown
Contributor

@samxbr samxbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Mar 9, 2026
@PeteGillinElastic PeteGillinElastic merged commit 0888482 into elastic:main Mar 10, 2026
36 checks passed
@PeteGillinElastic PeteGillinElastic deleted the reindex-rethrottle-stateless-response branch March 10, 2026 12:21
PeteGillinElastic added a commit to elastic/elasticsearch-specification that referenced this pull request Mar 10, 2026
The API spec failed to reflect the fact that this API accepts the
`group_by` parameter (for historical reasons: it doesn't actually make
much sense), and doesn't include all the possible elements which could
be present in the response.

The `group_by` parameter is marked as stack-only, as
elastic/elasticsearch#143771 plans to block it
in serverless (which is okay, as it is currently internal-only).

The changes to the response follow the pattern used by the list tasks
API (which is what this is using under the hood):
https://github.com/elastic/elasticsearch-specification/blob/main/specification/tasks/_types/TaskListResponseBase.ts
.
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 10, 2026
…locations

* upstream/main: (126 commits)
  Update KnnIndexTester to use more settings from datasets (elastic#143869)
  fix: dynamic template vector array is overridden by automatic dense_vector mapping (elastic#143733)
  ES|QL: Don't reuse the same alias for _fork column (elastic#143909)
  Close and initialize clients after each node upgrade in logsdb rolling upgrade tests. (elastic#143823)
  ESQL: Added GroupedTopNOperator for LIMIT BY, compute only (elastic#143476)
  Handle views in ResolveIndexAction (elastic#143561)
  Improve reindex rethrottle API in stateless (elastic#143771)
  Use a copy of the SearchExecutionContext for each Percolator execution (elastic#142765)
  Log the stacktrace when we encounter a deprecation warning for `default_metric` (elastic#143929)
  ESQL: evaluate ReferenceAttributes to potentially FieldAttributes for full-text functions restriction (elastic#143893)
  Add ClusterStateSerializationStats Serializatation Tests (elastic#142703)
  Adds Coordination Diagnostics Tests (elastic#142709)
  Upgrade Elasticsearch to Apache Lucene 10.4 (elastic#141882)
  ESQL: Add configurable bracket-based multi-value support for CSV reader (elastic#143890)
  time series es819 binary dv use up to a 1mb block size (elastic#143049)
  Dynamically enable / disable plugins in correspondence to stateless mode. (elastic#142147)
  ES|QL: Implement first/last_over_time for tdigest (elastic#143832)
  Document CHANGE_POINT limitation (elastic#143877)
  Fix OperationsOnSeqNoDisabledIndicesIT (elastic#143892)
  [Test] Test that sequence numbers are not pruned with retention lease (elastic#143825)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Reindex Issues relating to reindex that are not caused by issues further down >non-issue serverless-linked Added by automation, don't add manually Team:Distributed Meta label for distributed team. v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants