Commit 5fcf5e6

Further improve docs for requests_per_second
In #26185 we made the description of `requests_per_second` sane for reindex. This improves on the description by using some more common vocabulary ("batch size", etc) and improving the formatting of the example calculation so it stands out and doesn't require scrolling.
1 parent 01fdbfe commit 5fcf5e6

File tree

3 files changed: +53, -27 lines

docs/reference/docs/delete-by-query.asciidoc

Lines changed: 19 additions & 8 deletions

@@ -164,14 +164,25 @@ shards to become available. Both work exactly how they work in the
 <<docs-bulk,Bulk API>>.
 
 `requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles the number of requests per second that the delete-by-query
-issues or it can be set to `-1` to disabled throttling. The throttling is done
-waiting between bulk batches so that it can manipulate the scroll timeout. The
-wait time is the difference between the time it took the batch to complete and
-the time `requests_per_second * requests_in_the_batch`. Since the batch isn't
-broken into multiple bulk requests large batch sizes will cause Elasticsearch
-to create many requests and then wait for a while before starting the next set.
-This is "bursty" instead of "smooth". The default is `-1`.
+`1000`, etc) and throttles the rate at which `_delete_by_query` issues batches
+of delete operations by padding each batch with a wait time. The throttling can
+be disabled by setting `requests_per_second` to `-1`.
+
+The throttling is done by waiting between batches so that the scroll that
+`_delete_by_query` uses internally can be given a timeout that takes into
+account the padding. The padding time is the difference between the batch size
+divided by the `requests_per_second` and the time spent writing. By default the
+batch size is `1000`, so if the `requests_per_second` is set to `500`:
+
+[source,txt]
+--------------------------------------------------
+target_time = 1000 / 500 per second = 2 seconds
+wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
+--------------------------------------------------
+
+Since the batch is issued as a single `_bulk` request, large batch sizes will
+cause Elasticsearch to create many requests and then wait for a while before
+starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
 
 [float]
 === Response body
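The wait-time calculation described above can be sketched in Python. This is a hypothetical helper that mirrors the formula in the docs, not Elasticsearch's actual implementation, which may differ in details such as clamping:

```python
def padding_wait(batch_size: int, requests_per_second: float,
                 write_time: float) -> float:
    """Wait inserted between scroll batches, per the formula in the docs.

    Hypothetical helper for illustration only; clamping negative waits
    to zero is an assumption, not something the docs state.
    """
    if requests_per_second <= 0:  # requests_per_second=-1 disables throttling
        return 0.0
    target_time = batch_size / requests_per_second  # seconds per batch
    return max(target_time - write_time, 0.0)       # never wait a negative time

# Default batch size 1000, requests_per_second=500, batch written in 0.5 s:
print(padding_wait(1000, 500, 0.5))  # 1.5
```

This reproduces the `1.5 seconds` from the example calculation in the diff above.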

docs/reference/docs/reindex.asciidoc

Lines changed: 15 additions & 11 deletions

@@ -534,20 +534,24 @@ shards to become available. Both work exactly how they work in the
 <<docs-bulk,Bulk API>>.
 
 `requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles the number of batches that the reindex issues by
-padding each batch with a wait time. The throttling can be disabled by
-setting `requests_per_second` to `-1`.
+`1000`, etc) and throttles the rate at which reindex issues batches of index
+operations by padding each batch with a wait time. The throttling can be
+disabled by setting `requests_per_second` to `-1`.
 
-The throttling is done waiting between bulk batches so that it can manipulate the
-scroll timeout. The wait time is the difference between the request scroll search
-size divided by the `requests_per_second` and the `batch_write_time`. By default
-the scroll batch size is `1000`, so if the `requests_per_second` is set to `500`:
+The throttling is done by waiting between batches so that the scroll that reindex
+uses internally can be given a timeout that takes into account the padding.
+The padding time is the difference between the batch size divided by the
+`requests_per_second` and the time spent writing. By default the batch size is
+`1000`, so if the `requests_per_second` is set to `500`:
 
-`target_total_time` = `1000` / `500 per second` = `2 seconds` +
-`wait_time` = `target_total_time` - `batch_write_time` = `2 seconds` - `.5 seconds` = `1.5 seconds`
+[source,txt]
+--------------------------------------------------
+target_time = 1000 / 500 per second = 2 seconds
+wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
+--------------------------------------------------
 
-Since the batch isn't broken into multiple bulk requests large batch sizes will
-cause Elasticsearch to create many requests and then wait for a while before
+Since the batch is issued as a single `_bulk` request, large batch sizes will
+cause Elasticsearch to create many requests and then wait for a while before
 starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
 
 [float]
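A quick back-of-the-envelope check, using the same illustrative numbers as the example above, shows why issuing each batch as a single `_bulk` request makes the throttle "bursty" rather than "smooth":

```python
# Each cycle writes a whole batch at once, then idles for the padding, so
# documents arrive in bursts even though the *average* rate hits the target.
batch_size = 1000            # default scroll batch size
requests_per_second = 500.0  # throttle target
write_time = 0.5             # seconds spent writing the batch (illustrative)

target_time = batch_size / requests_per_second   # 2.0 s per cycle
wait_time = target_time - write_time             # 1.5 s idle per cycle
average_rate = batch_size / (write_time + wait_time)

print(average_rate)  # 500.0: the target rate, but delivered in 1000-doc bursts
```

The average over a full cycle matches `requests_per_second` exactly; the burstiness is in how the documents are distributed within each cycle.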

docs/reference/docs/update-by-query.asciidoc

Lines changed: 19 additions & 8 deletions

@@ -221,14 +221,25 @@ shards to become available. Both work exactly how they work in the
 <<docs-bulk,Bulk API>>.
 
 `requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles the number of requests per second that the update-by-query
-issues or it can be set to `-1` to disabled throttling. The throttling is done
-waiting between bulk batches so that it can manipulate the scroll timeout. The
-wait time is the difference between the time it took the batch to complete and
-the time `requests_per_second * requests_in_the_batch`. Since the batch isn't
-broken into multiple bulk requests large batch sizes will cause Elasticsearch
-to create many requests and then wait for a while before starting the next set.
-This is "bursty" instead of "smooth". The default is `-1`.
+`1000`, etc) and throttles the rate at which `_update_by_query` issues batches
+of index operations by padding each batch with a wait time. The throttling can
+be disabled by setting `requests_per_second` to `-1`.
+
+The throttling is done by waiting between batches so that the scroll that
+`_update_by_query` uses internally can be given a timeout that takes into
+account the padding. The padding time is the difference between the batch size
+divided by the `requests_per_second` and the time spent writing. By default the
+batch size is `1000`, so if the `requests_per_second` is set to `500`:
+
+[source,txt]
+--------------------------------------------------
+target_time = 1000 / 500 per second = 2 seconds
+wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
+--------------------------------------------------
+
+Since the batch is issued as a single `_bulk` request, large batch sizes will
+cause Elasticsearch to create many requests and then wait for a while before
+starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
 
 [float]
 [[docs-update-by-query-response-body]]
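One edge case worth noting, as an inference from the formula rather than something the docs above state: if a batch takes longer to write than its target time, the computed wait is negative, so presumably no padding is added and the effective rate falls below `requests_per_second`. A sketch with hypothetical numbers:

```python
batch_size = 1000
requests_per_second = 500.0
write_time = 3.0  # hypothetical slow batch: longer than the 2.0 s target

target_time = batch_size / requests_per_second      # 2.0 s
wait_time = max(target_time - write_time, 0.0)      # 0.0 (clamping assumed)
effective_rate = batch_size / (write_time + wait_time)

print(effective_rate)  # ~333.3 docs/s: below the requested rate, no wait added
```

In other words, the throttle can only slow the operation down; it cannot speed up a batch that is already slower than the target.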
