11[[docs-bulk]]
22=== Bulk API
3+ ++++
4+ <titleabbrev>Bulk</titleabbrev>
5+ ++++
36
4- The bulk API makes it possible to perform many index/delete operations
5- in a single API call. This can greatly increase the indexing speed.
7+ Performs multiple indexing or delete operations in a single API call.
8+ This reduces overhead and can greatly increase indexing speed.
69
7- .Client support for bulk requests
8- *********************************************
9-
10- Some of the officially supported clients provide helpers to assist with
11- bulk requests and reindexing of documents from one index to another:
10+ [source,console]
11+ --------------------------------------------------
12+ POST _bulk
13+ { "index" : { "_index" : "test", "_id" : "1" } }
14+ { "field1" : "value1" }
15+ { "delete" : { "_index" : "test", "_id" : "2" } }
16+ { "create" : { "_index" : "test", "_id" : "3" } }
17+ { "field1" : "value3" }
18+ { "update" : {"_id" : "1", "_index" : "test"} }
19+ { "doc" : {"field2" : "value2"} }
20+ --------------------------------------------------
1221
13- Perl::
22+ [[docs-bulk-api-request]]
23+ ==== {api-request-title}
1424
15- See https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Bulk[Search::Elasticsearch::Client::5_0::Bulk]
16- and https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Scroll[Search::Elasticsearch::Client::5_0::Scroll]
25+ `POST /_bulk`
1726
18- Python::
27+ `POST /<index>/_bulk`
1928
20- See http://elasticsearch-py.readthedocs.org/en/master/helpers.html[elasticsearch.helpers.*]
29+ [[docs-bulk-api-desc]]
30+ ==== {api-description-title}
2131
22- *********************************************
32+ Provides a way to perform multiple `index`, `create`, `delete`, and `update` actions in a single request.
2333
24- The REST API endpoint is `/_bulk`, and it expects the following newline delimited JSON
25- (NDJSON) structure:
34+ The actions are specified in the request body using a newline delimited JSON (NDJSON) structure:
2635
2736[source,js]
2837--------------------------------------------------
@@ -36,19 +45,70 @@ optional_source\n
3645--------------------------------------------------
3746// NOTCONSOLE
3847
39- *NOTE*: The final line of data must end with a newline character `\n`. Each newline character
40- may be preceded by a carriage return `\r`. When sending requests to this endpoint the
41- `Content-Type` header should be set to `application/x-ndjson`.
48+ The `index` and `create` actions expect a source on the next line,
49+ and have the same semantics as the `op_type` parameter in the standard index API:
50+ `create` fails if a document with the same ID already exists in the index;
51+ `index` adds or replaces a document as necessary.
52+
53+ `update` expects the partial doc, upsert, and script (with its options)
54+ to be specified on the next line.
55+
56+ `delete` does not expect a source on the next line and
57+ has the same semantics as the standard delete API.
58+
59+ [NOTE]
60+ ====
61+ The final line of data must end with a newline character `\n`.
62+ Each newline character may be preceded by a carriage return `\r`.
63+ When sending requests to the `_bulk` endpoint,
64+ the `Content-Type` header should be set to `application/x-ndjson`.
65+ ====
66+
67+ Because this format uses literal newline characters (`\n`) as delimiters,
68+ make sure that the JSON actions and sources are not pretty printed.
69+
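As a minimal sketch of the format rules above, the Python function below serializes a list of `(action, metadata, source)` tuples into a bulk request body. The helper name `bulk_body` and the tuple shape are assumptions made for illustration, not part of any client library. It relies on `json.dumps` without `indent`, which never emits literal newlines, and it terminates every line, including the last, with `\n`.

```python
import json

# Actions whose metadata line must be followed by a source/payload line.
ACTIONS_WITH_SOURCE = {"index", "create", "update"}
ALL_ACTIONS = ACTIONS_WITH_SOURCE | {"delete"}


def bulk_body(operations):
    """Serialize (action, metadata, source) tuples into an NDJSON bulk body.

    `source` must be None for `delete` and a dict for the other actions.
    """
    lines = []
    for action, metadata, source in operations:
        if action not in ALL_ACTIONS:
            raise ValueError(f"unsupported bulk action: {action!r}")
        # json.dumps without `indent` keeps each object on one line;
        # compact separators only trim the optional whitespace.
        lines.append(json.dumps({action: metadata}, separators=(",", ":")))
        if action in ACTIONS_WITH_SOURCE:
            if source is None:
                raise ValueError(f"{action!r} requires a source line")
            lines.append(json.dumps(source, separators=(",", ":")))
        elif source is not None:
            raise ValueError("'delete' does not take a source line")
    # Every line, including the final one, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Mirrors the example request at the top of this page.
body = bulk_body([
    ("index", {"_index": "test", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_id": "2"}, None),
    ("create", {"_index": "test", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_index": "test", "_id": "1"}, {"doc": {"field2": "value2"}}),
])
```

The resulting string would be sent with the `application/x-ndjson` content type; the Python client's `elasticsearch.helpers` module provides higher-level helpers for the same job.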
70+ If you specify an index in the request URI,
71+ it is used for any actions that don't explicitly specify an index.
72+
73+ A note on the format: it is designed to make processing as
74+ fast as possible. Because some of the actions are redirected to other
75+ shards on other nodes, only `action_meta_data` is parsed on the
76+ receiving node.
77+
78+ Client libraries using this protocol should strive to do
79+ something similar on the client side and reduce buffering as much as
80+ possible.
81+
82+ The response to a bulk action is a large JSON structure with
83+ the individual results of each action performed,
84+ in the same order as the actions that appeared in the request.
85+ The failure of a single action does not affect the remaining actions.
86+
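A sketch of how a client might act on the per-item results, assuming the response has already been parsed into a Python dict: `failed_items` (a hypothetical helper) walks `items` in request order and collects every entry that carries an `error` object. The `sample` response is hand-written and trimmed for illustration, not captured from a cluster.

```python
def failed_items(response):
    """Return (position, action, error) for each failed item in a parsed bulk response.

    Results in `items` are in the same order as the actions in the request,
    so `position` identifies the request action that failed.
    """
    # The top-level `errors` flag is false when no item failed.
    if not response.get("errors"):
        return []
    failures = []
    for position, item in enumerate(response["items"]):
        # Each item is an object with a single key: the action name.
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append((position, action, result["error"]))
    return failures


# Hand-written, trimmed response used only for illustration.
sample = {
    "took": 30,
    "errors": True,
    "items": [
        {"index": {"_index": "test", "_id": "1", "status": 201}},
        {
            "create": {
                "_index": "test",
                "_id": "3",
                "status": 409,
                "error": {"type": "version_conflict_engine_exception", "reason": "illustrative"},
            }
        },
    ],
}
failures = failed_items(sample)
```

Because one failed action does not affect the others, a caller could retry only the positions returned here instead of resending the whole request.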
87+ There is no "correct" number of actions to perform in a single bulk request.
88+ Experiment with different settings to find the optimal size for your particular workload.
89+
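One way to run that experiment is to make the batch size configurable along two axes: number of actions and payload bytes. The generator below is a sketch under that assumption; `batch_operations` is a hypothetical name, and the limits in the usage line are arbitrary values chosen for the example, not recommendations.

```python
import json


def batch_operations(operations, max_ops, max_bytes):
    """Group serialized operations into bulk request bodies.

    Each element of `operations` is the NDJSON text for one action: its
    metadata line plus its source line, if any, each ending with "\n".
    A batch is closed when adding the next operation would exceed either
    limit; a single oversized operation still gets a batch of its own.
    """
    batch, batch_bytes = [], 0
    for op in operations:
        # Measure encoded bytes, not characters, so non-ASCII text is counted correctly.
        size = len(op.encode("utf-8"))
        if batch and (len(batch) >= max_ops or batch_bytes + size > max_bytes):
            yield "".join(batch)
            batch, batch_bytes = [], 0
        batch.append(op)
        batch_bytes += size
    if batch:
        yield "".join(batch)


ops = [
    json.dumps({"index": {"_index": "test", "_id": str(i)}}) + "\n" + json.dumps({"n": i}) + "\n"
    for i in range(5)
]
bodies = list(batch_operations(ops, max_ops=2, max_bytes=10_000))
```

Keeping an action and its source line together in one element guarantees a batch boundary never splits them.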
90+ When using the HTTP API, make sure that the client does not send HTTP chunks,
91+ as this will slow things down.
92+
93+ [float]
94+ [[bulk-clients]]
95+ ===== Client support for bulk requests
96+
97+ Some of the officially supported clients provide helpers to assist with
98+ bulk requests and reindexing of documents from one index to another:
99+
100+ Perl::
101+
102+ See https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Bulk[Search::Elasticsearch::Client::5_0::Bulk]
103+ and https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Scroll[Search::Elasticsearch::Client::5_0::Scroll]
104+
105+ Python::
106+
107+ See http://elasticsearch-py.readthedocs.org/en/master/helpers.html[elasticsearch.helpers.*]
42108
43- The possible actions are `index`, `create`, `delete`, and `update`.
44- `index` and `create` expect a source on the next
45- line, and have the same semantics as the `op_type` parameter to the
46- standard index API (i.e. create will fail if a document with the same
47- index exists already, whereas index will add or replace a
48- document as necessary). `delete` does not expect a source on the
49- following line, and has the same semantics as the standard delete API.
50- `update` expects that the partial doc, upsert and script and its options
51- are specified on the next line.
109+ [float]
110+ [[bulk-curl]]
111+ ===== Submitting bulk requests with cURL
52112
53113If you're providing text file input to `curl`, you *must* use the
54114`--data-binary` flag instead of plain `-d`. The latter doesn't preserve
@@ -65,9 +125,97 @@ $ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --
65125// NOTCONSOLE
66126// Not converting to console because this shows how curl works
67127
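For comparison, here is a sketch of preparing the same kind of request with Python's standard library instead of `curl`. `build_bulk_request` is a hypothetical helper; the URL matches the `localhost:9200` example above, and authentication is omitted. Encoding the body to `bytes` preserves the newlines exactly, which is the point of `--data-binary`, and gives urllib a known length so it sends a `Content-Length` header rather than chunked transfer encoding.

```python
import urllib.request


def build_bulk_request(body, base_url="http://localhost:9200", index=None):
    """Prepare (but do not send) a POST to `/_bulk` or `/<index>/_bulk`."""
    path = f"/{index}/_bulk" if index else "/_bulk"
    # Send the exact bytes of the NDJSON body, newlines included.
    data = body.encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=data,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


# `_index` is omitted because the index is given in the request path.
body = '{"index":{"_id":"1"}}\n{"field1":"value1"}\n'
req = build_bulk_request(body, index="test")
```

Calling `urllib.request.urlopen(req)` would send it; the sketch stops short of that so it runs without a cluster.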
68- Because this format uses literal `\n`'s as delimiters, please be sure
69- that the JSON actions and sources are not pretty printed. Here is an
70- example of a correct sequence of bulk commands:
128+ [float]
129+ [[bulk-optimistic-concurrency-control]]
130+ ===== Optimistic concurrency control
131+
132+ Each `index` and `delete` action within a bulk API call may include the
133+ `if_seq_no` and `if_primary_term` parameters in their respective action
134+ and metadata lines. The `if_seq_no` and `if_primary_term` parameters control
135+ how operations are executed, based on the last modification to existing
136+ documents. See <<optimistic-concurrency-control>> for more details.
137+
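A minimal sketch of an `index` action guarded this way, assuming the sequence number and primary term were obtained from an earlier read of the document. `conditional_index_op` is a hypothetical helper; only the `if_seq_no` and `if_primary_term` metadata fields come from the text above.

```python
import json


def conditional_index_op(index, doc_id, source, seq_no, primary_term):
    """Return the NDJSON lines for an `index` action guarded by optimistic
    concurrency control.

    `seq_no` and `primary_term` would normally come from the `_seq_no` and
    `_primary_term` values returned when the document was last read or written.
    """
    meta = {
        "_index": index,
        "_id": doc_id,
        "if_seq_no": seq_no,
        "if_primary_term": primary_term,
    }
    # Metadata line, then the source line; both newline-terminated.
    return json.dumps({"index": meta}) + "\n" + json.dumps(source) + "\n"


op = conditional_index_op("test", "1", {"field1": "value1"}, seq_no=3, primary_term=1)
```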
138+
139+ [float]
140+ [[bulk-versioning]]
141+ ===== Versioning
142+
143+ Each bulk item can include the version value using the
144+ `version` field. It automatically follows the behavior of the
145+ index / delete operation based on the `_version` mapping. It also
146+ supports the `version_type` parameter (see <<index-versioning, versioning>>).
147+
148+ [float]
149+ [[bulk-routing]]
150+ ===== Routing
151+
152+ Each bulk item can include the routing value using the
153+ `routing` field. It automatically follows the behavior of the
154+ index / delete operation based on the `_routing` mapping.
155+
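The versioning and routing fields described above go into the metadata line alongside `_index` and `_id`. The sketch below builds such a line; `metadata_line` is a hypothetical helper, and the routing value `user1` and version `7` are made-up example values.

```python
import json


def metadata_line(action, index, doc_id, **options):
    """Return one NDJSON metadata line for `action`.

    Optional per-item fields such as `routing`, `version`, and `version_type`
    are copied into the metadata object; options set to None are omitted.
    """
    meta = {"_index": index, "_id": doc_id}
    meta.update({key: value for key, value in options.items() if value is not None})
    return json.dumps({action: meta}) + "\n"


# Delete document "2", routed with "user1", using an externally managed version.
line = metadata_line("delete", "test", "2", routing="user1", version=7, version_type="external")
```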
156+ [float]
157+ [[bulk-wait-for-active-shards]]
158+ ===== Wait for active shards
159+
160+ When making bulk calls, you can set the `wait_for_active_shards`
161+ parameter to require a minimum number of shard copies to be active
162+ before starting to process the bulk request. See
163+ <<index-wait-for-active-shards,Wait for active shards>> for further details and a usage
164+ example.
165+
166+ [float]
167+ [[bulk-refresh]]
168+ ===== Refresh
169+
170+ Control when the changes made by this request are visible to search. See
171+ <<docs-refresh,refresh>>.
172+
173+ NOTE: Only the shards that receive the bulk request will be affected by
174+ `refresh`. Imagine a `_bulk?refresh=wait_for` request with three
175+ documents in it that happen to be routed to different shards in an index
176+ with five shards. The request will only wait for those three shards to
177+ refresh. The other two shards that make up the index do not
178+ participate in the `_bulk` request at all.
179+
180+ [float]
181+ [[bulk-security]]
182+ ===== Security
183+
184+ See <<url-access-control>>.
185+
186+ [float]
187+ [[bulk-partial-responses]]
188+ ===== Partial responses
189+ To ensure fast responses, the bulk API will respond with partial results if one or more shards fail.
190+ See <<shard-failures, Shard failures>> for more information.
191+
192+ [[docs-bulk-api-path-params]]
193+ ==== {api-path-parms-title}
194+
195+ `<index>`::
196+ (Optional, string) Name of the index to perform the bulk actions against.
197+
198+ [[docs-bulk-api-query-params]]
199+ ==== {api-query-parms-title}
200+
201+ include::{docdir}/rest-api/common-parms.asciidoc[tag=pipeline]
202+
203+ include::{docdir}/rest-api/common-parms.asciidoc[tag=refresh]
204+
205+ include::{docdir}/rest-api/common-parms.asciidoc[tag=routing]
206+
207+ include::{docdir}/rest-api/common-parms.asciidoc[tag=source]
208+
209+ include::{docdir}/rest-api/common-parms.asciidoc[tag=source_excludes]
210+
211+ include::{docdir}/rest-api/common-parms.asciidoc[tag=source_includes]
212+
213+ include::{docdir}/rest-api/common-parms.asciidoc[tag=timeout]
214+
215+ include::{docdir}/rest-api/common-parms.asciidoc[tag=wait_for_active_shards]
216+
217+ [[docs-bulk-api-example]]
218+ ==== {api-examples-title}
71219
72220[source,console]
73221--------------------------------------------------
@@ -81,7 +229,7 @@ POST _bulk
81229{ "doc" : {"field2" : "value2"} }
82230--------------------------------------------------
83231
84- The result of this bulk operation is :
232+ The API returns the following result:
85233
86234[source,console-result]
87235--------------------------------------------------
@@ -171,85 +319,9 @@ The result of this bulk operation is:
171319// TESTRESPONSE[s/"_seq_no" : 3/"_seq_no" : $body.items.3.update._seq_no/]
172320// TESTRESPONSE[s/"_primary_term" : 4/"_primary_term" : $body.items.3.update._primary_term/]
173321
174- The endpoints are `/_bulk` and `/{index}/_bulk`. When the index is provided, it
175- will be used by default on bulk items that don't provide it explicitly.
176-
177- A note on the format. The idea here is to make processing of this as
178- fast as possible. As some of the actions will be redirected to other
179- shards on other nodes, only `action_meta_data` is parsed on the
180- receiving node side.
181-
182- Client libraries using this protocol should try and strive to do
183- something similar on the client side, and reduce buffering as much as
184- possible.
185-
186- The response to a bulk action is a large JSON structure with the individual
187- results of each action that was performed in the same order as the actions that
188- appeared in the request. The failure of a single action does not affect the
189- remaining actions.
190-
191- There is no "correct" number of actions to perform in a single bulk
192- call. You should experiment with different settings to find the optimum
193- size for your particular workload.
194-
195- If using the HTTP API, make sure that the client does not send HTTP
196- chunks, as this will slow things down.
197-
198- [float]
199- [[bulk-optimistic-concurrency-control]]
200- ==== Optimistic Concurrency Control
201-
202- Each `index` and `delete` action within a bulk API call may include the
203- `if_seq_no` and `if_primary_term` parameters in their respective action
204- and meta data lines. The `if_seq_no` and `if_primary_term` parameters control
205- how operations are executed, based on the last modification to existing
206- documents. See <<optimistic-concurrency-control>> for more details.
207-
208-
209- [float]
210- [[bulk-versioning]]
211- ==== Versioning
212-
213- Each bulk item can include the version value using the
214- `version` field. It automatically follows the behavior of the
215- index / delete operation based on the `_version` mapping. It also
216- support the `version_type` (see <<index-versioning, versioning>>).
217-
218- [float]
219- [[bulk-routing]]
220- ==== Routing
221-
222- Each bulk item can include the routing value using the
223- `routing` field. It automatically follows the behavior of the
224- index / delete operation based on the `_routing` mapping.
225-
226- [float]
227- [[bulk-wait-for-active-shards]]
228- ==== Wait For Active Shards
229-
230- When making bulk calls, you can set the `wait_for_active_shards`
231- parameter to require a minimum number of shard copies to be active
232- before starting to process the bulk request. See
233- <<index-wait-for-active-shards,here>> for further details and a usage
234- example.
235-
236- [float]
237- [[bulk-refresh]]
238- ==== Refresh
239-
240- Control when the changes made by this request are visible to search. See
241- <<docs-refresh,refresh>>.
242-
243- NOTE: Only the shards that receive the bulk request will be affected by
244- `refresh`. Imagine a `_bulk?refresh=wait_for` request with three
245- documents in it that happen to be routed to different shards in an index
246- with five shards. The request will only wait for those three shards to
247- refresh. The other two shards that make up the index do not
248- participate in the `_bulk` request at all.
249-
250322[float]
251323[[bulk-update]]
252- ==== Update
324+ ===== Bulk update example
253325
254326When using the `update` action, `retry_on_conflict` can be used as a field in
255327the action itself (not in the extra payload line), to specify how many
@@ -276,13 +348,3 @@ POST _bulk
276348--------------------------------------------------
277349// TEST[continued]
278350
279- [float]
280- [[bulk-security]]
281- ==== Security
282-
283- See <<url-access-control>>.
284-
285- [float]
286- [[bulk-partial-responses]]
287- ==== Partial responses
288- To ensure fast responses, the bulk API will respond with partial results if one or more shards fail. See <<shard-failures, Shard failures>> for more information.