[BUG] Indexing a document through the bulk api bypasses the 512 byte _id size limit #6595

@trend-adam-kinnell

Describe the bug
OpenSearch currently enforces a hard limit of 512 bytes on a document's _id field when indexing. A request that exceeds the limit is rejected with an error like the one below.

{"error":{"root_cause":[{"type":"action_request_validation_exception","reason":"Validation Failed: 1: id [...] is too long, must be no longer than 512 bytes but was: ...;"}],"type":"action_request_validation_exception","reason":"Validation Failed: 1: id [...] is too long, must be no longer than 512 bytes but was:...;"},"status":400}

However, we are able to use the _bulk API to create a new document whose _id exceeds this limit. The document is created successfully and no error is returned.

Of particular interest is that reindexing fails for any document with an _id longer than 512 bytes. As a workaround, we are able to exclude these documents with a query.
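One possible shape for that exclusion query is sketched below. It assumes the oversized _id is already known and contains no characters that need JSON escaping. The helper name `reindex_body_excluding` and the target index `myindex-v2` are made up for illustration; the body relies on the `ids` query inside `bool.must_not`, passed as `source.query` to `_reindex`.

```shell
# Hypothetical helper: print a _reindex request body that copies index $1
# into index $2 while skipping the single document whose _id is $3.
# Assumption: $3 needs no JSON escaping.
reindex_body_excluding() {
  cat <<EOF
{
  "source": {
    "index": "$1",
    "query": { "bool": { "must_not": { "ids": { "values": ["$3"] } } } }
  },
  "dest": { "index": "$2" }
}
EOF
}

# Possible usage against the local cluster from the reproduction steps:
#   reindex_body_excluding myindex myindex-v2 "<oversized _id>" |
#     curl -X POST "https://localhost:9200/_reindex" -ku admin:admin \
#       -H 'Content-Type: application/json' --data-binary @-
```

With many affected documents, the `values` array would need to list each _id, so this form only suits a small, known set.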

To Reproduce
Steps to reproduce the behavior:

docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" --name opensearch-node -d opensearchproject/opensearch:latest
curl -X PUT "https://localhost:9200/myindex" -ku admin:admin

# This fails as the _id field is longer than 512 bytes
curl -X PUT "https://localhost:9200/myindex/_create/aaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbc" -ku admin:admin -H 'Content-Type: application/json' -d'{}'

# This succeeds, despite the length of the _id
curl -XPOST "https://localhost:9200/myindex/_bulk" -ku admin:admin -H 'Content-Type: application/json' -d '
{ "update" : {"_id" : "aaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbbaaaaaaaaaaabbbbbbbbbbbc", "_index" : "myindex", "retry_on_conflict" : 3} }
{ "upsert": {}, "doc": {} }
'
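The literal IDs above are hard to count by eye. As a convenience, an _id of a chosen byte length could be generated instead. This is only a sketch: `make_id` is a hypothetical helper, and it emits single-byte ASCII characters so that the character count equals the byte count.

```shell
# Hypothetical helper: print an _id consisting of $1 copies of the letter "a".
# Each "a" is one byte, so the result is exactly $1 bytes long.
make_id() {
  head -c "$1" /dev/zero | tr '\0' 'a'
}

# One byte over the 512-byte limit.
LONG_ID="$(make_id 513)"

# The generated value can then stand in for the literal _id, for example:
#   curl -X PUT "https://localhost:9200/myindex/_create/$LONG_ID" -ku admin:admin \
#     -H 'Content-Type: application/json' -d'{}'
```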

Expected behavior
The same limitations on _id length should be applied no matter what method is used to index a document.
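Until the limit is enforced uniformly, a client could apply the same check itself before sending a bulk request. The sketch below assumes the limit is counted in bytes of the UTF-8 encoded _id rather than in characters. `id_byte_length` and `id_within_limit` are hypothetical helper names, not part of OpenSearch.

```shell
# Assumption: the 512-byte limit applies to the UTF-8 encoding of the _id.
MAX_ID_BYTES=512

# Print the number of bytes in the given string.
id_byte_length() {
  printf '%s' "$1" | wc -c | tr -d '[:space:]'
}

# Exit status 0 when the _id fits within the limit, non-zero otherwise.
id_within_limit() {
  [ "$(id_byte_length "$1")" -le "$MAX_ID_BYTES" ]
}

# Possible usage before adding an action line to a bulk payload:
#   id_within_limit "$doc_id" || echo "skipping oversized _id" >&2
```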

Plugins
Default

Screenshots
N/A

Host/Environment

"version" : {
    "distribution" : "opensearch",
    "number" : "2.6.0",
    "build_type" : "tar",
    "build_hash" : "7203a5af21a8a009aece1474446b437a3c674db6",
    "build_date" : "2023-02-24T18:57:04.388618985Z",
    "build_snapshot" : false,
    "lucene_version" : "9.5.0",
    "minimum_wire_compatibility_version" : "7.10.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },

Additional context
We originally found and reproduced this issue on an Elasticsearch 7.10 cluster, and we have confirmed that it is still present in the latest version of OpenSearch (2.6.0 at the time of writing).

It is likely that a fix to this issue would be considered a breaking change.

Labels: Indexing, bug, v2.9.0, v3.0.0