Fields with length > 32766 bytes #1592

Closed
delfer opened this issue Nov 30, 2015 · 1 comment

delfer commented Nov 30, 2015

I am using the JSON extractor, and one of my fields can be longer than 16383 UTF-8 characters.
Elasticsearch gives me this error:

IllegalArgumentException[Document contains at least one immense term in field="msg.response" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[110, 116, 114, 121, 62, 60, 107, 101, 121, 62, 115, 101, 114, 118, 105, 99, 101, 95, 116, 121, 112, 101, 60, 47, 107, 101, 121, 62, 60, 118]...', original message: bytes can be at most 32766 in length; got 38507]; nested: MaxBytesLengthExceededException[bytes can be at most 32766 in length; got 38507];

As a result, log entries are lost.
There are multiple ways to resolve this:

  1. Do not index fields larger than 32766 bytes
  2. Truncate these fields (maybe with some kind of cut extractor)
  3. Split one long field into multiple short ones
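For illustration, options 2 and 3 could be sketched outside Graylog roughly like this. This is only a minimal Python sketch, not Graylog code: the names `MAX_TERM_BYTES`, `truncate_utf8`, and `split_utf8` are made up for the example, and the limit is the one quoted in the error message. It counts UTF-8 bytes rather than characters and never cuts in the middle of a multi-byte character.

```python
MAX_TERM_BYTES = 32766  # limit quoted in the Elasticsearch error above


def truncate_utf8(text: str, limit: int = MAX_TERM_BYTES) -> str:
    """Return the longest prefix of text whose UTF-8 encoding fits in limit bytes."""
    data = text.encode("utf-8")
    if len(data) <= limit:
        return text
    # The byte slice may end inside a multi-byte character;
    # errors="ignore" drops that trailing partial sequence.
    return data[:limit].decode("utf-8", errors="ignore")


def split_utf8(text: str, limit: int = MAX_TERM_BYTES) -> list:
    """Split text into consecutive chunks of at most limit UTF-8 bytes each."""
    if limit < 4:
        # A single UTF-8 character can need up to 4 bytes.
        raise ValueError("limit must be at least 4 bytes")
    chunks = []
    rest = text
    while rest:
        # A chunk of <= limit bytes has <= limit characters,
        # so only the first `limit` characters need to be examined.
        head = truncate_utf8(rest[:limit], limit)
        chunks.append(head)
        rest = rest[len(head):]
    return chunks
```

With something like this, the chunks could be stored as `response0`, `response1`, and so on, and joining them in order gives back the original value.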

I found only one 'dummy' solution:

{
  "extractors": [
    {
      "condition_type": "regex",
      "condition_value": "^.{16383,}$",
      "converters": [],
      "cursor_strategy": "cut",
      "extractor_config": {
        "regex_value": "^(.{0,16383})"
      },
      "extractor_type": "regex",
      "order": 0,
      "source_field": "msg.response",
      "target_field": "response0",
      "title": "response cutter 0"
    },
    ...
    {
      "condition_type": "regex",
      "condition_value": "^.{16383,}$",
      "converters": [],
      "cursor_strategy": "cut",
      "extractor_config": {
        "regex_value": "^(.{0,16383})"
      },
      "extractor_type": "regex",
      "order": 0,
      "source_field": "msg.response",
      "target_field": "response9",
      "title": "response cutter 9"
    }
  ],
  "version": "1.2.2 (91c7822)"
}
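One caveat with this workaround: `.{0,16383}` counts characters, while the limit in the error is in bytes. 16383 characters fit in 32766 bytes only if every character takes at most 2 bytes in UTF-8. A quick check, using Python's `re` purely for illustration (Graylog uses Java regular expressions, so this is an assumption about equivalent behaviour for these simple inputs; `first_chunk_bytes` is a made-up helper):

```python
import re

# Same pattern as "regex_value" in the extractor config above.
PATTERN = re.compile(r"^(.{0,16383})")


def first_chunk_bytes(value: str) -> int:
    """UTF-8 size of the first capture group matched against value."""
    match = PATTERN.match(value)  # always matches, since zero characters are allowed
    return len(match.group(1).encode("utf-8"))


print(first_chunk_bytes("a" * 20000))  # 16383 (1 byte per character)
print(first_chunk_bytes("é" * 20000))  # 32766 (2 bytes per character, exactly at the limit)
print(first_chunk_bytes("€" * 20000))  # 49149 (3 bytes per character, over the limit)
```

So a chunk cut by character count can still exceed 32766 bytes when the text contains characters that need 3 or 4 bytes.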

Please provide a useful solution. Thank you.


joschi commented Nov 30, 2015

Duplicate of #873.

joschi closed this as completed Nov 30, 2015