String fields longer than 32kb cannot be indexed #873

kroepke · 2015-01-14T17:03:25Z

Elasticsearch has an upper limit for term length, so trying to index values longer than ~32kb fails with an error.
Find a way to store those values but not trying to analyze them.

kroepke · 2015-01-14T17:05:38Z

FWIW here's how "other" people deal with it: http://answers.splunk.com/answers/136664/changing-max-length-of-field.html

razvanphp · 2015-03-20T09:33:37Z

+1

bernd · 2015-08-11T10:21:50Z

Removing this from 1.2, not gonna make it. Sorry.

delfer · 2015-12-01T08:13:49Z

Found one else 'dummy' solution. Cut tail from long field into another field and then replace this field by any other field.

{
  "extractors": [
    {
      "condition_type": "regex",
      "condition_value": "^.{16383,}$",
      "converters": [],
      "cursor_strategy": "cut",
      "extractor_config": {
        "regex_value": "^.{0,16383}(.*)"
      },
      "extractor_type": "regex",
      "order": 0,
      "source_field": "msg.response",
      "target_field": "responseTail",
      "title": "cut response"
    },
    {
      "condition_type": "none",
      "condition_value": "",
      "converters": [],
      "cursor_strategy": "copy",
      "extractor_config": {},
      "extractor_type": "copy_input",
      "order": 0,
      "source_field": "gl2_remote_ip",
      "target_field": "responseTail",
      "title": "replace responseTail by server IP"
    }
  ],
  "version": "1.2.2 (91c7822)"
}

ghost · 2016-01-28T16:03:27Z

+1

Same issue:
2016-01-28T17:03:43.165+01:00 ERROR [Messages] Failed to index [1] messages. Please check the index error log in your web interface for the reason. Error: failure in bulk execution:
[13]: index [graylog2_4], type [message], id [b31f5110-c5d8-11e5-8227-001a4a777b5d], message [IllegalArgumentException[Document contains at least one immense term in field="other" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[49, 50, 51, 52, 53, 54, 55, 56, 57, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 48]...', original message: bytes can be at most 32766 in length; got 186000]; nested: MaxBytesLengthExceededException[bytes can be at most 32766 in length; got 186000]; ]

My Gelf Message:
{ "message": "OK", "other": "more than 32k...." }

joschi · 2016-01-28T16:14:51Z

@kablz Please try creating a custom mapping (https://www.elastic.co/guide/en/elasticsearch/reference/1.7/mapping.html) for the other field with "index": "no" (https://www.elastic.co/guide/en/elasticsearch/reference/1.7/mapping-core-types.html) and add it to an index template (https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-templates.html).

ghost · 2016-02-01T12:02:46Z

@joschi Thanks for the links.
Is there somewhere a good definition from the GELF? https://www.graylog.org/resources/gelf/
Like which field has which data type and limitations. That would help us a lot.
Now we find out such "limitations" by reverse engineering (Bugs and testing).

joschi · 2016-02-01T13:39:29Z

@kablz The GELF specification can be found at https://www.graylog.org/resources/gelf/ and describes the names and types of mandatory fields in a GELF message. Additional fields (see specification) naturally don't have a fixed schema unless you enforce that on your GELF producers.

csquire · 2016-03-22T16:45:29Z

You could try using an index template which includes a dynamic template that matches all string fields, then use 'ignore_above' to prevent the document from failing to index. Below is a template I use on another Elasticsearch cluster I feed logs to (not Graylog) where I was getting rejections from long fields, such as Java stacktraces. For my purposes, I didn't find it useful to index any field that has over 512 characters, but that value can be tweaked to whatever you like. The other settings could be removed or changed as desired.

Indices Templates
Index Mapping

{
  "logs_template": {
    "template": "logs*",
    "mappings": {
      "_default_": {
        "_all": {
          "enabled": false
        },
        "dynamic_templates": [
          {
            "notanalyzed": {
              "match": "*",
              "match_mapping_type": "string",
              "mapping": {
                "ignore_above": 512,
                "type": "string",
                "index": "not_analyzed",
                "doc_values": true
              }
            }
          }
        ]
      }
    }
  }
}

From the docs:

The analyzer will ignore strings larger than this size. Useful for generic not_analyzed fields that should ignore long text.

This option is also useful for protecting against Lucene’s term byte-length limit of 32766. Note: the value for ignore_above is the character count, but Lucene counts bytes, so if you have UTF-8 text, you may want to set the limit to 32766 / 3 = 10922 since UTF-8 characters may occupy at most 3 bytes.

meixger · 2016-04-14T09:28:35Z

@csquire Nice, and this would work on a custom template, but unfortunately i've not found a way to replace the store_generic template in the default graylog-internal template:

{
  "graylog-internal": {
    "order": -2147483648,
    "template": "graylog_*",
    "mappings": {
      "message": {
        ...
        "dynamic_templates": [
          {
            "internal_fields": {
              ...
              "match": "gl2_*"
            }
          },
          {
            "store_generic": {
              "mapping": {
                "index": "not_analyzed",
              },
              "match": "*"
            }
          }
        ],
        "properties": {
          ...
        }
      }
    }
  }
}

What would speak against adding a ignore_above as default?

...
"store_generic": {
  "mapping": {
    "index": "not_analyzed",
    "ignore_above": 32766
  },
  "match": "*"
}
...

sjoerdmulder · 2016-11-15T11:57:13Z

I also hit this issue on Graylog 2.1.1, ignore_above seems a good option that will fix this.

mike-daoust · 2017-03-02T21:51:32Z

this is an issue for me also

listingmirror · 2017-05-09T13:40:32Z

I hit this and my entire cluster dies (stops getting new messages). Shouldn't there be some kind of default limit to prevent cluster death? (Maybe this didnt kill cluster, still researching)

james-gonzalez · 2017-05-10T09:53:05Z

Same as @listingmirror. Anytime we get a larger than normal java stack trace, it's bringing down Graylog completely. Not indexing documents anymore, no new logs. The only solution I've found so far is to kill -9 and then delete the on-disk journal.

jebucha · 2017-05-31T17:42:43Z

I believe we are also running in to this issue. I hadn't connected the dots but I'm seeing indexing failures "Document contains at least one immense term in field="full_message"", and the node that threw that error is not currently processing incoming messages, just queuing them up, backed up by 2 million and counting. As with others, my primary resolution has been to restart the service.

Aenima4six2 · 2017-08-15T22:32:58Z

We were getting this issue in pre Graylog 2.3 versions and fixed the issue using @joschi's advice above (using custom mappings). However, with our recent upgrade to Graylog 2.3, the issue is back, even though a custom mapping is present in ES preventing the field from being indexed.

Current Error

{"type":"illegal_argument_exception","reason":"DocValuesField \"requestContent\" is too large, must be <= 32766"}
--

Old (Pre 2.3) Error

{"type":"illegal_argument_exception","reason":"Document contains at least one immense term in field=\"requestContent\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[45, 45, 45, 32, 82, 101, 113, 117, 101, 115, 116, 32, 72, 101, 97, 100, 101, 114, 115, 32, 45, 45, 45, 13, 10, 67, 111, 110, 110, 101]...', original message: bytes can be at most 32766 in length; got 38345","caused_by":{"type":"max_bytes_length_exceeded_exception","reason":"max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 38345"}}

Pushed the following custom mapping to ES to address this, but no luck.

curl -X PUT -d '{ "template": "graylog_*", "mappings" : { "message" : { "properties" : { "requestContent" : { "type" : "string", "index" : "no" } } } } }' http://localhost:9200/_template/graylog-custom-mapping?pretty

curl -X GET 'http://localhost:9200/graylog_deflector/_mapping?pretty' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12743  100 12743    0     0  2107k      0 --:--:-- --:--:-- --:--:-- 2488k
{
  "graylog_5": {
    "mappings": {
      "message": {
        "dynamic_templates": [
          {
            "internal_fields": {
              "match": "gl2_*",
              "mapping": {
                "type": "keyword"
              }
            }
          },
          {
            "store_generic": {
              "match": "*",
              "mapping": {
                "index": "not_analyzed"
              }
            }
          }
        ],
        "properties": {
          "AccountName": {
            "type": "keyword"
          },
          ...
          "requestContent": {
            "type": "keyword",
            "index": false
          },
         ...
        }
      }
    }
  }
}

UPDATE
Think i found a viable solution.
https://www.elastic.co/guide/en/elasticsearch/reference/current/doc-values.html

curl -X PUT http://localhost:9200/_template/graylog-custom-mapping?pretty -d '
{
  "template": "graylog_*",
  "mappings" : {
    "message" : {
      "properties" : {
        "requestContent" : {
          "type" : "string",
          "index" : "no",
          "doc_values": false ---> turn this off.. ES 5.5 appears to have a 32k size limit.
        }
      }
    }
  }
}'

avdhoot · 2017-09-25T05:27:16Z

@Aenima4six2 thanks for above solution.
+1

Ayyappa752 · 2018-02-19T10:15:41Z

Hi @csquire , I ran into the same problem when I'm storing a html template in elastic. I have tried not to index it, and to increase the size by using "ignoreabove":512. But that didn't work. finally i have to use "doc_values": true along with the size. how come this doc_values solved the issue?

zhangtemplar · 2018-12-11T09:54:01Z

@Aenima4six2 recent version of ElasticSearch does not allow you to change the type, index and/or doc_values.

However using ignore_above works. Here is the command:

curl -XPUT 'http://localhost:9200/graylog_0/_mapping/message' -d '
{
    "message" : {
      "properties" : {
        "screenShot" : {
          "type" : "keyword",
          "ignore_above": 32000
        }
      }
    }
}
'

joginsky · 2020-03-26T07:56:59Z

Hi all,

I have the same problem... ignore_above seems to be working but not for "type": "text". Any idea how to solve this for text type?

aimhighrana · 2020-08-29T16:43:47Z

Hello ,

Same issue , max_bytes_length_exceeded_exception for large text data..

Thanks

bghira · 2023-08-28T20:20:05Z

any chance some eyes on this one could happen in the next decade? :)

kroepke added the elasticsearch label Jan 14, 2015

kroepke added this to the 1.1.0 milestone Jan 14, 2015

joschi mentioned this issue Jan 14, 2015

MaxBytesLengthExceededException -- too long field length #745

Closed

kroepke modified the milestones: 1.2.0, 1.1.0 May 29, 2015

bernd removed this from the 1.2.0 milestone Aug 11, 2015

joschi mentioned this issue Aug 26, 2015

ERROR [NettyContainer] Uncaught exception during jersey resource handling #1259

Closed

joschi mentioned this issue Nov 30, 2015

Fields with lenght > 32766 bytes #1592

Closed

joschi mentioned this issue Feb 17, 2016

Indexer Failure: MaxBytesLengthExceededException[bytes can be at most 32766 in length #1827

Closed

bernd added the feature label Feb 26, 2016

Tobion mentioned this issue May 19, 2016

GelfMessageFormatter doesnt truncate large data Seldaek/monolog#751

Closed

emsearcy mentioned this issue Nov 11, 2016

MaxBytesLengthExceededException emsearcy/fluent-plugin-gelf#26

Open

jalogisch added the triaged label Jan 12, 2017

dennisoelkers mentioned this issue Jan 12, 2017

Field mapping management #1443

Open

0cjs mentioned this issue Jul 19, 2017

Change Write([]byte) to log full message to message field Graylog2/go-gelf#11

Open

kkamkou mentioned this issue Apr 12, 2018

Message Length clarification kkamkou/node-gelf-pro#40

Closed

jalogisch mentioned this issue Jun 27, 2018

Processing function length of field #4871

Open

txau mentioned this issue Apr 20, 2022

Text field character limits huridocs/uwazi#4520

Closed

ili101 mentioned this issue Jan 31, 2024

How to connect to Data Node to fix long messages not logged Graylog2/docker-compose#61

Open

osiegmar mentioned this issue Jun 6, 2024

Configurable max value length for key-value pairs and MDC osiegmar/logback-gelf#106

Closed

bmichalkiewicz mentioned this issue Aug 28, 2024

Gelf output plugin: improved docs and add missing flags kube-logging/logging-operator#1803

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

String fields longer than 32kb cannot be indexed #873

String fields longer than 32kb cannot be indexed #873

kroepke commented Jan 14, 2015

kroepke commented Jan 14, 2015

razvanphp commented Mar 20, 2015

bernd commented Aug 11, 2015

delfer commented Dec 1, 2015

ghost commented Jan 28, 2016

joschi commented Jan 28, 2016

ghost commented Feb 1, 2016

joschi commented Feb 1, 2016

csquire commented Mar 22, 2016

meixger commented Apr 14, 2016

sjoerdmulder commented Nov 15, 2016 •

edited

Loading

mike-daoust commented Mar 2, 2017

listingmirror commented May 9, 2017 •

edited

Loading

james-gonzalez commented May 10, 2017

jebucha commented May 31, 2017

Aenima4six2 commented Aug 15, 2017 •

edited

Loading

avdhoot commented Sep 25, 2017

Ayyappa752 commented Feb 19, 2018

zhangtemplar commented Dec 11, 2018

joginsky commented Mar 26, 2020

aimhighrana commented Aug 29, 2020

bghira commented Aug 28, 2023

String fields longer than 32kb cannot be indexed #873

String fields longer than 32kb cannot be indexed #873

Comments

kroepke commented Jan 14, 2015

kroepke commented Jan 14, 2015

razvanphp commented Mar 20, 2015

bernd commented Aug 11, 2015

delfer commented Dec 1, 2015

ghost commented Jan 28, 2016

joschi commented Jan 28, 2016

ghost commented Feb 1, 2016

joschi commented Feb 1, 2016

csquire commented Mar 22, 2016

meixger commented Apr 14, 2016

sjoerdmulder commented Nov 15, 2016 • edited Loading

mike-daoust commented Mar 2, 2017

listingmirror commented May 9, 2017 • edited Loading

james-gonzalez commented May 10, 2017

jebucha commented May 31, 2017

Aenima4six2 commented Aug 15, 2017 • edited Loading

avdhoot commented Sep 25, 2017

Ayyappa752 commented Feb 19, 2018

zhangtemplar commented Dec 11, 2018

joginsky commented Mar 26, 2020

aimhighrana commented Aug 29, 2020

bghira commented Aug 28, 2023

sjoerdmulder commented Nov 15, 2016 •

edited

Loading

listingmirror commented May 9, 2017 •

edited

Loading

Aenima4six2 commented Aug 15, 2017 •

edited

Loading