GelfMessageFormatter doesnt truncate large data #751

Closed
rubao opened this issue Mar 23, 2016 · 3 comments

rubao commented Mar 23, 2016

monolog/monolog/src/Monolog/Formatter/GelfMessageFormatter.php only converts a value with toJson() when it is not a scalar value; it does not truncate the result.

But in the case of big exceptions with many previous exceptions, we have a problem in Graylog, because it uses Elasticsearch to store the data and Elasticsearch accepts at most 32766 bytes per term.
See https://issues.apache.org/jira/browse/LUCENE-5472

In Graylog we got some indexer failures:
IllegalArgumentException[Document contains at least one immense term in field="ctxt_exception" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[123, 34, 99, 108, 97, 115, 115, 34, 58, 34, 69, 120, 99, 101, 112, 116, 105, 111, 110, 34, 44, 34, 109, 101, 115, 115, 97, 103, 101, 34]...', original message: bytes can be at most 32766 in length; got 33656]; nested: MaxBytesLengthExceededException[bytes can be at most 32766 in length; got 33656];

As a result, the message is not logged in Graylog.
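
For illustration, here is a minimal sketch of capping each field at a byte limit before it is sent. The helper names `truncateUtf8Bytes` and `capFields` are hypothetical and are not part of Monolog; the sketch assumes the values are valid UTF-8 and uses 32766 only because that is the number quoted in the indexer failure.

```php
<?php
// Hypothetical helper, not part of Monolog: cut a UTF-8 string to at most
// $maxBytes bytes without splitting a multi-byte character.
function truncateUtf8Bytes(string $value, int $maxBytes): string
{
    if (strlen($value) <= $maxBytes) {
        return $value;
    }
    $end = max(0, $maxBytes);
    // $value[$end] is the first byte that would be dropped. If it is a
    // continuation byte (10xxxxxx), the cut falls inside a character, so
    // back up to the start of that character.
    while ($end > 0 && (ord($value[$end]) & 0xC0) === 0x80) {
        $end--;
    }
    return substr($value, 0, $end);
}

// Hypothetical helper: JSON-encode non-scalar values, then cap every field.
// 32766 is the limit quoted in the indexer failure.
function capFields(array $fields, int $maxBytes = 32766): array
{
    $capped = [];
    foreach ($fields as $key => $value) {
        $text = is_scalar($value) || $value === null
            ? (string) $value
            : (string) json_encode($value);
        $capped[$key] = truncateUtf8Bytes($text, $maxBytes);
    }
    return $capped;
}

$fields = capFields(['exception' => str_repeat(' ', 32767), 'code' => 42]);
```

Note that cutting a JSON-encoded value this way leaves invalid JSON behind, so the stored field is only useful as plain text.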

failing test:

public function testFormatWithLargeData()
{
    $formatter = new GelfMessageFormatter();
    $record = array(
        'level' => Logger::ERROR,
        'level_name' => 'ERROR',
        'channel' => 'meh',
        // One byte over the 32766-byte limit reported by Elasticsearch.
        'context' => array('exception' => str_repeat(' ', 32767)),
        'datetime' => new \DateTime("@0"),
        'extra' => array('key' => str_repeat(' ', 32767)),
        'message' => 'log',
    );
    $message = $formatter->format($record);
    $messageArray = $message->toArray();

    // Extra fields come back as "_<key>", context fields as "_ctxt_<key>".
    $this->assertLessThanOrEqual(32766, strlen($messageArray['_key']));
    $this->assertLessThanOrEqual(32766, strlen($messageArray['_ctxt_exception']));
}

Seldaek closed this as completed in 6bc1a44 on Apr 2, 2016

Tobion commented May 19, 2016

@Seldaek I don't think it's Monolog's task to handle this; it must be fixed at the Elasticsearch level of Graylog. They have a ticket for that: Graylog2/graylog2-server#873

Also, the solution implemented here looks really strange. The 32766 maximum is per token, and the tokenization depends on the ES configuration per field. But the implementation seems to be doing something based on the total length. And even if it were done per field, Monolog would be making strange assumptions about the tokenization of the field, which it cannot know.
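
To make the distinction between "total length" and "per field" concrete, here is a rough sketch of the two strategies. Both functions (`capPerField`, `capByTotal`) are hypothetical illustrations and neither is Monolog's actual code. Neither knows how Elasticsearch will tokenize a field, which is the underlying objection.

```php
<?php
// Hypothetical: cap each field independently at $maxBytes.
function capPerField(array $fields, int $maxBytes): array
{
    $out = [];
    foreach ($fields as $key => $value) {
        $out[$key] = substr((string) $value, 0, $maxBytes);
    }
    return $out;
}

// Hypothetical: spend one shared byte budget across all fields, in order.
// Later fields get whatever is left, possibly nothing.
function capByTotal(array $fields, int $maxBytes): array
{
    $out = [];
    $remaining = $maxBytes;
    foreach ($fields as $key => $value) {
        $out[$key] = substr((string) $value, 0, $remaining);
        $remaining -= strlen($out[$key]);
    }
    return $out;
}

$input = ['a' => str_repeat('x', 30), 'b' => str_repeat('y', 30)];
$perField = capPerField($input, 40); // both fields fit under 40 bytes
$byTotal = capByTotal($input, 40);   // 'b' is cut although it is under 40 bytes
```

With a shared budget, a field can be cut even though it is well under the limit on its own, simply because earlier fields used up the allowance.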


enleur commented Apr 7, 2017

Can we revert this or at least make it more configurable?


Seldaek commented Apr 7, 2017

I stand by my last comment on the commit 6bc1a44#commitcomment-17544637 - if someone can improve this and actually uses gelf, please send a PR (to 1.x branch!) :)

@enleur enleur mentioned this issue Apr 7, 2017