I would prefer I think Just my 2¢ :-) |
|
Thanks a lot for the feedback, that is super useful. Currently the way I differentiate between logs and metrics is that metrics are pulled on a predefined period and logs are pushed, but this line is very blurry. Based on this, I'm thinking the above would also be logs. But I see that a user is in this case not going to look for the "raw event" under The initial reason I wanted to get it out of the event prefix was that it felt like the only object in there that does not contain meta information but actual data. I'm also ok with leaving it where it is. @webmat Out of curiosity: Would you categorise operational events under logs, metrics or its own category? |
|
@webmat Working on elastic/beats#7207 I realised |
For the type of data I was dealing with, I would definitely say "events", in the sense that it was not just numerical data. I was processing email addresses, IP addresses, email headers and email categories. So full-on events with lots of juicy data ;-) On the other hand, I don't have a strong attachment to the "event" part of event.raw. It could totally be something else equally generic, like * Well, depending on how broadly you define log. But I think most people think of text files when they hear "log". And "most people" is our audience for this ;-) |
|
We have some data sources that aren't logs. Some are database scrapes and some are API calls we make to a service and then index what comes out. Not sure Indexing the original data (before it is parsed and mutilated) as |
|
Based on the feedback above I reverted the change to For My suggestion is now to only introduce |
|
Yes that part I agree with. Makes total sense, especially wrt reconstructed messages (like multiline). |
schemas/log.yml
Outdated
In contrast to the `message` field which can contain an extracted part
of the log message, this field contains the raw log message and should
not be processed. It can have already some modifications like encoding
This sounds like a contradiction: 'raw log message' and 'not be processed' vs. 'can have already some modifications'.
Agree. Any suggestion for a better description instead of raw we could use here?
CHANGELOG.md
Outdated
### Breaking changes

* Rename `event.raw` to `log.message`. #3
This change should be removed now that event.raw stays as is.
fixed, thanks for the review.
|
PR updated with disabling |
README.md
Outdated
| <a name="log.level"></a>`log.level` | Log level of the log event.<br/>Some examples are `WARN`, `ERR`, `INFO`. | keyword | | `ERR` |
| <a name="log.line"></a>`log.line` | Line number the log event was collected from. | long | | `18` |
| <a name="log.offset"></a>`log.offset` | Offset of the beginning of the log event. | long | | `12` |
| <a name="log.message"></a>`log.message` | This is the log message and contains the full log message before splitting it up in multiple parts.<br/>In contrast to the `message` field which can contain an extracted part of the log message, this field contains the raw log message and should not be processed. It can have already some modifications like encoding applied or new lines removed to clean up the log message.<br/>This field is not index and doc_values are disabled so it can't be queried but the value can be retrieved from `_source`. | keyword | | `Sep 19 08:26:10 localhost My log` |
"not index" -> "not indexed"
Also, w/ regards to my previous comment: "[..] this field contains the raw log message and should not be processed." -> "[..] this field contains the original, full log message."
The field `log.message` contains the full log message before splitting it up in multiple parts. In contrast to the `message` field which can contain an extracted part of the log message, this field contains the original, full log message. It can have already some modifications applied like encoding or new lines removed to clean up the log message. This field is not indexed and doc_values are disabled so it can't be queried but the value can be retrieved from `_source`.
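To illustrate the distinction, a document following this description might look roughly like the sketch below. Only the `log.message` value is taken from the README example; the extracted `message` value `My log` is an assumption made for illustration, since the actual split depends on how the log line is parsed.

```json
{
  "message": "My log",
  "log": {
    "message": "Sep 19 08:26:10 localhost My log"
  }
}
```

In this sketch `message` can be searched as usual, while `log.message` is only available when reading the document's `_source`.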
|
PR rebased, commit and PR message updated, fixes applied. Ready for another review. |
},
"message": {
"doc_values": false,
"ignore_above": 1024,
For logs, ignore_above: 1024 will be too small.
I checked the docs for this and ran some tests: https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html
It seems like ignore_above does not play any role here as the field is not indexed anyway. _source is not affected by ignore_above.
Good. Thanks for testing.
Maybe we shouldn't even set an ignore_above when index: false? I imagine others having a similar reaction to mine.
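For reference, a minimal sketch of what the mapping could look like with `ignore_above` dropped, as suggested above. It assumes the typeless mapping format of Elasticsearch 7.x and later, and the surrounding `mappings`/`properties` nesting is illustrative rather than the template generated in this PR.

```json
{
  "mappings": {
    "properties": {
      "log": {
        "properties": {
          "message": {
            "type": "keyword",
            "index": false,
            "doc_values": false
          }
        }
      }
    }
  }
}
```

With `index` and `doc_values` both disabled, the value cannot be queried or aggregated on, but it is still returned as part of `_source`.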