Conversation
|
We do plan on allowing nested objects in the future; however, we cannot simply revert the existing behavior and change extractors, because that will break existing users' setups. Pipeline functions might need changes as well. In addition, displaying nested objects opens up all kinds of issues, too, which aren't addressed here. Data semantics are also an issue: for example, to make sense of arrays in nested objects, we will need to support nested aggregations, which is not straightforward to implement because the query engine needs the exact mapping type or it will generate invalid queries. I'm reluctant to close this PR because I support the goal; however, I believe that nested object support is much more involved than just allowing indexing them. I'll leave it open for now for the sake of discussion, but please don't be disappointed if this does not get merged any time soon. |
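The point about arrays in nested objects can be illustrated with a small sketch. It is plain Python rather than Graylog or Elasticsearch code, and the helper names (`flatten_object_array`, `matches_flat`, `matches_nested`) and the sample data are made up for illustration. It mimics how an array of objects behaves when it is indexed as a plain object (values collapsed into multi-valued fields) compared with keeping each array element as its own unit, which is roughly what a nested mapping provides.

```python
def flatten_object_array(field, items):
    """Collapse an array of objects into multi-valued dotted fields.

    The pairing between values of the same array element is lost.
    """
    flat = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def matches_flat(flat, field, criteria):
    """Match every criterion independently against the flattened values."""
    return all(
        value in flat.get(f"{field}.{key}", [])
        for key, value in criteria.items()
    )


def matches_nested(items, criteria):
    """Match only if a single array element satisfies all criteria."""
    return any(
        all(item.get(key) == value for key, value in criteria.items())
        for item in items
    )


# Hypothetical sample data.
users = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]
flat = flatten_object_array("user", users)
query = {"first": "Alice", "last": "Smith"}  # no such user exists

false_positive = matches_flat(flat, "user", query)
per_element = matches_nested(users, query)
```

In this sketch the flattened form reports a match for "Alice Smith" although no such element exists, while the per-element check does not. That difference is why queries and aggregations need to know which mapping type a field has.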
|
No, I completely understand; this PR was mostly to get the ball rolling on a feature that we need in my organization. Glad there's a plan for it! Some thoughts I had: We haven't implemented Graylog yet, so I'm happy to help with greenfield testing, but I imagine that the harder part will be supporting existing setups, as you say. I wonder if it may be useful to publish a new spec for GELF with support for arrays and objects (cf. #668). That may provide a migration path by allowing users to configure GELF inputs to use a specific GELF spec. That way, the inputs defined with the 1.1 spec would still process as expected, and 1.2 inputs could use the new functionality. IMO, displaying nested objects with dot-separated field names in the UI is pretty natural. I believe that is how Elasticsearch queries reference the fields and how Kibana displays them as well. That may cause issues, though, because there wouldn't be a visible difference between field names with dots in them and fields of nested objects. I ran into the problem with defining the data structure up front pretty quickly while testing this. It would most likely have to be defined up front for aggregation. I saw a proposal on IRC (I believe from @gimmic) to have a way to define the Elasticsearch index templates from the web app. That may help to make it easier to define the data structure for the indices up front so that the log messages are ingested correctly. I guess we'd also have to come up with a syntax for doing advanced queries of nested objects. (Elasticsearch/Lucene doesn't even have that, cf. elastic/elasticsearch#11322.) Thanks for opening it up for discussion! |
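The display ambiguity described above can be sketched as follows. This is a minimal illustration in plain Python, not Graylog's UI code; the `flatten` helper and the sample field names are hypothetical.

```python
def flatten(obj, prefix=""):
    """Turn a nested dict into a flat dict with dot-separated field names."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# A real nested object and a flat field whose name merely contains a dot.
nested = {"source": {"ip": "10.0.0.1"}}
dotted = {"source.ip": "10.0.0.1"}

shown_nested = flatten(nested)
shown_dotted = flatten(dotted)
```

Both messages end up displayed as a single `source.ip` field, so a viewer of the flattened form cannot tell which of the two structures was sent.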
|
I do think it is strange having Graylog rename fields for backwards compatibility that is no longer required. The field rename has confused me with parsing rules more than once, until I looked at the raw input/field value (because of the way the Graylog web interface shows the renamed value). As backwards compatibility is dropped, it would make sense to sunset this transparent-renaming feature as well, maybe invert it with a flag, or provide a checkbox in the UI to enable/disable this feature. I also obviously support editing of templates via the web UI, but that's an entirely new feature request. This would also allow for use of the actual Elastic Common Schema (ECS), which uses nested field names as well, via periods. |
|
Sorry, this turned into an essay :/ Indeed, ECS support (or something similar to it) is one of the motivations for supporting them. This was bad then and is bad now, IMHO, because it needs to be done just prior to sending it to Elasticsearch due to backwards compatibility; however, no one ever came up with a better way of doing it, so we never changed it again. Now the philosophy part:
Nested objects as in Elasticsearch's mapping type are rarely what people really want, at least not in logging, as it has to create one document per array index and comes with a few restrictions. That being said, scenarios where I've heard requests for nested field support are ECS support, generic JSON documents, and messages that have fields containing a list of things, such as a stack trace. ECS support falls into scenario 1 from above; it provides a logical grouping that enables treating certain attributes as one, which totally makes sense from a processing and display point of view. Generic JSON support is something that sounds great on paper, but then it's rare that people can actually articulate how they intend to make sense of it. Querying is difficult, and aggregations are hopelessly complex to express (do you need JSONPath to extract something, how about arrays, objects, what about data types like IP addresses that are represented as simple strings, etc.). The same goes for XML messages. Things like stack traces are interesting, because they are a combination of "contains interesting information" and "I just need to display it properly for analysis". Now, where does this leave us? Hope that sheds some light on what we are thinking about this. |
The backwards compatibility concern really is that we cannot silently change behavior in running setups. |
|
Yeah, I get it. At some point you will simply have to introduce a breaking change. Thankfully, this is a fairly minor breaking change compared to some of the stunts Elastic have pulled over the years. I think given sufficient notification this should be something the community could generally fix or otherwise have an option to toggle. The old functionality could be emulated on upgraded instances. |
|
@kroepke, thanks for taking the time to explain your thoughts. My eventual goal is to be able to send messages using ECS as well. I would be fine with just having dotted field names for now until we can get actual nested objects (fake it 'til you make it.) For this PR, would it be enough to add an option to rename |
Description
This removes the features that filter and rename field names on ingestion.
Motivation and Context
Now that Elasticsearch supports dots in field names to delimit nested objects and Graylog 3.0 has dropped support for Elasticsearch versions without this feature, this PR re-enables dots in field names.
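As a rough illustration of what "dots in field names delimit nested objects" means for a message, the sketch below expands dotted field names into a nested structure. It is plain Python with a hypothetical helper (`expand_dots`) and made-up field names, not the code changed by this PR. The conflict check reflects an assumption of this sketch: a name cannot be both a plain value and a parent object in the same message.

```python
def expand_dots(fields):
    """Expand dot-separated field names into nested dicts."""
    root = {}
    for name, value in fields.items():
        *parents, leaf = name.split(".")
        node = root
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                # e.g. "http" was already set to a plain value
                raise ValueError(f"{name!r}: {part!r} is already a plain field")
            node = child
        if isinstance(node.get(leaf), dict):
            # e.g. "http" is already an object holding "http.status"
            raise ValueError(f"{name!r} is already an object")
        node[leaf] = value
    return root


# Hypothetical message fields.
message = {"http.status": 200, "http.method": "GET", "source": "web01"}
expanded = expand_dots(message)
```

Here `http.status` and `http.method` end up grouped under one `http` object, while `source` stays a top-level field. A message that carries both `http` and `http.status` raises an error in this sketch.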
Fixes #4583
How Has This Been Tested?
The tests should all pass. I have also done an informal test in my environment, and the nested object mappings are created in Elasticsearch as expected.
Screenshots (if appropriate):
Types of changes
Checklist:
I am not sure how the documentation should be modified, but I created a proposed file change in the documentation repo to be published if this is accepted in Graylog 3.1.0.