Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows event_data format is difficult to consume #32952

Open
strawgate opened this issue May 8, 2024 · 26 comments
Open

Windows event_data format is difficult to consume #32952

strawgate opened this issue May 8, 2024 · 26 comments

Comments

@strawgate
Copy link
Contributor

strawgate commented May 8, 2024

Component(s)

receiver/windowseventlog

Describe the issue you're reporting

Prior to #28587 the format for event_data on windows events was easy to consume as it was a map:

    <EventData>
        <Data Name="ProcessId">7924</Data>
        <Data Name="Application">\device\harddiskvolume2\users\test\desktop\netcat\nc.exe</Data>
        <Data Name="SourceAddress">0.0.0.0</Data>
        <Data Name="SourcePort">5555</Data>
        <Data Name="Protocol">6</Data>
        <Data Name="FilterRTID">84614</Data>
        <Data Name="LayerName">%%14608</Data>
        <Data Name="LayerRTID">36</Data>
    </EventData>

would produce

  "event_data": {
    "ProcessId": "7924",
    "Application": "\\device\\harddiskvolume2\\users\test\\desktop\\netcat\\nc.exe",
    "SourceAddress": "0.0.0.0",
    "SourcePort": "5555",
    "Protocol": "6",
    "FilterRTID": "84614",
    "LayerName": "%%14608",
    "LayerRTID": "36"
  }

In order to support xml entries with no names, this PR #28587 switched the format to be an array of maps:

{
  "event_data": {
    "data": [
      {
        "ProcessId": "7924"
      },
      {
        "Application": "\\device\\harddiskvolume2\\users\test\\desktop\\netcat\\nc.exe"
      },
      {
        "SourceAddress": "0.0.0.0"
      },
      {
        "SourcePort": "5555"
      },
      {
        "Protocol": "6"
      },
      {
        "FilterRTID": "84614"
      },
      {
        "LayerName": "%%14608"
      },
      {
        "LayerRTID": "36"
      }
    ]
  }

Which seems to require a mergemaps(?) to be consumable via OTTL (though I'm not sure how that would work with multiple name-less entries).

We should consider handling the happy case (where event_data has a name and value) in a way that makes the data easy to consume.

Perhaps we could generate names (value1, value2, value3) for the name-less values to offer a consistent access experience. This would still allow whatever patterns are used today to identify the right event_data value to read while making it very easy to access by the order of the event_data.

@strawgate strawgate added the needs triage New item requiring triage label May 8, 2024
Copy link
Contributor

github-actions bot commented May 8, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@strawgate
Copy link
Contributor Author

I actually was unable to figure out how to work with the event_data in ottl as merge_maps doesn't work and flatten produces .0 .1 fields. I don't see any ottl function for merging multiple arbitrary maps so I filed #32954

@djaglowski
Copy link
Member

Thanks for reporting this @strawgate. I agree it's difficult to work with but it looks like you have identified a potential change to ottl which would help. Let us know if you have other ideas or requests to help with this.

@strawgate
Copy link
Contributor Author

Yeah though having everyone have to merge maps to look at event data doesn't seem ideal, especially when there are potential a dozen plus properties with empty keys that won't merge well.

I think we should go back to creating a single map and come up with an incrementing identifier like param1 for those unnamed parameters.

I'd be happy to try to put together a pr

@djaglowski
Copy link
Member

Or maybe we just separate named and unnamed parameters - a map for named, an array for unnamed? I don't know if there's appetite to switch it again but maybe @swiatekm-sumo @pjanotti have opinions too.

@swiatekm
Copy link
Contributor

I have very little appetite for a breaking change here, but I wouldn't mind a configuration option to switch between two well-defined formats.

@strawgate
Copy link
Contributor Author

strawgate commented May 14, 2024

I'm not sure how you would actually even use the structure that exists today.

Can you provide an example of how you can consume the data today?

As far as I can tell, you can't "find " keys or values that are in a map that's in an arbitrary array of maps so you're relying on pulling maps by the index in the array? Which I don't think is stable for named parameters but is stable for unnamed parameters.

It seems to me that it would be very hard for someone to consume as it is today as they would be taking a dependency on the order of the maps in the array?

In event_data XML these are largely accessed by value or by key and only sometimes by index. Having them in the map would provide a unified access pattern.

Perhaps the event_data unnamed keys being converted to array vs map should be a configuration?

@pjanotti
Copy link
Contributor

Thinking back to #28587 (comment) I opted to go with maps to keep it compact, ie.: avoiding naming the element fields, but, if we were to follow the actual XML event definition we would need to respect that. So if we were to follow the XML specification it would have to be something like:

{
  "event_data": {
    "data": [
      {
        "name": "ProcessId",
        "value": "7924"
      },
      {
        "name": "Application",
        "value": "\\device\\harddiskvolume2\\users\test\\desktop\\netcat\\nc.exe"
      },
    ]
  }

it means that for the Data elements without name it will be something like:

{
  "event_data": {
    "data": [
      {
        "name": "",
        "value": "7924"
      },
      {
        "name": "",
        "value": "\\device\\harddiskvolume2\\users\test\\desktop\\netcat\\nc.exe"
      },
    ]
  }

The map is really convenient, but, it is not the actual data format. The events without name are the oldest format for Windows events, but, as shown in the comment above are still used.

I want to improve the usage for @strawgate, but, let's decide this carefully to avoid having to revisit it later. 🤔

@strawgate
Copy link
Contributor Author

strawgate commented May 14, 2024

I went looking at how this works across other tools, it looks like there are 3 main methods utilized:

  1. Leave it as XML (Loki, Splunk(?), NXlog)
  2. Turn it into a dictionary with something like param1 (telegraf, elastic)
  3. Make it an array of maps with published index -> param (fluentd and fluentbit)

I think this may be a good enough argument on its own that there should be 3 modes and that perhaps leaving it as xml might make sense as the default.

The complaints around 3 (what is currently the default) is typically that it relies on the order of xml elements to produce the array of maps. Lots of event types can contain the same xml elements but in different orders.

A user who wants to pull say "TargetUserSid" using method 3 and work with it using OTTL will have to either:

  1. Use a future find_map that searches an array of maps for a map with a key or a map based on the value of a key
  2. Track the index of "TargetUserSid" across every possible event and every windows version, create some sort of switch statement to pull the right index from the array to get this field

@pjanotti
Copy link
Contributor

One idea regarding a configuration option: a versioned data struct option seems a good choice in this case. It allows users to stay on a desired format while allowing possible future breaking changes. Each version represents relatively a small piece of code and having a handful doesn't seem a problem and we can always drop some less popular options later.

@strawgate
Copy link
Contributor Author

One idea regarding a configuration option: a versioned data struct option seems a good choice in this case. It allows users to stay on a desired format while allowing possible future breaking changes. Each version represents relatively a small piece of code and having a handful doesn't seem a problem and we can always drop some less popular options later.

Are you aware of any examples of this implemented across any component that I can stare at for a while?

I may start putting together a PR for how this might look, so any guidance would be lovely.

@pjanotti
Copy link
Contributor

@strawgate sorry, I don't know one from the top of my mind. @djaglowski may know some component doing that.

@strawgate
Copy link
Contributor Author

@swiatekm-sumo do you view the introduction of the current behavior as a setting the user would have to explicitly pick as acceptable re: breaking change? As long as the actual event_data can come in the previous format with that setting?

Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@strawgate
Copy link
Contributor Author

I'm still a bit stuck on this as using the current format requires taking a dependency on an unstable feature of the event_data format (the order of the values with no keys).

I would like to address this if possible via a configuration flag that determines how event_data is "map"ped. Are there any objections to me putting together a PR that offers the alternative format?

@github-actions github-actions bot removed the Stale label Sep 14, 2024
@pjanotti
Copy link
Contributor

@strawgate making a PR will, at minimum, move the discussion ahead, however, there is the obvious risk of not being accepted. If you think the risk is too high for your time investment, let me (re)review this item before you embark on that.

@djaglowski
Copy link
Member

I went looking at how this works across other tools, it looks like there are 3 main methods utilized:

  1. Leave it as XML (Loki, Splunk(?), NXlog)
  2. Turn it into a dictionary with something like param1 (telegraf, elastic)
  3. Make it an array of maps with published index -> param (fluentd and fluentbit)

I think this may be a good enough argument on its own that there should be 3 modes and that perhaps leaving it as xml might make sense as the default.

The complaints around 3 (what is currently the default) is typically that it relies on the order of xml elements to produce the array of maps. Lots of event types can contain the same xml elements but in different orders.

With #34720 merged, I think users will now have a simple way to opt into 1. (This was sort of true before, but the raw flag also controlled whether we fully rendered the event, so the event data may not have been present in the XML.)

I want to offer a 4th option, after having spent some time on #35281

  1. Convert EventData into the a map, where elements with a Name attributes are key/value pairs, and elements without a Name attributes are items in a list. The list can be a value in the map with a fixed key, e.g. Unnamed

Starting from the original example, say we have a few unnamed elements in the result:

    <EventData>
        <Data Name="ProcessId">7924</Data>
        <Data Name="Application">\device\harddiskvolume2\users\test\desktop\netcat\nc.exe</Data>
        <Data Name="SourceAddress">0.0.0.0</Data>
        <Data Name="SourcePort">5555</Data>
        <Data Name="Protocol">6</Data>
        <Data Name="FilterRTID">84614</Data>
        <Data Name="LayerName">%%14608</Data>
        <Data Name="LayerRTID">36</Data>
        <Data>val 1</Data>
        <Data>val 2</Data>
    </EventData>

The output would be:

  "event_data": {
    "ProcessId": "7924",
    "Application": "\\device\\harddiskvolume2\\users\test\\desktop\\netcat\\nc.exe",
    "SourceAddress": "0.0.0.0",
    "SourcePort": "5555",
    "Protocol": "6",
    "FilterRTID": "84614",
    "LayerName": "%%14608",
    "LayerRTID": "36"
    "Unnamed": [ "val 1", "val 2" ]
  }

We may need to support the old format (3) for backwards compatibility, but I'm wondering if 4 would be superior to 2.

@pjanotti
Copy link
Contributor

pjanotti commented Oct 1, 2024

One observation @djaglowski, there shouldn't be a mix of named and unnamed entries in the same event. That is related to the code used to generate the actual event. So when doing option 4, if possible, the map can be created with the generated names, perhaps matching the legacy event format, IIRC, %1, %2, ...

@strawgate
Copy link
Contributor Author

This matches my understanding too that the unnamed ones should be from XP/2003 events.

@pjanotti by generated names do you mean something like param1, param2, param3? Isn't that option 2?

@atoulme atoulme removed the needs triage New item requiring triage label Oct 11, 2024
@djaglowski
Copy link
Member

there shouldn't be a mix of named and unnamed entries in the same event

Given that, I would be in favor of adding a config flag, e.g. interpret_event_data, which would automatically detect which format is used, and convert into either a map (if names are present) or array otherwise. By making this a config flag, we are not breaking the current behavior again, but users can opt into the improved behavior if they are comfortable with the fact that events from multiple sources may have slightly different schema.

@pjanotti
Copy link
Contributor

Sorry, for the delay...

@strawgate

@pjanotti by generated names do you mean something like param1, param2, param3? Isn't that option 2?

Yes, correct - just using the place holder of the message in the old format.

@djaglowski just to confirm if I'm understanding your proposal correct: if interpret_event_data is false we keep the current format. If true then the code will create a map. If the "name" attribute is empty the keys on the map will be %1, %2, and so on. If the "name" attribute is not empty we will use that as the key in the map.

@djaglowski
Copy link
Member

if interpret_event_data is false we keep the current format.

If true then the code will create a map.

I was thinking it would be a map or a slice, depending on whether the name attribute is present.

If the "name" attribute is empty the keys on the map will be %1, %2, and so on.

This is where I would have said the result is a slice containing only the values.

If the "name" attribute is not empty we will use that as the key in the map.


That said, I'm not opposed to having a map w/ some kind of numbered keys. The benefit of this would be that the result is structurally for all events. I think you should make this decision, and the rest we seem to be on the same page about already.

@pjanotti
Copy link
Contributor

@djaglowski I do prefer that we always have a map, this should simplify any ottl processing.

@strawgate any preferences?

@strawgate
Copy link
Contributor Author

Consistent use of a map seems ideal to me

@djaglowski
Copy link
Member

I'm good with using a map consistently.

Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants