Skip to content

tychon arp demonstration#8222

Closed
efd6 wants to merge 14 commits intoelastic:mainfrom
efd6:tychon_agentless_cleaned_arp
Closed

tychon arp demonstration#8222
efd6 wants to merge 14 commits intoelastic:mainfrom
efd6:tychon_agentless_cleaned_arp

Conversation

@efd6
Copy link
Contributor

@efd6 efd6 commented Oct 17, 2023

DISCUSSION ONLY DO NOT MERGE

There are still issues with this:

  1. destination, host, id and network are either non-ECS compliant or have child fields that are not ECS-compliant. Ideally, these would all be put under tychon and then have ECS fields populated from the relevant information there. As is, this cannot be merged.
  2. There is no way to currently retain the event.original despite the option in the agent config template. To do this the JSON parsing should be move to ingest so that the unparsed JSON text can be retained. I imagine, but don't know for sure if this is true, that the original data is single line JSON since it's coming from a log file.
  3. There is no system test (this would confirm/refute the assumption above and so is helpful even only in that way). This is also a blocker since as a log file input there is no reason not to have this present in the package.
  4. I think the data streams in general (and demonstrated here) stutter; it would be better if the datastream were arp instead of tychon_arp since that will show up to the user as tychon.tychon_arp in the datastream.dataset field in their index.

efd6 added 8 commits October 10, 2023 06:50
[git-generate]
find . -type f -exec dos2unix {} \;
[git-generate]
find . -type f -exec gsed -i -r 's/[[:space:]]+$//g' {} \;
[git-generate]
find . -type f -name '*.hbs' -exec gsed -i -r '$a\' {} \;
find . -type f -name '*.yml' -exec gsed -i -r '$a\' {} \;
- rely on generator to set header in README
- use current ECS version
- single quotes to reduce escape proliferation
- tag failable processors
- use non-escaping mustache braces
- use the timestamp in the document to set the @timestamp
- use canonical error message string
@elasticmachine
Copy link

elasticmachine commented Oct 17, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-10-18T03:45:15.397+0000

  • Duration: 15 min 15 sec

Test stats 🧪

Test Results
Failed 0
Passed 5
Skipped 0
Total 5

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@elasticmachine
Copy link

elasticmachine commented Oct 17, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 100.0% (1/1) 💚
Files 100.0% (1/1) 💚
Classes 100.0% (1/1) 💚
Methods 100.0% (12/12) 💚 7.65
Lines 100.0% (85/85) 💚 8.412
Conditionals 100.0% (0/0) 💚

@efd6 efd6 force-pushed the tychon_agentless_cleaned_arp branch from cf9aa03 to 048a218 Compare October 17, 2023 07:52
Assumes that the tychon log input is line-based.

Fields in tychon.arp need to be extracted out to ECS fields and field
definitions need to be fleshed out.

Sytem test now becomes trivial.
@joeperuzzi
Copy link

@efd6 Three of these items are somewhat straight forward

#2 There is no way to currently retain the event.original despite the option in the agent config template. To do this the JSON parsing should be move to ingest so that the unparsed JSON text can be retained. I imagine, but don't know for sure if this is true, that the original data is single line JSON since it's coming from a log file.
TYCHON ANSWER: - This item will be fixed across all data streams and tested.

#3 There is no system test (this would confirm/refute the assumption above and so is help even only in that way). This is also a blocker since as a log file input there is no reason not to have this present in the package.
TYCHON ANSWER: We are unaware of a "system test", we'll try to find it in the docs but if you could give us an example that would help us fix this.

#4 I think the data streams in general (and demonstrated here) stutter; it would be better if the datastream were arp instead of tychon_arp since that will show up to the user as tychon.tychon_arp in the datastream.dataset field in their index.
TYCHON ANSWER: We can update the folder names, however, this may run into issues with the transform and the index stream as they both create component templates and as of right now they are different, however, if we change the name of the datastream both the transform and the stream will have the same component pattern. Is this ok, or should we change the transform too?

#1 destination, host, id and network are either non-ECS compliant or have child fields that are not ECS-compliant. Ideally, these would all be put under tychon and then have ECS fields populated from the relevant information there. As is, this cannot be merged.
TYCHON ANSWER: This is going to be a huge issue across all of the data-streams as ECS is not very well covered in its dataset (its missing hundreds of data points) and I think forcing a user to have pre-defined knowledge when filtering which "destination" fields are ECS and which fall under "tychon.destiantion" would be extremely confusing. If there is no way around this we will need to update all of our rules, dashboards, and transforms as well as update third-party integrations. Extrernal integrations is the real issue here as we all have data that falls well outside ECS (it being so limited) and the only way to join our data together is with common field names. Third parties that send similar data in production environments won't be changing to use a "tychon." and therefore if we changed fields that aren't pre-defined in ECS then how do we advise customers to view our joint data in Elastic?

@efd6 efd6 force-pushed the tychon_agentless_cleaned_arp branch from 048a218 to 88955b4 Compare October 17, 2023 19:51
@efd6
Copy link
Contributor Author

efd6 commented Oct 17, 2023

  1. The approach that I have used here would allow the non-ECS fields to be present. See here under the tychon.arp field. The parts that can be mapped to ECS can be copied/moved there and tychon-specific fields remain visible.
  2. You can see how this is done here as well.
  3. I have been trying to get a system test to work, but have been having difficulties. I will push what I have, though it is failing. Note here that I also changed the input from logfile to the newer filestream (neither pass the tests at the moment, but given I had a fresh start I figured it would be best to use the newer and better input).
  4. The names in the transforms should also be changed IMO, and the relevant downstream consumers.

efd6 added 3 commits October 18, 2023 06:39
- add system test infrastucture (not passing yet)
- convert from logfile to filestream input
- use more conventional field definition structure
logs require trailing new lines for... reasons.
@efd6 efd6 force-pushed the tychon_agentless_cleaned_arp branch from b2c0fca to ee62547 Compare October 18, 2023 03:43
efd6 added 2 commits October 18, 2023 14:14
at some point there were corrupted (probably in the file sanitisation steps)
@efd6 efd6 force-pushed the tychon_agentless_cleaned_arp branch from ee62547 to 8430e26 Compare October 18, 2023 03:44
@efd6
Copy link
Contributor Author

efd6 commented Oct 18, 2023

@joeperuzzi I have the system tests functioning now and have added back the preserve_original_event functionality and added the preserve_duplicate_custom_fields logic that can be added to when the ECS mapping is done.

@joeperuzzi
Copy link

@efd6 Thanks!

These are the universal changes I caught that we'll apply across the rest of the project streams from your code push:

Items we can easily take care of:

  1. Use single quotes vice double quotes in our yml
  2. Update our ECS schema version from 8 to 10
  3. Add tags to our pipeline items that support this field
  4. Update our pipeline error to show more data
  5. Add the docker test file (this was already underway but a good example, thanks)
  6. Add preserve_duplicate_custom_fields (we already added the preserve_original_event) and add to the test config file
  7. Add "remove" items to our pipelines for the preserve tag items
  8. Add beats.yml file to our fields folder
  9. Add prospector.scanner.exclude_files: [".gz$"] to stream file

Items that are easy changes but need more testing from our QA team:
1, Add start and date_detection: true to all transforms (Just want to confirm this doesn't affect dashboards)
2. Add publisher_pipeline.disable_host: true (I think this is going to wreak havoc on our dashboards as we're using elastic-provided host info)
3. Change our streams to filestream from log in our manifest files (the way our data writes to the ndjson files could have an impact on the parser so this will trigger regression testing)

These items need more in-depth discussion as these types of changes will have drastic effects to the integration for our own items, the integration with Elastic Defend, and other third-party integrations using this data

  1. Having all of our data starting with "tychon." really doesn't seem to be aligned to the spirit of ECS: "ECS tries to model information by using the name of concepts, and avoids proper names such as tool names, project names or company names". I think if we go down this route for all our streams it's going to be extremely difficult to join our data together, filter between views and properly perform triggers/alerts. If you were unaware TYCHON is an OEM partner and we have a larger integration with Elastic that's currently deployed to millions of systems. This is a subset of that larger integration that uses these same fields, having these fields changed here will impact downstream to the rest.

@botelastic
Copy link

botelastic bot commented Nov 17, 2023

Hi! We just realized that we haven't looked into this PR in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Nov 17, 2023
@botelastic
Copy link

botelastic bot commented Dec 17, 2023

Hi! This PR has been stale for a while and we're going to close it as part of our cleanup procedure. We appreciate your contribution and would like to apologize if we have not been able to review it, due to the current heavy load of the team. Feel free to re-open this PR if you think it should stay open and is worth rebasing. Thank you for your contribution!

@botelastic botelastic bot closed this Dec 17, 2023
@efd6 efd6 deleted the tychon_agentless_cleaned_arp branch February 5, 2025 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants