Changes from 21 commits
28 commits
- f38a21e commit to share and get feedback (brian-mckinney, Mar 6, 2025)
- a7e3a1c python script for reading package definitions and dumping to json (brian-mckinney, Mar 7, 2025)
- edc8244 committing work for feedback (brian-mckinney, Mar 10, 2025)
- c9f6d3d remove accidental markdown commit (brian-mckinney, Mar 10, 2025)
- 5d68388 update output format (brian-mckinney, Mar 10, 2025)
- ed5e013 updated output (brian-mckinney, Mar 10, 2025)
- b6f7372 stashing (brian-mckinney, Mar 21, 2025)
- 8e768a1 stashing (brian-mckinney, Mar 21, 2025)
- 57b677f custom documentation markdown (brian-mckinney, Mar 25, 2025)
- 16b62ba move the samples (brian-mckinney, Mar 25, 2025)
- 645575d sample readme (brian-mckinney, Mar 25, 2025)
- a2d320d Updated Documentation (brian-mckinney, Mar 27, 2025)
- 4acb79b override file (brian-mckinney, Mar 27, 2025)
- a4a294e generation scripts (brian-mckinney, Mar 27, 2025)
- 4268dc6 Merge remote-tracking branch 'origin/main' into mckinney_custom_docs (brian-mckinney, Mar 27, 2025)
- 0e54605 documentation variant 1 (brian-mckinney, Mar 27, 2025)
- 15b4ea0 updated variant (brian-mckinney, Mar 27, 2025)
- c34576f Readme, code cleanup, comments, etc (brian-mckinney, Mar 28, 2025)
- 7033137 update the generated markdown (brian-mckinney, Mar 28, 2025)
- 2dea8cf don't include examples in the code PR (brian-mckinney, Mar 28, 2025)
- 3ea7f6e cleanup (brian-mckinney, Mar 28, 2025)
- 6b628af PR Feedback (brian-mckinney, Apr 4, 2025)
- 48f6d68 update markdown generation (brian-mckinney, Apr 4, 2025)
- ff7a109 fix overrides, fix missing csv (brian-mckinney, Apr 4, 2025)
- 1657f15 Merge branch 'main' into mckinney_custom_docs (brian-mckinney, Jun 2, 2025)
- cc0ffef better markdown generation, updated overrides (brian-mckinney, Jun 4, 2025)
- 355f75f some typing fixes (brian-mckinney, Jun 4, 2025)
- db3b98f Merge remote-tracking branch 'origin/main' into mckinney_custom_docs (brian-mckinney, Jun 4, 2025)
1 change: 1 addition & 0 deletions .gitignore
@@ -9,3 +9,4 @@ vendor/
generated/
.DS_Store
*.swp
*.pyc
22 changes: 22 additions & 0 deletions custom_documentation/src/documentation_overrides.yaml
@@ -0,0 +1,22 @@
- name: Endpoint.policy.applied.artifacts.global.channel
  overrides:
    default:
      description: The channel of the artifact.
      example: default example
      type: keyword
    os:
      linux:
        description: The channel of the linux artifact.
        example: stable
      windows:
        description: The channel of the windows artifact.
      macos:
        description: The channel of the macos artifact.
    event:
      linux_malicious_behavior_alert:
        description: The channel of the artifact for linux malicious behavior alert.
        example: stable
- name: agent.type
  overrides:
    default:
      example: endpoint
Contributor

I believe the overrides are a bad idea. If you'd like to change a description, please change it at the source. Correct me if I'm wrong, but this is the endpoint-package, so everything here describes Endpoint and should be consistent. BTW, I guess the filebeat example was copy-pasted from ECS. Changing it LGTM, but you missed that the description accompanying this example also mentions Filebeat.

I think we need something like a documentation_supplement.yaml that fills in the gaps. The package contains only the fields (with their descriptions and examples) that need to be mapped in the index so that they can be used in KQL and elsewhere in Kibana. Custom documentation, however, lists all possible fields per document, and some of them may not be mapped. The supplement should error out if a conflict (an override attempt) is detected.

I've tried running this utility with --csv missing.csv (thanks for this option!), but it didn't produce anything. Indeed, I can't find an example field in the existing documentation that is not mapped. It looks like we eventually mapped everything, but I'm pretty sure that's neither intended nor required. @ferullo, I think you explained this to me.

I'd say the script should indicate a failure if it generates any field with a missing description. The description should be provided either in the package or in the proposed supplement file.
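A minimal sketch of the supplement semantics proposed in this comment, assuming the YAML has already been loaded into plain Python objects: supplement values only fill gaps, any attempt to change a value the package already defines raises, and the caller can fail when a generated field still has no description. FieldDoc, apply_supplement, fields_missing_description, and SupplementConflictError are hypothetical names, not part of pydocgen.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class FieldDoc:
    description: Optional[str] = None
    example: Optional[str] = None


class SupplementConflictError(ValueError):
    """Raised when a supplement entry would change a value the package already defines."""


def apply_supplement(
    mapped: dict[str, FieldDoc], supplement: dict[str, FieldDoc]
) -> dict[str, FieldDoc]:
    """Fill gaps in `mapped` from `supplement` (hypothetical); never silently override."""
    merged = {name: replace(doc) for name, doc in mapped.items()}  # copy; inputs stay untouched
    for name, extra in supplement.items():
        current = merged.setdefault(name, FieldDoc())  # unmapped fields start empty
        for attr in ("description", "example"):
            new_value = getattr(extra, attr)
            if new_value is None:
                continue  # the supplement does not touch this attribute
            old_value = getattr(current, attr)
            if old_value is not None and old_value != new_value:
                raise SupplementConflictError(
                    f"{name}.{attr}: supplement would override {old_value!r}"
                )
            setattr(current, attr, new_value)
    return merged


def fields_missing_description(fields: list[str], docs: dict[str, FieldDoc]) -> list[str]:
    """Return the generated fields that still lack a description, sorted."""
    return sorted(f for f in fields if not docs.get(f, FieldDoc()).description)
```

A script built this way could exit with a non-zero status whenever fields_missing_description returns a non-empty list, which gives the hard failure requested here without needing an override mechanism.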

Contributor Author

Thank you for the feedback. I think this makes sense, but I was asked to add the override option. I'm not sure whether there is a reason for it that you and I are not thinking of right now.

Additionally, I think that adding this to the official Elastic documentation would be great, but it's probably out of scope for this PR, since this documentation is already referenced on the official docs page. I'd be happy to open a follow-up issue for that, though.

I'd like to hear what @ferullo @pzl and @nfritts think about these two things.

Member

I don't have an objection to the ability to override. For mapped fields that we have defined (i.e. they are not ECS), we create their description and info in custom_schemas. Overriding those doesn't make a lot of sense when we can put it in the source we control, but if it gets us out of a jam by letting us re-describe a field per platform, then it seems fine.

For ECS-sourced fields, I'm also not opposed. ECS presents very generalized information applicable to any integration or source. We will likely have more specialized information about a field's usage in our context, and our example values can be much more informative.

We also haven't updated ECS since release v8.10.0. So the information is stale anyway.

Contributor

OK, if we want the override ability, I'd feel more comfortable with it if it were defined as override "what" "with", i.e. the override definition explicitly states the current value it expects to replace together with the new one (and errors if the current value doesn't match). This way, reading and reviewing the overrides becomes easier.

Also, an additional thought about the non-mapped fields: if I'm right about them, it would be nice to tag those explicitly in the documentation, to make it clear that you can't search for a document using such a field and/or that it's lost when the index is stripped to synthetic source.
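One way to express the override "what" "with" idea from this comment, sketched with hypothetical names (ExplicitOverride, apply_explicit_override, StaleOverrideError): each override records the value it expects to replace, and the generator refuses to apply it if the source value has drifted. The agent.type case (the ECS-derived filebeat example replaced by endpoint) comes from this PR; the exact upstream value is an assumption.

```python
from dataclasses import dataclass


class StaleOverrideError(ValueError):
    """Raised when the value an override expects to replace no longer matches the source."""


@dataclass(frozen=True)
class ExplicitOverride:
    field: str        # dotted field name
    attribute: str    # attribute being overridden, e.g. "example"
    expected: str     # value the override author saw in the source ("what")
    replacement: str  # value to document instead ("with")


def apply_explicit_override(current: dict[str, str], override: ExplicitOverride) -> dict[str, str]:
    """Return a copy of `current` with the override applied, or raise if the source drifted."""
    actual = current.get(override.attribute)
    if actual != override.expected:
        raise StaleOverrideError(
            f"{override.field}.{override.attribute}: "
            f"expected {override.expected!r}, found {actual!r}"
        )
    updated = dict(current)  # leave the caller's mapping untouched
    updated[override.attribute] = override.replacement
    return updated
```

With this shape a stale override fails loudly instead of silently masking an upstream change, and each entry shows reviewers both the old and the new value.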

Contributor

Obviously we can add the asciidoc output later! I just wanted to mention that documentation can be updated at any time, even retroactively, so we can generate our events documentation for past versions and put it online, removing the link to git.

Contributor

Is there a reason for the overrides key? It seems like an unneeded level of nesting.

Standard documentation happens at the level of the index, so it can't be more specific than the index itself. Custom documentation's purpose is to document events by event type, as a human understands them. To me it makes sense to be able to have more specific documentation for a given field when that documentation is per event rather than at the index level. (Also, can you override documentation for ECS fields in the package? I thought that had to match the ECS documentation, which is "scaled" even more broadly, to all indexes.)

As a specific case, consider event.action. Its documentation, quoted below, is pretty broad:

> The action captured by the event.
>
> This describes the information in the event. It is more specific than event.category. Examples are group-add, process-started, file-created. The value is normally defined by the implementer.

When documenting it for Defend/Endpoint, it could be rewritten to drop "The value is normally defined by the implementer.", to list all possible action values, or even to list all action values per data stream if the package infrastructure allows it. But without overrides at the level proposed here, how would you document that, in a Defend/Endpoint Windows Device Mount event, event.action is always "mount"? In other words, the best documentation for a Windows Device Mount event's event.action field is "The action captured by the event. This will always be the string 'mount'".

I'm not saying that level of documentation work should be undertaken, but I can't see why it would be undesirable to be able to have overrides like that when needed or appropriate.


> Additionally, I think that adding this to the official elastic documentation would be great, but probably out of scope for this PR since this documentation is already referenced on the official docs page. I'd be happy to make a follow-up issue for that though.

I don't know the original requirements but I agree it seems out of scope.

Contributor

@intxgo, Apr 3, 2025

However, I'm still chewing on this. I'd lean towards keeping even the original ECS description, maybe also tagging it as ECS for clarity, and, if really needed, just appending an Endpoint-specific documentation line and/or example. How does that sound?

EDIT: let's draw an example:

[image not available]

Contributor

@intxgo, Apr 3, 2025

Thanks @ferullo, I see we have similar thoughts. Yup, I've also just asked about the level of detail we want in a follow-up review comment just above this thread: https://github.com/elastic/endpoint-package/pull/606/files#r2027242535
I like the idea of clarifying the Endpoint content of generic ECS fields you highlighted with examples.

Contributor Author

I think I like the idea of keeping the original ECS description but also specifying the Endpoint override. It seems pretty natural to me.

Contributor

Endpoint-specific fields, where there's no ECS counterpart, could be marked Origin | Endpoint package.

We haven't yet clarified whether we need the ability to add supplementary descriptions for non-mapped fields. For this release it isn't even needed, since everything seems to be mapped, but I'm curious whether it's still a TODO, or whether that no longer applies and everything is expected to be mapped.
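The layout this thread converges on (keep the original description verbatim, tag its origin as ECS or Endpoint package, append an Endpoint-specific note only when needed, and flag non-mapped fields) could be rendered roughly as below. The image from the earlier comment is not available, so this table layout is an assumption; FieldEntry and render_field_section are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldEntry:
    name: str
    description: str                     # original description, kept verbatim (ECS or package)
    origin: str                          # "ECS" or "Endpoint package"
    endpoint_note: Optional[str] = None  # optional Endpoint-specific addition
    example: Optional[str] = None
    mapped: bool = True                  # False for fields Endpoint emits but does not map


def _cell(value: str) -> str:
    # Escape pipes so a value cannot break the Markdown table layout.
    return value.replace("|", "\\|")


def render_field_section(entry: FieldEntry) -> str:
    """Render one field as a heading plus a two-column Markdown table."""
    rows = [("Description", entry.description), ("Origin", entry.origin)]
    if entry.endpoint_note:
        rows.append(("Endpoint", entry.endpoint_note))
    if entry.example:
        rows.append(("Example", entry.example))
    if not entry.mapped:
        rows.append(("Mapped", "No: not searchable in Kibana"))
    lines = [f"#### {entry.name}", "", "| Attribute | Value |", "| --- | --- |"]
    lines += [f"| {label} | {_cell(value)} |" for label, value in rows]
    return "\n".join(lines)
```

Keeping the upstream description untouched and putting Endpoint detail in its own row means a reader can always tell which text comes from ECS and which was written for Endpoint.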

54 changes: 54 additions & 0 deletions scripts/generate-docs/pydocgen/Readme.md
@@ -0,0 +1,54 @@
# Custom Documentation Generator

## Description

This module generates documentation for the custom endpoint fields defined in [custom_documentation](../../../custom_documentation/).

### Background

The fields defined in [custom_documentation](../../../custom_documentation/) do not have descriptions. They simply list the possible fields of each event, including custom fields that Endpoint uses but that are not mapped.

The fields defined in [package](../../../package/) are the fields that are mapped in the index, which makes them usable in Kibana. These fields have descriptions and documentation.


### Implementation

This Python module generates markdown for all of the fields in [custom_documentation](../../../custom_documentation/) by taking the following steps:

1. Parses all of the mapped fields defined in [package](../../../package/), collecting descriptions, examples, and other metadata.

2. Parses any override fields defined in [documentation_overrides.yaml](../../../custom_documentation/src/documentation_overrides.yaml).
   - Overrides can be set for any field: at the event level, at the OS level, or as a default override that applies to all instances of that field.
   - See [documentation_overrides.yaml](../../../custom_documentation/src/documentation_overrides.yaml) for the format.
   - If overrides are updated, the documentation must be regenerated with `--no-cache`, because an existing database is reused as-is and would not pick up the changes.

3. Stores all of that data in a SQLite database at the `--database` path.

4. Parses all of the endpoint fields defined in [custom_documentation](../../../custom_documentation/).

5. Iterates over the custom_documentation data, filling in descriptions and examples from the database created in step 3.
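A minimal sketch of how the layered overrides could be resolved for one field, assuming the most specific match wins attribute by attribute (event, then OS, then default, then the package's own data). The precedence pydocgen actually applies lives in code not shown here, so treat both the order and the `resolve_field_doc` name as assumptions. The test values mirror `documentation_overrides.yaml`.

```python
from typing import Optional

Doc = dict[str, Optional[str]]


def resolve_field_doc(
    package_doc: Doc,
    default: Optional[Doc] = None,
    by_os: Optional[dict[str, Doc]] = None,
    by_event: Optional[dict[str, Doc]] = None,
    os_name: Optional[str] = None,
    event_name: Optional[str] = None,
) -> Doc:
    """Hypothetical: pick each attribute from the most specific layer that sets it."""
    layers = [
        (by_event or {}).get(event_name or "", {}),  # most specific
        (by_os or {}).get(os_name or "", {}),
        default or {},
        package_doc,                                 # least specific
    ]
    resolved: Doc = {}
    for key in ("description", "example", "type"):
        resolved[key] = next(
            (layer[key] for layer in layers if layer.get(key) is not None), None
        )
    return resolved
```

Resolving per attribute means a Windows entry that only sets a description still inherits the default example and type, which appears to match how the sample YAML leaves `example` out of the `windows` and `macos` entries.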

### Example Usage
`python -m pydocgen --output-dir /path/to/output`

#### Help statement
```
usage: __main__.py [-h] [--database DATABASE] [--no-cache] [--output-dir OUTPUT_DIR] [-v] [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--csv CSV]

Create markdown documentation for the fields defined in custom_documentation

options:
  -h, --help            show this help message and exit
  --database DATABASE   path to the database
  --no-cache            do not use cached database if it exists, always regenerate the database
  --output-dir OUTPUT_DIR
                        output directory for markdown documentation
  -v, --verbose         Force maximum verbosity (DEBUG level + detailed output)
  -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Set logging verbosity level
  --csv CSV             Path to CSV file for missing documentation fields (optional)

Example usage: python -m pydocgen --output-dir /path/to/output
```

Comment on lines +49 to +50

Contributor

Suggested change:

    -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                            Set logging verbosity level

replaced with:

    -l, --log-level [DEBUG,INFO,WARNING,ERROR,CRITICAL]
                            Set logging verbosity level

nitpick
Empty file.
104 changes: 104 additions & 0 deletions scripts/generate-docs/pydocgen/__main__.py
@@ -0,0 +1,104 @@
import argparse
import logging
import pathlib
import sys
import tempfile
import traceback
from typing import Literal

from .markdown import generate_custom_documentation_markdown


def configure_logging(
    log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
    verbose: bool,
) -> None:
    """Configures the logging system with specified level and verbosity.

    Args:
        log_level: String representation of logging level (DEBUG, INFO, etc.)
        verbose: Boolean flag to force maximum verbosity
    """
    level = getattr(logging, log_level)

    # If verbose is specified, override to DEBUG level
    if verbose:
        level = logging.DEBUG

    # Configure the root logger's level and message format
    logging.basicConfig(
        level=level,
        format="%(asctime)s - %(levelname)-8s %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Create markdown documentation for the fields defined in custom_documentation",
        epilog="Example usage: python -m pydocgen --output-dir /path/to/output",
    )

    parser.add_argument(
        "--database",
        default=pathlib.Path(tempfile.gettempdir()) / "generate-docs.sqlite",
        type=pathlib.Path,
        help="path to the database",
    )

    parser.add_argument(
        "--no-cache",
        action="store_true",
        help="do not use cached database if it exists, always regenerate the database",
    )

    parser.add_argument(
        "--output-dir",
        default=pathlib.Path.cwd().resolve() / "output",
        type=pathlib.Path,
        help="output directory for markdown documentation",
    )

    parser.add_argument(
        "-v",
        "--verbose",
        action="store_true",
        help="Force maximum verbosity (DEBUG level + detailed output)",
    )

    parser.add_argument(
        "-l",
        "--log-level",
        type=str.upper,
        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
        default="INFO",
        help="Set logging verbosity level",
    )

    parser.add_argument(
        "--csv",
        type=pathlib.Path,
        default=None,
        help="Path to CSV file for missing documentation fields (optional)",
    )

    args = parser.parse_args()

    configure_logging(args.log_level, args.verbose)

    # The database is cached between runs; --no-cache forces a rebuild
    if args.no_cache and args.database.exists():
        logging.info(f"Removing existing database {args.database} since --no-cache was specified")
        args.database.unlink()

    # NOTE: args.csv is parsed above but is not passed to the generator in this version
    generate_custom_documentation_markdown(args.database, args.output_dir)
    logging.info(f"Generated markdown documentation to {args.output_dir}")


if __name__ == "__main__":
    try:
        main()
    except Exception:
        traceback.print_exc()
        sys.exit(1)
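In the version of `__main__.py` shown in this view, `args.csv` is parsed but never passed to `generate_custom_documentation_markdown`, which may be one reason `--csv missing.csv` produced nothing during review. A hypothetical helper for writing missing-documentation rows could look like the sketch below; the function name and the column layout (event, field, missing attribute) are assumptions, and how the generator would collect those rows is not shown here.

```python
import csv
import pathlib
from typing import Iterable


def write_missing_fields_csv(path: pathlib.Path, rows: Iterable[tuple[str, str, str]]) -> int:
    """Write (event, field, missing_attribute) rows to `path`; return the number of rows written."""
    path.parent.mkdir(parents=True, exist_ok=True)  # allow e.g. --csv reports/missing.csv
    written = 0
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["event", "field", "missing"])  # header row
        for row in rows:
            writer.writerow(row)
            written += 1
    return written
```

Returning the row count lets the caller both log how many gaps were found and exit non-zero when the count is above zero.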
208 changes: 208 additions & 0 deletions scripts/generate-docs/pydocgen/database.py
@@ -0,0 +1,208 @@
import pathlib
import logging

from sqlmodel import SQLModel, Field, create_engine, Session, select, Relationship
from sqlalchemy import Engine, Column, JSON

from .models.custom_documentation import DocumentationOverrideMap
from .models.packages import Package, PackageList

from typing import Optional


#
# These models represent the database tables for mapped fields
#
class PackageReference(SQLModel, table=True):
    __tablename__ = "package_references"
    id: Optional[int] = Field(default=None, primary_key=True)
    package_data: Optional[str] = Field(default=None, sa_column=Column(JSON))


class PackageField(SQLModel, table=True):
    """
    PackageField represents a specific field as defined in package/endpoint/data_stream/{type}/fields/fields.yml.
    Each field in fields.yml has a name and description; this class holds the name, description, example,
    and a reference to the parent package. These fields are used to provide descriptions for the fields
    in the custom documentation.

    Note: this is the database table definition for the Package class defined in models/packages.py

    Raises:
        ValueError: raised by the package property if the package reference is not set
    """

    __tablename__ = "package_fields"
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str
    description: str
    example: Optional[str] = None
    package_reference_id: Optional[int] = Field(foreign_key="package_references.id")
    package_reference: Optional[PackageReference] = Relationship()

    @property
    def package(self) -> Package:
        if not self.package_reference:
            raise ValueError(f"PackageReference is not set for PackageField {self}")
        return Package.model_validate_json(self.package_reference.package_data)


#
# These models represent the database tables for overrides
#
class OverrideField(SQLModel, table=True):
    __tablename__ = "overrides"
    id: Optional[int] = Field(default=None, primary_key=True)
    description: Optional[str] = None
    example: Optional[str] = None
    type: Optional[str] = None


class OverrideRelationship(SQLModel, table=True):
    __tablename__ = "override_relationships"
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str
    event: Optional[str] = None
    os: Optional[str] = None
    default: bool = False
    override_id: int = Field(foreign_key="overrides.id")
    override: OverrideField = Relationship(sa_relationship_kwargs={"lazy": "joined"})


def populate_overrides(session: Session):
    """
    populate_overrides stores every OS-level, event-level, and default override,
    linked to its field name, in the database

    Args:
        session: database session
    """
    dom = DocumentationOverrideMap.from_yaml()
    for name, mapping in dom.items():
        if mapping.os:
            for os, override in mapping.os.items():
                record = OverrideField(
                    description=override.description,
                    example=override.example,
                    type=override.type,
                )
                session.add(record)
                session.flush()  # assigns record.id

                related_record = OverrideRelationship(
                    name=name, os=os, override_id=record.id
                )
                session.add(related_record)

        if mapping.event:
            for event, override in mapping.event.items():
                record = OverrideField(
                    description=override.description,
                    example=override.example,
                    type=override.type,
                )
                session.add(record)
                session.flush()  # assigns record.id

                related_record = OverrideRelationship(
                    name=name, event=event, override_id=record.id
                )
                session.add(related_record)

        if mapping.default:
            record = OverrideField(
                description=mapping.default.description,
                example=mapping.default.example,
                type=mapping.default.type,
            )
            session.add(record)
            session.flush()  # assigns record.id

            related_record = OverrideRelationship(
                name=name, default=True, override_id=record.id
            )
            session.add(related_record)

    session.commit()


def populate_packages_fields(session: Session):
    """
    populate_packages_fields populates the package fields in the database

    Args:
        session: database session
    """

    def add_to_db(field: PackageField, session: Session):
        # A field may appear in several packages; it must be described identically everywhere
        existing_field = session.exec(
            select(PackageField).where(PackageField.name == field.name)
        ).first()
        if existing_field:
            if existing_field.description != field.description:
                raise ValueError(
                    f"Field {field.name} already exists with different description"
                )
        else:
            logging.debug(f"  Adding field {field.name}")
            session.add(field)

    package_list = PackageList.from_files()
    for package in package_list:
        logging.debug(f"Adding package fields for {package.filepath}")
        package_ref = PackageReference(package_data=package.model_dump_json())
        session.add(package_ref)
        session.flush()  # assigns package_ref.id
        for field in package.fields:
            if field.fields:
                # Nested group: store each sub-field under its dotted name
                for sub_field in field.fields:
                    name = f"{field.name}.{sub_field.name}"
                    add_to_db(
                        PackageField(
                            name=name,
                            description=sub_field.description,
                            package_reference_id=package_ref.id,
                            example=sub_field.example,
                        ),
                        session,
                    )
            else:
                add_to_db(
                    PackageField(
                        name=field.name,
                        description=field.description,
                        package_reference_id=package_ref.id,
                        example=field.example,
                    ),
                    session,
                )
    session.commit()


def getDatabase(db_path: pathlib.Path) -> Engine:
    """
    getDatabase creates a database if it does not exist, otherwise it uses the existing database

    This stores the documentation in package/endpoint/data_stream in a lightweight SQLite database. We will
    use this when generating markdown documentation for the fields defined in the custom_documentation.

    Overrides are also added to the database here.

    Args:
        db_path: path to the database

    Returns:
        database Engine
    """
    if db_path.exists():
        logging.info(f"Using existing database at {db_path}")
        return create_engine(f"sqlite:///{db_path}")

    logging.info(f"Creating database at {db_path}")
    engine = create_engine(f"sqlite:///{db_path}")
    SQLModel.metadata.create_all(engine)
    with Session(engine) as session:
        populate_packages_fields(session)
        populate_overrides(session)
        session.commit()
    return engine
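`getDatabase` returns any existing database file unchanged, so edits to `fields.yml` or `documentation_overrides.yaml` are only picked up when `--no-cache` is passed. A hypothetical staleness check (not part of this PR) could let the tool rebuild automatically by comparing modification times; the caller would unlink the database when it returns `True`. The `database_is_stale` name and the choice of which source paths to pass in are assumptions.

```python
import pathlib
from typing import Iterable


def database_is_stale(db_path: pathlib.Path, sources: Iterable[pathlib.Path]) -> bool:
    """Return True if the cached database is missing or older than any existing source file."""
    if not db_path.exists():
        return True
    db_mtime = db_path.stat().st_mtime
    # Stale if any source (e.g. a fields.yml or the overrides YAML) changed after the DB was built
    return any(src.stat().st_mtime > db_mtime for src in sources if src.exists())
```

Modification times are a cheap heuristic; hashing the source files into a table inside the database would also survive checkouts that reset timestamps, at the cost of reading every file on each run.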