Feature: Map Extracted Files to Artifact Definitions in image_export.py #4949

Open
wants to merge 4 commits into
base: main

@sa3eed3ed sa3eed3ed commented Dec 30, 2024

Feature: Map Extracted Files to Artifact Definitions in image_export.py

Description:

This PR adds an optional feature to Plaso's image_export.py tool that generates a JSON file mapping extracted files to the artifact definitions that led to their extraction. The mapping records why each file was extracted, which is otherwise not visible from the output directory alone.

Functionality:

The new --enable_artifacts_map flag activates this feature. When enabled, the tool creates an artifacts_map.json file in the output directory. This file contains a dictionary where:

Keys: Artifact definition names (e.g., JupyterConfigFile, SshdConfigFile, WindowsEnvironmentVariableComSpec).
Values: Lists of extracted file paths (relative to the output directory) that matched the corresponding artifact definition.

plaso/scripts/image_export.py --artifact_filters JupyterConfigFile,SshdConfigFile \
--write /home/user/tmp --enable_artifacts_map --logfile /home/user/tmp/log.log \
--volumes all --partitions all /home/user/artifact_disk.dd

This command would produce an artifacts_map.json file similar to:

{
  "SshdConfigFile": ["etc/ssh/sshd_config"],
  "JupyterConfigFile": ["home/dummyuser/.jupyter/jupyter_notebook_config.py"]
}

This output indicates that the files etc/ssh/sshd_config and home/dummyuser/.jupyter/jupyter_notebook_config.py were extracted because they matched the SshdConfigFile and JupyterConfigFile artifact definitions, respectively.
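
For anyone consuming the output, a minimal sketch of reading the map back could look like the following. The helper name load_artifacts_map is hypothetical and not part of this PR; the sketch only assumes what is described above, namely that artifacts_map.json sits in the output directory and holds paths relative to it.

```python
import json
import os


def load_artifacts_map(output_directory):
    """Reads artifacts_map.json and returns artifact name -> full paths.

    Hypothetical helper for illustration; not part of image_export.py.
    """
    map_path = os.path.join(output_directory, "artifacts_map.json")
    with open(map_path, encoding="utf-8") as file_object:
        artifacts_map = json.load(file_object)

    # Paths in the map are relative to the output directory, so join them
    # back onto it to locate the extracted files on disk.
    return {
        name: [os.path.join(output_directory, path) for path in paths]
        for name, paths in artifacts_map.items()
    }
```

With the command above, load_artifacts_map("/home/user/tmp") would return the same keys with each path prefixed by /home/user/tmp.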

Registry Artifacts:

For artifacts that rely on Windows Registry keys or values (e.g., WindowsEnvironmentVariableComSpec), the tool automatically extracts the relevant registry hive files (e.g., SYSTEM, SOFTWARE, NTUSER.DAT). The artifacts_map.json will map these hive files to both:

The artifact that directly triggered the hive's extraction (e.g., WindowsSystemRegistryFiles).
Any artifacts that rely on data within those hives (e.g., WindowsEnvironmentVariableComSpec).

Example with Registry Artifacts:
If you run image_export.py with --artifact_filters WindowsEnvironmentVariableComSpec, the artifacts_map.json might contain:

{
  "WindowsSystemRegistryFiles": [
    "System Volume Information/Syscache.hve",
    "Windows/System32/config/SAM",
    "Windows/System32/config/SECURITY",
    "Windows/System32/config/SOFTWARE",
    "Windows/System32/config/SYSTEM"
  ],
  "WindowsEnvironmentVariableComSpec": [
    "System Volume Information/Syscache.hve",
    "Users/Warren/AppData/Local/Microsoft/Windows/UsrClass.dat",
    "Users/Warren/NTUSER.DAT",
    "Windows/ServiceProfiles/LocalService/NTUSER.DAT",
    "Windows/ServiceProfiles/NetworkService/NTUSER.DAT",
    "Windows/System32/config/SAM",
    "Windows/System32/config/SECURITY",
    "Windows/System32/config/SOFTWARE",
    "Windows/System32/config/SYSTEM"
  ],
  "WindowsUserRegistryFiles": [
    "Users/Warren/AppData/Local/Microsoft/Windows/UsrClass.dat",
    "Users/Warren/NTUSER.DAT",
    "Windows/ServiceProfiles/LocalService/NTUSER.DAT",
    "Windows/ServiceProfiles/NetworkService/NTUSER.DAT"
  ]
}

This shows that the SYSTEM, SOFTWARE, and other hive files were extracted because of both WindowsSystemRegistryFiles and WindowsEnvironmentVariableComSpec. The mapped paths are relative to the output path provided with the --write argument.
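
Because one hive file can appear under several artifact names, it may be convenient to invert the map when reviewing results. The following is a small sketch of that inversion; invert_artifacts_map is a hypothetical helper, and example_map is a shortened copy of the output shown above.

```python
from collections import defaultdict


def invert_artifacts_map(artifacts_map):
    """Returns a dict of extracted file path -> sorted artifact names.

    Hypothetical helper for illustration; not part of this PR.
    """
    artifacts_per_file = defaultdict(set)
    for artifact_name, paths in artifacts_map.items():
        for path in paths:
            artifacts_per_file[path].add(artifact_name)
    return {path: sorted(names) for path, names in artifacts_per_file.items()}


# Shortened version of the registry example above.
example_map = {
    "WindowsSystemRegistryFiles": ["Windows/System32/config/SYSTEM"],
    "WindowsEnvironmentVariableComSpec": [
        "Users/Warren/NTUSER.DAT",
        "Windows/System32/config/SYSTEM",
    ],
    "WindowsUserRegistryFiles": ["Users/Warren/NTUSER.DAT"],
}
artifacts_per_file = invert_artifacts_map(example_map)
```

Here the SYSTEM hive ends up listed under both WindowsEnvironmentVariableComSpec and WindowsSystemRegistryFiles, matching the explanation above.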

Technical Details:

The core of this feature is the ArtifactsTrie class, which stores artifact definition paths in a Trie (prefix tree) data structure.

Artifacts Trie Structure
  • Root Node: A special node that doesn't represent a path segment but has children for each unique path separator in the definitions.
  • Path Separator Nodes: Children of the root, representing path separators (e.g., / and \).
  • Other Nodes: Each node represents a path segment from an artifact definition.
  • Glob Handling: Glob patterns (like * and **) are stored as literal node keys.
  • Artifact Names: Nodes corresponding to the end of a valid artifact path store a list of associated artifact names in their artifacts_names attribute.

Example Trie:

Root
├── / (path separator)
│   ├── Users
│   │   └── **
│   │       └── Downloads
│   │           └── *.pdf (artifacts_names: ["PDFDownloads"])
│   └── Windows
│       └── System32
│           └── config
│               └── SAM (artifacts_names: ["WindowsSAMRegistry"])
└── \ (path separator)
    └── Users
        └── *\
            └── AppData
                └── Local
                    └── test.ini (artifacts_names: ["LocalAppDataFiles"])
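
The structure described above can be sketched roughly as follows. This is an illustration only, not the ArtifactsTrie code from this PR: the class names TrieNode and SketchArtifactsTrie and the method add_path are hypothetical, and only the artifacts_names attribute is taken from the description.

```python
class TrieNode:
    """One path segment (or path separator) in the sketch trie."""

    def __init__(self):
        self.children = {}         # segment or glob pattern -> TrieNode
        self.artifacts_names = []  # artifacts whose path ends at this node


class SketchArtifactsTrie:
    """Illustrative stand-in for the ArtifactsTrie described above."""

    def __init__(self):
        self.root = TrieNode()

    def add_path(self, artifact_name, path, path_separator):
        # The first level under the root is keyed by the path separator.
        node = self.root.children.setdefault(path_separator, TrieNode())
        # Glob patterns such as "*" and "**" are stored as literal keys.
        for segment in path.split(path_separator):
            if not segment:
                continue  # skip the empty piece before a leading separator
            node = node.children.setdefault(segment, TrieNode())
        node.artifacts_names.append(artifact_name)


trie = SketchArtifactsTrie()
trie.add_path("PDFDownloads", "/Users/**/Downloads/*.pdf", "/")
trie.add_path("WindowsSAMRegistry", "/Windows/System32/config/SAM", "/")
trie.add_path("LocalAppDataFiles", "\\Users\\*\\AppData\\Local\\test.ini", "\\")
```

Keying the first level by separator keeps POSIX-style and Windows-style definitions in separate subtrees, as in the example trie.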

Matching Logic

Paths are normalized to use os.sep as the separator.
The GetMatchingArtifacts method traverses the Trie based on input path segments, using fnmatch.fnmatch for glob matching. ** is handled recursively to match zero or more directory levels.
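
A rough, self-contained sketch of this matching approach is shown below. It is not the PR's GetMatchingArtifacts implementation: Node, add_path, and get_matching_artifacts are hypothetical names, and the sketch assumes that, after normalization to os.sep, every separator subtree of the trie is tried against the input path.

```python
import fnmatch
import os


class Node:
    def __init__(self):
        self.children = {}
        self.artifacts_names = []


def add_path(root, artifact_name, path, path_separator):
    """Inserts a definition path; glob patterns become literal keys."""
    node = root.children.setdefault(path_separator, Node())
    for segment in filter(None, path.split(path_separator)):
        node = node.children.setdefault(segment, Node())
    node.artifacts_names.append(artifact_name)


def _match(node, segments, matches):
    if not segments:
        # All input segments consumed: collect artifacts ending here.
        matches.update(node.artifacts_names)
    for key, child in node.children.items():
        if key == "**":
            # "**" may consume zero or more leading segments.
            for skip in range(len(segments) + 1):
                _match(child, segments[skip:], matches)
        elif segments and fnmatch.fnmatch(segments[0], key):
            _match(child, segments[1:], matches)


def get_matching_artifacts(root, path):
    """Returns sorted artifact names whose definitions match the path."""
    segments = [segment for segment in path.split(os.sep) if segment]
    matches = set()
    # Assumption: the input uses os.sep, so try every separator subtree.
    for separator_node in root.children.values():
        _match(separator_node, segments, matches)
    return sorted(matches)


root = Node()
add_path(root, "PDFDownloads", "/Users/**/Downloads/*.pdf", "/")
add_path(root, "WindowsSAMRegistry", "/Windows/System32/config/SAM", "/")
```

Note that fnmatch.fnmatch applies os.path.normcase to both arguments, so case sensitivity follows the platform the tool runs on.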

Source Type Handling

When the input to the tool is:

  • Directory: A dfvfs.FileSystem object of type OS is created, with a dfvfs.FileSystemSearcher using the input directory as the mount point. The tool extracts files matching the FindSpec's criteria within this directory.
  • File: ExtractPathSpecs yields the input file path directly without searching, as it's assumed that a user-provided file path should be extracted.

Added a safeguard check that prints a message and exits if the input is a file. The tool can handle images, block devices, and directory hierarchies from the evidence system.
