Feature: Map Extracted Files to Artifact Definitions in image_export.py #4949
Description:
This PR adds an optional feature to Plaso's image_export.py tool to generate a JSON file mapping extracted files to the artifact definitions that led to their extraction. This mapping provides valuable context about the extracted files.
Functionality:
The new --enable_artifacts_map flag activates this feature. When enabled, the tool creates an artifacts_map.json file in the output directory. This file contains a dictionary where:
Keys: Artifact definition names (e.g., JupyterConfigFile, SshdConfigFile, WindowsEnvironmentVariableComSpec).
Values: Lists of extracted file paths (relative to the output directory) that matched the corresponding artifact definition.
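As a minimal sketch of that layout, the snippet below builds the dictionary in Python and serializes it the way a JSON file would store it. The artifact names and paths are the ones used as examples in this description; the variable name artifacts_map is only illustrative and is not taken from the tool's code.

```python
import json

# Illustrative mapping: artifact definition name -> list of extracted paths
# (relative to the output directory). Values mirror this description's example.
artifacts_map = {
    'SshdConfigFile': ['etc/ssh/sshd_config'],
    'JupyterConfigFile': [
        'home/dummyuser/.jupyter/jupyter_notebook_config.py',
    ],
}

# Serialize as it might appear in artifacts_map.json.
serialized = json.dumps(artifacts_map, indent=2, sort_keys=True)
print(serialized)
```

Because the values are lists, a single artifact definition can account for any number of extracted files.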
For example, a run whose artifact filters include SshdConfigFile and JupyterConfigFile would produce an artifacts_map.json file that lists etc/ssh/sshd_config under SshdConfigFile and home/dummyuser/.jupyter/jupyter_notebook_config.py under JupyterConfigFile. This indicates that each file was extracted because it matched the corresponding artifact definition.
Registry Artifacts:
For artifacts that rely on Windows Registry keys or values (e.g., WindowsEnvironmentVariableComSpec), the tool automatically extracts the relevant registry hive files (e.g., SYSTEM, SOFTWARE, NTUSER.DAT). The artifacts_map.json will map these hive files to both:
The artifact that directly triggered the hive's extraction (e.g., WindowsSystemRegistryFiles).
Any artifacts that rely on data within those hives (e.g., WindowsEnvironmentVariableComSpec).
Example with Registry Artifacts:
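A rough sketch of what such a map could look like, and how to invert it to see which artifacts account for a given hive. The hive paths below are hypothetical placeholders, not output captured from the tool, and invert_map is a helper invented for this illustration.

```python
import json

# Hypothetical map contents; the hive paths are assumptions for illustration.
artifacts_map = {
    'WindowsSystemRegistryFiles': [
        'Windows/System32/config/SOFTWARE',
        'Windows/System32/config/SYSTEM',
    ],
    'WindowsEnvironmentVariableComSpec': [
        'Windows/System32/config/SOFTWARE',
        'Windows/System32/config/SYSTEM',
    ],
}


def invert_map(mapping):
    """Returns a dict of file path -> sorted list of artifact names."""
    by_file = {}
    for artifact_name, paths in mapping.items():
        for path in paths:
            by_file.setdefault(path, set()).add(artifact_name)
    return {path: sorted(names) for path, names in by_file.items()}


files_to_artifacts = invert_map(artifacts_map)
print(json.dumps(files_to_artifacts, indent=2))
```

In the inverted view each hive file appears once, listed with both the artifact that triggered its extraction and the artifact that depends on its contents.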
If you run image_export.py with --artifact_filters WindowsEnvironmentVariableComSpec, the artifacts_map.json might list the SYSTEM, SOFTWARE, and other hive files under both WindowsSystemRegistryFiles and WindowsEnvironmentVariableComSpec, showing that the hives were extracted because of both definitions. The mapped paths are relative to the output path provided with the --write argument.
Technical Details:
The core of this feature is the ArtifactsTrie class, which stores artifact definition paths in a Trie (prefix tree) data structure.
Artifacts Trie Structure
Example Trie:
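A minimal sketch of a path trie with glob-aware lookup, to make the idea concrete. This is not Plaso's ArtifactsTrie: the class PathTrie, its insert and match methods, the helper _split, and the HypotheticalLogFiles artifact are all names invented for this illustration, and the artifact paths are assumed rather than copied from the artifact definitions.

```python
import fnmatch
import os


def _split(path):
    """Normalizes separators to os.sep and returns non-empty segments."""
    normalized = path.replace('\\', os.sep).replace('/', os.sep)
    return [segment for segment in normalized.split(os.sep) if segment]


class PathTrie:
    """Hypothetical prefix tree keyed on path segments."""

    def __init__(self):
        self.children = {}      # segment (possibly a glob) -> PathTrie
        self.artifacts = set()  # artifact names whose path ends here

    def insert(self, path, artifact_name):
        node = self
        for segment in _split(path):
            node = node.children.setdefault(segment, PathTrie())
        node.artifacts.add(artifact_name)

    def match(self, path):
        """Returns the set of artifact names whose pattern matches path."""
        return self._match(_split(path))

    def _match(self, segments):
        matches = set()
        if not segments:
            matches |= self.artifacts
        for pattern, child in self.children.items():
            if pattern == '**':
                # '**' may absorb zero or more leading segments.
                for skip in range(len(segments) + 1):
                    matches |= child._match(segments[skip:])
            elif segments and fnmatch.fnmatch(segments[0], pattern):
                matches |= child._match(segments[1:])
        return matches


trie = PathTrie()
trie.insert('/etc/ssh/sshd_config', 'SshdConfigFile')
trie.insert('/home/*/.jupyter/jupyter_notebook_config.py', 'JupyterConfigFile')
trie.insert('/var/**/*.log', 'HypotheticalLogFiles')

print(trie.match('/home/dummyuser/.jupyter/jupyter_notebook_config.py'))
print(trie.match('/var/log/app/x.log'))
```

Definitions that share a prefix share the corresponding nodes, so the cost of a lookup depends on the depth of the path and the branching at each level rather than on the total number of definitions.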
Matching Logic
Paths are normalized to use os.sep as the separator.
The GetMatchingArtifacts method traverses the Trie based on input path segments, using fnmatch.fnmatch for glob matching.
** is handled recursively to match zero or more directory levels.
Source Type Handling
When the input to the tool is:
A directory: a dfvfs.FileSystem object of type OS is created, with a dfvfs.FileSystemSearcher using the input directory as the mount point. The tool extracts files matching the FindSpec's criteria within this directory.
A single file: ExtractPathSpecs yields the input file path directly without searching, as it's assumed that a user-provided file path should be extracted.
A safeguard check was added that prints a message and exits if the input is a single file. This tool can handle images, block devices, and directory hierarchies from the evidence system.