Home
fact_extractor is an offspring of FACT. Originally part of FACT_core, the extraction was moved into its own project. Extraction is plugin-based, with a 1-n relationship between plugins and file types: one plugin can handle several file types. The idea is to have a custom utility for each container, archive, or firmware update format.
The extractor then detects the file type automatically and chooses the correct unpack plugin.
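The selection step can be pictured as a lookup from detected file type to plugin. The following is a minimal sketch of that idea, not the actual fact_extractor implementation: the registry, the function names, and the fallback for unknown types are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# All names below are hypothetical; they are not taken from fact_extractor.
UnpackFunction = Callable[[str, str], dict]

# Registry mapping a MIME type to the unpack function responsible for it.
REGISTRY: Dict[str, UnpackFunction] = {}


def register_plugin(mime_types: List[str], unpack: UnpackFunction) -> None:
    """Register one plugin for several MIME types (the 1-n relationship)."""
    for mime in mime_types:
        REGISTRY[mime] = unpack


def unpack_archive(file_path: str, tmp_dir: str) -> dict:
    """Placeholder plugin; a real one would extract file_path into tmp_dir."""
    return {'plugin': 'archive', 'source': file_path, 'target': tmp_dir}


def unpack_fallback(file_path: str, tmp_dir: str) -> dict:
    """Assumed fallback for file types without a dedicated plugin."""
    return {'plugin': 'fallback', 'source': file_path, 'target': tmp_dir}


def choose_unpacker(mime: str) -> UnpackFunction:
    """Return the plugin registered for a MIME type, or the fallback."""
    return REGISTRY.get(mime, unpack_fallback)


# One plugin serves two file types.
register_plugin(['application/zip', 'application/x-tar'], unpack_archive)
```

A dictionary keyed by MIME type keeps the lookup trivial and lets a single plugin register itself for as many types as it supports.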
The plugin concept is designed to make the extractor extensible. Depending on existing related work, plugin code can be very short and simple, for example when a Python library or binary tool already handles the extraction. The more complex part in these cases is adding a file type signature when none exists in the standard Linux file magic library.
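To illustrate how short a plugin can be when a Python library already does the work, here is a sketch that wraps the standard zipfile module. The module-level names (NAME, MIME_PATTERNS, VERSION), the unpack_function signature, and the returned metadata dictionary are assumptions made for this example; the plugin development wiki describes the interface that fact_extractor actually expects.

```python
import zipfile

# Hypothetical plugin metadata; the real interface may differ.
NAME = 'zip_sketch'
MIME_PATTERNS = ['application/zip']
VERSION = '0.1'


def unpack_function(file_path: str, tmp_dir: str) -> dict:
    """Extract the archive at file_path into tmp_dir.

    Returns a small metadata dictionary describing what was done.
    """
    with zipfile.ZipFile(file_path) as archive:
        archive.extractall(tmp_dir)
        entry_count = len(archive.namelist())
    return {'output': f'extracted {entry_count} entries'}
```

The extraction itself is two lines; everything else is bookkeeping that tells the extractor which file types the plugin is meant for.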
See the readme for setup and usage instructions.
❗ Before you start developing your own plug-ins, have a look at the FACT coding guidelines.
❓ If you have any questions or problems regarding plug-in development, do not hesitate to ask here or here.
All important information on writing new plugins is collected in the plugin development wiki. If you would like to contribute a plugin, you can simply fork fact_extractor and develop your plugin there. Alternatively, you can develop in private and later add the plugin as a git submodule on other installations, or publish it on GitHub under a license of your choice. Adding a submodule is a one-liner:
git submodule add https://github.com/YOUR_REPO_PATH.git fact_extractor/plugins/unpacker/NAME_OF_YOUR_PLUG-IN
You can use your custom extraction plug-in within FACT by building your own docker image:
docker build -t fkiecad/fact_extractor:latest .
If your case needs a new file type signature, the FACT file type library (fact_helper_file) has to be extended. This is typically the case when an unpacker for a firmware update format is developed. Common container and compression formats, on the other hand, are generally covered by the existing file library. Assuming the library is installed (via pip from its git+https URL), you can check whether your file is detected with
$ python3 -c "from fact_helper_file import get_file_type_from_path;print(get_file_type_from_path('<path_to_your_file>'))"
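When several sample files need checking, the one-liner above can be wrapped in a small helper. This is only a sketch: it assumes the detector returns a dictionary with a 'mime' key, and it treats application/octet-stream as the sign that no specific signature matched. Both are assumptions to verify against the library. The detector is passed in as a parameter so the helper itself has no hard dependency on fact_helper_file.

```python
from typing import Callable, Iterable, List

# Assumption: a generic MIME type means no specific signature matched.
GENERIC_MIMES = frozenset({'application/octet-stream'})


def find_undetected(
    paths: Iterable[str],
    detect: Callable[[str], dict],
    generic_mimes: frozenset = GENERIC_MIMES,
) -> List[str]:
    """Return the paths whose detected MIME type is only a generic one.

    detect is expected to return a dict with a 'mime' key (assumed format).
    """
    undetected = []
    for path in paths:
        result = detect(path)
        if result.get('mime') in generic_mimes:
            undetected.append(path)
    return undetected


# Intended usage with the library installed:
#   from fact_helper_file import get_file_type_from_path
#   print(find_undetected(['<path_to_your_file>'], get_file_type_from_path))
```

Files reported by this helper are candidates for a new signature.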
Development of file signatures has to be done according to the magic man page. Signatures are stored in the fact_helper_file/mime directory of the library.
A typical workflow then includes the steps:
- Detect a file format that can't be extracted with fact_extractor yet
- Reverse-engineer the format to find actionable magic for a signature
- Develop a signature according to these guidelines
- Push to library (via fork)
- Increment library version
- Install the updated library: pip install -U git+https://github.com/<your_fork>/fact_helper_file.git
- (Optional) Create pull request to fkie-cad/fact_helper_file
💡 Some further help is provided in the library's wiki
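For the reversing step of the workflow, a common starting point is to compare several samples of the same format and look for bytes they share at the start of the file. The sketch below does exactly that and nothing more; the function name and the 16-byte window are arbitrary choices for illustration, and real formats may keep their magic at a non-zero offset, which this does not cover.

```python
from typing import List


def common_magic(samples: List[bytes], max_len: int = 16) -> bytes:
    """Return the longest byte prefix shared by all samples.

    Only the first max_len bytes of each sample are compared.
    """
    if not samples:
        return b''
    prefix = samples[0][:max_len]
    for sample in samples[1:]:
        head = sample[:max_len]
        matched = 0
        # Count how many leading bytes still agree with the current prefix.
        while (
            matched < min(len(prefix), len(head))
            and prefix[matched] == head[matched]
        ):
            matched += 1
        prefix = prefix[:matched]
    return prefix


# Made-up sample headers for illustration only.
headers = [b'FWUP\x01\x00abc', b'FWUP\x01\x00xyz', b'FWUP\x01\x00\x00\x00']
print(common_magic(headers))  # b'FWUP\x01\x00'
```

A shared prefix found this way is only a candidate: it still has to be written as a signature following the magic man page and checked against unrelated files to rule out false positives.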
No one yet:
fact_extractor is so neat, I use it every day
Some person:
Finally a usable extractor that does not pollute my system
Maybe a relative (?):
FACT was already cool, and now I can also use the extraction standalone? Whoa!