hash_tagging: Undefined behavior within Python threading can cause incomplete processing #4050
Comments
@william-billaud thx for the report, will take a closer look when time permits.
It would be nice if you could handle the issue. It would help to support hashlookup integration in plaso.
@adulau limited bandwidth atm, what is "hashlookup integration"?
If this is related to https://github.com/hashlookup/PyHashlookup, it is not compatible; see https://github.com/log2timeline/l2tdocs/blob/main/process/Dependencies.md. Also, https://github.com/adulau/hashlookup-server is AGPL, so we are unable to do end-to-end tests.
Yes, but there is no need to use the libraries mentioned or the server to test it or even to use it. I wrote an example under the BSD 2-Clause license that uses the provided Bloom filter to avoid remote lookups: https://github.com/hashlookup/hashlookup-forensic-analyser. Let me know if there is any issue.
As long as the server API stays the same as what the end-to-end tests use.
I suppose we can close it.
Why?
Description of problem:
In the hash_plugins analyzer, threads are not instantiated in the same process as the one in which they are executed. This pattern causes undefined behavior of the is_alive function, depending on the OS and Python version (see snippets below).
Therefore, in the hash_tagging plugins, if the analysis queue is empty but the analyzer still has work to do, the analyzer will be killed, resulting in a partially executed task (for example, when the plaso database is really small, the analysis is not executed).
In my opinion, the best option is to instantiate the thread class just before it starts. Referenced code:
plaso/plaso/analysis/hash_tagging.py, lines 265 to 267 in f6a18bc
plaso/plaso/analysis/hash_tagging.py, line 248 in f6a18bc
The main disadvantage is that the TestConnection function called in the cli.helper.* classes would have to call a class/static method.
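A minimal sketch of the "instantiate just before start" idea, for illustration only. It is not plaso code: the class name `LazyThreadAnalyzer` and its methods are hypothetical, and the sketch only assumes the standard `threading` module. The point is that no `Thread` object exists until `start()` is called, so the thread is always created in the process that runs it.

```python
import threading
import time


class LazyThreadAnalyzer:
    """Hypothetical analyzer that creates its thread only when started."""

    def __init__(self, wait_seconds=0.2):
        self._wait_seconds = wait_seconds
        # No Thread object is created here, so nothing thread-related is
        # carried over if this object is handed to another process.
        self._thread = None
        self.processed = []

    def _run(self):
        # Stand-in for the real analysis work.
        time.sleep(self._wait_seconds)
        self.processed.append("done")

    def start(self):
        # The Thread is instantiated in the same process that starts it.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def is_alive(self):
        # Before start() there is no thread, so report False.
        return self._thread is not None and self._thread.is_alive()

    def join(self):
        if self._thread is not None:
            self._thread.join()
```

With this layout, a caller that checks `is_alive()` before deciding to stop the analyzer queries a thread object created in its own process.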
Further description of the undefined behavior:
For example, the following snippet (from https://stackoverflow.com/questions/57814933/is-alive-always-returns-false-when-called-on-a-thread-from-inside-multiprocess) produces different results depending on the OS:
Command line and arguments:
psort.py --analysis nsrlsvr test.plaso
Plaso version:
Operating system Plaso is running on:
Installation method: