
Fileformat does not generate imphash for ELF binaries #286

Closed
metthal opened this issue Apr 30, 2018 · 4 comments

@metthal
Member

metthal commented Apr 30, 2018

The problem with generating imphash for ELFs is that ELF binaries do not record which library each imported symbol comes from; the only information we can obtain is the symbol name. Since imphash relies on library names, no imphash is generated. Generating imphash for ELFs requires an appropriate modification of the algorithm while retaining the old behavior for PEs and Mach-Os (#285).

I propose leaving the library name, together with its separator (I think it's "."), out of the hashed data. I advise taking care of #285 before this ticket.

@HoundThe
Member

I was researching ELF imphash. From what I've found, Telfhash seems to be a well-specified algorithm that is already in use (by VirusTotal, for example).
Would it be a good option to implement in Fileinfo?

@s3rvac
Member

s3rvac commented Jan 2, 2021

It is certainly possible. However, in that case, I would probably vote for generating it as a separate type of hash, which would allow us to have both imphash (available for all file types) and telfhash (only for ELFs). Also, whether supporting telfhash is worth the effort is open to discussion.

FWIW, here is a Python implementation, which internally uses tlsh (a C++ library and tools, including Python bindings). If we decide to give telfhash a try, we should consider using tlsh so that we do not have to reimplement everything ourselves.

@HoundThe
Member

HoundThe commented Jan 2, 2021

Sure. We can use the telfhash algorithm to create the import string, then hash it with the hashing functions we already use; if it proves worth it, we can also implement tlsh or use the existing library. I am unsure whether the tlsh license could be an issue.
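A minimal sketch of that plan: build a normalized, telfhash-style symbol string, then feed it to ordinary digests instead of TLSH. The exclusion rules below are hypothetical and deliberately simplified (the reference telfhash implementation uses its own, longer lists), and CRC32, MD5 and SHA-256 only stand in for whichever digests are already used.

```python
import hashlib
import zlib

# Hypothetical, simplified exclusion rules loosely modeled on telfhash;
# the reference implementation uses its own (longer) lists.
EXCLUDED_PREFIXES = ("_", ".")
EXCLUDED_NAMES = {"main", "abort"}


def import_string(symbols: list[str]) -> str:
    """Build a telfhash-style string: lowercase, filter, dedupe, sort, join with commas."""
    kept = set()
    for name in symbols:
        lowered = name.lower()
        if not lowered or lowered.startswith(EXCLUDED_PREFIXES) or lowered in EXCLUDED_NAMES:
            continue  # skip empty names and excluded symbols
        kept.add(lowered)
    return ",".join(sorted(kept))


def import_hashes(symbols: list[str]) -> dict[str, str]:
    """Hash the normalized string with ordinary digests instead of TLSH."""
    data = import_string(symbols).encode()
    return {
        "crc32": f"{zlib.crc32(data):08x}",
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
```

For instance, `import_string(["printf", "_init", "malloc", "main", "PRINTF"])` yields `malloc,printf`. Sorting and deduplicating make the string independent of symbol-table order, which is what lets an exact digest stand in until a similarity hash like TLSH is added.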

PeterMatula added a commit to avast/retdec-regression-tests that referenced this issue Apr 14, 2021
@PeterMatula
Collaborator

Solved in #936.

Added hashes:

  • The main telfhash at file level, because it is equivalent to Trend Micro's algorithm (also used on VirusTotal), which computes it from all the symbols (normalized!), not only imports.
  • ELF-specific import table hashes, implemented in elf_import_table.cpp. These include another tlsh hash, which differs from the VT-compatible hash: because we can get more imports than the original Trend Micro implementation, this hash can be present even when the original is not; also, exports are not used here, etc. There was no prescribed way to normalize these data, so we can change it if you don't like it. Other hashes are also computed here.
  • A tlsh import table hash is also computed in the original PE/Mach-O implementation, because why not, but I don't think it is that useful.

See these two tests for the JSON structure.
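To illustrate why the two tlsh hashes differ, here is a minimal sketch of how their inputs could be selected from the same symbol list: the file-level (VT-style) input takes function symbols whether imported or defined, while the import-table input takes imported symbols only and ignores exports. The `Symbol` record and its fields are hypothetical and are not RetDec's API; the shared normalization is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """Hypothetical symbol record; field names are illustrative, not RetDec's API."""
    name: str
    is_import: bool    # undefined here, resolved from a shared library at load time
    is_function: bool  # e.g. an STT_FUNC symbol


def _join(names) -> str:
    """Normalize names the same way for both inputs: lowercase, dedupe, sort, comma-join."""
    return ",".join(sorted({n.lower() for n in names if n}))


def file_level_input(symbols: list[Symbol]) -> str:
    """VT-style input: all function symbols, both imported and defined/exported."""
    return _join(s.name for s in symbols if s.is_function)


def import_table_input(symbols: list[Symbol]) -> str:
    """Import-table input: imported symbols only (functions or data); exports are ignored."""
    return _join(s.name for s in symbols if s.is_import)
```

With `[Symbol("printf", True, True), Symbol("stdout", True, False), Symbol("my_export", False, True)]`, the file-level input is `my_export,printf` and the import-table input is `printf,stdout`, so the two strings, and hence their tlsh hashes, diverge even for a tiny binary.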
