Add total_diff to tlsh.c and tlsh.h #1

Open
dmknght opened this issue Jul 20, 2023 · 3 comments

@dmknght

dmknght commented Jul 20, 2023

tlsh_impl_total_diff is the function that compares two TLSH hashes and returns a similarity score (a distance, so lower means more similar). In the original code it is declared in imp_tlsh.cpp and is called like this:
image

In Avast's version it is never called, so the feature is missing completely.
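For readers unfamiliar with what a diff score measures, here is a minimal sketch of the body part of such a comparison. It assumes each digest byte packs four 2-bit quartile codes and that a difference of 3 between two codes is penalized as 6, which is how I understand the reference implementation to work; please verify against the original source. The name `body_distance` is hypothetical and exists in neither tlshc nor the original TLSH, and the header terms of the real score (length, quartile ratios, checksum) are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of tlshc or the original TLSH API.
 * Computes a distance between two digest bodies in which every byte
 * packs four 2-bit quartile codes. Header terms are not included. */
static int body_distance(const uint8_t *a, const uint8_t *b, size_t len)
{
    assert(a != NULL && b != NULL);
    int diff = 0;
    for (size_t i = 0; i < len; i++) {
        /* Walk the four 2-bit codes stored in this byte. */
        for (int shift = 0; shift < 8; shift += 2) {
            int x = (a[i] >> shift) & 0x3;
            int y = (b[i] >> shift) & 0x3;
            int d = abs(x - y);
            /* Assumption: codes at opposite extremes (difference of 3)
             * count double, so they add 6 instead of 3. */
            diff += (d == 3) ? 6 : d;
        }
    }
    return diff;
}
```

Identical bodies give 0 and the value grows as more buckets disagree, which is why a threshold on the score can express "similar enough" rather than "identical".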

@HoundThe
Member

Thanks for the report. You are right, this port is a heavily trimmed version of the original implementation, made mainly to add TLSH computation to YARA. That said, I am not 100% sure whether this feature and the one in #2 are wanted, since neither is needed for computing a TLSH hash. The scenarios I can think of right now are:

  1. Later processing and analysis. For those goals the original implementation can be used and would probably be more appropriate.
  2. Adding another function to YARA so rules can match on hash distance instead of an exact match. If the community wants something like this, I will add the functions to support it.

@dmknght
Author

dmknght commented Jul 24, 2023

IMO these 2 features are important for hash comparison. If I understand exact match correctly, it is equivalent to requiring a TLSH score <= 0? In that case the result looks like this:
image

Meanwhile, detection based on the diff score gives this result:
image

So exact matching goes against the idea of finding similar files, doesn't it?

I think a YARA rule could have an API like this:
tlsh_diff(<hash from rule>, <score to compare>)

As far as I understand from reading YARA's source code, the ELF module generates the TLSH when it parses the file, so the pseudocode could look like this:

tlsh_diff(string hash, int32 score)
{
    struct Tlsh tlsh;       // parsed from the hash string given in the rule
    struct Tlsh yara_tlsh;  // computed by YARA's ELF module
    if (tlsh.fromTlshString(hash) == 0) {   // 0 means the hash parsed successfully
        if (totalDiff(&tlsh, &yara_tlsh) <= score)
            return YARA_TRUE;
    }
    return YARA_FALSE;
}
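To make the control flow of that proposal concrete, here is a small C sketch of the parse, diff, threshold sequence. Everything in it is a stand-in I made up for illustration: `toy_digest`, `toy_from_string` and `toy_total_diff` are not part of tlshc, YARA, or the original TLSH, and the toy distance just counts differing characters rather than computing a real TLSH score.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_LEN 8 /* illustrative digest length, not the real TLSH size */

/* Hypothetical digest type standing in for the real TLSH structure. */
typedef struct { char hex[TOY_LEN + 1]; } toy_digest;

/* Stand-in for fromTlshString: returns 0 on success, nonzero on bad input. */
static int toy_from_string(toy_digest *out, const char *hash)
{
    assert(out != NULL);
    if (hash == NULL || strlen(hash) != TOY_LEN)
        return 1;
    memcpy(out->hex, hash, TOY_LEN + 1);
    return 0;
}

/* Stand-in for totalDiff: counts differing characters.
 * The real score is computed very differently. */
static int toy_total_diff(const toy_digest *a, const toy_digest *b)
{
    int diff = 0;
    for (size_t i = 0; i < TOY_LEN; i++)
        diff += (a->hex[i] != b->hex[i]);
    return diff;
}

/* True when the rule's hash parses and lies within max_score of the
 * digest computed for the scanned file. */
static bool tlsh_diff_matches(const char *rule_hash,
                              const toy_digest *file_digest,
                              int max_score)
{
    toy_digest rule_digest;
    if (toy_from_string(&rule_digest, rule_hash) != 0)
        return false; /* an unparsable hash never matches */
    return toy_total_diff(&rule_digest, file_digest) <= max_score;
}
```

With `max_score` set to 0 this degenerates to an exact match, so a distance-based function would cover the existing behaviour as a special case while also allowing looser thresholds.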

@dmknght
Author

dmknght commented Sep 13, 2023

I forked the project and added the 2 missing functions. Since the Avast devs aren't really interested in adding them, I hope my fork is useful to somebody:
https://github.com/dmknght/tlshc
