Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fileinfo produces different order of detected tools on macOS #262

Closed
s3rvac opened this issue Mar 31, 2018 · 4 comments
Closed

fileinfo produces different order of detected tools on macOS #262

s3rvac opened this issue Mar 31, 2018 · 4 comments

Comments

@s3rvac
Copy link
Member

s3rvac commented Mar 31, 2018

fileinfo produces different order of detected tools on macOS than on Linux/Windows.

Input

Run

$ retdec-fileinfo UPX121_C_small.7

where UPX121_C_small.7 comes from our regression-test suite.

Output

Linux:

...
Detected tool            : UPX (1.21 [NRV2B] [Filter: 0x26, Param: 0x7]) (packer), strings heuristic
Detected tool            : Microsoft Linker (6.0) (linker), combined heuristic
...

macOS:

...
Detected tool            : Microsoft Linker (6.0) (linker), combined heuristic
Detected tool            : UPX (1.21 [NRV2B] [Filter: 0x26, Param: 0x7]) (packer), strings heuristic
...

As you can see, the two lines are swapped.

Expected output

Linux:

...
Detected tool            : UPX (1.21 [NRV2B] [Filter: 0x26, Param: 0x7]) (packer), strings heuristic
Detected tool            : Microsoft Linker (6.0) (linker), combined heuristic
...

macOS:

...
Detected tool            : UPX (1.21 [NRV2B] [Filter: 0x26, Param: 0x7]) (packer), strings heuristic
Detected tool            : Microsoft Linker (6.0) (linker), combined heuristic
...

That is, the output should be the same on all systems.

Notes

This difference in the order of the detected tools is the reason why the following regression test fails on macOS:

======================================================================
FAIL: test_correctly_analyzes_input_file (tools.fileinfo.detection.packers.upx-older.Upx121Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "retdec-regression-tests/tools/fileinfo/detection/packers/upx-older/test.py", line 13, in test_correctly_analyzes_input_file
	self.assertEqual(self.fileinfo.output['Detected tool'][0], 'UPX (1.21 [NRV2B] [Filter: 0x26, Param: 0x7]) (packer), strings heuristic')
AssertionError: 'Microsoft Linker (6.0) (linker), combined heuristic' != 'UPX (1.21 [NRV2B] [Filter: 0x26, Param: 0x7]) (packer), strings heuristic'
- Microsoft Linker (6.0) (linker), combined heuristic
+ UPX (1.21 [NRV2B] [Filter: 0x26, Param: 0x7]) (packer), strings heuristic

Configuration

  • Commit: ae9a6d4 (current master)
  • 64b Arch Linux, GCC 7.3.1, Release build of RetDec
  • mac OS 10.13, Apple LLVM version 9.0.0, Release build of RetDec
@s3rvac
Copy link
Member Author

s3rvac commented Apr 1, 2018

I have analyzed this. During the ordering of the detected tools by compareForSort() in src/cpdetect/compiler_detector/compiler_detector.cpp, when UPX 1.21 [NRV2B] [Filter: 0x26, Param: 0x7] is compared to Microsoft Linker 6.0, it returns 0 on Linux (correct) but 1 on macOS (incorrect). This result is obtained by the following comparison:

131   // Prefer heuristic
132   return b.source == DetectionMethod::SIGNATURE;

Here are the values of source for both of the tools:

UPX 1.21 [NRV2B] [Filter: 0x26, Param: 0x7]      DetectionMethod::STRING_SEARCH_H
Microsoft Linker 6.0                             DetectionMethod::COMBINED

What seems fishy to me is that the comparison result of two tools (which is, in fact, the result of a < b, where a and b are the two compared tools) depends only on of the tools (in this case b). @mbandzi, shouldn't the comparison take into account both of the tools? Otherwise, the result heavily depends on the order of the tools passed to compareForSort(). Since the standard-library implementations on Linux and macOS use different sorting algorithms for std::sort(), we get different orderings of the tools as on Linux, UPX is a, but on macOS, UPX is b.

@mbandzi
Copy link
Contributor

mbandzi commented Apr 1, 2018

This function is quite problematic indeed, as it considers a lot of attributes with different priorities which leads to huge branching.

I believe in this case we consider both detections equal so both a < b and b < a should return false. We do not care what a is unless it is DetectionMethod::SIGNATURE in which case program takes other branch and should not reach this point.

I can think of two potential fixes:

  1. One of the systems does not use stable sorting algorithm so std::stable_sort could help here.
  2. Both systems use stable algorithm and source of non-determinism is somewhere else and should be fixed there.

@s3rvac
Copy link
Member Author

s3rvac commented Apr 1, 2018

One of the systems does not use stable sorting algorithm so std::stable_sort could help here.

Yes, using std::stable_sort() instead of std::sort() leads to the same results on both systems. So, if stability is desirable, I suggest switching to std::stable_sort() as std::sort() is not guaranteed to be stable.

@mbandzi mbandzi self-assigned this Apr 1, 2018
@mbandzi
Copy link
Contributor

mbandzi commented Apr 1, 2018

Fixed in ce25b14ce.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants