fileinfo produces output that cannot be decoded with UTF-8 #82

Closed
s3rvac opened this issue Jan 8, 2018 · 2 comments

s3rvac (Member) commented Jan 8, 2018

When fileinfo is run in verbose mode on the attached 64-bit ELF file, it produces output that cannot be decoded as UTF-8.

Input

$ fileinfo -v FILE > output.txt

where FILE is 1F9AEA9C8C3A952C86E70BB1BB086AD3FA763F6231C5CDCBA887681403030582.

Output

When I try to decode the produced output as UTF-8, decoding fails because the output contains an invalid byte:

$ python3.6
>>> open('output.txt', 'r', encoding='utf-8').read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 37539: invalid start byte
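To pinpoint the offending byte without scrolling through the output, one can decode the raw bytes directly and inspect the `start`/`end` attributes of the resulting `UnicodeDecodeError`. A minimal sketch; the helper `find_invalid_utf8` is hypothetical and not part of RetDec, and the sample bytes are modeled on the offending line quoted below:

```python
def find_invalid_utf8(data: bytes, context: int = 16):
    """Return (offset, bad_bytes, surrounding_bytes) for the first invalid
    UTF-8 sequence in `data`, or None if `data` decodes cleanly."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as e:
        # e.start and e.end delimit the offending bytes within `data`.
        lo = max(0, e.start - context)
        return e.start, data[e.start:e.end], data[lo:e.end + context]
    return None


# On the real file: find_invalid_utf8(open('output.txt', 'rb').read())
sample = b'0x09b0    libm.s\x92\n'
print(find_invalid_utf8(sample))
```

Opening the file in binary mode (`'rb'`) matters here: it skips the implicit decoding step that raised the error in the first place.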

The invalid byte is the last non-whitespace character on the following line (shown here decoded as CP1250):

16   String table offset of name of needed library (DT_NEEDED)   0x09b0    libm.s’

Its ordinal value is 146 (0x92 in hexadecimal).
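The decoder reports "invalid start byte" because, in UTF-8, bytes of the form `10xxxxxx` (0x80 to 0xBF) are continuation bytes that may only follow a lead byte; 0x92 is `10010010`, so it can never begin a character. In Windows-1250, the same byte maps to U+2019 (right single quotation mark), which explains the `’` shown above. A small sketch checking both facts with the standard library:

```python
import unicodedata

BYTE = 0x92

# 146 in decimal, 10010010 in binary.
print(BYTE, format(BYTE, '08b'))

# UTF-8 continuation bytes match the bit pattern 10xxxxxx.
is_continuation = (BYTE & 0b1100_0000) == 0b1000_0000
print('continuation byte:', is_continuation)

# Decoding the lone byte as UTF-8 fails with the same reason as in the report.
try:
    bytes([BYTE]).decode('utf-8')
    utf8_error = None
except UnicodeDecodeError as e:
    utf8_error = e.reason
print('UTF-8 error:', utf8_error)

# Interpreted as Windows-1250, the byte is a right single quotation mark.
as_cp1250 = bytes([BYTE]).decode('cp1250')
print(unicodedata.name(as_cp1250))
```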

Expected output

fileinfo should produce output that can be decoded as UTF-8. Currently, it is supposed to always produce ASCII output, which is always valid UTF-8 because ASCII is a subset of UTF-8.
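Because UTF-8 encodes U+0000 to U+007F as the same single bytes as ASCII, a check on fileinfo's output only needs to verify that every byte is below 0x80. A minimal sketch; the function name `is_ascii` is hypothetical and not part of RetDec's test suite:

```python
def is_ascii(data: bytes) -> bool:
    """Return True if every byte lies in the 7-bit ASCII range 0x00-0x7F."""
    # bytes.isascii() does the same but requires Python 3.7+;
    # this form also runs on the Python 3.6 used in the report.
    return all(b < 0x80 for b in data)


good = b'libm.s\n'
bad = b'libm.s\x92\n'
print(is_ascii(good), is_ascii(bad))
# Pure-ASCII bytes decode to identical text under UTF-8 and ASCII.
print(good.decode('utf-8') == good.decode('ascii'))
```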

Configuration

  • Commit: 8c4b23d (current master)
  • 64-bit Arch Linux, GCC 7.2.1, Debug build of RetDec
s3rvac (Member, Author) commented Jan 29, 2018

mbandzi closed this as completed Mar 2, 2018
mbandzi reopened this Mar 2, 2018

mbandzi (Contributor) commented Mar 2, 2018

Non-printable characters are now replaced with hexadecimal codes, as in other parts of retdec-fileinfo.
Fixed in 5eab1d53d7.
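The fix itself lives in RetDec's C++ sources and is not reproduced here. The idea can be sketched in Python, assuming "printable" means the ASCII range 0x20 to 0x7E and assuming a `\xNN` escape format; both the helper name `escape_non_printable` and that exact format are hypothetical and may differ from what retdec-fileinfo actually emits:

```python
def escape_non_printable(raw: bytes) -> str:
    """Render raw bytes as pure ASCII, replacing every byte outside the
    printable range 0x20-0x7E with a hexadecimal escape such as \\x92."""
    out = []
    for b in raw:
        if 0x20 <= b <= 0x7E:
            out.append(chr(b))                 # printable ASCII: keep as-is
        else:
            out.append('\\x{:02x}'.format(b))  # hypothetical escape format
    return ''.join(out)


print(escape_non_printable(b'libm.s\x92'))
```

Since the result contains only printable ASCII, it always decodes as UTF-8. Note that a literal backslash in the input is left as-is in this sketch, so the escaping is not reversible; an implementation that needs round-tripping would escape `\` too.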

mbandzi closed this as completed Mar 2, 2018