bash_history parser failing with 'utf-8' codec can't decode byte 0xba in position #3298

Closed
madsumm opened this issue Nov 9, 2020 · 12 comments

@madsumm

madsumm commented Nov 9, 2020

Description of problem:

The linux parser preset, or the individual parser ("bash_history"), was used on a Linux E01 file.
Other users' .bash_history files are parsed, but not root's.

Command line and arguments:

log2timeline.py --parsers linux (or bash_history) --process-archives <plasofile> <sourceE01>

Plaso version:

20201007

Operating system Plaso is running on:

Ubuntu 20.04.1 Desktop version

Installation method:

Standard add-repository and "apt install plaso-tools"

Other:

Tried earlier versions (late 2019), similar results.
The E01 files were manually extracted from LVM2 volumes and can be mounted in Linux.
Retried Plaso with a directory as the source; the result is the same.

@joachimmetz
Member

@madsumm your description is very hard to follow; can you take a bit of time to explain your issue?

Do all the bash_history files follow the supported format? Also see: https://github.com/log2timeline/plaso/blob/master/test_data/bash_history

@joachimmetz
Member

Tried earlier versions (late 2019), similar results.

Also note that Plaso versions older than 6 months are considered outdated.

@madsumm
Author

madsumm commented Nov 9, 2020

@madsumm your description is very hard to follow; can you take a bit of time to explain your issue?

Do all the bash_history files follow the supported format? Also see: https://github.com/log2timeline/plaso/blob/master/test_data/bash_history

@joachimmetz, the format is the same as in the file you linked.
When I looked at the ingest, I saw that it found the file /root/.bash_history, but did not parse it.

However, in the same partition, the /home//.bash_history is properly parsed.

#1590154640
cd /config/
#1590154641
ls

Tried earlier versions (late 2019), similar results.

Also note that Plaso versions older than 6 months are considered outdated.

I tried the previous version just to test the behaviour.

@joachimmetz
Member

I saw that it found the file /root/.bash_history, but did not parse it.

Any warnings about why? Did you mount the file system via the operating system? Do you have access to the files?

@madsumm
Author

madsumm commented Nov 9, 2020

I saw that it found the file /root/.bash_history, but did not parse it.

Any warnings about why? Did you mount the file system via the operating system? Do you have access to the files?

Yes, I mounted the E01 volume and was able to see the content of the file. No issues.
The above 4 lines are the 1st 4 lines of the /root/.bash_history file.

@joachimmetz
Member

joachimmetz commented Nov 9, 2020

The above 4 lines are the 1st 4 lines of the /root/.bash_history file.

Tested the example in isolation:

2020-05-22T13:37:20+00:00,Content Modification Time,LOG,Bash History,Command executed: cd /config/,bash_history,OS:bash_history.test,-
2020-05-22T13:37:21+00:00,Content Modification Time,LOG,Bash History,Command executed: ls,bash_history,OS:bash_history.test,-
2020-11-09T09:19:49.428684+00:00,Content Modification Time,FILE,File stat,OS:bash_history.test Type: file,filestat,OS:bash_history.test,-
2020-11-09T09:19:49.428684+00:00,Metadata Modification Time,FILE,File stat,OS:bash_history.test Type: file,filestat,OS:bash_history.test,-
2020-11-09T09:21:07.223733+00:00,Last Access Time,FILE,File stat,OS:bash_history.test Type: file,filestat,OS:bash_history.test,-

Based on the first 2 lines it looks like the parser picks up on it. So there must be some other reason why these files are not being processed on your end.
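
For reference, the timestamped format in the sample (a `#<epoch seconds>` line followed by the command) can be read with a few lines of Python. This is only a minimal sketch and not Plaso's parser; `parse_bash_history` is a hypothetical helper name, and it assumes every command is preceded by its own timestamp line.

```python
from datetime import datetime, timezone


def parse_bash_history(text):
    """Yield (timestamp, command) pairs from timestamped bash history text.

    Hypothetical helper for illustration only; assumes each command is
    preceded by a '#<epoch seconds>' line.
    """
    timestamp = None
    for line in text.splitlines():
        marker = line[1:]
        if line.startswith('#') and marker.isascii() and marker.isdigit():
            # Timestamp line: seconds since the Unix epoch, in UTC.
            timestamp = datetime.fromtimestamp(int(marker), tz=timezone.utc)
        elif line and timestamp is not None:
            yield timestamp, line
            timestamp = None


sample = '#1590154640\ncd /config/\n#1590154641\nls\n'
events = list(parse_bash_history(sample))
for when, command in events:
    print(when.isoformat(), command)
```

The first timestamp converts to 2020-05-22T13:37:20+00:00, which agrees with the first event in the output above.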

Can you provide us with debug logs of the main and worker processes and other relevant troubleshooting information? (Also see https://plaso.readthedocs.io/en/latest/sources/Troubleshooting.html)

@madsumm
Author

madsumm commented Nov 9, 2020

Hi,

Interestingly I found the following in the log:

2020-11-09 12:26:21,315 [DEBUG] (MainProcess) PID:5593 <extractors> [ParseFileEntryWithParsers] parsing file: OS:/mnt/linux/root/.bash_profile with parser: bash_history
2020-11-09 12:26:21,315 [DEBUG] (MainProcess) PID:5593 <extractors> bash_history unable to parse file: OS:/mnt/linux/root/.bash_profile with error: Wrong file structure.

and

2020-11-09 12:22:15,978 [DEBUG] (Worker_01 ) PID:5426 <worker> [ProcessFileEntry] processing file entry: OS:/mnt/linux/root/.bash_history
2020-11-09 12:22:15,981 [DEBUG] (Worker_01 ) PID:5426 <worker> [ProcessFileEntryDataStream] processing data stream: "" of file entry: OS:/mnt/linux/root/.bash_history
2020-11-09 12:22:15,982 [DEBUG] (Worker_01 ) PID:5426 <worker> [AnalyzeDataStream] analyzing file: OS:/mnt/linux/root/.bash_history
2020-11-09 12:22:15,982 [DEBUG] (Worker_01 ) PID:5426 <hashing_analyzer> Processing results for hasher sha256
2020-11-09 12:22:15,982 [DEBUG] (Worker_01 ) PID:5426 <worker> [AnalyzeFileObject] attribute sha256_hash:117c838478f537a64b55c096cfd6f9dcd0e103c329ec54c9adf141dc966f9e1a calculated for file: OS:/mnt/linux/root/.bash_history.
2020-11-09 12:22:15,983 [DEBUG] (Worker_01 ) PID:5426 <worker> [AnalyzeDataStream] completed analyzing file: OS:/mnt/linux/root/.bash_history
2020-11-09 12:22:15,983 [DEBUG] (Worker_01 ) PID:5426 <worker> [ExtractMetadataFromFileEntry] processing file entry: OS:/mnt/linux/root/.bash_history
2020-11-09 12:22:15,984 [DEBUG] (Worker_01 ) PID:5426 <extractors> [ParseFileEntryWithParsers] parsing file: OS:/mnt/linux/root/.bash_history with parser: bash_history
2020-11-09 12:22:15,984 [DEBUG] (Worker_01 ) PID:5426 <extractors> bash_history unable to parse file: OS:/mnt/linux/root/.bash_history with error: Not a text file, with error: 'utf-8' codec can't decode byte 0xba in position 1819: invalid start byte
2020-11-09 12:22:15,984 [DEBUG] (Worker_01 ) PID:5426 <worker> [ProcessFileEntry] done processing file entry: OS:/mnt/linux/root/.bash_history

It looks like this file is somehow different?
Weird, as the other bash_history files from the same system work...

Rgds

@joachimmetz
Member

2020-11-09 12:22:15,984 [DEBUG] (Worker_01 ) PID:5426 <extractors> bash_history unable to parse file: OS:/mnt/linux/root/.bash_history with error: Not a text file, with error: 'utf-8' codec can't decode byte 0xba in position 1819: invalid start byte

It looks like this file is somehow different?

A UTF-8 decoding error is causing the parser to fail.

It would be interesting to know why this bash_history file has an encoding error. I'll need to give some thought to how to properly handle such a scenario without having everything parsed as bash_history.
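
To illustrate the failure mode: 0xba is a UTF-8 continuation byte, so it cannot start a character, and a strict decode stops there. The sketch below reproduces the same kind of error on made-up bytes and reports where it occurs. `describe_utf8_error` is a hypothetical helper, and the sample data is an assumption, not the content of the reporter's file.

```python
def describe_utf8_error(data):
    """Return (offset, byte value, reason) for the first invalid UTF-8 byte.

    Returns None when the data decodes cleanly. Hypothetical helper for
    illustration only.
    """
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as error:
        return error.start, data[error.start], error.reason
    return None


# Made-up sample: a timestamp line followed by two non-UTF-8 bytes.
sample = b'#1595324530\n\xba\xfc\n'
result = describe_utf8_error(sample)
print(result)
```

Running the same helper on the raw bytes of the affected file (read in binary mode) should point at the offset reported in the debug log.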

joachimmetz self-assigned this Nov 9, 2020
joachimmetz changed the title from "Linux Parsers not parsing through all .bash_history" to "bash_history parser failing with 'utf-8' codec can't decode byte 0xba in position" Nov 9, 2020
@madsumm
Author

madsumm commented Nov 9, 2020

Hi, it seems there are some strange characters on some lines from the logs... encrypted?

#1595324530
<snip>√ü˙∆ı·∑ƒÆLˇìò<snip>

It is good to ignore any such errors maybe from the logs due to uncertainty of the format sometimes by various devices?

@joachimmetz
Member

joachimmetz commented Nov 10, 2020

encrypted?

More likely some random data as input; hard to say without the full context.

It is good to ignore any such errors maybe from the logs due to uncertainty of the format sometimes by various devices?

Not entirely sure what you are trying to say here. IMHO "ignoring" would not be the proper approach; maybe a better approach is to generate a processing warning and fall back to an encoding method that replaces the unsupported characters.
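
A rough sketch of that idea in plain Python, assuming strict UTF-8 is tried first and the standard `backslashreplace` error handler is used as the fallback. `decode_with_fallback` is a hypothetical name and the `logging.warning` call merely stands in for a processing warning; this is not how Plaso implements it.

```python
import logging


def decode_with_fallback(data, encoding='utf-8'):
    """Decode bytes strictly, falling back to escaping undecodable bytes.

    Hypothetical helper for illustration only. On failure it logs a warning
    and replaces each undecodable byte with a '\\xNN' escape sequence.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as error:
        logging.warning(
            'unable to decode byte 0x%02x at offset %d (%s); '
            'falling back to backslashreplace',
            data[error.start], error.start, error.reason)
        return data.decode(encoding, errors='backslashreplace')


text = decode_with_fallback(b'#1595324530\nls \xba\n')
print(text)
```

Using `errors='replace'` instead would substitute U+FFFD for each bad byte; `backslashreplace` keeps the original byte values visible, which may matter for forensic review.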

@madsumm
Author

madsumm commented Nov 10, 2020

encrypted?

More likely some random data as input; hard to say without the full context.

I observed a few lines with random characters like these... not many, though.

It is good to ignore any such errors maybe from the logs due to uncertainty of the format sometimes by various devices?

Not entirely sure what you are trying to say here. IMHO "ignoring" would not be the proper approach; maybe a better approach is to generate a processing warning and fall back to an encoding method that replaces the unsupported characters.

Yep, you are right. "Ignoring" is not the right word for it.
Not sure which encoding method would suit this case.

@joachimmetz
Member

Tracking work on a possible solution in #3301, closing this issue.
