FatalError with trace, no specific information, --verbose does nothing #106

Open
jakecoble opened this issue Jul 18, 2021 · 30 comments

@jakecoble

 △ ~ aconfmgr --verbose save
: Collecting data...
:: Compiling user configuration...
::: Using configuration in /home/jake/.config/aconfmgr
::: Sourcing /home/jake/.config/aconfmgr/00-config.sh...
::: Done (0 native packages, 0 foreign packages, 0 files).
:: Inspecting system state...
::: Querying package list...
:::: Done.
::: Enumerating owned files...
:::: Done.
::: Searching for stray files...
:::: Fatal error! Stack trace:
::::: /usr/lib/aconfmgr/common.bash:330 [AconfCompileSystem]
::::: /usr/lib/aconfmgr/common.bash:883 [AconfCompile]
::::: /usr/lib/aconfmgr/save.bash:9 [AconfSave]
::::: /usr/lib/aconfmgr/main.bash:185 [Main]
::::: /usr/lib/aconfmgr/main.bash:205 [source]
::::: /usr/bin/aconfmgr:26 [main]
:::: Fatal error! Stack trace:
::::: /usr/lib/aconfmgr/common.bash:332 [AconfCompileSystem]
::::: /usr/lib/aconfmgr/common.bash:883 [AconfCompile]
::::: /usr/lib/aconfmgr/save.bash:9 [AconfSave]
::::: /usr/lib/aconfmgr/main.bash:185 [Main]
::::: /usr/lib/aconfmgr/main.bash:205 [source]
::::: /usr/bin/aconfmgr:26 [main]

Latest version from the AUR. Only thing in 00-config.sh is verbose=1. No other config files.

What can I do to get more information on the error?

@CyberShadow
Owner

Could you please run aconfmgr with the -x shell flag (e.g. bash -x /usr/bin/aconfmgr), and post the output? That should, at least, allow finding the failing command.
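As a rough sketch of what such a trace looks like, the snippet below runs a small stand-in script (hypothetical, not aconfmgr itself) under `bash -x` and captures the trace from stderr. Setting `PS4` to include `BASH_SOURCE` and `LINENO` is an optional extra that is not part of the request above. Note that bash may ignore an inherited `PS4` when running as root.

```shell
# Hypothetical stand-in for the real script (which would be /usr/bin/aconfmgr).
demo=$(mktemp)
cat > "$demo" <<'EOF'
step() { false; }
step
echo "not reached"
EOF

# Capture only the xtrace output (stderr); discard the script's stdout.
# PS4 is expanded by the child bash for every traced command.
status=0
trace=$(PS4='+ ${BASH_SOURCE}:${LINENO}: ' bash -e -x "$demo" 2>&1 >/dev/null) || status=$?

echo "exit status: $status"
printf '%s\n' "$trace" | tail -n 2   # the last traced commands precede the failure
rm -f "$demo"
```

The last line of the trace is usually the command that failed, which is the piece of information being asked for here.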

@jakecoble
Author

Seems to be SIGPIPE, so one of the commands involved must be trying to write to a pipe after a command exited. That area of the code is full of nested subshells, so it's tough to track down exactly which command is failing.
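For reference, here is a minimal bash sketch of how a SIGPIPE in a pipeline can be observed. It is unrelated to aconfmgr's actual code and only illustrates the mechanism: the writer (`yes`) keeps writing after the reader (`head`) has exited.

```shell
# bash-specific: PIPESTATUS holds the exit status of every pipeline member.
yes | head -n 1 > /dev/null
statuses=("${PIPESTATUS[@]}")   # copy immediately; the next command overwrites it

# A writer killed by SIGPIPE typically reports 128 + 13 = 141.
# If SIGPIPE is ignored in the environment, it fails with a write error instead.
echo "yes exited with ${statuses[0]}, head exited with ${statuses[1]}"
```

In nested subshells the same check has to be made at each level, which is part of why the failing command is hard to pin down.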

@CyberShadow
Owner

So, how can we act on this? Do you have a way to reproduce the problem?

@jakecoble
Author

I suspect some subtle misconfiguration of my system at play here, so I'll close this and open a new issue if I find something actually wrong with aconfmgr.

@gardar

gardar commented Feb 1, 2022

@jakecoble did you ever manage to trace the source of this issue? I'm facing a similar issue myself.

I've got a fresh install of aconfmgr but a not-so-fresh install of Arch 😄

:: ~ » aconfmgr --verbose save
: Collecting data...
:: Compiling user configuration...
::: Using configuration in /home/g/.config/aconfmgr
::: Done (configuration not found).
:: Inspecting system state...
::: Querying package list...
:::: Done.
::: Enumerating owned files...
:::: Done.
::: Searching for stray files...
:::: Fatal error! Stack trace:
::::: /usr/lib/aconfmgr/common.bash:346 [AconfCompileSystem]
::::: /usr/lib/aconfmgr/common.bash:900 [AconfCompile]
::::: /usr/lib/aconfmgr/save.bash:9 [AconfSave]
::::: /usr/lib/aconfmgr/main.bash:185 [Main]
::::: /usr/lib/aconfmgr/main.bash:205 [source]
::::: /usr/bin/aconfmgr:26 [main]
:::: Fatal error! Stack trace:
::::: /usr/lib/aconfmgr/common.bash:348 [AconfCompileSystem]
::::: /usr/lib/aconfmgr/common.bash:900 [AconfCompile]
::::: /usr/lib/aconfmgr/save.bash:9 [AconfSave]
::::: /usr/lib/aconfmgr/main.bash:185 [Main]
::::: /usr/lib/aconfmgr/main.bash:205 [source]
::::: /usr/bin/aconfmgr:26 [main]

( Let me know if you want me to create a new issue for this @CyberShadow )

@CyberShadow reopened this Feb 1, 2022
@CyberShadow
Owner

It looks like the exact same issue.

Could you please run with -x, and either post the output or try to find the failing command?

@CyberShadow added the bug label Feb 1, 2022
@gardar

gardar commented Feb 1, 2022

Sure thing, here's the full output with -x:

https://gist.github.com/gardar/892d26ec4de0f7bace104deadff031d8

@CyberShadow
Owner

Thanks. Unfortunately I can't tell what is failing from the log.

Could you please try this patch (without -x), and post the output: https://github.com/CyberShadow/aconfmgr/compare/debug-find

@jakecoble
Author

jakecoble commented Feb 1, 2022 via email

@CyberShadow
Owner

> IIRC it turned out that I had a file path on my system with some odd special characters in it. The script was choking on that.

Any hints for how to recreate this problem? (I've been keeping a file with all the special characters I could think of on my real system for testing...)

@gardar

gardar commented Feb 1, 2022

> Thanks. Unfortunately I can't tell what is failing from the log.
>
> Could you please try this patch (without -x), and post the output: https://github.com/CyberShadow/aconfmgr/compare/debug-find

Here's the output from that branch:

: Collecting data...
:: Compiling user configuration...
::: Using configuration in /home/g/.config/aconfmgr
::: Done (configuration not found).
:: Inspecting system state...
::: Querying package list...
:::: Done.
::: Enumerating owned files...
:::: Done.
::: Searching for stray files...
:::: tee failed!
:::: Fatal error! Stack trace:
::::: /usr/lib/aconfmgr/common.bash:1673 [FatalError]
::::: /usr/lib/aconfmgr/common.bash:339 [AconfCompileSystem]
::::: /usr/lib/aconfmgr/common.bash:903 [AconfCompile]
::::: /usr/lib/aconfmgr/save.bash:9 [AconfSave]
::::: /usr/lib/aconfmgr/main.bash:185 [Main]
::::: /usr/lib/aconfmgr/main.bash:205 [source]
::::: /usr/bin/aconfmgr:26 [main]
:::: find failed!
:::: Fatal error! Stack trace:
::::: /usr/lib/aconfmgr/common.bash:1673 [FatalError]
::::: /usr/lib/aconfmgr/common.bash:336 [AconfCompileSystem]
::::: /usr/lib/aconfmgr/common.bash:903 [AconfCompile]
::::: /usr/lib/aconfmgr/save.bash:9 [AconfSave]
::::: /usr/lib/aconfmgr/main.bash:185 [Main]
::::: /usr/lib/aconfmgr/main.bash:205 [source]
::::: /usr/bin/aconfmgr:26 [main]
:::: Fatal error! Stack trace:
::::: /usr/lib/aconfmgr/common.bash:349 [AconfCompileSystem]
::::: /usr/lib/aconfmgr/common.bash:903 [AconfCompile]
::::: /usr/lib/aconfmgr/save.bash:9 [AconfSave]
::::: /usr/lib/aconfmgr/main.bash:185 [Main]
::::: /usr/lib/aconfmgr/main.bash:205 [source]
::::: /usr/bin/aconfmgr:26 [main]
:::: Fatal error! Stack trace:
::::: /usr/lib/aconfmgr/common.bash:351 [AconfCompileSystem]
::::: /usr/lib/aconfmgr/common.bash:903 [AconfCompile]
::::: /usr/lib/aconfmgr/save.bash:9 [AconfSave]
::::: /usr/lib/aconfmgr/main.bash:185 [Main]
::::: /usr/lib/aconfmgr/main.bash:205 [source]
::::: /usr/bin/aconfmgr:26 [main]

> Forgot about this issue! IIRC it turned out that I had a file path on my system with some odd special characters in it. The script was choking on that.

Interesting, since /home is ignored I would have thought such cases would be unlikely to occur.

The only uncommon/odd thing about my paths/filesystem that I can think of is that I'm using zfs for my root, and I have some snapshots and a few zfs volumes mounted, for containers and such. Could aconfmgr be choking on that?

@CyberShadow
Owner

It's certainly possible, if GNU find is unequipped to deal with such special filesystem entries.

We can test that theory - run:

find / -regextype posix-extended -not '(' '(' -regex '/dev|/home|/media|/mnt|/proc|/root|/run|/sys|/tmp|/var/cache' -o -false ')' -printf I -print0 -prune ')' -printf O -print0 > /dev/null ; echo $?

It should print 0 on success.

You could also try ignoring the root of these snapshots in the aconfmgr configuration, and see if that makes any difference.
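To make the structure of that find expression easier to follow, here is a small sketch that runs the same shape of expression against a throwaway directory (the `keep` and `skip` names are made up for illustration). Paths matching the ignore regex are tagged `I` and pruned; everything else is tagged `O`. It assumes GNU find.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/keep/sub" "$tmp/skip/sub"

# Same expression shape as above, with a single ignored path.
# Records are NUL-terminated; tr turns them into lines for display only.
tagged=$(
    find "$tmp" -regextype posix-extended \
        -not '(' '(' -regex "$tmp/skip" -o -false ')' -printf I -print0 -prune ')' \
        -printf O -print0 |
    tr '\0' '\n' | sort
)
printf '%s\n' "$tagged"
rm -rf "$tmp"
```

The ignored directory itself appears once with an `I` prefix, and nothing beneath it is listed, because `-prune` stops find from descending into it.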

@gardar

gardar commented Feb 1, 2022

I'm pretty certain GNU find can handle it just fine...
I tried the find command and it ran successfully (printed 0).

I tried adding all mounts except / to IgnorePath with the same result.

I might go ahead and try to add just about everything to IgnorePath and then work my way up to find the problematic path (if there is one).
Unless you have some ideas that might get us that result quicker?

@CyberShadow
Owner

> I'm pretty certain GNU find can handle it just fine... I tried the find command and it ran successfully (printed 0)
>
> I tried adding all mounts except / to IgnorePath with the same result.

That is interesting, and I'm stumped again.

> I might go ahead and try to add just about everything to IgnorePath and then work my way up to find the problematic path (if there is one).

That might be the simplest way forward.

@gardar

gardar commented Feb 2, 2022

After tracing down a lot of directories to add to IgnorePath I figured out what the issue is!

grep is eating all my RAM and getting OOM-killed.
I watched my RAM usage go from 1.8 GB to the full 16 GB in just a few seconds when running aconfmgr.
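When reproducing something like this, it can help to cap the memory available to the suspect command so that it fails on its own instead of taking the machine down. A minimal sketch follows; the `run_capped` helper name and the 4 GiB value are illustrative assumptions. `ulimit -v` limits virtual address space (in KiB), which is not the same as resident memory, so the cap needs some headroom.

```shell
# Hypothetical helper: run a command with a soft cap on virtual memory.
# The subshell keeps the limit from affecting the calling shell.
run_capped() {
    limit_kb=$1
    shift
    ( ulimit -S -v "$limit_kb" && "$@" )
}

# Example: the child process sees the cap (value is in KiB).
run_capped 4194304 sh -c 'ulimit -v'
```

With a cap in place, a runaway grep should stop with an allocation failure rather than triggering the kernel OOM killer.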

@CyberShadow
Owner

Thanks, that's interesting!

How big is /tmp/aconfmgr-$UID/owned-files? (or ./tmp/owned-files if running from a checkout)

@gardar

gardar commented Feb 3, 2022

63M - 892248 lines

@CyberShadow
Owner

So far, unable to reproduce - grep uses a constant 631MB of RSS with 1M filter lines, no matter how much data I pipe into it.

I found this report about a memory leak in grep with usage similar to ours (reading patterns from a file):

https://www.mail-archive.com/[email protected]/msg07422.html

However, that leak seems to have been in grep 3.4, and Arch is at 3.7.

@gardar

gardar commented Feb 3, 2022

Strange! Could there be something else at play? If this were an issue with grep, I would expect others to be affected by it too.

Here's the oom dmesg in case it gives you any hints:
https://gist.github.com/gardar/c2e7cd37382289ba3621373c9acd03e3

@CyberShadow
Owner

One possible way we can try to make progress is to reproduce it in isolation. Here's my attempt to extract the invocation in question:

pacman --query --list --quiet | sed 's#\/$##' | sort --unique > owned-files

sudo find / -regextype posix-extended -not '(' '(' -regex '/dev|/home|/media|/mnt|/proc|/root|/run|/sys|/tmp|/var/cache' -o -false ')' -printf I -print0 -prune ')' -printf O -print0 | \
grep --null --null-data --invert-match --fixed-strings --line-regexp --file <( < owned-files sed -e 's#^#O#')

Does this gobble memory too?
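For readers unfamiliar with this grep invocation, here is a tiny self-contained sketch of the same filtering idea on made-up paths. Patterns are read one per line from a file, input records are NUL-terminated, and only records that match no pattern exactly survive. A temporary pattern file stands in for the process substitution used above.

```shell
tmp=$(mktemp -d)

# Made-up "owned" paths, one per line, prefixed with O to match find's tagging.
printf '%s\n' /usr/bin/foo /usr/lib/bar | sed -e 's#^#O#' > "$tmp/patterns"

# Made-up find output: NUL-terminated records, each tagged with O.
printf 'O%s\0' /usr/bin/foo /usr/lib/bar /etc/stray.conf > "$tmp/found"

# Keep records that are not exact, literal matches of any pattern line.
stray=$(
    grep --null-data --invert-match --fixed-strings --line-regexp \
        --file "$tmp/patterns" < "$tmp/found" | tr '\0' '\n'
)
printf '%s\n' "$stray"
rm -rf "$tmp"
```

Only the path that is not in the owned list is printed, which is how stray files are identified.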

@gardar

gardar commented Feb 3, 2022

Yep, eats the memory too.

I tried it on two other machines I have that are similarly set up, and both of them seem to be unaffected by this issue.

@CyberShadow
Owner

> I tried it on two other machines I have that are similarly set up, and both of them seem to be unaffected by this issue.

What if you copy the input files over to the other machines?

I.e., save owned-files and the output of find to a file, and then run grep with that input.

@gardar

gardar commented Feb 3, 2022

Found the culprit, I think.
I tried using ripgrep instead of grep, but it failed with the following error:

/dev/fd/63:214314: found invalid UTF-8 in pattern at byte offset 22: O/usr/lib/aspell-0.60/\xEDslenska.alias (disable Unicode mode and use hex escape sequences to match arbitrary bytes in a pattern, e.g., '(?-u)\xFF')

Looked at the file and found this line:
/usr/lib/aspell-0.60/íslenska.alias

After I removed it from the file, grep runs just fine and the RAM doesn't spike (usage only goes up by about 1 GB).
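A quick way to look for other entries like this is to check each line of the file list for byte sequences that are not valid UTF-8. The sketch below does that with `iconv`; the `find_invalid_utf8` helper name and the sample file contents are made up for illustration, and the per-line loop is slow on a list with hundreds of thousands of entries. It only locates candidate lines and says nothing about why grep misbehaves on them.

```shell
# Hypothetical helper: print the line numbers that are not valid UTF-8.
find_invalid_utf8() {
    n=0
    while IFS= read -r line; do
        n=$((n + 1))
        # iconv exits non-zero when the input is not valid in the source encoding.
        printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1 ||
            printf '%d\n' "$n"
    done < "$1"
}

# Sample list: line 2 contains a lone 0xED byte (Latin-1 "í"), as in the error above.
sample=$(mktemp)
printf '/usr/bin/ok\n/usr/lib/aspell-0.60/\355slenska.alias\n/etc/fine.conf\n' > "$sample"

bad=$(find_invalid_utf8 "$sample")
echo "invalid UTF-8 on line(s): $bad"
rm -f "$sample"
```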

@CyberShadow
Owner

CyberShadow commented Feb 3, 2022

Neither tool should be doing UTF-8 decoding for case-sensitive fixed strings.

I'm glad you got the problem sorted :) But, I still can't reproduce this.

By any chance have you kept a copy of the files that exhibit the grep problem?

@gardar

gardar commented Feb 3, 2022

What about if you install https://aur.archlinux.org/packages/aspell-is/ ?

I can regenerate the file; I've done so a few times already.
Do you want me to get that file to you? It's too big to paste here.

@CyberShadow
Owner

> Do you want me to get that file to you? It's too big to paste here.

Yes, please!

@gardar

gardar commented Feb 3, 2022

I managed to push the file to my previous gist, see if you can download it from there (scroll down to the end of the page)
https://gist.github.com/gardar/c2e7cd37382289ba3621373c9acd03e3

@gardar

gardar commented Feb 3, 2022

Did you manage to replicate the issue with my filelist?

I removed the aspell-is package and the issue seems to be gone, but ideally this should be fixed, or at least detected, with an error that indicates which file/package needs to be removed.

@CyberShadow
Owner

CyberShadow commented Feb 3, 2022

> Did you manage to replicate the issue with my filelist?

I did, thank you. Crashed my whole computer and everything. :D

> I removed the aspell-is package and the issue seems to be gone, but ideally this should be fixed, or at least detected, with an error that indicates which file/package needs to be removed.

Yep, ideally it should be fixed in GNU grep. I'll see if I can narrow it down to a minimal test case.

@gardar

gardar commented Feb 3, 2022

Hah, OK, so it's definitely an issue with grep and not just an issue with how grep is used in this case?
I'll be damned, it's not every day you find a bug in core GNU utils like grep! :)
