segfault in drivers::run_tests::drive at line 297 #156

Closed
asomers opened this issue Feb 18, 2016 · 14 comments

@asomers
Member

asomers commented Feb 18, 2016

Kyua 0.12 on FreeBSD HEAD or 10.1 segfaults in drivers/run_tests.cpp at line 297. It happens in a different test program each time, but it usually segfaults after about an hour.

Here's an example of the console output from a crash:

[192.168.10.2] out: local/kyua/integration/cmd_list_test:one_arg__subdir  ->  passed  [0.160s]
[192.168.10.2] out: local/kyua/integration/cmd_list_test:one_arg__test_case  ->  passed  [0.104s]
[192.168.10.2] out: local/kyua/integration/cmd_list_test:one_arg__test_program  ->  passed  [0.084s]
[192.168.10.2] out: local/kyua/integration/cmd_list_test:only_load_used_test_programs  ->  *** Fatal signal 11 received
[192.168.10.2] out: *** Log file is /root/.kyua/logs/kyua.20160210-175558.log
[192.168.10.2] out: *** Please report this problem to [email protected] detailing what you were doing before the crash happened; if possible, include the log file mentioned above
[192.168.10.2] out: Segmentation fault (core dumped)
[192.168.10.2] out: 

(original source) https://jenkins.freebsd.org/job/FreeBSD_HEAD/157/console

And here's a stack trace:

#0  0x0000000000412d55 in std::__1::__tree_min<std::__1::__tree_node_base<void*>*> (__x=0x1) at /usr/include/c++/v1/__tree:134
#1  std::__1::__tree_next<std::__1::__tree_node_base<void*>*> (
    __x=0x7fffffffb7f0) at /usr/include/c++/v1/__tree:158
#2  0x0000000000472272 in std::__1::__tree_iterator<std::__1::__value_type<int, long>, std::__1::__tree_node<std::__1::__value_type<int, long>, void*>*, long>::operator++ (this=0x7fffffffb478) at /usr/include/c++/v1/__tree:656
#3  std::__1::__tree<std::__1::__value_type<int, long>, std::__1::__map_value_compare<int, std::__1::__value_type<int, long>, std::__1::less<int>, true>, std::__1::allocator<std::__1::__value_type<int, long> > >::erase (
    this=0x7fffffffb7e8, __p=...) at /usr/include/c++/v1/__tree:1978
#4  0x0000000000470619 in std::__1::__map_const_iterator<std::__1::__tree_const_iterator<std::__1::__value_type<int, long>, std::__1::__tree_node<std::__1::__value_type<int, long>, void*>*, long> >::__map_const_iterator (
    this=0x7fffffffb7e8, this=0x7fffffffb7e8, __i=[43985152] 81, 
    __p=[43985152] 81) at /usr/include/c++/v1/map:1067
#5  drivers::run_tests::drive (kyuafile_path=..., build_root=..., 
    store_path=..., filters=empty std::__1::set, user_config=..., hooks=...)
    at drivers/run_tests.cpp:297
#6  0x0000000000457605 in cli::cmd_test::run (this=0x802822880, ui=
    0x7fffffffe4d8, cmdline=..., user_config=...) at cli/cmd_test.cpp:158
#7  0x0000000000413354 in utils::cmdline::base_command<utils::config::tree>::main (this=0x802822880, ui=0x7fffffffe4d8, 
    args=std::__1::vector (length=1, capacity=1) = {...}, data=...)
    at ./utils/cmdline/base_command.ipp:96
#8  0x000000000040d57a in (anonymous namespace)::run_subcommand (
    ui=0x7fffffffe4d8, command=0x802822880, 
    args=std::__1::vector (length=1, capacity=1) = {...}, user_config=...)
    at cli/main.cpp:139
#9  0x000000000040c30b in (anonymous namespace)::safe_main (ui=0x7fffffffe4d8, 
    argc=3, argv=0x7fffffffeb18, mock_command=...) at cli/main.cpp:228
#10 0x0000000000408ed2 in cli::main (ui=0x7fffffffe4d8, argc=3, 
    argv=0x7fffffffeb18, mock_command=...) at cli/main.cpp:280
#11 0x000000000040cd74 in cli::main (argc=3, argv=0x7fffffffeb18)
    at cli/main.cpp:353
#12 0x0000000000408742 in main (argc=3, argv=0x7fffffffeb18) at main.cpp:49

I can provide a core file, if needed. And I can also test patches or help diagnose the problem, if you can suggest anything.

@jmmv
Member

jmmv commented Feb 19, 2016

Is this with parallel execution enabled?

Has this just started happening? Note that 0.12 has been out for a while already, so if this is a new issue, I suspect it was triggered by an environment change.

@asomers
Member Author

asomers commented Feb 19, 2016

At $WORK, it started happening as soon as we upgraded to Kyua 0.12. On the FreeBSD Jenkins server, which I linked to, it happened as far back as Feb 2, and there's no earlier history. Here at $WORK, we didn't explicitly set parallelism in kyua.conf. It defaults to serial execution, right?

@jmmv
Member

jmmv commented Feb 19, 2016

Correct; the default is still serial.

OK, so if it's with the upgrade then my theory on an environment change is unfounded.

I assume this happens when running the full FreeBSD test suite, right?

You also mention "under an hour" but in my experience FreeBSD test runs are faster than that. Is this within a VM maybe?

@asomers
Member Author

asomers commented Feb 19, 2016

More details about the upgrade: we're running a custom version of FreeBSD that was forked from stable/10 at version 277174. The crash happened when we tried a general ports upgrade, which included upgrading Kyua from 0.11 to 0.12, but also included upgrading many other ports. Given kyua's small list of dependencies and the location of the stack trace, I'm skeptical that any of those other ports could be responsible. But for completeness's sake, here's the full list of kyua's dependent ports and their versions:
kyua 0.11 -> 0.12_1
sqlite3 3.8.10.1 -> 3.10.2_1
atf-0.21 -> 0.21 (no change)
lutok-0.4_6 -> 0.4_6 (no change)
pkgconf-0.9.7 -> 0.9.12_1

The crash happens during a full test of the FreeBSD test suite, plus our proprietary tests. In particular, the test run includes everything from https://svnweb.freebsd.org/base/projects/zfsd/head/tests/sys/cddl/zfs/tests/ . All together, these test runs last about 5 hours. The crash can happen during any test. Usually it happens during one of the ZFS tests, simply because they take the most time. But it's also happened during the pw(8) tests. We're running on bare metal on a 2 socket Haswell system.

@brd
Member

brd commented Feb 24, 2016

Since this is happening in the FreeBSD Jenkins I'd like to revert the upgrade to 0.12 from the FreeBSD port until this gets fixed.

@jmmv
Member

jmmv commented Feb 24, 2016

Reverting the port sounds reasonable. I may be able to look at it later today, but if you have an earlier chance, feel free to proceed.

@jmmv
Member

jmmv commented Feb 27, 2016

This is very strange. I've been looking at the code and I cannot tell why erase() on that line would fail; what it's doing is not complicated and I cannot see why the iterator would be invalid. Also, I cannot reproduce this either, so it'll be hard to track down.
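For reference, the trace above shows the crash inside erase() on a std::map<int, long>. As a minimal sketch only (this is not Kyua's code; pid_table and erase_if_present are hypothetical names), erasing through an iterator is only defined when the iterator still points at a live element of that same map, so a defensive version looks the key up first:

```cpp
#include <map>

// Hypothetical stand-in for a table keyed by child PID; the value type
// mirrors the std::map<int, long> seen in the stack trace.
using pid_table = std::map<int, long>;

// Erases the entry for `pid` only if it is present and returns true when an
// element was removed.  Passing end(), or an iterator invalidated by an
// earlier erase of the same element, to map::erase is undefined behavior.
bool erase_if_present(pid_table& table, const int pid) {
    const pid_table::iterator it = table.find(pid);
    if (it == table.end())
        return false;
    table.erase(it);
    return true;
}
```

This does not explain the crash by itself; it only illustrates which preconditions erase() relies on.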

It could be that the pid_t to int conversion is messing things up, though I don't think so because pid_t is defined as int32_t on amd64.

I'm also looking at doing a full test run under Valgrind to see if that sheds some light (of any form).

@jmmv
Member

jmmv commented Feb 27, 2016

Valgrind reported nothing interesting. Also tried an AddressSanitizer-enabled build and, while this reports a bunch of issues in test programs (most of which come from atf-c++), the main kyua binary didn't report any.

@brd
Member

brd commented Mar 30, 2016

@jmmv Thank you for researching it, but I would like to go ahead and roll the port back. Is that OK with you?

https://people.freebsd.org/~brd/patches/kyua-rollback.diff

@jmmv
Member

jmmv commented Mar 30, 2016

Yes. Sorry I didn't get to doing this earlier...

uqs pushed a commit to freebsd/freebsd-ports that referenced this issue Mar 30, 2016
See: freebsd/kyua#156

Approved by:	jmmv (maintainer), bdrewery (mentor)


git-svn-id: svn+ssh://svn.freebsd.org/ports/head@412176 35697150-7ecd-e111-bb59-0022644237b5
uqs pushed a commit to freebsd/freebsd-ports that referenced this issue Mar 30, 2016
See: freebsd/kyua#156

Approved by:	jmmv (maintainer), bdrewery (mentor)
@brd
Member

brd commented Mar 30, 2016

No worries, thank you!

tota pushed a commit to tota/freebsd-ports that referenced this issue Apr 11, 2016
See: freebsd/kyua#156

Approved by:	jmmv (maintainer), bdrewery (mentor)


git-svn-id: svn+ssh://svn.freebsd.org/ports/head@412176 35697150-7ecd-e111-bb59-0022644237b5
@jmmv
Member

jmmv commented Jul 8, 2016

Excellent. I have finally been able to reproduce this locally: I created a fake test suite composed of symlinks to the Kyua tests, repeated 100 times, and then ran kyua test on it. The crash comes after about 10 minutes in this case, with the same stack trace as above.
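A rough sketch of how such a fake suite could be generated follows. The comment does not say how the symlinks were laid out, so everything here is an assumption: make_fake_suite, the copy_N names, and the choice of directory-level symlinks are hypothetical, and the tests directory is assumed to already contain its own Kyuafile. It needs C++17 for std::filesystem.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Populates `suite_root` with `copies` symlinks (copy_0, copy_1, ...) that
// all point at `tests_dir`, and writes a top-level Kyuafile that includes
// the Kyuafile reachable through each symlink.  Throws
// fs::filesystem_error if a symlink with the same name already exists.
void make_fake_suite(const fs::path& tests_dir, const fs::path& suite_root,
                     const std::size_t copies) {
    fs::create_directories(suite_root);
    std::ofstream kyuafile(suite_root / "Kyuafile");
    kyuafile << "syntax(2)\n\n";
    for (std::size_t i = 0; i < copies; ++i) {
        const std::string name = "copy_" + std::to_string(i);
        fs::create_directory_symlink(fs::absolute(tests_dir),
                                     suite_root / name);
        kyuafile << "include(\"" << name << "/Kyuafile\")\n";
    }
}
```

Calling make_fake_suite with the installed Kyua tests directory, a scratch directory, and 100, then running kyua test from the scratch directory, would approximate the setup described above.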

@jmmv jmmv closed this as completed in 1ac9eda Jul 11, 2016
@brd
Member

brd commented Jul 11, 2016

Thanks @jmmv, I will try and get this commit imported to test it on our local repo.

Do you have any thoughts on when you will do a release with this in it?

@jmmv
Member

jmmv commented Jul 11, 2016

If you can test it, great!

Regarding a new release: soon, likely. I would like to look at a couple more bug reports (like the junit one) before doing one but won't add any new features into it.

@jmmv jmmv added the bug label Jul 14, 2016