
cli_tests memory access violation at address: no mapping at fault address #1303

Closed · 2 of 9 tasks
abitmore opened this issue Sep 3, 2018 · 21 comments

Comments

abitmore (Member) commented Sep 3, 2018

Bug Description
Sometimes (it is not always reproducible) when running tests/cli_tests, I get this message:

unknown location(0): fatal error in "cli_quit": memory access violation at address: 0x00000058: no mapping at fault address

Note: IIRC, I've encountered it in test cases other than cli_quit as well.

Expected Behavior
No error.

Impacts

  • Other (please add below)
    • testing

Host Environment
Please provide details about the host environment. Much of this information can be found running: witness_node --version.

  • Host OS: Ubuntu 16.04
  • Host Physical RAM 8G
  • BitShares Version: develop
  • OpenSSL Version: 1.0.2
  • Boost Version: 1.58

CORE TEAM TASK LIST

  • Evaluate / Prioritize Bug Report
  • Refine User Stories / Requirements
  • Define Test Cases
  • Design / Develop Solution
  • Perform QA/Testing
  • Update Documentation
cogutvalera (Member) commented:

Are there any other related logs, e.g. a stack trace? More error logs would help a lot.

The relevant code is here:

https://github.com/bitshares/bitshares-core/blob/develop/tests/cli/main.cpp#L275-L279

BOOST_FIXTURE_TEST_CASE( cli_quit, cli_fixture )
{
   BOOST_TEST_MESSAGE("Testing wallet connection and quit command.");
   BOOST_CHECK_THROW( con.wallet_api_ptr->quit(), fc::canceled_exception );
}

https://github.com/bitshares/bitshares-core/blob/develop/tests/cli/main.cpp#L212-L257

struct cli_fixture
{
   int server_port_number;
   fc::temp_directory app_dir;
   std::shared_ptr<graphene::app::application> app1;
   client_connection con;
   std::vector<std::string> nathan_keys;

   cli_fixture() : 
      server_port_number(0),
      app_dir( graphene::utilities::temp_directory_path() ),
      app1( start_application(app_dir, server_port_number) ), 
      con( app1, app_dir, server_port_number ),
      nathan_keys( {"5KQwrPbwdL6PhXujxW37FSSQZ1JiwsST4cqQzDeyXtP79zkvFD3"} )
   {
      BOOST_TEST_MESSAGE("Setup cli_wallet::boost_fixture_test_case");

      using namespace graphene::chain;
      using namespace graphene::app;

      try 
      {
         BOOST_TEST_MESSAGE("Setting wallet password");
         con.wallet_api_ptr->set_password("supersecret");
         con.wallet_api_ptr->unlock("supersecret");

         // import Nathan account
         BOOST_TEST_MESSAGE("Importing nathan key");
         BOOST_CHECK_EQUAL(nathan_keys[0], "5KQwrPbwdL6PhXujxW37FSSQZ1JiwsST4cqQzDeyXtP79zkvFD3");
         BOOST_CHECK(con.wallet_api_ptr->import_key("nathan", nathan_keys[0]));
      } catch( fc::exception& e ) {
         edump((e.to_detail_string()));
         throw;
      }
   }

   ~cli_fixture()
   {
      BOOST_TEST_MESSAGE("Cleanup cli_wallet::boost_fixture_test_case");

      // wait for everything to finish up
      fc::usleep(fc::seconds(1));
        
      app1->shutdown();
   }
};

My thought is that something may be wrong in the cli_fixture code, or that there is some odd behavior around BOOST_FIXTURE_TEST_CASE and how the fixture's memory is managed.

I believe the memory access violation involves the following members (more error logs would help us understand the issue better); see the sketch after the list:

   int server_port_number;
   fc::temp_directory app_dir;
   std::shared_ptr<graphene::app::application> app1;
   client_connection con;
   std::vector<std::string> nathan_keys;
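
As a rough illustration of the concern (this is not project code; the names only mirror the fixture members), C++ constructs non-static members in declaration order and destroys them in reverse order, after the destructor body has run. Anything still executing asynchronously at that point can observe partially destroyed state:

// Minimal standalone example, assuming nothing beyond the C++ standard library.
#include <iostream>

struct Tracer {
   const char* name;
   explicit Tracer( const char* n ) : name(n) { std::cout << "construct " << name << "\n"; }
   ~Tracer() { std::cout << "destroy " << name << "\n"; }
};

struct fixture_like {
   Tracer app_dir{ "app_dir" };   // stands in for fc::temp_directory
   Tracer app1   { "app1" };      // stands in for the application
   Tracer con    { "con" };       // stands in for client_connection
   ~fixture_like() { std::cout << "destructor body runs first\n"; }
};

int main() {
   fixture_like f;
   // On destruction this prints: "destructor body runs first", then
   // "destroy con", "destroy app1", "destroy app_dir".
}

In cli_fixture that means con is torn down before app1 and app_dir; if any background task outlives the destructor body despite the one-second sleep, it may touch freed memory.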

Thanks !

cogutvalera (Member) commented Sep 3, 2018

@ryanRfox please assign this issue to me and I will try to find a solution. My estimate is approximately 10 hours (this issue is not always reproducible, so it is hard to catch); if I run into trouble and need more time, I will let you know.

Thanks !

ryanRfox added the 1b User Story, 2a Discussion Needed, and 3d Bug labels on Sep 3, 2018
ryanRfox (Contributor) commented Sep 3, 2018

I assigned @cogutvalera and noted the 10-hour estimate. May I ask @abitmore to comment on the estimate?

abitmore (Member, Author) commented Sep 3, 2018

@ryanRfox IMHO it's better to focus on more important things first; this is a minor issue so far.

cogutvalera (Member) commented Sep 4, 2018

@abitmore and @ryanRfox thank you very much! I will work on more important things first and come back to this issue later; by then we may have collected more error logs, which will help solve it.

@abitmore and @ryanRfox, can you help me determine which issues are most important / highest priority? Where can I see priorities ordered from highest to lowest? Only in the project backlog?

Thanks !

ryanRfox added the 4a Low Priority label on Sep 4, 2018
ryanRfox (Contributor) commented Sep 4, 2018

@cogutvalera in general I hope you will review the Project Backlog to identify Unassigned - Bugs, then Unassigned - Features & Enhancements (rather than New - Awaiting Core Team Evaluation). As a team, we are working to prioritize the Issues within each bucket (column heading), and we also use the 4 Priority Labels to help. If you are unable to find an Unassigned Issue, look at the New Issues and reach out to me; I can work with the Core Team to review, prioritize, and estimate Issues that Community Members have an interest in working on.

cogutvalera (Member) commented:

@ryanRfox thank you very much for such detailed information!

abitmore (Member, Author) commented:

https://travis-ci.org/bitshares/bitshares-core/builds/428368997

1519615ms th_a       object_database.cpp:106       open                 ] Opening object database from /tmp/graphene-tmp/1029-0ca3-62aa-f1f1/blockchain ...
1519615ms th_a       object_database.cpp:111       open                 ] Done opening object database.
1519615ms th_a       application.cpp:324           operator()           ] Initializing database...
Running 8 test cases...
unknown location(0): fatal error in "cli_vote_for_2_witnesses": memory access violation at address: 0x000000f7: no mapping at fault address
/home/travis/build/bitshares/bitshares-core/tests/cli/main.cpp(326): last checkpoint
*** 1 failure detected in test suite "Test Application"

abitmore added this to the Future Feature Release milestone on Sep 14, 2018
cogutvalera (Member) commented:

@abitmore Thanks for the logs!

abitmore (Member, Author) commented:

2880718ms th_a       object_database.cpp:106       open                 ] Opening object database from /tmp/graphene-tmp/8fcb-00e6-d9de-cf2a/blockchain ...
2880718ms th_a       object_database.cpp:111       open                 ] Done opening object database.
2880718ms th_a       application.cpp:324           operator()           ] Initializing database...
unknown location(0): fatal error in "create_new_account": memory access violation at address: 0x0000002f: no mapping at fault address
/app/bts-dev4/tests/cli/main.cpp(340): last checkpoint

*** 1 failure detected in test suite "Test Application"

The program then hung.

abitmore (Member, Author) commented:

3133187ms th_a       object_database.cpp:106       open                 ] Opening object database from /tmp/graphene-tmp/591b-1052-9a7d-6aa0/blockchain ...
3133187ms th_a       object_database.cpp:111       open                 ] Done opening object database.
3133187ms th_a       application.cpp:324           operator()           ] Initializing database...
unknown location(0): fatal error in "upgrade_nathan_account": signal: SIGSEGV, si_code: 0 (memory access violation at address: 0x00000080)
/app/bts-dev4/tests/cli/main.cpp(312): last checkpoint

*** 1 failure detected in test suite "Test Application"

abitmore (Member, Author) commented:

Ran under gdb; it hung.

3230357ms th_a       database_api.cpp:311          ~database_api_impl   ] freeing database api 36621920
3230357ms th_a       database_api.cpp:311          ~database_api_impl   ] freeing database api 36665632
3230357ms th_a       database_api.cpp:311          ~database_api_impl   ] freeing database api 36656336
3230358ms p2p        thread.cpp:252                exec                 ] thread canceled: 9 canceled_exception: Canceled

    {}
    p2p  thread_d.hpp:471 start_next_fiber
[Thread 0x7ffff686a700 (LWP 2475) exited]

*** No errors detected
[Thread 0x7fffe7fff700 (LWP 2459) exited]
[Thread 0x7fffe6ffd700 (LWP 2461) exited]
[Thread 0x7fffe67fc700 (LWP 2462) exited]
[Thread 0x7fffe77fe700 (LWP 2460) exited]
[Thread 0x7ffff4866700 (LWP 2458) exited]
[Thread 0x7ffff5067700 (LWP 2457) exited]
[Thread 0x7ffff5868700 (LWP 2456) exited]



^C
Thread 1 "cli_test" received signal SIGINT, Interrupt.
pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185     ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000001796963 in boost::thread::join_noexcept() ()
#2  0x00000000015ff8b0 in fc::asio::default_io_service_scope::~default_io_service_scope() ()
#3  0x00007ffff6b7cff8 in __run_exit_handlers (status=0, listp=0x7ffff6f075f8 <__exit_funcs>,
    run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#4  0x00007ffff6b7d045 in __GI_exit (status=<optimized out>) at exit.c:104
#5  0x00007ffff6b63837 in __libc_start_main (main=0xb33f90 <main>, argc=1, argv=0x7fffffffe518, init=<optimized out>,
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe508) at ../csu/libc-start.c:325
#6  0x0000000000b36779 in _start ()

Thread 11 (Thread 0x7fffe5ffb700 (LWP 2463)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000016037d1 in boost::asio::detail::task_io_service::run(boost::system::error_code&) ()
#2  0x00000000016068d9 in boost_asio_detail_posix_thread_function ()
#3  0x00007ffff7bc16ba in start_thread (arg=0x7fffe5ffb700) at pthread_create.c:333
#4  0x00007ffff6c4a41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7ffff6069700 (LWP 2455)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000015f9e1e in fc::thread_d::process_tasks() ()
#2  0x00000000015fa024 in fc::thread_d::start_process_tasks(long) ()
#3  0x00000000017c9961 in make_fcontext ()
#4  0x0000000000000000 in ?? ()

abitmore (Member, Author) commented:

Ran under gdb again; it crashed.

3441356ms th_a       object_database.cpp:106       open                 ] Opening object database from /tmp/graphene-tmp/4eb4-90c1-45d5-e843/blockchain ...
3441356ms th_a       object_database.cpp:111       open                 ] Done opening object database.
3441356ms th_a       application.cpp:324           operator()           ] Initializing database...
[New Thread 0x7ffff686a700 (LWP 2519)]

Thread 1 "cli_test" received signal SIGSEGV, Segmentation fault.
0x00000000016c28ac in fc::http::detail::websocket_server_impl::websocket_server_impl()::{lambda(std::weak_ptr<void>)#6}::operator()(std::weak_ptr<void>) const::{lambda()#1}::operator()() const ()
(gdb) bt
#0  0x00000000016c28ac in fc::http::detail::websocket_server_impl::websocket_server_impl()::{lambda(std::weak_ptr<void>)#6}::operator()(std::weak_ptr<void>) const::{lambda()#1}::operator()() const ()
#1  0x00000000016c2b59 in fc::detail::void_functor_run<fc::http::detail::websocket_server_impl::websocket_server_impl()::{lambda(std::weak_ptr<void>)#6}::operator()(std::weak_ptr<void>) const::{lambda()#1}>::run(void*, fc::http::detail::websocket_server_impl::websocket_server_impl()::{lambda(std::weak_ptr<void>)#6}::operator()(std::weak_ptr<void>) const::{lambda()#1}) ()
#2  0x00000000015fbbf2 in fc::task_base::run_impl() ()
#3  0x00000000015f991d in fc::thread_d::process_tasks() ()
#4  0x00000000015fa024 in fc::thread_d::start_process_tasks(long) ()
#5  0x00000000017c9961 in make_fcontext ()
#6  0x0000000000000000 in ?? ()

Thread 18 (Thread 0x7ffff686a700 (LWP 2519)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000015f9e1e in fc::thread_d::process_tasks() ()
#2  0x00000000015f25bb in fc::thread::exec() ()
#3  0x00000000015f2ac1 in fc::thread::thread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda()#1}::operator()() const [clone .constprop.508] ()
#4  0x0000000001794e05 in thread_proxy ()
#5  0x00007ffff7bc16ba in start_thread (arg=0x7ffff686a700) at pthread_create.c:333
#6  0x00007ffff6c4a41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 11 (Thread 0x7fffe5ffb700 (LWP 2512)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000016037d1 in boost::asio::detail::task_io_service::run(boost::system::error_code&) ()
#2  0x00000000016068d9 in boost_asio_detail_posix_thread_function ()
#3  0x00007ffff7bc16ba in start_thread (arg=0x7fffe5ffb700) at pthread_create.c:333
#4  0x00007ffff6c4a41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 10 (Thread 0x7fffe67fc700 (LWP 2511)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000015fe97b in fc::asio::default_io_service_scope::default_io_service_scope()::{lambda()#1}::operator()() const [clone .constprop.586] ()
#2  0x0000000001794e05 in thread_proxy ()
#3  0x00007ffff7bc16ba in start_thread (arg=0x7fffe67fc700) at pthread_create.c:333
#4  0x00007ffff6c4a41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 9 (Thread 0x7fffe6ffd700 (LWP 2510)):
#0  0x00007ffff6c4aa13 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
#1  0x00000000015fe5c4 in fc::asio::default_io_service_scope::default_io_service_scope()::{lambda()#1}::operator()() const [clone .con
stprop.586] ()
#2  0x0000000001794e05 in thread_proxy ()
#3  0x00007ffff7bc16ba in start_thread (arg=0x7fffe6ffd700) at pthread_create.c:333
---Type <return> to continue, or q <return> to quit---
#4  0x00007ffff6c4a41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 8 (Thread 0x7fffe77fe700 (LWP 2509)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000015fe97b in fc::asio::default_io_service_scope::default_io_service_scope()::{lambda()#1}::operator()() const [clone .con
stprop.586] ()
#2  0x0000000001794e05 in thread_proxy ()
#3  0x00007ffff7bc16ba in start_thread (arg=0x7fffe77fe700) at pthread_create.c:333
#4  0x00007ffff6c4a41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 7 (Thread 0x7fffe7fff700 (LWP 2508)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000015fe97b in fc::asio::default_io_service_scope::default_io_service_scope()::{lambda()#1}::operator()() const [clone .con
stprop.586] ()
#2  0x0000000001794e05 in thread_proxy ()
#3  0x00007ffff7bc16ba in start_thread (arg=0x7fffe7fff700) at pthread_create.c:333
#4  0x00007ffff6c4a41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 6 (Thread 0x7ffff4866700 (LWP 2507)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000015f9e1e in fc::thread_d::process_tasks() ()
#2  0x00000000015fa024 in fc::thread_d::start_process_tasks(long) ()
#3  0x00000000017c9961 in make_fcontext ()
#4  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7ffff5067700 (LWP 2506)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000015fe97b in fc::asio::default_io_service_scope::default_io_service_scope()::{lambda()#1}::operator()() const [clone .con
stprop.586] ()
#2  0x0000000001794e05 in thread_proxy ()
#3  0x00007ffff7bc16ba in start_thread (arg=0x7ffff5067700) at pthread_create.c:333
#4  0x00007ffff6c4a41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
---Type <return> to continue, or q <return> to quit---

Thread 4 (Thread 0x7ffff5868700 (LWP 2505)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000015fe97b in fc::asio::default_io_service_scope::default_io_service_scope()::{lambda()#1}::operator()() const [clone .con
stprop.586] ()
#2  0x0000000001794e05 in thread_proxy ()
#3  0x00007ffff7bc16ba in start_thread (arg=0x7ffff5868700) at pthread_create.c:333
#4  0x00007ffff6c4a41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7ffff6069700 (LWP 2504)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000015fe97b in fc::asio::default_io_service_scope::default_io_service_scope()::{lambda()#1}::operator()() const [clone .con
stprop.586] ()
#2  0x0000000001794e05 in thread_proxy ()
#3  0x00007ffff7bc16ba in start_thread (arg=0x7ffff6069700) at pthread_create.c:333
#4  0x00007ffff6c4a41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7ffff7fe7740 (LWP 2501)):
#0  0x00000000016c28ac in fc::http::detail::websocket_server_impl::websocket_server_impl()::{lambda(std::weak_ptr<void>)#6}::operator(
)(std::weak_ptr<void>) const::{lambda()#1}::operator()() const ()
#1  0x00000000016c2b59 in fc::detail::void_functor_run<fc::http::detail::websocket_server_impl::websocket_server_impl()::{lambda(std::
weak_ptr<void>)#6}::operator()(std::weak_ptr<void>) const::{lambda()#1}>::run(void*, fc::http::detail::websocket_server_impl::websocke
t_server_impl()::{lambda(std::weak_ptr<void>)#6}::operator()(std::weak_ptr<void>) const::{lambda()#1}) ()
#2  0x00000000015fbbf2 in fc::task_base::run_impl() ()
#3  0x00000000015f991d in fc::thread_d::process_tasks() ()
#4  0x00000000015fa024 in fc::thread_d::start_process_tasks(long) ()
#5  0x00000000017c9961 in make_fcontext ()
#6  0x0000000000000000 in ?? ()

cogutvalera (Member) commented:

These logs may help a lot! Thanks!

ryanRfox (Contributor) commented Feb 1, 2019

@abitmore or @pmconrad : May I request your confirmation that this Issue can be closed? I believe it was fixed in #1314 but I'm not sure.

abitmore (Member, Author) commented Feb 1, 2019

@ryanRfox this issue hasn't been fixed.

abitmore (Member, Author) commented:

Thread 1 "cli_test" received signal SIGSEGV, Segmentation fault.
0x000000000493e242 in std::less<std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>*>::operator()(std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>* const&, std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>* const&) const ()

(gdb) bt
#0  0x000000000493e242 in std::less<std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>*>::operator()(std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>* const&, std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>* const&) const ()
#1  0x000000000492ee54 in std::__weak_count<(__gnu_cxx::_Lock_policy)2>::_M_less(std::__weak_count<(__gnu_cxx::_Lock_policy)2> const&) const ()
#2  0x0000000004920fe5 in bool std::__weak_ptr<void, (__gnu_cxx::_Lock_policy)2>::owner_before<void>(std::__weak_ptr<void, (__gnu_cxx::_Lock_policy)2> const&) const
    ()
#3  0x0000000004912969 in std::_Sp_owner_less<std::weak_ptr<void>, std::shared_ptr<void> >::operator()(std::weak_ptr<void> const&, std::weak_ptr<void> const&) const
    ()
#4  0x00000000049218a0 in std::_Rb_tree<std::weak_ptr<void>, std::pair<std::weak_ptr<void> const, std::shared_ptr<fc::http::websocket_connection> >, std::_Select1st<std::pair<std::weak_ptr<void> const, std::shared_ptr<fc::http::websocket_connection> > >, std::owner_less<std::weak_ptr<void> >, std::allocator<std::pair<std::weak_ptr<void> const, std::shared_ptr<fc::http::websocket_connection> > > >::_M_lower_bound(std::_Rb_tree_node<std::pair<std::weak_ptr<void> const, std::shared_ptr<fc::http::websocket_connection> > >*, std::_Rb_tree_node<std::pair<std::weak_ptr<void> const, std::shared_ptr<fc::http::websocket_connection> > >*, std::weak_ptr<void> const&) ()
#5  0x0000000004913052 in std::_Rb_tree<std::weak_ptr<void>, std::pair<std::weak_ptr<void> const, std::shared_ptr<fc::http::websocket_connection> >, std::_Select1st<std::pair<std::weak_ptr<void> const, std::shared_ptr<fc::http::websocket_connection> > >, std::owner_less<std::weak_ptr<void> >, std::allocator<std::pair<std::weak_ptr<void> const, std::shared_ptr<fc::http::websocket_connection> > > >::find(std::weak_ptr<void> const&) ()
#6  0x0000000004908a17 in std::map<std::weak_ptr<void>, std::shared_ptr<fc::http::websocket_connection>, std::owner_less<std::weak_ptr<void> >, std::allocator<std::pair<std::weak_ptr<void> const, std::shared_ptr<fc::http::websocket_connection> > > >::find(std::weak_ptr<void> const&) ()
#7  0x00000000048fc890 in fc::http::detail::websocket_server_impl::websocket_server_impl()::{lambda(std::weak_ptr<void>)#6}::operator()(std::weak_ptr<void>) const::{lambda()#1}::operator()() const ()
#8  0x000000000492306e in fc::detail::void_functor_run<fc::http::detail::websocket_server_impl::websocket_server_impl()::{lambda(std::weak_ptr<void>)#6}::operator()(std::weak_ptr<void>) const::{lambda()#1}>::run(void*, fc::http::detail::websocket_server_impl::websocket_server_impl()::{lambda(std::weak_ptr<void>)#6}::operator()(std::weak_ptr<void>) const::{lambda()#1}) ()
#9  0x0000000004823a04 in fc::task_base::run_impl() ()
#10 0x0000000004823984 in fc::task_base::run() ()
#11 0x0000000004816908 in fc::thread_d::run_next_task() ()
#12 0x0000000004816e8d in fc::thread_d::process_tasks() ()
#13 0x00000000048163b7 in fc::thread_d::start_process_tasks(long) ()
#14 0x0000000004b388b1 in make_fcontext ()
#15 0x0000000000000000 in ?? ()

abitmore (Member, Author) commented:

It seems there is a threading issue here: https://github.com/bitshares/bitshares-fc/blob/b29da15d519d04651b07831f746f864fbac5190d/src/network/http/websocket.cpp#L265

                          if( _connections.find(hdl) != _connections.end() )
                          {
                             _connections[hdl]->closed();
                             _connections.erase( hdl );
                          }

There is similar code elsewhere in the same function.

I guess the situation is similar to #1256.
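
A rough sketch of the suspected pattern (not the actual fc code; the mutex below is a hypothetical guard that the suspected code lacks): if a close/fail handler erases from a shared connection map while another thread, or a fiber resumed on another thread, is reading or inserting, the map's internal tree can be corrupted, and a later find() may then fault at a small offset like the addresses seen above.

// Illustrative only; types and names are simplified stand-ins.
#include <map>
#include <memory>
#include <mutex>

struct connection { void closed() {} };
using conn_ptr = std::shared_ptr<connection>;
using handle   = std::weak_ptr<void>;

std::map<handle, conn_ptr, std::owner_less<handle>> connections;
std::mutex connections_mutex;   // hypothetical; the suspected code has no such guard

void on_close( const handle& hdl )
{
   // Without synchronization, this find/erase can interleave with inserts from
   // the accept path; guarding every access with one lock avoids the race.
   std::lock_guard<std::mutex> guard( connections_mutex );
   auto itr = connections.find( hdl );
   if( itr != connections.end() )
   {
      itr->second->closed();
      connections.erase( itr );
   }
}

(Since fc handlers run on fc threads/fibers, the real fix may instead be to ensure all access to the map happens on the same thread rather than adding a lock.)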

abitmore (Member, Author) commented Feb 27, 2019

A normal run:

...
226957ms th_a       application.cpp:1079          startup_plugins      ] Plugin account_history started
226958ms th_a       application.cpp:1079          startup_plugins      ] Plugin grouped_orders started
226958ms th_a       application.cpp:1079          startup_plugins      ] Plugin market_history started
226958ms th_a       witness.cpp:117               plugin_startup       ] witness plugin:  plugin_startup() begin
226958ms th_a       witness.cpp:138               plugin_startup       ] No witness configured.
226958ms th_a       witness.cpp:140               plugin_startup       ] witness plugin:  plugin_startup() end
226958ms th_a       application.cpp:1079          startup_plugins      ] Plugin witness started
226958ms th_a       application.cpp:82            create_example_genes ] Allocating all stake to 5KQwrPbwdL6PhXujxW37FSSQZ1JiwsST4cqQzDeyXtP79zkvFD3
226961ms th_a       witness.cpp:84                plugin_initialize    ] witness plugin:  plugin_initialize() begin
226961ms th_a       witness.cpp:112               plugin_initialize    ] witness plugin:  plugin_initialize() end
226961ms th_a       application.cpp:1079          startup_plugins      ] Plugin account_history started
226961ms th_a       application.cpp:1079          startup_plugins      ] Plugin grouped_orders started
226961ms th_a       application.cpp:1079          startup_plugins      ] Plugin market_history started
226961ms th_a       witness.cpp:117               plugin_startup       ] witness plugin:  plugin_startup() begin
226961ms th_a       witness.cpp:138               plugin_startup       ] No witness configured.
226961ms th_a       witness.cpp:140               plugin_startup       ] witness plugin:  plugin_startup() end
226961ms th_a       application.cpp:1079          startup_plugins      ] Plugin witness started
226961ms th_a       db_management.cpp:186         open                 ] Wiping object_database due to missing or wrong version
226961ms th_a       object_database.cpp:100       wipe                 ] Wiping object database...
226962ms th_a       object_database.cpp:102       wipe                 ] Done wiping object databse.
226962ms th_a       object_database.cpp:115       open                 ] Opening object database from /tmp/graphene-tmp/df6f-0c9f-2e42-731b/blockchain ...
[New Thread 0x7ffff686a700 (LWP 822)]
[New Thread 0x7ffff6069700 (LWP 823)]
[New Thread 0x7ffff5868700 (LWP 824)]
[New Thread 0x7ffff5067700 (LWP 825)]
[New Thread 0x7ffff4866700 (LWP 826)]
[New Thread 0x7fffe7fff700 (LWP 827)]
[New Thread 0x7fffe77fe700 (LWP 828)]
[New Thread 0x7fffe6ffd700 (LWP 829)]
[New Thread 0x7fffe67fc700 (LWP 830)]
[New Thread 0x7fffe5dfb700 (LWP 831)]
[New Thread 0x7fffe55fa700 (LWP 832)]
[New Thread 0x7fffe4df9700 (LWP 833)]
[New Thread 0x7fffc3fff700 (LWP 834)]
[New Thread 0x7fffc37fe700 (LWP 835)]
[New Thread 0x7fffc2ffd700 (LWP 836)]
[New Thread 0x7fffc27fc700 (LWP 837)]
226970ms th_a       object_database.cpp:124       open                 ] Done opening object database.
226970ms th_a       application.cpp:309           operator()           ] Initializing database...
[New Thread 0x7fffc0ff9700 (LWP 838)]
227051ms th_a       application.cpp:194           reset_p2p_node       ] Configured p2p node to listen on 0.0.0.0:43121
227052ms th_a       application.cpp:272           reset_websocket_serv ] Configured websocket rpc to listen on 127.0.0.1:47351
[New Thread 0x7fffacdfe700 (LWP 839)]
227570ms th_a       database_api.cpp:355          database_api_impl    ] creating database api 103606480
227573ms th_a       database_api.cpp:355          database_api_impl    ] creating database api 103639760
227574ms th_a       database_api.cpp:355          database_api_impl    ] creating database api 103623392
227574ms th_a       database_api.cpp:355          database_api_impl    ] creating database api 103646512
227576ms th_a       database_api.cpp:355          database_api_impl    ] creating database api 103652096
227576ms th_a       database_api.cpp:374          ~database_api_impl   ] freeing database api 103639760
227577ms th_a       database_api.cpp:355          database_api_impl    ] creating database api 103640992
227577ms th_a       database_api.cpp:374          ~database_api_impl   ] freeing database api 103623392
227577ms th_a       database_api.cpp:355          database_api_impl    ] creating database api 103623392
227577ms th_a       database_api.cpp:374          ~database_api_impl   ] freeing database api 103646512
Setup cli_wallet::boost_fixture_test_case
...

A crash:

...
228702ms th_a       application.cpp:1079          startup_plugins      ] Plugin account_history started
228702ms th_a       application.cpp:1079          startup_plugins      ] Plugin grouped_orders started
228702ms th_a       application.cpp:1079          startup_plugins      ] Plugin market_history started
228702ms th_a       witness.cpp:117               plugin_startup       ] witness plugin:  plugin_startup() begin
228702ms th_a       witness.cpp:138               plugin_startup       ] No witness configured.
228702ms th_a       witness.cpp:140               plugin_startup       ] witness plugin:  plugin_startup() end
228702ms th_a       application.cpp:1079          startup_plugins      ] Plugin witness started
228702ms th_a       application.cpp:82            create_example_genes ] Allocating all stake to 5KQwrPbwdL6PhXujxW37FSSQZ1JiwsST4cqQzDeyXtP79zkvFD3
228704ms th_a       witness.cpp:84                plugin_initialize    ] witness plugin:  plugin_initialize() begin
228704ms th_a       witness.cpp:112               plugin_initialize    ] witness plugin:  plugin_initialize() end
228704ms th_a       application.cpp:1079          startup_plugins      ] Plugin account_history started
228704ms th_a       application.cpp:1079          startup_plugins      ] Plugin grouped_orders started
228704ms th_a       application.cpp:1079          startup_plugins      ] Plugin market_history started
228704ms th_a       witness.cpp:117               plugin_startup       ] witness plugin:  plugin_startup() begin
228704ms th_a       witness.cpp:138               plugin_startup       ] No witness configured.
228704ms th_a       witness.cpp:140               plugin_startup       ] witness plugin:  plugin_startup() end
228705ms th_a       application.cpp:1079          startup_plugins      ] Plugin witness started
228705ms th_a       db_management.cpp:186         open                 ] Wiping object_database due to missing or wrong version
228705ms th_a       object_database.cpp:100       wipe                 ] Wiping object database...
228705ms th_a       object_database.cpp:102       wipe                 ] Done wiping object databse.
228705ms th_a       object_database.cpp:115       open                 ] Opening object database from /tmp/graphene-tmp/1ade-92de-1761-6aca/blockchain ...

Thread 1 "cli_test" received signal SIGSEGV, Segmentation fault.
0x000000000493e242 in std::less<std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>*>::operator()(std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>* const&, std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>* const&) const ()

It seems the program crashed when creating the thread pool.

Another crash:

1194681ms th_a       application.cpp:1079          startup_plugins      ] Plugin account_history started
1194681ms th_a       application.cpp:1079          startup_plugins      ] Plugin grouped_orders started
1194681ms th_a       application.cpp:1079          startup_plugins      ] Plugin market_history started
1194681ms th_a       witness.cpp:117               plugin_startup       ] witness plugin:  plugin_startup() begin
1194681ms th_a       witness.cpp:138               plugin_startup       ] No witness configured.
1194681ms th_a       witness.cpp:140               plugin_startup       ] witness plugin:  plugin_startup() end
1194681ms th_a       application.cpp:1079          startup_plugins      ] Plugin witness started
1194682ms th_a       application.cpp:82            create_example_genes ] Allocating all stake to 5KQwrPbwdL6PhXujxW37FSSQZ1JiwsST4cqQzDeyXtP79zkvFD3
1194684ms th_a       witness.cpp:84                plugin_initialize    ] witness plugin:  plugin_initialize() begin
1194684ms th_a       witness.cpp:112               plugin_initialize    ] witness plugin:  plugin_initialize() end
1194684ms th_a       application.cpp:1079          startup_plugins      ] Plugin account_history started
1194684ms th_a       application.cpp:1079          startup_plugins      ] Plugin grouped_orders started
1194684ms th_a       application.cpp:1079          startup_plugins      ] Plugin market_history started
1194684ms th_a       witness.cpp:117               plugin_startup       ] witness plugin:  plugin_startup() begin
1194684ms th_a       witness.cpp:138               plugin_startup       ] No witness configured.
1194684ms th_a       witness.cpp:140               plugin_startup       ] witness plugin:  plugin_startup() end
1194684ms th_a       application.cpp:1079          startup_plugins      ] Plugin witness started
starting app
1194685ms th_a       db_management.cpp:186         open                 ] Wiping object_database due to missing or wrong version
1194685ms th_a       object_database.cpp:100       wipe                 ] Wiping object database...
1194685ms th_a       object_database.cpp:102       wipe                 ] Done wiping object databse.
1194685ms th_a       object_database.cpp:115       open                 ] Opening object database from /tmp/graphene-tmp/5b58-bdf4-4dd6-ba88/blockchain ...
1194686ms th_a       object_database.cpp:124       open                 ] Done opening object database.
1194686ms th_a       application.cpp:309           operator()           ] Initializing database...
[New Thread 0x7fffc0ff9700 (LWP 1070)]

Thread 1 "cli_test" received signal SIGSEGV, Segmentation fault.
0x000000000493e2b2 in std::less<std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>*>::operator()(std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>* const&, std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>* const&) const ()

Another crash:

2868698ms th_a       object_database.cpp:102       wipe                 ] Done wiping object databse.
2868699ms th_a       object_database.cpp:115       open                 ] Opening object database from /tmp/graphene-tmp/15f5-739f-8021-3a77/blockchain ...

Thread 1 "cli_test" received signal SIGSEGV, Segmentation fault.
0x0000000004824f3f in fc::spin_yield_lock::lock() ()
(gdb) bt
#0  0x0000000004824f3f in fc::spin_yield_lock::lock() ()
#1  0x0000000004823698 in fc::unique_lock<fc::spin_yield_lock&>::lock() ()
#2  0x0000000004823589 in fc::unique_lock<fc::spin_yield_lock&>::unique_lock(fc::spin_yield_lock&) ()
#3  0x0000000004823305 in fc::promise_base::_set_value(void const*) ()
#4  0x0000000003a92a97 in fc::promise<void>::set_value() ()
#5  0x00000000048fcc6b in fc::http::detail::websocket_server_impl::websocket_server_impl()::{lambda(std::weak_ptr<void>)#6}::operator()(std::weak_ptr<void>) const::{lambda()#1}::operator()() const ()
#6  0x000000000492316a in fc::detail::void_functor_run<fc::http::detail::websocket_server_impl::websocket_server_impl()::{lambda(std::weak_ptr<void>)#6}::operator()(std::weak_ptr<void>) const::{lambda()#1}>::run(void*, fc::http::detail::websocket_server_impl::websocket_server_impl()::{lambda(std::weak_ptr<void>)#6}::operator()(std::weak_ptr<void>) const::{lambda()#1}) ()
#7  0x0000000004823b00 in fc::task_base::run_impl() ()
#8  0x0000000004823a80 in fc::task_base::run() ()
#9  0x0000000004816a04 in fc::thread_d::run_next_task() ()
#10 0x0000000004816f89 in fc::thread_d::process_tasks() ()
#11 0x00000000048164b3 in fc::thread_d::start_process_tasks(long) ()
#12 0x0000000004b389b1 in make_fcontext ()
#13 0x0000000000000000 in ?? ()

Another crash:

Thread 1 "cli_test" received signal SIGSEGV, Segmentation fault.
0x0000000003a92b7d in fc::promise<void>::set_value() ()
(gdb) bt
#0  0x0000000003a92b7d in fc::promise<void>::set_value() ()
#1  0x00000000048fcf30 in fc::http::detail::websocket_server_impl::websocket_server_impl()::{lambda(std::weak_ptr<void>)#6}::operator()(std::weak_ptr<void>) const::{lambda()#1}::operator()() const ()
#2  0x0000000004923536 in fc::detail::void_functor_run<fc::http::detail::websocket_server_impl::websocket_server_impl()::{lambda(std::weak_ptr<void>)#6}::operator()(std::weak_ptr<void>) const::{lambda()#1}>::run(void*, fc::http::detail::websocket_server_impl::websocket_server_impl()::{lambda(std::weak_ptr<void>)#6}::operator()(std::weak_ptr<void>) const::{lambda()#1}) ()
#3  0x0000000004823c00 in fc::task_base::run_impl() ()
#4  0x0000000004823b80 in fc::task_base::run() ()
#5  0x0000000004816b04 in fc::thread_d::run_next_task() ()
#6  0x0000000004817089 in fc::thread_d::process_tasks() ()
#7  0x00000000048165b3 in fc::thread_d::start_process_tasks(long) ()
#8  0x0000000004b38cc1 in make_fcontext ()
#9  0x0000000000000000 in ?? ()

Another crash:

terminate called after throwing an instance of 'websocketpp::exception'
  what():  invalid state

Thread 1 "cli_test" received signal SIGABRT, Aborted.
0x00007ffff6b78428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff6b78428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff6b7a02a in __GI_abort () at abort.c:89
#2  0x00007ffff74bb84d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff74b96b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff74b86a9 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff74b9005 in __gxx_personality_v0 () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff6f1cf83 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7  0x00007ffff6f1d2eb in _Unwind_RaiseException () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#8  0x00007ffff74b990c in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x000000000490a596 in websocketpp::endpoint<websocketpp::connection<fc::http::detail::asio_with_stub_log>, fc::http::detail::asio_with_stub_log>::close(std::weak_p
tr<void>, unsigned short, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#10 0x00000000048fd62c in fc::http::detail::websocket_server_impl::~websocket_server_impl() ()

abitmore self-assigned this and unassigned cogutvalera on Feb 27, 2019
abitmore (Member, Author) commented:

For whatever reason, the lambda passed to _server.set_fail_handler( https://github.com/bitshares/bitshares-fc/blob/b29da15d519d04651b07831f746f864fbac5190d/src/network/http/websocket.cpp#L378-L390 ) executes after every test case, and the async code block inside it only runs when the thread becomes idle. Unlike witness_node, when running cli_test the threads are not destroyed and recreated after every test case. If the next test case starts immediately, the thread is not idle, so the async code only runs when the next test case yields (e.g. sleeps or waits for I/O); by that time, however, some of the data in memory may already have been modified or freed, which can lead to a crash.

The hang is something else, but perhaps related.
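
A minimal sketch of that failure mode (the names are illustrative, not fc APIs): a callback queued for a long-lived worker only runs when the worker next goes idle, and if whatever it captured has been destroyed by then, the late execution reads freed memory.

// Self-contained illustration; the final call is undefined behavior by construction.
#include <functional>
#include <iostream>
#include <queue>

std::queue<std::function<void()>> idle_tasks;   // stands in for the deferred/async work

struct server_state {
   int open_connections = 1;
   void on_fail() { std::cout << open_connections << "\n"; }
};

int main()
{
   {
      server_state srv;                              // lives for one "test case"
      idle_tasks.push( [&srv]{ srv.on_fail(); } );   // queued by the fail handler, not yet run
   }                                                 // srv destroyed when the test case ends
   while( !idle_tasks.empty() )                      // the worker only drains this when idle,
   {                                                 // i.e. during the next test case
      idle_tasks.front()();                          // use-after-free: srv is already gone
      idle_tasks.pop();
   }
}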

abitmore (Member, Author) commented:

I believe this has been fixed by #1626.

By the way, cli_tests still sometimes hangs; we can open another issue for that.
