Data parser is not thread safe. #4112

Closed
trivialfis opened this issue Feb 7, 2019 · 6 comments
Comments

@trivialfis
Member

This is actually a bug in dmlc-core, but it's critical and I found it when debugging XGBoost.

While looking into #4107, after some slight modifications, I got a segfault from `gpu_predictor.MGPU_PicklingTest`. The ASAN output is attached below. The text parser may need some locking. @hcho3

=================================================================
==9348==AddressSanitizer: while reporting a bug found another one. Ignoring.
==9348==AddressSanitizer: while reporting a bug found another one. Ignoring.
==9348==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000005018 at pc 0x55c47e0b76ca bp 0x7f9c9f7f6d00 sp 0x7f9c9f7f6cf0
==9348==AddressSanitizer: while reporting a bug found another one. Ignoring.
==9348==AddressSanitizer: while reporting a bug found another one. Ignoring.
==9348==AddressSanitizer: while reporting a bug found another one. Ignoring.
WRITE of size 8 at 0x602000005018 thread T16
==9348==AddressSanitizer: while reporting a bug found another one. Ignoring.
==9348==AddressSanitizer: while reporting a bug found another one. Ignoring.
==9348==AddressSanitizer: while reporting a bug found another one. Ignoring.
==9348==AddressSanitizer: while reporting a bug found another one. Ignoring.
==9348==AddressSanitizer: while reporting a bug found another one. Ignoring.
==9348==AddressSanitizer: while reporting a bug found another one. Ignoring.
==9348==AddressSanitizer: while reporting a bug found another one. Ignoring.
==9348==AddressSanitizer: while reporting a bug found another one. Ignoring.
==9348==AddressSanitizer: while reporting a bug found another one. Ignoring.
#0 0x55c47e0b76c9 in void __gnu_cxx::new_allocator::construct<unsigned long, unsigned long>(unsigned long*, unsigned long&&) /usr/include/c++/7/ext/new_allocator.h:136
#1 0x55c47e0b1139 in void std::allocator_traits<std::allocator >::construct<unsigned long, unsigned long>(std::allocator&, unsigned long*, unsigned long&&) /usr/include/c++/7/bits/alloc_traits.h:475
#2 0x55c47e0aa0bd in void std::vector<unsigned long, std::allocator >::emplace_back(unsigned long&&) /usr/include/c++/7/bits/vector.tcc:100
#3 0x55c47e0a5567 in std::vector<unsigned long, std::allocator >::push_back(unsigned long&&) (/home/fis/Workspace/xgb/xgboost/testxgboost+0x2a4567)
#4 0x55c47e623ec9 in dmlc::data::RowBlockContainer<unsigned int, float>::Clear() /home/fis/Workspace/xgb/xgboost/dmlc-core/src/data/./row_block.h:65
#5 0x55c47e6542b5 in dmlc::data::LibSVMParser<unsigned int, float>::ParseBlock(char const*, char const*, dmlc::data::RowBlockContainer<unsigned int, float>) /home/fis/Workspace/xgb/xgboost/dmlc-core/src/data/./libsvm_parser.h:72
#6 0x55c47e658e9b in dmlc::data::TextParserBase<unsigned int, float>::FillData(std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >)::{lambda()#1}::operator()() const /home/fis/Workspace/xgb/xgboost/dmlc-core/src/data/././text_parser.h:135
#7 0x55c47e65d874 in void dmlc::OMPException::Run<dmlc::data::TextParserBase<unsigned int, float>::FillData(std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >)::{lambda()#1}>(dmlc::data::TextParserBase<unsigned int, float>::FillData(std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >)::{lambda()#1}) /home/fis/Workspace/xgb/xgboost/dmlc-core/include/dmlc/././././common.h:66
#8 0x55c47e60af95 in dmlc::data::TextParserBase<unsigned int, float>::FillData(std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >*) [clone ._omp_fn.5] /home/fis/Workspace/xgb/xgboost/dmlc-core/src/data/././text_parser.h:122
#9 0x7f9cc33ee97d (/usr/lib/x86_64-linux-gnu/libgomp.so.1+0x1697d)
#10 0x7f9cc2fa86da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
#11 0x7f9cc2cd188e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x12188e)

0x602000005018 is located 0 bytes to the right of 8-byte region [0x602000005010,0x602000005018)
freed by thread T14 here:
#0 0x7f9cc421b2d0 in operator delete(void*) (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xe12d0)
#1 0x55c47df64607 in __gnu_cxx::new_allocator::deallocate(unsigned long*, unsigned long) /usr/include/c++/7/ext/new_allocator.h:125
#2 0x55c47df6323e in std::allocator_traits<std::allocator >::deallocate(std::allocator&, unsigned long*, unsigned long) /usr/include/c++/7/bits/alloc_traits.h:462
#3 0x55c47df6187d in std::_Vector_base<unsigned long, std::allocator >::_M_deallocate(unsigned long*, unsigned long) /usr/include/c++/7/bits/stl_vector.h:180
#4 0x55c47e0b1515 in void std::vector<unsigned long, std::allocator >::_M_realloc_insert(__gnu_cxx::__normal_iterator<unsigned long*, std::vector<unsigned long, std::allocator > >, unsigned long&&) (/home/fis/Workspace/xgb/xgboost/testxgboost+0x2b0515)
#5 0x55c47e0aa128 in void std::vector<unsigned long, std::allocator >::emplace_back(unsigned long&&) /usr/include/c++/7/bits/vector.tcc:105
#6 0x55c47e0a5567 in std::vector<unsigned long, std::allocator >::push_back(unsigned long&&) (/home/fis/Workspace/xgb/xgboost/testxgboost+0x2a4567)
#7 0x55c47e654416 in dmlc::data::LibSVMParser<unsigned int, float>::ParseBlock(char const*, char const*, dmlc::data::RowBlockContainer<unsigned int, float>) /home/fis/Workspace/xgb/xgboost/dmlc-core/src/data/./libsvm_parser.h:96
#8 0x55c47e658e9b in dmlc::data::TextParserBase<unsigned int, float>::FillData(std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >)::{lambda()#1}::operator()() const /home/fis/Workspace/xgb/xgboost/dmlc-core/src/data/././text_parser.h:135
#9 0x55c47e65d874 in void dmlc::OMPException::Run<dmlc::data::TextParserBase<unsigned int, float>::FillData(std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >)::{lambda()#1}>(dmlc::data::TextParserBase<unsigned int, float>::FillData(std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >)::{lambda()#1}) /home/fis/Workspace/xgb/xgboost/dmlc-core/include/dmlc/././././common.h:66
#10 0x55c47e60af95 in dmlc::data::TextParserBase<unsigned int, float>::FillData(std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >*) [clone ._omp_fn.5] /home/fis/Workspace/xgb/xgboost/dmlc-core/src/data/././text_parser.h:122
#11 0x7f9cc33ee97d (/usr/lib/x86_64-linux-gnu/libgomp.so.1+0x1697d)

previously allocated by thread T3 here:
#0 0x7f9cc421a458 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xe0458)
#1 0x55c47df64e26 in __gnu_cxx::new_allocator::allocate(unsigned long, void const*) /usr/include/c++/7/ext/new_allocator.h:111
#2 0x55c47df64575 in std::allocator_traits<std::allocator >::allocate(std::allocator&, unsigned long) /usr/include/c++/7/bits/alloc_traits.h:436
#3 0x55c47df6311d in std::_Vector_base<unsigned long, std::allocator >::_M_allocate(unsigned long) /usr/include/c++/7/bits/stl_vector.h:172
#4 0x55c47e0b127d in void std::vector<unsigned long, std::allocator >::_M_realloc_insert(__gnu_cxx::__normal_iterator<unsigned long*, std::vector<unsigned long, std::allocator > >, unsigned long&&) (/home/fis/Workspace/xgb/xgboost/testxgboost+0x2b027d)
#5 0x55c47e0aa128 in void std::vector<unsigned long, std::allocator >::emplace_back(unsigned long&&) /usr/include/c++/7/bits/vector.tcc:105
#6 0x55c47e0a5567 in std::vector<unsigned long, std::allocator >::push_back(unsigned long&&) (/home/fis/Workspace/xgb/xgboost/testxgboost+0x2a4567)
#7 0x55c47e623ec9 in dmlc::data::RowBlockContainer<unsigned int, float>::Clear() /home/fis/Workspace/xgb/xgboost/dmlc-core/src/data/./row_block.h:65
#8 0x55c47e617b2a in dmlc::data::RowBlockContainer<unsigned int, float>::RowBlockContainer() /home/fis/Workspace/xgb/xgboost/dmlc-core/src/data/./row_block.h:48
#9 0x55c47e660ea9 in void std::_Construct<dmlc::data::RowBlockContainer<unsigned int, float>>(dmlc::data::RowBlockContainer<unsigned int, float>) /usr/include/c++/7/bits/stl_construct.h:75
#10 0x55c47e6607ce in dmlc::data::RowBlockContainer<unsigned int, float> std::__uninitialized_default_n_1::__uninit_default_n<dmlc::data::RowBlockContainer<unsigned int, float>, unsigned long>(dmlc::data::RowBlockContainer<unsigned int, float>, unsigned long) /usr/include/c++/7/bits/stl_uninitialized.h:527
#11 0x55c47e66008f in dmlc::data::RowBlockContainer<unsigned int, float>* std::__uninitialized_default_n<dmlc::data::RowBlockContainer<unsigned int, float>, unsigned long>(dmlc::data::RowBlockContainer<unsigned int, float>, unsigned long) /usr/include/c++/7/bits/stl_uninitialized.h:583
#12 0x55c47e65fa4d in dmlc::data::RowBlockContainer<unsigned int, float>* std::__uninitialized_default_n_a<dmlc::data::RowBlockContainer<unsigned int, float>, unsigned long, dmlc::data::RowBlockContainer<unsigned int, float> >(dmlc::data::RowBlockContainer<unsigned int, float>, unsigned long, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> >&) /usr/include/c++/7/bits/stl_uninitialized.h:645
#13 0x55c47e65ef9e in std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >::_M_default_append(unsigned long) /usr/include/c++/7/bits/vector.tcc:575
#14 0x55c47e65d7c2 in std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >::resize(unsigned long) /usr/include/c++/7/bits/stl_vector.h:692
#15 0x55c47e658f48 in dmlc::data::TextParserBase<unsigned int, float>::FillData(std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >) /home/fis/Workspace/xgb/xgboost/dmlc-core/src/data/././text_parser.h:116
#16 0x55c47e651a90 in dmlc::data::TextParserBase<unsigned int, float>::ParseNext(std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >) /home/fis/Workspace/xgb/xgboost/dmlc-core/src/data/././text_parser.h:46
#17 0x55c47e614df1 in dmlc::data::ThreadedParser<unsigned int, float>::ThreadedParser(dmlc::data::ParserImpl<unsigned int, float>)::{lambda(std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >)#1}::operator()(std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >) const /home/fis/Workspace/xgb/xgboost/dmlc-core/src/data/parser.h:80
#18 0x55c47e630ee7 in std::_Function_handler<bool (std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >**), dmlc::data::ThreadedParser<unsigned int, float>::ThreadedParser(dmlc::data::ParserImpl<unsigned int, float>)::{lambda(std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >)#1}>::_M_invoke(std::_Any_data const&, std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >&&) /usr/include/c++/7/bits/std_function.h:302
#19 0x55c47e63130c in std::function<bool (std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >)>::operator()(std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >) const /usr/include/c++/7/bits/std_function.h:706
#20 0x55c47e61de20 in dmlc::ThreadedIter<std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > > >::Init(std::function<bool (std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >)>, std::function<void ()>)::{lambda()#1}::operator()() const /home/fis/Workspace/xgb/xgboost/dmlc-core/include/dmlc/threadediter.h:357
#21 0x55c47e63b502 in void std::__invoke_impl<void, dmlc::ThreadedIter<std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > > >::Init(std::function<bool (std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >)>, std::function<void ()>)::{lambda()#1}>(std::__invoke_other, dmlc::ThreadedIter<std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > > >::Init(std::function<bool (std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >)>, std::function<void ()>)::{lambda()#1}&&) /usr/include/c++/7/bits/invoke.h:60
#22 0x55c47e6313ef in std::__invoke_result<dmlc::ThreadedIter<std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > > >::Init(std::function<bool (std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >)>, std::function<void ()>)::{lambda()#1}>::type std::__invoke<dmlc::ThreadedIter<std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > > >::Init(std::function<bool (std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >)>, std::function<void ()>)::{lambda()#1}>(std::__invoke_result&&, (dmlc::ThreadedIter<std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > > >::Init(std::function<bool (std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >)>, std::function<void ()>)::{lambda()#1}&&)...) /usr/include/c++/7/bits/invoke.h:95
#23 0x55c47e65cbfb in decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker<std::tuple<dmlc::ThreadedIter<std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > > >::Init(std::function<bool (std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >)>, std::function<void ()>)::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/include/c++/7/thread:234
#24 0x55c47e657229 in std::thread::_Invoker<std::tuple<dmlc::ThreadedIter<std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > > >::Init(std::function<bool (std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >)>, std::function<void ()>)::{lambda()#1}> >::operator()() /usr/include/c++/7/thread:243
#25 0x55c47e64f129 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<dmlc::ThreadedIter<std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > > >::Init(std::function<bool (std::vector<dmlc::data::RowBlockContainer<unsigned int, float>, std::allocator<dmlc::data::RowBlockContainer<unsigned int, float> > >**)>, std::function<void ()>)::{lambda()#1}> > >::_M_run() /usr/include/c++/7/thread:186
#26 0x7f9cc3a6257e (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xbd57e)
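
In the trace above, thread T16 writes into a buffer that thread T14 freed during a `push_back` reallocation, which is consistent with two workers appending to the same container. The following is only a minimal sketch of the pattern that avoids this, not the dmlc-core code: `Block`, `ParseRange`, and `ParseParallel` are hypothetical names, and `std::thread` is swapped in for OpenMP to keep the example self-contained. Each worker gets its own output block, and the vector of blocks is sized before any thread starts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-thread parse result (not the dmlc-core type).
struct Block {
  std::vector<std::size_t> offset{0};  // row boundaries, always starts with 0
  std::vector<int> value;              // parsed values
};

// Parse lines[begin, end) into `out`. Only the calling thread touches `out`.
void ParseRange(const std::vector<std::string>& lines, std::size_t begin,
                std::size_t end, Block* out) {
  for (std::size_t i = begin; i < end; ++i) {
    out->value.push_back(std::stoi(lines[i]));
    out->offset.push_back(out->value.size());
  }
}

// Split `lines` across `num_workers` threads, one private Block per worker.
std::vector<Block> ParseParallel(const std::vector<std::string>& lines,
                                 std::size_t num_workers) {
  assert(num_workers > 0);
  // Size the output once, before any thread starts, so the outer vector
  // never reallocates while workers hold pointers into it.
  std::vector<Block> blocks(num_workers);
  const std::size_t step = (lines.size() + num_workers - 1) / num_workers;
  std::vector<std::thread> workers;
  workers.reserve(num_workers);
  for (std::size_t tid = 0; tid < num_workers; ++tid) {
    const std::size_t begin = std::min(tid * step, lines.size());
    const std::size_t end = std::min(begin + step, lines.size());
    // Every worker receives a distinct index, so no two threads share a Block.
    workers.emplace_back(ParseRange, std::cref(lines), begin, end,
                         &blocks[tid]);
  }
  for (auto& w : workers) w.join();
  return blocks;
}
```

Calling `ParseParallel(lines, 2)` on five single-integer lines yields two blocks holding the first three and the last two values. The sketch only holds if the worker count used to size `blocks` is the same one used to hand out indices; if those two numbers can disagree, two threads can end up with the same block.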

@hcho3
Collaborator

hcho3 commented Feb 7, 2019

@trivialfis That's quite concerning. Would you be able to post a small reproducible example? I am actually dealing with a memory bug in MXNet that may be related.

@trivialfis
Member Author

@hcho3 I'm not sure it will be reproducible on your machine. I found it in a rather surprising way.

Fetch my CMake branch:
https://github.com/trivialfis/xgboost/tree/cmake

Add `auto thread = omp_get_thread_num();` after the check at:

https://github.com/trivialfis/xgboost/blob/017c97b8ce62935429c797b085bf46c63b2be617/src/common/device_helpers.cuh#L939

Then run `gpu_predictor.MGPU_PicklingTest` from gtest.

I'm running Ubuntu 18.04 with CUDA 9.2 and gcc 7.3.0.

@trivialfis
Member Author

And the CMake flags:
`cmake ../xgboost -DUSE_CUDA=ON -DUSE_OPENMP=ON -DGOOGLE_TEST=ON -DGPU_COMPUTE_VER=61 -DUSE_NCCL=ON -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS="-fuse-ld=gold" -DCMAKE_C_FLAGS="-fuse-ld=gold" -DUSE_SANITIZER=ON -DENABLED_SANITIZERS="address"`

@hcho3
Collaborator

hcho3 commented Feb 21, 2019

Related: dmlc/dmlc-core#505

@hcho3
Collaborator

hcho3 commented Mar 8, 2019

@trivialfis I think dmlc/dmlc-core#511 would fix it.

@trivialfis
Member Author

@hcho3 Closing for now.

lock bot locked this issue as resolved and limited conversation to collaborators on Jun 6, 2019.