Trained model crashes with a core dump in the C++ inference API #21472

Closed
lost-little-lamb opened this issue Dec 2, 2019 · 4 comments
Labels
status/close 已关闭 User 用于标记用户问题

@lost-little-lamb

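Model training used Paddle Fluid 1.5.1 (GPU build).
C++ inference library version (CPU inference): CONFIGS("baidu/lib/paddlepaddle@v1.5.2-avx-mkl-map_PD_BL@git_tag")
Online inference scenario:
The predictor is created with AnalysisConfig and inference runs on multiple threads. I tried creating each thread's predictor both in the main thread and in the worker thread; creation succeeds either way, but both setups eventually crash. Each call predicts at most 50 samples (batch_size=50).
Crash behavior:
With the same prediction set, the crash does not happen on every run, and the fewer threads there are, the lower the probability. I debugged several core dumps: the sample being predicted at the time of the crash differs each time (so a specific input is probably not the trigger), but the crash location is always the same. The core dump information is as follows: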
Using host libthread_db library "/opt/compiler/gcc-4.8.2/lib/libthread_db.so.1".
Core was generated by `./bin/sug-as . --flagfile=./conf/gflags.conf'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f584d5f53f7 in raise () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
(gdb) bt
#0 0x00007f584d5f53f7 in raise () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#1 0x00007f584d5f67d8 in abort () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#2 0x00007f584dee5c65 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x00007f584dee3e06 in __cxxabiv1::__terminate (handler=<optimized out>)
at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:38
#4 0x00007f584dee2ec9 in __cxa_call_terminate (ue_header=0x7f4b98035fc0) at ../../../../libstdc++-v3/libsupc++/eh_call.cc:54
#5 0x00007f584dee3a7a in __cxxabiv1::__gxx_personality_v0 (version=<optimized out>, actions=<optimized out>,
exception_class=<optimized out>, ue_header=<optimized out>, context=<optimized out>)
at ../../../../libstdc++-v3/libsupc++/eh_personality.cc:670
#6 0x00007f584d97c853 in _Unwind_RaiseException_Phase2 (exc=exc@entry=0x7f4b98035fc0, context=context@entry=0x7f4bd2bf9900)
at ../../../libgcc/unwind.inc:62
#7 0x00007f584d97cd87 in _Unwind_Resume (exc=0x7f4b98035fc0) at ../../../libgcc/unwind.inc:230
#8 0x00007f5858ecc735 in paddle::memory::detail::BuddyAllocator::Free(void*) ()
from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#9 0x00007f5858ec9175 in void paddle::memory::legacy::Free<paddle::platform::CPUPlace>(paddle::platform::CPUPlace const&, void*, unsigned long) () from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#10 0x00007f5858ec9f15 in paddle::memory::allocation::LegacyAllocator::FreeImpl(paddle::memory::allocation::Allocation*) ()
from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#11 0x00007f585804ebf9 in paddle::framework::Tensor::mutable_data(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, paddle::framework::proto::VarType_Type, unsigned long) ()
from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#12 0x00007f5858278ca4 in paddle::operators::FusionSeqConvEltAddReluKernel::Compute(paddle::framework::ExecutionContext const&) const () from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#13 0x00007f5858279dd3 in std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CPUPlace, false, 0ul, paddle::operators::FusionSeqConvEltAddReluKernel, paddle::operators::FusionSeqConvEltAddReluKernel >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&) ()
from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#14 0x00007f5858e721c7 in paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const () from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#15 0x00007f5858e72843 in paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const ()
from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#16 0x00007f5858e6d8d4 in paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) ()
from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#17 0x00007f585807f518 in paddle::framework::NaiveExecutor::Run() () from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#18 0x00007f5857f10718 in paddle::AnalysisPredictor::Run(std::vector<paddle::PaddleTensor, std::allocator<paddle::PaddleTensor> > const&, std::vector<paddle::PaddleTensor, std::allocator<paddle::PaddleTensor> >*, int) ()
from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#19 0x0000000000512cd7 in map_sug_as::SemanticsManager::batch_predict (this=0x16d31b0 <map_sug_as::g_data+108816>,
data_buf=0x25708e9e8, query=..., poi_name=..., probs=...)
at baidu/mapsearch/sug-as/code/rd/src/Framework/semantics_manager.cpp:267
#20 0x00000000005ee02c in map_sug_as::RankerModel::calc_nn_feature (this=this@entry=0x7f4b9ae77070, cut_num=cut_num@entry=50,
nn_features=...) at baidu/mapsearch/sug-as/code/rd/src/Strategy/Reranker/ranker_model.cpp:321
#21 0x00000000005ee74d in map_sug_as::RankerModel::reranking (this=this@entry=0x7f4b9ae77070)
at baidu/mapsearch/sug-as/code/rd/src/Strategy/Reranker/ranker_model.cpp:385
#22 0x00000000005eeff9 in map_sug_as::RankerModel::calc_weight (this=0x7f4b9ae77070)
at baidu/mapsearch/sug-as/code/rd/src/Strategy/Reranker/ranker_model.cpp:103
#23 0x00000000005ef924 in map_sug_as::RerankerManager::calc_weight (this=0x7f4b9ae76ee0, databuf=0x25708e9e8, sorted_pois=...)
at baidu/mapsearch/sug-as/code/rd/src/Strategy/Reranker/reranker_manager.cpp:30
#24 0x0000000000519374 in map_sug_as::SugAsServer::rerank_queue (this=this@entry=0x7f4b9865c210)
at baidu/mapsearch/sug-as/code/rd/src/Framework/sug_as_server.cpp:454
#25 0x000000000051e452 in map_sug_as::SugAsServer::search (this=0x7f4b9865c210, databuf=databuf@entry=0x25708e9e8)
at baidu/mapsearch/sug-as/code/rd/src/Framework/sug_as_server.cpp:525
#26 0x0000000000521fc0 in get_response (databuf=0x25708e9e8) at baidu/mapsearch/sug-as/code/rd/src/Framework/sug_as_work.cpp:94
#27 map_sug_as::thread_main (arg=<optimized out>) at baidu/mapsearch/sug-as/code/rd/src/Framework/sug_as_work.cpp:333
#28 0x00007f5862ffe1c3 in start_thread () from /opt/compiler/gcc-4.8.2/lib/libpthread.so.0
#29 0x00007f584d6a712d in clone () from /opt/compiler/gcc-4.8.2/lib/libc.so.6

@willthefrog willthefrog added the User 用于标记用户问题 label Dec 2, 2019
@FrostML
Contributor

FrostML commented Dec 2, 2019

How is your multithreading set up? For inference, each thread may run only one predictor.

@lost-little-lamb
Author

lost-little-lamb commented Dec 2, 2019

The main thread calls CreatePaddlePredictor to create main_predictor; each inference (worker) thread's predictor is cloned from main_predictor.

// Builds one predictor on the main thread, then clones it once per worker thread.
int init_thread_databuf(conf_info_t &g_conf, thread_data_buf* data_buf) {
    paddle::AnalysisConfig config;
    config.SetModel(g_conf.semantics_model_path);
    config.DisableGpu();     // CPU-only inference
    config.SwitchIrOptim();  // turn on IR graph optimization
    auto main_predictor = paddle::CreatePaddlePredictor(config);
    if (main_predictor == nullptr) {
        MAP_LOG_FATAL("create semantic model failure");
        return ERR_RETURN;
    }

    // Clone() already returns a temporary, so no std::move is needed.
    for (int i = 0; i < g_conf.thread_num; i++) {
        data_buf[i].semantics_predictor = main_predictor->Clone();
        if (data_buf[i].semantics_predictor == nullptr) {
            MAP_LOG_FATAL("create semantic model failure");
            return ERR_RETURN;
        }
    }
    return SUC_RETURN;
}
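
For readers following the thread, the ownership pattern being discussed (one predictor per worker, each cloned from a main predictor) can be sketched without Paddle at all. The block below is only an illustration under stated assumptions: `ToyPredictor` and `run_one_predictor_per_thread` are hypothetical names, not part of the Paddle API, and the toy class merely models an object whose private scratch state would be unsafe to share across threads.

```cpp
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for a predictor; this is NOT the Paddle API.
// It models an object with private scratch state that must not be shared.
class ToyPredictor {
 public:
  // Returns an independent instance with its own scratch buffer.
  std::unique_ptr<ToyPredictor> Clone() const {
    return std::unique_ptr<ToyPredictor>(new ToyPredictor());
  }
  // Mutates private state, so concurrent calls on one instance would race.
  int Run(int input) {
    scratch_.assign(8, input);
    ++runs_;
    return scratch_.front() * 2;
  }
  int runs() const { return runs_; }

 private:
  std::vector<int> scratch_;
  int runs_ = 0;
};

// Clones one predictor per worker on the calling thread, then lets worker i
// touch only slot i. Returns the total number of Run() calls performed.
int run_one_predictor_per_thread(int num_threads, int calls_per_thread) {
  ToyPredictor main_predictor;
  std::vector<std::unique_ptr<ToyPredictor>> per_thread;
  for (int i = 0; i < num_threads; ++i) {
    per_thread.push_back(main_predictor.Clone());
  }

  std::vector<std::thread> workers;
  for (int i = 0; i < num_threads; ++i) {
    ToyPredictor* mine = per_thread[i].get();  // used by worker i only
    workers.emplace_back([mine, calls_per_thread] {
      for (int c = 0; c < calls_per_thread; ++c) {
        int out = mine->Run(c);
        assert(out == 2 * c);
        (void)out;  // silence unused warning when assertions are disabled
      }
    });
  }
  for (auto& w : workers) w.join();

  // Safe to read after join(): no worker is running any more.
  int total = 0;
  for (const auto& p : per_thread) total += p->runs();
  return total;
}
```

Calling `run_one_predictor_per_thread(4, 100)` performs 400 runs in total with no shared mutable predictor state. Whether a real predictor's clones are fully independent of each other is a separate question that this toy does not answer.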

@FrostML
Contributor

FrostML commented Dec 2, 2019

"The fewer threads there are, the lower the probability": if you run single-threaded inference directly with main_predictor, does it also crash?

@lost-little-lamb
Author

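I have not run inference directly with main_predictor. Instead, I kept the multithreaded setup but sent requests serially; each request is dispatched to a random thread for inference, so only one thread should be running inference at any given time. No crash occurred in that case.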
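
The serial-request observation suggests the crash depends on inference calls overlapping in time. One way to test that under real concurrent traffic, as a diagnostic and not a fix, might be to wrap the inference call in a single process-wide mutex and see whether the crash disappears. The sketch below only demonstrates that such a lock caps concurrency at one; `ConcurrencyProbe` and `peak_concurrency_with_global_lock` are hypothetical names, and `ConcurrencyProbe::Run` is a placeholder for the real inference call, not the Paddle API.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical placeholder for the real inference call; NOT the Paddle API.
// It records how many threads are inside Run() at the same moment.
struct ConcurrencyProbe {
  std::atomic<int> inside{0};
  std::atomic<int> max_inside{0};

  void Run() {
    int now = ++inside;
    int seen = max_inside.load();
    // Raise max_inside to `now` unless another thread already stored more.
    while (now > seen && !max_inside.compare_exchange_weak(seen, now)) {
    }
    std::this_thread::yield();  // widen the window in which overlap could occur
    --inside;
  }
};

// Issues `calls` Run() calls from each of `num_threads` threads, with every
// call guarded by one shared mutex. Returns the peak number of threads
// observed inside Run() simultaneously.
int peak_concurrency_with_global_lock(int num_threads, int calls) {
  ConcurrencyProbe probe;
  std::mutex run_mutex;
  std::vector<std::thread> workers;
  for (int i = 0; i < num_threads; ++i) {
    workers.emplace_back([&probe, &run_mutex, calls] {
      for (int c = 0; c < calls; ++c) {
        std::lock_guard<std::mutex> guard(run_mutex);  // serialize the call
        probe.Run();
      }
    });
  }
  for (auto& w : workers) w.join();
  return probe.max_inside.load();
}
```

If the crash stops once the real call is serialized this way, that would point toward state shared between predictors; if it persists, concurrency in the inference call is a less likely cause. The lock removes all parallelism, so it is only useful for narrowing down the problem.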

@paddle-bot paddle-bot bot added the status/close 已关闭 label Jan 11, 2023