Crash when training Word2Vec using Debug build #32252

Closed
wozna opened this issue Apr 13, 2021 · 3 comments

@wozna
Contributor

wozna commented Apr 13, 2021

I found that training the Word2Vec model fails with an error when PaddlePaddle is built as Debug. The same training works fine with a Release build of PaddlePaddle.

Engine: Training on CPU with MKL-DNN (not checked without MKL-DNN)
Operating System: Ubuntu 18.04
PaddlePaddle version: develop branch (tested commit 6e946e9), but the issue occurs on older revisions as well.

To reproduce it, use the files and commands given in #30560 (comment), changing only the build type from Release to Debug.

The error occurs with both ways of running the training:

cd 2.0benchmark/ps/static/word2vec
python -u ../train.py -c benchmark.yaml

and

cd 2.0benchmark/ps/static/word2vec
fleetrun --worker_num=1 --server_num=1 ../train.py -c benchmark.yaml

The log looks very similar to the one reported in issue #24863.
Here is the stack trace:

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >::clear()
1   std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >::_M_erase_at_end(paddle::framework::LoDTensor*)
2   void std::_Destroy<paddle::framework::LoDTensor*, paddle::framework::LoDTensor>(paddle::framework::LoDTensor*, paddle::framework::LoDTensor*, std::allocator<paddle::framework::LoDTensor>&)
3   void std::_Destroy<paddle::framework::LoDTensor*>(paddle::framework::LoDTensor*, paddle::framework::LoDTensor*)
4   void std::_Destroy_aux<false>::__destroy<paddle::framework::LoDTensor*>(paddle::framework::LoDTensor*, paddle::framework::LoDTensor*)
5   void std::_Destroy<paddle::framework::LoDTensor>(paddle::framework::LoDTensor*)
6   paddle::framework::LoDTensor::~LoDTensor()
7   std::vector<paddle::framework::CPUVector<unsigned long>, std::allocator<paddle::framework::CPUVector<unsigned long> > >::~vector()
8   void std::_Destroy<paddle::framework::CPUVector<unsigned long>*, paddle::framework::CPUVector<unsigned long> >(paddle::framework::CPUVector<unsigned long>*, paddle::framework::CPUVector<unsigned long>*, std::allocator<paddle::framework::CPUVector<unsigned long> >&)
9   void std::_Destroy<paddle::framework::CPUVector<unsigned long>*>(paddle::framework::CPUVector<unsigned long>*, paddle::framework::CPUVector<unsigned long>*)
10  void std::_Destroy_aux<false>::__destroy<paddle::framework::CPUVector<unsigned long>*>(paddle::framework::CPUVector<unsigned long>*, paddle::framework::CPUVector<unsigned long>*)
11  void std::_Destroy<paddle::framework::CPUVector<unsigned long> >(paddle::framework::CPUVector<unsigned long>*)
12  paddle::framework::CPUVector<unsigned long>::~CPUVector()
13  std::vector<unsigned long, std::allocator<unsigned long> >::~vector()
14  paddle::framework::SignalHandle(char const*, int)
15  paddle::platform::GetCurrentTraceBackString[abi:cxx11]()
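
For anyone comparing this trace with the one in #24863, a small helper can pull the numbered frames out of the pasted text. This is only a sketch: `parse_frames` and `paddle_frames` are hypothetical names made up for illustration, not part of PaddlePaddle, and the sample is a shortened excerpt of the trace above. Going by the "most recent call last" header, the frames just before `SignalHandle` are the ones closest to the crash.

```python
import re

# Matches lines such as "14  paddle::framework::SignalHandle(char const*, int)".
FRAME_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def parse_frames(trace_text):
    """Return (index, symbol) pairs for every numbered frame line."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames


def paddle_frames(frames):
    """Keep only frames whose symbol starts in the paddle:: namespace."""
    return [(index, symbol) for index, symbol in frames
            if symbol.startswith("paddle::")]


# Shortened excerpt of the trace above (hypothetical helper input).
sample = """\
5   void std::_Destroy<paddle::framework::LoDTensor>(paddle::framework::LoDTensor*)
6   paddle::framework::LoDTensor::~LoDTensor()
12  paddle::framework::CPUVector<unsigned long>::~CPUVector()
13  std::vector<unsigned long, std::allocator<unsigned long> >::~vector()
14  paddle::framework::SignalHandle(char const*, int)
"""

for index, symbol in paddle_frames(parse_frames(sample)):
    print(index, symbol)
```

Filtering on the `paddle::` prefix hides the standard-library destructor helpers, which makes it easier to see that the crash happens while a `LoDTensor` and its `CPUVector<unsigned long>` members are being destroyed.
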
@paddle-bot-old

Hi! We've received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your question as soon as possible. Please double-check that you have provided a clear problem description, reproduction code, environment and version details, and error messages. You may also check the API documentation, the FAQ, past GitHub issues, and the AI community for an answer. Have a nice day!

@seiriosPlus
Collaborator

Hi! We've received your issue and discussed it in an internal meeting. We are analyzing this case now; thank you for your patience.

@MrChengmo
Contributor

  • We reproduced this problem with a similar error message:
--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >::clear()
1   std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >::_M_erase_at_end(paddle::framework::LoDTensor*)
2   void std::_Destroy<paddle::framework::LoDTensor*, paddle::framework::LoDTensor>(paddle::framework::LoDTensor*, paddle::framework::LoDTensor*, std::allocator<paddle::framework::LoDTensor>&)
3   void std::_Destroy<paddle::framework::LoDTensor*>(paddle::framework::LoDTensor*, paddle::framework::LoDTensor*)
4   void std::_Destroy_aux<false>::__destroy<paddle::framework::LoDTensor*>(paddle::framework::LoDTensor*, paddle::framework::LoDTensor*)
5   void std::_Destroy<paddle::framework::LoDTensor>(paddle::framework::LoDTensor*)
6   paddle::framework::LoDTensor::~LoDTensor()
7   std::vector<paddle::framework::Vector<unsigned long>, std::allocator<paddle::framework::Vector<unsigned long> > >::~vector()
8   void std::_Destroy<paddle::framework::Vector<unsigned long>*, paddle::framework::Vector<unsigned long> >(paddle::framework::Vector<unsigned long>*, paddle::framework::Vector<unsigned long>*, std::allocator<paddle::framework::Vector<unsigned long> >&)
9   void std::_Destroy<paddle::framework::Vector<unsigned long>*>(paddle::framework::Vector<unsigned long>*, paddle::framework::Vector<unsigned long>*)
10  void std::_Destroy_aux<false>::__destroy<paddle::framework::Vector<unsigned long>*>(paddle::framework::Vector<unsigned long>*, paddle::framework::Vector<unsigned long>*)
11  void std::_Destroy<paddle::framework::Vector<unsigned long> >(paddle::framework::Vector<unsigned long>*)
12  paddle::framework::Vector<unsigned long>::~Vector()
13  paddle::framework::details::COWPtr<paddle::framework::Vector<unsigned long>::VectorData>::~COWPtr()
14  std::shared_ptr<paddle::framework::Vector<unsigned long>::VectorData>::~shared_ptr()
15  std::__shared_ptr<paddle::framework::Vector<unsigned long>::VectorData, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr()
16  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()
17  paddle::framework::SignalHandle(char const*, int)
18  paddle::platform::GetCurrentTraceBackString[abi:cxx11]()
 
----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1619458340 (unix time) try "date -d @1619458340" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0xf) received by PID 26141 (TID 0x7ffaf75af700) from PID 15 ***]
  • We are analyzing this problem and will let you know as soon as possible
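
When collecting several such reports, it can help to pull the fields out of the error summary automatically. The following is a minimal sketch, assuming the `TimeInfo` and `SignalInfo` lines keep the layout shown above; `parse_summary` is a hypothetical helper written for illustration, not a PaddlePaddle API.

```python
import re
from datetime import datetime, timezone

# The two summary lines copied from the log above.
summary = """\
  [TimeInfo: *** Aborted at 1619458340 (unix time) try "date -d @1619458340" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0xf) received by PID 26141 (TID 0x7ffaf75af700) from PID 15 ***]
"""


def parse_summary(text):
    """Extract abort time, signal name, fault address, PID and TID."""
    time_match = re.search(r"Aborted at (\d+) \(unix time\)", text)
    signal_match = re.search(
        r"(SIG\w+) \(@(0x[0-9a-fA-F]+)\) received by PID (\d+) "
        r"\(TID (0x[0-9a-fA-F]+)\)",
        text,
    )
    if time_match is None or signal_match is None:
        raise ValueError("summary does not match the expected layout")
    return {
        # Interpret the unix timestamp as UTC to avoid local-time ambiguity.
        "aborted_at": datetime.fromtimestamp(int(time_match.group(1)),
                                             tz=timezone.utc),
        "signal": signal_match.group(1),
        "address": int(signal_match.group(2), 16),
        "pid": int(signal_match.group(3)),
        "tid": int(signal_match.group(4), 16),
    }


info = parse_summary(summary)
print(info["signal"], hex(info["address"]), info["aborted_at"].isoformat())
```

The timestamp conversion does the same job as the `date -d @1619458340` command suggested in the log, but works on systems without GNU date.
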
