@guoyuhong

Plasma Store evicts objects when a memory allocation fails. When a small store limit is specified, the memory allocation succeeds, but Plasma Store crashes once the memory limit is reached.
Call stack:

F0912 23:55:49.693681 2961466240 eviction_policy.cc:73]  Check failed: memory_used_ <= store_info_->memory_capacity
*** Check failure stack trace: ***
    @        0x1087d0aca  google::LogMessage::Fail()
    @        0x1087ce8ee  google::LogMessage::SendToLog()
    @        0x1087cf76f  google::LogMessage::Flush()
    @        0x1087cf5a9  google::LogMessage::~LogMessage()
    @        0x1087cf865  google::LogMessage::~LogMessage()
    @        0x1087e8ef5  arrow::ArrowLog::~ArrowLog()
    @        0x1087aef8e  plasma::EvictionPolicy::ObjectCreated()
    @        0x1087a21e8  plasma::PlasmaStore::CreateObject()
    @        0x1087a6e8c  plasma::PlasmaStore::ProcessMessage()
    @        0x1087abc1c  std::__1::__function::__func<>::operator()()
    @        0x1087af81e  plasma::EventLoop::FileEventCallback()
    @        0x1087c8eb1  aeProcessEvents
    @        0x1087c919b  aeMain
    @        0x1087a8508  plasma::PlasmaStoreRunner::Start()
    @        0x1087a81fa  plasma::StartServer()
    @        0x1087a8d9c  main

@pcmoritz
Contributor

@guoyuhong Thanks a lot for the patch! Do you currently understand why the crash is happening? How can dlmalloc promise more memory than there is room for in the memory-mapped file?

Also, if you have a small script to reproduce it, that would help us see whether this is the right fix or whether something else is going wrong here (it seems to me that this fix should not be needed).

@guoyuhong
Author

@pcmoritz You can use the test case in client_test of this PR to reproduce it, or use the following script:

import ray

# A deliberately tiny object store limit (in bytes) to trigger the crash.
ray.init(object_store_memory=10000)
message = "This is a message"
for i in range(10000):
    ray.put(message)

@guoyuhong
Author

@pcmoritz Have you tried the script or the test case?

@guoyuhong
Author

@pcmoritz This may be the problem. In dlmalloc_set_footprint_limit, the requested limit is rounded up by granularity_align(bytes) and saved to gm->footprint_limit. In this case, 1000 bytes becomes 131072, and 1000000000 becomes 1000079360. Therefore, when we allocate an object that brings the footprint between the input limit and the aligned limit, dlmemalign may return a valid pointer, but the eviction policy check fails and Plasma crashes.
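
The rounding described above can be checked with a small sketch. It assumes a 128 KiB granularity, which is inferred from the two figures quoted in this comment rather than read from the dlmalloc configuration, and it uses the usual power-of-two round-up formula. The helper in_unchecked_gap is hypothetical and only illustrates the window in which the allocator would still accept a footprint that the eviction policy check rejects.

```python
# Assumption: 128 KiB granularity, inferred from the figures quoted above
# (1000 -> 131072 and 1000000000 -> 1000079360).
GRANULARITY = 128 * 1024


def granularity_align(nbytes: int, granularity: int = GRANULARITY) -> int:
    """Round nbytes up to the next multiple of a power-of-two granularity."""
    return (nbytes + granularity - 1) & ~(granularity - 1)


def in_unchecked_gap(footprint: int, requested_limit: int) -> bool:
    """Hypothetical helper: True when the footprint exceeds the requested
    limit but still fits within the aligned limit."""
    return requested_limit < footprint <= granularity_align(requested_limit)


if __name__ == "__main__":
    for limit in (1000, 1000000000):
        print(limit, "->", granularity_align(limit))
    # A 2000-byte footprint is over a 1000-byte requested limit,
    # yet well under the aligned limit of 131072 bytes.
    print(in_unchecked_gap(2000, 1000))
```

Under these assumptions, any footprint in the gap is one the allocator would permit while memory_used_ already exceeds the capacity the store was asked to enforce.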

@guoyuhong
Author

@pcmoritz

@pcmoritz
Contributor

@guoyuhong I'm currently working on replacing dlmalloc with jemalloc to fix this issue and a number of others (memory fragmentation, file descriptor sending problems). Let's wait until I have the patch ready, which should be in the next couple of days, and then we can see which solution is better. How do you feel about that?

@guoyuhong
Author

@pcmoritz Thanks for the information. Yes, let's wait for the new allocator and then see whether this bug still exists.

@pcmoritz
Contributor

I put my current progress up here: #2593

It is currently blocked on jemalloc/jemalloc#1329. Let me know if you have any ideas here :)

@wesm
Member

wesm commented Apr 24, 2019

In #4189 it is proposed to update to jemalloc 5.2.0. Does that resolve the issue?

@wesm
Member

wesm commented Jun 3, 2019

@guoyuhong do you know if jemalloc 5.2.0 resolves this issue?

@guoyuhong
Author

@wesm Sorry for the late response. I ran the SmallMemoryTest from this PR. Currently, a size of 1000 is not allowed; after I changed it to 3000, the test passed. Therefore, I will close this PR.

guoyuhong closed this Jun 5, 2019