
memory utilization increment after every request, worker died, memory issue #974

Open
n0thing233 opened this issue Oct 31, 2021 · 1 comment


@n0thing233

Hi, MMS prints memory utilization to the log by default, which is great. The problem I have is that after each request to MMS, the memory utilization increases a little; after several requests it reaches 100% and the worker dies.
That doesn't seem like the intended behavior, does it?
I tried calling gc.collect() in the _handle function, but it doesn't help (there is no GPU available on this machine). My handler is laid out roughly like the sketch below.
I'd appreciate it if anyone could help me out here.
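For context, this is approximately the shape of my handler. The model and pre/post-processing here are placeholders, not my real code; the gc.collect() call at the end of _handle is what I added while debugging:

```python
import gc

import torch


class SegmentationService:
    """Sketch of the handler layout; the real model and pre/post-processing
    are placeholders."""

    def __init__(self):
        self.model = None
        self.initialized = False

    def initialize(self, context):
        # The model is loaded once per worker, not on every request.
        self.model = torch.nn.Linear(10, 2)  # placeholder for the real segmentation model
        self.model.eval()
        self.initialized = True

    def _handle(self, data, context):
        with torch.no_grad():
            batch = torch.randn(1, 10)  # placeholder for real preprocessing of `data`
            output = self.model(batch)
        gc.collect()  # added while debugging; memory still grows after every request
        return [output.tolist()]


_service = SegmentationService()


def handle(data, context):
    # MMS custom-service entry point
    if not _service.initialized:
        _service.initialize(context)
    if data is None:
        return None
    return _service._handle(data, context)
```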
Here is an example of the memory growth. Right after the server starts, the log shows:
2021-10-31 18:22:25,881 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:5.1|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635704545
After the first request:
mms_1 | 2021-10-31 18:24:25,742 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:26.2|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635704665
After the second request:
mms_1 | 2021-10-31 18:26:25,601 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:39.7|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635704785
After the third request:
mms_1 | 2021-10-31 18:30:25,323 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:58.5|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635705025
After the fourth request:
mms_1 | 2021-10-31 18:32:25,187 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:81.6|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635705145
After the fifth request, OOM appears and the worker dies:
mms_1 | 2021-10-31 18:35:41,402 [INFO ] epollEventLoopGroup-4-7 com.amazonaws.ml.mms.wlm.WorkerThread - 9000-96795301 Worker disconnected. WORKER_MODEL_LOADED
mms_1 | 2021-10-31 18:35:41,528 [DEBUG] W-9000-video_segmentation_v1 com.amazonaws.ml.mms.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
mms_1 | java.lang.InterruptedException
mms_1 |     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
mms_1 |     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
mms_1 |     at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
mms_1 |     at com.amazonaws.ml.mms.wlm.WorkerThread.runWorker(WorkerThread.java:148)
mms_1 |     at com.amazonaws.ml.mms.wlm.WorkerThread.run(WorkerThread.java:211)
mms_1 |     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
mms_1 |     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
mms_1 |     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
mms_1 |     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
mms_1 |     at java.lang.Thread.run(Thread.java:748)

@n0thing233 n0thing233 changed the title worker died and restart, memory issue memory utilization increment after every request, worker died, memory issue Oct 31, 2021
@kastman

kastman commented Dec 10, 2021

Commenting to follow. At first I suspected this was related to #942, but I tested with that PR and saw no change in behavior compared to the current released version (1.1.4). @n0thing233 - are you doing any large memory allocation inside the predict function (see the hypothetical snippet below for what I mean), or does it all happen at model load?
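To be clear about "large memory load inside predict": something along the lines of this hypothetical snippet, where a sizable allocation happens on every request and a reference to it survives the request, as opposed to allocating everything once at model load:

```python
import torch

_cache = []  # module-level state that outlives individual requests


def handle(data, context):
    # Hypothetical anti-pattern (not your actual code): a large per-request
    # allocation whose reference survives the request, so RSS climbs on each call.
    frame_buffer = torch.zeros(16, 3, 1080, 1920)  # ~380 MB of float32
    _cache.append(frame_buffer)
    return ["ok"]
```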
