Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle errors in forward pass, delete cache and cancel all sequences (Survive OOM) #98

Merged
merged 10 commits into from
Apr 10, 2024

Conversation

lucasavila00
Copy link
Contributor

@lucasavila00 lucasavila00 commented Apr 9, 2024

I was playing around with the codebase, adding a few dbg! around and seeing what goes on under the hood.

While doing so I stumbled upon an Out of Memory error. I was using a very long prompt, and it won't fit my GPU indeed.

When this happened, the server no longer worked, the process was still alive and GPU memory had not been freed.

This behavior can be simulated with panicking anywhere in the forward function of the pipeline: https://github.com/EricLBuehler/mistral.rs/blob/master/mistralrs-core/src/pipeline/mistral.rs#L371

I changed the Pipeline trait to return errors on the forward pass instead of panicking. I handled the errors by setting all sequence states to error, deleting the GPU cache and sending an error message to the receiver.

This makes the server continue to work and frees all GPU memory, allowing new requests to go on normally.

@EricLBuehler What are your thoughts?

If this approach is any good, I'll make the errors OpenAI API compatible too.

Also, I noticed the Status Code of the error message was 200, not 4xx or 5xx (but that's a different issue #97)

Copy link

github-actions bot commented Apr 9, 2024

Code Metrics Report
  ───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
Rust                        46     16395     1158       702    14535        797
───────────────────────────────────────────────────────────────────────────────
Total                       46     16395     1158       702    14535        797
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop 48,880
Estimated Schedule Effort 10.144342 months
Estimated People Required 3.931182
───────────────────────────────────────────────────────────────────────────────
Processed 557802 bytes, 0.558 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────
  

@lucasavila00
Copy link
Contributor Author

@EricLBuehler
Copy link
Owner

Thanks for adding this, I like the idea of just raising an error instead of just panicing. I think it may if it would be good to send back whatever was generated up until the error as a ChatCompletionResponse.

@lucasavila00 lucasavila00 marked this pull request as ready for review April 10, 2024 00:53
mistralrs-core/src/utils/mod.rs Outdated Show resolved Hide resolved
mistralrs-server/src/main.rs Show resolved Hide resolved
mistralrs-server/src/main.rs Outdated Show resolved Hide resolved
mistralrs-server/src/main.rs Show resolved Hide resolved
mistralrs-server/src/main.rs Show resolved Hide resolved
@EricLBuehler EricLBuehler merged commit 3d4d622 into EricLBuehler:master Apr 10, 2024
11 checks passed
@EricLBuehler
Copy link
Owner

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants