Conversation

Contributor

@louis-menlo louis-menlo commented Jul 20, 2025

Describe Your Changes

This PR addresses a few issues in the llama.cpp extension related to backend downloading, model loading, and generation management.

  • Added an abort mechanism to streaming requests, so message generation stops immediately when the user cancels a request, without unloading the model.
  • Ensured the backend is ready before loading a model (bug: Jan allows LLM inference before llamacpp engine download is complete #5780). We found that unblocking the UI works better than making users wait and send the message manually.
  • Ensured the llama.cpp extension does not increase app start time; previously, the backend download could block the GUI because the app waited for onLoad to complete.
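The cancellation behavior described in the first bullet can be sketched as a per-request AbortController that stops token streaming without touching model lifecycle. This is a minimal illustration with hypothetical names (StreamSession, generate), not the actual extension code:

```typescript
// Hypothetical sketch: one AbortController per streaming request.
// Cancelling aborts the in-flight generation only; the model stays loaded.
class StreamSession {
  private controller = new AbortController()

  get signal(): AbortSignal {
    return this.controller.signal
  }

  // Called when the user presses "stop": abort generation, nothing else.
  stop(): void {
    this.controller.abort()
  }
}

// Example consumer: a token loop that checks the signal on each iteration.
async function* generate(tokens: string[], signal: AbortSignal) {
  for (const t of tokens) {
    if (signal.aborted) return
    yield t
  }
}

async function run(): Promise<string[]> {
  const session = new StreamSession()
  const out: string[] = []
  for await (const t of generate(['a', 'b', 'c'], session.signal)) {
    out.push(t)
    if (t === 'b') session.stop() // simulate the user cancelling mid-stream
  }
  return out
}
```

The same signal can also be passed to fetch so the underlying HTTP request to the llama.cpp server is torn down when the user cancels.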

Self Checklist

  • Added relevant comments, esp in complex areas
  • Updated docs (for bug fixes / features)
  • Created issues for follow-up changes or refactoring needed

Important

This PR improves llama.cpp integration by adding an abort mechanism for streaming requests, ensuring backend readiness before model loading, and preventing backend download from blocking the UI.

  • Behavior:
    • Adds abort mechanism to streaming requests in AIEngine.ts and index.ts to stop message generation immediately upon user cancellation.
    • Ensures backend readiness before model loading in index.ts by implementing ensureBackendReady().
    • Prevents backend download from blocking UI in index.ts by configuring backends asynchronously.
  • UI/UX:
    • Removes stopAllModels() call from ChatInput.tsx to avoid unnecessary model unloading.
    • Updates download completion toast message in multiple locale files to use {{item}} instead of {{modelId}}.
  • Misc:
    • Minor code style changes in AIEngine.ts (removal of semicolons).
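The readiness gating and non-blocking startup described above can be sketched as follows. The class and method names besides ensureBackendReady() are hypothetical, and the download is a placeholder; this only illustrates the pattern of kicking off the download in onLoad without awaiting it, then awaiting the shared promise before loading a model:

```typescript
// Hypothetical sketch of gating model loads on backend readiness
// while keeping extension startup (onLoad) non-blocking.
type Backend = { ready: boolean }

class LlamaCppExtensionSketch {
  private backendPromise: Promise<Backend> | null = null

  // onLoad starts the download but does not await it,
  // so app start is not blocked by the backend download.
  onLoad(): void {
    this.backendPromise = this.downloadBackend()
  }

  private async downloadBackend(): Promise<Backend> {
    // Placeholder for the real download; resolves once the backend is usable.
    return { ready: true }
  }

  // Model loading awaits readiness: the user can send a message immediately,
  // and the request proceeds as soon as the backend is up.
  async ensureBackendReady(): Promise<Backend> {
    if (!this.backendPromise) this.backendPromise = this.downloadBackend()
    return this.backendPromise
  }

  async loadModel(name: string): Promise<string> {
    const backend = await this.ensureBackendReady()
    return backend.ready ? `loaded:${name}` : 'backend-not-ready'
  }
}
```

Caching the download in a single promise also ensures concurrent loadModel calls share one download rather than starting several.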

This description was created by Ellipsis for 4c482ad. You can customize this summary. It will automatically update as commits are pushed.

@louis-menlo louis-menlo added this to the v0.6.6 milestone Jul 20, 2025
@louis-menlo louis-menlo linked an issue Jul 20, 2025 that may be closed by this pull request
4 tasks
@louis-menlo louis-menlo requested a review from qnixsynapse July 20, 2025 16:04
@louis-menlo louis-menlo moved this to Needs Review in Jan Jul 20, 2025

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed everything up to 4c482ad in 1 minute and 59 seconds.
  • Reviewed 454 lines of code in 12 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. web-app/src/containers/ChatInput.tsx:161
  • Draft comment:
    Stop streaming callback no longer calls stopAllModels(). Ensure this removal is intentional and that aborting streaming doesn’t require unloading all models.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
2. web-app/src/containers/DownloadManegement.tsx:183
  • Draft comment:
    The toast for download completion now uses the parameter '{{item}}' instead of '{{modelId}}'. Verify that the event payloads and translation keys are updated consistently throughout the code.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). The comment asks the author to verify that event payloads and translation keys are updated consistently, which is against the rules. It doesn't provide a specific suggestion or point out a clear issue.
3. web-app/src/hooks/useChat.ts:402
  • Draft comment:
    Good use of abortController signal check before setting the model load error. This prevents error messages when a user aborts requests.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). This comment is purely informative: it praises the abortController signal check without suggesting any changes or improvements, so it doesn't align with the rules for useful comments.
4. web-app/src/lib/completion.ts:185
  • Draft comment:
    Passing the abortController to engine.chat and to the fetch (via tokenJS) is a proper integration for cancellation. Consider additional error handling in case the abort event is triggered.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 30% vs. threshold = 50%). The comment has two parts: 1) confirming the abort integration is good, which is not useful as a comment, and 2) suggesting additional error handling. While error handling for aborts could be valuable, we don't know what error handling already exists in the engine.chat() implementation, so the suggestion is speculative without seeing that code. Error handling for aborts is generally good practice, and the comment does target the changed code; however, without the engine implementation we can't be certain additional handling is needed, and the comment only asks the author to "consider" something rather than pointing out a definite issue. It should be deleted because it is partly congratulatory (not useful) and partly speculative, with no clear evidence the improvements are needed.
5. web-app/src/locales/de-DE/common.json:251
  • Draft comment:
    Translation update: Changed description key for downloadComplete to use '{{item}}' instead of '{{modelId}}'. Confirm that the backend and frontend are updated to pass 'item'.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). The comment asks the author to confirm that both the backend and frontend are updated to pass 'item'. This is a request for confirmation, which violates the rule against asking the PR author to confirm their intention or ensure behavior is intended.
6. core/src/browser/extensions/engines/AIEngine.ts:171
  • Draft comment:
    Typo: The comment on this line is missing a space after the slashes. It would improve readability and consistency with the comment on line 172 to have a space, e.g., // name of the model.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_dtuqC3k5yA9uiQgx

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


@qnixsynapse qnixsynapse left a comment


Lgtm

@louis-menlo louis-menlo merged commit bc4fe52 into release/v0.6.6 Jul 21, 2025
24 of 27 checks passed
@louis-menlo louis-menlo deleted the fix/llamacpp-chat-experiencene branch July 21, 2025 02:29
@github-project-automation github-project-automation bot moved this from Needs Review to QA in Jan Jul 21, 2025