Conversation

Contributor

@louis-menlo louis-menlo commented Jul 20, 2025

Describe Your Changes

This PR addresses a few issues in the llama.cpp extension related to backend downloading, model loading, and generation management.

  • Added an abort mechanism to streaming requests, so message generation stops immediately when the user cancels a request, without unloading the model.
  • Ensured the backend is ready before loading a model (bug: Jan allows LLM inference before llamacpp engine download is complete #5780). We found that unblocking the UI works better than making users wait and send the message manually.
  • Ensured the llama.cpp extension does not increase app start time; previously, the backend download could block the GUI because the app waited for onLoad to complete.
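The cancellation behavior described in the first bullet can be sketched as a per-request AbortController that stops token streaming without touching model lifecycle. This is a minimal illustration with hypothetical names (StreamSession, generate), not the actual extension code:

```typescript
// Hypothetical sketch: one AbortController per streaming request.
// Cancelling aborts the in-flight generation only; the model stays loaded.
class StreamSession {
  private controller = new AbortController()

  get signal(): AbortSignal {
    return this.controller.signal
  }

  // Called when the user presses "stop": abort generation, nothing else.
  stop(): void {
    this.controller.abort()
  }
}

// Example consumer: a token loop that checks the signal on each iteration.
async function* generate(tokens: string[], signal: AbortSignal) {
  for (const t of tokens) {
    if (signal.aborted) return
    yield t
  }
}

async function run(): Promise<string[]> {
  const session = new StreamSession()
  const out: string[] = []
  for await (const t of generate(['a', 'b', 'c'], session.signal)) {
    out.push(t)
    if (t === 'b') session.stop() // simulate the user cancelling mid-stream
  }
  return out
}
```

The same signal can also be passed to fetch so the underlying HTTP request to the llama.cpp server is torn down when the user cancels.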

Self Checklist

  • Added relevant comments, esp in complex areas
  • Updated docs (for bug fixes / features)
  • Created issues for follow-up changes or refactoring needed

Important

This PR improves llama.cpp integration by adding an abort mechanism for streaming requests, ensuring backend readiness before model loading, and preventing backend download from blocking the UI.

  • Behavior:
    • Adds abort mechanism to streaming requests in AIEngine.ts and index.ts to stop message generation immediately upon user cancellation.
    • Ensures backend readiness before model loading in index.ts by implementing ensureBackendReady().
    • Prevents backend download from blocking UI in index.ts by configuring backends asynchronously.
  • UI/UX:
    • Removes stopAllModels() call from ChatInput.tsx to avoid unnecessary model unloading.
    • Updates download completion toast message in multiple locale files to use {{item}} instead of {{modelId}}.
  • Misc:
    • Minor code style changes in AIEngine.ts (removal of semicolons).
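The readiness gating and non-blocking startup described above can be sketched as follows. The class and method names besides ensureBackendReady() are hypothetical, and the download is a placeholder; this only illustrates the pattern of kicking off the download in onLoad without awaiting it, then awaiting the shared promise before loading a model:

```typescript
// Hypothetical sketch of gating model loads on backend readiness
// while keeping extension startup (onLoad) non-blocking.
type Backend = { ready: boolean }

class LlamaCppExtensionSketch {
  private backendPromise: Promise<Backend> | null = null

  // onLoad starts the download but does not await it,
  // so app start is not blocked by the backend download.
  onLoad(): void {
    this.backendPromise = this.downloadBackend()
  }

  private async downloadBackend(): Promise<Backend> {
    // Placeholder for the real download; resolves once the backend is usable.
    return { ready: true }
  }

  // Model loading awaits readiness: the user can send a message immediately,
  // and the request proceeds as soon as the backend is up.
  async ensureBackendReady(): Promise<Backend> {
    if (!this.backendPromise) this.backendPromise = this.downloadBackend()
    return this.backendPromise
  }

  async loadModel(name: string): Promise<string> {
    const backend = await this.ensureBackendReady()
    return backend.ready ? `loaded:${name}` : 'backend-not-ready'
  }
}
```

Caching the download in a single promise also ensures concurrent loadModel calls share one download rather than starting several.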

This description was created by Ellipsis for 4c482ad. You can customize this summary. It will automatically update as commits are pushed.

@louis-menlo louis-menlo added this to the v0.6.6 milestone Jul 20, 2025
@louis-menlo louis-menlo linked an issue Jul 20, 2025 that may be closed by this pull request
4 tasks
@louis-menlo louis-menlo requested a review from qnixsynapse July 20, 2025 16:04
@louis-menlo louis-menlo moved this to Needs Review in Jan Jul 20, 2025

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed everything up to 4c482ad in 1 minute and 59 seconds.
  • Reviewed 454 lines of code in 12 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. web-app/src/containers/ChatInput.tsx:161
  • Draft comment:
    Stop streaming callback no longer calls stopAllModels(). Ensure this removal is intentional and that aborting streaming doesn’t require unloading all models.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
2. web-app/src/containers/DownloadManegement.tsx:183
  • Draft comment:
    The toast for download completion now uses the parameter '{{item}}' instead of '{{modelId}}'. Verify that the event payloads and translation keys are updated consistently throughout the code.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). The comment asks the author to verify that event payloads and translation keys are updated consistently, which is against the rules. It doesn't provide a specific suggestion or point out a clear issue.
3. web-app/src/hooks/useChat.ts:402
  • Draft comment:
    Good use of abortController signal check before setting the model load error. This prevents error messages when a user aborts requests.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). This comment is purely informative: it praises the abortController signal check without suggesting any changes or improvements, so it doesn't align with the rules for useful comments.
4. web-app/src/lib/completion.ts:185
  • Draft comment:
    Passing the abortController to engine.chat and to the fetch (via tokenJS) is a proper integration for cancellation. Consider additional error handling in case the abort event is triggered.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 30% vs. threshold = 50%). The comment has two parts: 1) confirming the abort integration is good, which is not useful as a comment, and 2) suggesting additional error handling. While error handling for aborts could be valuable, we don't know what error handling already exists in the engine.chat() implementation, so the suggestion is speculative without seeing that code. Error handling for aborts is generally good practice, and the comment does target the changed code; however, without the engine implementation we can't be certain additional handling is needed, and the comment only asks the author to "consider" something rather than pointing out a definite issue. It should be deleted because it is partly congratulatory (not useful) and partly speculative, with no clear evidence the improvements are needed.
5. web-app/src/locales/de-DE/common.json:251
  • Draft comment:
    Translation update: Changed description key for downloadComplete to use '{{item}}' instead of '{{modelId}}'. Confirm that the backend and frontend are updated to pass 'item'.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). The comment asks the author to confirm that both the backend and frontend are updated to pass 'item'. This is a request for confirmation, which violates the rule against asking the PR author to confirm their intention or ensure behavior is intended.
6. core/src/browser/extensions/engines/AIEngine.ts:171
  • Draft comment:
    Typo: The comment on this line is missing a space after the slashes. It would improve readability and consistency with the comment on line 172 to have a space, e.g., // name of the model.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_dtuqC3k5yA9uiQgx

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


@qnixsynapse qnixsynapse left a comment


Lgtm

@louis-menlo louis-menlo merged commit bc4fe52 into release/v0.6.6 Jul 21, 2025
24 of 27 checks passed
@louis-menlo louis-menlo deleted the fix/llamacpp-chat-experiencene branch July 21, 2025 02:29
@github-project-automation github-project-automation bot moved this from Needs Review to QA in Jan Jul 21, 2025