feat(agents-api): added mmr to chat #1013

Vedantsahai18 · 2025-01-03T17:12:13Z

PR Type

Enhancement

Description

Introduced Maximal Marginal Relevance (MMR) to refine document selection.
Filtered document references to ensure valid embeddings for MMR.
Added conditions to enable MMR based on recall options.
Integrated numpy for handling embeddings in MMR implementation.

Changes walkthrough 📝

Relevant files

Enhancement

gather_messages.py `Implement MMR for document selection in chat` agents-api/agents_api/queries/chat/gather_messages.py Added MMR implementation to refine document selection. Filtered document references without embeddings. Integrated `numpy` for embedding handling. Applied MMR based on recall options and limits.	+27/-1

💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

Important

Introduces Maximal Marginal Relevance (MMR) in gather_messages.py for improved document selection in chat based on recall options.

Behavior:
- Adds MMR to gather_messages() in gather_messages.py for document selection based on recall options.
- Filters doc_references to ensure embeddings are present before applying MMR.
- Uses maximal_marginal_relevance() to select document indices based on query and document embeddings.
Imports:
- Adds imports for maximal_marginal_relevance and numpy in gather_messages.py.
Misc:
- Adds get_history import in utils.py.
- Adds hasattr checks in lifespan() in app.py for state attributes.
- Fixes return logic in aembedding() in litellm.py.

^{This description was created by}^{for be69c60. It will automatically update as commits are pushed.}

qodo-merge-pro-for-open-source · 2025-01-03T17:12:32Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review Potential Bug The MMR filtering of documents without embeddings happens after the limit check, which could lead to fewer documents than the limit being returned even when more valid documents are available doc_references = [ doc for doc in doc_references if doc.snippet.embedding is not None ] Code Duplication The comment "Apply MMR" appears twice in succession, and the second MMR application could be combined with the indices calculation for better readability indices = maximal_marginal_relevance( np.asarray(query_embedding), [doc.snippet.embedding for doc in doc_references], k=recall_options.limit, ) # Apply MMR doc_references = [ doc for i, doc in enumerate(doc_references) if i in set(indices) ]

ellipsis-dev

👍 Looks good to me! Reviewed everything up to c00d08d in 24 seconds

More details

Looked at 45 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 3 drafted comments based on config settings.

1. agents-api/agents_api/queries/chat/gather_messages.py:149

Draft comment:
Ensure that filtering out references without embeddings is the intended behavior, as it might result in an empty list if none have embeddings.
Reason this comment was not posted:
Confidence changes required: 50%
The code uses a list comprehension to filter doc_references based on the presence of embeddings. This is a temporary fix, but it should be noted that this could potentially remove all references if none have embeddings, which might not be the intended behavior.

2. agents-api/agents_api/queries/chat/gather_messages.py:155

Draft comment:
Ensure query_embedding is always a list or array-like to avoid potential errors with np.asarray.
Reason this comment was not posted:
Confidence changes required: 50%
The code uses np.asarray(query_embedding) which assumes query_embedding is a list or array-like. If query_embedding is not in the expected format, this could raise an error.

3. agents-api/agents_api/queries/chat/gather_messages.py:161

Draft comment:
Consider using a list comprehension with a condition check instead of set(indices) for potentially better performance.
Reason this comment was not posted:
Confidence changes required: 33%
The code uses set(indices) to filter doc_references. This is efficient but could be optimized by using a list comprehension directly with a condition check.

Workflow ID: wflow_RJnb9YudrHKaruPp

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

qodo-merge-pro-for-open-source · 2025-01-03T17:13:02Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Score
Possible issue	✅ Add error handling for embedding processing to prevent crashes from malformed data Suggestion Impact: The commit added a check for valid embeddings by ensuring there are at least 2 documents with embeddings before applying MMR, which helps prevent crashes from malformed data code diff: + and len([doc for doc in doc_references if doc.snippet.embedding is not None]) >= 2 Add error handling for potential numpy array conversion failures when processing embeddings, as some embeddings might be malformed or have unexpected formats. agents-api/agents_api/queries/chat/gather_messages.py [154-158] -indices = maximal_marginal_relevance( - np.asarray(query_embedding), - [doc.snippet.embedding for doc in doc_references], - k=recall_options.limit, -) +try: + indices = maximal_marginal_relevance( + np.asarray(query_embedding), + [doc.snippet.embedding for doc in doc_references], + k=recall_options.limit, + ) +except (ValueError, TypeError) as e: + logger.error(f"Failed to process embeddings for MMR: {e}") + return doc_references[:recall_options.limit] Apply this suggestion Suggestion importance[1-10]: 8 Why: The suggestion adds crucial error handling for potential numpy array conversion failures, which could prevent application crashes in production. This is particularly important as the code deals with external data that might be malformed.	8
	✅ Validate embedding dimensions to prevent array processing errors Suggestion Impact: The commit adds validation for MMR processing by checking if there are at least 2 documents with non-null embeddings before applying MMR code diff: + if ( + recall_options.mmr_strength > 0 + and len(doc_references) > recall_options.limit + and recall_options.mode != "text" + and len([doc for doc in doc_references if doc.snippet.embedding is not None]) >= 2 Validate the dimensions of embeddings match between query and documents before processing to prevent numpy dimension mismatch errors. agents-api/agents_api/queries/chat/gather_messages.py [154-158] +query_emb = np.asarray(query_embedding) +doc_embeddings = [doc.snippet.embedding for doc in doc_references] +if not all(len(emb) == len(query_emb) for emb in doc_embeddings): + logger.error("Embedding dimensions mismatch") + return doc_references[:recall_options.limit] indices = maximal_marginal_relevance( - np.asarray(query_embedding), - [doc.snippet.embedding for doc in doc_references], + query_emb, + doc_embeddings, k=recall_options.limit, ) Apply this suggestion Suggestion importance[1-10]: 8 Why: This suggestion adds critical validation of embedding dimensions, which is essential for preventing numpy dimension mismatch errors that could occur during MMR processing. Such errors could silently break the functionality.	8
	✅ Add validation to ensure sufficient valid documents exist before applying document selection algorithm Validate that there are enough documents with valid embeddings before applying MMR to avoid potential index errors or empty results. agents-api/agents_api/queries/chat/gather_messages.py [139-146] if ( recall_options.mmr_strength > 0 and len(doc_references) > recall_options.limit and recall_options.mode != "text" + and len([doc for doc in doc_references if doc.snippet.embedding is not None]) >= 2 ): `[Suggestion has been applied]` Suggestion importance[1-10]: 7 Why: The suggestion adds an important validation check to ensure there are enough valid documents before applying MMR, preventing potential errors from insufficient data. This complements the existing filtering of documents without embeddings.	7

qodo-merge-pro-for-open-source · 2025-01-03T17:13:32Z

CI Failure Feedback 🧐

(Checks updated until commit `e92fb4d`)

Action: Lint-And-Format
Failed stage: Run stefanzweifel/git-auto-commit-action@v4 [❌]
Failure summary: The action failed during the git auto-commit process because: The push to branch `f/add-mmr-pg` was rejected due to the local branch being behind its remote counterpart Git returned a non-fast-forward error, indicating that the remote branch has commits that aren't present in the local branch The error suggests running 'git pull' to integrate remote changes before pushing again
Relevant error logs: 1: ##[group]Operating System 2: Ubuntu ... 1256: �[36;1muv run poe lint�[0m 1257: shell: /usr/bin/bash -e {0} 1258: env: 1259: UV_CACHE_DIR: /home/runner/work/_temp/setup-uv-cache 1260: ##[endgroup] 1261: �[37mPoe =>�[0m �[94mruff format�[0m 1262: 1 file reformatted, 283 files left unchanged 1263: �[37mPoe =>�[0m �[94mruff check�[0m 1264: Fixed 1 error: 1265: - agents_api/activities/task_steps/base_evaluate.py: 1266: 1 × F841 (unused-variable) 1267: Found 1 error (1 fixed, 0 remaining). ... 1301: * [new branch] f/entries-summarization -> origin/f/entries-summarization 1302: * [new branch] f/grafana-traefik -> origin/f/grafana-traefik 1303: * [new branch] f/increase-test-coverage -> origin/f/increase-test-coverage 1304: * [new branch] f/integrations-sentry -> origin/f/integrations-sentry 1305: * [new branch] f/prompt-step-run-tools -> origin/f/prompt-step-run-tools 1306: * [new branch] f/remote-browser-args-connect-url -> origin/f/remote-browser-args-connect-url 1307: * [new branch] f/scispace-poc -> origin/f/scispace-poc 1308: * [new branch] f/switch-to-pg -> origin/f/switch-to-pg 1309: * [new branch] f/tasks-validation-errors-enhancement -> origin/f/tasks-validation-errors-enhancement ... 1326: [f/add-mmr-pg 21d0e33] refactor: Lint agents-api (CI) 1327: Author: Vedantsahai18 <[email protected]> 1328: 1 file changed, 2 insertions(+), 4 deletions(-) 1329: INPUT_TAGGING_MESSAGE: 1330: No tagging message supplied. No tag will be added. 1331: INPUT_PUSH_OPTIONS: 1332: To https://github.com/julep-ai/julep 1333: ! [rejected] f/add-mmr-pg -> f/add-mmr-pg (non-fast-forward) 1334: error: failed to push some refs to 'https://github.com/julep-ai/julep' 1335: hint: Updates were rejected because the tip of your current branch is behind 1336: hint: its remote counterpart. If you want to integrate the remote changes, 1337: hint: use 'git pull' before pushing again. 1338: hint: See the 'Note about fast-forwards' in 'git push --help' for details. 1339: Error: Invalid status code: 1 1340: at ChildProcess.<anonymous> (/home/runner/work/_actions/stefanzweifel/git-auto-commit-action/v4/index.js:17:19) 1341: at ChildProcess.emit (node:events:519:28) 1342: at maybeClose (node:internal/child_process:1105:16) 1343: at ChildProcess._handle.onexit (node:internal/child_process:305:5) { 1344: code: 1 1345: } 1346: Error: Invalid status code: 1 ... 1352: [command]/usr/bin/git version 1353: git version 2.47.1 1354: Temporarily overriding HOME='/home/runner/work/_temp/c70d4abf-f37c-477d-95f3-0ba15fe68f0a' before making global git config changes 1355: Adding repository directory to the temporary git global config as a safe directory 1356: [command]/usr/bin/git config --global --add safe.directory /home/runner/work/julep/julep 1357: [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 1358: [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' \|\| :" 1359: fatal: No url found for submodule path 'drafts/cozo' in .gitmodules 1360: ##[warning]The process '/usr/bin/git' failed with exit code 128

✨ CI feedback usage guide:

The CI feedback tool (/checks) automatically triggers when a PR has a failed check.
The tool analyzes the failed checks and provides several feedbacks:

Failed stage
Failed test name
Failure summary
Relevant error logs

In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:

/checks "https://github.com/{repo_name}/actions/runs/{run_number}/job/{job_number}"

where {repo_name} is the name of the repository, {run_number} is the run number of the failed check, and {job_number} is the job number of the failed check.

Configuration options

enable_auto_checks_feedback - if set to true, the tool will automatically provide feedback when a check is failed. Default is true.
excluded_checks_list - a list of checks to exclude from the feedback, for example: ["check1", "check2"]. Default is an empty list.
enable_help_text - if set to true, the tool will provide a help message with the feedback. Default is true.
persistent_comment - if set to true, the tool will overwrite a previous checks comment with the new feedback. Default is true.
final_update_message - if persistent_comment is true and updating a previous checks message, the tool will also create a new message: "Persistent checks updated to latest commit". Default is true.

See more information about the checks tool in the docs.

agents-api/agents_api/queries/chat/gather_messages.py

Co-authored-by: qodo-merge-pro-for-open-source[bot] <189517486+qodo-merge-pro-for-open-source[bot]@users.noreply.github.com>

ellipsis-dev

👍 Looks good to me! Incremental review on 5608d8e in 31 seconds

More details

Looked at 1045 lines of code in 14 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. agents-api/agents_api/queries/chat/gather_messages.py:80

Draft comment:
The check for if recall: is redundant because the subsequent check if recall and recall_options: already covers the case where recall is False. Consider removing the first check to simplify the logic.
Reason this comment was not posted:
Comment was not on a valid diff hunk.

2. agents-api/agents_api/queries/chat/gather_messages.py:161

Draft comment:
Using set(indices) is unnecessary since indices is already a list of unique values. You can directly use indices in the list comprehension.
Reason this comment was not posted:
Confidence changes required: 50%
In gather_messages.py, the code uses set(indices) which is unnecessary since indices is already a list of unique values. This can be simplified by directly using indices.

Workflow ID: wflow_O9uUJyIGRVtjV8eW

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on 6f1521c in 46 seconds

More details

Looked at 446 lines of code in 9 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. agents-api/agents_api/activities/utils.py:81

Draft comment:
Consider using yaml.safe_load(s) instead of yaml.load(s) to avoid potential security risks associated with loading YAML data.
Reason this comment was not posted:
Comment was on unchanged code.

Workflow ID: wflow_TKOTdxHdYcqvWyvN

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on 9c12ad3 in 1 minute and 1 seconds

More details

Looked at 588 lines of code in 6 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. cookbooks/04-hook-generator-trending-reels.py:1

Draft comment:
The PR description mentions changes in gather_messages.py related to MMR, but the file is not visible in the diff. Please ensure all relevant files are included in the PR.
Reason this comment was not posted:
Comment did not seem useful.

Workflow ID: wflow_zdtXJKX4W4HTvAEv

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on 7412ca2 in 14 seconds

More details

Looked at 136 lines of code in 5 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. agents-api/agents_api/queries/executions/list_execution_transitions.py:63

Draft comment:
The @pg_query(debug=True) decorator was removed. If debugging is still needed, consider adding a logging mechanism or re-adding the debug flag.
Reason this comment was not posted:
Confidence changes required: 50%
The decorator @pg_query(debug=True) was removed, which might affect debugging capabilities. This change should be noted.

Workflow ID: wflow_dSYhlVyan1bxfsMp

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

agents-api/agents_api/activities/task_steps/base_evaluate.py

agents-api/agents_api/activities/execute_integration.py

agents-api/agents_api/app.py

ellipsis-dev

👍 Looks good to me! Incremental review on be69c60 in 16 seconds

More details

Looked at 16 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_AgYgsQJU0Y35L178

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

feat(agents-api): added mmr to chat

c00d08d

qodo-merge-pro-for-open-source bot added the Review effort [1-5]: 2 label Jan 3, 2025

ellipsis-dev bot reviewed Jan 3, 2025

View reviewed changes

qodo-merge-pro-for-open-source bot reviewed Jan 3, 2025

View reviewed changes

agents-api/agents_api/queries/chat/gather_messages.py Outdated Show resolved Hide resolved

Vedantsahai18 and others added 2 commits January 3, 2025 12:18

Update agents-api/agents_api/queries/chat/gather_messages.py

a38dca1

Co-authored-by: qodo-merge-pro-for-open-source[bot] <189517486+qodo-merge-pro-for-open-source[bot]@users.noreply.github.com>

chore: misc fixes and refactors

5608d8e

ellipsis-dev bot reviewed Jan 3, 2025

View reviewed changes

chore(agents-api): misc validations

6f1521c

ellipsis-dev bot reviewed Jan 3, 2025

View reviewed changes

chore: misc refactor

9c12ad3

ellipsis-dev bot reviewed Jan 3, 2025

View reviewed changes

chore(agents-api): misc fixes + refators

7412ca2

ellipsis-dev bot reviewed Jan 4, 2025

View reviewed changes

refactor: Lint agents-api (CI)

e92fb4d

HamadaSalhab reviewed Jan 4, 2025

View reviewed changes

agents-api/agents_api/activities/task_steps/base_evaluate.py Outdated Show resolved Hide resolved

agents-api/agents_api/activities/execute_integration.py Show resolved Hide resolved

agents-api/agents_api/app.py Show resolved Hide resolved

chore(agents-api): fix base_evaluate

be69c60

ellipsis-dev bot reviewed Jan 4, 2025

View reviewed changes

HamadaSalhab merged commit dfc9af3 into f/switch-to-pg Jan 4, 2025
11 checks passed

HamadaSalhab deleted the f/add-mmr-pg branch January 4, 2025 05:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(agents-api): added mmr to chat #1013

feat(agents-api): added mmr to chat #1013

Vedantsahai18 commented Jan 3, 2025 •

edited by ellipsis-dev bot

Loading

qodo-merge-pro-for-open-source bot commented Jan 3, 2025

ellipsis-dev bot left a comment

qodo-merge-pro-for-open-source bot commented Jan 3, 2025 •

edited

Loading

qodo-merge-pro-for-open-source bot commented Jan 3, 2025 •

edited

Loading

Configuration options

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

feat(agents-api): added mmr to chat #1013

feat(agents-api): added mmr to chat #1013

Conversation

Vedantsahai18 commented Jan 3, 2025 • edited by ellipsis-dev bot Loading

PR Type

Description

Changes walkthrough 📝

qodo-merge-pro-for-open-source bot commented Jan 3, 2025

PR Reviewer Guide 🔍

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

qodo-merge-pro-for-open-source bot commented Jan 3, 2025 • edited Loading

PR Code Suggestions ✨

qodo-merge-pro-for-open-source bot commented Jan 3, 2025 • edited Loading

CI Failure Feedback 🧐

(Checks updated until commit e92fb4d)

Configuration options

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Vedantsahai18 commented Jan 3, 2025 •

edited by ellipsis-dev bot

Loading

qodo-merge-pro-for-open-source bot commented Jan 3, 2025 •

edited

Loading

qodo-merge-pro-for-open-source bot commented Jan 3, 2025 •

edited

Loading

(Checks updated until commit `e92fb4d`)