
feat(rag): add dual LLM RAG example#1971

Merged
holtskinner merged 14 commits into GoogleCloudPlatform:main from AutoViML:main
May 30, 2025

Conversation

@AutoViML
Member

Description

Thank you for opening a Pull Request!
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • [x] Follow the CONTRIBUTING Guide.
  • [x] You are listed as the author in your notebook or README file.
    • [x] Your account is listed in CODEOWNERS for the file(s).
  • [x] Ensure your Pull Request title follows the https://www.conventionalcommits.org/ specification.
  • [x] Ensure the tests and linter pass (run nox -s format from the repository root to format).
  • [x] Appropriate docs were updated (if necessary).

Fixes #<issue_number_goes_here> 🦕

Ram Seshadri and others added 7 commits March 27, 2025 21:04
…ertex_rag_demo_dual_llms.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@AutoViML AutoViML requested a review from a team as a code owner April 15, 2025 14:45
Contributor

gemini-code-assist bot left a comment


Hello @AutoViML, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request introduces a new feature: a dual LLM RAG (Retrieval-Augmented Generation) example. This demo allows comparison of responses from two different LLMs using Vertex AI Search over a corpus of documents. It also includes an optional judge model to evaluate the two responses and determine which one is better aligned with the query and context. The demo is built using Streamlit and includes instructions for customization, such as modifying prompts for different use cases.

Highlights

  • Dual LLM Comparison: Enables comparing responses from two LLMs using Vertex AI Search.
  • Judge Model Evaluation: Option to use a judge model to evaluate and compare the responses from the two LLMs.
  • Customizable Prompts: Provides instructions on how to customize the prompts for different use cases by modifying text files in the prompts folder.
  • Streamlit Demo: Implements a Streamlit application for easy interaction and demonstration of the dual LLM RAG system.
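The flow described in the highlights above can be sketched as a small pipeline. The function names and prompt wording here are illustrative assumptions, not the PR's actual API; the real calls to Vertex AI Search and the Gemini/Ollama models are stubbed out as injected callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class RagResult:
    response_a: str
    response_b: str
    verdict: Optional[str]  # judge output, None when no judge model is enabled

def dual_llm_rag(
    query: str,
    rephrase: Callable[[str], str],        # user query -> search-engine-style query
    retrieve: Callable[[str], List[str]],  # search query -> relevant document snippets
    model_a: Callable[[str], str],         # grounded prompt -> LLM A's answer
    model_b: Callable[[str], str],         # grounded prompt -> LLM B's answer
    judge: Optional[Callable[[str, str, str], str]] = None,
) -> RagResult:
    """Compare two LLMs' RAG answers, optionally evaluated by a third 'judge' model."""
    search_query = rephrase(query)
    context = "\n\n".join(retrieve(search_query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."
    answer_a = model_a(prompt)
    answer_b = model_b(prompt)
    verdict = judge(query, answer_a, answer_b) if judge else None
    return RagResult(answer_a, answer_b, verdict)
```

Keeping the rephraser, retriever, models, and judge as injected callables mirrors the demo's design of swapping prompts and models without changing the pipeline itself.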

Changelog

Click here to see the changelog
  • search/retrieval-augmented-generation/rag_with_dual_llms/README.md
    • Added a README file with instructions on how to run the demo, including setting up Google Cloud SDK, authenticating, installing requirements, and running the Streamlit app.
    • Includes instructions on how to customize the prompts for different use cases.
  • search/retrieval-augmented-generation/rag_with_dual_llms/requirements.txt
    • Added necessary dependencies such as streamlit, ollama, langchain, google-generativeai, requests, PyPDF2, tiktoken, faiss-cpu, pandas, fsspec, gcsfs, google-cloud-aiplatform, google-cloud-discoveryengine, google-genai, langchain-google-vertexai, langchain-google-community, asynciolimiter, asyncio, tqdm, re, and json.
  • search/retrieval-augmented-generation/rag_with_dual_llms/src/prompts/judge_model_name.txt
    • Added a text file containing the name of the judge model to be used (gemini-2.0-flash-thinking-exp-01-21).
  • search/retrieval-augmented-generation/rag_with_dual_llms/src/prompts/judge_prompt.txt
    • Added a prompt for the judge model, instructing it to analyze two responses and provide a judgment on which response is better and why.
  • search/retrieval-augmented-generation/rag_with_dual_llms/src/prompts/rephraser.txt
    • Added a prompt for rephrasing user queries into concise search engine queries, specifically tailored for a chef advisor use case.
  • search/retrieval-augmented-generation/rag_with_dual_llms/src/prompts/summarizer.txt
    • Added a prompt for summarizing documents retrieved from a recipe database, tailored for an Asian cuisine advisor use case.
  • search/retrieval-augmented-generation/rag_with_dual_llms/src/prompts/system_instruction.txt
    • Added system instructions for the AI chatbot, defining its role as a cooking assistance chatbot and preventing mission changes.
  • search/retrieval-augmented-generation/rag_with_dual_llms/src/vertex_rag_demo_dual_llms_with_judge.py
    • Created a Streamlit application that compares responses from two LLMs using Vertex AI Search.
    • Includes functionality for rephrasing queries, retrieving relevant documents, generating summaries, and evaluating responses with an optional judge model.
    • Added model selection, RAG setup, and chat history management.
    • Implements logic for generating responses using Gemini and Ollama models.
    • Includes argument parsing for enabling the judge model via command line.
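The command-line toggle mentioned in the last bullet might be wired up roughly like this; the `--use-judge` flag name is an assumption for illustration, not necessarily the flag used in the PR.

```python
import argparse

def parse_args(argv=None):
    """Parse demo options; the judge model is off unless explicitly requested."""
    parser = argparse.ArgumentParser(description="Dual LLM RAG demo")
    parser.add_argument(
        "--use-judge",
        action="store_true",
        help="Evaluate both responses with a third 'judge' model",
    )
    return parser.parse_args(argv)
```

Note that Streamlit consumes its own flags first, so script-level arguments must come after a `--` separator, e.g. `streamlit run app.py -- --use-judge`.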
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in issue comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Two models converse with might,
RAG weaves knowledge in the light,
A judge then casts a discerning eye,
Which answer soars, reaching for the sky?
In AI's realm, the best we try.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

gemini-code-assist bot left a comment


Code Review

This pull request introduces a dual LLM RAG example, which is a valuable addition to the repository. The code provides a Streamlit-based demo for comparing responses from two LLMs using Vertex AI Search, and includes a judge model for evaluating the responses. Overall, the implementation is well-structured and includes helpful features such as customizable prompts and clear instructions. However, there are a few areas that could be improved to enhance the code's robustness and maintainability.

Summary of Findings

  • Error Handling in Prompt Loading: The prompt loading functions (load_text_file, _load_prompt_template) include error handling, but the error messages displayed to the user could be more informative. Consider providing more context or specific instructions to help users resolve the issue.
  • Model Initialization: The initialize_models function attempts to list both Gemini and Ollama models. The error handling for Ollama model listing could be improved to provide more specific guidance to the user, such as checking if the Ollama server is running or if the correct URL is being used.
  • Code Clarity and Maintainability: In several functions, there are opportunities to improve code clarity and maintainability by reducing redundancy and simplifying complex logic. For example, the generate_gemini_response and generate_ollama_response functions could benefit from a shared error handling mechanism.
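One way to implement the shared error-handling mechanism suggested in the last finding is a decorator that both generator functions can share. The decorator below is a sketch of that suggestion, not code from the PR; the function bodies are placeholders standing in for the real Gemini and Ollama API calls.

```python
import functools
import logging

logger = logging.getLogger(__name__)

def safe_generation(model_label):
    """Wrap a response generator so any failure returns a uniform, user-facing fallback."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                # Log the full traceback for debugging, surface a short message to the UI.
                logger.exception("%s generation failed", model_label)
                return f"[{model_label} error] {exc}. Check the model server and retry."
        return wrapper
    return decorator

@safe_generation("Gemini")
def generate_gemini_response(prompt):
    raise RuntimeError("quota exceeded")  # placeholder for a real Vertex AI call

@safe_generation("Ollama")
def generate_ollama_response(prompt):
    return "stub answer"  # placeholder for a real local Ollama call
```

This keeps the per-model functions focused on their API calls while error formatting and logging live in one place, which addresses the redundancy the review points out.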

Merge Readiness

The pull request is a valuable contribution and is mostly well-implemented. However, before merging, it's recommended to address the identified issues, particularly those related to error handling and code clarity. Addressing these points will improve the robustness and maintainability of the code. I am unable to approve this pull request; please have others review and approve this code before merging.

@holtskinner
Collaborator

@AutoViML Can you please resolve the remaining spelling errors and lint errors?

@holtskinner holtskinner merged commit 08814d9 into GoogleCloudPlatform:main May 30, 2025
4 of 6 checks passed