Add a comparison workflow / app #18

Open
Tracked by #26
dharhas opened this issue Aug 24, 2023 · 3 comments
Labels
  • area: documentation 📖
  • area: user-experience 🙋🏻‍♀️
  • impact: medium 🟨
  • type: enhancement 💅

Comments


dharhas commented Aug 24, 2023

A common use case during the research/exploration phase is to compare and assess differences in responses based on different embedding models, LLMs, etc. It would be useful to make this workflow easier.

For example:

  • set up a matrix of configuration options
  • set up the list of documents to use
  • set up a list of questions to ask
  • summarize the responses from each configuration along with any relevant metrics (response time, etc.)
  • potentially calculate similarity scores between responses

This could be a fairly straightforward Panel app.
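A rough sketch of that loop (assuming a hypothetical `answer(config, documents, question)` helper that wraps whatever backend is being compared; all names below are illustrative, not an existing Ragna API):

```python
import itertools
import time

# Matrix of configuration options to compare (values are placeholders).
configs = [
    {"embedding_model": embedder, "llm": llm}
    for embedder, llm in itertools.product(
        ["embedder-a", "embedder-b"],  # embedding models to test
        ["llm-x", "llm-y"],            # LLMs to test
    )
]

documents = ["report.pdf", "faq.md"]        # list of docs to use
questions = ["What is the refund policy?"]  # list of questions to ask


def answer(config, documents, question):
    """Placeholder: run one RAG pipeline with the given configuration."""
    raise NotImplementedError


results = []
for config, question in itertools.product(configs, questions):
    start = time.perf_counter()
    response = answer(config, documents, question)
    results.append(
        {
            **config,
            "question": question,
            "response": response,
            "response_time": time.perf_counter() - start,
        }
    )

# `results` can now be summarized per configuration, e.g. with pandas,
# and optionally extended with pairwise similarity scores between responses.
```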


iiLaurens commented Nov 5, 2023

+1

In addition, a basic annotation workflow would help. Getting anecdotal evidence of a good RAG setup is obviously nice, but a more systematic approach would help compare different configurations.

I often make a set of questions and annotated chunks (labels could be "relevant", "inconclusive", or "irrelevant" with respect to their ability to answer the question). I then make a summary table that shows, per question and model configuration, how well the embedding models rank in retrieving the relevant (and inconclusive) chunks. This also helps in the future when new embedding models are released and I want to test them.
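To make the summary-table idea concrete, here is a minimal sketch of scoring such annotations, assuming each retrieval run yields a ranked list of chunk IDs and the annotations map (question, chunk ID) pairs to labels (all data structures are made up for illustration):

```python
# Annotations: (question, chunk_id) -> "relevant" | "inconclusive" | "irrelevant"
annotations = {
    ("What is the refund policy?", "doc1-chunk3"): "relevant",
    ("What is the refund policy?", "doc2-chunk7"): "inconclusive",
    ("What is the refund policy?", "doc5-chunk1"): "irrelevant",
}

# Retrieval results: (config_name, question) -> chunk ids, best-ranked first
retrievals = {
    ("embedder-a", "What is the refund policy?"): ["doc1-chunk3", "doc5-chunk1"],
    ("embedder-b", "What is the refund policy?"): ["doc5-chunk1", "doc2-chunk7"],
}


def recall_at_k(ranked_chunks, question, k=5):
    """Fraction of relevant/inconclusive chunks that appear in the top-k results."""
    wanted = {
        chunk
        for (q, chunk), label in annotations.items()
        if q == question and label in ("relevant", "inconclusive")
    }
    if not wanted:
        return None
    return len(wanted & set(ranked_chunks[:k])) / len(wanted)


# Summary table: one row per (configuration, question)
for (config, question), ranked in retrievals.items():
    print(f"{config:12} {recall_at_k(ranked, question):.2f}  {question}")
```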


pmeier commented Nov 6, 2023

Add a comparison workflow / app

We need to clarify what we want here. Since we have a fully featured Python API, the "workflow" part is already covered. However, if you haven't worked with async programming before, using it might not be obvious. We should have an example for this in the documentation.
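For the async part, such a documentation example would roughly boil down to the standard asyncio fan-out pattern, sketched here with a placeholder coroutine instead of the actual Ragna calls:

```python
import asyncio


async def run_one(config, question):
    """Placeholder for preparing a chat and requesting an answer with the
    given configuration (the actual Ragna calls would go here)."""
    await asyncio.sleep(0)  # stand-in for the real awaitable work
    return {"config": config, "question": question, "answer": "..."}


async def main():
    configs = ["config-a", "config-b"]
    questions = ["question 1", "question 2"]
    # Fan out all (config, question) combinations concurrently and
    # collect the results once they have all finished.
    tasks = [run_one(c, q) for c in configs for q in questions]
    return await asyncio.gather(*tasks)


results = asyncio.run(main())
```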

As for the "app" part, I'm not super enthusiastic about it. This whole use case screams experimentation, and for that you need all kinds of knobs, which are very hard to make consistent in a general UI. This is why we built the Python API (note that this issue was created before the Python API was a thing). IMO, if someone really wants/needs a UI for that, it should be a third-party app that builds on top of the Ragna Python / REST API.

pmeier added this to the 0.2.0 milestone on Nov 27, 2023
pmeier added the impact: medium 🟨 label and removed the impact: low 🟩 label on Nov 27, 2023

pmeier commented Nov 27, 2023

Bumping impact to medium. For the 0.2.0 release, we are going to add an example to the documentation describing the comparison workflow with the Python API. Although I'm still not enthusiastic about a possible UI for this, let's open a separate issue for that as soon as the documentation is updated.

pmeier removed this from the 0.2.0 milestone on Feb 2, 2024