
[ENHANCEMENT] how to evaluate a full app behavior? #5274

Open
dcsan opened this issue Nov 5, 2024 · 3 comments
Labels: c/evals, enhancement

Comments


dcsan commented Nov 5, 2024

I can only see how to test an existing LLM directly, e.g. OpenAI etc.
What if I want to test the results of my own app, which includes RAG and other features?

Is there a way to connect to an API endpoint and run evals?

dcsan added the enhancement and triage labels on Nov 5, 2024
github-project-automation bot moved this to 📘 Todo in phoenix on Nov 5, 2024
dosubot bot added the c/evals label on Nov 5, 2024

Jgilhuly (Contributor) commented Nov 5, 2024

Hi @dcsan, thanks for the question!

Phoenix lets you run evals on any part of your application. Most of our prebuilt evaluators focus on LLM responses, but some cover things like evaluating your RAG retriever step. That example walks through how to evaluate retrieved documents for how relevant they are to a user's question, using a prebuilt evaluator we have in Phoenix.
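
For example, here's a minimal sketch of that kind of retrieval-relevance eval. The column names, judge model, and sample data are assumptions, and exact imports can vary a bit by Phoenix version:

```python
import pandas as pd
from phoenix.evals import (
    OpenAIModel,
    RAG_RELEVANCY_PROMPT_RAILS_MAP,
    RAG_RELEVANCY_PROMPT_TEMPLATE,
    llm_classify,
)

# Each row pairs a user question ("input") with a retrieved document ("reference").
df = pd.DataFrame(
    {
        "input": ["How do I reset my password?"],
        "reference": ["To reset your password, open Settings > Security and ..."],
    }
)

relevance_df = llm_classify(
    dataframe=df,
    model=OpenAIModel(model="gpt-4o-mini"),  # any supported judge model
    template=RAG_RELEVANCY_PROMPT_TEMPLATE,
    rails=list(RAG_RELEVANCY_PROMPT_RAILS_MAP.values()),
    provide_explanation=True,
)
print(relevance_df[["label", "explanation"]])
```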

All Phoenix evals follow the same general loop (sketched below):

  1. Prepare the data you want to evaluate - often this is Phoenix traces
  2. Generate evaluation labels for that data, either with our evaluators (like in the example above) or through another approach
  3. Log the evaluation results back to Phoenix
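
A minimal sketch of that loop against a running Phoenix instance, assuming your app is already sending traces there. The eval name is a placeholder and a trivial stand-in judge keeps the example short; swap in a real evaluator in practice:

```python
import pandas as pd
import phoenix as px
from phoenix.trace import SpanEvaluations

client = px.Client()

# 1. Prepare the data you want to evaluate: pull your app's spans out of Phoenix.
spans_df = client.get_spans_dataframe()  # indexed by span id

# 2. Generate evaluation labels for that data. A placeholder judge is used here;
#    use phoenix.evals.llm_classify (as above) or your own logic instead.
evals_df = pd.DataFrame(
    {"label": "ok", "score": 1.0},
    index=spans_df.index,
)

# 3. Log the evaluation results back to Phoenix so they appear next to the traces.
client.log_evaluations(
    SpanEvaluations(eval_name="my_custom_eval", dataframe=evals_df)
)
```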

Another good resource to check out would be this walkthrough of building your own eval pipeline. That might be most similar to what you're thinking of.

Hope that helps. Let me know if you have questions there! Or I'm happy to discuss your use case specifically in our community Slack.


dcsan (Author) commented Nov 5, 2024

How can I use this to just call an API and evaluate the response, e.g. with a field like text?

I wanted to use your dashboard to manage test data, then use your tool to eval the results (with an LLM) and view the results in your dashboard. Seems like a pretty common request? I'm not testing OpenAI's API, I'm testing my own.

Also, I can't join the Slack without a blessed domain:

[screenshot: Slack workspace sign-up rejecting the email domain]


Jgilhuly (Contributor) commented Nov 5, 2024

Ah, sorry for the link mix-up. Here's the right one to use: https://join.slack.com/t/arize-ai/shared_invite/zt-22vj03k4k-MlrNEwv5WeswapTs0kNCBw (updated above as well!)

To answer your question, here's what you'd do:

  1. Call your API and trace the request in Phoenix. Because you're using your own API, use these docs (see the sketch below).
  2. Add evaluations to the traces you've captured in Phoenix. Those evaluations could be about any of the information in the traces; they don't have to be related to an LLM call.

Some of the docs use OpenAI as an example, but you can swap in the call to your own API instead.
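
Here's a rough sketch of step 1 for an app exposed as an HTTP endpoint with a text field, along the lines you described. The endpoint URL, span name, and attribute keys are placeholders/assumptions; the docs above cover the exact setup for your stack:

```python
import requests
from phoenix.otel import register  # assumes the arize-phoenix-otel package

# Point traces at your Phoenix instance (the project name is arbitrary).
tracer_provider = register(project_name="my-app")
tracer = tracer_provider.get_tracer(__name__)

def ask_my_app(question: str) -> str:
    # Wrap the call to your own API in a span so the request/response land in Phoenix.
    with tracer.start_as_current_span("my-app-call") as span:
        span.set_attribute("input.value", question)
        response = requests.post(
            "http://localhost:8000/ask",  # hypothetical endpoint for your app
            json={"text": question},
            timeout=30,
        )
        answer = response.json()["text"]
        span.set_attribute("output.value", answer)
        return answer
```

Once those spans are in Phoenix, step 2 is the same evaluate-and-log loop from my earlier comment.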

RogerHYang removed the triage label on Nov 6, 2024