[ENHANCEMENT] how to evaluate a full app behavior? #5274
Comments
Hi @dcsan, thanks for the question! Phoenix lets you run evals on any part of your application. Most of our prebuilt evaluators focus on LLM responses, but some focus on things like evaluating your RAG retriever step. That example walks through how to evaluate documents on how relevant they are to a user's question, using a prebuilt evaluator we have in Phoenix. All Phoenix evals follow the same general loop of:
Another good resource to check out would be this walkthrough of building your own eval pipeline. That might be most similar to what you're thinking of. Hope that helps. Let me know if you have questions there! Or happy to discuss your use case specifically in our community Slack.
How can I use this to just call an API and evaluate the response? eg with a field like I wanted to use your dashboard to manage test data, then use your tool to eval the results (with an LLM) and view the results in your dashboard. Seems like a pretty common request? I'm not testing OpenAI's API, I'm testing my own. Also, I can't join the Slack without a blessed domain:
Ah, sorry for the link mix-up, here's the right one to use: https://join.slack.com/t/arize-ai/shared_invite/zt-22vj03k4k-MlrNEwv5WeswapTs0kNCBw - updated above as well! To answer your question, here's what you'd do:
Some of the docs use OpenAI as an example, but you can swap in the call to your own API instead.
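As a rough illustration of swapping in your own API, here is a minimal sketch that uses only the Python standard library. The endpoint URL, the `question` request field, and the `answer` response field are all hypothetical, and `call_my_api` and `run_eval` are illustrative names rather than Phoenix functions. The `judge` argument stands in for whatever evaluator you choose, for example an LLM-based one from Phoenix; that wiring is left out here because it depends on your setup.

```python
import json
import urllib.request
from typing import Callable, Dict, List

# Hypothetical endpoint: replace with your own application's URL.
APP_URL = "http://localhost:8000/answer"


def call_my_api(question: str, url: str = APP_URL) -> str:
    """POST a question to the app and return its answer text.

    Assumes the app accepts JSON like {"question": ...} and replies with
    JSON containing an "answer" field. Adjust both to match your API.
    """
    payload = json.dumps({"question": question}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["answer"]


def run_eval(
    questions: List[str],
    app: Callable[[str], str],
    judge: Callable[[str, str], str],
) -> List[Dict[str, str]]:
    """Call the app once per question and attach the judge's label to each row."""
    rows = []
    for question in questions:
        answer = app(question)
        rows.append(
            {"input": question, "output": answer, "label": judge(question, answer)}
        )
    return rows
```

Calling `run_eval(questions, call_my_api, my_judge)` gives one row per test question with the input, your app's output, and the evaluator's label. Passing `app` and `judge` as callables keeps the loop independent of any one model provider, and lets you test it offline with stub functions before pointing it at a live endpoint.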
I can only see how to test an existing LLM directly, e.g. OpenAI.
What if I want to test the results of my own app, which includes RAG and other features?
Is there a way to connect to an API endpoint and run evals?