Replies: 3 comments
-
Yes, I agree.
-
@usama-openai could we get an eval run, along with a log containing the accuracy for each eval, at each model release? And could the log files be put in a repo? I would prefer not to burn my own cash to see where eval performance currently stands for each of the models/versions.
-
And which evals are currently failing? Maybe we could get a dashboard or something similar that gets updated once a week?