Can someone explain like I'm 5 what these "evals" are #4

ThioJoe · 2023-03-14T17:42:50Z

ThioJoe
Mar 14, 2023

Maybe I'm just dumb, but after reading the Readme I don't understand. Can someone walk through and explain the process?

The rough idea I've gathered is it's a way to "rate" GPT's outputs, more or less? And the "evals" are a script/system you create to automate this, or to rate a specific type of prompt output using some rating system you choose/create? Sort of like creating your own benchmark for certain use cases?

HackTuah · 2023-03-14T17:55:48Z

HackTuah
Mar 14, 2023

Ask Bing to dumb it down. I'm about to 🤣 now if only the rest of the world could explain things to me like I'm an adorable genius 6 year old.

2 replies

crystalwizard Mar 14, 2023

from the GPT-4 release page "We’re also open-sourcing OpenAI Evals, our framework for automated evaluation of AI model performance, to allow anyone to report shortcomings in our models to help guide further improvements."

crystalwizard Mar 14, 2023

it's a benchmarking tool for any AI model

HackTuah · 2023-03-14T17:57:58Z

HackTuah
Mar 14, 2023

Bing says "Okay, let me try to explain this like you are 6 years old.

Imagine you have a very smart friend who can do many things with words and pictures. You can ask your friend to write stories, poems, jokes, or anything else you want. Your friend is very good at this and can make you happy.

But sometimes your friend makes mistakes or says things that are not true or nice. For example, your friend might tell you that 2 + 2 = 5, or that cats can fly, or that you are a bad person. This can make you confused or sad.

You want your friend to be better and not make these mistakes. You also want your friend to listen to you and do what you ask them to do. So you decide to give your friend some tests to see how well they can do different things.

For example, you might ask your friend to write a poem about dogs, or to draw a picture of a house, or to tell you what color is the sky. You also tell your friend what kind of answers are good and what kind of answers are bad.

You hope that by giving these tests to your friend, they will learn from their mistakes and become smarter and nicer. You also hope that other people who have similar friends can use these tests too and share their results with each other.

This is what OpenAI Evals is: a way for people to give tests to their smart friends (who are actually computer programs) and see how well they can do different things with words and pictures. OpenAI Evals helps people make their smart friends better and safer for everyone."

0 replies

placcaumuhire · 2023-03-14T21:21:10Z

placcaumuhire
Mar 14, 2023

1 reply

placcaumuhire Mar 14, 2023

From ChatGPT! and Placca The Plug!

Luka5-A8ram · 2023-04-10T14:53:24Z

Luka5-A8ram
Apr 10, 2023

These people answering are like when you want a specific answer but chat gpt keeps dancing arround the answer.
If someone could make an step by step guide of how to make and post an eval, it would be really helpful.

2 replies

Prakkkmak Apr 18, 2023

+1 I don't get it :(

MMAcode Apr 21, 2023

It helped me to look on existingb pull requests, like this one: #749

eugene-kim-pipe17 · 2023-04-23T15:22:06Z

eugene-kim-pipe17
Apr 23, 2023

Are evals only used as a signal to see where a model is weak and does OpenAI train the model separately given a known weakness or are they also a part of the training?

1 reply

RogerThiede May 26, 2023

I haven't seen if this has been fully disclosed by OpenAI yet. These evals could be used for evaluation of a base model or fine-tuned model, but could theoretically also be used to fine-tune a model to perform these exact tasks and result in even better evaluation.

Bennoo · 2023-06-09T08:52:19Z

Bennoo
Jun 9, 2023

Can it evaluate also some other models than OpenAI's?
I would like to compare performance on a specific task between GPT and something else. Is it the right tool?

2 replies

AaronGoldsmith Jun 10, 2023

Definitely seems like a reasonable use case.

https://github.com/openai/evals/blob/main/docs/completion-fns.md
(See last example on the page - non GPT LLM)

I feel like model graded evals should be able to handle this as well.
https://github.com/openai/evals/blob/main/evals/registry/modelgraded/battle.yaml

nat-ski Jun 21, 2023

Has anyone actually used a different model that doesn't have a hosted inference API? For example, the other model used in the last example of non-GPT LLM is google/flan-t5-xl. If you actually try running oaieval langchain/llm/flan-t5-xl test-match, you'll get a timeout error because the model is too big to be loaded into the hosted inference API. I get that some models that are too large could used hosted inferences endpoints from hugging face, but is there documentation anywhere of pointing the evals package to a local hosting on some GPU compute instances of a non-GPT model?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can someone explain like I'm 5 what these "evals" are #4

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 6 comments 8 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Can someone explain like I'm 5 what these "evals" are #4

Replies: 6 comments · 8 replies

Replies: 6 comments 8 replies