Replies: 6 comments 8 replies
-
Ask Bing to dumb it down. I'm about to 🤣. Now if only the rest of the world could explain things to me like I'm an adorable genius 6-year-old.
-
Bing says "Okay, let me try to explain this like you are 6 years old. Imagine you have a very smart friend who can do many things with words and pictures. You can ask your friend to write stories, poems, jokes, or anything else you want. Your friend is very good at this and can make you happy. But sometimes your friend makes mistakes or says things that are not true or nice. For example, your friend might tell you that 2 + 2 = 5, or that cats can fly, or that you are a bad person. This can make you confused or sad. You want your friend to be better and not make these mistakes. You also want your friend to listen to you and do what you ask them to do. So you decide to give your friend some tests to see how well they can do different things. For example, you might ask your friend to write a poem about dogs, or to draw a picture of a house, or to tell you what color is the sky. You also tell your friend what kind of answers are good and what kind of answers are bad. You hope that by giving these tests to your friend, they will learn from their mistakes and become smarter and nicer. You also hope that other people who have similar friends can use these tests too and share their results with each other. This is what OpenAI Evals is: a way for people to give tests to their smart friends (who are actually computer programs) and see how well they can do different things with words and pictures. OpenAI Evals helps people make their smart friends better and safer for everyone."
-
These people answering are like when you want a specific answer but ChatGPT keeps dancing around it.
-
Are evals only used as a signal to see where a model is weak (with OpenAI then training the model separately to address a known weakness), or are they also part of the training itself?
-
Can it also evaluate models other than OpenAI's?
-
Maybe I'm just dumb, but after reading the README I still don't understand. Can someone walk through and explain the process?
The rough idea I've gathered is that it's a way to "rate" GPT's outputs, more or less? And the "evals" are a script/system you create to automate this, i.e. to rate a specific type of prompt output using some rating system you choose or create? Sort of like creating your own benchmark for certain use cases? Something like the sketch below is how I'm picturing it.
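For illustration only, here is a minimal hand-rolled version of that idea: an "exact match" benchmark that asks a model each question from a samples file and scores how often it gives the expected answer. It assumes the `openai` Python client (v1+) and a hypothetical `samples.jsonl` whose lines have `input` (chat messages) and `ideal` (expected answer) fields; the file name and field names here are just assumptions for the example, not the framework's official API.

```python
# Illustrative sketch: a crude exact-match eval, similar in spirit to a
# basic "match" benchmark. Not the Evals framework itself.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def run_eval(samples_path: str, model: str = "gpt-3.5-turbo") -> float:
    """Ask the model each sample's question and return exact-match accuracy."""
    samples = [json.loads(line) for line in open(samples_path)]
    correct = 0
    for sample in samples:
        resp = client.chat.completions.create(
            model=model,
            messages=sample["input"],  # e.g. [{"role": "user", "content": "2+2=?"}]
        )
        answer = resp.choices[0].message.content.strip()
        # Crude scoring: real evals also use fuzzy or model-graded checks.
        if answer == sample["ideal"]:
            correct += 1
    return correct / len(samples)


if __name__ == "__main__":
    print(f"accuracy: {run_eval('samples.jsonl'):.2%}")
```

My understanding is that the actual repo wraps this same pattern for you: you provide the samples file, register the eval in a YAML entry, and run it with the `oaieval` CLI, which handles the model calls, recording, and reporting. Is that roughly right?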