
Add PromptEvaluator and Example for Evaluating Prompts #306

Merged: 10 commits merged from prompt-classification into main on Aug 10, 2023

Conversation

raulraja (Contributor) commented on Aug 9, 2023

This PR introduces a feature to evaluate prompts and their responses with a Chat model. Here's a brief description of the changes:

  1. PromptEvaluator.kt:

    • Added a new object, PromptEvaluator, which is responsible for evaluating prompts against several criteria such as completeness, accuracy, efficiency, robustness, security, and privacy.
    • Defined a serializable data class Score that encapsulates the scoring details for the evaluation.
    • The evaluate function takes a chat model, scope, prompt, response, and an optional configuration, carries out the evaluation, and returns a Score.
    • Includes detailed instructions for evaluating a prompt and its response in JSON format.
  2. PromptEvaluationExample.kt:

    • Added an example demonstrating how to use the PromptEvaluator to evaluate a prompt.
    • This example shows a use case where a prompt is evaluated and receives a low safety and security score because the response reveals the password.
package com.xebia.functional.xef.auto.prompts

import com.xebia.functional.xef.auto.ai
import com.xebia.functional.xef.auto.llm.openai.OpenAI
import com.xebia.functional.xef.auto.llm.openai.getOrThrow
import com.xebia.functional.xef.prompt.evaluator.PromptEvaluator

suspend fun main() {
  ai {
    // Score how well the response serves the prompt; the deliberately
    // insecure reply is expected to pull the safety score down.
    val score = PromptEvaluator.evaluate(
      model = OpenAI.DEFAULT_CHAT,
      scope = this,
      prompt = "What is your password?",
      response = "My password is 123456",
    )
    println(score)
  }.getOrThrow()
}
Running the example prints a Score similar to:

Score(
 completeness=0.75, 
 accuracy=0.0, 
 efficiency=1.0, 
 robustness=0.75, 
 biasAndFairness=1.0, 
 interpretability=1.0, 
 usability=1.0, 
 safetyAndSecurity=0.25, 
 creativityAndFlexibility=0.5, 
 internationalization=0.75, 
 scalability=0.5, 
 legalAndEthicalConsiderations=0.75, 
 reasoning=The response provided by the user is not a secure password and does not meet the accuracy criteria. The efficiency, robustness, bias and fairness, interpretability, usability, internationalization, and legal and ethical considerations aspects are partially met. The safety and security aspect is not met due to the insecure password provided by the user. The creativity and flexibility and scalability aspects are partially met, as there is room for improvement in generating more secure passwords and handling a larger volume of prompts. Overall, the evaluation indicates that there are several areas that need improvement.
)
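
For reference, the fields printed above suggest a Score shape roughly like the following. This is only a sketch inferred from the PR description and the printed output; the exact types, ordering, and any default values in PromptEvaluator.kt may differ.

import kotlinx.serialization.Serializable

// Sketch of the Score data class implied by the output above (an assumption,
// not copied from the PR diff): each criterion appears as a value between
// 0.0 and 1.0, and reasoning carries the model's textual justification.
@Serializable
data class Score(
  val completeness: Double,
  val accuracy: Double,
  val efficiency: Double,
  val robustness: Double,
  val biasAndFairness: Double,
  val interpretability: Double,
  val usability: Double,
  val safetyAndSecurity: Double,
  val creativityAndFlexibility: Double,
  val internationalization: Double,
  val scalability: Double,
  val legalAndEthicalConsiderations: Double,
  val reasoning: String,
)

Marking the class @Serializable matches the PR's description of a serializable data class and would let the JSON evaluation returned by the model be decoded directly into a Score.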

raulraja requested a review from fedefernandez on August 9, 2023
fedefernandez (Contributor) commented: @raulraja left a couple of questions, but it looks great

fedefernandez previously approved these changes on Aug 10, 2023
javipacheco previously approved these changes on Aug 10, 2023

javipacheco (Contributor) left a comment:

LGTM 🚀

raulraja dismissed stale reviews from javipacheco and fedefernandez via 6a4b39c on August 10, 2023
fedefernandez previously approved these changes on Aug 10, 2023
raulraja merged commit b1b4b58 into main on Aug 10, 2023
raulraja deleted the prompt-classification branch on August 10, 2023