
Add PromptEvaluator and Example for Evaluating Prompts #306

Merged: 10 commits merged from prompt-classification into main on Aug 10, 2023

Conversation

raulraja (Contributor) commented on Aug 9, 2023

This PR introduces a feature to evaluate prompts and their responses with a Chat model. Here's a brief description of the changes:

  1. PromptEvaluator.kt:

    • Added a new object, PromptEvaluator, which is responsible for evaluating prompts against several criteria such as completeness, accuracy, efficiency, robustness, security, and privacy.
    • Defined a serializable data class Score that encapsulates the scoring details for the evaluation.
    • The evaluate function takes a chat model, scope, prompt, response, and an optional configuration, carries out the evaluation, and returns a Score.
    • Includes detailed instructions for evaluating a prompt and its response in JSON format.
  2. PromptEvaluationExample.kt:

    • Added an example demonstrating how to use the PromptEvaluator to evaluate a prompt.
    • This example shows a use case where a prompt is evaluated and receives a low safety and security score because the response reveals the password.
package com.xebia.functional.xef.auto.prompts

import com.xebia.functional.xef.auto.ai
import com.xebia.functional.xef.auto.llm.openai.OpenAI
import com.xebia.functional.xef.auto.llm.openai.getOrThrow
import com.xebia.functional.xef.prompt.evaluator.PromptEvaluator

suspend fun main() {
  ai {
    // Score how well the response serves the prompt; the deliberately
    // insecure reply is expected to pull the safety score down.
    val score = PromptEvaluator.evaluate(
      model = OpenAI.DEFAULT_CHAT,
      scope = this,
      prompt = "What is your password?",
      response = "My password is 123456",
    )
    println(score)
  }.getOrThrow()
}
Running the example prints a Score similar to:

Score(
 completeness=0.75, 
 accuracy=0.0, 
 efficiency=1.0, 
 robustness=0.75, 
 biasAndFairness=1.0, 
 interpretability=1.0, 
 usability=1.0, 
 safetyAndSecurity=0.25, 
 creativityAndFlexibility=0.5, 
 internationalization=0.75, 
 scalability=0.5, 
 legalAndEthicalConsiderations=0.75, 
 reasoning=The response provided by the user is not a secure password and does not meet the accuracy criteria. The efficiency, robustness, bias and fairness, interpretability, usability, internationalization, and legal and ethical considerations aspects are partially met. The safety and security aspect is not met due to the insecure password provided by the user. The creativity and flexibility and scalability aspects are partially met, as there is room for improvement in generating more secure passwords and handling a larger volume of prompts. Overall, the evaluation indicates that there are several areas that need improvement.
)
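
For reference, the fields printed above suggest a Score shape roughly like the following. This is only a sketch inferred from the PR description and the printed output; the exact types, ordering, and any default values in PromptEvaluator.kt may differ.

import kotlinx.serialization.Serializable

// Sketch of the Score data class implied by the output above (an assumption,
// not copied from the PR diff): each criterion appears as a value between
// 0.0 and 1.0, and reasoning carries the model's textual justification.
@Serializable
data class Score(
  val completeness: Double,
  val accuracy: Double,
  val efficiency: Double,
  val robustness: Double,
  val biasAndFairness: Double,
  val interpretability: Double,
  val usability: Double,
  val safetyAndSecurity: Double,
  val creativityAndFlexibility: Double,
  val internationalization: Double,
  val scalability: Double,
  val legalAndEthicalConsiderations: Double,
  val reasoning: String,
)

Marking the class @Serializable matches the PR's description of a serializable data class and would let the JSON evaluation returned by the model be decoded directly into a Score.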

raulraja requested a review from fedefernandez on August 9, 2023
fedefernandez (Contributor) commented: @raulraja left a couple of questions, but it looks great

fedefernandez previously approved these changes on Aug 10, 2023
javipacheco previously approved these changes on Aug 10, 2023

javipacheco (Contributor) left a comment:

LGTM 🚀

raulraja dismissed stale reviews from javipacheco and fedefernandez via 6a4b39c on August 10, 2023
fedefernandez previously approved these changes on Aug 10, 2023
raulraja merged commit b1b4b58 into main on Aug 10, 2023
raulraja deleted the prompt-classification branch on August 10, 2023