[15/N][RAG Demo fixes] [dev] benchmarking {gpt3.5,gpt4} X {generate_V1, generate_V2} (#1206)


5 trials each. With GPT4, generate prompt v2 does slightly better on the
benchmark.
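The 2×2 sweep described above (two models × two generate prompts, 5 trials each) could be driven by a small harness along these lines. This is a hypothetical sketch, not part of the commit: the model identifiers, the `run_grid` helper, and the `run_prompt` callback (a caller-supplied function that executes one aiconfig prompt and grades the output) are all assumptions.

```python
from itertools import product

MODELS = ["gpt-3.5-turbo", "gpt-4"]       # assumed model identifiers
PROMPTS = ["generate_v1", "generate_v2"]  # prompt names from the aiconfig
TRIALS = 5

def run_grid(run_prompt):
    """Average benchmark score per (model, prompt) cell.

    run_prompt(model, prompt, trial) -> float score; supplied by the
    caller, e.g. a wrapper that runs the aiconfig prompt and grades it.
    """
    results = {}
    for model, prompt in product(MODELS, PROMPTS):
        scores = [run_prompt(model, prompt, t) for t in range(TRIALS)]
        results[(model, prompt)] = sum(scores) / TRIALS
    return results
```

Averaging over the 5 trials per cell smooths out run-to-run variance before comparing prompt versions.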

GPT3.5, V1
<img width="873" alt="v1-gpt3 5"
src="https://github.com/lastmile-ai/aiconfig/assets/148090348/22b40ef8-b015-4185-8fcd-00dee4e295f3">


GPT3.5, V2
<img width="862" alt="v2-gpt-3 5"
src="https://github.com/lastmile-ai/aiconfig/assets/148090348/73dca37d-c696-4d31-b604-febcfc89fbce">


GPT4, V1
<img width="879" alt="v1-gpt4"
src="https://github.com/lastmile-ai/aiconfig/assets/148090348/8b5b7f0c-cefb-4332-90fe-d26a2a0bbf2d">


GPT4, V2

<img width="870" alt="v2-gpt4"
src="https://github.com/lastmile-ai/aiconfig/assets/148090348/3a7f1825-a5ce-4c11-9c7e-04135b22b94a">

---
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/lastmile-ai/aiconfig/pull/1206).
* __->__ #1206
* #1204
* #1203
* #1202
* #1200
saqadri committed Feb 9, 2024
2 parents f8d802d + da52f4a commit e6193b3
Showing 2 changed files with 1,883 additions and 125 deletions.
73 changes: 67 additions & 6 deletions cookbooks/RAG-with-Model-Graded-Eval-v2/rag.aiconfig.yaml
@@ -76,16 +76,61 @@ metadata:
remember_chat_context: false
name: Rag Demo With Model-graded Eval
prompts:
- input: "Answer the following question using the provided context. Question: {{query}}\
\ \nContext: {{context}}"
- input: "Answer the following question using the provided context. \n\nQuestion:\
\ {{query}} \nContext: {{context}}"
metadata:
model: gpt-3.5-turbo
model:
name: gpt-4
settings: {}
parameters: {}
remember_chat_context: false
name: generate_v1
outputs:
- data: The price of flour sold in Boston was $11.87 a barrel in August.
execution_count: 0
metadata:
created: 1707439800
id: chatcmpl-8q9NY0YI6dDvzeFEebWIQZaOyLPwQ
model: gpt-3.5-turbo-16k-0613
object: chat.completion.chunk
raw_response:
content: The price of flour sold in Boston was $11.87 a barrel in August.
role: assistant
role: assistant
output_type: execute_result
- input: "Answer the following question using the provided context. \n\nQuestion:\
\ {{query}} \nContext: {{context}}\n\nReview your answer and the context. Is the\
\ answer strictly justified by the context? If necessary, revise your answer.\n\
\nReview your best answer so far. Is it succinct, or does it contain extra information?\
\ If necessary, revise your answer.\n\nReview your best answer so far again. Does\
\ it actually answer the question? If necessary, revise your answer again.\n\n\
Review your best answer so far one last time. Is it easy to understand? If necessary,
\ revise your answer a final time.\n\n\nOutput ONLY YOUR BEST ANSWER.\n\nBEST
\ ANSWER:"
metadata:
model:
name: gpt-3.5-turbo-16k-0613
settings: {}
parameters: {}
name: generate_v2
outputs: []
- input: "Answer the following question using the provided context. \n\nQuestion:\
\ {{query}} \nContext: {{context}}\n\nReview your answer and the context. Is the\
\ answer strictly justified by the context? If necessary, revise your answer.\n\
\nReview your best answer so far. Is it succinct, or does it contain extra information?\
\ If necessary, revise your answer.\n\nReview your best answer so far again. Does\
\ it actually answer the question? If necessary, revise your answer again.\n\n\
Review your best answer so far one last time. Is it easy to understand? If necessary,
\ revise your answer a final time.\n\n\nOutput ONLY YOUR BEST ANSWER.\n\nBEST
\ ANSWER:"
metadata:
model: gpt-4
parameters: {}
name: generate
outputs: []
- input: "Given the following question, and answer, does the answer satisfactorily\
\ answer the question? \n\nQuestion: {{query}}\nAnswer: {{generate.output}}"
\ answer the question? \n\nGive a relevance verdict (YES or NO) with an explanation.\n\
\nQuestion: {{query}}\nAnswer: {{generate.output}}\n\nVerdict:\nExplanation:"
metadata:
model: gpt-4
parameters: {}
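The prompts in this diff use `{{query}}`, `{{context}}`, and `{{generate.output}}` placeholders that aiconfig resolves at run time. As a minimal illustration of that substitution (the `render_prompt` helper here is hypothetical, not the library's API):

```python
def render_prompt(template: str, params: dict) -> str:
    """Naively substitute {{name}} placeholders with parameter values."""
    out = template
    for key, value in params.items():
        out = out.replace("{{" + key + "}}", str(value))
    return out

# Example with the generate prompt's two parameters:
prompt = render_prompt(
    "Answer the following question using the provided context. \n\n"
    "Question: {{query}} \nContext: {{context}}",
    {"query": "What was the price of flour?",
     "context": "(retrieved passages would go here)"},
)
```

In the real cookbook the `context` parameter is filled from retrieval results and `generate.output` chains one prompt's output into the eval prompts.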
@@ -103,15 +148,31 @@ prompts:
name: evaluate_faithfulness
outputs: []
- input: 'Given the following answer, is the answer self-consistent and easy to understand?
Answer: {{generate.output}}'
Answer: {{generate.output}}
Give a relevance verdict (YES or NO) with an explanation.
Verdict:
Explanation:'
metadata:
model: gpt-4
parameters: {}
remember_chat_context: false
name: evaluate_coherence
outputs: []
- input: 'Given the following answer, is the answer self-consistent and easy to understand?
Answer: {{generate.output}}'
Answer: {{generate.output}}
Give a succinctness verdict (YES or NO) with an explanation.
Verdict:
Explanation:'
metadata:
model: gpt-4
parameters: {}
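Each eval prompt above asks the grading model for a YES/NO verdict followed by an explanation. A response in that shape could be parsed with a sketch like the following; the exact response wording is an assumption, and real model outputs may deviate from it:

```python
import re

def parse_verdict(text: str):
    """Extract (verdict, explanation) from a model-graded eval response
    formatted as 'Verdict: YES|NO' then 'Explanation: ...'."""
    m = re.search(r"Verdict:\s*(YES|NO)", text, re.IGNORECASE)
    verdict = m.group(1).upper() if m else None
    e = re.search(r"Explanation:\s*(.*)", text, re.DOTALL)
    explanation = e.group(1).strip() if e else ""
    return verdict, explanation
```

Normalizing to an explicit YES/NO makes the 5-trial benchmark scores above straightforward to aggregate per cell.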
