typeform link in readme (#329)
nihit authored Jun 15, 2023
1 parent bfa3294 commit a64dbe5
Showing 2 changed files with 20 additions and 0 deletions.
10 changes: 10 additions & 0 deletions README.md
@@ -152,6 +152,16 @@ output_df.head()
4. [Confidence estimation](https://docs.refuel.ai/guide/accuracy/confidence/) and explanations out of the box for every single output label
5. [Caching and state management](https://docs.refuel.ai/guide/reliability/state-management/) to minimize costs and experimentation time

## Access to Refuel hosted LLMs

Refuel provides access to hosted open source LLMs for labeling and for estimating confidence. This is helpful because you can calibrate a confidence threshold for your labeling task and route less confident labels to humans, while still getting the benefits of auto-labeling for the confident examples.

To use Refuel hosted LLMs, [request access here](https://refuel-ai.typeform.com/llm-access).
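
The routing idea in this section can be sketched in a few lines of plain Python. This is a minimal illustration, not the library's API: `LabeledExample`, `route_by_confidence`, the sample data, and the 0.8 threshold are all hypothetical, and the confidence score is assumed to lie in [0, 1].

```python
from typing import List, NamedTuple, Tuple


class LabeledExample(NamedTuple):
    """Hypothetical container for one LLM-labeled row."""
    text: str
    label: str
    confidence: float  # assumed to be a score in [0, 1]


def route_by_confidence(
    examples: List[LabeledExample], threshold: float
) -> Tuple[List[LabeledExample], List[LabeledExample]]:
    """Split examples into (auto-accepted, needs human review)."""
    auto, review = [], []
    for ex in examples:
        # Labels at or above the threshold are accepted automatically;
        # everything else is queued for a human annotator.
        (auto if ex.confidence >= threshold else review).append(ex)
    return auto, review


# Made-up sample data for illustration only.
examples = [
    LabeledExample("great product", "positive", 0.97),
    LabeledExample("it was fine I guess", "neutral", 0.55),
    LabeledExample("terrible support", "negative", 0.91),
]
auto, review = route_by_confidence(examples, threshold=0.8)
```

Raising the threshold sends more rows to human review and lowers the share that is auto-labeled, so the value is usually chosen per task.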

## Benchmark

Check out our [technical report](https://www.refuel.ai/blog-posts/llm-labeling-technical-report) to learn more about the performance of various LLMs and human annotators on label quality, turnaround time, and cost.

## 🛠️ Roadmap
Our goal is to allow users to label, create or enrich any dataset, with any LLM - easily and quickly.

10 changes: 10 additions & 0 deletions docs/guide/llms/benchmarks.md
@@ -0,0 +1,10 @@

## Benchmarking LLMs for data labeling


Key takeaways from our [technical report](https://www.refuel.ai/blog-posts/llm-labeling-technical-report):

* State-of-the-art LLMs can label text datasets at the same or better quality compared to skilled human annotators, **but ~20x faster and ~7x cheaper**.
* For the highest-quality labels, GPT-4 is the best choice among out-of-the-box LLMs (88.4% agreement with ground truth, compared to 86% for skilled human annotators).
* For the best tradeoff between label quality and cost, GPT-3.5-turbo, PaLM-2, and open source models like FLAN-T5-XXL are compelling.
* Confidence-based thresholding can be a very effective way to mitigate the impact of hallucinations and ensure high label quality.
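
One way to pick a threshold for the confidence-based thresholding mentioned in the last takeaway is to calibrate it on a small seed set with known ground truth. The sketch below is an assumption about how that could be done, not the method used in the report: `calibrate_threshold` is a hypothetical helper and the numbers are made up.

```python
from typing import Optional, Sequence, Tuple


def calibrate_threshold(
    confidences: Sequence[float],
    correct: Sequence[bool],
    target_accuracy: float,
) -> Optional[Tuple[float, float]]:
    """Return (threshold, coverage) for the lowest threshold whose
    retained examples reach target_accuracy, or None if none does."""
    # Walk from the most to the least confident example, tracking the
    # accuracy of everything kept so far.
    pairs = sorted(zip(confidences, correct), reverse=True)
    best = None
    kept = hits = 0
    for i, (conf, ok) in enumerate(pairs):
        kept += 1
        hits += int(ok)
        # Only evaluate between distinct confidence values, so tied
        # examples are either all kept or all dropped.
        at_boundary = i + 1 == len(pairs) or pairs[i + 1][0] != conf
        if at_boundary and hits / kept >= target_accuracy:
            best = (conf, kept / len(pairs))
    return best


# Made-up seed set: model confidence and whether the label was correct.
confidences = [0.95, 0.9, 0.85, 0.6, 0.5, 0.4]
correct = [True, True, True, False, True, False]
result = calibrate_threshold(confidences, correct, target_accuracy=0.9)
```

The returned coverage is the fraction of the seed set that would be auto-labeled at that threshold; the rest would go to human review.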
