Added better functionality for label selection #713

Vaibhav2001 · 2024-02-13T22:54:01Z

Pull Review Summary

Continued from #561. This closes #712

Description

Label selection can now pick the top K labels to pass to the LLM. Enable it with label_selection, set the number of labels through label_selection_count, and set a score threshold via label_selection_threshold.
Enabled caching of embeddings.
Few-shot examples now include only the examples whose label is among the provided labels. (This only works with the "fixed" algorithm for now.)
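
As a rough illustration of the few-shot restriction described above, the sketch below keeps only the examples whose label appears in the selected labels. The function name filter_few_shot_examples, the dict-based example format, and the "label" column name are assumptions made for this sketch, not the code in this PR.

```python
from typing import Dict, List


def filter_few_shot_examples(
    examples: List[Dict[str, str]],
    selected_labels: List[str],
    label_column: str = "label",  # hypothetical column name
) -> List[Dict[str, str]]:
    """Keep only the few-shot examples whose label is one of the selected labels."""
    allowed = set(selected_labels)
    # Preserve the original order of the examples (as a "fixed" set would).
    return [ex for ex in examples if ex.get(label_column) in allowed]


seed_examples = [
    {"example": "My card never arrived", "label": "card_arrival"},
    {"example": "Why was I charged twice?", "label": "duplicate_charge"},
    {"example": "How do I reset my PIN?", "label": "change_pin"},
]
kept = filter_few_shot_examples(seed_examples, ["card_arrival", "change_pin"])
```

Here kept contains the first and third examples, so the prompt never shows an example for a label the model is not allowed to predict.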

Benefits:

Yields accuracy similar to base classification, but a better completion rate when num_labels is very large. Without label selection, the model prediction often does not map exactly to any label (mostly because of the large context length).
Reduces cost, since fewer tokens are passed to the final model.

Usage:

config = {
    "task_name": "BankingComplaintsClassification",
    "task_type": "classification",
        ...
        "labels": label_descriptions,
        "label_selection": True,
        "label_selection_count": 10,
        "label_selection_threshold": 0.5,
        ...
    },
}

Label descriptions can be used to embed the labels.
"label_selection_count": -1 sets the maximum number of selected labels to len(labels). Defaults to min(10, num_labels).
label_selection_threshold keeps only the labels where label_score / top_score > threshold, for more effective filtering. This value defaults to 0.95 (based on experimentation).
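
The interaction between the count and the threshold can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in label_selector.py: the function names, the use of cosine similarity as the label score, and the precomputed embeddings passed in as plain lists are all assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_labels(
    input_embedding: Sequence[float],
    label_embeddings: Dict[str, Sequence[float]],
    count: int = 10,
    threshold: float = 0.95,
) -> List[str]:
    """Pick up to `count` labels, then drop those scoring too far below the best."""
    labels = list(label_embeddings)
    # count == -1 means "consider every label"; otherwise cap at the label count.
    k = len(labels) if count == -1 else min(count, len(labels))
    scored = sorted(
        ((cosine(input_embedding, label_embeddings[name]), name) for name in labels),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    if not scored:
        return []
    top_score = scored[0][0]
    if top_score <= 0:
        # The ratio test is not meaningful here; fall back to the best label only.
        return [scored[0][1]]
    # Keep labels whose score relative to the best score exceeds the threshold.
    return [name for score, name in scored if score / top_score > threshold]


toy_embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
chosen = select_labels([1.0, 0.0], toy_embeddings, count=-1, threshold=0.95)
```

With these toy vectors, "a" and "b" pass the ratio test while "c" (score 0) is dropped. A lower threshold such as the 0.5 used in the config above lets more labels through.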

Type of change

Improving existing feature
This change requires a documentation update

Tests

Ran benchmarking tests for different datasets.

Future Work

Skip classification when there is only one selected label. This requires changing the annotation processing in dataset.process_labels.
Cost tracking for the embedding model. However, this cost is typically upper bounded by that of the base model and is cheaper by a factor of up to 1000.
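
The first future-work item could look roughly like the sketch below: when label selection leaves a single candidate, return it directly instead of calling the LLM. The names classify_with_shortcut and call_llm are hypothetical, and the sketch ignores the changes needed in dataset.process_labels.

```python
from typing import Callable, List


def classify_with_shortcut(
    selected_labels: List[str],
    call_llm: Callable[[List[str]], str],  # hypothetical stand-in for the model call
) -> str:
    """Return the only candidate label directly, otherwise defer to the LLM."""
    if len(selected_labels) == 1:
        # Nothing to choose between, so skip the model call and its token cost.
        return selected_labels[0]
    return call_llm(selected_labels)


calls = []


def fake_llm(labels: List[str]) -> str:
    """Toy model used only for illustration: records the call, returns the first label."""
    calls.append(labels)
    return labels[0]


single = classify_with_shortcut(["card_arrival"], fake_llm)
multi = classify_with_shortcut(["card_arrival", "change_pin"], fake_llm)
```

Only the second call reaches fake_llm; the first is answered without any model request.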

src/autolabel/labeler.py

src/autolabel/few_shot/label_selector.py

src/autolabel/configs/config.py

rajasbansal

lgtm - after small fixes

src/autolabel/few_shot/label_selector.py

src/autolabel/labeler.py

Vaibhav2001 added 2 commits February 13, 2024 14:33

fixed label selection

750b9e4

fixed label selection

26ccc61

Vaibhav2001 self-assigned this Feb 13, 2024

fixed few shot for label selection

7c87da9

rajasbansal reviewed Feb 16, 2024

addressed comments

28bab95

Vaibhav2001 requested a review from rajasbansal February 16, 2024 22:13

rajasbansal approved these changes Feb 16, 2024

Vaibhav2001 added 2 commits February 16, 2024 14:31

addressed comments

1c60742

addressed comments

27de167

Vaibhav2001 merged commit 0cbc022 into main Feb 16, 2024
2 checks passed

Vaibhav2001 deleted the benchmark branch February 16, 2024 22:41
