[DOCS] Add example for HammingDiversityLogitsProcessor #25481
@@ -1085,20 +1085,134 @@ def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> to
class HammingDiversityLogitsProcessor(LogitsProcessor):
r"""
[`LogitsProcessor`] that enforces diverse beam search.

Note that this logits processor is only effective for [`PreTrainedModel.group_beam_search`]. See [Diverse Beam
Search: Decoding Diverse Solutions from Neural Sequence Models](https://arxiv.org/pdf/1610.02424.pdf) for more
details.

<Tip>

Diverse beam search can be particularly useful in scenarios where a variety of different outputs is desired, rather
than multiple similar sequences. It allows the model to explore different generation paths and provides broader
coverage of possible outputs.

</Tip>
<Warning>

This logits processor can be resource-intensive, especially when using large models or long sequences.

</Warning>
Traditional beam search often generates very similar sequences across different beams.

The `HammingDiversityLogitsProcessor` addresses this by penalizing beams that generate tokens already chosen by
other beams in the same time step.
How It Works:
- **Grouping Beams**: Beams are divided into groups. Each group selects tokens independently of the others.
- **Penalizing Repeated Tokens**: If a beam in a group selects a token already chosen by another group in the same
  step, a penalty is applied to that token's score (see the sketch after this list).
- **Promoting Diversity**: This penalty discourages beams within a group from selecting the same tokens as beams in
  other groups.
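As a rough illustration of this penalty, the sketch below subtracts `diversity_penalty` times the frequency of each
token already selected by earlier groups at the current step. It is a minimal, hypothetical helper with a simplified
signature, not the actual `HammingDiversityLogitsProcessor` implementation, which operates on batched beam scores
and additional bookkeeping arguments:

```python
import torch


def apply_hamming_diversity_penalty(
    scores: torch.FloatTensor,  # (num_beams_in_group, vocab_size) next-token scores for the current group
    previous_group_tokens: torch.LongTensor,  # tokens chosen by earlier groups at this time step
    diversity_penalty: float,
    vocab_size: int,
) -> torch.FloatTensor:
    # Count how often each vocabulary token was already picked by the earlier groups ...
    token_frequency = torch.bincount(previous_group_tokens, minlength=vocab_size).to(scores.dtype)
    # ... and subtract a penalty proportional to that count, discouraging repeats across groups.
    return scores - diversity_penalty * token_frequency


# Toy usage: 2 beams in the current group, a 5-token vocabulary, and earlier groups
# that already picked tokens 1 and 3 at this step.
scores = torch.zeros(2, 5)
penalized = apply_hamming_diversity_penalty(scores, torch.tensor([1, 3]), diversity_penalty=1.0, vocab_size=5)
print(penalized)  # tokens 1 and 3 now score -1.0 for every beam in this group
```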

Benefits:
- **Diverse Outputs**: Produces a variety of different sequences.
- **Exploration**: Allows the model to explore different paths.
Args:
    diversity_penalty (`float`):
        This value is subtracted from a beam's score if it generates a token that has already been chosen by any
        beam from another group at a particular time step. Note that `diversity_penalty` is only effective if
        group beam search is enabled. A higher `diversity_penalty` enforces greater diversity among the beams,
        making it less likely for multiple beams to choose the same token. Conversely, a lower penalty allows
        beams to more freely choose similar tokens. Adjusting this value can help strike a balance between
        diversity and natural likelihood.
    num_beams (`int`):
        Number of beams used for group beam search. See [this paper](https://arxiv.org/pdf/1610.02424.pdf) for
        more details. Beam search is a method that maintains several beams (or "hypotheses") at each step,
        expanding each one and keeping the top-scoring sequences. A higher `num_beams` explores more candidate
        sequences, which can increase the chance of finding a high-quality output but also increases the
        computational cost.
    num_beam_groups (`int`):
        Number of groups to divide `num_beams` into in order to ensure diversity among different groups of beams.
        See [this paper](https://arxiv.org/pdf/1610.02424.pdf) for more details. Each group of beams operates
        independently, selecting tokens without considering the choices of the other groups. This division
        promotes diversity by ensuring that beams in different groups explore different paths. For instance, if
        `num_beams` is 6 and `num_beam_groups` is 2, there will be 2 groups each containing 3 beams. The choice
        of `num_beam_groups` should balance the desired level of output diversity against the total number of
        beams.
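To make the "6 beams in 2 groups" example concrete, the processor can also be constructed directly with these
arguments; the values below are illustrative, and in typical use `generate` builds this processor internally when
`num_beam_groups > 1` and a `diversity_penalty > 0` is passed:

```python
from transformers import HammingDiversityLogitsProcessor

# 6 beams split into 2 groups of 3; a penalty is subtracted whenever a beam picks a
# token that an earlier group already chose at the same step.
# Note: `num_beams` is expected to be divisible by `num_beam_groups`.
processor = HammingDiversityLogitsProcessor(
    diversity_penalty=1.0,
    num_beams=6,
    num_beam_groups=2,
)
```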

Example: the example below compares summaries generated with and without Hamming diversity.
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Initialize the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

# Input variable is a long text about space:
text = (
    "The Solar System is a gravitationally bound system comprising the Sun and the objects that orbit it, "
    "either directly or indirectly. Of the objects that orbit the Sun directly, the largest are the eight "
    "planets, with the remainder being smaller objects, such as the five dwarf planets and small Solar System "
    "bodies. The Solar System formed 4.6 billion years ago from the gravitational collapse of a giant "
    "interstellar molecular cloud."
)

# Prepare the input
encoder_input_str = "summarize: " + text
encoder_input_ids = tokenizer(encoder_input_str, return_tensors="pt").input_ids

# Set the parameters for diverse beam search
num_beams = 8  # higher is more diverse
# 4 groups of 2 beams will explore 4 * 2 = 8 beams (= num_beams). By separating the beams into groups and
# applying penalties within groups, the model is encouraged to explore different sequence possibilities in
# each group.
num_beam_groups = 4
# Enforces diversity among different groups of beams; discourages beams within a group from selecting the
# same tokens.
diversity_penalty = 5.5

# Generate three diverse summaries using the `generate` method
outputs_diverse = model.generate(
    encoder_input_ids,
    max_length=100,
    num_beams=num_beams,
    num_beam_groups=num_beam_groups,
    diversity_penalty=diversity_penalty,
    no_repeat_ngram_size=2,
    early_stopping=True,
    num_return_sequences=3,
)

# Generate two non-diverse summaries
outputs_non_diverse = model.generate(
    encoder_input_ids,
    max_length=100,
    num_beams=num_beams,
    no_repeat_ngram_size=2,
    early_stopping=True,
    num_return_sequences=2,
)

# Decode and print the summaries
summaries_diverse = tokenizer.batch_decode(outputs_diverse, skip_special_tokens=True)
summaries_non_diverse = tokenizer.batch_decode(outputs_non_diverse, skip_special_tokens=True)

# Print the results
print("Diverse Summaries:")
for summary in summaries_diverse:
    print(summary)
# summary 1: the solar system formed 4.6 billion years ago from the collapse of a giant interstellar molecular cloud
# summary 2: the solar system formed 4.6 billion years ago from the collapse of a giant interstellar molecular cloud. of the objects that orbit the Sun directly, the largest are the eight planets, says john mccartney jr.
# summary 3: solar system formed 4.6 billion years ago from collapse of interstellar molecular cloud. largest of the eight planets orbit the Sun directly, with the remainder being smaller objects, such as dwarf planet and small solar System bodies - nicolaus mills-simons: the largest are the dwarf worlds and the solar systems' bodies.

print("\nNon-Diverse Summaries:")
for summary in summaries_non_diverse:
    print(summary)
# summary 1: the solar system formed 4.6 billion years ago from the collapse of a giant interstellar molecular cloud.
# summary 2: the solar system formed 4.6 billion years ago from the collapse of a giant interstellar molecular cloud.
```
For more details, see [Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence
Models](https://arxiv.org/pdf/1610.02424.pdf).

"""

def __init__(self, diversity_penalty: float, num_beams: int, num_beam_groups: int):