Beam search decoding during inference doesn't generate good text. #265

Open
fabrahman opened this issue Dec 29, 2019 · 4 comments
Labels
question Further information is requested

Comments

@fabrahman

Hi,

I have trained a model using reinforcement learning.
When I use beam search to generate text, every generation looks like

"raeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraera"

However, when I use greedy decoding or top-k sampling, the generation looks like:

Sam was watching a movie. He was very focused on the action. He fell asleep. Sam's glasses fell off his face <|endoftext|>eraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraeraera

I used tx.utils.strip_eos to strip everything after <|endoftext|>.

1- I am not sure why beam search is behaving this way; I would appreciate your help. The following is my code for decoding with beam search:

    def _infer_beam_ids(context_name):
        # Beam-search decoding conditioned on the context ids.
        predictions = decoder(
            beam_width=10,
            length_penalty=config_train.length_penalty,
            embedding=_embedding_fn,
            context=batch['%s_ids' % context_name],
            context_sequence_length=batch['%s_len' % context_name],
            max_decoding_length=max_decoding_length,
            end_token=end_token,
            mode=tf.estimator.ModeKeys.PREDICT)

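        # Keep beam 0 (assumed here to be the highest-scoring hypothesis)
        # and roll the context prefix off the front of each sequence.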
        beam_output_ids = tx.utils.varlength_roll(
            predictions["sample_id"][:, :, 0],
            -batch['%s_len' % context_name],
            axis=1)

        return beam_output_ids

    beam_search_ids = _infer_beam_ids('x1')

2- Is it better to use beam search at inference for a model that is trained in a self-critical fashion?

I would appreciate it if you could help me with these.

@gpengzhi added the question label Jan 2, 2020
@fabrahman
Author

fabrahman commented Jan 7, 2020

Hi,
Does anyone have a thought on this?
In another experiment, I used my trained model to generate with beam search, and it produced the same output for different inputs. It is also weird that the greedy result is good but the beam-search one is not.
Am I correctly calling the beam decoding method?

@jchwenger

That is in fact a feature of beam search; see this discussion, this implementation, and this paper! Temperature-based random sampling and/or top_p (nucleus) sampling are, in my experience, always preferable to beam search.

The root cause of the failure of beam search is that 1) a repetitive sequence will have a higher probability than any other, since the more you repeat, the more likely the next token becomes (from the perspective of the network), and so it will be chosen by beam search; and 2) if you ask a network to assign probabilities to human text, the distribution is actually highly irregular (not the most likely sentence, but a stream where some steps are extremely likely and others extremely random). Lovely graphs and explanations in the paper!
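
To make the sampling suggestion concrete, here is a minimal NumPy sketch of one temperature + nucleus decoding step (illustrative only, not the Texar API; the helper name and parameter values are my own):

    import numpy as np

    def sample_top_p(logits, temperature=0.9, top_p=0.9, rng=None):
        # Hypothetical helper for illustration; not part of Texar.
        # Temperature flattens (>1) or sharpens (<1) the distribution,
        # then the nucleus keeps the smallest set of tokens whose
        # cumulative probability exceeds top_p.
        rng = rng or np.random.default_rng()
        logits = np.asarray(logits, dtype=np.float64) / temperature
        probs = np.exp(logits - logits.max())            # stable softmax
        probs /= probs.sum()
        order = np.argsort(probs)[::-1]                  # descending prob
        cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
        keep = order[:cutoff]
        return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))

    print(sample_top_p([4.0, 3.5, 0.1, -2.0]))  # samples token 0 or 1 here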

@fabrahman
Author

@jchwenger Thanks for your reply. I understand, and I agree that sampling methods work much better. But the behavior I reported here is not expected from beam search: it did not generate anything meaningful at all. Besides, it generates the same thing for every input.
Also, greedy decoding is working fine, so isn't it weird that beam search cannot generate anything?
I am thinking maybe there is some issue with the way I am calling it, or with the implementation.
Also, I have heard and seen in many papers that when using self-critical reinforcement learning, it is better to use beam search at inference.

@jchwenger

My pleasure! From the network's perspective, "meaningful" is not particularly relevant. If it is a character-level or BPE model, this repetition of characters over and over might still be the output with the highest probability from the model's perspective, and that, out of all possible outputs of the network, is what beam search will attempt to pick. Beyond that, however, and how beam search is successfully used in other papers, I'm afraid I can't help you.
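
As a back-of-the-envelope illustration of that point (the per-token probabilities below are invented): once a loop is locked in, a sequence of consistently likely repeated tokens accumulates far more log-probability than human-like text whose steps mix very likely and very unlikely tokens, and log-probability is exactly what beam search maximizes:

    import math

    loop = [0.9] * 20          # "rae rae rae ..." once the loop is locked in
    human = [0.9, 0.05] * 10   # likely and unlikely steps interleaved

    log_p = lambda seq: sum(math.log(p) for p in seq)
    print(log_p(loop), log_p(human))   # ~ -2.1 vs ~ -31.0: the loop wins easily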
