
[ChatLLaMA] RLHF Training: Prompt too long #299

Open
swang99 opened this issue Mar 25, 2023 · 8 comments
Labels: bug (Something isn't working), chatllama (Issue related to the ChatLLaMA module)

Comments


swang99 commented Mar 25, 2023

I am getting the following error when doing RLHF training. I decreased max_sequence_length in my actor configuration to 1024 because training failed for me when it was set to 2048. Is my actor max_sequence_length too small, and does this mean I have to redo pre-training with a larger max sequence? To my knowledge there is no way to change the state_length.

ValueError: The prompt is too long w.r.t the model sequence length
max_sequence_length=1024
state_length=1024
min_tokens=100
max_tokens=2048
max_generation_possible=0
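
If I am reading the error right, the check is a simple token-budget computation: the prompt (state) already fills the whole model context, so nothing is left to generate. A sketch of the arithmetic as I understand it (the variable names mirror the error message; the logic is my assumption, not the actual chatllama source):

max_sequence_length = 1024  # model context, from my actor_config
state_length = 1024         # tokens already consumed by the prompt
min_tokens = 100            # minimum completion length requested
max_tokens = 2048           # requested completion cap (irrelevant here, the context is already full)

# whatever remains of the context after the prompt is all that can be generated
max_generation_possible = max_sequence_length - state_length  # 1024 - 1024 = 0
if max_generation_possible < min_tokens:
    raise ValueError("The prompt is too long w.r.t the model sequence length")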

swang99 changed the title from [ChatLlama] RLFH Training: Prompt too long to [ChatLlama] RLHF Training: Prompt too long on Mar 26, 2023
@diegofiori (Collaborator)

Hi @swang99, thank you for reaching out! What model are you currently using?

diegofiori changed the title from [ChatLlama] RLHF Training: Prompt too long to [ChatLLaMA] RLHF Training: Prompt too long on Mar 26, 2023
diegofiori added the chatllama and bug labels on Mar 26, 2023
@PierpaoloSorbellini (Collaborator)

Hi @swang99, thanks for reaching out.
When doing RLHF the same sequence gets propagated to all the models, so I would recommend:

  1. First, check that all the models in your config share the same max_sequence_length in the config.yaml to avoid problems.
  2. If the problem still persists, try increasing additonal_prompt_tokens a bit (see the sketch below).

If you are still having the problem, it would be helpful to share your config.yaml so that we can investigate further.
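
As a quick sanity check, something like this can verify that the three sections agree (my own sketch, not part of chatllama; it assumes PyYAML is installed and that all three sections define max_sequence_length):

import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

# during RLHF the same sequence reaches actor, reward and critic,
# so their sequence limits must match
lengths = {
    section: config[section]["max_sequence_length"]
    for section in ("actor_config", "reward_config", "critic_config")
}
assert len(set(lengths.values())) == 1, f"max_sequence_length mismatch: {lengths}"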


swang99 commented Mar 26, 2023

Thanks for the recommendations. Unfortunately, the error still persists. Can I simply increase additonal_prompt_tokens, or would I need to save a new actor model? Below is my config.yaml:

trainer_config:
  # learning rates
  actor_lr: 0.000005
  critic_lr: 0.000009
  # PPO Hyperparameters
  actor_eps_clip: 0.2
  critic_eps_clip: 0.2
  beta_s: 0.02
  # coefficient for the discounted rewards
  gamma_discounted: 1 
  # path to examples to be sampled (training dataset) see rlhf_dataset.json
  examples_path: "./datasets/rlhf_training_data.json"
  # number of episodes and generation performed for each episode
  # in the train() method
  num_episodes: 100
  max_timesteps: 32
  # number of timesteps after which the learn() method is called 
  # (to update the weights)
  update_timesteps: 32
  # number of examples sampled at each timestep
  num_examples: 1
  # batch and epochs for the training
  batch_size: 1
  epochs: 1
  # number of episodes after which the checkpoints are updated in RL training
  checkpoint_steps: 10
  # here specify the name of the actor_rl checkpoint from which to resume
  # during actor RL training. If null, load the last one.
  checkpoint_name: null

actor_config:
  model: "facebook/opt-1.3b"
  model_folder: "./models"
  tokenizer_path: "path-to-tokenizer"
  train_dataset_path: "./datasets/actor_training_data.json"
  validation_dataset_path: null
  # freeze model embeddings during training
  froze_embeddings: True
  # use fairscale layers to build the model instead of vanilla pytorch
  # only for llama
  use_fairscale: False
  # max sequence length for the actor (i.e. prompt + completion); it depends on
  # the model used.
  max_sequence_length: 1024
  # max tokens generated by the actor (completion only)
  max_tokens: 2048
  # minimum number of tokens generated by the actor
  min_tokens: 100
  # additional prompt tokens to be used for templates or as a safety margin
  additonal_prompt_tokens: 100
  # temperature for the actor
  temperature: 0.1
  batch_size: 1
  # number of iterations between prints
  iteration_per_print: 10
  lr: 0.000009
  epochs: 1
  # number of backpropagation steps after which the checkpoints are saved
  checkpoint_steps: 3000
  # number of checkpoints to keep, removing the older ones
  # (keeps the memory consumption of checkpoints reasonable)
  n_checkpoints_to_keep: 2
  # here specify the name of the actor checkpoint from which to resume
  # during actor training. If null, load the last one.
  checkpoint_name: null
  # deepspeed settings
  deepspeed_enable: False
  deepspeed_config_path: "./artifacts/config/ds_config.json"
  # accelerate settings
  accelerate_enable: False

reward_config:
  # models to choose from are gpt2-large, bart-base, longformer-base-4096
  # more can simply be added in the reward.py __init__()
  model: "facebook/opt-125m"
  model_folder: "./models"
  # hidden size of the additional ffw head to produce the scores
  model_head_hidden_size: 2048
  max_sequence_length: 1024
  train_dataset_path: "./datasets/reward_training_data.json"
  validation_dataset_path: null
  batch_size: 8
  epochs: 32
  iteration_per_print: 1
  # steps after which the checkpoints are saved
  checkpoint_steps: 10000
  # here specify the name of the reward checkpoint from which to resume
  # during reward training. If null, load the last one.
  checkpoint_name: null
  lr: 0.000009
  # deepspeed settings
  deepspeed_enable: False
  deepspeed_config_path: "./artifacts/config/ds_config.json"
  # accelerate settings
  accelerate_enable: False

critic_config:
  # models to choose from are gpt2-large, bart-base, longformer-base-4096
  # more can simply be added in the reward.py __init__()
  model: "facebook/opt-125m"
  # hidden size of the additional ffw head to produce the scores
  model_head_hidden_size: 2048
  max_sequence_length: 1024
  model_folder: "./models"
  # here specify the name of the critic checkpoint from which to resume
  # during critic training. If null, load the last one.
  checkpoint_name: null

@PierpaoloSorbellini (Collaborator)

Hi @swang99,
I will test it in more detail in the following days and let you know!

@PierpaoloSorbellini (Collaborator)

Hi @swang99,
I have found the problem; it should be fixed in PR #306.
Let me know if you still have the same issue!


swang99 commented Apr 6, 2023

Hi @PierpaoloSorbellini, thank you for rolling out the fixes. This might not be very specific, but although I was able to get further into training, around the 9th timestep training stopped suddenly with a "loss is NaN" error. Has this been addressed before?
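
For clarity, the failure is the loss tensor turning NaN mid-training; a guard like the following (my own sketch, not chatllama code, and check_finite is a hypothetical helper) would be how to pinpoint where it first appears:

import torch

def check_finite(loss: torch.Tensor, step: int) -> None:
    # fail fast with context instead of letting NaNs propagate through the update
    if not torch.isfinite(loss).all():
        raise RuntimeError(f"loss is NaN/inf at timestep {step}: {loss}")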

@Mialiu91

Hi @PierpaoloSorbellini, thank you for rolling out the fixes. This might not be very specific, but although I was able to get further into training, around the 9th timestep training stopped suddenly with a "loss is NaN" error. Has this been addressed before?

I have the same problem. Did you fix it?


PierpaoloSorbellini commented Apr 14, 2023

Hi @Mialiu91 @swang99,
Yes, the problem should be fixed in #306, which will soon be merged.
Before training starts, a method that checks the dataset is now run.
Inside this method the None elements are removed from the dataset to avoid this error.

if isinstance(config, ConfigReward):
    cnt = 0
    while cnt < len(conversations):
        if conversations[cnt]["score"] is None:
            # drop the unscored example; the next element shifts into
            # position cnt, so do not advance the index here
            conversations.pop(cnt)
        else:
            cnt = cnt + 1
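
For reference, assuming conversations is a plain list of dicts, the same filtering can also be written more compactly (equivalent behavior, my sketch):

# keep only the scored examples; slice assignment mutates the list in place,
# so other references to the same list stay consistent
conversations[:] = [c for c in conversations if c["score"] is not None]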
