
[ChatLLaMA] RLHF Training: Prompt too long #299

Open
swang99 opened this issue Mar 25, 2023 · 8 comments
Labels: bug (Something isn't working), chatllama (Issue related to the ChatLLaMA module)

Comments


swang99 commented Mar 25, 2023

I am getting the following error when doing RLHF training. I decreased max_sequence_length in my actor configuration to 1024 because training failed for me when it was set to 2048. Is my actor max_sequence_length too small, and does this mean I have to redo pre-training with a larger max sequence? To my knowledge there is no way to change the state_length.

ValueError: The prompt is too long w.r.t the model sequence length
max_sequence_length=1024
state_length=1024
min_tokens=100
max_tokens=2048
max_generation_possible=0
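
If I am reading the error right, the check is a simple token-budget computation: the prompt (state) already fills the whole model context, so nothing is left to generate. A sketch of the arithmetic as I understand it (the variable names mirror the error message; the logic is my assumption, not the actual chatllama source):

max_sequence_length = 1024  # model context, from my actor_config
state_length = 1024         # tokens already consumed by the prompt
min_tokens = 100            # minimum completion length requested
max_tokens = 2048           # requested completion cap (irrelevant here, the context is already full)

# whatever remains of the context after the prompt is all that can be generated
max_generation_possible = max_sequence_length - state_length  # 1024 - 1024 = 0
if max_generation_possible < min_tokens:
    raise ValueError("The prompt is too long w.r.t the model sequence length")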

swang99 changed the title from [ChatLlama] RLFH Training: Prompt too long to [ChatLlama] RLHF Training: Prompt too long on Mar 26, 2023
@diegofiori (Collaborator)

Hi @swang99, thank you for reaching out! What model are you currently using?

diegofiori changed the title from [ChatLlama] RLHF Training: Prompt too long to [ChatLLaMA] RLHF Training: Prompt too long on Mar 26, 2023
diegofiori added the chatllama and bug labels on Mar 26, 2023
@PierpaoloSorbellini (Collaborator)

Hi @swang99, thanks for reaching out.
When doing RLHF the same sequence gets propagated to all the models, so I would recommend:

  1. First, check that all the models in your config share the same max_sequence_length in the config.yaml to avoid problems.
  2. If the problem still persists, try increasing additonal_prompt_tokens a bit (see the sketch below).

If you are still having the problem, it would be helpful to share your config.yaml so that we can investigate further.
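
As a quick sanity check, something like this can verify that the three sections agree (my own sketch, not part of chatllama; it assumes PyYAML is installed and that all three sections define max_sequence_length):

import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

# during RLHF the same sequence reaches actor, reward and critic,
# so their sequence limits must match
lengths = {
    section: config[section]["max_sequence_length"]
    for section in ("actor_config", "reward_config", "critic_config")
}
assert len(set(lengths.values())) == 1, f"max_sequence_length mismatch: {lengths}"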


swang99 commented Mar 26, 2023

Thanks for the recommendations. Unfortunately, the error still persists. Can I simply increase additonal_prompt_tokens, or would I need to save a new actor model? Below is my config.yaml:

trainer_config:
  # learning rates
  actor_lr: 0.000005
  critic_lr: 0.000009
  # PPO Hyperparameters
  actor_eps_clip: 0.2
  critic_eps_clip: 0.2
  beta_s: 0.02
  # coefficient for the discounted rewards
  gamma_discounted: 1 
  # path to examples to be sampled (training dataset) see rlhf_dataset.json
  examples_path: "./datasets/rlhf_training_data.json"
  # number of episodes and generation performed for each episode
  # in the train() method
  num_episodes: 100
  max_timesteps: 32
  # number of timesteps after which the learn() method is called 
  # (to update the weights)
  update_timesteps: 32
  # number of examples sampled at each timestep
  num_examples: 1
  # batch and epochs for the training
  batch_size: 1
  epochs: 1
  # number of episodes after which the checkpoints are updated in RL training
  checkpoint_steps: 10
  # here specify the name of the actor_rl checkpoint from which to resume
  # during actor RL training. If null, load the last one.
  checkpoint_name: null

actor_config:
  model: "facebook/opt-1.3b"
  model_folder: "./models"
  tokenizer_path: "path-to-tokenizer"
  train_dataset_path: "./datasets/actor_training_data.json"
  validation_dataset_path: null
  # freeze model embeddings during training
  froze_embeddings: True
  # use fairscale layers to build the model instead of vanilla pytorch
  # only for llama
  use_fairscale: False
  # max sequence length for the actor (i.e. prompt + completion); it depends on
  # the model used.
  max_sequence_length: 1024
  # max tokens generated by the actor (completion only)
  max_tokens: 2048
  # minimum number of tokens generated by the actor
  min_tokens: 100
  # additional prompt tokens to be used for templates or as a safety margin
  additonal_prompt_tokens: 100
  # temperature for the actor
  temperature: 0.1
  batch_size: 1
  # number of iterations between prints
  iteration_per_print: 10
  lr: 0.000009
  epochs: 1
  # number of backpropagation steps after which the checkpoints are saved
  checkpoint_steps: 3000
  # number of checkpoints to keep, removing the older ones
  # (keeps the memory consumption of checkpoints reasonable)
  n_checkpoints_to_keep: 2
  # here specify the name of the actor checkpoint from which to resume
  # during actor training. If null, load the last one.
  checkpoint_name: null
  # deepspeed settings
  deepspeed_enable: False
  deepspeed_config_path: "./artifacts/config/ds_config.json"
  # accelerate settings
  accelerate_enable: False

reward_config:
  # models to choose from are gpt2-large, bart-base, longformer-base-4096
  # more can simply be added in the reward.py __init__()
  model: "facebook/opt-125m"
  model_folder: "./models"
  # hidden size of the additional ffw head to produce the scores
  model_head_hidden_size: 2048
  max_sequence_length: 1024
  train_dataset_path: "./datasets/reward_training_data.json"
  validation_dataset_path: null
  batch_size: 8
  epochs: 32
  iteration_per_print: 1
  # steps after which the checkpoints are saved
  checkpoint_steps: 10000
  # here specify the name of the reward checkpoint from which to resume
  # during reward training. If null, load the last one.
  checkpoint_name: null
  lr: 0.000009
  # deepspeed settings
  deepspeed_enable: False
  deepspeed_config_path: "./artifacts/config/ds_config.json"
  # accelerate settings
  accelerate_enable: False

critic_config:
  # models to choose from are gpt2-large, bart-base, longformer-base-4096
  # more can simply be added in the reward.py __init__()
  model: "facebook/opt-125m"
  # hidden size of the additional ffw head to produce the scores
  model_head_hidden_size: 2048
  max_sequence_length: 1024
  model_folder: "./models"
  # here specify the name of the critic checkpoint from which to resume
  # during critic training. If null, load the last one.
  checkpoint_name: null

@PierpaoloSorbellini (Collaborator)

Hi @swang99,
I will test it in more detail in the following days and let you know!

@PierpaoloSorbellini (Collaborator)

Hi @swang99,
I have found the problem; it should be fixed in PR #306.
Let me know if you still have the same issue!


swang99 commented Apr 6, 2023

Hi @PierpaoloSorbellini, thank you for rolling out the fixes. This might not be very specific, but although I was able to get further into training, around the 9th timestep training stopped suddenly with a "loss is NaN" error. Has this been addressed before?
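
For clarity, the failure is the loss tensor turning NaN mid-training; a guard like the following (my own sketch, not chatllama code, and check_finite is a hypothetical helper) would be how to pinpoint where it first appears:

import torch

def check_finite(loss: torch.Tensor, step: int) -> None:
    # fail fast with context instead of letting NaNs propagate through the update
    if not torch.isfinite(loss).all():
        raise RuntimeError(f"loss is NaN/inf at timestep {step}: {loss}")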

@Mialiu91

Hi @PierpaoloSorbellini, thank you for rolling out the fixes. This might not be very specific, but although I was able to get further into training, around the 9th timestep training stopped suddenly with a "loss is NaN" error. Has this been addressed before?

I have the same problem. Did you fix it?


PierpaoloSorbellini commented Apr 14, 2023

Hi @Mialiu91 @swang99,
Yes, the problem should be fixed in #306, which will soon be merged.
Before training starts, a method that checks the dataset is now run.
Inside this method the None elements are removed from the dataset to avoid this error.

if isinstance(config, ConfigReward):
    cnt = 0
    while cnt < len(conversations):
        if conversations[cnt]["score"] is None:
            # drop the unscored example; the next element shifts into
            # position cnt, so do not advance the index here
            conversations.pop(cnt)
        else:
            cnt = cnt + 1
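
For reference, assuming conversations is a plain list of dicts, the same filtering can also be written more compactly (equivalent behavior, my sketch):

# keep only the scored examples; slice assignment mutates the list in place,
# so other references to the same list stay consistent
conversations[:] = [c for c in conversations if c["score"] is not None]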
