Response returns too few tokens? #529
Hi, I deployed llama-2-7b-chat.ggmlv3.q6_K.bin with llama-cpp-python[server], running in a Docker container, and I access it through the OpenAI API. The response body always contains only a few tokens. How can I get the full poem in this case? Do I need to set max_tokens, and if so, how? Thanks a lot!
      
      
Answered by jeffreydevreede on Jul 28, 2023
In your curl request you need to set the max_tokens attribute:

curl -X 'POST' \
  'http://llama07.server.com/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "max_tokens": 200,
  "messages": [
    {
      "content": "You are a helpful assistant.",
      "role": "system"
    },
    {
      "content": "Write a poem for France?",
      "role": "user"
    }
  ]
}'
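
If you are calling the server from Python rather than curl, the same max_tokens parameter can be passed through the openai client. This is a minimal sketch assuming the pre-1.0 openai package and the same server URL as in the curl example; the api_key value and model name are placeholders, not values taken from the original post:

import openai

# Point the client at the llama-cpp-python server instead of api.openai.com.
# The base URL matches the curl example above; the key is only a placeholder.
openai.api_base = "http://llama07.server.com/v1"
openai.api_key = "not-needed"

response = openai.ChatCompletion.create(
    model="llama-2-7b-chat",  # model name is an assumption; adjust to your deployment
    max_tokens=200,           # raise this if the poem is still cut off
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a poem for France?"},
    ],
)
print(response["choices"][0]["message"]["content"])

If the output still stops mid-poem, check response["choices"][0]["finish_reason"]: a value of "length" means the max_tokens limit was reached and you can increase it further.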
                
            
Answer selected by st01cs
  