
Truncated output with local backend? #125

Open
remmen-io opened this issue Jul 9, 2024 · 2 comments

Comments

@remmen-io

remmen-io commented Jul 9, 2024

```shell
➜ aiac --version
aiac version 5.2.1
```

We are using local backends provided by Hugging Face TGI:

```toml
[backends.phi3]
type = "openai"
default_model = "Phi-3"
url = "https://phi3.ourcluster/v1"

[backends.deepseek2]
type = "openai"
default_model = "DeepSeekv2"
url = "https://deepseekv2.ourcluster/v1"
```

Invoking aiac results in a truncated response:

➜ aiac dockerfile for nodejs with comments
```Dockerfile
# Use the official Node.js 14 image as a parent image
FROM node:14

# Set the working directory in the container
WORKDIR /usr/src/app

# Copy package.json and package-lock.json (if available) to the working directory
COPY package*.json ./

# Install project dependencies
RUN npm install

# Copy the rest of the application code to the working directory
```

[S/s]: save and exit
[W/w]: save and chat
[C/c]: continue chatting
[R/r]: retry same prompt
[Y/y]: copy to clipboard
[Q/q]: quit
Choice: c
✔ New message: continue
```Dockerfile
# Copy the rest of the application code to the working directory
COPY . .

# Build the application if necessary (e.g., if using a build step)
RUN npm run build

# Expose the port the app runs on
EXPOSE 3000

# Define the command to run the app
CMD ["npm", "start"]
```

This Dockerfile is designed for a Node.js application that listens on

[S/s]: save and exit
[W/w]: save and chat
[C/c]: continue chatting
[R/r]: retry same prompt
[Y/y]: copy to clipboard
[Q/q]: quit
Choice: c
✔ New message: continue
```Dockerfile
# Define the command to run the app
CMD ["npm", "start"]

# Add a health check to ensure the application is running properly
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:3000/ || exit 1

# Optional: Set up a volume for persistent data
VOLUME [
```

[S/s]: save and exit
[W/w]: save and chat
[C/c]: continue chatting
[R/r]: retry same prompt
[Y/y]: copy to clipboard
[Q/q]: quit
✔ Choice: q

The endpoint provides information about the max token limits, but I guess they are not used?

```json
{
  "model_id": "/models-cache/deepseek-coder-v2-lite",
  "model_sha": null,
  "model_dtype": "torch.float16",
  "model_device_type": "cuda",
  "model_pipeline_tag": null,
  "max_concurrent_requests": 128,
  "max_best_of": 2,
  "max_stop_sequences": 4,
  "max_input_tokens": 4096,
  "max_total_tokens": 6124,
  "waiting_served_ratio": 0.3,
  "max_batch_total_tokens": 16000,
  "max_waiting_tokens": 20,
  "max_batch_size": null,
  "validation_workers": 2,
  "max_client_batch_size": 4,
  "router": "text-generation-router",
  "version": "2.1.1",
  "sha": "4dfdb481fb1f9cf31561c056061d693f38ba4168",
  "docker_label": "sha-4dfdb48"
}
```
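For reference, those limits imply a per-request generation budget: the server cannot emit more new tokens than `max_total_tokens` minus the input budget. A minimal sketch (the numbers are taken from the `/info` payload above):

```python
import json

# Relevant fields from the TGI /info response shown above
info = json.loads('{"max_input_tokens": 4096, "max_total_tokens": 6124}')

# Upper bound on newly generated tokens for a request that uses
# the full input budget: total minus input
max_new_tokens_budget = info["max_total_tokens"] - info["max_input_tokens"]
print(max_new_tokens_budget)  # 2028
```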

Is it somehow possible to add/modify the max_new_tokens parameter?

Like:

```shell
curl -N https://deepseekv2.mycluster/generate -X POST \
  -d '{"inputs":"dockerfile for nodejs with comments?","parameters":{"max_new_tokens":200}}' \
  -H 'Content-Type: application/json'
```
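For comparison, TGI's native `/generate` route calls this limit `max_new_tokens`, while the OpenAI-style chat-completions body that an `openai`-type backend sends calls it `max_tokens`. A sketch of the two request bodies side by side (the model name matches the backend config above; this is illustrative, not aiac's actual request code):

```python
import json

# TGI's native /generate parameters (as in the curl example above)
native = {
    "inputs": "dockerfile for nodejs with comments?",
    "parameters": {"max_new_tokens": 200},
}

# The equivalent limit in an OpenAI-compatible chat-completions body
openai_style = {
    "model": "DeepSeekv2",
    "messages": [
        {"role": "user", "content": "dockerfile for nodejs with comments?"}
    ],
    "max_tokens": 200,
}

print(json.dumps(native))
print(json.dumps(openai_style))
```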

@ido50
Collaborator

ido50 commented Jul 10, 2024

I'm not familiar with Hugging Face, but I see that TGI implements the OpenAI API (although your examples seem to use its own API?) with a relatively small default for `max_tokens`. I suppose we can expose other parameters too. Do you think it would make more sense, though, to add something like `max_tokens` to the backend configuration rather than setting it per request (e.g. as a flag in the CLI or a parameter in the library)? Which would work better for your use case?
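If the configuration-level route were taken, it might look something like this — a hypothetical sketch only, since `max_tokens` is not an existing aiac backend option as of this thread:

```toml
[backends.deepseek2]
type = "openai"
default_model = "DeepSeekv2"
url = "https://deepseekv2.ourcluster/v1"
# hypothetical option: cap on generated tokens, sent as "max_tokens"
# with every request to this backend
max_tokens = 2000
```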

@remmen-io
Author

Hi @ido50
Unfortunately not, as TGI currently does not support this on the server side.

There is an open issue: huggingface/text-generation-inference#870
