fix: Fix deprecated max_tokens param in openai ChatCompletionRequest #3122
Conversation
Force-pushed from 403eb6c to eed4b36.
Can you only update the
Many thanks for addressing #3098 so quickly, and FYI I've tested this branch and it does resolve the issue!
Force-pushed from eed4b36 to 91f7dba.
Updated. Removed the frontend part.
Force-pushed from 91f7dba to b48b900.
@mickqian I will take a look at this. Thanks!
python/sglang/lang/ir.py
Outdated
Wondering what the definition of `max_tokens` is for non-chat models? And why don't we keep `max_tokens` for non-chat models in the backend?
- The definition of `max_tokens` in completion models is the number of tokens that can be generated in the completion, pretty much the same as `max_completion_tokens`.
- Yes, we should keep both params. Updated (see the sketch below).
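Not from the PR itself — a minimal sketch of the mapping being discussed, using a hypothetical `SamplingParamsSketch` class with an assumed `max_new_tokens` field; in the real code this role belongs to `SglSamplingParams` in `python/sglang/lang/ir.py`.

```python
from dataclasses import dataclass


@dataclass
class SamplingParamsSketch:
    # Hypothetical stand-in for SglSamplingParams; field names here are assumptions.
    max_new_tokens: int = 128
    temperature: float = 1.0
    top_p: float = 1.0

    def to_openai_kwargs(self, is_chat_model: bool) -> dict:
        kwargs = {"temperature": self.temperature, "top_p": self.top_p}
        if is_chat_model:
            # Chat Completions API: max_tokens is deprecated in favor of max_completion_tokens.
            kwargs["max_completion_tokens"] = self.max_new_tokens
        else:
            # Legacy Completions API: only max_tokens exists.
            kwargs["max_tokens"] = self.max_new_tokens
        return kwargs
```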
Hey. For embedding models, "max_tokens" means the max sequence length that can be processed. What if the input exceeds that length? Should the sequence be truncated, or should we throw an error? The same question applies to chat models. Also, I personally think we should call them generation models and embedding models; that's what we typically call these models.
- For embedding models, I think both ways will do, depending on the situation. Providing an option sounds good too.
- Yes, `completion model` and `chat completion model` both fall into the category of `generation model`; I thought you were referring to `completion model`. The `is_chat_model` is used to distinguish different generation models, if I'm correct. For sglang.lang, does it involve embedding models (or did I miss something)? If not, probably `is_chat_model` would suffice for generation models in the backend.
Okay. This sounds fine to me. But, ideally:
```python
def to_openai_kwargs(self, is_chat_model):
```
You mean `is_chat_model` is an attribute of `class OpenAI(BaseBackend)` in python/sglang/lang/backend/openai.py rather than an attribute of `class SglSamplingParams` in python/sglang/lang/ir.py?
So this function is not:
```python
def to_openai_kwargs(self):
```
Right? That makes sense. But I prefer the latter one if we can.
Yes, it's supposed to be an internal property of a `BaseBackend` (as it directly describes the model), passed to `SglSamplingParams` for generating an OpenAI-compatible request.
While adding this field to `SglSamplingParams` does sound good in some cases, I personally reckon `SglSamplingParams` is meant to be model-unaware data that can be sent to different backends, letting the actual backend decide the final OpenAI request. Feel free to correct me.
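A rough sketch of that split, reusing the hypothetical `SamplingParamsSketch` from the earlier snippet: the backend owns the model-specific `is_chat_model` flag, and the sampling params stay model-unaware until the backend resolves them into OpenAI kwargs.

```python
class OpenAIBackendSketch:
    # Hypothetical backend; in the real code this role belongs to
    # OpenAI(BaseBackend) in python/sglang/lang/backend/openai.py.
    def __init__(self, model_name: str, is_chat_model: bool):
        self.model_name = model_name
        # The backend, not the sampling params, knows which API the model speaks.
        self.is_chat_model = is_chat_model

    def build_request_kwargs(self, params: SamplingParamsSketch) -> dict:
        # Model-unaware params are resolved into an OpenAI-compatible request here.
        return params.to_openai_kwargs(self.is_chat_model)


# Usage: the same params object can be sent to chat and non-chat backends.
params = SamplingParamsSketch(max_new_tokens=64)
chat_backend = OpenAIBackendSketch("gpt-4o-mini", is_chat_model=True)
legacy_backend = OpenAIBackendSketch("gpt-3.5-turbo-instruct", is_chat_model=False)
print(chat_backend.build_request_kwargs(params))    # uses max_completion_tokens
print(legacy_backend.build_request_kwargs(params))  # uses max_tokens
```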
I agree! Nice work.
Force-pushed from b48b900 to b80b526.
zhaochenyang20 left a comment
- I think we should add a comment in `protocol.py`, or somewhere appropriate, about the definition of `max_completion_tokens` and how it differs from the previous `max_tokens` (a rough sketch follows below).
- Could you run the docs CI locally? Just `make compile` is enough. The docs CI is currently disabled due to the long queue time on CI, but we should run it locally.
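On the first point, a sketch of what such a documented schema could look like — the class name, fields, and defaults are assumptions for illustration, not the repo's actual `protocol.py`:

```python
from typing import List, Optional

from pydantic import BaseModel


class ChatCompletionRequestSketch(BaseModel):
    # Hypothetical subset of a chat completion request schema.
    model: str
    messages: List[dict]

    # Deprecated by OpenAI for the chat API; kept only for backward compatibility.
    max_tokens: Optional[int] = None
    # Preferred replacement: caps the number of tokens generated in the completion.
    max_completion_tokens: Optional[int] = None

    def resolved_max_new_tokens(self) -> Optional[int]:
        # Prefer the new field; fall back to the legacy one if a client still sends it.
        if self.max_completion_tokens is not None:
            return self.max_completion_tokens
        return self.max_tokens
```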
```
$ jupyter nbconvert --to notebook --execute --inplace ./backend/openai_api_completions.ipynb \
    --ExecutePreprocessor.timeout=600 \
    --ExecutePreprocessor.kernel_name=python3 || exit 1;
...
AttributeError                            Traceback (most recent call last)
Cell In[9], line 58
     52 batch_details = client.batches.retrieve(batch_id=batch_job.id)
     54 print_highlight(
     55     f"Batch job details (check {i+1} / {max_checks}) // ID: {batch_details.id} // Status: {batch_details.status} // Created at: {batch_details.created_at} // Input file ID: {batch_details.input_file_id} // Output file ID: {batch_details.output_file_id}"
     56 )
     57 print_highlight(
---> 58     f"<strong>Request counts: Total: {batch_details.request_counts.total} // Completed: {batch_details.request_counts.completed} // Failed: {batch_details.request_counts.failed}</strong>"
     59 )
     61 time.sleep(3)
AttributeError: 'NoneType' object has no attribute 'total'
```
Is this a known issue, or is the error on my side?
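Not part of this PR, but one possible guard for that notebook cell (assuming the same `client`, `batch_job`, and `print_highlight` names the notebook already defines): `request_counts` can apparently be `None` while the batch is still being processed, so check it before reading the counters.

```python
batch_details = client.batches.retrieve(batch_id=batch_job.id)

# request_counts may be None before the batch has actually been processed,
# so print a placeholder instead of crashing on .total.
counts = batch_details.request_counts
if counts is not None:
    print_highlight(
        f"Request counts: Total: {counts.total} // "
        f"Completed: {counts.completed} // Failed: {counts.failed}"
    )
else:
    print_highlight("Request counts not available yet")
```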
Force-pushed from b80b526 to 82c817b.
Fixed, and
Force-pushed from 82c817b to e9aa94a.
Will review it today. Stay tuned!
zhaochenyang20 left a comment
Good improvement. I am wondering whether OpenAI still has `max_tokens` for the chat API right now. For the chat completion API, there should only be `max_completion_tokens`, but I don't know what the chat API uses.
python/sglang/lang/ir.py
Outdated
I agree! Nice work.
- Make it a full sentence: Non-chat-completion models only have max tokens.
- So chat completion models count `max_completion_tokens` and chat models (not completion models) count `max_tokens`, right?
- There's a nuanced difference between `non-chat-completion models only` and `Non-chat-completion models only have max tokens`, I'm afraid. Changed to `Only available for non-chat-completion models`; is that OK?
- Yes, to be more specific, in OpenAI's legacy completion API (non-chat completion models only), they only have `max_tokens`. In their chat completion API, they have both params, but:
replied in here
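For illustration (exact parameter support depends on the openai SDK version): the legacy Completions endpoint only takes `max_tokens`, while the Chat Completions endpoint now accepts `max_completion_tokens`.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Legacy Completions API: only max_tokens is available.
completion = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Say hello",
    max_tokens=32,
)

# Chat Completions API: max_tokens is deprecated in favor of max_completion_tokens.
chat = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello"}],
    max_completion_tokens=32,
)
```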
Force-pushed from 10ce5fd to db732ae.
@mickqian This should be rebased. Thanks 😂
Force-pushed from 473fa26 to 353c258.
@shuaills Could you review this? Thanks!
Force-pushed from 692e269 to f132950.
LGTM. @zhaochenyang20 Can you help rerun the CI?
@mickqian @shuaills Keep this CI here. Shuai, please approve at https://github.com/sgl-project/sglang/pull/3122/files and I will rerun the CI.
Force-pushed from 10c16b3 to 01dab60.
@mickqian will help to merge after the CI.
Replace it with a newer one: max_completion_tokens
Force-pushed from 01dab60 to aeba666.
@mickqian I've rerun the CI. Thanks so much. Do not rebase with main. We can merge this after the CI.
Is this PR gonna be merged soon?
@mickqian Can you do this first? Use other parts for different PRs. Do not mix them together. |
Motivation
Address #3098
Modifications
- Replace the `max_tokens` param with a newer one, `max_completion_tokens`, according to here.

Checklist