fix: Fix deprecated max_tokens param in openai ChatCompletionRequest #3122
Conversation
Force-pushed from 403eb6c to eed4b36.
Can you only update the
Many thanks for addressing #3098 so quickly, and FYI I've tested this branch and it does resolve the issue!
Force-pushed from eed4b36 to 91f7dba.
Updated. Removed the frontend part.
Force-pushed from 91f7dba to b48b900.
@mickqian I will take a look at this. Thanks!
python/sglang/lang/ir.py
Outdated
Wondering what the definition of `max_tokens` is for non-chat models? And why don't we keep `max_tokens` for non-chat models in the backend?
- The definition of `max_tokens` in completion models is the number of tokens that can be generated in the completion, pretty much the same as `max_completion_tokens`.
- Yes, we should keep both params. Updated (see the sketch below).
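Not from the PR itself — a minimal sketch of the mapping being discussed, using a hypothetical `SamplingParamsSketch` class with an assumed `max_new_tokens` field; in the real code this role belongs to `SglSamplingParams` in `python/sglang/lang/ir.py`.

```python
from dataclasses import dataclass


@dataclass
class SamplingParamsSketch:
    # Hypothetical stand-in for SglSamplingParams; field names here are assumptions.
    max_new_tokens: int = 128
    temperature: float = 1.0
    top_p: float = 1.0

    def to_openai_kwargs(self, is_chat_model: bool) -> dict:
        kwargs = {"temperature": self.temperature, "top_p": self.top_p}
        if is_chat_model:
            # Chat Completions API: max_tokens is deprecated in favor of max_completion_tokens.
            kwargs["max_completion_tokens"] = self.max_new_tokens
        else:
            # Legacy Completions API: only max_tokens exists.
            kwargs["max_tokens"] = self.max_new_tokens
        return kwargs
```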
Hey. For embedding models, "max_tokens" means the max sequence length that can be processed. What if the input exceeds that length? Should the sequence be truncated, or should we throw an error? The same question applies to chat models. Also, I personally think we should call them generation models and embedding models; that's what we typically call these models.
- For embedding models, I think both ways will do, depending on the situation. Providing an option sounds good too.
- Yes, `completion model` and `chat completion model` both fall into the category of `generation model`; I thought you were referring to `completion model`. The `is_chat_model` is used to distinguish different generation models, if I'm correct. For sglang.lang, does it involve embedding models (or did I miss something)? If not, probably `is_chat_model` would suffice for generation models in the backend.
Okay. This sounds fine to me. But, ideally:
```python
def to_openai_kwargs(self, is_chat_model):
```
You mean `is_chat_model` is an attribute of `class OpenAI(BaseBackend)` in python/sglang/lang/backend/openai.py rather than an attribute of `class SglSamplingParams` in python/sglang/lang/ir.py?
So this function is not:
```python
def to_openai_kwargs(self):
```
Right? That makes sense. But I prefer the latter one if we can.
Yes, it's supposed to be an internal property of a `BaseBackend` (as it directly describes the model), passed to `SglSamplingParams` for generating an OpenAI-compatible request.
While adding this field to `SglSamplingParams` does sound good in some cases, I personally reckon `SglSamplingParams` is meant to be model-unaware data that can be sent to different backends, letting the actual backend decide the final OpenAI request. Feel free to correct me.
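A rough sketch of that split, reusing the hypothetical `SamplingParamsSketch` from the earlier snippet: the backend owns the model-specific `is_chat_model` flag, and the sampling params stay model-unaware until the backend resolves them into OpenAI kwargs.

```python
class OpenAIBackendSketch:
    # Hypothetical backend; in the real code this role belongs to
    # OpenAI(BaseBackend) in python/sglang/lang/backend/openai.py.
    def __init__(self, model_name: str, is_chat_model: bool):
        self.model_name = model_name
        # The backend, not the sampling params, knows which API the model speaks.
        self.is_chat_model = is_chat_model

    def build_request_kwargs(self, params: SamplingParamsSketch) -> dict:
        # Model-unaware params are resolved into an OpenAI-compatible request here.
        return params.to_openai_kwargs(self.is_chat_model)


# Usage: the same params object can be sent to chat and non-chat backends.
params = SamplingParamsSketch(max_new_tokens=64)
chat_backend = OpenAIBackendSketch("gpt-4o-mini", is_chat_model=True)
legacy_backend = OpenAIBackendSketch("gpt-3.5-turbo-instruct", is_chat_model=False)
print(chat_backend.build_request_kwargs(params))    # uses max_completion_tokens
print(legacy_backend.build_request_kwargs(params))  # uses max_tokens
```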
I agree! Nice work.
Force-pushed from b48b900 to b80b526.
zhaochenyang20 left a comment
- I think we should add a comment in `protocol.py`, or somewhere appropriate, about the definition of `max_completion_tokens` and how it differs from the previous `max_tokens` (a rough sketch follows below).
- Could you run the docs CI locally? Just `make compile` is enough. The docs CI is currently disabled due to the long queue time on CI, but we should run it locally.
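On the first point, a sketch of what such a documented schema could look like — the class name, fields, and defaults are assumptions for illustration, not the repo's actual `protocol.py`:

```python
from typing import List, Optional

from pydantic import BaseModel


class ChatCompletionRequestSketch(BaseModel):
    # Hypothetical subset of a chat completion request schema.
    model: str
    messages: List[dict]

    # Deprecated by OpenAI for the chat API; kept only for backward compatibility.
    max_tokens: Optional[int] = None
    # Preferred replacement: caps the number of tokens generated in the completion.
    max_completion_tokens: Optional[int] = None

    def resolved_max_new_tokens(self) -> Optional[int]:
        # Prefer the new field; fall back to the legacy one if a client still sends it.
        if self.max_completion_tokens is not None:
            return self.max_completion_tokens
        return self.max_tokens
```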
```
$ jupyter nbconvert --to notebook --execute --inplace ./backend/openai_api_completions.ipynb \
    --ExecutePreprocessor.timeout=600 \
    --ExecutePreprocessor.kernel_name=python3 || exit 1;
...
AttributeError                            Traceback (most recent call last)
Cell In[9], line 58
     52 batch_details = client.batches.retrieve(batch_id=batch_job.id)
     54 print_highlight(
     55     f"Batch job details (check {i+1} / {max_checks}) // ID: {batch_details.id} // Status: {batch_details.status} // Created at: {batch_details.created_at} // Input file ID: {batch_details.input_file_id} // Output file ID: {batch_details.output_file_id}"
     56 )
     57 print_highlight(
---> 58     f"<strong>Request counts: Total: {batch_details.request_counts.total} // Completed: {batch_details.request_counts.completed} // Failed: {batch_details.request_counts.failed}</strong>"
     59 )
     61 time.sleep(3)
AttributeError: 'NoneType' object has no attribute 'total'
```
Is this a known issue, or is the error on my side?
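Not part of this PR, but one possible guard for that notebook cell (assuming the same `client`, `batch_job`, and `print_highlight` names the notebook already defines): `request_counts` can apparently be `None` while the batch is still being processed, so check it before reading the counters.

```python
batch_details = client.batches.retrieve(batch_id=batch_job.id)

# request_counts may be None before the batch has actually been processed,
# so print a placeholder instead of crashing on .total.
counts = batch_details.request_counts
if counts is not None:
    print_highlight(
        f"Request counts: Total: {counts.total} // "
        f"Completed: {counts.completed} // Failed: {counts.failed}"
    )
else:
    print_highlight("Request counts not available yet")
```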
Force-pushed from b80b526 to 82c817b.
Fixed, and
Force-pushed from 82c817b to e9aa94a.
Will review it today. Stay tuned!
zhaochenyang20 left a comment
Good improvement. I am wondering whether OpenAI still has `max_tokens` for the chat API right now. For the chat completion API, there should only be `max_completion_tokens`, but I don't know what the chat API uses.
python/sglang/lang/ir.py
Outdated
I agree! Nice work.
- Make it a full sentence: Non-chat-completion models only have max tokens.
- So chat completion models count `max_completion_tokens` and chat models (not completion models) count `max_tokens`, right?
- There's a nuanced difference between `non-chat-completion models only` and `Non-chat-completion models only have max tokens`, I'm afraid. Changed to `Only available for non-chat-completion models`; is that OK?
- Yes, to be more specific, in OpenAI's legacy completion API (non-chat completion models only), they only have `max_tokens`. In their chat completion API, they have both params, but:
replied in here
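For illustration (exact parameter support depends on the openai SDK version): the legacy Completions endpoint only takes `max_tokens`, while the Chat Completions endpoint now accepts `max_completion_tokens`.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Legacy Completions API: only max_tokens is available.
completion = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Say hello",
    max_tokens=32,
)

# Chat Completions API: max_tokens is deprecated in favor of max_completion_tokens.
chat = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello"}],
    max_completion_tokens=32,
)
```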
Force-pushed from 10ce5fd to db732ae.
@mickqian This should be rebased. Thanks 😂
Force-pushed from 473fa26 to 353c258.
@shuaills Could you review this? Thanks!
Force-pushed from 692e269 to f132950.
LGTM. @zhaochenyang20 Can you help rerun the CI?
@mickqian @shuaills Keep this CI here. Shuai, please approve at https://github.com/sgl-project/sglang/pull/3122/files and I will rerun the CI.
Force-pushed from 10c16b3 to 01dab60.
@mickqian will help to merge after the CI.
Replace it with a newer one: max_completion_tokens
Force-pushed from 01dab60 to aeba666.
@mickqian I've rerun the CI. Thanks so much. Do not rebase with main. We can merge this after the CI.
Is this PR gonna be merged soon?
@mickqian Can you do this first? Use other parts for different PRs. Do not mix them together. |
Motivation
Address #3098
Modifications
- Replace the `max_tokens` param with a newer one, `max_completion_tokens`, according to here.

Checklist