
[Tokenizer] Add an option to specify tokenizer #284

Merged
WoosukKwon merged 13 commits into main from custom-tokenizer
Jun 28, 2023

Conversation

@WoosukKwon
Collaborator

@WoosukKwon WoosukKwon commented Jun 28, 2023

Fixes #111 #246 #259 #270 #281

This PR adds tokenizer to the input/CLI arguments. If it is None, vLLM uses the model name/path as the tokenizer name/path. In addition, starting with this PR, vLLM no longer uses hf-internal-testing/llama-tokenizer as the default tokenizer for LLaMA models.
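As a rough sketch of the fallback described here (the argument names follow the PR description; this is not vLLM's actual engine-argument parser code):

```python
import argparse

# Hypothetical sketch: argument names follow the PR description, not
# necessarily vLLM's actual parser.
parser = argparse.ArgumentParser()
parser.add_argument("--model", type=str, default="facebook/opt-125m")
parser.add_argument("--tokenizer", type=str, default=None)

args = parser.parse_args(["--model", "huggyllama/llama-7b"])
# If --tokenizer is not given, fall back to the model name/path.
tokenizer_name = args.tokenizer if args.tokenizer is not None else args.model
print(tokenizer_name)  # huggyllama/llama-7b
```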

@WoosukKwon WoosukKwon requested a review from zhuohan123 June 28, 2023 08:31
@sleepcoo
Contributor

This PR is very useful; in my local tests I always had to hard-code the tokenizer path.

Member

@zhuohan123 zhuohan123 left a comment


LGTM! Thanks for the great work!

Comment on lines +20 to +25
if "open_llama" in tokenizer_name.lower() and kwargs.get("use_fast", True):
    # OpenLLaMA models do not support the fast tokenizer.
    kwargs["use_fast"] = False
    logger.info(
        "OpenLLaMA models do not support the fast tokenizer. "
        "Using the slow tokenizer instead.")
elif config.model_type == "llama" and kwargs.get("use_fast", True):
    # The LLaMA fast tokenizer causes protobuf errors in some environments.
    # However, we found that the below LLaMA fast tokenizer works well in
    # most environments.
    model_name = "hf-internal-testing/llama-tokenizer"
    logger.info(
        f"Using the LLaMA fast tokenizer in '{model_name}' to avoid "
        "potential protobuf errors.")
elif config.model_type in _MODEL_TYPES_WITH_SLOW_TOKENIZER:
    if kwargs.get("use_fast", False):
        raise ValueError(
            f"Cannot use the fast tokenizer for {config.model_type} due to "
            "bugs in the fast tokenizer.")
    logger.info(
        f"Using the slow tokenizer for {config.model_type} due to bugs in "
        "the fast tokenizer. This could potentially lead to performance "
        "degradation.")
    kwargs["use_fast"] = False
return AutoTokenizer.from_pretrained(model_name, *args, **kwargs)

The PR replaces the hard-coded default with a warning:

logger.info(
    "For some LLaMA-based models, initializing the fast tokenizer may "
    "take a long time. To eliminate the initialization time, consider "
    f"using '{_FAST_LLAMA_TOKENIZER}' instead of the original "
    "tokenizer.")
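Distilled from the branch logic above, a minimal self-contained sketch of how the tokenizer name and use_fast flag are selected (the function name and the contents of the slow-tokenizer list are assumptions here, and the actual Hugging Face loading call is stubbed out):

```python
# Illustrative sketch only: the slow-tokenizer model-type list is a
# placeholder, and no tokenizer is actually loaded.
_FAST_LLAMA_TOKENIZER = "hf-internal-testing/llama-tokenizer"
_MODEL_TYPES_WITH_SLOW_TOKENIZER = ("chatglm",)  # placeholder contents

def select_tokenizer(model_type: str, tokenizer_name: str, **kwargs):
    """Return (tokenizer_name, use_fast) following the branches above."""
    use_fast = kwargs.get("use_fast", True)
    if model_type == "llama" and use_fast:
        # Swap in the known-good fast tokenizer to avoid protobuf errors.
        return _FAST_LLAMA_TOKENIZER, True
    if model_type in _MODEL_TYPES_WITH_SLOW_TOKENIZER:
        if kwargs.get("use_fast", False):
            raise ValueError(
                f"Cannot use the fast tokenizer for {model_type}.")
        # Fall back to the slow tokenizer despite the performance cost.
        return tokenizer_name, False
    return tokenizer_name, use_fast
```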
Member


After this PR, do we need to manually specify the LLaMA fast tokenizer for benchmarking?

Collaborator Author


It depends. Actually, the LLaMA fast tokenizers in lmsys/vicuna-7b-v1.3 or huggyllama/llama-7b work in my Docker environment, so hf-internal-testing/llama-tokenizer is not needed when I use vLLM there.

@WoosukKwon WoosukKwon merged commit 4338cc4 into main Jun 28, 2023
@WoosukKwon WoosukKwon deleted the custom-tokenizer branch June 28, 2023 16:47
@929359291

Wow, this is cool. Bro, you are so cool!

@sunyuhan19981208

THANKS VERY MUCH!

michaelfeil pushed a commit to michaelfeil/vllm that referenced this pull request Jul 1, 2023
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
yukavio pushed a commit to yukavio/vllm that referenced this pull request Jul 3, 2024
SUMMARY:
* only run 4 x A10 tests for Python 3.10.12

NOTE: AWS looks to be having availability issues with these instances.
I'm day-to-day on this repo being migrated to GCP, so in the meantime
let's reduce demand.

TEST PLAN:
n/a

Co-authored-by: andy-neuma <andy@neuralmagic.com>
billishyahao pushed a commit to billishyahao/vllm that referenced this pull request Dec 31, 2024
* fix CUDA compilation

* check out tuned GEMM from develop
dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request Dec 11, 2025
Sync to upstream's
[v0.11.0](https://github.com/vllm-project/vllm/releases/tag/v0.11.0)
release + a cherry pick of
vllm-project#24768

This PR targets CUDA but may also be sufficient for ROCm.

Dockerfile updates:
- general updates to match upstream's Dockerfile
- nvcc, nvrtc and cuobjdump were added for deepgemm JIT requirements:
neuralmagic/nm-vllm-ent@2a545c8
- missing paths were added for triton JIT:
neuralmagic/nm-vllm-ent@b3027fc

Tests:
Branch in nm-cicd:
https://github.com/neuralmagic/nm-cicd/tree/sync-v0.11-cuda
accept-sync:
https://github.com/neuralmagic/nm-cicd/actions/runs/18270550524 --
please ignore unit tests, they need to be updated to v1.
Image tested: quay.io/vllm/automation-vllm:cuda-18270550524
Image validation:
https://github.com/neuralmagic/nm-cicd/actions/runs/18271507914
Whisper runs:
https://github.com/neuralmagic/nm-cicd/actions/runs/18281815955/job/52046560584
https://github.com/neuralmagic/nm-cicd/actions/runs/18281511979
mickg10 pushed a commit to mickg10/vllm that referenced this pull request Feb 11, 2026
(cherry picked from commit 0c8ef2a)

Signed-off-by: Salar <skhorasgani@tenstorrent.com>


5 participants