Async tokenization using thread pool #3206
Closed

njhill wants to merge 4 commits into vllm-project:main from
Conversation
Collaborator

Q: Is using a thread pool actually helping performance? I was curious mainly because of the GIL (and I suspect tokenization is CPU-bound).
It's true. Maybe we could just use a process pool instead.
Member (Author)
@rkooo567 @nickshawn Most of the HuggingFace tokenizers (in particular those for the most prominent models) use "fast" Rust-based implementations, which don't hold the GIL.
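This is why a thread pool can help despite the GIL: while one thread is inside a GIL-releasing encode call, the event loop keeps running. A minimal sketch of the pattern, assuming a plain `tokenize` function as a stand-in for a fast tokenizer's encode (the pool size of 4 is arbitrary, not the PR's default):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a Rust-backed "fast" HF tokenizer encode call; the real
# thing releases the GIL while encoding, which makes threads worthwhile.
def tokenize(text: str) -> list[str]:
    return text.split()

# Arbitrary pool size for illustration; vLLM derives its default from
# the available CPU cores and the tensor parallel size.
_pool = ThreadPoolExecutor(max_workers=4)

async def tokenize_async(text: str) -> list[str]:
    # Run the encode call off the asyncio event-loop thread.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_pool, tokenize, text)

async def main() -> None:
    results = await asyncio.gather(
        *(tokenize_async(t) for t in ["hello world", "async tokenization"])
    )
    print(results)

if __name__ == "__main__":
    asyncio.run(main())
```

With a pure-Python tokenizer the pool would mostly serialize on the GIL, which is the concern raised above; the fast tokenizers avoid that.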
joerunde pushed a commit to IBM/vllm that referenced this pull request on Mar 11, 2024
joerunde pushed a commit to IBM/vllm that referenced this pull request on Mar 11, 2024

Co-Authored-By: Nick Hill <nickhill@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
joerunde pushed a commit to IBM/vllm that referenced this pull request on Mar 12, 2024

Co-Authored-By: Nick Hill <nickhill@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
njhill added a commit to njhill/vllm that referenced this pull request on Mar 19, 2024
vllm-project#2879 added support for using ray to offload tokenization from the asyncio event loop. This PR extends that to support using a thread pool instead of ray, and makes that the default, with the default pool size determined based on the number of available CPU cores and the tensor parallel size.

The main thing to note is that separate tokenizer instances are used per thread. This is because officially the HF tokenizers are not thread-safe. In practice I think they are unless you're making use of padding/truncation, which we aren't currently but may want to soon (see for example vllm-project#3144).

Also includes some type hint additions to related parts of the code.

This replaces the original PR vllm-project#3206 from before vllm-project#2879 was reworked and merged.
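The commit message says the default pool size is derived from the available CPU cores and the tensor parallel size; a heuristic of that shape could look like the sketch below. Both the function name and the exact formula are illustrative assumptions, not the PR's actual code:

```python
import os

def default_tokenizer_pool_size(tensor_parallel_size: int) -> int:
    # Hypothetical heuristic: divide the available cores among the
    # tensor-parallel workers, keeping at least one tokenizer thread.
    # The real formula in the PR may differ.
    cpus = os.cpu_count() or 1
    return max(1, cpus // tensor_parallel_size)
```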
@Yard1's open PR #2879 uses ray to offload tokenization from the asyncio event loop.
This PR extends that to support using a thread pool instead of ray. Here is the diff showing just the newly added commits (note that I also rebased onto the latest main branch).
The main thing to note is that separate tokenizer instances are used per thread. This is because officially the HF tokenizers are not thread-safe. In practice I think they are unless you're making use of padding/truncation, which we aren't currently but may want to soon (see for example #3144).
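One common way to get a separate tokenizer instance per worker thread is `threading.local`. The sketch below uses a trivial stand-in class rather than a real HF tokenizer and illustrates the pattern only, not the PR's implementation:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class _SimpleTokenizer:
    """Stand-in for an HF tokenizer; each thread gets its own instance."""
    def encode(self, text: str) -> list[str]:
        return text.split()

_local = threading.local()

def _get_tokenizer() -> _SimpleTokenizer:
    # Lazily create one tokenizer per worker thread, since HF tokenizers
    # are not officially thread-safe (notably around padding/truncation).
    tok = getattr(_local, "tokenizer", None)
    if tok is None:
        tok = _local.tokenizer = _SimpleTokenizer()
    return tok

def encode(text: str) -> list[str]:
    return _get_tokenizer().encode(text)

pool = ThreadPoolExecutor(max_workers=2)
results = list(pool.map(encode, ["a b", "c d e"]))
print(results)  # [['a', 'b'], ['c', 'd', 'e']]
```

Each worker thread lazily builds its own tokenizer on first use, so no instance is ever shared across threads.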