[Spyre-Next] Add RowParallelLinear and ColumnParallelLinear(MLP) wrappers#869

Merged
bohnstingl merged 3 commits into torch-spyre:main from nikheal2:wrap_mlp_layer
Apr 2, 2026
Conversation

Collaborator

@R3hankhan123 commented Mar 26, 2026

Description

Add RowParallelLinear and ColumnParallelLinear wrappers for torch-spyre; these act as the up-projection and down-projection layers in the MLP block.
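These wrappers follow the standard tensor-parallel MLP sharding pattern: the up projection is column-parallel (output dimension sharded, shards gathered by concatenation) and the down projection is row-parallel (input dimension sharded, partial outputs summed by an all-reduce). A minimal single-process sketch of that pattern, with illustrative names and shapes and the activation/gating between the projections omitted for brevity — this is not the actual torch-spyre API:

```python
# Hypothetical single-process sketch of the tensor-parallel sharding pattern
# behind ColumnParallelLinear (up projection) and RowParallelLinear (down
# projection). Shards are simulated with torch.chunk instead of real ranks.
import torch

torch.manual_seed(0)
tp = 2                       # simulated tensor-parallel world size
hidden, intermediate = 8, 16

x = torch.randn(4, hidden)
w_up = torch.randn(intermediate, hidden)    # up projection weight
w_down = torch.randn(hidden, intermediate)  # down projection weight

# Column-parallel: shard the output dim; each "rank" holds a row-slice of
# w_up, and the sharded activations are gathered by concatenation.
up_shards = torch.chunk(w_up, tp, dim=0)
h = torch.cat([x @ w.T for w in up_shards], dim=-1)

# Row-parallel: shard the input dim; each "rank" produces a partial output,
# and the sum over shards plays the role of the all-reduce.
down_shards = torch.chunk(w_down, tp, dim=1)
h_shards = torch.chunk(h, tp, dim=-1)
y = sum(hs @ w.T for hs, w in zip(h_shards, down_shards))

# The sharded computation matches the unsharded two-layer projection.
ref = (x @ w_up.T) @ w_down.T
assert torch.allclose(y, ref, atol=1e-5)
```

In a real deployment each shard lives on a different device, the concatenation becomes an all-gather, and the sum becomes an all-reduce; here everything runs on one process purely to show the math.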

Related Issues

Contributes towards #736

Test Plan

  1. Run the script in the examples folder and check that output is generated
  2. Run the script provided by Thomas Ortner, with wrapping removed for all other layers so that only the linear layers are wrapped

Test Result

  1. Output of examples/torch_spyre_inference.py
(EngineCore pid=25709) INFO 03-24 13:24:30 [parallel_state.py:1716] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A, EPLB rank N/A
(EngineCore pid=25709) INFO 03-24 13:24:30 [cpu_model_runner.py:62] Starting to load model ibm-ai-platform/micro-g3.3-8b-instruct-1b...
(EngineCore pid=25709) WARNING 03-24 13:24:30 [linear.py:165] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=25709) WARNING 03-24 13:24:30 [linear.py:69] SpyreMergedColumnParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=25709) WARNING 03-24 13:24:30 [linear.py:165] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=25709) WARNING 03-24 13:24:30 [linear.py:165] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=25709) WARNING 03-24 13:24:30 [linear.py:69] SpyreMergedColumnParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=25709) WARNING 03-24 13:24:30 [linear.py:165] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=25709) WARNING 03-24 13:24:30 [linear.py:165] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=25709) WARNING 03-24 13:24:30 [linear.py:69] SpyreMergedColumnParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=25709) WARNING 03-24 13:24:30 [linear.py:165] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=25709) WARNING 03-24 13:24:30 [linear.py:165] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=25709) WARNING 03-24 13:24:30 [linear.py:69] SpyreMergedColumnParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=25709) WARNING 03-24 13:24:30 [linear.py:165] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=25709) INFO 03-24 13:24:30 [weight_utils.py:618] No model.safetensors.index.json found in remote.
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 15.63it/s]
(EngineCore pid=25709) 
(EngineCore pid=25709) INFO 03-24 13:24:30 [default_loader.py:384] Loading weights took 0.13 seconds
(EngineCore pid=25709) INFO 03-24 13:24:30 [kv_cache_utils.py:1319] GPU KV cache size: 16,507,392 tokens
(EngineCore pid=25709) INFO 03-24 13:24:30 [kv_cache_utils.py:1324] Maximum concurrency for 2,048 tokens per request: 8060.25x
(EngineCore pid=25709) INFO 03-24 13:24:33 [cpu_model_runner.py:73] Warming up model for the compilation...
(EngineCore pid=25709) INFO 03-24 13:25:22 [decorators.py:638] saved AOT compiled function to /home/rehankhan/.cache/vllm/torch_compile_cache/torch_aot_compile/c2955d84292fa86695de8dbd486acee55c6111d9f30c6092f4f227d27e0e5512/rank_0_0/model
(EngineCore pid=25709) INFO 03-24 13:25:34 [monitor.py:76] Initial profiling/warmup run took 11.14 s
(EngineCore pid=25709) INFO 03-24 13:25:34 [cpu_model_runner.py:83] Warming up done.
(EngineCore pid=25709) INFO 03-24 13:25:34 [core.py:283] init engine (profile, create kv cache, warmup model) took 63.47 seconds
(EngineCore pid=25709) WARNING 03-24 13:25:34 [scheduler.py:173] Using custom scheduler class vllm.v1.core.sched.scheduler.Scheduler. This scheduler interface is not public and compatibility may not be maintained.
(EngineCore pid=25709) INFO 03-24 13:25:35 [vllm.py:750] Asynchronous scheduling is disabled.
(EngineCore pid=25709) WARNING 03-24 13:25:35 [vllm.py:806] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(EngineCore pid=25709) INFO 03-24 13:25:35 [platform.py:103] Loading scheduler from: vllm.v1.core.sched.scheduler.Scheduler
(EngineCore pid=25709) WARNING 03-24 13:25:35 [cpu.py:136] VLLM_CPU_KVCACHE_SPACE not set. Using 251.88 GiB for KV cache.
INFO 03-24 13:25:35 [llm.py:391] Supported tasks: ['generate']
=============== GENERATE
Rendering prompts: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 38.33it/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [01:58<00:00, 39.60s/it, est. speed input: 1.49 toks/s, output: 0.88 toks/s]
Time elaspsed for 20 tokens is 118.88 sec
===============
CompletionOutput(index=0, text='\n\nThe response is a 2-3 page document that describes the task.\n\n###', token_ids=[203, 203, 1318, 1789, 438, 312, 225, 36, 31, 37, 1938, 1825, 688, 18872, 322, 2899, 32, 203, 203, 1482], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=length, stop_reason=None)
CompletionOutput(index=0, text='\n\n1. The user will receive a list of instructions for preparing chicken soup for a family.\n2. The user will receive a list of instructions for preparing chicken soup for a family.\n3. The user will receive a list of instructions for preparing chicken soup for a family.\n', token_ids=[203, 203, 35, 32, 886, 1256, 1098, 7768, 312, 1149, 432, 9400, 436, 1406, 26124, 663, 21217, 31628, 436, 312, 13872, 32, 203, 36, 32, 886, 1256, 1098, 7768, 312, 1149, 432, 9400, 436, 1406, 26124, 663, 21217, 31628, 436, 312, 13872, 32, 203, 37, 32, 886, 1256, 1098, 7768, 312, 1149, 432, 9400, 436, 1406, 26124, 663, 21217, 31628, 436, 312, 13872, 32, 203], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=length, stop_reason=None)
CompletionOutput(index=0, text='\n\nThe user is a ghoul.\n\n### Instruction:\n\nThe user is a', token_ids=[203, 203, 1318, 1256, 438, 312, 28472, 825, 32, 203, 203, 1482, 21081, 44, 203, 203, 1318, 1256, 438, 312], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=length, stop_reason=None)
===============


Prompt:
 'Below is an instruction that describes a task. Write a response that appropriately completes the request. Be polite in your response to the user.\n\n### Instruction:\nProvide instructions for preparing chicken soup.\n\n### Response:'


Generated text:
 '\n\nThe response is a 2-3 page document that describes the task.\n\n###'


-----------------------------------


Prompt:
 'Below is an instruction that describes a task. Write a response that appropriately completes the request. Be polite in your response to the user.\n\n### Instruction:\nProvide a list of instructions for preparing chicken soup for a family.\n\n### Response:'


Generated text:
 '\n\n1. The user will receive a list of instructions for preparing chicken soup for a family.\n2. The user will receive a list of instructions for preparing chicken soup for a family.\n3. The user will receive a list of instructions for preparing chicken soup for a family.\n'


-----------------------------------


Prompt:
 "Below is an instruction that describes a task. Write a response that appropriately completes the request. Be polite in your response to the user.\n\n### Instruction:\nYou are Kaneki Ken from 'Tokyo Ghoul.' Describe what it feels like to be both human and ghoul to someone unfamiliar with your world.\n\n### Response:"


Generated text:
 '\n\nThe user is a ghoul.\n\n### Instruction:\n\nThe user is a'


-----------------------------------
(EngineCore pid=25709) INFO 03-24 13:27:33 [core.py:1210] Shutdown initiated (timeout=0)
(EngineCore pid=25709) INFO 03-24 13:27:33 [core.py:1233] Shutdown complete 
  2. Output of the script provided by Thomas Ortner
INFO 03-23 06:59:32 [llm.py:343] Supported tasks: ['generate']
Adding requests: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 45.31it/s]
Processed prompts:   0%|                                                                        | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]INFO 03-23 07:00:10 [loggers.py:257] Engine 000: Avg prompt throughput: 0.2 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
INFO 03-23 07:00:28 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
INFO 03-23 07:00:46 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
INFO 03-23 07:01:03 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
Processed prompts: 100%|████████████████████████████████████████████████████████████████| 1/1 [01:31<00:00, 91.43s/it, est. speed input: 0.09 toks/s, output: 0.05 toks/s]
--------------------------------------------------
Generated text: '\n\nIBM operates'
--------------------------------------------------
vllm:kv_cache_usage_perc 0.0
vllm:prefix_cache_queries 8
vllm:prefix_cache_hits 0
vllm:external_prefix_cache_queries 0
vllm:external_prefix_cache_hits 0
vllm:mm_cache_queries 0
vllm:mm_cache_hits 0
vllm:cache_config_info 1.0
Signal Received: 15 (Terminated)
Signal Received from pid=12470 

Checklist

  • I have read the contributing guidelines
  • My code follows the project's code style (run bash format.sh)
  • I have added tests for my changes (if applicable)
  • I have updated the documentation (if applicable)
  • My commits include a Signed-off-by: line (DCO compliance)

@R3hankhan123 requested a review from bohnstingl March 26, 2026 12:52
@R3hankhan123 linked an issue Mar 26, 2026 that may be closed by this pull request
@github-actions

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR can't be merged. To do so, run ./format.sh.
Now you are good to go 🚀.

We also recommend installing prek and configuring it to check your code before every local commit.

@R3hankhan123 changed the title [Spyre-Next] Add RowParallelLinear and ColumnParallelLinear wrappers → [Spyre-Next] Add RowParallelLinear and ColumnParallelLinear(MLP) wrappers Mar 26, 2026
Collaborator

@bohnstingl left a comment

Thank you @R3hankhan123 for the PR.
In principle it looks good to me. I am just wondering whether we should simplify the code for the moment by de-duplicating identical functions until we really have a need to specialize them.

Also, could you run an end-to-end test and see whether the Granite3.3-8B model works and produces tokens?

Comment thread vllm_spyre_next/vllm_spyre_next/custom_ops/linear.py Outdated
Comment thread vllm_spyre_next/vllm_spyre_next/custom_ops/linear.py Outdated
Comment thread vllm_spyre_next/vllm_spyre_next/custom_ops/linear.py Outdated
Comment thread vllm_spyre_next/vllm_spyre_next/custom_ops/linear.py
Comment thread vllm_spyre_next/vllm_spyre_next/custom_ops/linear.py Outdated
@R3hankhan123
Collaborator Author

Also, @bohnstingl, I ran a test on the Granite3.3-8B model and here is the output:

INFO 03-27 05:38:39 [__init__.py:44] Available plugins for group vllm.platform_plugins:
INFO 03-27 05:38:39 [__init__.py:46] - spyre_next -> vllm_spyre_next:register
INFO 03-27 05:38:39 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 03-27 05:38:40 [__init__.py:239] Platform plugin spyre_next is activated
INFO 03-27 05:38:41 [importing.py:44] Triton is installed but 0 active driver(s) found (expected 1). Disabling Triton to prevent runtime errors.
INFO 03-27 05:38:41 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
INFO 03-27 05:38:45 [utils.py:233] non-default args: {'max_model_len': 2048, 'enable_prefix_caching': True, 'model': 'ibm-granite/granite-3.3-8b-instruct'}
INFO 03-27 05:38:45 [model.py:540] Resolved architecture: GraniteForCausalLM
INFO 03-27 05:38:45 [model.py:1607] Using max model len 2048
WARNING 03-27 05:38:45 [cpu.py:136] VLLM_CPU_KVCACHE_SPACE not set. Using 251.88 GiB for KV cache.
INFO 03-27 05:38:45 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=16384.
INFO 03-27 05:38:45 [vllm.py:750] Asynchronous scheduling is enabled.
INFO 03-27 05:38:45 [platform.py:74] 
INFO 03-27 05:38:45 [platform.py:74]        █     █     █▄   ▄█       ▄█▀▀█▄  █▀▀▀█▄  █   █  █▀▀▀█▄  █▀▀▀▀
INFO 03-27 05:38:45 [platform.py:74]  ▄▄ ▄█ █     █     █ ▀▄▀ █       ▀▀▄▄▄   █▄▄▄█▀  ▀▄ ▄▀  █▄▄▄█▀  █▄▄▄   version 0.1.dev532
INFO 03-27 05:38:45 [platform.py:74]   █▄█▀ █     █     █     █            █  █        ▀█▀   █ ▀█▄   █      model   ibm-granite/granite-3.3-8b-instruct
INFO 03-27 05:38:45 [platform.py:74]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀       ▀▄▄▄█▀  █         █    █   ▀█  █▄▄▄▄
INFO 03-27 05:38:45 [platform.py:74] 
INFO 03-27 05:38:45 [platform.py:88] Loading worker from: vllm_spyre_next.v1.worker.spyre_worker.TorchSpyreWorker
INFO 03-27 05:38:45 [platform.py:103] Loading scheduler from: vllm.v1.core.sched.scheduler.Scheduler
INFO 03-27 05:38:52 [__init__.py:44] Available plugins for group vllm.platform_plugins:
INFO 03-27 05:38:52 [__init__.py:46] - spyre_next -> vllm_spyre_next:register
INFO 03-27 05:38:52 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 03-27 05:38:52 [__init__.py:239] Platform plugin spyre_next is activated
INFO 03-27 05:38:53 [importing.py:44] Triton is installed but 0 active driver(s) found (expected 1). Disabling Triton to prevent runtime errors.
INFO 03-27 05:38:53 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
(EngineCore pid=34207) INFO 03-27 05:38:55 [core.py:105] Initializing a V1 LLM engine (v0.18.1rc1.dev53+gffb5b32b5.d20260324) with config: model='ibm-granite/granite-3.3-8b-instruct', speculative_config=None, tokenizer='ibm-granite/granite-3.3-8b-instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cpu, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=ibm-granite/granite-3.3-8b-instruct, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'mode': <CompilationMode.DYNAMO_TRACE_ONCE: 2>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': [], 'compile_mm_encoder': False, 'cudagraph_mm_encoder': False, 'encoder_cudagraph_token_budgets': [], 'encoder_cudagraph_max_images_per_batch': 0, 'compile_sizes': None, 'compile_ranges_endpoints': [16384], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True, 'dce': True, 'size_asserts': 
False, 'nan_asserts': False, 'epilogue_fusion': True, 'cpp.dynamic_threads': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.NONE: 0>, 'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': [], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': None, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []}
(EngineCore pid=34207) INFO 03-27 05:38:57 [__init__.py:10] Registering custom ops for spyre_next
(EngineCore pid=34207) INFO 03-27 05:38:57 [linear.py:153] Registered custom op: spyre_merged_col_linear
(EngineCore pid=34207) INFO 03-27 05:38:57 [linear.py:153] Registered custom op: spyre_row_parallel_linear
(EngineCore pid=34207) WARNING 03-27 05:38:57 [cpu_worker.py:60] libtcmalloc is not found in LD_PRELOAD. For best performance, please follow the section `set LD_PRELOAD` in https://docs.vllm.ai/en/latest/getting_started/installation/cpu/ to setup required pre-loaded libraries.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [cpu_worker.py:60] libiomp is not found in LD_PRELOAD. For best performance, please follow the section `set LD_PRELOAD` in https://docs.vllm.ai/en/latest/getting_started/installation/cpu/ to setup required pre-loaded libraries.
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:227] auto thread-binding list (id, physical core): [(96, 0), (97, 1), (98, 2), (99, 3), (100, 4), (101, 5), (102, 6), (103, 7), (104, 8), (105, 9), (106, 10), (107, 11), (108, 12), (109, 13), (110, 14), (111, 15), (112, 16), (113, 17), (114, 18), (115, 19), (116, 20), (117, 21), (118, 22), (119, 23), (120, 24), (121, 25), (122, 26), (123, 27), (124, 28), (125, 29), (126, 30), (127, 31), (128, 32), (129, 33), (130, 34), (131, 35), (132, 36), (133, 37), (134, 38), (135, 39), (136, 40), (137, 41), (138, 42), (139, 43), (140, 44), (141, 45), (142, 46), (143, 47)]
[W327 05:38:57.788271336 utils.cpp:76] Warning: numa_migrate_pages failed. errno: 1 (function init_cpu_threads_env)
[W327 05:38:57.788283773 utils.cpp:103] Warning: NUMA binding: Using MEMBIND policy for memory allocation on the NUMA nodes (0). Memory allocations will be strictly bound to these NUMA nodes. (function init_cpu_threads_env)
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] OMP threads binding of Process 34207:
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34207, core 96
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34419, core 97
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34420, core 98
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34421, core 99
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34422, core 100
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34423, core 101
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34424, core 102
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34425, core 103
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34426, core 104
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34427, core 105
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34428, core 106
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34429, core 107
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34430, core 108
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34431, core 109
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34432, core 110
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34433, core 111
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34434, core 112
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34435, core 113
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34436, core 114
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34437, core 115
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34438, core 116
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34439, core 117
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34440, core 118
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34441, core 119
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34442, core 120
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34443, core 121
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34444, core 122
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34445, core 123
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34446, core 124
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34447, core 125
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34448, core 126
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34449, core 127
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34450, core 128
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34451, core 129
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34452, core 130
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34453, core 131
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34454, core 132
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34455, core 133
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34456, core 134
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34457, core 135
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34458, core 136
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34459, core 137
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34460, core 138
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34461, core 139
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34462, core 140
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34463, core 141
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34464, core 142
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 	OMP tid: 34465, core 143
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_worker.py:109] 
(EngineCore pid=34207) INFO 03-27 05:38:57 [parallel_state.py:1400] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.129.9.130:41447 backend=gloo
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore pid=34207) INFO 03-27 05:38:57 [parallel_state.py:1716] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A, EPLB rank N/A
(EngineCore pid=34207) INFO 03-27 05:38:57 [cpu_model_runner.py:62] Starting to load model ibm-granite/granite-3.3-8b-instruct...
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreMergedColumnParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreMergedColumnParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreMergedColumnParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreMergedColumnParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreMergedColumnParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreMergedColumnParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreMergedColumnParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreMergedColumnParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreMergedColumnParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreMergedColumnParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=34207) WARNING 03-27 05:38:57 [linear.py:58] SpyreRowParallelLinear: no dtype promotion is performed, expect numerical differences to upstream vLLM.
[... the same SpyreRowParallelLinear / SpyreMergedColumnParallelLinear warnings repeat for the remaining decoder layers ...]
Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  25% Completed | 1/4 [00:00<00:00,  7.42it/s]
Loading safetensors checkpoint shards:  50% Completed | 2/4 [00:00<00:00,  5.93it/s]
Loading safetensors checkpoint shards:  75% Completed | 3/4 [00:00<00:00,  5.50it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:00<00:00,  6.43it/s]
(EngineCore pid=34207) 
(EngineCore pid=34207) INFO 03-27 05:38:58 [default_loader.py:384] Loading weights took 0.64 seconds
(EngineCore pid=34207) INFO 03-27 05:38:58 [kv_cache_utils.py:1319] GPU KV cache size: 1,650,688 tokens
(EngineCore pid=34207) INFO 03-27 05:38:58 [kv_cache_utils.py:1324] Maximum concurrency for 2,048 tokens per request: 806.00x
(EngineCore pid=34207) INFO 03-27 05:39:01 [cpu_model_runner.py:73] Warming up model for the compilation...
(EngineCore pid=34207) WARNING 03-27 05:39:38 [decorators.py:311] Compiling model again due to a load failure from /home/rehankhan/.cache/vllm/torch_compile_cache/torch_aot_compile/a3ecc393d50142f3ad4b46979641dea0ed7a06dbf4b2ef23d4cb99b7e952dccf/rank_0_0/model, reason: 'function' object has no attribute 'finalize_loading'
(EngineCore pid=34207) INFO 03-27 05:39:50 [decorators.py:638] saved AOT compiled function to /home/rehankhan/.cache/vllm/torch_compile_cache/torch_aot_compile/a3ecc393d50142f3ad4b46979641dea0ed7a06dbf4b2ef23d4cb99b7e952dccf/rank_0_0/model
(EngineCore pid=34207) INFO 03-27 05:40:34 [monitor.py:76] Initial profiling/warmup run took 43.92 s
(EngineCore pid=34207) INFO 03-27 05:40:34 [cpu_model_runner.py:83] Warming up done.
(EngineCore pid=34207) INFO 03-27 05:40:34 [core.py:283] init engine (profile, create kv cache, warmup model) took 95.15 seconds
(EngineCore pid=34207) WARNING 03-27 05:40:34 [scheduler.py:173] Using custom scheduler class vllm.v1.core.sched.scheduler.Scheduler. This scheduler interface is not public and compatibility may not be maintained.
(EngineCore pid=34207) INFO 03-27 05:40:34 [vllm.py:750] Asynchronous scheduling is disabled.
(EngineCore pid=34207) WARNING 03-27 05:40:34 [vllm.py:806] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(EngineCore pid=34207) INFO 03-27 05:40:34 [platform.py:103] Loading scheduler from: vllm.v1.core.sched.scheduler.Scheduler
(EngineCore pid=34207) WARNING 03-27 05:40:34 [cpu.py:136] VLLM_CPU_KVCACHE_SPACE not set. Using 251.88 GiB for KV cache.
INFO 03-27 05:40:34 [llm.py:391] Supported tasks: ['generate']
Rendering prompts: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.83it/s]
Processed prompts:   0%|                                                                                                                                                            | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]INFO 03-27 05:41:14 [loggers.py:259] Engine 000: Avg prompt throughput: 0.1 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
INFO 03-27 05:41:31 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
[... similar per-interval throughput log lines elided while the single request decodes ...]
INFO 03-27 05:47:34 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
Processed prompts: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [06:59<00:00, 419.22s/it, est. speed input: 0.01 toks/s, output: 0.06 toks/s]
--------------------------------------------------
Generated text: ' for containerization?\r\n\r\nRed Hat OpenShift is a Kubernetes-based container application platform that allows organizations to automate and manage'
--------------------------------------------------
vllm:kv_cache_usage_perc 0.0
vllm:prefix_cache_queries 6
vllm:prefix_cache_hits 0
vllm:external_prefix_cache_queries 0
vllm:external_prefix_cache_hits 0
vllm:mm_cache_queries 0
vllm:mm_cache_hits 0
vllm:prompt_tokens_cached 0
vllm:cache_config_info 1.0
(EngineCore pid=34207) INFO 03-27 05:47:34 [core.py:1210] Shutdown initiated (timeout=0)
(EngineCore pid=34207) INFO 03-27 05:47:34 [core.py:1233] Shutdown complete

@R3hankhan123 R3hankhan123 force-pushed the wrap_mlp_layer branch 2 times, most recently from 39874e1 to b17f7d0 Compare March 27, 2026 08:58
Collaborator

@bohnstingl bohnstingl left a comment


LGTM. Can you please try the Granite3.3-8B model and check whether the token generation works?

It has been observed though that the current way of wrapping for torch-spyre interferes with the enablement of upstream vLLM tests, see #863. To address this, I've opened a PR (#872) that reworks the forward call chain a bit and uses forward_oot instead of forward_native. Maybe we could hold off the merge a bit and get #872 merged first and then apply the rework directly here as well?

@R3hankhan123 what do you think?

@R3hankhan123
Collaborator Author

> LGTM. Can you please try the Granite3.3-8B model and check whether the token generation works?
>
> It has been observed though that the current way of wrapping for torch-spyre interferes with the enablement of upstream vLLM tests, see #863. To address this, I've opened a PR (#872) that reworks the forward call chain a bit and uses forward_oot instead of forward_native. Maybe we could hold off the merge a bit and get #872 merged first and then apply the rework directly here as well?
>
> @R3hankhan123 what do you think?

Sure @bohnstingl

@bohnstingl
Collaborator

@R3hankhan123 #872 has landed. Could you please adopt the modified forward call structure here? I will then push for a quick merge.

Comment thread vllm_spyre_next/vllm_spyre_next/custom_ops/linear.py Outdated

class _SpyreLinear:
"""Shared implementation for Spyre linear layers at TP=1."""

Collaborator


For all other OOT implementations (e.g. rms, silu, ...) I see this line:
_dynamic_arg_dims = {"x": [], "residual": []}
Is it not needed here? Could it be removed in the other classes too? @bohnstingl

Collaborator


Okay, I see we do not have a residual here...
Is not specifying anything the same as putting _dynamic_arg_dims = {"x": []}?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it not needed here? could it be removed in the other classes too? @bohnstingl

No, it can't be removed and in fact we need it here as well. The _dynamic_arg_dims = {"x": [], "residual": []} ensures that maybe_compile compiles with dynamic=False. Here it should be _dynamic_arg_dims = {"x": [], "weight": [], "bias": []}, I think.

@R3hankhan123 could you please confirm that:

  • This path here is followed, i.e., make a breakpoint() there and check that it is triggered.
  • That no shape is marked as dynamic, i.e., this part is never reached.
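For context, here is a tiny sketch of what the `_dynamic_arg_dims` convention encodes (the helper name below is hypothetical, not vLLM's actual API): each argument maps to the list of its dynamic dimensions, and an empty list everywhere means no shape is marked dynamic, which is what allows compiling with `dynamic=False`.

```python
# Hypothetical helper illustrating the _dynamic_arg_dims convention:
# each argument name maps to the list of its dynamic dimensions; an
# empty list for every argument means the layer compiles fully static.
def compiles_static(dynamic_arg_dims: dict) -> bool:
    return all(len(dims) == 0 for dims in dynamic_arg_dims.values())

print(compiles_static({"x": [], "residual": []}))  # True: rms_norm-style, fully static
print(compiles_static({"x": [0]}))                 # False: batch dim marked dynamic
```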

Collaborator Author


The weight and bias tensors are internal to the layer implementation; they're accessed inside _forward_spyre_impl. Since they're not direct arguments to the custom op, I think they don't need to be in _dynamic_arg_dims. I think only {"x": [], "output": []} is sufficient.

Collaborator


On second thought, I have to take my comment above back. MergedColumnParallelLinear and RowParallelLinear are PluggableLayer, not CustomOp. Thus, the compilation path is different and there is no maybe_compile. Probably we need to simply invoke torch.compile directly for the moment:

self.maybe_compiled_forward_spyre = torch.compile(self.forward_spyre, dynamic=False)

Please leave a note though that this should be changed in the future.

This means you also don't need to define _dynamic_arg_dims.
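A minimal sketch of that suggestion (class and attribute names are illustrative, not the torch-spyre API; `backend="eager"` is used here only so the sketch runs without an Inductor toolchain):

```python
import torch


class SpyreLinearSketch(torch.nn.Module):
    """Sketch of a PluggableLayer-style linear wrapper: there is no
    maybe_compile hook, so forward_spyre is compiled directly with
    dynamic=False at construction time."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_features, in_features))
        # TODO: switch to maybe_compile once PluggableLayer supports it.
        # dynamic=False pins all input shapes; a new shape triggers a
        # recompile rather than a dynamic-shape graph.
        self.maybe_compiled_forward_spyre = torch.compile(
            self.forward_spyre, dynamic=False, backend="eager")

    def forward_spyre(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.linear(x, self.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.maybe_compiled_forward_spyre(x)
```

With `dynamic=False`, no shape is marked dynamic, matching the behavior checked in the bullet points above.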

Comment thread vllm_spyre_next/vllm_spyre_next/custom_ops/linear.py
Comment thread vllm_spyre_next/vllm_spyre_next/custom_ops/linear.py Outdated
Comment thread vllm_spyre_next/vllm_spyre_next/custom_ops/linear.py Outdated
Comment thread vllm_spyre_next/vllm_spyre_next/custom_ops/linear.py Outdated
Collaborator

@bohnstingl bohnstingl left a comment


In general looks good to me. @R3hankhan123 could you take a look at the small comments we had?



Comment thread vllm_spyre_next/vllm_spyre_next/custom_ops/linear.py Outdated
Comment thread vllm_spyre_next/vllm_spyre_next/custom_ops/linear.py
@R3hankhan123
Collaborator Author

After running a quick test with ibm-granite/granite-3.3-8b-instruct:

[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore pid=46348) INFO 03-31 14:54:38 [parallel_state.py:1716] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A, EPLB rank N/A
(EngineCore pid=46348) INFO 03-31 14:54:39 [cpu_model_runner.py:62] Starting to load model ibm-granite/granite-3.3-8b-instruct...
(EngineCore pid=46348) WARNING 03-31 14:54:39 [linear.py:60] SpyreRowParallelLinear: no dtype promotion (torch-spyre limitation),expect numerical differences to upstream vLLM.
(EngineCore pid=46348) WARNING 03-31 14:54:39 [linear.py:60] SpyreMergedColumnParallelLinear: no dtype promotion (torch-spyre limitation),expect numerical differences to upstream vLLM.
(EngineCore pid=46348) WARNING 03-31 14:54:39 [rms_norm.py:75] SpyreRMSNorm: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=46348) WARNING 03-31 14:54:39 [rms_norm.py:75] SpyreRMSNorm: no dtype promotion is performed, expect numerical differences to upstream vLLM.
(EngineCore pid=46348) WARNING 03-31 14:54:39 [rms_norm.py:75] SpyreRMSNorm: no dtype promotion is performed, expect numerical differences to upstream vLLM.
[... same SpyreRMSNorm warning repeated once per layer; duplicates omitted ...]
Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  25% Completed | 1/4 [00:00<00:00,  4.74it/s]
Loading safetensors checkpoint shards:  50% Completed | 2/4 [00:00<00:00,  3.84it/s]
Loading safetensors checkpoint shards:  75% Completed | 3/4 [00:00<00:00,  3.65it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:00<00:00,  4.79it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:00<00:00,  4.42it/s]
(EngineCore pid=46348) 
(EngineCore pid=46348) INFO 03-31 14:54:41 [default_loader.py:384] Loading weights took 0.93 seconds
(EngineCore pid=46348) INFO 03-31 14:54:41 [kv_cache_utils.py:1319] GPU KV cache size: 1,650,688 tokens
(EngineCore pid=46348) INFO 03-31 14:54:41 [kv_cache_utils.py:1324] Maximum concurrency for 2,048 tokens per request: 806.00x
(EngineCore pid=46348) INFO 03-31 14:54:46 [cpu_model_runner.py:73] Warming up model for the compilation...
(EngineCore pid=46348) WARNING 03-31 14:55:43 [decorators.py:311] Compiling model again due to a load failure from /home/rehankhan/.cache/vllm/torch_compile_cache/torch_aot_compile/aa10285af9a2273b82a7f4c08ebf84ed68b90dcb6982c5ab4e6abf9503a6c3e6/rank_0_0/model, reason: 'function' object has no attribute 'finalize_loading'
(EngineCore pid=46348) INFO 03-31 14:56:02 [decorators.py:638] saved AOT compiled function to /home/rehankhan/.cache/vllm/torch_compile_cache/torch_aot_compile/aa10285af9a2273b82a7f4c08ebf84ed68b90dcb6982c5ab4e6abf9503a6c3e6/rank_0_0/model
(EngineCore pid=46348) INFO 03-31 14:56:03 [monitor.py:76] Initial profiling/warmup run took 1.51 s
(EngineCore pid=46348) INFO 03-31 14:56:03 [cpu_model_runner.py:83] Warming up done.
(EngineCore pid=46348) INFO 03-31 14:56:03 [core.py:283] init engine (profile, create kv cache, warmup model) took 82.76 seconds
(EngineCore pid=46348) WARNING 03-31 14:56:04 [scheduler.py:173] Using custom scheduler class vllm.v1.core.sched.scheduler.Scheduler. This scheduler interface is not public and compatibility may not be maintained.
(EngineCore pid=46348) INFO 03-31 14:56:04 [vllm.py:750] Asynchronous scheduling is disabled.
(EngineCore pid=46348) WARNING 03-31 14:56:04 [vllm.py:806] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(EngineCore pid=46348) INFO 03-31 14:56:04 [platform.py:103] Loading scheduler from: vllm.v1.core.sched.scheduler.Scheduler
(EngineCore pid=46348) WARNING 03-31 14:56:04 [cpu.py:136] VLLM_CPU_KVCACHE_SPACE not set. Using 251.88 GiB for KV cache.
INFO 03-31 14:56:04 [llm.py:391] Supported tasks: ['generate']
Rendering prompts: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.52it/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.36s/it, est. speed input: 2.06 toks/s, output: 5.73 toks/s]
--------------------------------------------------
Generated text: '.\n\nHere is a simple C++ code to reverse a string using the standard templated algorithm library:\n\n```'
--------------------------------------------------
vllm:kv_cache_usage_perc 0.0
vllm:prefix_cache_queries 9
vllm:prefix_cache_hits 0
vllm:external_prefix_cache_queries 0
vllm:external_prefix_cache_hits 0
vllm:mm_cache_queries 0
vllm:mm_cache_hits 0
vllm:prompt_tokens_cached 0
vllm:cache_config_info 1.0
(EngineCore pid=46348) INFO 03-31 14:56:09 [core.py:1210] Shutdown initiated (timeout=0)
(EngineCore pid=46348) INFO 03-31 14:56:09 [core.py:1233] Shutdown complete

Add RowParallelLinear and ColumnParallelLinear wrappers for torch-spyre,
which act as the up projection and down projection in the MLP layer

Co-authored-by: Rehan Khan <Rehan.Khan7@ibm.com>
Co-authored-by: nikheal2 <suryawanshin74@gmail.com>
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
Signed-off-by: nikheal2 <suryawanshin74@gmail.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
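The up/down projection pairing from the commit message follows the standard tensor-parallel MLP layout: the column-parallel up projection shards its weight along the output dimension, and the row-parallel down projection shards along the input dimension, so each rank's partial result only needs a final all-reduce. A minimal numpy sketch (not the torch-spyre code; the activation is omitted to keep the math exact):

```python
import numpy as np

rng = np.random.default_rng(0)
tp = 2                                 # tensor-parallel world size
x = rng.standard_normal((4, 8))        # [tokens, hidden]
w_up = rng.standard_normal((8, 16))    # up projection   [hidden, intermediate]
w_down = rng.standard_normal((16, 8))  # down projection [intermediate, hidden]

# Reference: unsharded MLP.
ref = (x @ w_up) @ w_down

# ColumnParallelLinear: shard w_up along its output (column) dimension.
up_shards = np.split(w_up, tp, axis=1)
# RowParallelLinear: shard w_down along its input (row) dimension.
down_shards = np.split(w_down, tp, axis=0)

# Each rank computes on its own shard; the row-parallel outputs are
# partial sums, so an all-reduce (here: a plain sum) finishes the MLP.
partials = [(x @ up_shards[r]) @ down_shards[r] for r in range(tp)]
out = sum(partials)

assert np.allclose(out, ref)
```

This is why the two wrappers come as a pair: no communication is needed between the up and down projections, only one all-reduce at the end.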
@bohnstingl
Collaborator

I changed the forward_oot definition to forward. The reason is that both MergedColumnParallelLinear and SpyreRowParallelLinear are PluggableLayers: the register_oot decorator registers the classes, and when they are then called from dispatch, the forward function is invoked.
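The registration-and-dispatch pattern described above can be sketched as follows. The names (`register_oot`, the registry dict, `dispatch`) are hypothetical stand-ins for the vLLM/torch-spyre plugin machinery, which differs in detail; the point is that dispatch resolves to the registered class and calls its `forward`, which is why the method cannot be named `forward_oot`:

```python
_OOT_REGISTRY: dict[str, type] = {}

def register_oot(cls):
    """Register an out-of-tree replacement under the upstream class name."""
    _OOT_REGISTRY[cls.__name__.removeprefix("Spyre")] = cls
    return cls

class RowParallelLinear:                     # stand-in for the upstream layer
    def forward(self, x):
        return ("upstream", x)

@register_oot
class SpyreRowParallelLinear(RowParallelLinear):
    # Defining `forward` (not `forward_oot`) matters: dispatch calls
    # `forward` on whichever class was registered.
    def forward(self, x):
        return ("spyre", x)

def dispatch(name: str):
    """Return an instance of the registered override, or the upstream class."""
    cls = _OOT_REGISTRY.get(name, RowParallelLinear)
    return cls()

layer = dispatch("RowParallelLinear")
assert layer.forward(3) == ("spyre", 3)
```

A `forward_oot` method would simply never be reached through this path, since nothing in the dispatch chain knows to look for it.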

I tested the latest commit end-to-end and can confirm that it runs on Spyre and produces the expected results.

@bohnstingl
Collaborator

bot:next-test

@bohnstingl bohnstingl requested a review from tjohnson31415 April 1, 2026 12:59
@bohnstingl
Collaborator

@joerunde or @tjohnson31415 I looked at the CI results and they don't seem to be related. Could you confirm?

Collaborator

@yannicks1 yannicks1 left a comment


lgtm, thanks for addressing all the feedback

Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Collaborator

@tjohnson31415 tjohnson31415 left a comment


@tjohnson31415 I looked at the CI results and they don't seem to be related. Could you confirm?

Can confirm. The spyre-ci tests should not block the PR because they are not working.

@bohnstingl bohnstingl merged commit 7ad2219 into torch-spyre:main Apr 2, 2026
13 checks passed


Development

Successfully merging this pull request may close these issues.

Wrap MLP layer for torch-spyre

4 participants