Description
⚙️ Your current environment
The output of python collect_env.py
### Environment Information ###
Operating System: `Linux-5.4.0-125-generic-x86_64-with-glibc2.35`
Python Version: `3.12.11 (main, Jun  4 2025, 08:56:18) [GCC 11.4.0]`
llm-compressor Version: `0.8.1`
compressed-tensors Version: `0.12.3a20251013`
transformers Version: `4.57.1`
torch Version: `2.8.0+cu128`
CUDA Devices: `['NVIDIA A100-SXM4-80GB']`
AMD Devices: `None`
🐛 Describe the bug
/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
torch_dtype is deprecated! Use dtype instead!
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:34<00:00, 17.45s/it]
Generating test split: 3532 examples [00:02, 1485.15 examples/s]
Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 12.18 examples/s]
2025-10-17T00:56:56.307532-0700 | reset | INFO - Compression lifecycle reset
2025-10-17T00:56:56.315648-0700 | _create_default_logger | INFO - Logging all LLM Compressor modifier-level logs to sparse_logs/17-10-2025_00.56.56.log
2025-10-17T00:56:56.317732-0700 | from_modifiers | INFO - Creating recipe from modifiers
2025-10-17T00:56:56.375590-0700 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
2025-10-17T00:56:56.375803-0700 | IndependentPipeline | INFO - Inferred SequentialPipeline for GPTQModifier
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Traceback (most recent call last):
  File "/nfs/AE/zhanghong/llm-compressor-0.8.1/examples/multimodal_vision/zh_qwen_3_vl_4b_example.py", line 85, in <module>
    oneshot(
  File "/nfs/AE/zhanghong/dev/llm-compressor/src/llmcompressor/entrypoints/oneshot.py", line 330, in oneshot
    one_shot()
  File "/nfs/AE/zhanghong/dev/llm-compressor/src/llmcompressor/entrypoints/oneshot.py", line 158, in __call__
    self.apply_recipe_modifiers(
  File "/nfs/AE/zhanghong/dev/llm-compressor/src/llmcompressor/entrypoints/oneshot.py", line 201, in apply_recipe_modifiers
    pipeline(
  File "/nfs/AE/zhanghong/dev/llm-compressor/src/llmcompressor/pipelines/independent/pipeline.py", line 45, in __call__
    pipeline(model, dataloader, dataset_args)
  File "/nfs/AE/zhanghong/dev/llm-compressor/src/llmcompressor/pipelines/sequential/pipeline.py", line 71, in __call__
    subgraphs = trace_subgraphs(model, sample_input, sequential_targets, ignore)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nfs/AE/zhanghong/dev/llm-compressor/src/llmcompressor/pipelines/sequential/helpers.py", line 125, in trace_subgraphs
    tracer.trace(
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/fx.py", line 1316, in trace
    self.graph = super().trace(root, concrete_args=concrete_args)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/fx/_symbolic_trace.py", line 850, in trace
    (self.create_arg(fn(*args)),),
                     ^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py", line -1, in wrapper
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/fx.py", line 681, in __getattr__
    return HFAttribute(self, k)
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/fx.py", line 700, in __init__
    self.install_metadata(getattr(self.root._metadata, attr))
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Tensor' object has no attribute 'get'. Did you mean: 'det'?
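Side note on the log noise: the huggingface/tokenizers fork warning is only a warning and appears to be separate from the AttributeError raised during tracing. As the warning text itself suggests, it can be silenced by setting `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming the lines are placed at the very top of the example script, before `transformers` or `tokenizers` is imported:

```python
import os

# Set the variable before any tokenizer is created so that the
# tokenizers library sees an explicit choice and skips the fork warning.
# "false" disables tokenizer-level parallelism; "true" is also accepted
# by the warning text, but "false" is the conservative choice when the
# process forks (e.g. for dataset preprocessing workers).
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

This only cleans up the log output; it is not expected to change the tracing failure shown above.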
🛠️ Steps to reproduce
No response