Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 11 additions & 6 deletions mistralrs-bench/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,9 +4,9 @@ use cli_table::{format::Justify, print_stdout, Cell, CellStruct, Style, Table};
use mistralrs_core::{
get_auto_device_map_params, get_model_dtype, initialize_logging, paged_attn_supported,
parse_isq_value, Constraint, DefaultSchedulerMethod, DeviceLayerMapMetadata, DeviceMapMetadata,
DeviceMapSetting, DrySamplingParams, IsqType, Loader, LoaderBuilder, MemoryGpuConfig,
MistralRs, MistralRsBuilder, ModelSelected, NormalRequest, PagedAttentionConfig, Request,
RequestMessage, Response, SamplingParams, SchedulerConfig, TokenSource, Usage,
DeviceMapSetting, DrySamplingParams, Loader, LoaderBuilder, MemoryGpuConfig, MistralRs,
MistralRsBuilder, ModelSelected, NormalRequest, PagedAttentionConfig, Request, RequestMessage,
Response, SamplingParams, SchedulerConfig, TokenSource, Usage,
};
use std::sync::Arc;
use std::{fmt::Display, num::NonZeroUsize};
Expand Down Expand Up @@ -300,8 +300,8 @@ struct Args {
num_device_layers: Option<Vec<String>>,

/// In-situ quantization to apply.
#[arg(long = "isq", value_parser = parse_isq_value)]
in_situ_quant: Option<IsqType>,
#[arg(long = "isq")]
in_situ_quant: Option<String>,

Comment on lines +303 to 305
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Early-stage CLI validation for --isq was lost – re-introduce via value_parser

By changing the field to Option<String> without a value_parser, invalid ISQ values will now silently pass clap’s parsing stage and only be caught (maybe) much later in the program flow. You can keep the defer-to-device behaviour while still validating user input early:

-#[arg(long = "isq")]
-in_situ_quant: Option<String>,
+#[arg(long = "isq", value_parser = |s: &str| parse_isq_value(s, None))]
+in_situ_quant: Option<IsqType>,

This preserves the new string syntax at the CLI, catches typos immediately, and still allows you to decide later whether the chosen ISQ is legal for the detected device.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#[arg(long = "isq")]
in_situ_quant: Option<String>,
#[arg(long = "isq", value_parser = |s: &str| parse_isq_value(s, None))]
in_situ_quant: Option<IsqType>,
🤖 Prompt for AI Agents
In mistralrs-bench/src/main.rs around lines 303 to 305, the CLI argument for
--isq was changed to Option<String> without a value_parser, causing loss of
early validation and allowing invalid values to pass silently. To fix this,
reintroduce a value_parser for the in_situ_quant field that validates the input
string against allowed ISQ values during CLI parsing. This will catch invalid
inputs immediately while preserving the option to defer detailed validation to
later in the program.

/// GPU memory to allocate for KV cache with PagedAttention in MBs.
/// PagedAttention is supported on CUDA and Metal. It is automatically activated on CUDA but not on Metal.
Expand Down Expand Up @@ -490,14 +490,19 @@ fn main() -> anyhow::Result<()> {
(_, _, _, _, _, _) => None,
};

let isq = args
.in_situ_quant
.as_ref()
.and_then(|isq| parse_isq_value(isq, Some(&device)).ok());

Comment on lines +493 to +497
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Parsing errors are swallowed – user never knows their ISQ argument was ignored

ok() converts the Result into an Option, discarding the error. If the user types an invalid value (e.g. --isq q9k) the model will just load un-quantised, giving confusing behaviour.

-let isq = args
-    .in_situ_quant
-    .as_ref()
-    .and_then(|isq| parse_isq_value(isq, Some(&device)).ok());
+let isq = match &args.in_situ_quant {
+    Some(s) => Some(parse_isq_value(s, Some(&device))?),
+    None    => None,
+};

Now an invalid flag stops execution with a clear error instead of silently degrading.

🤖 Prompt for AI Agents
In mistralrs-bench/src/main.rs around lines 493 to 497, the code currently uses
ok() to convert the Result from parse_isq_value into an Option, which discards
any parsing errors and silently ignores invalid ISQ arguments. To fix this,
change the code to properly handle the Result by checking for errors and
returning or displaying a clear error message to the user if parsing fails,
preventing silent degradation and stopping execution on invalid input.

let pipeline = loader.load_model_from_hf(
None,
token_source,
&dtype,
&device,
false,
mapper,
args.in_situ_quant,
isq,
cache_config,
)?;
info!("Model loaded.");
Expand Down
23 changes: 12 additions & 11 deletions mistralrs-core/src/pipeline/isq.rs
Original file line number Diff line number Diff line change
Expand Up @@ -59,19 +59,20 @@ pub const UQFF_MULTI_FILE_DELIMITER: &str = ";";
/// - `AFQ4`
/// - `AFQ6`
/// - `AFQ8`
pub fn parse_isq_value(s: &str) -> Result<IsqType, String> {
pub fn parse_isq_value(s: &str, device: Option<&Device>) -> Result<IsqType, String> {
let is_metal = device.map(|device| device.is_metal()).unwrap_or(false);
let tp = match s.to_lowercase().as_str() {
"2" if cfg!(feature = "metal") => IsqType::AFQ2,
"2" if !cfg!(feature = "metal") => IsqType::Q2K,
"3" if cfg!(feature = "metal") => IsqType::AFQ3,
"3" if !cfg!(feature = "metal") => IsqType::Q3K,
"4" if cfg!(feature = "metal") => IsqType::AFQ4,
"4" if !cfg!(feature = "metal") => IsqType::Q4K,
"2" if is_metal => IsqType::AFQ2,
"2" if !is_metal => IsqType::Q2K,
"3" if is_metal => IsqType::AFQ3,
"3" if !is_metal => IsqType::Q3K,
"4" if is_metal => IsqType::AFQ4,
"4" if !is_metal => IsqType::Q4K,
"5" => IsqType::Q5K,
"6" if cfg!(feature = "metal") => IsqType::AFQ6,
"6" if !cfg!(feature = "metal") => IsqType::Q6K,
"8" if cfg!(feature = "metal") => IsqType::AFQ8,
"8" if !cfg!(feature = "metal") => IsqType::Q8_0,
"6" if is_metal => IsqType::AFQ6,
"6" if !is_metal => IsqType::Q6K,
"8" if is_metal => IsqType::AFQ8,
"8" if !is_metal => IsqType::Q8_0,
"q4_0" => IsqType::Q4_0,
"q4_1" => IsqType::Q4_1,
"q5_0" => IsqType::Q5_0,
Expand Down
7 changes: 5 additions & 2 deletions mistralrs-core/src/pipeline/loaders/normal_loaders.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3004,8 +3004,8 @@ impl IsqModelLoader for Qwen3Loader {
Regex::new(r"layers\.(\d+)\.mlp\.down_proj\.(weight|bias)$")?,
])
}
fn immediate_isq_predicates_moqe(&self, config: &str) -> Result<Vec<Regex>> {
self.isq_layer_regexes_moqe(config)
fn immediate_isq_predicates(&self, config: &str) -> Result<Vec<Regex>> {
self.isq_layer_regexes(config)
}
}

Expand Down Expand Up @@ -3189,6 +3189,9 @@ impl IsqModelLoader for Qwen3MoELoader {
Regex::new(r"layers\.(\d+)\.mlp\.experts\.(\d+)\.down_proj\.(weight|bias)$")?,
])
}
fn immediate_isq_predicates(&self, config: &str) -> Result<Vec<Regex>> {
self.isq_layer_regexes(config)
}
fn immediate_isq_predicates_moqe(&self, config: &str) -> Result<Vec<Regex>> {
self.isq_layer_regexes_moqe(config)
}
Expand Down
4 changes: 4 additions & 0 deletions mistralrs-core/src/pipeline/normal.rs
Original file line number Diff line number Diff line change
Expand Up @@ -478,6 +478,10 @@ impl Loader for NormalLoader {
} else {
self.inner.immediate_isq_predicates(&config)?
};
info!("Applying ISQ to {in_situ_quant:?}");
if predicates.is_empty() {
warn!("No predicates for this model and ISQ setting detected. ISQ will not be applied to any weights!");
}
mistralrs_quant::set_immediate_isq(in_situ_quant, predicates);
false
} else {
Expand Down
4 changes: 4 additions & 0 deletions mistralrs-core/src/pipeline/vision.rs
Original file line number Diff line number Diff line change
Expand Up @@ -409,6 +409,10 @@ impl Loader for VisionLoader {
&& self.config.write_uqff.is_none()
{
let predicates = self.inner.immediate_isq_predicates(&config)?;
info!("Applying ISQ to {in_situ_quant:?}");
if predicates.is_empty() {
warn!("No predicates for this model and ISQ setting detected. ISQ will not be applied to any weights!");
}
mistralrs_quant::set_immediate_isq(in_situ_quant, predicates);
false
} else {
Expand Down
2 changes: 1 addition & 1 deletion mistralrs-core/src/topology/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -107,7 +107,7 @@ impl Topology {
}
let range = CustomRange { start, end };
let isq = if let Some(isq) = isq {
Some(parse_isq_value(&isq).map_err(anyhow::Error::msg)?)
Some(parse_isq_value(&isq, None).map_err(anyhow::Error::msg)?)
} else {
None
};
Expand Down
4 changes: 2 additions & 2 deletions mistralrs-pyo3/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -682,7 +682,7 @@ impl Runner {

let device = get_device(seed).as_ref().map_err(PyApiErr::from)?;
let isq = if let Some(isq) = in_situ_quant {
Some(parse_isq_value(&isq).map_err(PyApiErr::from)?)
Some(parse_isq_value(&isq, Some(device)).map_err(PyApiErr::from)?)
} else {
None
};
Expand Down Expand Up @@ -1416,7 +1416,7 @@ impl Runner {
/// Send a request to re-ISQ the model. If the model was loaded as GGUF or GGML
/// then nothing will happen.
fn send_re_isq(&self, dtype: String) -> PyApiResult<()> {
let request = _Request::ReIsq(parse_isq_value(&dtype)?);
let request = _Request::ReIsq(parse_isq_value(&dtype, None)?);
self.runner.get_sender()?.blocking_send(request).unwrap();
Ok(())
}
Expand Down
22 changes: 16 additions & 6 deletions mistralrs-server/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ use clap::Parser;
use mistralrs_core::{
get_auto_device_map_params, get_model_dtype, get_tgt_non_granular_index, initialize_logging,
paged_attn_supported, parse_isq_value, BertEmbeddingModel, DefaultSchedulerMethod,
DeviceLayerMapMetadata, DeviceMapMetadata, DeviceMapSetting, IsqType, Loader, LoaderBuilder,
DeviceLayerMapMetadata, DeviceMapMetadata, DeviceMapSetting, Loader, LoaderBuilder,
MemoryGpuConfig, MistralRs, MistralRsBuilder, ModelSelected, PagedAttentionConfig, Request,
SchedulerConfig, TokenSource,
};
Expand Down Expand Up @@ -119,8 +119,8 @@ struct Args {
num_device_layers: Option<Vec<String>>,

/// In-situ quantization to apply.
#[arg(long = "isq", value_parser = parse_isq_value)]
in_situ_quant: Option<IsqType>,
#[arg(long = "isq")]
in_situ_quant: Option<String>,
Comment on lines +122 to +123
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Re-add clap-level validation for --isq

Same concern as in the bench binary: turning the field into Option<String> removes early validation. Add a value_parser or custom ArgEnum so that typos are rejected before the (potentially expensive) model load begins.

🤖 Prompt for AI Agents
In mistralrs-server/src/main.rs around lines 122 to 123, the in_situ_quant field
is defined as Option<String> without clap-level validation, which allows typos
to pass through. To fix this, replace Option<String> with a type that uses
clap's value_parser or a custom ArgEnum to enforce valid values at argument
parsing time, ensuring invalid inputs are rejected early before model loading.


/// GPU memory to allocate for KV cache with PagedAttention in MBs.
/// PagedAttention is supported on CUDA and Metal. It is automatically activated on CUDA but not on Metal.
Expand Down Expand Up @@ -223,7 +223,7 @@ async fn re_isq(
) -> Result<String, String> {
let repr = format!("Re ISQ: {:?}", request.ggml_type);
MistralRs::maybe_log_request(state.clone(), repr.clone());
let request = Request::ReIsq(parse_isq_value(&request.ggml_type)?);
let request = Request::ReIsq(parse_isq_value(&request.ggml_type, None)?);
state.get_sender().unwrap().send(request).await.unwrap();
Ok(repr)
}
Expand Down Expand Up @@ -300,7 +300,12 @@ async fn main() -> Result<()> {
.build()?;

#[cfg(feature = "metal")]
let device = Device::new_metal(0)?;
let device = if args.cpu {
args.no_paged_attn = true;
Device::Cpu
} else {
Device::new_metal(0)?
};
#[cfg(not(feature = "metal"))]
let device = if args.cpu {
args.no_paged_attn = true;
Expand Down Expand Up @@ -426,14 +431,19 @@ async fn main() -> Result<()> {
(_, _, _, _, _, _) => None,
};

let isq = args
.in_situ_quant
.as_ref()
.and_then(|isq| parse_isq_value(isq, Some(&device)).ok());

Comment on lines +434 to +438
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Silent drop of invalid --isq value – propagate the error

Identical issue to the bench tool: using .ok() discards the error.

-let isq = args
-    .in_situ_quant
-    .as_ref()
-    .and_then(|isq| parse_isq_value(isq, Some(&device)).ok());
+let isq = match &args.in_situ_quant {
+    Some(s) => Some(parse_isq_value(s, Some(&device))?),
+    None    => None,
+};

This will return a 500 at startup with a clear error instead of quietly ignoring the flag, which is preferable for an operator-facing server binary.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let isq = args
.in_situ_quant
.as_ref()
.and_then(|isq| parse_isq_value(isq, Some(&device)).ok());
let isq = match &args.in_situ_quant {
Some(s) => Some(parse_isq_value(s, Some(&device))?),
None => None,
};
🤖 Prompt for AI Agents
In mistralrs-server/src/main.rs around lines 434 to 438, the code silently drops
errors from parsing the `--isq` flag by using `.ok()`, which discards the error.
Modify the code to propagate the error instead of ignoring it, so that invalid
`--isq` values cause a clear startup failure with an appropriate error message,
preventing silent misconfiguration.

let pipeline = loader.load_model_from_hf(
None,
args.token_source,
&dtype,
&device,
false,
mapper,
args.in_situ_quant,
isq,
cache_config,
)?;
info!("Model loaded.");
Expand Down
2 changes: 1 addition & 1 deletion mistralrs/examples/perplexity/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,7 @@ async fn main() -> Result<()> {
let args = Args::parse();

let quant = if let Some(isq) = &args.isq {
Some(parse_isq_value(isq).map_err(anyhow::Error::msg)?)
Some(parse_isq_value(isq, None).map_err(anyhow::Error::msg)?)
} else {
None
};
Expand Down
Loading