Remove model_name param from Whisper-Metal (#15798)

seyeong-han · web-flow · commit 6ca1db6f4a0d · 2025-11-14T19:52:43.000-05:00
# Issue

I found that the tokenizer was wrong when I ran the `open-ai/whisper-tiny` model with `whisper_runner` and checked the transcription result.

- Expected
```
<|en|><|transcribe|><|notimestamps|> This week, I traveled to Chicago to deliver my final farewell address to the nation, following in the tradition of presidents before me. It was not opportunity to say thank you. Whether we've seen IDI or rarely agreed at all, my conversations with you, the American people, in living rooms and schools,<|endoftext|>
```
- Result
```
<|startoftranscript|><|translate|><|10.00|> So.<|24.00|><|endoftext|>
```

Since HuggingFace has updated all Whisper model tokenizers to the v3 format, we no longer need to set `decoder_start_token_id` per model.

## Solution

- **Removed `model_name` argument** from `run.sh` and `main.cpp`
- Hardcoded `decoder_start_token_id=50258` for all models
- Fixes the tokenizer compatibility issue: all Whisper models from HuggingFace now use the v3 tokenizer format
- Eliminates confusion about which model name to pass at runtime

@manuelcandales
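
For illustration only, here is a minimal sketch of why the start token matters. It is not part of the runner: `is_start_of_transcript` and `kV3SpecialTokens` are hypothetical names, and the table holds only the two ID-to-token entries stated in this commit for the v3 tokenizer format.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical table (assumption): only the two special tokens this commit
// names for the v3 tokenizer format, not a full Whisper vocabulary.
const std::unordered_map<int64_t, std::string> kV3SpecialTokens = {
    {50257, "<|endoftext|>"},
    {50258, "<|startoftranscript|>"},
};

// Hypothetical helper (assumption): returns true when `id` maps to
// <|startoftranscript|> in the given table, i.e. when it is a valid
// decoder start token.
bool is_start_of_transcript(
    const std::unordered_map<int64_t, std::string>& vocab,
    int64_t id) {
  auto it = vocab.find(id);
  return it != vocab.end() && it->second == "<|startoftranscript|>";
}
```

Under this mapping, 50258 passes the check, while 50257 would start decoding from `<|endoftext|>`, which is the mismatch the old `model_name` branch could produce for non-large models.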
diff --git a/examples/models/whisper/main.cpp b/examples/models/whisper/main.cpp
@@ -39,10 +39,6 @@ DEFINE_string(
     audio_path,
     "",
     "Path to input audio file. Accepts .wav or raw float .bin.");
-DEFINE_string(
-    model_name,
-    "base",
-    "Whisper model name (base, small, medium, large, large-v2, large-v3, large-v3-turbo).");
 DEFINE_double(
     temperature,
     0.0,
@@ -114,21 +110,10 @@ int main(int argc, char** argv) {
   config.max_new_tokens = FLAGS_max_new_tokens;
   config.temperature = static_cast<float>(FLAGS_temperature);
 
-  // Set decoder_start_token_id based on model version
-  if (FLAGS_model_name == "large-v2" || FLAGS_model_name == "large-v3" ||
-      FLAGS_model_name == "large-v3-turbo") {
-    config.decoder_start_token_id = 50258;
-    ET_LOG(
-        Info,
-        "Using decoder_start_token_id=50258 for model: %s",
-        FLAGS_model_name.c_str());
-  } else {
-    config.decoder_start_token_id = 50257;
-    ET_LOG(
-        Info,
-        "Using decoder_start_token_id=50257 for model: %s",
-        FLAGS_model_name.c_str());
-  }
+  // All Whisper models from HuggingFace now use the v3 tokenizer format
+  // where token 50257 = <|endoftext|> and token 50258 = <|startoftranscript|>
+  config.decoder_start_token_id = 50258;
+  ET_LOG(Info, "Using decoder_start_token_id=50258");
 
   auto result =
       runner.transcribe(features, config, [&](const std::string& piece) {