Commit 6ca1db6

Remove model_name param from Whisper-Metal (#15798)
# Issue

I found that the tokenizer was wrong when running the `openai/whisper-tiny` model with `whisper_runner`; the problem showed up in the transcription result.

- Expected

```
<|en|><|transcribe|><|notimestamps|> This week, I traveled to Chicago to deliver my final farewell address to the nation, following in the tradition of presidents before me. It was not opportunity to say thank you. Whether we've seen IDI or rarely agreed at all, my conversations with you, the American people, in living rooms and schools,<|endoftext|>
```

- Result

```
<|startoftranscript|><|translate|><|10.00|> So.<|24.00|><|endoftext|>
```

Since HuggingFace has updated all Whisper model tokenizers to the v3 format, `decoder_start_token_id` no longer needs to be chosen manually per model.

## Solution

- **Removed the `model_name` argument** from `run.sh` and `main.cpp`
- Hardcoded `decoder_start_token_id=50258` for all models
- Fixes a tokenizer compatibility issue: all Whisper models from HuggingFace now use the v3 tokenizer format
- Eliminates confusion about which model name to pass at runtime

@manuelcandales
1 parent a6c5921 commit 6ca1db6

1 file changed: 4 additions and 19 deletions

examples/models/whisper/main.cpp

```diff
@@ -39,10 +39,6 @@ DEFINE_string(
     audio_path,
     "",
     "Path to input audio file. Accepts .wav or raw float .bin.");
-DEFINE_string(
-    model_name,
-    "base",
-    "Whisper model name (base, small, medium, large, large-v2, large-v3, large-v3-turbo).");
 DEFINE_double(
     temperature,
     0.0,
@@ -114,21 +110,10 @@ int main(int argc, char** argv) {
   config.max_new_tokens = FLAGS_max_new_tokens;
   config.temperature = static_cast<float>(FLAGS_temperature);
 
-  // Set decoder_start_token_id based on model version
-  if (FLAGS_model_name == "large-v2" || FLAGS_model_name == "large-v3" ||
-      FLAGS_model_name == "large-v3-turbo") {
-    config.decoder_start_token_id = 50258;
-    ET_LOG(
-        Info,
-        "Using decoder_start_token_id=50258 for model: %s",
-        FLAGS_model_name.c_str());
-  } else {
-    config.decoder_start_token_id = 50257;
-    ET_LOG(
-        Info,
-        "Using decoder_start_token_id=50257 for model: %s",
-        FLAGS_model_name.c_str());
-  }
+  // All Whisper models from HuggingFace now use the v3 tokenizer format
+  // where token 50257 = <|endoftext|> and token 50258 = <|startoftranscript|>
+  config.decoder_start_token_id = 50258;
+  ET_LOG(Info, "Using decoder_start_token_id=50258");
 
   auto result =
       runner.transcribe(features, config, [&](const std::string& piece) {
```
