Commit 6ca1db6
Remove model_name param from Whisper-Metal (#15798)
## Issue
When I ran the `openai/whisper-tiny` model with `whisper_runner`, the
transcription result showed that the tokenizer was wrong.
- Expected
```
<|en|><|transcribe|><|notimestamps|> This week, I traveled to Chicago to deliver my final farewell address to the nation, following in the tradition of presidents before me. It was not opportunity to say thank you. Whether we've seen IDI or rarely agreed at all, my conversations with you, the American people, in living rooms and schools,<|endoftext|>
```
- Result
```
<|startoftranscript|><|translate|><|10.00|> So.<|24.00|><|endoftext|>
```
HuggingFace has updated all Whisper model tokenizers to the v3 format,
so `decoder_start_token_id` no longer needs to be chosen manually for
each model.
## Solution
- **Removed the `model_name` argument** from `run.sh` and `main.cpp`
- Hardcoded `decoder_start_token_id=50258` for all models
- Fixes the tokenizer compatibility issue: all Whisper models from
HuggingFace now use the v3 tokenizer format
- Eliminates confusion about which model name to pass at runtime
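
The hardcoding step could look roughly like the sketch below. It is illustrative only and is not the actual `main.cpp` change: `RunnerConfig`, `kDecoderStartTokenId`, and `initial_decoder_tokens` are hypothetical names, and the only value taken from this commit is `50258`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical constant: the single start token id this commit says is
// used for all models, replacing any per-model lookup.
constexpr std::int64_t kDecoderStartTokenId = 50258;

// Hypothetical config struct; the real runner's config type may differ.
struct RunnerConfig {
  std::int64_t decoder_start_token_id = kDecoderStartTokenId;
};

// Builds the initial decoder input, which holds only the start token.
std::vector<std::int64_t> initial_decoder_tokens(const RunnerConfig& config) {
  return {config.decoder_start_token_id};
}
```

With a single constant there is no model-name branch to get out of sync with the tokenizer, which is the confusion the last bullet refers to.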
@manuelcandales, 1 parent a6c5921, commit 6ca1db6
1 file changed: +4, -19

| Original lines | New lines | Change |
|---|---|---|
| 42-45 | | 4 lines removed |
| 117-131 | 113-116 | 15 lines removed, 4 lines added |