max tokens and split on word params doesn't work #156

thewh1teagle · 2024-06-01T19:41:34Z

I'm trying to use set_max_tokens together with set_split_on_word so that users can set a maximum number of words per sentence. But when I set split_on_word to true and max_tokens to anything greater than 0, transcription finishes very fast but produces gibberish, with only 2 sentences for a long audio file.

In the original whisper.cpp CLI program it works as expected, with a maximum length per line.


arizhih · 2024-06-25T11:54:54Z

@thewh1teagle Hi, as far as I can see, in whisper.cpp split_on_word works with max_len, not with the max_tokens parameter. They also implicitly enable token_timestamps when max_len > 0:

wparams.token_timestamps = params.output_wts || params.output_jsn_full || params.max_len > 0;
wparams.max_len          = params.output_wts && params.max_len == 0 ? 60 : params.max_len;
wparams.split_on_word    = params.split_on_word;

and later split_on_word takes effect only when token_timestamps == true and max_len > 0:

https://github.com/ggerganov/whisper.cpp/blob/bf4cb4abad4e35c74b387df034cc4ac7b22e5fe6/whisper.cpp#L6224

So try enabling the token_timestamps and split_on_word flags and setting max_len to the desired maximum segment length in characters. Hope it helps.

thewh1teagle · 2024-06-26T15:22:51Z

@arizhih
Thanks!
I'm looking to split per word so users can easily choose the maximum number of words per sentence. It's useful for creating video captions, where the width on screen is limited.
Splitting per character is harder and less accurate.
Is there a way to achieve this through word splitting?

I have another idea.
I can enable token timestamps and take as many words as I want. However, it may be less accurate and may split in the middle of a sentence. Does whisper.cpp split sentences more smartly by default?

arizhih · 2024-06-26T16:19:53Z

By default, whisper produces from 1 to N segments of varying length.

When you set token_timestamps and max_len, whisper splits large segments into multiple segments, each no longer than max_len. If you add split_on_word, each segment will be a little longer (extending to the end of the last word).

It doesn't affect how it produces sentences at all, just how it returns segments.

thewh1teagle · 2024-06-27T17:42:44Z

It doesn't affect how it produce sentences at all, just how it returns segments.

Thanks, so I understand that this is not the right way to get a maximum number of words per sentence.
I thought of a simpler way: get token timestamps from whisper and then build the sentences the way I want, with a maximum number of words per sentence.
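
A minimal sketch of that idea, assuming the input is already a list of whole words with timestamps. The `Timed` struct and the `build_captions` and `flush` functions are hypothetical names for illustration and are not part of whisper.cpp or its bindings. A caption is closed when it reaches the word limit or when a word ends with sentence-final punctuation, so captions do not run across sentence boundaries:

```rust
/// A piece of text with start/stop timestamps, mirroring the JSON
/// shape used in this thread (hypothetical type, for illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct Timed {
    pub start: i64,
    pub stop: i64,
    pub text: String,
}

/// Join a non-empty group of words into one caption.
fn flush(group: &[&Timed]) -> Timed {
    Timed {
        start: group[0].start,
        stop: group[group.len() - 1].stop,
        text: group
            .iter()
            .map(|w| w.text.as_str())
            .collect::<String>()
            .trim()
            .to_string(),
    }
}

/// Group words into captions of at most `max_words` words each,
/// also closing a caption after '.', '?' or '!'.
pub fn build_captions(words: &[Timed], max_words: usize) -> Vec<Timed> {
    let limit = max_words.max(1); // treat 0 as 1 to avoid empty captions
    let mut captions = Vec::new();
    let mut current: Vec<&Timed> = Vec::new();
    for word in words {
        current.push(word);
        let ends_sentence = word
            .text
            .trim_end()
            .ends_with(|c: char| matches!(c, '.' | '?' | '!'));
        if current.len() >= limit || ends_sentence {
            captions.push(flush(&current));
            current.clear();
        }
    }
    if !current.is_empty() {
        captions.push(flush(&current));
    }
    captions
}
```

Calling `build_captions(&words, 3)` on a word list would yield captions of up to three words each. Whether to break on punctuation is a design choice. Dropping the `ends_sentence` check gives a pure word-count split.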

However, when using token timestamps it produces incorrect tokens, or at least they look incorrect, since symbols are counted as separate tokens.

regular

[
    {
        "start": 0,
        "stop": 520,
        "text": " It's whoever, not whomever. That's whomever. No whomever is never actually right."
    },
    {
        "start": 520,
        "stop": 934,
        "text": " Well sometimes it's right. Michael is right. It's a made-up word used to trick"
    },
    {
        "start": 934,
        "stop": 1418,
        "text": " students. No actually whomever is the formal version of the word. Obviously"
    },
    {
        "start": 1418,
        "stop": 1792,
        "text": " it's a real word, but I don't know when to use it correctly. Not a native speaker."
    },
    {
        "start": 1792,
        "stop": 2200,
        "text": " I know what's right, but I'm not gonna say because you're all jerks who didn't"
    },
    {
        "start": 2200,
        "stop": 2540,
        "text": " come see my band last night. Do you really know which one is correct? I don't know."
    },
    {
        "start": 2540,
        "stop": 2942,
        "text": " It's whom when it's the object of the sentence and who when is the subject. That"
    },
    {
        "start": 2942,
        "stop": 4942,
        "text": " sounds right. Well it sounds right but is it? How did Ryan use it as an object? As an object. Ryan used me as an object. How did he use it again? It was Ryan wanted Michael the subject to explain the computer system, the object, to whomever, meaning us, the indirect object, which is the correct usage of the word."
    }
]

token timestamps

[
    {
        "start": 0,
        "stop": 14,
        "text": " It"
    },
    {
        "start": 14,
        "stop": 28,
        "text": "'s"
    },
    {
        "start": 28,
        "stop": 79,
        "text": " whoever"
    },
    {
        "start": 93,
        "stop": 93,
        "text": ","
    },
    {
        "start": 94,
        "stop": 115,
        "text": " not"
    },
    {
        "start": 122,
        "stop": 129,
        "text": " wh"
    },
    {
        "start": 129,
        "stop": 147,
        "text": "ome"
    },
    {
        "start": 152,
        "stop": 173,
        "text": "ver"
    },
    {
        "start": 173,
        "stop": 200,
        "text": "."
    },
    {
        "start": 200,
        "stop": 223,
        "text": " That"
    },
    {
        "start": 223,
        "stop": 233,
        "text": "'s"
    },
    {
        "start": 234,
        "stop": 245,
        "text": " wh"
    },
    {
        "start": 245,
        "stop": 262,
        "text": "ome"
    },
    {
        "start": 262,
        "stop": 278,
        "text": "ver"
    },
    {
        "start": 279,
        "stop": 298,
        "text": "."
    },
    {
        "start": 304,
        "stop": 313,
        "text": " No"
    },
    {
        "start": 313,
        "stop": 326,
        "text": " wh"
    },
    {
        "start": 326,
        "stop": 345,
        "text": "ome"
    },
    {
        "start": 345,
        "stop": 364,
        "text": "ver"
    },
    {
        "start": 364,
        "stop": 365,
        "text": " is"
    },
    {
        "start": 380,
        "stop": 410,
        "text": " never"
    },
    {
        "start": 410,
        "stop": 463,
        "text": " actually"
    },
    {
        "start": 463,
        "stop": 496,
        "text": " right"
    },
    {
        "start": 496,
        "stop": 520,
        "text": "."
    },
    {
        "start": 520,
        "stop": 544,
        "text": " Well"
    },
    {
        "start": 544,
        "stop": 597,
        "text": " sometimes"
    },
    {
        "start": 597,
        "stop": 609,
        "text": " it"
    },
    {
        "start": 609,
        "stop": 615,
        "text": "'s"
    },
    {
        "start": 623,
        "stop": 649,
        "text": " right"
    },
    {
        "start": 649,
        "stop": 658,
        "text": "."
    },
    {
        "start": 667,
        "stop": 706,
        "text": " Michael"
    },
    {
        "start": 707,
        "stop": 718,
        "text": " is"
    },
    {
        "start": 718,
        "stop": 741,
        "text": " right"
    },
    {
        "start": 752,
        "stop": 765,
        "text": "."
    },
    {
        "start": 765,
        "stop": 777,
        "text": " It"
    },
    {
        "start": 777,
        "stop": 788,
        "text": "'s"
    },
    {
        "start": 788,
        "stop": 794,
        "text": " a"
    },
    {
        "start": 794,
        "stop": 818,
        "text": " made"
    },
    {
        "start": 818,
        "stop": 819,
        "text": "-"
    },
    {
        "start": 831,
        "stop": 834,
        "text": "up"
    },
    {
        "start": 834,
        "stop": 855,
        "text": " word"
    },
    {
        "start": 858,
        "stop": 879,
        "text": " used"
    },
    {
        "start": 886,
        "stop": 894,
        "text": " to"
    },
    {
        "start": 894,
        "stop": 931,
        "text": " trick"
    },
    {
        "start": 936,
        "stop": 990,
        "text": " students"
    },
    {
        "start": 990,
        "stop": 1008,
        "text": "."
    },
    {
        "start": 1010,
        "stop": 1012,
        "text": " No"
    },
    {
        "start": 1037,
        "stop": 1079,
        "text": " actually"
    },
    {
        "start": 1095,
        "stop": 1095,
        "text": " wh"
    },
    {
        "start": 1095,
        "stop": 1116,
        "text": "ome"
    },
    {
        "start": 1132,
        "stop": 1137,
        "text": "ver"
    },
    {
        "start": 1137,
        "stop": 1151,
        "text": " is"
    },
    {
        "start": 1151,
        "stop": 1172,
        "text": " the"
    },
    {
        "start": 1172,
        "stop": 1214,
        "text": " formal"
    },
    {
        "start": 1214,
        "stop": 1263,
        "text": " version"
    },
    {
        "start": 1263,
        "stop": 1277,
        "text": " of"
    },
    {
        "start": 1277,
        "stop": 1298,
        "text": " the"
    },
    {
        "start": 1298,
        "stop": 1326,
        "text": " word"
    },
    {
        "start": 1326,
        "stop": 1347,
        "text": "."
    },
    {
        "start": 1347,
        "stop": 1417,
        "text": " Obviously"
    },
    {
        "start": 1418,
        "stop": 1428,
        "text": " it"
    },
    {
        "start": 1428,
        "stop": 1435,
        "text": "'s"
    },
    {
        "start": 1440,
        "stop": 1443,
        "text": " a"
    },
    {
        "start": 1443,
        "stop": 1464,
        "text": " real"
    },
    {
        "start": 1464,
        "stop": 1485,
        "text": " word"
    },
    {
        "start": 1485,
        "stop": 1494,
        "text": ","
    },
    {
        "start": 1494,
        "stop": 1505,
        "text": " but"
    },
    {
        "start": 1509,
        "stop": 1512,
        "text": " I"
    },
    {
        "start": 1522,
        "stop": 1530,
        "text": " don"
    },
    {
        "start": 1530,
        "stop": 1538,
        "text": "'t"
    },
    {
        "start": 1547,
        "stop": 1561,
        "text": " know"
    },
    {
        "start": 1561,
        "stop": 1582,
        "text": " when"
    },
    {
        "start": 1582,
        "stop": 1592,
        "text": " to"
    },
    {
        "start": 1592,
        "stop": 1607,
        "text": " use"
    },
    {
        "start": 1607,
        "stop": 1617,
        "text": " it"
    },
    {
        "start": 1617,
        "stop": 1664,
        "text": " correctly"
    },
    {
        "start": 1664,
        "stop": 1678,
        "text": "."
    },
    {
        "start": 1678,
        "stop": 1694,
        "text": " Not"
    },
    {
        "start": 1694,
        "stop": 1698,
        "text": " a"
    },
    {
        "start": 1699,
        "stop": 1730,
        "text": " native"
    },
    {
        "start": 1730,
        "stop": 1761,
        "text": " speaker"
    },
    {
        "start": 1767,
        "stop": 1792,
        "text": "."
    },
    {
        "start": 1792,
        "stop": 1798,
        "text": " I"
    },
    {
        "start": 1800,
        "stop": 1823,
        "text": " know"
    },
    {
        "start": 1823,
        "stop": 1848,
        "text": " what"
    },
    {
        "start": 1848,
        "stop": 1860,
        "text": "'s"
    },
    {
        "start": 1860,
        "stop": 1881,
        "text": " right"
    },
    {
        "start": 1889,
        "stop": 1903,
        "text": ","
    },
    {
        "start": 1904,
        "stop": 1910,
        "text": " but"
    },
    {
        "start": 1923,
        "stop": 1927,
        "text": " I"
    },
    {
        "start": 1927,
        "stop": 1939,
        "text": "'m"
    },
    {
        "start": 1939,
        "stop": 1957,
        "text": " not"
    },
    {
        "start": 1957,
        "stop": 1988,
        "text": " gonna"
    },
    {
        "start": 1988,
        "stop": 2005,
        "text": " say"
    },
    {
        "start": 2005,
        "stop": 2023,
        "text": " because"
    },
    {
        "start": 2050,
        "stop": 2067,
        "text": " you"
    },
    {
        "start": 2067,
        "stop": 2085,
        "text": "'re"
    },
    {
        "start": 2085,
        "stop": 2103,
        "text": " all"
    },
    {
        "start": 2103,
        "stop": 2120,
        "text": " jer"
    },
    {
        "start": 2125,
        "stop": 2133,
        "text": "ks"
    },
    {
        "start": 2133,
        "stop": 2148,
        "text": " who"
    },
    {
        "start": 2157,
        "stop": 2175,
        "text": " didn"
    },
    {
        "start": 2177,
        "stop": 2199,
        "text": "'t"
    },
    {
        "start": 2206,
        "stop": 2218,
        "text": " come"
    },
    {
        "start": 2218,
        "stop": 2231,
        "text": " see"
    },
    {
        "start": 2231,
        "stop": 2240,
        "text": " my"
    },
    {
        "start": 2240,
        "stop": 2258,
        "text": " band"
    },
    {
        "start": 2258,
        "stop": 2276,
        "text": " last"
    },
    {
        "start": 2276,
        "stop": 2293,
        "text": " night"
    },
    {
        "start": 2301,
        "stop": 2312,
        "text": "."
    },
    {
        "start": 2312,
        "stop": 2321,
        "text": " Do"
    },
    {
        "start": 2321,
        "stop": 2334,
        "text": " you"
    },
    {
        "start": 2334,
        "stop": 2361,
        "text": " really"
    },
    {
        "start": 2361,
        "stop": 2379,
        "text": " know"
    },
    {
        "start": 2379,
        "stop": 2402,
        "text": " which"
    },
    {
        "start": 2402,
        "stop": 2411,
        "text": " one"
    },
    {
        "start": 2417,
        "stop": 2424,
        "text": " is"
    },
    {
        "start": 2424,
        "stop": 2456,
        "text": " correct"
    },
    {
        "start": 2456,
        "stop": 2457,
        "text": "?"
    },
    {
        "start": 2471,
        "stop": 2473,
        "text": " I"
    },
    {
        "start": 2473,
        "stop": 2486,
        "text": " don"
    },
    {
        "start": 2486,
        "stop": 2504,
        "text": "'t"
    },
    {
        "start": 2504,
        "stop": 2507,
        "text": " know"
    },
    {
        "start": 2524,
        "stop": 2540,
        "text": "."
    },
    {
        "start": 2540,
        "stop": 2551,
        "text": " It"
    },
    {
        "start": 2551,
        "stop": 2561,
        "text": "'s"
    },
    {
        "start": 2574,
        "stop": 2584,
        "text": " whom"
    },
    {
        "start": 2591,
        "stop": 2608,
        "text": " when"
    },
    {
        "start": 2608,
        "stop": 2619,
        "text": " it"
    },
    {
        "start": 2619,
        "stop": 2630,
        "text": "'s"
    },
    {
        "start": 2630,
        "stop": 2647,
        "text": " the"
    },
    {
        "start": 2647,
        "stop": 2682,
        "text": " object"
    },
    {
        "start": 2682,
        "stop": 2693,
        "text": " of"
    },
    {
        "start": 2693,
        "stop": 2710,
        "text": " the"
    },
    {
        "start": 2710,
        "stop": 2756,
        "text": " sentence"
    },
    {
        "start": 2756,
        "stop": 2773,
        "text": " and"
    },
    {
        "start": 2773,
        "stop": 2790,
        "text": " who"
    },
    {
        "start": 2790,
        "stop": 2813,
        "text": " when"
    },
    {
        "start": 2813,
        "stop": 2824,
        "text": " is"
    },
    {
        "start": 2824,
        "stop": 2841,
        "text": " the"
    },
    {
        "start": 2841,
        "stop": 2879,
        "text": " subject"
    },
    {
        "start": 2881,
        "stop": 2905,
        "text": "."
    },
    {
        "start": 2917,
        "stop": 2942,
        "text": " That"
    },
    {
        "start": 2942,
        "stop": 2969,
        "text": " sounds"
    },
    {
        "start": 2969,
        "stop": 2992,
        "text": " right"
    },
    {
        "start": 2997,
        "stop": 3005,
        "text": "."
    },
    {
        "start": 3005,
        "stop": 3016,
        "text": " Well"
    },
    {
        "start": 3026,
        "stop": 3032,
        "text": " it"
    },
    {
        "start": 3032,
        "stop": 3059,
        "text": " sounds"
    },
    {
        "start": 3059,
        "stop": 3082,
        "text": " right"
    },
    {
        "start": 3082,
        "stop": 3095,
        "text": " but"
    },
    {
        "start": 3095,
        "stop": 3103,
        "text": " is"
    },
    {
        "start": 3104,
        "stop": 3113,
        "text": " it"
    },
    {
        "start": 3113,
        "stop": 3126,
        "text": "?"
    },
    {
        "start": 3126,
        "stop": 3139,
        "text": " How"
    },
    {
        "start": 3139,
        "stop": 3152,
        "text": " did"
    },
    {
        "start": 3152,
        "stop": 3170,
        "text": " Ryan"
    },
    {
        "start": 3170,
        "stop": 3183,
        "text": " use"
    },
    {
        "start": 3183,
        "stop": 3192,
        "text": " it"
    },
    {
        "start": 3192,
        "stop": 3201,
        "text": " as"
    },
    {
        "start": 3201,
        "stop": 3210,
        "text": " an"
    },
    {
        "start": 3210,
        "stop": 3237,
        "text": " object"
    },
    {
        "start": 3237,
        "stop": 3250,
        "text": "?"
    },
    {
        "start": 3250,
        "stop": 3256,
        "text": " As"
    },
    {
        "start": 3260,
        "stop": 3268,
        "text": " an"
    },
    {
        "start": 3268,
        "stop": 3295,
        "text": " object"
    },
    {
        "start": 3295,
        "stop": 3320,
        "text": "."
    },
    {
        "start": 3335,
        "stop": 3358,
        "text": " Ryan"
    },
    {
        "start": 3358,
        "stop": 3392,
        "text": " used"
    },
    {
        "start": 3392,
        "stop": 3409,
        "text": " me"
    },
    {
        "start": 3409,
        "stop": 3426,
        "text": " as"
    },
    {
        "start": 3426,
        "stop": 3442,
        "text": " an"
    },
    {
        "start": 3442,
        "stop": 3464,
        "text": " object"
    },
    {
        "start": 3503,
        "stop": 3521,
        "text": "."
    },
    {
        "start": 3521,
        "stop": 3547,
        "text": " How"
    },
    {
        "start": 3547,
        "stop": 3566,
        "text": " did"
    },
    {
        "start": 3573,
        "stop": 3587,
        "text": " he"
    },
    {
        "start": 3598,
        "stop": 3614,
        "text": " use"
    },
    {
        "start": 3627,
        "stop": 3633,
        "text": " it"
    },
    {
        "start": 3633,
        "stop": 3675,
        "text": " again"
    },
    {
        "start": 3676,
        "stop": 3708,
        "text": "?"
    },
    {
        "start": 3708,
        "stop": 3729,
        "text": " It"
    },
    {
        "start": 3730,
        "stop": 3763,
        "text": " was"
    },
    {
        "start": 3763,
        "stop": 3808,
        "text": " Ryan"
    },
    {
        "start": 3808,
        "stop": 3836,
        "text": " wanted"
    },
    {
        "start": 3840,
        "stop": 3878,
        "text": " Michael"
    },
    {
        "start": 3878,
        "stop": 3896,
        "text": " the"
    },
    {
        "start": 3896,
        "stop": 3952,
        "text": " subject"
    },
    {
        "start": 3964,
        "stop": 3976,
        "text": " to"
    },
    {
        "start": 3976,
        "stop": 4036,
        "text": " explain"
    },
    {
        "start": 4036,
        "stop": 4045,
        "text": " the"
    },
    {
        "start": 4051,
        "stop": 4085,
        "text": " computer"
    },
    {
        "start": 4085,
        "stop": 4105,
        "text": " system"
    },
    {
        "start": 4112,
        "stop": 4121,
        "text": ","
    },
    {
        "start": 4121,
        "stop": 4136,
        "text": " the"
    },
    {
        "start": 4136,
        "stop": 4182,
        "text": " object"
    },
    {
        "start": 4188,
        "stop": 4202,
        "text": ","
    },
    {
        "start": 4214,
        "stop": 4218,
        "text": " to"
    },
    {
        "start": 4218,
        "stop": 4231,
        "text": " wh"
    },
    {
        "start": 4241,
        "stop": 4259,
        "text": "ome"
    },
    {
        "start": 4259,
        "stop": 4281,
        "text": "ver"
    },
    {
        "start": 4289,
        "stop": 4300,
        "text": ","
    },
    {
        "start": 4300,
        "stop": 4359,
        "text": " meaning"
    },
    {
        "start": 4359,
        "stop": 4375,
        "text": " us"
    },
    {
        "start": 4375,
        "stop": 4391,
        "text": ","
    },
    {
        "start": 4391,
        "stop": 4424,
        "text": " the"
    },
    {
        "start": 4424,
        "stop": 4503,
        "text": " indirect"
    },
    {
        "start": 4506,
        "stop": 4568,
        "text": " object"
    },
    {
        "start": 4568,
        "stop": 4584,
        "text": ","
    },
    {
        "start": 4591,
        "stop": 4636,
        "text": " which"
    },
    {
        "start": 4641,
        "stop": 4659,
        "text": " is"
    },
    {
        "start": 4659,
        "stop": 4690,
        "text": " the"
    },
    {
        "start": 4690,
        "stop": 4755,
        "text": " correct"
    },
    {
        "start": 4755,
        "stop": 4755,
        "text": " usage"
    },
    {
        "start": 4755,
        "stop": 4755,
        "text": " of"
    },
    {
        "start": 4755,
        "stop": 4755,
        "text": " the"
    },
    {
        "start": 4755,
        "stop": 4755,
        "text": " word"
    },
    {
        "start": 4755,
        "stop": 4755,
        "text": "."
    }
]

Created with Vibe app.

arizhih · 2024-06-27T18:17:31Z

or at least it looks incorrect since it count symbols as single tokens.

It’s because a token is not a word. Whisper's vocabulary has roughly 52,000 tokens, and all words are built from these tokens.
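
To illustrate, here is a minimal sketch of rebuilding words from tokens. It assumes that a token whose text starts with a space begins a new word and that any other token (such as "ome", "ver", "'s" or ".") continues the previous one. That matches the English output posted above, but it is an assumption rather than something whisper.cpp guarantees, and it will not hold for languages written without spaces. The `Piece` struct and `merge_tokens_into_words` function are hypothetical names, not part of any library:

```rust
/// One token or word with timestamps, mirroring the JSON shape
/// used in this thread (hypothetical type, for illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct Piece {
    pub start: i64,
    pub stop: i64,
    pub text: String,
}

/// Merge sub-word tokens into whole words.
/// Assumption: a leading space marks the first token of a word.
pub fn merge_tokens_into_words(tokens: &[Piece]) -> Vec<Piece> {
    let mut words: Vec<Piece> = Vec::new();
    for tok in tokens {
        if !tok.text.starts_with(' ') {
            // Continuation token: append it to the word in progress
            // and extend that word's end time.
            if let Some(last) = words.last_mut() {
                last.text.push_str(&tok.text);
                last.stop = tok.stop;
                continue;
            }
        }
        // Leading space (or very first token): start a new word.
        words.push(tok.clone());
    }
    words
}
```

With this rule, the tokens " wh", "ome", "ver" and "." collapse into a single entry " whomever." that spans from the start of " wh" to the end of ".".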

Maybe if you set max_len to 1 and enable the split_on_word option, it will produce one segment for each word.

thewh1teagle · 2024-06-27T18:31:00Z

Maybe if you set max_len to 1 and enable option
split_on_word it produce one segment for each word.

Same result with:

  params.set_token_timestamps(true);
  params.set_split_on_word(true);
  params.set_max_len(1);

transcript.json

[
    {
        "start": 0,
        "stop": 14,
        "text": " It"
    },
    {
        "start": 14,
        "stop": 28,
        "text": "'s"
    },
    {
        "start": 28,
        "stop": 79,
        "text": " whoever"
    },
    {
        "start": 93,
        "stop": 93,
        "text": ","
    },
    {
        "start": 94,
        "stop": 115,
        "text": " not"
    },
    {
        "start": 122,
        "stop": 129,
        "text": " wh"
    },
    {
        "start": 129,
        "stop": 147,
        "text": "ome"
    },
    {
        "start": 152,
        "stop": 173,
        "text": "ver"
    },
    {
        "start": 173,
        "stop": 200,
        "text": "."
    },
    {
        "start": 200,
        "stop": 223,
        "text": " That"
    },
    {
        "start": 223,
        "stop": 233,
        "text": "'s"
    },
    {
        "start": 234,
        "stop": 245,
        "text": " wh"
    },
    {
        "start": 245,
        "stop": 262,
        "text": "ome"
    },
    {
        "start": 262,
        "stop": 278,
        "text": "ver"
    },
    {
        "start": 279,
        "stop": 298,
        "text": "."
    },
    {
        "start": 304,
        "stop": 313,
        "text": " No"
    },
    {
        "start": 313,
        "stop": 326,
        "text": " wh"
    },
    {
        "start": 326,
        "stop": 345,
        "text": "ome"
    },
    {
        "start": 345,
        "stop": 364,
        "text": "ver"
    },
    {
        "start": 364,
        "stop": 365,
        "text": " is"
    },
    {
        "start": 380,
        "stop": 410,
        "text": " never"
    },
    {
        "start": 410,
        "stop": 463,
        "text": " actually"
    },
    {
        "start": 463,
        "stop": 496,
        "text": " right"
    },
    {
        "start": 496,
        "stop": 520,
        "text": "."
    },
    {
        "start": 520,
        "stop": 544,
        "text": " Well"
    },
    {
        "start": 544,
        "stop": 597,
        "text": " sometimes"
    },
    {
        "start": 597,
        "stop": 609,
        "text": " it"
    },
    {
        "start": 609,
        "stop": 615,
        "text": "'s"
    },
    {
        "start": 623,
        "stop": 649,
        "text": " right"
    },
    {
        "start": 649,
        "stop": 658,
        "text": "."
    },
    {
        "start": 667,
        "stop": 706,
        "text": " Michael"
    },
    {
        "start": 707,
        "stop": 718,
        "text": " is"
    },
    {
        "start": 718,
        "stop": 741,
        "text": " right"
    },
    {
        "start": 752,
        "stop": 765,
        "text": "."
    },
    {
        "start": 765,
        "stop": 777,
        "text": " It"
    },
    {
        "start": 777,
        "stop": 788,
        "text": "'s"
    },
    {
        "start": 788,
        "stop": 794,
        "text": " a"
    },
    {
        "start": 794,
        "stop": 818,
        "text": " made"
    },
    {
        "start": 818,
        "stop": 819,
        "text": "-"
    },
    {
        "start": 831,
        "stop": 834,
        "text": "up"
    },
    {
        "start": 834,
        "stop": 855,
        "text": " word"
    },
    {
        "start": 858,
        "stop": 879,
        "text": " used"
    },
    {
        "start": 886,
        "stop": 894,
        "text": " to"
    },
    {
        "start": 894,
        "stop": 931,
        "text": " trick"
    },
    {
        "start": 936,
        "stop": 990,
        "text": " students"
    },
    {
        "start": 990,
        "stop": 1008,
        "text": "."
    },
    {
        "start": 1010,
        "stop": 1012,
        "text": " No"
    },
    {
        "start": 1037,
        "stop": 1079,
        "text": " actually"
    },
    {
        "start": 1095,
        "stop": 1095,
        "text": " wh"
    },
    {
        "start": 1095,
        "stop": 1116,
        "text": "ome"
    },
    {
        "start": 1132,
        "stop": 1137,
        "text": "ver"
    },
    {
        "start": 1137,
        "stop": 1151,
        "text": " is"
    },
    {
        "start": 1151,
        "stop": 1172,
        "text": " the"
    },
    {
        "start": 1172,
        "stop": 1214,
        "text": " formal"
    },
    {
        "start": 1214,
        "stop": 1263,
        "text": " version"
    },
    {
        "start": 1263,
        "stop": 1277,
        "text": " of"
    },
    {
        "start": 1277,
        "stop": 1298,
        "text": " the"
    },
    {
        "start": 1298,
        "stop": 1326,
        "text": " word"
    },
    {
        "start": 1326,
        "stop": 1347,
        "text": "."
    },
    {
        "start": 1347,
        "stop": 1417,
        "text": " Obviously"
    },
    {
        "start": 1418,
        "stop": 1428,
        "text": " it"
    },
    {
        "start": 1428,
        "stop": 1435,
        "text": "'s"
    },
    {
        "start": 1440,
        "stop": 1443,
        "text": " a"
    },
    {
        "start": 1443,
        "stop": 1464,
        "text": " real"
    },
    {
        "start": 1464,
        "stop": 1485,
        "text": " word"
    },
    {
        "start": 1485,
        "stop": 1494,
        "text": ","
    },
    {
        "start": 1494,
        "stop": 1505,
        "text": " but"
    },
    {
        "start": 1509,
        "stop": 1512,
        "text": " I"
    },
    {
        "start": 1522,
        "stop": 1530,
        "text": " don"
    },
    {
        "start": 1530,
        "stop": 1538,
        "text": "'t"
    },
    {
        "start": 1547,
        "stop": 1561,
        "text": " know"
    },
    {
        "start": 1561,
        "stop": 1582,
        "text": " when"
    },
    {
        "start": 1582,
        "stop": 1592,
        "text": " to"
    },
    {
        "start": 1592,
        "stop": 1607,
        "text": " use"
    },
    {
        "start": 1607,
        "stop": 1617,
        "text": " it"
    },
    {
        "start": 1617,
        "stop": 1664,
        "text": " correctly"
    },
    {
        "start": 1664,
        "stop": 1678,
        "text": "."
    },
    {
        "start": 1678,
        "stop": 1694,
        "text": " Not"
    },
    {
        "start": 1694,
        "stop": 1698,
        "text": " a"
    },
    {
        "start": 1699,
        "stop": 1730,
        "text": " native"
    },
    {
        "start": 1730,
        "stop": 1761,
        "text": " speaker"
    },
    {
        "start": 1767,
        "stop": 1792,
        "text": "."
    },
    {
        "start": 1792,
        "stop": 1798,
        "text": " I"
    },
    {
        "start": 1800,
        "stop": 1823,
        "text": " know"
    },
    {
        "start": 1823,
        "stop": 1848,
        "text": " what"
    },
    {
        "start": 1848,
        "stop": 1860,
        "text": "'s"
    },
    {
        "start": 1860,
        "stop": 1881,
        "text": " right"
    },
    {
        "start": 1889,
        "stop": 1903,
        "text": ","
    },
    {
        "start": 1904,
        "stop": 1910,
        "text": " but"
    },
    {
        "start": 1923,
        "stop": 1927,
        "text": " I"
    },
    {
        "start": 1927,
        "stop": 1939,
        "text": "'m"
    },
    {
        "start": 1939,
        "stop": 1957,
        "text": " not"
    },
    {
        "start": 1957,
        "stop": 1988,
        "text": " gonna"
    },
    {
        "start": 1988,
        "stop": 2005,
        "text": " say"
    },
    {
        "start": 2005,
        "stop": 2023,
        "text": " because"
    },
    {
        "start": 2050,
        "stop": 2067,
        "text": " you"
    },
    {
        "start": 2067,
        "stop": 2085,
        "text": "'re"
    },
    {
        "start": 2085,
        "stop": 2103,
        "text": " all"
    },
    {
        "start": 2103,
        "stop": 2120,
        "text": " jer"
    },
    {
        "start": 2125,
        "stop": 2133,
        "text": "ks"
    },
    {
        "start": 2133,
        "stop": 2148,
        "text": " who"
    },
    {
        "start": 2157,
        "stop": 2175,
        "text": " didn"
    },
    {
        "start": 2177,
        "stop": 2199,
        "text": "'t"
    },
    {
        "start": 2206,
        "stop": 2218,
        "text": " come"
    },
    {
        "start": 2218,
        "stop": 2231,
        "text": " see"
    },
    {
        "start": 2231,
        "stop": 2240,
        "text": " my"
    },
    {
        "start": 2240,
        "stop": 2258,
        "text": " band"
    },
    {
        "start": 2258,
        "stop": 2276,
        "text": " last"
    },
    {
        "start": 2276,
        "stop": 2293,
        "text": " night"
    },
    {
        "start": 2301,
        "stop": 2312,
        "text": "."
    },
    {
        "start": 2312,
        "stop": 2321,
        "text": " Do"
    },
    {
        "start": 2321,
        "stop": 2334,
        "text": " you"
    },
    {
        "start": 2334,
        "stop": 2361,
        "text": " really"
    },
    {
        "start": 2361,
        "stop": 2379,
        "text": " know"
    },
    {
        "start": 2379,
        "stop": 2402,
        "text": " which"
    },
    {
        "start": 2402,
        "stop": 2411,
        "text": " one"
    },
    {
        "start": 2417,
        "stop": 2424,
        "text": " is"
    },
    {
        "start": 2424,
        "stop": 2456,
        "text": " correct"
    },
    {
        "start": 2456,
        "stop": 2457,
        "text": "?"
    },
    {
        "start": 2471,
        "stop": 2473,
        "text": " I"
    },
    {
        "start": 2473,
        "stop": 2486,
        "text": " don"
    },
    {
        "start": 2486,
        "stop": 2504,
        "text": "'t"
    },
    {
        "start": 2504,
        "stop": 2507,
        "text": " know"
    },
    {
        "start": 2524,
        "stop": 2540,
        "text": "."
    },
    {
        "start": 2540,
        "stop": 2551,
        "text": " It"
    },
    {
        "start": 2551,
        "stop": 2561,
        "text": "'s"
    },
    {
        "start": 2574,
        "stop": 2584,
        "text": " whom"
    },
    {
        "start": 2591,
        "stop": 2608,
        "text": " when"
    },
    {
        "start": 2608,
        "stop": 2619,
        "text": " it"
    },
    {
        "start": 2619,
        "stop": 2630,
        "text": "'s"
    },
    {
        "start": 2630,
        "stop": 2647,
        "text": " the"
    },
    {
        "start": 2647,
        "stop": 2682,
        "text": " object"
    },
    {
        "start": 2682,
        "stop": 2693,
        "text": " of"
    },
    {
        "start": 2693,
        "stop": 2710,
        "text": " the"
    },
    {
        "start": 2710,
        "stop": 2756,
        "text": " sentence"
    },
    {
        "start": 2756,
        "stop": 2773,
        "text": " and"
    },
    {
        "start": 2773,
        "stop": 2790,
        "text": " who"
    },
    {
        "start": 2790,
        "stop": 2813,
        "text": " when"
    },
    {
        "start": 2813,
        "stop": 2824,
        "text": " is"
    },
    {
        "start": 2824,
        "stop": 2841,
        "text": " the"
    },
    {
        "start": 2841,
        "stop": 2879,
        "text": " subject"
    },
    {
        "start": 2881,
        "stop": 2905,
        "text": "."
    },
    {
        "start": 2917,
        "stop": 2942,
        "text": " That"
    },
    {
        "start": 2942,
        "stop": 2964,
        "text": " That"
    },
    {
        "start": 2964,
        "stop": 2993,
        "text": " sounds"
    },
    {
        "start": 2997,
        "stop": 3023,
        "text": " right"
    },
    {
        "start": 3026,
        "stop": 3042,
        "text": "."
    },
    {
        "start": 3042,
        "stop": 3047,
        "text": " Well"
    },
    {
        "start": 3052,
        "stop": 3057,
        "text": ","
    },
    {
        "start": 3057,
        "stop": 3062,
        "text": " it"
    },
    {
        "start": 3062,
        "stop": 3076,
        "text": " sounds"
    },
    {
        "start": 3077,
        "stop": 3089,
        "text": " right"
    },
    {
        "start": 3089,
        "stop": 3094,
        "text": ","
    },
    {
        "start": 3094,
        "stop": 3101,
        "text": " but"
    },
    {
        "start": 3101,
        "stop": 3106,
        "text": " is"
    },
    {
        "start": 3106,
        "stop": 3111,
        "text": " it"
    },
    {
        "start": 3111,
        "stop": 3121,
        "text": "?"
    },
    {
        "start": 3122,
        "stop": 3137,
        "text": " How"
    },
    {
        "start": 3137,
        "stop": 3152,
        "text": " did"
    },
    {
        "start": 3152,
        "stop": 3171,
        "text": " Ryan"
    },
    {
        "start": 3171,
        "stop": 3186,
        "text": " use"
    },
    {
        "start": 3186,
        "stop": 3196,
        "text": " it"
    },
    {
        "start": 3196,
        "stop": 3205,
        "text": ","
    },
    {
        "start": 3205,
        "stop": 3215,
        "text": " as"
    },
    {
        "start": 3215,
        "stop": 3223,
        "text": " an"
    },
    {
        "start": 3227,
        "stop": 3254,
        "text": " object"
    },
    {
        "start": 3254,
        "stop": 3272,
        "text": "?"
    },
    {
        "start": 3272,
        "stop": 3280,
        "text": " As"
    },
    {
        "start": 3280,
        "stop": 3288,
        "text": " an"
    },
    {
        "start": 3288,
        "stop": 3309,
        "text": " object"
    },
    {
        "start": 3309,
        "stop": 3324,
        "text": "."
    },
    {
        "start": 3324,
        "stop": 3353,
        "text": " Ryan"
    },
    {
        "start": 3353,
        "stop": 3382,
        "text": " used"
    },
    {
        "start": 3382,
        "stop": 3396,
        "text": " me"
    },
    {
        "start": 3396,
        "stop": 3410,
        "text": " as"
    },
    {
        "start": 3410,
        "stop": 3424,
        "text": " an"
    },
    {
        "start": 3424,
        "stop": 3466,
        "text": " object"
    },
    {
        "start": 3494,
        "stop": 3494,
        "text": "."
    },
    {
        "start": 3502,
        "stop": 3506,
        "text": " Is"
    },
    {
        "start": 3506,
        "stop": 3516,
        "text": " he"
    },
    {
        "start": 3520,
        "stop": 3549,
        "text": " right"
    },
    {
        "start": 3549,
        "stop": 3580,
        "text": " about"
    },
    {
        "start": 3580,
        "stop": 3605,
        "text": " that"
    },
    {
        "start": 3605,
        "stop": 3609,
        "text": "?"
    },
    {
        "start": 3627,
        "stop": 3640,
        "text": " How"
    },
    {
        "start": 3640,
        "stop": 3654,
        "text": " did"
    },
    {
        "start": 3654,
        "stop": 3663,
        "text": " he"
    },
    {
        "start": 3663,
        "stop": 3677,
        "text": " use"
    },
    {
        "start": 3677,
        "stop": 3686,
        "text": " it"
    },
    {
        "start": 3686,
        "stop": 3709,
        "text": " again"
    },
    {
        "start": 3709,
        "stop": 3726,
        "text": "?"
    },
    {
        "start": 3726,
        "stop": 3735,
        "text": " It"
    },
    {
        "start": 3735,
        "stop": 3749,
        "text": " was"
    },
    {
        "start": 3749,
        "stop": 3775,
        "text": "..."
    },
    {
        "start": 3794,
        "stop": 3814,
        "text": " Ryan"
    },
    {
        "start": 3814,
        "stop": 3847,
        "text": " wanted"
    },
    {
        "start": 3847,
        "stop": 3885,
        "text": " Michael"
    },
    {
        "start": 3885,
        "stop": 3897,
        "text": ","
    },
    {
        "start": 3897,
        "stop": 3914,
        "text": " the"
    },
    {
        "start": 3914,
        "stop": 3952,
        "text": " subject"
    },
    {
        "start": 3952,
        "stop": 3960,
        "text": ","
    },
    {
        "start": 3964,
        "stop": 3975,
        "text": " to"
    },
    {
        "start": 3975,
        "stop": 4014,
        "text": " explain"
    },
    {
        "start": 4014,
        "stop": 4031,
        "text": " the"
    },
    {
        "start": 4031,
        "stop": 4076,
        "text": " computer"
    },
    {
        "start": 4076,
        "stop": 4105,
        "text": " system"
    },
    {
        "start": 4109,
        "stop": 4120,
        "text": ","
    },
    {
        "start": 4120,
        "stop": 4137,
        "text": " the"
    },
    {
        "start": 4137,
        "stop": 4170,
        "text": " object"
    },
    {
        "start": 4170,
        "stop": 4194,
        "text": "."
    },
    {
        "start": 4214,
        "stop": 4227,
        "text": " Thank"
    },
    {
        "start": 4227,
        "stop": 4242,
        "text": " you"
    },
    {
        "start": 4247,
        "stop": 4265,
        "text": "."
    },
    {
        "start": 4265,
        "stop": 4278,
        "text": " To"
    },
    {
        "start": 4278,
        "stop": 4291,
        "text": " wh"
    },
    {
        "start": 4291,
        "stop": 4310,
        "text": "ome"
    },
    {
        "start": 4310,
        "stop": 4329,
        "text": "ver"
    },
    {
        "start": 4329,
        "stop": 4340,
        "text": ","
    },
    {
        "start": 4358,
        "stop": 4388,
        "text": " meaning"
    },
    {
        "start": 4388,
        "stop": 4401,
        "text": " us"
    },
    {
        "start": 4401,
        "stop": 4411,
        "text": ","
    },
    {
        "start": 4418,
        "stop": 4429,
        "text": " the"
    },
    {
        "start": 4433,
        "stop": 4486,
        "text": " indirect"
    },
    {
        "start": 4486,
        "stop": 4524,
        "text": " object"
    },
    {
        "start": 4525,
        "stop": 4546,
        "text": ","
    },
    {
        "start": 4549,
        "stop": 4573,
        "text": " which"
    },
    {
        "start": 4573,
        "stop": 4584,
        "text": " is"
    },
    {
        "start": 4584,
        "stop": 4600,
        "text": " the"
    },
    {
        "start": 4600,
        "stop": 4636,
        "text": " correct"
    },
    {
        "start": 4641,
        "stop": 4661,
        "text": " usage"
    },
    {
        "start": 4668,
        "stop": 4677,
        "text": " of"
    },
    {
        "start": 4677,
        "stop": 4693,
        "text": " the"
    },
    {
        "start": 4693,
        "stop": 4715,
        "text": " word"
    },
    {
        "start": 4715,
        "stop": 4736,
        "text": "."
    }
]

Maybe I'm making a mistake in how I consume the segments.
This is how I create the word segments:

core/src/model.rs#L134

arizhih · 2024-06-27T18:41:29Z

Maybe I'm making a mistake in how I consume the segments. This is how I create the word segments:

core/src/model.rs#L134

Yes, you are reading tokens, but you need to read the segment text instead. Try this:
let text = state.full_get_segment_text_lossy(s).context("failed to get segment")?;

thewh1teagle · 2024-06-27T18:50:22Z

Yes, you are reading tokens, but you need to read the segment text instead. Try this:
let text = state.full_get_segment_text_lossy(s).context("failed to get segment")?;

Note that I said word segments; in the general case I already use get_segment_text there, in the else branch. Do I need to use get_segment_text inside the loop over num_tokens as well?

arizhih · 2024-06-27T19:05:36Z

My proposal was to use max_len 1 together with split_on_word; I think that with these options each segment will be a single word.

So you don’t need to use tokens at all, only segments.

for s in 0..num_segments {
    let text = state.full_get_segment_text_lossy(s).context("failed to get segment")?;
    let start = state.full_get_segment_t0(s).context("failed to get start timestamp")?;
    let stop = state.full_get_segment_t1(s).context("failed to get end timestamp")?;
    segments.push(Segment { text, start, stop });
}
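
Once each segment holds a single word, the original goal of a maximum number of words per caption line can be handled after transcription. The following is a minimal sketch only, assuming a `Segment` struct with the same three fields as in the loop above; the `i64` timestamp type and the `group_words` helper are illustrative assumptions, not part of the project's code. It groups purely by word count and does not try to respect sentence boundaries.

```rust
/// Assumed shape of a segment: text plus start/stop timestamps.
#[derive(Debug, Clone, PartialEq)]
struct Segment {
    text: String,
    start: i64,
    stop: i64,
}

/// Group single-word segments into caption lines of at most `max_words` words.
/// Hypothetical helper: each line spans from the first word's start
/// to the last word's stop.
fn group_words(words: &[Segment], max_words: usize) -> Vec<Segment> {
    words
        // `chunks` panics on a size of 0, so clamp to at least 1.
        .chunks(max_words.max(1))
        .map(|chunk| Segment {
            text: chunk
                .iter()
                .map(|w| w.text.trim())
                .collect::<Vec<_>>()
                .join(" "),
            start: chunk[0].start,
            stop: chunk[chunk.len() - 1].stop,
        })
        .collect()
}

fn main() {
    let words: Vec<Segment> = [
        ("come", 2206, 2218),
        ("see", 2218, 2231),
        ("my", 2231, 2240),
        ("band", 2240, 2258),
        ("last", 2258, 2276),
    ]
    .iter()
    .map(|&(text, start, stop)| Segment { text: text.to_string(), start, stop })
    .collect();

    for line in group_words(&words, 2) {
        println!("[{} - {}] {}", line.start, line.stop, line.text);
    }
}
```

Grouping after the fact keeps the word-level timestamps intact, so the limit can be changed without running the model again.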

If this doesn’t help, I’ll give you an example tomorrow of how to create words from tokens.
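
For reference, token entries like those in the JSON dump above (" jer" + "ks", " didn" + "'t") can also be merged into words by hand. This is a minimal sketch under one stated assumption: a token whose text begins with a space starts a new word, and any other token (sub-word piece, contraction suffix, punctuation) continues the current one. The `Segment` struct and `merge_tokens_into_words` are hypothetical names for illustration and use only the standard library.

```rust
/// Assumed shape of one entry: text plus start/stop timestamps.
#[derive(Debug, Clone, PartialEq)]
struct Segment {
    text: String,
    start: i64,
    stop: i64,
}

/// Merge token-level entries into word-level entries.
/// Assumption: a leading space marks the beginning of a new word.
fn merge_tokens_into_words(tokens: &[Segment]) -> Vec<Segment> {
    let mut words: Vec<Segment> = Vec::new();
    for tok in tokens {
        if !tok.text.starts_with(' ') {
            // Continuation token: append to the previous word and extend its end time.
            if let Some(last) = words.last_mut() {
                last.text.push_str(&tok.text);
                last.stop = tok.stop;
                continue;
            }
        }
        // New word (or the very first token): drop the leading space.
        words.push(Segment {
            text: tok.text.trim_start().to_string(),
            start: tok.start,
            stop: tok.stop,
        });
    }
    words
}

fn main() {
    // Token values copied from the dump above.
    let tokens: Vec<Segment> = [
        (" jer", 2103, 2120),
        ("ks", 2125, 2133),
        (" who", 2133, 2148),
        (" didn", 2157, 2175),
        ("'t", 2177, 2199),
    ]
    .iter()
    .map(|&(text, start, stop)| Segment { text: text.to_string(), start, stop })
    .collect();

    for w in merge_tokens_into_words(&tokens) {
        println!("[{} - {}] {}", w.start, w.stop, w.text);
    }
}
```

With this rule, punctuation tokens such as "." and "," end up attached to the preceding word, which is usually what captions want; languages that do not separate words with spaces would need a different rule.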

thewh1teagle · 2024-06-27T19:24:43Z

@arizhih

It worked!
I tried so many options there but didn't think of this one.

Thank you so much :)

thewh1teagle closed this as completed Jun 29, 2024

thewh1teagle mentioned this issue Jul 5, 2024

Incorrect timetstamps ggerganov/whisper.cpp#2271

Open
