max tokens and split on word params doesn't work #156
Comments
@thewh1teagle Hi, as I can see in whisper.cpp
and later So try to enable
@arizhih I have another idea.
By default, whisper produces from 1 to N segments of different lengths. When you set It doesn't affect how it produces sentences at all, just how it returns segments.
Thanks, so I understand that this is not the right way to limit the number of words per sentence. However, when using token timestamps it produces incorrect tokens, or at least they look incorrect, since it counts symbols as single tokens.

regular
[
{
"start": 0,
"stop": 520,
"text": " It's whoever, not whomever. That's whomever. No whomever is never actually right."
},
{
"start": 520,
"stop": 934,
"text": " Well sometimes it's right. Michael is right. It's a made-up word used to trick"
},
{
"start": 934,
"stop": 1418,
"text": " students. No actually whomever is the formal version of the word. Obviously"
},
{
"start": 1418,
"stop": 1792,
"text": " it's a real word, but I don't know when to use it correctly. Not a native speaker."
},
{
"start": 1792,
"stop": 2200,
"text": " I know what's right, but I'm not gonna say because you're all jerks who didn't"
},
{
"start": 2200,
"stop": 2540,
"text": " come see my band last night. Do you really know which one is correct? I don't know."
},
{
"start": 2540,
"stop": 2942,
"text": " It's whom when it's the object of the sentence and who when is the subject. That"
},
{
"start": 2942,
"stop": 4942,
"text": " sounds right. Well it sounds right but is it? How did Ryan use it as an object? As an object. Ryan used me as an object. How did he use it again? It was Ryan wanted Michael the subject to explain the computer system, the object, to whomever, meaning us, the indirect object, which is the correct usage of the word."
}
]

token timestamps
[
{
"start": 0,
"stop": 14,
"text": " It"
},
{
"start": 14,
"stop": 28,
"text": "'s"
},
{
"start": 28,
"stop": 79,
"text": " whoever"
},
{
"start": 93,
"stop": 93,
"text": ","
},
{
"start": 94,
"stop": 115,
"text": " not"
},
{
"start": 122,
"stop": 129,
"text": " wh"
},
{
"start": 129,
"stop": 147,
"text": "ome"
},
{
"start": 152,
"stop": 173,
"text": "ver"
},
{
"start": 173,
"stop": 200,
"text": "."
},
{
"start": 200,
"stop": 223,
"text": " That"
},
{
"start": 223,
"stop": 233,
"text": "'s"
},
{
"start": 234,
"stop": 245,
"text": " wh"
},
{
"start": 245,
"stop": 262,
"text": "ome"
},
{
"start": 262,
"stop": 278,
"text": "ver"
},
{
"start": 279,
"stop": 298,
"text": "."
},
{
"start": 304,
"stop": 313,
"text": " No"
},
{
"start": 313,
"stop": 326,
"text": " wh"
},
{
"start": 326,
"stop": 345,
"text": "ome"
},
{
"start": 345,
"stop": 364,
"text": "ver"
},
{
"start": 364,
"stop": 365,
"text": " is"
},
{
"start": 380,
"stop": 410,
"text": " never"
},
{
"start": 410,
"stop": 463,
"text": " actually"
},
{
"start": 463,
"stop": 496,
"text": " right"
},
{
"start": 496,
"stop": 520,
"text": "."
},
{
"start": 520,
"stop": 544,
"text": " Well"
},
{
"start": 544,
"stop": 597,
"text": " sometimes"
},
{
"start": 597,
"stop": 609,
"text": " it"
},
{
"start": 609,
"stop": 615,
"text": "'s"
},
{
"start": 623,
"stop": 649,
"text": " right"
},
{
"start": 649,
"stop": 658,
"text": "."
},
{
"start": 667,
"stop": 706,
"text": " Michael"
},
{
"start": 707,
"stop": 718,
"text": " is"
},
{
"start": 718,
"stop": 741,
"text": " right"
},
{
"start": 752,
"stop": 765,
"text": "."
},
{
"start": 765,
"stop": 777,
"text": " It"
},
{
"start": 777,
"stop": 788,
"text": "'s"
},
{
"start": 788,
"stop": 794,
"text": " a"
},
{
"start": 794,
"stop": 818,
"text": " made"
},
{
"start": 818,
"stop": 819,
"text": "-"
},
{
"start": 831,
"stop": 834,
"text": "up"
},
{
"start": 834,
"stop": 855,
"text": " word"
},
{
"start": 858,
"stop": 879,
"text": " used"
},
{
"start": 886,
"stop": 894,
"text": " to"
},
{
"start": 894,
"stop": 931,
"text": " trick"
},
{
"start": 936,
"stop": 990,
"text": " students"
},
{
"start": 990,
"stop": 1008,
"text": "."
},
{
"start": 1010,
"stop": 1012,
"text": " No"
},
{
"start": 1037,
"stop": 1079,
"text": " actually"
},
{
"start": 1095,
"stop": 1095,
"text": " wh"
},
{
"start": 1095,
"stop": 1116,
"text": "ome"
},
{
"start": 1132,
"stop": 1137,
"text": "ver"
},
{
"start": 1137,
"stop": 1151,
"text": " is"
},
{
"start": 1151,
"stop": 1172,
"text": " the"
},
{
"start": 1172,
"stop": 1214,
"text": " formal"
},
{
"start": 1214,
"stop": 1263,
"text": " version"
},
{
"start": 1263,
"stop": 1277,
"text": " of"
},
{
"start": 1277,
"stop": 1298,
"text": " the"
},
{
"start": 1298,
"stop": 1326,
"text": " word"
},
{
"start": 1326,
"stop": 1347,
"text": "."
},
{
"start": 1347,
"stop": 1417,
"text": " Obviously"
},
{
"start": 1418,
"stop": 1428,
"text": " it"
},
{
"start": 1428,
"stop": 1435,
"text": "'s"
},
{
"start": 1440,
"stop": 1443,
"text": " a"
},
{
"start": 1443,
"stop": 1464,
"text": " real"
},
{
"start": 1464,
"stop": 1485,
"text": " word"
},
{
"start": 1485,
"stop": 1494,
"text": ","
},
{
"start": 1494,
"stop": 1505,
"text": " but"
},
{
"start": 1509,
"stop": 1512,
"text": " I"
},
{
"start": 1522,
"stop": 1530,
"text": " don"
},
{
"start": 1530,
"stop": 1538,
"text": "'t"
},
{
"start": 1547,
"stop": 1561,
"text": " know"
},
{
"start": 1561,
"stop": 1582,
"text": " when"
},
{
"start": 1582,
"stop": 1592,
"text": " to"
},
{
"start": 1592,
"stop": 1607,
"text": " use"
},
{
"start": 1607,
"stop": 1617,
"text": " it"
},
{
"start": 1617,
"stop": 1664,
"text": " correctly"
},
{
"start": 1664,
"stop": 1678,
"text": "."
},
{
"start": 1678,
"stop": 1694,
"text": " Not"
},
{
"start": 1694,
"stop": 1698,
"text": " a"
},
{
"start": 1699,
"stop": 1730,
"text": " native"
},
{
"start": 1730,
"stop": 1761,
"text": " speaker"
},
{
"start": 1767,
"stop": 1792,
"text": "."
},
{
"start": 1792,
"stop": 1798,
"text": " I"
},
{
"start": 1800,
"stop": 1823,
"text": " know"
},
{
"start": 1823,
"stop": 1848,
"text": " what"
},
{
"start": 1848,
"stop": 1860,
"text": "'s"
},
{
"start": 1860,
"stop": 1881,
"text": " right"
},
{
"start": 1889,
"stop": 1903,
"text": ","
},
{
"start": 1904,
"stop": 1910,
"text": " but"
},
{
"start": 1923,
"stop": 1927,
"text": " I"
},
{
"start": 1927,
"stop": 1939,
"text": "'m"
},
{
"start": 1939,
"stop": 1957,
"text": " not"
},
{
"start": 1957,
"stop": 1988,
"text": " gonna"
},
{
"start": 1988,
"stop": 2005,
"text": " say"
},
{
"start": 2005,
"stop": 2023,
"text": " because"
},
{
"start": 2050,
"stop": 2067,
"text": " you"
},
{
"start": 2067,
"stop": 2085,
"text": "'re"
},
{
"start": 2085,
"stop": 2103,
"text": " all"
},
{
"start": 2103,
"stop": 2120,
"text": " jer"
},
{
"start": 2125,
"stop": 2133,
"text": "ks"
},
{
"start": 2133,
"stop": 2148,
"text": " who"
},
{
"start": 2157,
"stop": 2175,
"text": " didn"
},
{
"start": 2177,
"stop": 2199,
"text": "'t"
},
{
"start": 2206,
"stop": 2218,
"text": " come"
},
{
"start": 2218,
"stop": 2231,
"text": " see"
},
{
"start": 2231,
"stop": 2240,
"text": " my"
},
{
"start": 2240,
"stop": 2258,
"text": " band"
},
{
"start": 2258,
"stop": 2276,
"text": " last"
},
{
"start": 2276,
"stop": 2293,
"text": " night"
},
{
"start": 2301,
"stop": 2312,
"text": "."
},
{
"start": 2312,
"stop": 2321,
"text": " Do"
},
{
"start": 2321,
"stop": 2334,
"text": " you"
},
{
"start": 2334,
"stop": 2361,
"text": " really"
},
{
"start": 2361,
"stop": 2379,
"text": " know"
},
{
"start": 2379,
"stop": 2402,
"text": " which"
},
{
"start": 2402,
"stop": 2411,
"text": " one"
},
{
"start": 2417,
"stop": 2424,
"text": " is"
},
{
"start": 2424,
"stop": 2456,
"text": " correct"
},
{
"start": 2456,
"stop": 2457,
"text": "?"
},
{
"start": 2471,
"stop": 2473,
"text": " I"
},
{
"start": 2473,
"stop": 2486,
"text": " don"
},
{
"start": 2486,
"stop": 2504,
"text": "'t"
},
{
"start": 2504,
"stop": 2507,
"text": " know"
},
{
"start": 2524,
"stop": 2540,
"text": "."
},
{
"start": 2540,
"stop": 2551,
"text": " It"
},
{
"start": 2551,
"stop": 2561,
"text": "'s"
},
{
"start": 2574,
"stop": 2584,
"text": " whom"
},
{
"start": 2591,
"stop": 2608,
"text": " when"
},
{
"start": 2608,
"stop": 2619,
"text": " it"
},
{
"start": 2619,
"stop": 2630,
"text": "'s"
},
{
"start": 2630,
"stop": 2647,
"text": " the"
},
{
"start": 2647,
"stop": 2682,
"text": " object"
},
{
"start": 2682,
"stop": 2693,
"text": " of"
},
{
"start": 2693,
"stop": 2710,
"text": " the"
},
{
"start": 2710,
"stop": 2756,
"text": " sentence"
},
{
"start": 2756,
"stop": 2773,
"text": " and"
},
{
"start": 2773,
"stop": 2790,
"text": " who"
},
{
"start": 2790,
"stop": 2813,
"text": " when"
},
{
"start": 2813,
"stop": 2824,
"text": " is"
},
{
"start": 2824,
"stop": 2841,
"text": " the"
},
{
"start": 2841,
"stop": 2879,
"text": " subject"
},
{
"start": 2881,
"stop": 2905,
"text": "."
},
{
"start": 2917,
"stop": 2942,
"text": " That"
},
{
"start": 2942,
"stop": 2969,
"text": " sounds"
},
{
"start": 2969,
"stop": 2992,
"text": " right"
},
{
"start": 2997,
"stop": 3005,
"text": "."
},
{
"start": 3005,
"stop": 3016,
"text": " Well"
},
{
"start": 3026,
"stop": 3032,
"text": " it"
},
{
"start": 3032,
"stop": 3059,
"text": " sounds"
},
{
"start": 3059,
"stop": 3082,
"text": " right"
},
{
"start": 3082,
"stop": 3095,
"text": " but"
},
{
"start": 3095,
"stop": 3103,
"text": " is"
},
{
"start": 3104,
"stop": 3113,
"text": " it"
},
{
"start": 3113,
"stop": 3126,
"text": "?"
},
{
"start": 3126,
"stop": 3139,
"text": " How"
},
{
"start": 3139,
"stop": 3152,
"text": " did"
},
{
"start": 3152,
"stop": 3170,
"text": " Ryan"
},
{
"start": 3170,
"stop": 3183,
"text": " use"
},
{
"start": 3183,
"stop": 3192,
"text": " it"
},
{
"start": 3192,
"stop": 3201,
"text": " as"
},
{
"start": 3201,
"stop": 3210,
"text": " an"
},
{
"start": 3210,
"stop": 3237,
"text": " object"
},
{
"start": 3237,
"stop": 3250,
"text": "?"
},
{
"start": 3250,
"stop": 3256,
"text": " As"
},
{
"start": 3260,
"stop": 3268,
"text": " an"
},
{
"start": 3268,
"stop": 3295,
"text": " object"
},
{
"start": 3295,
"stop": 3320,
"text": "."
},
{
"start": 3335,
"stop": 3358,
"text": " Ryan"
},
{
"start": 3358,
"stop": 3392,
"text": " used"
},
{
"start": 3392,
"stop": 3409,
"text": " me"
},
{
"start": 3409,
"stop": 3426,
"text": " as"
},
{
"start": 3426,
"stop": 3442,
"text": " an"
},
{
"start": 3442,
"stop": 3464,
"text": " object"
},
{
"start": 3503,
"stop": 3521,
"text": "."
},
{
"start": 3521,
"stop": 3547,
"text": " How"
},
{
"start": 3547,
"stop": 3566,
"text": " did"
},
{
"start": 3573,
"stop": 3587,
"text": " he"
},
{
"start": 3598,
"stop": 3614,
"text": " use"
},
{
"start": 3627,
"stop": 3633,
"text": " it"
},
{
"start": 3633,
"stop": 3675,
"text": " again"
},
{
"start": 3676,
"stop": 3708,
"text": "?"
},
{
"start": 3708,
"stop": 3729,
"text": " It"
},
{
"start": 3730,
"stop": 3763,
"text": " was"
},
{
"start": 3763,
"stop": 3808,
"text": " Ryan"
},
{
"start": 3808,
"stop": 3836,
"text": " wanted"
},
{
"start": 3840,
"stop": 3878,
"text": " Michael"
},
{
"start": 3878,
"stop": 3896,
"text": " the"
},
{
"start": 3896,
"stop": 3952,
"text": " subject"
},
{
"start": 3964,
"stop": 3976,
"text": " to"
},
{
"start": 3976,
"stop": 4036,
"text": " explain"
},
{
"start": 4036,
"stop": 4045,
"text": " the"
},
{
"start": 4051,
"stop": 4085,
"text": " computer"
},
{
"start": 4085,
"stop": 4105,
"text": " system"
},
{
"start": 4112,
"stop": 4121,
"text": ","
},
{
"start": 4121,
"stop": 4136,
"text": " the"
},
{
"start": 4136,
"stop": 4182,
"text": " object"
},
{
"start": 4188,
"stop": 4202,
"text": ","
},
{
"start": 4214,
"stop": 4218,
"text": " to"
},
{
"start": 4218,
"stop": 4231,
"text": " wh"
},
{
"start": 4241,
"stop": 4259,
"text": "ome"
},
{
"start": 4259,
"stop": 4281,
"text": "ver"
},
{
"start": 4289,
"stop": 4300,
"text": ","
},
{
"start": 4300,
"stop": 4359,
"text": " meaning"
},
{
"start": 4359,
"stop": 4375,
"text": " us"
},
{
"start": 4375,
"stop": 4391,
"text": ","
},
{
"start": 4391,
"stop": 4424,
"text": " the"
},
{
"start": 4424,
"stop": 4503,
"text": " indirect"
},
{
"start": 4506,
"stop": 4568,
"text": " object"
},
{
"start": 4568,
"stop": 4584,
"text": ","
},
{
"start": 4591,
"stop": 4636,
"text": " which"
},
{
"start": 4641,
"stop": 4659,
"text": " is"
},
{
"start": 4659,
"stop": 4690,
"text": " the"
},
{
"start": 4690,
"stop": 4755,
"text": " correct"
},
{
"start": 4755,
"stop": 4755,
"text": " usage"
},
{
"start": 4755,
"stop": 4755,
"text": " of"
},
{
"start": 4755,
"stop": 4755,
"text": " the"
},
{
"start": 4755,
"stop": 4755,
"text": " word"
},
{
"start": 4755,
"stop": 4755,
"text": "."
}
]

Created with Vibe app.
That’s because a token is not a word. Whisper has about 54000 tokens, and all words are built from these tokens. Maybe if you set
Same result:

params.set_token_timestamps(true);
params.set_split_on_word(true);
params.set_max_len(1);

transcript.json
[
{
"start": 0,
"stop": 14,
"text": " It"
},
{
"start": 14,
"stop": 28,
"text": "'s"
},
{
"start": 28,
"stop": 79,
"text": " whoever"
},
{
"start": 93,
"stop": 93,
"text": ","
},
{
"start": 94,
"stop": 115,
"text": " not"
},
{
"start": 122,
"stop": 129,
"text": " wh"
},
{
"start": 129,
"stop": 147,
"text": "ome"
},
{
"start": 152,
"stop": 173,
"text": "ver"
},
{
"start": 173,
"stop": 200,
"text": "."
},
{
"start": 200,
"stop": 223,
"text": " That"
},
{
"start": 223,
"stop": 233,
"text": "'s"
},
{
"start": 234,
"stop": 245,
"text": " wh"
},
{
"start": 245,
"stop": 262,
"text": "ome"
},
{
"start": 262,
"stop": 278,
"text": "ver"
},
{
"start": 279,
"stop": 298,
"text": "."
},
{
"start": 304,
"stop": 313,
"text": " No"
},
{
"start": 313,
"stop": 326,
"text": " wh"
},
{
"start": 326,
"stop": 345,
"text": "ome"
},
{
"start": 345,
"stop": 364,
"text": "ver"
},
{
"start": 364,
"stop": 365,
"text": " is"
},
{
"start": 380,
"stop": 410,
"text": " never"
},
{
"start": 410,
"stop": 463,
"text": " actually"
},
{
"start": 463,
"stop": 496,
"text": " right"
},
{
"start": 496,
"stop": 520,
"text": "."
},
{
"start": 520,
"stop": 544,
"text": " Well"
},
{
"start": 544,
"stop": 597,
"text": " sometimes"
},
{
"start": 597,
"stop": 609,
"text": " it"
},
{
"start": 609,
"stop": 615,
"text": "'s"
},
{
"start": 623,
"stop": 649,
"text": " right"
},
{
"start": 649,
"stop": 658,
"text": "."
},
{
"start": 667,
"stop": 706,
"text": " Michael"
},
{
"start": 707,
"stop": 718,
"text": " is"
},
{
"start": 718,
"stop": 741,
"text": " right"
},
{
"start": 752,
"stop": 765,
"text": "."
},
{
"start": 765,
"stop": 777,
"text": " It"
},
{
"start": 777,
"stop": 788,
"text": "'s"
},
{
"start": 788,
"stop": 794,
"text": " a"
},
{
"start": 794,
"stop": 818,
"text": " made"
},
{
"start": 818,
"stop": 819,
"text": "-"
},
{
"start": 831,
"stop": 834,
"text": "up"
},
{
"start": 834,
"stop": 855,
"text": " word"
},
{
"start": 858,
"stop": 879,
"text": " used"
},
{
"start": 886,
"stop": 894,
"text": " to"
},
{
"start": 894,
"stop": 931,
"text": " trick"
},
{
"start": 936,
"stop": 990,
"text": " students"
},
{
"start": 990,
"stop": 1008,
"text": "."
},
{
"start": 1010,
"stop": 1012,
"text": " No"
},
{
"start": 1037,
"stop": 1079,
"text": " actually"
},
{
"start": 1095,
"stop": 1095,
"text": " wh"
},
{
"start": 1095,
"stop": 1116,
"text": "ome"
},
{
"start": 1132,
"stop": 1137,
"text": "ver"
},
{
"start": 1137,
"stop": 1151,
"text": " is"
},
{
"start": 1151,
"stop": 1172,
"text": " the"
},
{
"start": 1172,
"stop": 1214,
"text": " formal"
},
{
"start": 1214,
"stop": 1263,
"text": " version"
},
{
"start": 1263,
"stop": 1277,
"text": " of"
},
{
"start": 1277,
"stop": 1298,
"text": " the"
},
{
"start": 1298,
"stop": 1326,
"text": " word"
},
{
"start": 1326,
"stop": 1347,
"text": "."
},
{
"start": 1347,
"stop": 1417,
"text": " Obviously"
},
{
"start": 1418,
"stop": 1428,
"text": " it"
},
{
"start": 1428,
"stop": 1435,
"text": "'s"
},
{
"start": 1440,
"stop": 1443,
"text": " a"
},
{
"start": 1443,
"stop": 1464,
"text": " real"
},
{
"start": 1464,
"stop": 1485,
"text": " word"
},
{
"start": 1485,
"stop": 1494,
"text": ","
},
{
"start": 1494,
"stop": 1505,
"text": " but"
},
{
"start": 1509,
"stop": 1512,
"text": " I"
},
{
"start": 1522,
"stop": 1530,
"text": " don"
},
{
"start": 1530,
"stop": 1538,
"text": "'t"
},
{
"start": 1547,
"stop": 1561,
"text": " know"
},
{
"start": 1561,
"stop": 1582,
"text": " when"
},
{
"start": 1582,
"stop": 1592,
"text": " to"
},
{
"start": 1592,
"stop": 1607,
"text": " use"
},
{
"start": 1607,
"stop": 1617,
"text": " it"
},
{
"start": 1617,
"stop": 1664,
"text": " correctly"
},
{
"start": 1664,
"stop": 1678,
"text": "."
},
{
"start": 1678,
"stop": 1694,
"text": " Not"
},
{
"start": 1694,
"stop": 1698,
"text": " a"
},
{
"start": 1699,
"stop": 1730,
"text": " native"
},
{
"start": 1730,
"stop": 1761,
"text": " speaker"
},
{
"start": 1767,
"stop": 1792,
"text": "."
},
{
"start": 1792,
"stop": 1798,
"text": " I"
},
{
"start": 1800,
"stop": 1823,
"text": " know"
},
{
"start": 1823,
"stop": 1848,
"text": " what"
},
{
"start": 1848,
"stop": 1860,
"text": "'s"
},
{
"start": 1860,
"stop": 1881,
"text": " right"
},
{
"start": 1889,
"stop": 1903,
"text": ","
},
{
"start": 1904,
"stop": 1910,
"text": " but"
},
{
"start": 1923,
"stop": 1927,
"text": " I"
},
{
"start": 1927,
"stop": 1939,
"text": "'m"
},
{
"start": 1939,
"stop": 1957,
"text": " not"
},
{
"start": 1957,
"stop": 1988,
"text": " gonna"
},
{
"start": 1988,
"stop": 2005,
"text": " say"
},
{
"start": 2005,
"stop": 2023,
"text": " because"
},
{
"start": 2050,
"stop": 2067,
"text": " you"
},
{
"start": 2067,
"stop": 2085,
"text": "'re"
},
{
"start": 2085,
"stop": 2103,
"text": " all"
},
{
"start": 2103,
"stop": 2120,
"text": " jer"
},
{
"start": 2125,
"stop": 2133,
"text": "ks"
},
{
"start": 2133,
"stop": 2148,
"text": " who"
},
{
"start": 2157,
"stop": 2175,
"text": " didn"
},
{
"start": 2177,
"stop": 2199,
"text": "'t"
},
{
"start": 2206,
"stop": 2218,
"text": " come"
},
{
"start": 2218,
"stop": 2231,
"text": " see"
},
{
"start": 2231,
"stop": 2240,
"text": " my"
},
{
"start": 2240,
"stop": 2258,
"text": " band"
},
{
"start": 2258,
"stop": 2276,
"text": " last"
},
{
"start": 2276,
"stop": 2293,
"text": " night"
},
{
"start": 2301,
"stop": 2312,
"text": "."
},
{
"start": 2312,
"stop": 2321,
"text": " Do"
},
{
"start": 2321,
"stop": 2334,
"text": " you"
},
{
"start": 2334,
"stop": 2361,
"text": " really"
},
{
"start": 2361,
"stop": 2379,
"text": " know"
},
{
"start": 2379,
"stop": 2402,
"text": " which"
},
{
"start": 2402,
"stop": 2411,
"text": " one"
},
{
"start": 2417,
"stop": 2424,
"text": " is"
},
{
"start": 2424,
"stop": 2456,
"text": " correct"
},
{
"start": 2456,
"stop": 2457,
"text": "?"
},
{
"start": 2471,
"stop": 2473,
"text": " I"
},
{
"start": 2473,
"stop": 2486,
"text": " don"
},
{
"start": 2486,
"stop": 2504,
"text": "'t"
},
{
"start": 2504,
"stop": 2507,
"text": " know"
},
{
"start": 2524,
"stop": 2540,
"text": "."
},
{
"start": 2540,
"stop": 2551,
"text": " It"
},
{
"start": 2551,
"stop": 2561,
"text": "'s"
},
{
"start": 2574,
"stop": 2584,
"text": " whom"
},
{
"start": 2591,
"stop": 2608,
"text": " when"
},
{
"start": 2608,
"stop": 2619,
"text": " it"
},
{
"start": 2619,
"stop": 2630,
"text": "'s"
},
{
"start": 2630,
"stop": 2647,
"text": " the"
},
{
"start": 2647,
"stop": 2682,
"text": " object"
},
{
"start": 2682,
"stop": 2693,
"text": " of"
},
{
"start": 2693,
"stop": 2710,
"text": " the"
},
{
"start": 2710,
"stop": 2756,
"text": " sentence"
},
{
"start": 2756,
"stop": 2773,
"text": " and"
},
{
"start": 2773,
"stop": 2790,
"text": " who"
},
{
"start": 2790,
"stop": 2813,
"text": " when"
},
{
"start": 2813,
"stop": 2824,
"text": " is"
},
{
"start": 2824,
"stop": 2841,
"text": " the"
},
{
"start": 2841,
"stop": 2879,
"text": " subject"
},
{
"start": 2881,
"stop": 2905,
"text": "."
},
{
"start": 2917,
"stop": 2942,
"text": " That"
},
{
"start": 2942,
"stop": 2964,
"text": " That"
},
{
"start": 2964,
"stop": 2993,
"text": " sounds"
},
{
"start": 2997,
"stop": 3023,
"text": " right"
},
{
"start": 3026,
"stop": 3042,
"text": "."
},
{
"start": 3042,
"stop": 3047,
"text": " Well"
},
{
"start": 3052,
"stop": 3057,
"text": ","
},
{
"start": 3057,
"stop": 3062,
"text": " it"
},
{
"start": 3062,
"stop": 3076,
"text": " sounds"
},
{
"start": 3077,
"stop": 3089,
"text": " right"
},
{
"start": 3089,
"stop": 3094,
"text": ","
},
{
"start": 3094,
"stop": 3101,
"text": " but"
},
{
"start": 3101,
"stop": 3106,
"text": " is"
},
{
"start": 3106,
"stop": 3111,
"text": " it"
},
{
"start": 3111,
"stop": 3121,
"text": "?"
},
{
"start": 3122,
"stop": 3137,
"text": " How"
},
{
"start": 3137,
"stop": 3152,
"text": " did"
},
{
"start": 3152,
"stop": 3171,
"text": " Ryan"
},
{
"start": 3171,
"stop": 3186,
"text": " use"
},
{
"start": 3186,
"stop": 3196,
"text": " it"
},
{
"start": 3196,
"stop": 3205,
"text": ","
},
{
"start": 3205,
"stop": 3215,
"text": " as"
},
{
"start": 3215,
"stop": 3223,
"text": " an"
},
{
"start": 3227,
"stop": 3254,
"text": " object"
},
{
"start": 3254,
"stop": 3272,
"text": "?"
},
{
"start": 3272,
"stop": 3280,
"text": " As"
},
{
"start": 3280,
"stop": 3288,
"text": " an"
},
{
"start": 3288,
"stop": 3309,
"text": " object"
},
{
"start": 3309,
"stop": 3324,
"text": "."
},
{
"start": 3324,
"stop": 3353,
"text": " Ryan"
},
{
"start": 3353,
"stop": 3382,
"text": " used"
},
{
"start": 3382,
"stop": 3396,
"text": " me"
},
{
"start": 3396,
"stop": 3410,
"text": " as"
},
{
"start": 3410,
"stop": 3424,
"text": " an"
},
{
"start": 3424,
"stop": 3466,
"text": " object"
},
{
"start": 3494,
"stop": 3494,
"text": "."
},
{
"start": 3502,
"stop": 3506,
"text": " Is"
},
{
"start": 3506,
"stop": 3516,
"text": " he"
},
{
"start": 3520,
"stop": 3549,
"text": " right"
},
{
"start": 3549,
"stop": 3580,
"text": " about"
},
{
"start": 3580,
"stop": 3605,
"text": " that"
},
{
"start": 3605,
"stop": 3609,
"text": "?"
},
{
"start": 3627,
"stop": 3640,
"text": " How"
},
{
"start": 3640,
"stop": 3654,
"text": " did"
},
{
"start": 3654,
"stop": 3663,
"text": " he"
},
{
"start": 3663,
"stop": 3677,
"text": " use"
},
{
"start": 3677,
"stop": 3686,
"text": " it"
},
{
"start": 3686,
"stop": 3709,
"text": " again"
},
{
"start": 3709,
"stop": 3726,
"text": "?"
},
{
"start": 3726,
"stop": 3735,
"text": " It"
},
{
"start": 3735,
"stop": 3749,
"text": " was"
},
{
"start": 3749,
"stop": 3775,
"text": "..."
},
{
"start": 3794,
"stop": 3814,
"text": " Ryan"
},
{
"start": 3814,
"stop": 3847,
"text": " wanted"
},
{
"start": 3847,
"stop": 3885,
"text": " Michael"
},
{
"start": 3885,
"stop": 3897,
"text": ","
},
{
"start": 3897,
"stop": 3914,
"text": " the"
},
{
"start": 3914,
"stop": 3952,
"text": " subject"
},
{
"start": 3952,
"stop": 3960,
"text": ","
},
{
"start": 3964,
"stop": 3975,
"text": " to"
},
{
"start": 3975,
"stop": 4014,
"text": " explain"
},
{
"start": 4014,
"stop": 4031,
"text": " the"
},
{
"start": 4031,
"stop": 4076,
"text": " computer"
},
{
"start": 4076,
"stop": 4105,
"text": " system"
},
{
"start": 4109,
"stop": 4120,
"text": ","
},
{
"start": 4120,
"stop": 4137,
"text": " the"
},
{
"start": 4137,
"stop": 4170,
"text": " object"
},
{
"start": 4170,
"stop": 4194,
"text": "."
},
{
"start": 4214,
"stop": 4227,
"text": " Thank"
},
{
"start": 4227,
"stop": 4242,
"text": " you"
},
{
"start": 4247,
"stop": 4265,
"text": "."
},
{
"start": 4265,
"stop": 4278,
"text": " To"
},
{
"start": 4278,
"stop": 4291,
"text": " wh"
},
{
"start": 4291,
"stop": 4310,
"text": "ome"
},
{
"start": 4310,
"stop": 4329,
"text": "ver"
},
{
"start": 4329,
"stop": 4340,
"text": ","
},
{
"start": 4358,
"stop": 4388,
"text": " meaning"
},
{
"start": 4388,
"stop": 4401,
"text": " us"
},
{
"start": 4401,
"stop": 4411,
"text": ","
},
{
"start": 4418,
"stop": 4429,
"text": " the"
},
{
"start": 4433,
"stop": 4486,
"text": " indirect"
},
{
"start": 4486,
"stop": 4524,
"text": " object"
},
{
"start": 4525,
"stop": 4546,
"text": ","
},
{
"start": 4549,
"stop": 4573,
"text": " which"
},
{
"start": 4573,
"stop": 4584,
"text": " is"
},
{
"start": 4584,
"stop": 4600,
"text": " the"
},
{
"start": 4600,
"stop": 4636,
"text": " correct"
},
{
"start": 4641,
"stop": 4661,
"text": " usage"
},
{
"start": 4668,
"stop": 4677,
"text": " of"
},
{
"start": 4677,
"stop": 4693,
"text": " the"
},
{
"start": 4693,
"stop": 4715,
"text": " word"
},
{
"start": 4715,
"stop": 4736,
"text": "."
}
]

Maybe I have a mistake in how I consume the segments.
Yes, you are getting tokens, but you need to get the segment text. Try to use this
Note that I said word segments; in general I already use get_segment_text there, in the else branch. Do I need to use get_segment_text even inside the loop over num_tokens?
My proposal was to use
So you don’t need to use tokens at all, only segments.

for s in 0..num_segments {
    // Read the text and timestamps of each segment directly; no token loop is needed.
    let text = state.full_get_segment_text_lossy(s).context("failed to get segment")?;
    let start = state.full_get_segment_t0(s).context("failed to get start timestamp")?;
    let stop = state.full_get_segment_t1(s).context("failed to get end timestamp")?;
    segments.push(Segment { text, start, stop });
}

If this doesn’t help, tomorrow I’ll give you an example of how to create words from tokens.
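For reference, the idea of building words from tokens can be sketched roughly as follows. This is only a minimal sketch and not part of the whisper-rs API: the `Segment` struct mirrors the `start`/`stop`/`text` fields of the JSON above, `merge_tokens_into_words` is a hypothetical helper, and it assumes that a token whose text starts with a space begins a new word while any other token (word pieces such as "ome", punctuation, "'s") continues the previous one. That assumption matches the English output in this thread but may not hold for every language.

```rust
/// One timed piece of transcript; field names mirror the JSON in this thread.
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub start: i64,
    pub stop: i64,
    pub text: String,
}

/// Merge token-level segments into word-level segments (hypothetical helper).
/// Assumption: a token starting with a space opens a new word; every other
/// token is appended to the word currently being built.
pub fn merge_tokens_into_words(tokens: &[Segment]) -> Vec<Segment> {
    let mut words: Vec<Segment> = Vec::new();
    for tok in tokens {
        if !tok.text.starts_with(' ') {
            if let Some(word) = words.last_mut() {
                // Continuation piece: extend the text and the end timestamp.
                word.text.push_str(&tok.text);
                word.stop = tok.stop;
                continue;
            }
        }
        // Either the token starts a new word or there is no word yet.
        words.push(tok.clone());
    }
    words
}
```

With the token output above, " wh", "ome", "ver" and "." would be merged into a single entry " whomever." spanning 122 to 200.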
It worked! Thank you so much :)
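Once every segment is a single word, a limit on words per sentence can be applied as a post-processing step. Below is a minimal sketch, assuming word-level segments with the same `start`/`stop`/`text` shape as the JSON above; `group_words` and its `max_words` parameter are hypothetical names and not part of whisper-rs.

```rust
/// One timed piece of transcript; field names mirror the JSON in this thread.
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub start: i64,
    pub stop: i64,
    pub text: String,
}

/// Group word-level segments into lines of at most `max_words` words
/// (hypothetical helper). Each line takes the start of its first word
/// and the stop of its last word.
pub fn group_words(words: &[Segment], max_words: usize) -> Vec<Segment> {
    // `chunks` panics on a size of 0, so treat 0 as 1.
    words
        .chunks(max_words.max(1))
        .map(|chunk| Segment {
            start: chunk[0].start,
            stop: chunk[chunk.len() - 1].stop,
            // Words carry their own leading space; drop only the first one.
            text: chunk
                .iter()
                .map(|w| w.text.as_str())
                .collect::<String>()
                .trim_start()
                .to_string(),
        })
        .collect()
}
```

This deliberately ignores sentence boundaries; breaking a line early when a word ends with "." or "?" would be a natural extension.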
I'm trying to enable set_max_tokens along with set_split_on_word to provide a way to set a maximum number of words per sentence, but when I set split_on_word to true and max_tokens to anything more than 0, the transcription finishes very fast but produces gibberish and only 2 sentences for a long audio file.

In the original whisper.cpp cli program it works as expected with max length per line.