Fix language detection #133

jkrukowski · 2024-05-02T15:38:47Z

While working on audio chunking, I noticed that language detection is sometimes off. The language can be detected as <|nocaptions|>, which may result in a whole 30s segment being discarded.

This PR fixes that by defaulting to "en" when the language detection is confused.

jkrukowski · 2024-05-02T15:39:50Z

Sources/WhisperKit/Core/TextDecoder.swift

+        return DecodingResult(
+            language: detectedLanguage,
+            languageProbs: [:],
+            tokens: [],
+            tokenLogProbs: [],
+            text: "",
+            avgLogProb: 0.0,
+            noSpeechProb: 0.0,
+            temperature: 0.0,
+            compressionRatio: 0.0,
+            cache: nil,
+            timings: timings,
+            fallback: nil
+        )


We could also change the interface of this function to return the detected (optional) language code instead of a DecodingResult.

This seems reasonable, can it also return the languageProbs here?

so detected language together with languageProbs instead of DecodingResult? will do!

Might be better to keep an empty DecodingResult because it needs timings as well

Just looked: I don't think the timings from this are being accounted for at all. It would be nice to merge them in this PR too while you're in the code.

WhisperKit/Sources/WhisperKit/Core/TranscribeTask.swift

Lines 273 to 319 in c20943d

if textDecoder.isModelMultilingual, options.language == nil, options.detectLanguage {
    let languageDecodingResult: DecodingResult? = try? await textDecoder.detectLanguage(
        from: encoderOutput,
        using: decoderInputs,
        sampler: tokenSampler,
        options: options,
        temperature: temp
    )

    // Update the language decoding options
    currentDecodingOptions.language = languageDecodingResult?.language
    detectedLanguage = languageDecodingResult?.language

    // Update prompt and KV Cache if needed
    if options.usePrefillPrompt {
        decoderInputs = try await textDecoder.prefillDecoderInputs(decoderInputs, withOptions: currentDecodingOptions)
    }

    Logging.debug("Prefill prompt updated to: \(decoderInputs.initialPrompt.map { tokenizer.convertIdToToken($0) ?? "" })")
}

decodingResult = try await textDecoder.decodeText(
    from: encoderOutput,
    using: decoderInputs,
    sampler: tokenSampler,
    options: currentDecodingOptions,
    callback: callback
)

// Use the predicted language if it was not detected ahead of time
if detectedLanguage == nil {
    detectedLanguage = decodingResult?.language
}

// Update timings from the decoder main loop
if let decodingTimings = decodingResult?.timings {
    if timings.firstTokenTime == 0 {
        timings.firstTokenTime = decodingTimings.firstTokenTime
    }
    timings.decodingPredictions += decodingTimings.decodingPredictions
    timings.totalDecodingLoops += decodingTimings.totalDecodingLoops
    timings.decodingNonPrediction += decodingTimings.decodingNonPrediction
    timings.decodingFiltering += decodingTimings.decodingFiltering
    timings.decodingSampling += decodingTimings.decodingSampling
    timings.decodingKvCaching += decodingTimings.decodingKvCaching
    timings.totalKVUpdateRuns += decodingTimings.totalKVUpdateRuns
}
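
One way to make the language-detection timings count as well would be to route both results through a single merge helper. The following is only a sketch: `SketchTimings` and `merge(_:)` are hypothetical names, the struct is a trimmed-down stand-in for the real timings type, and this is not necessarily how the PR ended up doing it.

```swift
// Hypothetical, reduced stand-in for the transcription timings type.
struct SketchTimings {
    var firstTokenTime: Double = 0
    var decodingPredictions: Double = 0
    var totalDecodingLoops: Double = 0

    // Accumulate another set of timings into this one.
    // firstTokenTime is only taken if it has not been set yet,
    // mirroring the `== 0` check in the snippet above.
    mutating func merge(_ other: SketchTimings) {
        if firstTokenTime == 0 {
            firstTokenTime = other.firstTokenTime
        }
        decodingPredictions += other.decodingPredictions
        totalDecodingLoops += other.totalDecodingLoops
    }
}

// Usage: merge the language-detection timings first, then the main loop's.
var total = SketchTimings()
let detection = SketchTimings(firstTokenTime: 1.0, decodingPredictions: 0.25, totalDecodingLoops: 1)
let decoding = SketchTimings(firstTokenTime: 5.0, decodingPredictions: 0.5, totalDecodingLoops: 4)
total.merge(detection)
total.merge(decoding)
```

With a helper like this, the call site would merge `languageDecodingResult?.timings` and `decodingResult?.timings` the same way, so neither step is silently dropped.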

OK, done, please take a look.

ZachNagengast

Nice, this looks great - well done 💯

ZachNagengast · 2024-05-04T18:26:03Z

Sources/WhisperKit/Core/TextDecoder.swift

+        } else {
+            detectedLanguage = Constants.defaultLanguageCode
+            Logging.error("Detected language \(sampledLanguage) is not supported, defaulting to \(Constants.defaultLanguageCode)")


Does this error happen often, or is it just a precaution?

ZachNagengast · 2024-05-04T18:26:58Z

Sources/WhisperKit/Core/Utils.swift

+    func trimmingSpecialTokenCharacters() -> String {
+        trimmingCharacters(in: Constants.specialTokenCharacters)
+    }
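
To illustrate how the two hunks above could fit together, here is a minimal sketch of the fallback path. Everything under `SketchConstants` is an assumption made for illustration (including the character set and the tiny list of supported languages); the actual values in WhisperKit's `Constants` may differ, and `resolveLanguage(fromSampledToken:)` is a hypothetical helper, not a function from the PR.

```swift
import Foundation

// Hypothetical stand-ins for WhisperKit's Constants; real values may differ.
enum SketchConstants {
    static let defaultLanguageCode = "en"
    // Assumed set of characters that wrap special tokens such as <|en|>.
    static let specialTokenCharacters = CharacterSet(charactersIn: "<|>")
    // Deliberately tiny subset of language codes, for illustration only.
    static let supportedLanguageCodes: Set<String> = ["en", "de", "es", "pl"]
}

extension String {
    // Strips the wrapping special-token characters from both ends.
    func trimmingSpecialTokenCharactersSketch() -> String {
        trimmingCharacters(in: SketchConstants.specialTokenCharacters)
    }
}

// Maps a sampled token to a language code, falling back to the default
// when the token is not a supported language (e.g. <|nocaptions|>).
func resolveLanguage(fromSampledToken token: String) -> String {
    let candidate = token.trimmingSpecialTokenCharactersSketch()
    if SketchConstants.supportedLanguageCodes.contains(candidate) {
        return candidate
    }
    return SketchConstants.defaultLanguageCode
}

let detected = resolveLanguage(fromSampledToken: "<|de|>")
let fallback = resolveLanguage(fromSampledToken: "<|nocaptions|>")
```

Under these assumptions, a token like `<|nocaptions|>` resolves to the default code instead of causing the segment to be discarded.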


Fix language detection

fafebd5

jkrukowski added 3 commits May 2, 2024 17:57

fix

8d50098

review changes

f5d65aa

fix

7c1f8fa

ZachNagengast approved these changes May 4, 2024

ZachNagengast merged commit d6f50da into argmaxinc:main May 4, 2024
15 checks passed
