Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix language detection #133

Merged
merged 4 commits into from
May 4, 2024

Conversation

jkrukowski
Copy link
Contributor

@jkrukowski jkrukowski commented May 2, 2024

While working on audio chunking I noticed that sometimes language detection is off. It can happen that language is detected as <|nocaptions|> which might result in a whole 30s segment being discarded.

Screenshot 2024-05-02 at 17 34 45

This PR fixes that by defaulting to "en" when the language detection is confused.

Comment on lines 482 to 495
return DecodingResult(
language: detectedLanguage,
languageProbs: [:],
tokens: [],
tokenLogProbs: [],
text: "",
avgLogProb: 0.0,
noSpeechProb: 0.0,
temperature: 0.0,
compressionRatio: 0.0,
cache: nil,
timings: timings,
fallback: nil
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could as well change the interface of this function and return the detected (optional) language code instead of DecodingResult

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable, can it also return the languageProbs here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable, can it also return the languageProbs here?

so detected language together with languageProbs instead of DecodingResult? will do!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to keep an empty DecodingResult because it needs timings as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just looked, I dont think the timings are being accounted for with this at all, would be nice to merge these in this PR too while you're in the code

if textDecoder.isModelMultilingual, options.language == nil, options.detectLanguage {
let languageDecodingResult: DecodingResult? = try? await textDecoder.detectLanguage(
from: encoderOutput,
using: decoderInputs,
sampler: tokenSampler,
options: options,
temperature: temp
)
// Update the language decoding options
currentDecodingOptions.language = languageDecodingResult?.language
detectedLanguage = languageDecodingResult?.language
// Update prompt and KV Cache if needed
if options.usePrefillPrompt {
decoderInputs = try await textDecoder.prefillDecoderInputs(decoderInputs, withOptions: currentDecodingOptions)
}
Logging.debug("Prefill prompt updated to: \(decoderInputs.initialPrompt.map { tokenizer.convertIdToToken($0) ?? "" })")
}
decodingResult = try await textDecoder.decodeText(
from: encoderOutput,
using: decoderInputs,
sampler: tokenSampler,
options: currentDecodingOptions,
callback: callback
)
// Use the predicted language if it was not detected ahead of time
if detectedLanguage == nil {
detectedLanguage = decodingResult?.language
}
// Update timings from the decoder main loop
if let decodingTimings = decodingResult?.timings {
if timings.firstTokenTime == 0 {
timings.firstTokenTime = decodingTimings.firstTokenTime
}
timings.decodingPredictions += decodingTimings.decodingPredictions
timings.totalDecodingLoops += decodingTimings.totalDecodingLoops
timings.decodingNonPrediction += decodingTimings.decodingNonPrediction
timings.decodingFiltering += decodingTimings.decodingFiltering
timings.decodingSampling += decodingTimings.decodingSampling
timings.decodingKvCaching += decodingTimings.decodingKvCaching
timings.totalKVUpdateRuns += decodingTimings.totalKVUpdateRuns
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, done, please take look

Copy link
Contributor

@ZachNagengast ZachNagengast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this look great - well done 💯

Comment on lines +487 to +489
} else {
detectedLanguage = Constants.defaultLanguageCode
Logging.error("Detected language \(sampledLanguage) is not supported, defaulting to \(Constants.defaultLanguageCode)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this error happen often or just a precaution?

Comment on lines +150 to +152
func trimmingSpecialTokenCharacters() -> String {
trimmingCharacters(in: Constants.specialTokenCharacters)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@ZachNagengast ZachNagengast merged commit d6f50da into argmaxinc:main May 4, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants