Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed tokenizer and audio processing logic #214

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

1amageek
Copy link
Contributor

@1amageek 1amageek commented Oct 3, 2024

This PR addresses the following changes:

  • Updated the tokenizer logic in WhisperTokenizerWrapper.
  • Fixed the startIndex validation in AudioChunker.swift and AudioProcessor.swift.
  • Corrected the embedSize and sequenceLength calculations in AudioEncoder.swift.
  • Added missing implementations for applyChatTemplate and encode methods.
  • Updated dependencies in Package.swift.

@@ -22,16 +22,14 @@ public class AudioEncoder: AudioEncoding, WhisperMLModel {
guard let inputDescription = model?.modelDescription.outputDescriptionsByName["encoder_output_embeds"] else { return nil }
guard inputDescription.type == .multiArray else { return nil }
guard let shapeConstraint = inputDescription.multiArrayConstraint else { return nil }
let shape = shapeConstraint.shape.map { $0.intValue }
return shape[1]
return shapeConstraint.shape[0].intValue
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shape[0]: Batch size
shape[1]: Sequence length
shape[2]: Embedding dimension

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ZachNagengast
Copy link
Contributor

Hi @1amageek, I'm happy to see your enthusiasm to contribute to WhisperKit! I'd like to make a few recommendations to help guide your work in the future, since this PR has grown quite large.

  1. Your initial changes identified a bug in the startIndex checking code, which was great to see! I've adjusted it slightly and added you as a co-author in this commit 47035a8 (#216)
  2. I'd highly recommend running the test locally in Xcode to make sure they pass. I've been periodically running the CI in this PR and they have been failing, which is a required check for merging.
  3. I'd also recommend breaking up this PR into smaller more focused PRs that fix a specific issue or add a single new feature.
  4. Try to make a best effort to conform to the existing code style. This includes keeping any existing documentation in place, using english language for new documentation, and including the common file header when adding new files.

With this in mind, I would recommend closing this PR and separating out the changes that add new features or fix existing bugs into separate PRs with details on why they are necessary and solve an open issue listed in https://github.com/argmaxinc/WhisperKit/issues. Thanks again for your contributions so far, open to any discussion on these topics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants