Memory and Latency Regression Tests #99
We also need a scheme to detect regressions for these tests on a chip-by-chip basis. We can store the baseline data directly in the test resources or host it in a Hugging Face repo. Files should be named something like this:

All of these should be at or below 1.05x of the baseline data, i.e. a given regression test can be up to 5% slower than the baseline to account for hardware fluctuations.
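A minimal Swift sketch of the 1.05x rule, assuming each baseline is a single number where lower is better (latency in seconds or peak memory in MB). `RegressionResult` and `checkRegression` are hypothetical names, not part of WhisperKit, and how baselines are stored and looked up per chip is left open:

```swift
// Hypothetical result type for one metric comparison.
struct RegressionResult {
    let metric: String
    let baseline: Double
    let measured: Double
    let ratio: Double
    let passed: Bool
}

/// Compares a measured value against its stored baseline.
/// Lower is better for both latency and memory, so a measurement passes
/// when it is at most `tolerance` times the baseline (1.05 = up to 5% worse).
func checkRegression(metric: String,
                     measured: Double,
                     baseline: Double,
                     tolerance: Double = 1.05) -> RegressionResult {
    precondition(baseline > 0, "baseline must be positive")
    let ratio = measured / baseline
    return RegressionResult(metric: metric,
                            baseline: baseline,
                            measured: measured,
                            ratio: ratio,
                            passed: ratio <= tolerance)
}

let latency = checkRegression(metric: "latency", measured: 1.04, baseline: 1.0)
print(latency.passed) // prints "true"
```

Keeping the ratio in the result, not just the pass/fail flag, makes it easy to attach the numbers to the test output when a check fails.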
We don't yet have code to calculate WER in Swift, but in Python we use this metric: https://huggingface.co/spaces/evaluate-metric/wer

We should also have a test that runs all available models from the whisper-coreml repo, similar to
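Until a Swift WER implementation exists, here is a rough sketch of the standard definition, WER = (S + D + I) / N, computed as a word-level edit distance divided by the number of reference words. `wordErrorRate(reference:hypothesis:)` is a hypothetical helper; it splits on whitespace and applies no text normalization, so its scores may differ from the Python metric on unnormalized text:

```swift
/// Word error rate for one reference/hypothesis pair:
/// (substitutions + deletions + insertions) / reference word count,
/// computed as a word-level Levenshtein distance.
func wordErrorRate(reference: String, hypothesis: String) -> Double {
    // Tokenize on whitespace; no casing or punctuation normalization.
    let ref = reference.split(whereSeparator: { $0.isWhitespace }).map(String.init)
    let hyp = hypothesis.split(whereSeparator: { $0.isWhitespace }).map(String.init)
    precondition(!ref.isEmpty, "reference must contain at least one word")

    // DP row for the reference prefix processed so far.
    // Row 0: matching an empty reference costs j insertions.
    var previous = Array(0...hyp.count)
    for i in 1...ref.count {
        var current = [Int](repeating: 0, count: hyp.count + 1)
        current[0] = i  // i deletions against an empty hypothesis
        for j in stride(from: 1, through: hyp.count, by: 1) {
            let cost = ref[i - 1] == hyp[j - 1] ? 0 : 1
            current[j] = min(previous[j] + 1,         // deletion
                             current[j - 1] + 1,      // insertion
                             previous[j - 1] + cost)  // substitution or match
        }
        previous = current
    }
    return Double(previous[hyp.count]) / Double(ref.count)
}
```

As far as we can tell, the Python metric linked above reports a corpus-level score (total errors across all pairs divided by total reference words) rather than an average of per-utterance rates, so a Swift port would likely need to accumulate those two counts separately.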
Should resolve #61.
```swift
func testLargeV2949PerformanceOverTime() async throws {
    try await testAndMeasureModelPerformance(model: "large-v2_949")
}
```
These are all subject to change over time, unfortunately, so these tests should instead search the full HF repo for all available models and run against each of them. You can use this method to find all available models:
`WhisperKit/Sources/WhisperKit/Core/WhisperKit.swift`, lines 131 to 136 in 8564ce2:
```swift
public static func fetchAvailableModels(from repo: String = "argmaxinc/whisperkit-coreml", matching: [String] = ["openai_*", "distil-whisper_*"]) async throws -> [String] {
    let hubApi = HubApi()
    let modelFiles = try await hubApi.getFilenames(from: repo, matching: matching)
    return formatModelFiles(modelFiles)
}
```
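One way to structure the "all models" test is sketched below, assuming failures should be collected per model rather than aborting on the first error. To keep it self-contained, the model list and the measurement are passed in; in the actual test the list would come from `fetchAvailableModels()` and the closure would wrap `testAndMeasureModelPerformance(model:)`, both of which are `async`, so the real loop would be `async throws` as well. `runForAllModels` is a hypothetical helper:

```swift
/// Hypothetical driver: runs `measure` once per model name and returns
/// the models that failed, keyed by name, instead of stopping at the first error.
func runForAllModels(_ models: [String],
                     measure: (String) throws -> Void) -> [String: Error] {
    var failures: [String: Error] = [:]
    for model in models {
        do {
            try measure(model)
        } catch {
            // Record the failure and keep going so one run reports every regression.
            failures[model] = error
        }
    }
    return failures
}
```

Reporting every failing model in a single run makes it easier to tell whether a regression is specific to one model or affects the whole set.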
Updated the code to use this function, Zach.
* make `getMemoryUsed` static
* remove `jfk_long.mp4` as it's unused
* update dataset URL to point to WhisperKit
* dynamically test all models available on the hub
You'll need to install xcparse via
Note: the xcparse command above outputs attachments as files in the current directory.
Looks great! And low risk because it's all new code. It just needs a bit of linting, which I can run for you.
Can you give a bit of guidance on how to test this? I.e. what commands to run and what the expected output should be?
Co-authored-by: Zach Nagengast <[email protected]>
Do let me know how I can improve the linting! As for running it, the commands above should work; if not, you can run the tests manually from Xcode and view the attachments in the Xcode test results.
Will have linting rules set up soon; until then, this is good to merge 👍
Checklist:
- [ ] Add WER calculations in Swift. Moved out of this PR; planned as part of the EN Normalization PR.

`Utils.swift`
`testOutputAll`
See Colab for plots.