
Memory and Latency Regression Tests #99

Merged
merged 11 commits on Apr 21, 2024

Conversation

Abhinay1997
Contributor

@Abhinay1997 Abhinay1997 commented Apr 2, 2024

  • Log memory during regression test
  • Keep track of latency during regression test
  • Summarize measurements and save them every N (=100) samples
  • Dump measurements to a JSON file for easy visualization.
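
The measurement flow above could be sketched roughly as follows. This is an illustration only; the type names `MeasurementSummary` and `MeasurementLogger` are hypothetical and not from this PR.

```swift
import Foundation

/// Hypothetical summary record, serialized to JSON for plotting.
struct MeasurementSummary: Codable {
    let sampleCount: Int
    let averageLatencyMs: Double
    let peakMemoryMB: Double
}

/// Hypothetical logger: records latency/memory per sample and
/// summarizes every N (=100 by default) samples.
struct MeasurementLogger {
    private(set) var latencies: [Double] = []
    private(set) var memories: [Double] = []
    private(set) var summaries: [MeasurementSummary] = []
    let summarizeEvery: Int

    init(summarizeEvery: Int = 100) {
        self.summarizeEvery = summarizeEvery
    }

    mutating func record(latencyMs: Double, memoryMB: Double) {
        latencies.append(latencyMs)
        memories.append(memoryMB)
        if latencies.count % summarizeEvery == 0 {
            summaries.append(MeasurementSummary(
                sampleCount: latencies.count,
                averageLatencyMs: latencies.reduce(0, +) / Double(latencies.count),
                peakMemoryMB: memories.max() ?? 0
            ))
        }
    }

    /// Dump all summaries to JSON for easy visualization.
    func dumpJSON() throws -> Data {
        try JSONEncoder().encode(summaries)
    }
}
```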

Checklist:

  • code clean up
  • add more stats for latency tests
    - [ ] add WER calculations in Swift (moved out of this PR; planned as part of the EN Normalization PR)
  • get processor info and add it to test output. See Utils.swift
  • use a longer audio file (> 1 hr) for tests. Using earnings22
  • test dynamically on all models similar to testOutputAll

See the Colab for plots

@Abhinay1997 Abhinay1997 changed the title Add initial code for regression tests Memory and Latency Regression Tests Apr 2, 2024
@ZachNagengast
Contributor

ZachNagengast commented Apr 2, 2024

We also need a scheme to detect regressions for these tests on a chip-by-chip basis. We can store the data inside the test resources directly, or host them in a huggingface repo.

Files should be named something like this:
Resources/Fixtures/RegressionTests/\(Process.processor)/\(modelName)/\(date).json

All of these should be at or below 1.05x of the baseline data. I.e. a given regression test can be up to 5% slower than the baseline to account for various hardware fluctuations.

  • Word error rate
  • Peak memory
  • Tokens per second
  • Full pipeline (for model loading times)
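
That 1.05x gate could be sketched like this. The `RegressionGate` type is hypothetical, not code from this PR; note that for a higher-is-better metric such as tokens per second the comparison would need to be inverted.

```swift
import Foundation

/// Hypothetical regression gate: a metric regresses if it exceeds
/// 1.05x its stored baseline (assuming lower is better).
struct RegressionGate {
    static let tolerance = 1.05

    /// Returns the names of metrics that exceed tolerance vs. baseline.
    static func regressions(current: [String: Double],
                            baseline: [String: Double]) -> [String] {
        current.compactMap { name, value in
            guard let base = baseline[name], base > 0 else { return nil }
            return value > base * tolerance ? name : nil
        }.sorted()
    }
}
```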

We don't yet have code to calculate WER in Swift, but in Python we use this metric: https://huggingface.co/spaces/evaluate-metric/wer
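
A minimal Swift port of that metric could look like the sketch below: word-level Levenshtein edit distance divided by the reference length. This is an illustration only, not the implementation that later landed.

```swift
import Foundation

/// Minimal word error rate: Levenshtein edit distance over word
/// tokens, divided by the number of reference words.
func wordErrorRate(reference: String, hypothesis: String) -> Double {
    let ref = reference.lowercased().split(separator: " ").map(String.init)
    let hyp = hypothesis.lowercased().split(separator: " ").map(String.init)
    if ref.isEmpty { return hyp.isEmpty ? 0 : 1 }
    if hyp.isEmpty { return 1 }

    // Classic dynamic-programming edit distance, row by row.
    var prev = Array(0...hyp.count)
    for i in 1...ref.count {
        var curr = [i] + Array(repeating: 0, count: hyp.count)
        for j in 1...hyp.count {
            let cost = ref[i - 1] == hyp[j - 1] ? 0 : 1
            curr[j] = min(prev[j] + 1,         // deletion
                          curr[j - 1] + 1,     // insertion
                          prev[j - 1] + cost)  // substitution
        }
        prev = curr
    }
    return Double(prev[hyp.count]) / Double(ref.count)
}
```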

Should also have a test that runs all available models from the whisper-coreml repo, similar to testOutputAll. However, I wouldn't expect this to run on the GitHub runners due to their limited resources; it will just be run manually before releases.

@Abhinay1997
Contributor Author

  • WER calculations are detailed here; we'll have to implement this in Swift.
  • Chip/Processor info is possible to obtain. Shouldn't be an issue. I'll add that.
  • Understood on the file name! Let me do that.
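
Obtaining the chip identifier could look roughly like this sketch. The `processorName` helper is hypothetical, not the Utils.swift implementation: on Apple platforms it reads the sysctl CPU brand string, and elsewhere it falls back to a placeholder.

```swift
import Foundation
#if canImport(Darwin)
import Darwin
#endif

/// Hypothetical helper: returns a processor identifier string.
/// On Apple platforms this reads the sysctl CPU brand string
/// (e.g. "Apple M1"); elsewhere it returns a placeholder.
func processorName() -> String {
    #if canImport(Darwin)
    // First call asks for the required buffer size, second fills it.
    var size = 0
    sysctlbyname("machdep.cpu.brand_string", nil, &size, nil, 0)
    if size > 0 {
        var buffer = [CChar](repeating: 0, count: size)
        sysctlbyname("machdep.cpu.brand_string", &buffer, &size, nil, 0)
        return String(cString: buffer)
    }
    #endif
    return "unknown"
}
```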

@atiorh
Contributor

atiorh commented Apr 5, 2024

Should resolve #61

@ZachNagengast ZachNagengast linked an issue Apr 12, 2024 that may be closed by this pull request
}

func testLargeV2949PerformanceOverTime() async throws {
    try await testAndMeasureModelPerformance(model: "large-v2_949")
Contributor

These are all subject to change over time unfortunately, so these tests should actually search the full HF repo for all models available in there in order to run these tests. You can use this method to find all available models:

public static func fetchAvailableModels(from repo: String = "argmaxinc/whisperkit-coreml", matching: [String] = ["openai_*", "distil-whisper_*"]) async throws -> [String] {
    let hubApi = HubApi()
    let modelFiles = try await hubApi.getFilenames(from: repo, matching: matching)
    return formatModelFiles(modelFiles)
}

Contributor Author

Updated the code to use this function, Zach.

* make `getMemoryUsed` static
* remove `jfk_long.mp4` as it's unused
* update dataset url to point to whisperkit
* dynamically test all models available on the hub
@Abhinay1997
Contributor Author

Abhinay1997 commented Apr 16, 2024

You'll need to install xcparse via `brew install chargepoint/xcparse/xcparse`

xcodebuild clean build-for-testing -scheme whisperkit-Package -destination generic/platform=macOS | xcpretty
xcodebuild test -only-testing WhisperKitTests/RegressionTests -scheme whisperkit-Package -destination "platform=macOS,arch=arm64" -resultBundlePath ~/Downloads
xcparse attachments ~/Downloads/<latest_xc_result_file>.xcresult

Note: the xcparse command above will output the attachments as files in the current directory.

Contributor

@ZachNagengast ZachNagengast left a comment

Looks great! And low risk because it's all new code. It just needs a bit of linting, which I can run for you.

Can you give a bit of guidance on how to test this? I.e. what commands to run and what the expected output should be?

Tests/WhisperKitTests/FunctionalTests.swift (review comment, outdated and resolved)
@Abhinay1997
Contributor Author

Do let me know how I can improve on the linting!

As for running it, the commands above should work. If not, you can run it manually from Xcode and see the test attachments in the Xcode test results.

@Abhinay1997 Abhinay1997 mentioned this pull request Apr 20, 2024
@ZachNagengast
Contributor

Do let me know how I can improve on the linting!

Will have linting rules setup soon, until then this is good to merge 👍

@ZachNagengast ZachNagengast merged commit d3a9a99 into argmaxinc:main Apr 21, 2024
15 checks passed

Successfully merging this pull request may close these issues.

Implement memory and latency regression tests
3 participants