
Memory and Latency Regression Tests #99

Merged
merged 11 commits on Apr 21, 2024

Conversation

Abhinay1997
Contributor

@Abhinay1997 Abhinay1997 commented Apr 2, 2024

  • Log memory during regression test
  • Keep track of latency during regression test
  • Summarize measurements and save them every N (=100) samples
  • Dump measurements to a JSON file for easy visualization.
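
The measurement flow above could be sketched roughly as follows. This is an illustration only; the type names `MeasurementSummary` and `MeasurementLogger` are hypothetical and not from this PR.

```swift
import Foundation

/// Hypothetical summary record, serialized to JSON for plotting.
struct MeasurementSummary: Codable {
    let sampleCount: Int
    let averageLatencyMs: Double
    let peakMemoryMB: Double
}

/// Hypothetical logger: records latency/memory per sample and
/// summarizes every N (=100 by default) samples.
struct MeasurementLogger {
    private(set) var latencies: [Double] = []
    private(set) var memories: [Double] = []
    private(set) var summaries: [MeasurementSummary] = []
    let summarizeEvery: Int

    init(summarizeEvery: Int = 100) {
        self.summarizeEvery = summarizeEvery
    }

    mutating func record(latencyMs: Double, memoryMB: Double) {
        latencies.append(latencyMs)
        memories.append(memoryMB)
        if latencies.count % summarizeEvery == 0 {
            summaries.append(MeasurementSummary(
                sampleCount: latencies.count,
                averageLatencyMs: latencies.reduce(0, +) / Double(latencies.count),
                peakMemoryMB: memories.max() ?? 0
            ))
        }
    }

    /// Dump all summaries to JSON for easy visualization.
    func dumpJSON() throws -> Data {
        try JSONEncoder().encode(summaries)
    }
}
```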

Checklist:

  • code clean up
  • add more stats for latency tests
    - [ ] add WER calculations in Swift (moved out of this PR; planned as part of the EN Normalization PR)
  • get processor info and add it to test output. See Utils.swift
  • use a longer audio file (> 1 hr) for tests. Using earnings22
  • test dynamically on all models similar to testOutputAll

See the Colab for plots

@Abhinay1997 Abhinay1997 changed the title Add initial code for regression tests Memory and Latency Regression Tests Apr 2, 2024
@ZachNagengast
Contributor

ZachNagengast commented Apr 2, 2024

We also need a scheme to detect regressions for these tests on a chip-by-chip basis. We can store the data inside the test resources directly, or host them in a huggingface repo.

Files should be named something like this:
Resources/Fixtures/RegressionTests/\(Process.processor)/\(modelName)/\(date).json

All of these should be at or below 1.05x of the baseline data. I.e. a given regression test can be up to 5% slower than the baseline to account for various hardware fluctuations.

  • Word error rate
  • Peak memory
  • Tokens per second
  • Full pipeline (for model loading times)
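
That 1.05x gate could be sketched like this. The `RegressionGate` type is hypothetical, not code from this PR; note that for a higher-is-better metric such as tokens per second the comparison would need to be inverted.

```swift
import Foundation

/// Hypothetical regression gate: a metric regresses if it exceeds
/// 1.05x its stored baseline (assuming lower is better).
struct RegressionGate {
    static let tolerance = 1.05

    /// Returns the names of metrics that exceed tolerance vs. baseline.
    static func regressions(current: [String: Double],
                            baseline: [String: Double]) -> [String] {
        current.compactMap { name, value in
            guard let base = baseline[name], base > 0 else { return nil }
            return value > base * tolerance ? name : nil
        }.sorted()
    }
}
```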

We don't yet have code to calculate WER in Swift, but in Python we use this metric: https://huggingface.co/spaces/evaluate-metric/wer
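
A minimal Swift port of that metric could look like the sketch below: word-level Levenshtein edit distance divided by the reference length. This is an illustration only, not the implementation that later landed.

```swift
import Foundation

/// Minimal word error rate: Levenshtein edit distance over word
/// tokens, divided by the number of reference words.
func wordErrorRate(reference: String, hypothesis: String) -> Double {
    let ref = reference.lowercased().split(separator: " ").map(String.init)
    let hyp = hypothesis.lowercased().split(separator: " ").map(String.init)
    if ref.isEmpty { return hyp.isEmpty ? 0 : 1 }
    if hyp.isEmpty { return 1 }

    // Classic dynamic-programming edit distance, row by row.
    var prev = Array(0...hyp.count)
    for i in 1...ref.count {
        var curr = [i] + Array(repeating: 0, count: hyp.count)
        for j in 1...hyp.count {
            let cost = ref[i - 1] == hyp[j - 1] ? 0 : 1
            curr[j] = min(prev[j] + 1,         // deletion
                          curr[j - 1] + 1,     // insertion
                          prev[j - 1] + cost)  // substitution
        }
        prev = curr
    }
    return Double(prev[hyp.count]) / Double(ref.count)
}
```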

Should also have a test that runs all available models from the whisper-coreml repo, similar to testOutputAll. However, I wouldn't expect this to run on the GitHub runners due to their limited resources; it will just be run manually before releases.

@Abhinay1997
Contributor Author

  • WER calculations are detailed here; we'll have to implement this in Swift.
  • Chip/Processor info is possible to obtain. Shouldn't be an issue. I'll add that.
  • Understood on the file name! Let me do that.
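
Obtaining the chip identifier could look roughly like this sketch. The `processorName` helper is hypothetical, not the Utils.swift implementation: on Apple platforms it reads the sysctl CPU brand string, and elsewhere it falls back to a placeholder.

```swift
import Foundation
#if canImport(Darwin)
import Darwin
#endif

/// Hypothetical helper: returns a processor identifier string.
/// On Apple platforms this reads the sysctl CPU brand string
/// (e.g. "Apple M1"); elsewhere it returns a placeholder.
func processorName() -> String {
    #if canImport(Darwin)
    // First call asks for the required buffer size, second fills it.
    var size = 0
    sysctlbyname("machdep.cpu.brand_string", nil, &size, nil, 0)
    if size > 0 {
        var buffer = [CChar](repeating: 0, count: size)
        sysctlbyname("machdep.cpu.brand_string", &buffer, &size, nil, 0)
        return String(cString: buffer)
    }
    #endif
    return "unknown"
}
```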

@atiorh
Contributor

atiorh commented Apr 5, 2024

Should resolve #61

@ZachNagengast ZachNagengast linked an issue Apr 12, 2024 that may be closed by this pull request
}

func testLargeV2949PerformanceOverTime() async throws {
    try await testAndMeasureModelPerformance(model: "large-v2_949")
Contributor

These are all subject to change over time unfortunately, so these tests should actually search the full HF repo for all models available in there in order to run these tests. You can use this method to find all available models:

public static func fetchAvailableModels(from repo: String = "argmaxinc/whisperkit-coreml", matching: [String] = ["openai_*", "distil-whisper_*"]) async throws -> [String] {
    let hubApi = HubApi()
    let modelFiles = try await hubApi.getFilenames(from: repo, matching: matching)
    return formatModelFiles(modelFiles)
}

Contributor Author

Updated the code to use this function, Zach.

* make `getMemoryUsed` static
* remove `jfk_long.mp4` as it's unused
* update dataset url to point to whisperkit
* dynamically test all models available on the hub
@Abhinay1997
Contributor Author

Abhinay1997 commented Apr 16, 2024

You'll need to install xcparse via `brew install chargepoint/xcparse/xcparse`

xcodebuild clean build-for-testing -scheme whisperkit-Package -destination generic/platform=macOS | xcpretty
xcodebuild test -only-testing WhisperKitTests/RegressionTests -scheme whisperkit-Package -destination "platform=macOS,arch=arm64" -resultBundlePath ~/Downloads
xcparse attachments ~/Downloads/<latest_xc_result_file>.xcresult

Note: the xcparse command above will output the attachments as files in the current directory.

Contributor

@ZachNagengast ZachNagengast left a comment

Looks great! And low risk because it's all new code. It just needs a bit of linting, which I can run for you.

Can you give a bit of guidance on how to test this? I.e. what commands to run and what the expected output should be?

Tests/WhisperKitTests/FunctionalTests.swift (review comment, outdated and resolved)
@Abhinay1997
Contributor Author

Do let me know how I can improve on the linting!

As for running it, the commands above should work. If not, you can run it manually from Xcode and see the test attachments in the Xcode test results.

@Abhinay1997 Abhinay1997 mentioned this pull request Apr 20, 2024
@ZachNagengast
Contributor

Do let me know how I can improve on the linting!

Will have linting rules setup soon, until then this is good to merge 👍

@ZachNagengast ZachNagengast merged commit d3a9a99 into argmaxinc:main Apr 21, 2024
15 checks passed

Successfully merging this pull request may close these issues.

Implement memory and latency regression tests
3 participants