Commit f16e681

Update docs on LLM runner Apple API
Updated the token generation API docs and added instructions for resetting and for using multimodal inputs.
1 parent 9152f0a commit f16e681

File tree

1 file changed (+146, -8 lines)
docs/source/llm/run-on-ios.md

Lines changed: 146 additions & 8 deletions
@@ -80,17 +80,22 @@ do {
 
 #### Generating
 
-Generate up to a given number of tokens from an initial prompt. The callback block is invoked once per token as it’s produced.
+Generate tokens from an initial prompt, configured with an `ExecuTorchLLMConfig` object. The callback block is invoked once per token as it’s produced.
 
 Objective-C:
 ```objectivec
+ExecuTorchLLMConfig *config = [[ExecuTorchLLMConfig alloc] initWithBlock:^(ExecuTorchLLMConfig *c) {
+  c.temperature = 0.8;
+  c.sequenceLength = 2048;
+}];
+
 NSError *error = nil;
-BOOL success = [runner generate:@"Once upon a time"
-                 sequenceLength:50
-              withTokenCallback:^(NSString *token) {
-  NSLog(@"Generated token: %@", token);
-}
-                          error:&error];
+BOOL success = [runner generateWithPrompt:@"Once upon a time"
+                                   config:config
+                            tokenCallback:^(NSString *token) {
+  NSLog(@"Generated token: %@", token);
+}
+                                    error:&error];
 if (!success) {
   NSLog(@"Generation failed: %@", error);
 }
@@ -99,7 +104,10 @@ if (!success) {
 Swift:
 ```swift
 do {
-  try runner.generate("Once upon a time", sequenceLength: 50) { token in
+  try runner.generate("Once upon a time", Config {
+    $0.temperature = 0.8
+    $0.sequenceLength = 2048
+  }) { token in
     print("Generated token:", token)
   }
 } catch {
@@ -121,6 +129,136 @@ Swift:
 runner.stop()
 ```
 
+#### Resetting
+
+To clear the prefilled tokens from the KV cache and reset generation stats, call:
+
+Objective-C:
+```objectivec
+[runner reset];
+```
+
+Swift:
+```swift
+runner.reset()
+```
+
+### MultimodalRunner
+
+The `ExecuTorchLLMMultimodalRunner` class (bridged to Swift as `MultimodalRunner`) provides an interface for loading and running multimodal models that can accept a sequence of text, image, and audio inputs.
+
+#### Multimodal Inputs
+
+Inputs are provided as an array of `ExecuTorchLLMMultimodalInput` (or `MultimodalInput` in Swift). You can create inputs from a text string, an `ExecuTorchLLMImage` (`Image` in Swift) for images, or an `ExecuTorchLLMAudio` (`Audio` in Swift) for audio features.
+
+Objective-C:
+```objectivec
+ExecuTorchLLMMultimodalInput *textInput = [ExecuTorchLLMMultimodalInput inputWithText:@"What's in this image?"];
+
+NSData *imageData = ...; // Your raw image bytes
+ExecuTorchLLMImage *image = [[ExecuTorchLLMImage alloc] initWithData:imageData width:336 height:336 channels:3];
+ExecuTorchLLMMultimodalInput *imageInput = [ExecuTorchLLMMultimodalInput inputWithImage:image];
+```
+
+Swift:
+```swift
+let textInput = MultimodalInput("What's in this image?")
+
+let imageData: Data = ... // Your raw image bytes
+let image = Image(data: imageData, width: 336, height: 336, channels: 3)
+let imageInput = MultimodalInput(image)
+
+let audioFeatureData: Data = ... // Your raw audio feature bytes
+let audio = Audio(float: audioFeatureData, batchSize: 1, bins: 128, frames: 3000)
+let audioInput = MultimodalInput(audio)
+```
+
+#### Initialization
+
+Create a runner by specifying the paths to your multimodal model and its tokenizer.
+
+Objective-C:
+```objectivec
+NSString *modelPath = [[NSBundle mainBundle] pathForResource:@"llava" ofType:@"pte"];
+NSString *tokenizerPath = [[NSBundle mainBundle] pathForResource:@"llava_tokenizer" ofType:@"bin"];
+
+ExecuTorchLLMMultimodalRunner *runner = [[ExecuTorchLLMMultimodalRunner alloc] initWithModelPath:modelPath
+                                                                                    tokenizerPath:tokenizerPath];
+```
+
+Swift:
+```swift
+let modelPath = Bundle.main.path(forResource: "llava", ofType: "pte")!
+let tokenizerPath = Bundle.main.path(forResource: "llava_tokenizer", ofType: "bin")!
+
+let runner = MultimodalRunner(modelPath: modelPath, tokenizerPath: tokenizerPath)
+```
+
+#### Loading
+
+Explicitly load the model before generation.
+
+Objective-C:
+```objectivec
+NSError *error = nil;
+BOOL success = [runner loadWithError:&error];
+if (!success) {
+  NSLog(@"Failed to load: %@", error);
+}
+```
+
+Swift:
+```swift
+do {
+  try runner.load()
+} catch {
+  print("Failed to load: \(error)")
+}
+```
+
+#### Generating
+
+Generate tokens from an ordered array of multimodal inputs.
+
+Objective-C:
+```objectivec
+NSArray<ExecuTorchLLMMultimodalInput *> *inputs = @[textInput, imageInput];
+
+ExecuTorchLLMConfig *config = [[ExecuTorchLLMConfig alloc] initWithBlock:^(ExecuTorchLLMConfig *c) {
+  c.sequenceLength = 768;
+}];
+
+NSError *error = nil;
+BOOL success = [runner generateWithInputs:inputs
+                                   config:config
+                            tokenCallback:^(NSString *token) {
+  NSLog(@"Generated token: %@", token);
+}
+                                    error:&error];
+if (!success) {
+  NSLog(@"Generation failed: %@", error);
+}
+```
+
+Swift:
+```swift
+let inputs = [textInput, imageInput]
+
+do {
+  try runner.generate(inputs, Config {
+    $0.sequenceLength = 768
+  }) { token in
+    print("Generated token:", token)
+  }
+} catch {
+  print("Generation failed:", error)
+}
+```
+
+#### Stopping and Resetting
+
+The `stop` and `reset` methods for `MultimodalRunner` behave identically to those on `TextRunner`.
+
 ## Demo
 
 Get hands-on with our [etLLM iOS Demo App](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/apple) to see the LLM runtime APIs in action.
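
---

Putting the added multimodal docs together, here is a minimal end-to-end Swift sketch based on the APIs shown in the diff above (`MultimodalRunner`, `MultimodalInput`, `Image`, `Config`). It also exercises `reset()`, which the new "Stopping and Resetting" section says behaves identically to `TextRunner`'s. The `ExecuTorchLLM` module name and the bundle resource names are assumptions; adjust them to your project.

```swift
import Foundation
import ExecuTorchLLM  // assumed module name; adjust to how your project links the runner

// Sketch of a full MultimodalRunner session using the APIs documented above:
// build inputs, load, generate with a Config, then reset before a new conversation.
func describeImage(_ imageData: Data) {
  // Hypothetical bundle resources, mirroring the llava example in the docs.
  guard let modelPath = Bundle.main.path(forResource: "llava", ofType: "pte"),
        let tokenizerPath = Bundle.main.path(forResource: "llava_tokenizer", ofType: "bin") else {
    print("Model or tokenizer not found in bundle")
    return
  }
  let runner = MultimodalRunner(modelPath: modelPath, tokenizerPath: tokenizerPath)

  // Ordered inputs: the question first, then the image, as in the docs' example.
  let inputs = [
    MultimodalInput("What's in this image?"),
    MultimodalInput(Image(data: imageData, width: 336, height: 336, channels: 3)),
  ]

  do {
    // Load explicitly so the first generate call doesn't pay the load cost.
    try runner.load()
    try runner.generate(inputs, Config {
      $0.sequenceLength = 768
    }) { token in
      print(token, terminator: "")
    }
  } catch {
    print("Generation failed:", error)
  }

  // Per the docs, stop() ends an in-flight generation early, and reset()
  // clears the prefilled tokens from the KV cache and the generation stats.
  runner.reset()
}
```

Calling `reset()` between unrelated conversations keeps prefilled tokens from a previous prompt from leaking into the next one.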
