Swift Voice Agent starter app

This starter app template for LiveKit Agents provides a simple voice interface using the LiveKit Swift SDK. It supports voice, transcriptions, live video input, and virtual avatars.

This template is compatible with iOS, iPadOS, macOS, and visionOS and is free for you to use or modify as you see fit.

Voice Agent Screenshot

Getting started

First, you'll need a LiveKit agent to speak with. Try our starter agent for Python or Node.js, or create your own from scratch.

Second, you need a token server. The easiest way to set this up is with the Sandbox for LiveKit Cloud and the LiveKit CLI.

Create a new Sandbox Token Server for your LiveKit Cloud project. Then, run the following command to automatically clone this template and connect it to LiveKit Cloud. This will create a new Xcode project in the current directory.

lk app create --template agent-starter-swift --sandbox <token_server_sandbox_id>

Then, build and run the app from Xcode by opening VoiceAgent.xcodeproj. You may need to adjust your app signing settings to run the app on your device.
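If you prefer the command line, a build along these lines should work. This is a sketch: the scheme name VoiceAgent is an assumption, so check xcodebuild -list for the project's actual schemes.

# Sketch: assumes a scheme named VoiceAgent; verify with `xcodebuild -list`
xcodebuild -project VoiceAgent.xcodeproj -scheme VoiceAgent build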

Note

To set up without the LiveKit CLI, clone the repository and then either create a VoiceAgent/.env.xcconfig with a LIVEKIT_SANDBOX_ID (if using a Sandbox Token Server), or modify VoiceAgent/VoiceAgentApp.swift to replace the SandboxTokenSource with a custom token source implementation.
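For example, a minimal VoiceAgent/.env.xcconfig looks like this (the value shown is a placeholder; substitute your own sandbox ID):

// VoiceAgent/.env.xcconfig (placeholder value)
LIVEKIT_SANDBOX_ID = my-sandbox-id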

Feature overview

This starter app supports several features of the agents framework, and you can easily enable or disable them in code as you adapt this template to your own use case.

Text, video, and voice input

This app supports text, video, and/or voice input according to the needs of your agent. To update the features enabled in the app, edit VoiceAgent/VoiceAgentApp.swift and modify the .environment() modifiers to enable or disable features.

By default, all features (voice, video, and text input) are enabled. To disable a feature, change the value from true to false:

.environment(\.voiceEnabled, true)   // Enable voice input
.environment(\.videoEnabled, false)  // Disable video input
.environment(\.textEnabled, true)    // Enable text input

Available input types:

  • .voice: Allows the user to speak to the agent using their microphone. Requires microphone permissions.
  • .text: Allows the user to type to the agent. See the docs for more details.
  • .video: Allows the user to share their camera or screen with the agent. This requires a model that supports video input, such as the Gemini Live API. See the docs for more details.

If you have trouble with screen sharing, refer to the docs for more setup instructions.

Session

The app is built on top of two main observable components from the LiveKit Swift SDK:

  • Session object to connect to the LiveKit infrastructure, interact with the Agent and its local state, and send/receive text messages.
  • LocalMedia object to manage the local media tracks (audio, video, screen sharing) and their lifecycle.
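Putting these together, a minimal setup might look like the following sketch. The initializer shapes here are assumptions based on the names above, not the SDK's exact API; see VoiceAgent/VoiceAgentApp.swift for the actual wiring.

import LiveKit

// Sketch only: initializer signatures are assumptions, not the SDK's exact API.
let session = Session(
    tokenSource: SandboxTokenSource(id: "<token_server_sandbox_id>")
)
let localMedia = LocalMedia() // manages microphone, camera, and screen share tracks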

Preconnect audio buffer

This app enables preConnectAudio by default to capture and buffer audio before the room connection completes. This allows the connection to appear "instant" from the user's perspective and makes your app more responsive. To disable this feature, set preConnectAudio to false in SessionOptions when creating the Session.
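As a sketch (assuming SessionOptions exposes a preConnectAudio flag as described above), disabling the buffer might look like:

// Sketch: assumes SessionOptions exposes a preConnectAudio flag.
let session = Session(
    tokenSource: SandboxTokenSource(id: "<token_server_sandbox_id>"),
    options: SessionOptions(preConnectAudio: false)
)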

Virtual avatar support

If your agent publishes a virtual avatar, this app will automatically render the avatar's camera feed in AgentView when available.

Token generation in production

In production, you'll need to develop a solution to generate tokens for your users that integrates with your authentication system. You should replace your SandboxTokenSource with an EndpointTokenSource or your own TokenSourceFixed or TokenSourceConfigurable implementation. Additionally, you can use the .cached() extension to cache valid tokens and avoid unnecessary token requests.
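For example, swapping in an endpoint-backed source might look like the sketch below. The endpoint URL is hypothetical, and the initializer shape is an assumption; only the type and extension names come from the SDK.

// Sketch: a token source backed by your own authenticated endpoint.
// The URL is hypothetical; .cached() reuses tokens while they remain valid.
let tokenSource = EndpointTokenSource(
    url: URL(string: "https://your-backend.example.com/token")!
).cached()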

Running on Simulator

To use this template with video (or screen sharing) input, you need to run the app on a physical device. Voice and text modes, as well as virtual avatars, still work on the Simulator.

Submitting to the App Store

LiveKitWebRTC.xcframework, which is part of the LiveKit Swift SDK, does not contain dSYMs. Submitting the app to the App Store will result in the following warning:

The archive did not include a dSYM for the LiveKitWebRTC.framework with the UUIDs [...]. Ensure that the archive's dSYM folder includes a DWARF file for LiveKitWebRTC.framework with the expected UUIDs.

This warning will not prevent the app from being submitted to the App Store or from passing the review process.

Contributing

This template is open source and we welcome contributions! Please open a PR or issue through GitHub, and don't forget to join us in the LiveKit Community Slack!
