@1egoman
Contributor

@1egoman 1egoman commented Sep 15, 2025

This change introduces the new client agents SDK, a set of React hooks built to make interacting with the LiveKit Agents framework less complex.

This is version 3 - version 1 can be found here, and version 2 can be found here. It has evolved significantly at each step, based on comments and perspectives from people who have taken a look!

Single file example

import { useEffect, useState } from "react";
import { Track, TokenSource } from "livekit-client";
import {
  useSession,
  useAgent,
  useSessionMessages,

  VideoTrack,
  StartAudio,
  RoomAudioRenderer,
  useMediaDeviceSelect,
  useTrackToggle,
} from "@livekit/components-react";

// From: https://github.com/livekit/client-sdk-js/pull/1645
const tokenSource = new TokenSource.SandboxTokenServer({ sandboxId: "xxx" });

export default function SinglePageDemo() {
  const session = useSession(tokenSource, { agentName: 'voice ai quickstart' });

  const agent = useAgent(session);

  // FIXME: still using the old local participant related hooks, so this isn't much simpler than in
  // the past. Eventually I think there needs to be something like a `useLocalTrack(session.local.camera)`
  // hook that abstracts over all this...
  const audioDevices = useMediaDeviceSelect({ kind: "audioinput", room: session.room });
  const microphoneTrack = useTrackToggle({ source: Track.Source.Microphone, room: session.room });
  const videoDevices = useMediaDeviceSelect({ kind: "videoinput", room: session.room });
  const cameraTrack = useTrackToggle({ source: Track.Source.Camera, room: session.room });

  const [started, setStarted] = useState(false);
  useEffect(() => {
    if (!started) {
      return;
    }
    session.start();
    return () => {
      session.end();
    };
  }, [started]);

  const { messages, send, isSending } = useSessionMessages(session);
  const [chatMessage, setChatMessage] = useState('');

  return (
    <div className="flex flex-col gap-4 p-4">
      <div className="flex items-center gap-4">
        <Button variant="primary" onClick={() => setStarted(s => !s)} disabled={session.connectionState === 'connecting'}>
          {session.isConnected ? 'Disconnect' : 'Connect'}
        </Button>
        <span>
          <strong className="mr-1">Statuses:</strong>
          {session.connectionState} / {agent.state ?? 'N/A'}
        </span>
      </div>

      {session.isConnected ? (
        <>
          <div className="border rounded bg-muted p-2">
            <Button onClick={() => cameraTrack.toggle()} disabled={cameraTrack.pending}>
              {cameraTrack.enabled ? 'Disable' : 'Enable'} local camera
            </Button>
            <Button onClick={() => microphoneTrack.toggle()} disabled={microphoneTrack.pending}>
              {microphoneTrack.enabled ? 'Mute' : 'Unmute'} local microphone
            </Button>
            <div>
              <p>Local camera sources:</p>
              {videoDevices.devices.map(item => (
                <li
                  key={item.deviceId}
                  onClick={() => videoDevices.setActiveMediaDevice(item.deviceId)}
                  style={{ color: item.deviceId === videoDevices.activeDeviceId ? 'red' : undefined }}
                >
                  {item.label}
                </li>
              ))}
            </div>
            <div>
              <p>Local microphone sources:</p>
              {audioDevices.devices.map(item => (
                <li
                  key={item.deviceId}
                  onClick={() => audioDevices.setActiveMediaDevice(item.deviceId)}
                  style={{ color: item.deviceId === audioDevices.activeDeviceId ? 'red' : undefined }}
                >
                  {item.label}
                </li>
              ))}
            </div>
          </div>

          <div>
            {session.local.camera.publication ? (
              <VideoTrack trackRef={session.local.camera} />
            ) : null}
            {agent.camera ? (
              <VideoTrack trackRef={agent.camera} />
            ) : null}
          </div>

          <ul>
            {messages.map(receivedMessage => (
              <li key={receivedMessage.id}>{receivedMessage.message}</li>
            ))}
            <li className="flex items-center gap-1">
              <input
                type="text"
                value={chatMessage}
                onChange={e => setChatMessage(e.target.value)}
                className="border border-2"
              />
              <Button
                variant="secondary"
                disabled={isSending}
                onClick={() => {
                  send(chatMessage);
                  setChatMessage('');
                }}
              >{isSending ? 'Sending' : 'Send'}</Button>
            </li>
          </ul>
        </>
      ) : null}

      <StartAudio label="Start audio" />
      <RoomAudioRenderer room={session.room} />
    </div>
  );
}

New API surface area

  • useSession(tokenSource: TokenSourceFixed | TokenSourceConfigurable, options: UseSessionConfigurableOptions | UseSessionFixedOptions): UseSessionReturn
    A thin wrapper around a Room which handles connecting to a room and dispatching a given agent into that room (or in the future, maybe multiple agents?). In the future it will probably become thicker as more global agent state is required.
const tokenSource: TokenSourceConfigurable = /* ... */;

const session = useSession(tokenSource, {
  // NOTE: either `room` can be a property here, or if not specified, it reads `room` from `RoomContext`, or if that can't be found, finally falls back to just making a new room
  tokenSource,
  tokenFetchOptions: { agentName: 'agent name to dispatch' },
});

useEffect(() => {
  session.start();
  return () => {
    session.end();
  };
}, [session]);
  • useAgent(session: UseSessionReturn): Agent
    A much more advanced version of the previously existing useVoiceAssistant hook - tracks the agent's state within the conversation, manages agent connection timeouts / other failures, and largely maintains backwards compatibility with existing interfaces.
const agent = useAgent(session);

// Log agent connection errors
useEffect(() => {
  if (agent.state === "failed") {
    console.error(`Error connecting to agent: ${agent.failureReasons.join(", ")}`);
  }
}, [agent]);

// later on, in a component:
<VideoTrack trackRef={agent.camera} /> 
  • useSessionMessages
    A mechanism for interacting with ReceivedMessages across the whole conversation. A ReceivedMessage can be a ReceivedChatMessage (already exists today), or a ReceivedUserTranscriptionMessage / ReceivedAgentTranscriptionMessage (both brand new). This is exposed at the conversation level so that, in a future where multiple agents share a conversation, this hook can return messages from all of them.
const { messages, isSending, send } = useSessionMessages(session);
// NOTE: send / isSending are proxies of the existing interface returned by `useChat`

// later on, in a component:
<ul>
  {messages.map(receivedMessage => (
    <li key={receivedMessage.id}>{receivedMessage.from?.name}: {receivedMessage.message}</li>
  ))}
</ul>
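
A minimal sketch of how a conversation-level message list might be assembled, assuming (this PR does not confirm it) that chat messages and transcriptions arrive as separate arrays and are merged by timestamp. The `Msg` shape and the `mergeMessages` helper are hypothetical and exist only for illustration; they are not part of the hook's API.

```typescript
// Hypothetical, simplified message shape; the real ReceivedMessage carries more fields.
type Msg = {
  id: string;
  timestamp: number;
  type: 'chatMessage' | 'userTranscript' | 'agentTranscript';
  message: string;
};

// Merge several message sources into one timeline ordered by timestamp.
// When two entries share an id, the one from the later source replaces the earlier one.
function mergeMessages(...sources: Msg[][]): Msg[] {
  const byId = new Map<string, Msg>();
  for (const source of sources) {
    for (const msg of source) {
      byId.set(msg.id, msg);
    }
  }
  return Array.from(byId.values()).sort((a, b) => a.timestamp - b.timestamp);
}

const chat: Msg[] = [{ id: 'c1', timestamp: 20, type: 'chatMessage', message: 'hi' }];
const transcripts: Msg[] = [
  { id: 't1', timestamp: 10, type: 'userTranscript', message: 'hello' },
  { id: 't2', timestamp: 30, type: 'agentTranscript', message: 'how can I help?' },
];

const timeline = mergeMessages(chat, transcripts);
console.log(timeline.map((m) => m.id).join(','));
```

Keying by `id` is one possible way to let an updated transcript segment overwrite its earlier version instead of appearing twice.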

Additional refactoring / cleanup

  • Added a new ParticipantAgentAttributes constant and ported all usages of lk.-prefixed attributes (which previously were just magic strings in the code) to refer to this enum.
  • Fixed type error in handleMediaDeviceError callback function in useLiveKitRoom
  • Added support for explicit room parameter to a few hooks and components that didn't support it previously, to make single file example type scenarios easier:
    • RoomAudioRenderer
    • StartAudio
    • useChat
    • useTextStream
    • useTrackToggle
    • useTranscriptions
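
The move from magic strings to a shared constant could look roughly like the sketch below. The key names, the attribute values, and the `readAgentState` helper are assumptions made for illustration; the actual contents of `ParticipantAgentAttributes` in this PR may differ.

```typescript
// Hypothetical subset of the constant; names and values are assumed, not copied from the PR.
const ParticipantAgentAttributes = {
  AgentState: 'lk.agent.state',
  PublishOnBehalf: 'lk.publish_on_behalf',
} as const;

// Union of the attribute key strings, derived from the constant itself.
type AgentAttributeKey =
  (typeof ParticipantAgentAttributes)[keyof typeof ParticipantAgentAttributes];

// Look up one agent attribute by its key, so call sites never spell the string out.
function readAttribute(
  attributes: Record<string, string>,
  key: AgentAttributeKey,
): string | undefined {
  return attributes[key];
}

// Hypothetical helper: read the agent state from a participant's attribute map.
function readAgentState(attributes: Record<string, string>): string | undefined {
  return readAttribute(attributes, ParticipantAgentAttributes.AgentState);
}

console.log(readAgentState({ 'lk.agent.state': 'listening' }));
```

Centralizing the keys means a typo becomes a compile-time error instead of a silent `undefined` at runtime.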

@changeset-bot

changeset-bot bot commented Sep 15, 2025

🦋 Changeset detected

Latest commit: d2d070d

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 6 packages
Name                                Type
@livekit/components-react           Patch
@livekit/components-core            Patch
@livekit/component-example-next     Patch
@livekit/components-js-docs         Patch
@livekit/component-docs-storybook   Patch
@livekit/components-docs-gen        Patch

@1egoman 1egoman force-pushed the agent-sdk branch 3 times, most recently from ce80b15 to ef0fed7 Compare September 17, 2025 20:44
Comment on lines 8 to 39
type ReceivedMessageWithType<
  Type extends string,
  Metadata extends {} = {},
> = {
  id: string;
  timestamp: number;

  type: Type;

  from?: Participant;
  attributes?: Record<string, string>;
} & Metadata;

/** @public */
export type ReceivedChatMessage = ReceivedMessageWithType<'chatMessage', ChatMessage & {
  from?: Participant;
  attributes?: Record<string, string>;
}>;

export type ReceivedUserTranscriptionMessage = ReceivedMessageWithType<'userTranscript', {
  message: string;
}>;

export type ReceivedAgentTranscriptionMessage = ReceivedMessageWithType<'agentTranscript', {
  message: string;
}>;

/** @public */
export type ReceivedMessage =
  | ReceivedUserTranscriptionMessage
  | ReceivedAgentTranscriptionMessage
  | ReceivedChatMessage
Contributor Author

I ported the existing ReceivedMessage abstraction from here on top of the pre-existing ReceivedChatMessage, which means ReceivedChatMessage is now a ReceivedMessage subtype.

Note that ReceivedChatMessage gains one new field, type, which acts as the discriminant key in ReceivedMessage; otherwise it is identical. So this should be a fully backwards compatible change, even though a lot has been updated behind the scenes.
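
To illustrate what a discriminant key buys a consumer, here is a small sketch using trimmed stand-ins for the types in the diff above. The field lists are simplified and the `labelFor` helper is hypothetical; only the `type` values are taken from the diff.

```typescript
// Simplified stand-ins for the types in the diff; only the fields needed here are kept.
type BaseMessage = { id: string; timestamp: number; message: string };
type ReceivedChatMessage = BaseMessage & { type: 'chatMessage' };
type ReceivedUserTranscriptionMessage = BaseMessage & { type: 'userTranscript' };
type ReceivedAgentTranscriptionMessage = BaseMessage & { type: 'agentTranscript' };

type ReceivedMessage =
  | ReceivedUserTranscriptionMessage
  | ReceivedAgentTranscriptionMessage
  | ReceivedChatMessage;

// Hypothetical helper: switching on `type` narrows the union in each branch.
function labelFor(msg: ReceivedMessage): string {
  switch (msg.type) {
    case 'chatMessage':
      return `chat: ${msg.message}`;
    case 'userTranscript':
      return `you said: ${msg.message}`;
    case 'agentTranscript':
      return `agent said: ${msg.message}`;
    default: {
      // Exhaustiveness check: adding a new variant makes this assignment fail to compile.
      const unreachable: never = msg;
      return unreachable;
    }
  }
}

console.log(labelFor({ type: 'userTranscript', id: '1', timestamp: 0, message: 'hello' }));
```

Code that only ever handled chat messages and never reads `type` keeps working unchanged, which is the backwards compatibility argument made above.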

@1egoman 1egoman changed the title [WIP] Agent SDK - ported on top of components-js primatives Agent SDK - ported on top of components-js primatives Sep 17, 2025
@1egoman 1egoman marked this pull request as ready for review September 17, 2025 20:53
@1egoman 1egoman requested review from lukasIO and pblazej September 17, 2025 20:53
@lukasIO
Contributor

lukasIO commented Sep 18, 2025

const audioDevices = useMediaDeviceSelect({ kind: "audioinput", room: conversation.subtle.room });
const microphoneTrack = useTrackToggle({ source: Track.Source.Microphone, room: conversation.subtle.room });
const videoDevices = useMediaDeviceSelect({ kind: "videoinput", room: conversation.subtle.room });
const cameraTrack = useTrackToggle({ source: Track.Source.Camera, room: conversation.subtle.room });

how about proxying some of these on the return value of useConversation ?

@1egoman
Contributor Author

1egoman commented Sep 18, 2025

how about proxying some of these on the return value of useConversation ?

I opted to leave that out for now because of the hesitancy from ben/dz around new track abstractions. That being said, I mentioned in the comment above that I had been proposing a new hook, useLocalTrack, which would take in a track reference returned from other abstractions (conversation, agent, or even other non agent related abstractions), but it also could live underneath the conversation too.

It sounds like you are pushing for that to exist now vs deferring it? If so I can add that new hook to this branch or possibly figure out how to fit it into conversation.

Member

@davidzhao davidzhao left a comment

nice, i like the direction where this is going.

1egoman added 14 commits October 1, 2025 11:14
This allows functions to restrict the TrackReferences they accept to
those from a given source - i.e., the VideoTrack could be made
to only accept TrackReference<Track.Source.Camera | Track.Source.Screenshare | Track.Source.Unknown>.
Note that just the return values are changing, not the argument
definitions in other spots, so this shouldn't be a backwards
compatibility issue.
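
A rough sketch of the idea in this commit message, using hypothetical stand-ins rather than the library's own `Track.Source` and `TrackReference` definitions. The source names, the `participantIdentity` field, and `describeVideoTrack` are all assumptions made for illustration.

```typescript
// Hypothetical stand-ins; not the library's definitions.
type Source = 'camera' | 'microphone' | 'screen_share' | 'unknown';

// A track reference that remembers, at the type level, which source it came from.
type TrackReference<S extends Source = Source> = {
  participantIdentity: string;
  source: S;
};

// The subset of sources a video renderer could reasonably accept.
type VideoSource = 'camera' | 'screen_share' | 'unknown';

// Stand-in for a component that only accepts video-capable references.
function describeVideoTrack(ref: TrackReference<VideoSource>): string {
  return `${ref.participantIdentity}:${ref.source}`;
}

const camera: TrackReference<'camera'> = { participantIdentity: 'agent', source: 'camera' };
const mic: TrackReference<'microphone'> = { participantIdentity: 'agent', source: 'microphone' };

// Narrowly typed references remain assignable to the unparameterized form,
// which is why existing argument definitions keep accepting them.
const all: TrackReference[] = [camera, mic];

const description = describeVideoTrack(camera);
// describeVideoTrack(mic); // compile-time error: 'microphone' is not a VideoSource
console.log(description, all.length);
```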
The pre-existing state was broken.
…s going to be important for future multi-agent type use cases
Longer term I think it might be worth discussing migrating away from
that pattern since react won't be tree-shaken properly in end projects
This was causing prepareConnection to get run constantly since its
reference depends on restOptions which changes every call.

The way I see it, if a user is changing their options and wants
prepareConnection to be run for something beyond the initial set
(probably unusual), they can call it themselves on the returned
session object.
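
The underlying problem can be reproduced without React. The sketch below uses a tiny hypothetical stand-in for a dependency-tracked effect (`createEffectRunner`) that, like a React dependency array, compares by identity with `Object.is`. It shows why an options object rebuilt on every call retriggers the effect; it is not the fix applied in this commit.

```typescript
// Hypothetical stand-in for a dependency-tracked effect:
// runs `effect` only when `dep` changes identity between calls.
function createEffectRunner<T>(effect: () => void): (dep: T) => void {
  let last: { dep: T } | undefined;
  return (dep: T) => {
    if (last === undefined || !Object.is(last.dep, dep)) {
      last = { dep };
      effect();
    }
  };
}

let unstableRuns = 0;
const runWithObjectDep = createEffectRunner<object>(() => {
  unstableRuns += 1;
});

let stableRuns = 0;
const runWithPrimitiveDep = createEffectRunner<string>(() => {
  stableRuns += 1;
});

// Simulate three renders that each rebuild the options object.
for (let render = 0; render < 3; render += 1) {
  const restOptions = { agentName: 'demo' }; // fresh identity on every pass
  runWithObjectDep(restOptions); // reruns every time
  runWithPrimitiveDep(restOptions.agentName); // equal string, so it runs once
}

console.log(unstableRuns, stableRuns);
```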
@1egoman 1egoman merged commit f118da6 into main Oct 14, 2025
6 checks passed
@1egoman 1egoman deleted the agent-sdk branch October 14, 2025 17:07
@1egoman 1egoman changed the title Agent SDK - ported on top of components-js primatives Agent SDK - ported on top of components-js primitives Oct 14, 2025
@github-actions github-actions bot mentioned this pull request Oct 14, 2025