
feat: cache tts audio #5650

Open: wants to merge 7 commits into base: main

@Dakai (Contributor) commented Oct 13, 2024

💻 Change Type

  • feat
  • fix
  • refactor
  • perf
  • style
  • test
  • docs
  • ci
  • chore
  • build

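🔀 Description of Change

Pressing the TTS play button now caches the generated audio file automatically. The message bubble then shows a player in which the audio can be played, its volume and playback speed adjusted, and the file downloaded.
[Screenshot: 2024-10-14_02-07]

📝 Additional Information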

Summary by CodeRabbit

  • New Features

    • Introduced support for audio messages, allowing users to send and receive audio alongside text in chat.
    • Enhanced chat interface with new styles for audio messages and improved responsiveness.
    • Added functionality to play audio messages directly within the chat interface.
  • Bug Fixes

    • Adjusted hover states and media queries for better user experience across devices.
  • Documentation

    • Updated styles for audio elements to improve visual consistency.
  • Chores

    • Updated dependencies in the project configuration.


vercel bot commented Oct 13, 2024

@Dakai is attempting to deploy a commit to the NextChat Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai bot (Contributor) commented Oct 13, 2024

Walkthrough

The pull request introduces several enhancements across multiple files, primarily focusing on the integration of audio message support in the chat application. Key changes include the addition of an optional audio_url property in the RequestMessage interface, new styling for audio messages, and updated rendering logic to accommodate audio playback. Additional modifications include new utility functions for handling audio data and adjustments to global styles for audio elements.

Changes

  • app/client/api.ts: Added optional property audio_url to the RequestMessage interface.
  • app/components/chat.module.scss: Updated .chat-message-item max-width to 300px; added new classes .audio-message (min-width 350px) and .chat-message-item-audio (margin-top 10px, width 100%); adjusted hover state and media queries for responsiveness.
  • app/components/chat.tsx: Added updateMessageAudio function for audio URL updates; modified openaiSpeech to return a `Promise<string
  • app/styles/globals.scss: Added styles for the audio element (height 35px); modified the appearance of audio controls.
  • app/utils.ts: Added commented-out function getMessageAudio for extracting audio URLs; updated fetch function to use tauriStreamFetch if __TAURI__ is present.
  • app/utils/audio.ts: Added function arrayBufferToWav for converting an ArrayBuffer to WAV format.
  • package.json: Re-added mermaid package (version ^10.6.1).

Possibly related PRs

  • add tts #5459: Adds a new optional property audio_url to the RequestMessage interface in app/client/api.ts, directly related to changes in this PR.
  • Feature play audio and video #5519: Introduces functionality to play audio elements based on file types in markdown, relating to audio message handling enhancements in this PR.
  • Feat: using tauri fetch api in App #5565: Modifies the fetch function in app/utils.ts, relevant to audio message handling changes in this PR.
  • google gemini support function call #5581: Enhances chat functionality to support streaming responses, potentially related to audio message handling and rendering logic introduced in this PR.

Suggested labels

enhancement, help wanted

Suggested reviewers

  • Dogtiti

Poem

In the chat where messages flow,
A new audio feature starts to glow.
With a click, hear the sounds,
As joy in each message abounds.
From bytes to waves, we cheer and play,
Happy chatting, hip-hip-hooray! 🎶🐰




Your build has completed!

Preview deployment


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (8)
app/utils/audio.ts (1)

47-89: Overall implementation looks good, with some suggestions for improvement.

The arrayBufferToWav function correctly implements the conversion from ArrayBuffer to WAV format. However, there are a few areas where it could be improved:

  1. Consider parameterizing the audio properties (number of channels, sample rate, bits per sample) instead of hard-coding them. This would make the function more flexible for different audio formats.

  2. Add input validation to ensure the provided ArrayBuffer is not empty and has a valid size for audio data.

Here's a suggested refactor to address these points:

export function arrayBufferToWav(
  buffer: ArrayBuffer,
  numOfChannels: number = 1,
  sampleRate: number = 24000,
  bitsPerSample: number = 16
): ArrayBuffer {
  if (buffer.byteLength === 0) {
    throw new Error("Input buffer is empty");
  }

  // Rest of the function remains the same, using the parameterized values
  // ...
}

This change allows for more flexibility in audio formats while maintaining the default values currently used.
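For reference, a complete version of the parameterized function could look like the sketch below. It is an illustration, not the PR's code: it assumes the input is raw little-endian integer PCM (which the 24000 Hz / 16-bit defaults suggest) and writes the standard 44-byte RIFF/WAVE header for uncompressed PCM. The options object, the WavOptions type and the alignment checks are illustrative choices, and writeString here includes the bounds check suggested elsewhere in this review.

```typescript
export interface WavOptions {
  numOfChannels?: number;
  sampleRate?: number;
  bitsPerSample?: number;
}

// Writes an ASCII tag (e.g. "RIFF") byte by byte, refusing to write past the end.
function writeString(view: DataView, offset: number, text: string): void {
  if (offset + text.length > view.byteLength) {
    throw new RangeError("writeString would overflow the buffer");
  }
  for (let i = 0; i < text.length; i++) {
    view.setUint8(offset + i, text.charCodeAt(i));
  }
}

// Wraps raw PCM samples in a 44-byte WAV header (illustrative sketch).
export function arrayBufferToWav(
  pcm: ArrayBuffer,
  { numOfChannels = 1, sampleRate = 24000, bitsPerSample = 16 }: WavOptions = {},
): ArrayBuffer {
  if (pcm.byteLength === 0) {
    throw new Error("Input buffer is empty");
  }
  if (bitsPerSample % 8 !== 0) {
    throw new Error("bitsPerSample must be a multiple of 8");
  }
  const blockAlign = numOfChannels * (bitsPerSample / 8); // bytes per sample frame
  if (pcm.byteLength % blockAlign !== 0) {
    throw new Error("Input length is not a whole number of sample frames");
  }
  const byteRate = sampleRate * blockAlign;
  const dataSize = pcm.byteLength;
  const out = new ArrayBuffer(44 + dataSize);
  const view = new DataView(out);

  writeString(view, 0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // size of everything after this field
  writeString(view, 8, "WAVE");
  writeString(view, 12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, numOfChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeString(view, 36, "data");
  view.setUint32(40, dataSize, true);
  new Uint8Array(out, 44).set(new Uint8Array(pcm)); // copy samples after the header
  return out;
}
```

With the defaults, callers keep writing arrayBufferToWav(pcm); a 44.1 kHz stereo source would pass { numOfChannels: 2, sampleRate: 44100 }.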

app/utils.ts (1)

Line range hint 1-453: Summary and Next Steps

The changes in this file are minimal but indicate the intention to add audio support, which aligns with the PR objectives. To fully implement the audio caching feature, consider the following next steps:

  1. Implement the getMessageAudio function correctly or remove it if not needed yet.
  2. Enhance existing utility functions (fetch, adapter, safeLocalStorage) to support audio file caching and retrieval.
  3. Add new utility functions specifically for audio caching if necessary.
  4. Ensure that all changes are consistent with the existing codebase structure and naming conventions.
  5. Add appropriate error handling and logging for audio-related operations.
  6. Update relevant components to use these utility functions for audio playback and caching.

Before proceeding with the implementation, it would be beneficial to create a detailed design document outlining the audio caching strategy, including:

  • Data structure for storing cached audio files
  • Caching policy (e.g., expiration, size limits)
  • Error handling and fallback mechanisms
  • Performance considerations for audio file retrieval and playback

This will ensure a more robust and scalable implementation of the audio caching feature.
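As one illustration of the caching-policy point, a size-bounded, least-recently-used in-memory cache keyed by message ID could look like the sketch below. The class name, the byte budget and the in-memory storage are assumptions; a persistent cache (for example IndexedDB or the Cache Storage API) would need the same eviction logic on top of asynchronous storage.

```typescript
// Hypothetical in-memory cache for TTS audio, bounded by total bytes.
// Relies on Map preserving insertion order: the first entry is the least recently used.
export class AudioLruCache {
  private readonly maxBytes: number;
  private readonly entries = new Map<string, ArrayBuffer>();
  private totalBytes = 0;

  constructor(maxBytes: number) {
    this.maxBytes = maxBytes;
  }

  get bytes(): number {
    return this.totalBytes;
  }

  get(key: string): ArrayBuffer | undefined {
    const value = this.entries.get(key);
    if (value !== undefined) {
      // Re-insert so the entry moves to the most-recently-used end.
      this.entries.delete(key);
      this.entries.set(key, value);
    }
    return value;
  }

  set(key: string, value: ArrayBuffer): void {
    // An entry larger than the whole budget can never fit; skip caching it.
    if (value.byteLength > this.maxBytes) return;
    const existing = this.entries.get(key);
    if (existing !== undefined) {
      this.totalBytes -= existing.byteLength;
      this.entries.delete(key);
    }
    this.entries.set(key, value);
    this.totalBytes += value.byteLength;
    // Evict from the least-recently-used end until the total fits the budget.
    for (const [oldKey, oldValue] of this.entries) {
      if (this.totalBytes <= this.maxBytes) break;
      this.entries.delete(oldKey);
      this.totalBytes -= oldValue.byteLength;
    }
  }
}
```

Because the new entry is never larger than the budget, the eviction loop always stops before reaching it.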

app/components/chat.module.scss (3)

433-433: Consider responsive design for message width

The change from max-width: 100% to max-width: 300px might improve readability on larger screens, but it could cause issues on smaller devices or with messages containing long words or URLs.

Consider using a responsive approach, such as:

max-width: min(300px, 100%);

This will ensure the message width is either 300px or 100% of its container, whichever is smaller, providing better adaptability across different screen sizes.


446-448: Ensure responsiveness for audio messages

The new .audio-message class with a fixed min-width of 350px might cause layout issues on smaller screens.

Consider using a responsive approach:

.audio-message {
  min-width: min(350px, 100%);
  width: 100%;
}

This will ensure the audio message takes up the full width of its container on smaller screens while maintaining the desired minimum width on larger screens.


530-530: Clean up commented code

The min-width: 350px property on hover has been commented out. If this change is intentional and final, consider removing the commented line entirely to keep the codebase clean.

If you're still considering this feature, it might be helpful to add a comment explaining why it's temporarily disabled or under what conditions it might be re-enabled.

app/components/chat.tsx (3)

Line range hint 1214-1215: Use const instead of var for variable declaration

The use of var is outdated in modern TypeScript and can lead to unexpected behavior due to its function scoping. Since api is not reassigned, consider using const for better readability and to prevent unintended reassignments.

Apply this diff to fix the declaration:

-    var api: ClientApi;
-    api = new ClientApi(ModelProvider.GPT);
+    const api = new ClientApi(ModelProvider.GPT);

Line range hint 1220-1220: Use ES6 import instead of require for module import

In TypeScript, it's recommended to use ES6 module syntax (import) over CommonJS require. This improves compatibility and readability.

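Change the require statement to an import. An import declaration is only valid at the top level of a module, so it has to move to the top of chat.tsx rather than replace the require in place:

-    const { markdownToTxt } = require("markdown-to-txt");
+import { markdownToTxt } from "markdown-to-txt"; // at the top of chat.tsx

If the require is there to load the module lazily, a dynamic import inside the async function, const { markdownToTxt } = await import("markdown-to-txt"), is the ES-module equivalent.

Note: Ensure that this change is compatible with your build and module resolution settings.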


1811-1816: Check for url before updating message audio

If openaiSpeech fails and returns undefined, the updateMessageAudio function is still called with an undefined url, which may not be necessary. Consider adding a check to ensure url is defined before updating the message.

Here is a suggested change:

     onClick={async () => {
       const url = await openaiSpeech(
         getMessageTextContent(message),
       );
+      if (url) {
         updateMessageAudio(message.id, url);
+      }
     }}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between a84383f and e64a4f2.

⛔ Files ignored due to path filters (2)
  • app/icons/play.svg is excluded by !**/*.svg
  • app/icons/stop.svg is excluded by !**/*.svg
📒 Files selected for processing (7)
  • app/client/api.ts (1 hunks)
  • app/components/chat.module.scss (5 hunks)
  • app/components/chat.tsx (7 hunks)
  • app/styles/globals.scss (1 hunks)
  • app/utils.ts (1 hunks)
  • app/utils/audio.ts (1 hunks)
  • package.json (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • package.json
🧰 Additional context used
🪛 Biome
app/components/chat.tsx

[error] 1127-1130: The assignment should not be in an expression.

The use of assignments in expressions is confusing.
Expressions are often considered as side-effect free.

(lint/suspicious/noAssignInExpressions)
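Assuming the flagged lines pass an expression-bodied arrow such as (session) => (session.messages = session.messages.map(...)) to chatStore.updateCurrentSession, wrapping the updater body in braces turns the assignment into a statement and satisfies the rule. The sketch uses minimal stand-in types and a hypothetical updateCurrentSession helper in place of the real store.

```typescript
// Minimal stand-ins for the chat store types (assumption, not NextChat's definitions).
interface Message {
  id: string;
  audio_url?: string;
}
interface Session {
  messages: Message[];
}

// Hypothetical helper mirroring an in-place session updater.
function updateCurrentSession(session: Session, updater: (s: Session) => void): void {
  updater(session);
}

function updateMessageAudio(session: Session, msgId?: string, audio_url?: string): void {
  updateCurrentSession(session, (s) => {
    // Block body: the assignment is a statement, so noAssignInExpressions does not fire.
    s.messages = s.messages.map((m) => (m.id === msgId ? { ...m, audio_url } : m));
  });
}
```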

🔇 Additional comments (5)
app/utils/audio.ts (1)

Line range hint 1-96: Overall, the audio utility functions are well-implemented with room for minor improvements.

The new arrayBufferToWav function correctly implements the conversion from ArrayBuffer to WAV format, and the writeString helper function serves its purpose well. The existing createTTSPlayer function remains unchanged and functional.

To further enhance this file:

  1. Consider implementing the suggested parameterization for arrayBufferToWav.
  2. Add the recommended bounds checking to writeString.
  3. If possible, add unit tests to ensure the correctness of the WAV conversion process.

These changes will improve the flexibility, safety, and reliability of the audio utilities.

app/styles/globals.scss (1)

403-410: Audio styles align with PR objectives but need minor adjustments.

The addition of these audio styles supports the PR's objective of implementing audio message support in the chat application. These styles will help integrate the audio elements visually within the UI, allowing for a consistent look and feel.

However, as mentioned in the previous comment, there are minor issues with browser compatibility and CSS syntax that should be addressed. Once these adjustments are made, the audio styling will be more robust and consistent across different browsers.

app/client/api.ts (1)

42-42: LGTM! Consider adding documentation and reviewing usage.

The addition of the optional audio_url property to the RequestMessage interface is a good approach to support the new TTS caching feature. It's non-breaking and aligns well with the PR objectives.

To improve this further:

  1. Consider adding a comment to explain the purpose and expected format of the audio_url. For example:

    /** URL of the cached TTS audio file associated with this message */
    audio_url?: string;
  2. It's recommended to review other parts of the codebase that create or consume RequestMessage objects to ensure they properly handle this new property where applicable.

To verify the impact and usage of this new property, you can run the following script:

✅ Verification successful

LGTM! The audio_url property is properly implemented and utilized throughout the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usages of RequestMessage and potential places to update
echo "Searching for RequestMessage usages:"
rg --type typescript "RequestMessage" -C 3

echo "\nSearching for places where messages are created or processed:"
rg --type typescript "(createMessage|processMessage)" -C 3

Length of output: 393


Script:

#!/bin/bash
# Search for usages of RequestMessage in TypeScript files
echo "Searching for RequestMessage usages:"
rg "RequestMessage" -C 3 --glob "*.ts" --glob "*.tsx"

echo "\nSearching for places where messages are created or processed:"
rg "(createMessage|processMessage)" -C 3 --glob "*.ts" --glob "*.tsx"

Length of output: 8071


Script:

#!/bin/bash
# Search for usages of audio_url in TypeScript files
echo "Searching for 'audio_url' usages in RequestMessage instances:"
rg "audio_url" --glob "*.ts" --glob "*.tsx"

Length of output: 656

app/utils.ts (1)

Line range hint 1-453: Consider enhancing utility functions for audio caching support

While the PR objectives mention implementing a feature to cache TTS audio files, the current changes in this file are minimal and don't directly support this functionality. Consider the following suggestions to enhance the utility functions for better audio caching support:

  1. Extend the fetch function to handle audio file downloads efficiently.
  2. Modify the adapter function to include specific handling for audio-related API calls.
  3. Utilize the safeLocalStorage function for storing and retrieving cached audio data.

Here are some specific recommendations:

  1. Add a new utility function for audio caching:

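    export function cacheAudioFile(url: string, audioData: ArrayBuffer): void {
      const storage = safeLocalStorage();
      storage.setItem(`audio_cache_${url}`, arrayBufferToBase64(audioData));
    }
    
    export function getCachedAudioFile(url: string): ArrayBuffer | null {
      const storage = safeLocalStorage();
      const cachedData = storage.getItem(`audio_cache_${url}`);
      return cachedData ? base64ToArrayBuffer(cachedData) : null;
    }
    
    // Helper functions for ArrayBuffer <-> Base64 conversion (btoa/atob work on binary strings)
    function arrayBufferToBase64(buffer: ArrayBuffer): string {
      const bytes = new Uint8Array(buffer);
      let binary = "";
      for (let i = 0; i < bytes.length; i++) {
        binary += String.fromCharCode(bytes[i]);
      }
      return btoa(binary);
    }
    
    function base64ToArrayBuffer(base64: string): ArrayBuffer {
      const binary = atob(base64);
      const buffer = new ArrayBuffer(binary.length);
      const bytes = new Uint8Array(buffer);
      for (let i = 0; i < binary.length; i++) {
        bytes[i] = binary.charCodeAt(i);
      }
      return buffer;
    }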
  2. Modify the fetch function to check for cached audio files:

    export async function fetch(
      url: string,
      options?: Record<string, unknown>,
    ): Promise<any> {
      // fetch defaults to GET when no method is given, so treat a missing method as GET
      const method = String(options?.method ?? "GET").toUpperCase();
      if (method === "GET" && url.endsWith('.mp3')) {
        const cachedAudio = getCachedAudioFile(url);
        if (cachedAudio) {
          return new Response(cachedAudio, { status: 200 });
        }
      }
      
      // Existing fetch logic here
    }
  3. Update the adapter function to handle audio caching:

    export function adapter(config: Record<string, unknown>) {
      // Existing adapter logic here
      
      return fetch(fetchUrl as string, { ...rest, body }).then(async (res) => {
        const { status, headers, statusText } = res;
        if ((fetchUrl as string).endsWith('.mp3') && status === 200) {
          // A Response body can be read only once; read the audio from a clone
          // so that res.text() below still works.
          const audioData = await res.clone().arrayBuffer();
          cacheAudioFile(fetchUrl as string, audioData);
        }
        return res
          .text()
          .then((data: string) => ({ status, statusText, headers, data }));
      });
    }
    }

These changes will provide a foundation for implementing the audio caching feature. Remember to update the relevant components that will use these utility functions for audio playback and caching.

To ensure that these changes align with the existing codebase structure and naming conventions, run the following command:

rg --type ts 'function (fetch|adapter)' app/

This will help us verify the current implementation of fetch and adapter functions across the application.

app/components/chat.module.scss (1)

478-481: LGTM: Audio message item styling

The new .chat-message-item-audio class provides appropriate styling for audio elements within chat messages. The full width and top margin ensure good layout and spacing.

Resolved review threads (outdated): app/utils/audio.ts, app/styles/globals.scss, app/utils.ts, and three on app/components/chat.tsx.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
app/utils/audio.ts (1)

47-89: LGTM! Well-implemented WAV conversion function.

The arrayBufferToWav function correctly implements the conversion from raw PCM data to a WAV format. The WAV header structure is accurately constructed, and the PCM data is properly copied.

Consider the following improvements:

  1. Add input validation for the buffer parameter to ensure it's not null or empty.
  2. Make the audio properties (channels, sample rate, bits per sample) configurable parameters instead of hardcoding them. This would make the function more flexible for different audio formats.

Example implementation:

export function arrayBufferToWav(
  buffer: ArrayBuffer,
  {
    numOfChannels = 1,
    sampleRate = 24000,
    bitsPerSample = 16
  }: {
    numOfChannels?: number,
    sampleRate?: number,
    bitsPerSample?: number
  } = {}
): ArrayBuffer {
  if (!buffer || buffer.byteLength === 0) {
    throw new Error("Invalid or empty buffer provided");
  }

  // Rest of the function remains the same, using the parameters instead of hardcoded values
  // ...
}

This change would allow users to specify different audio formats while keeping the current values as defaults.

app/components/chat.tsx (3)

1124-1130: Approve changes with a minor optimization suggestion.

The updateMessageAudio function is well-implemented and correctly updates the audio_url property of a specific message. It follows React state update best practices by using the functional update form of chatStore.updateCurrentSession.

Consider using the Array.prototype.find() method instead of map() if you only need to update a single message. This could be slightly more efficient, especially for large message arrays:

 const updateMessageAudio = (msgId?: string, audio_url?: string) => {
   chatStore.updateCurrentSession((session) => {
-    session.messages = session.messages.map((m) =>
-      m.id === msgId ? { ...m, audio_url } : m,
-    );
+    const message = session.messages.find((m) => m.id === msgId);
+    if (message) {
+      message.audio_url = audio_url;
+    }
   });
 };

Line range hint 1208-1253: Refactor for improved maintainability and error handling.

The openaiSpeech function has been significantly expanded to handle multiple TTS engines and audio processing. While it's functional, there are several areas for improvement:

  1. The function is handling too many responsibilities, making it complex and hard to maintain.
  2. Error handling could be more robust, especially for the audio processing and upload steps.
  3. The function is modifying global state, which can lead to unexpected behavior.

Consider refactoring this function into smaller, more focused functions:

  1. Split the TTS engine selection and audio generation into separate functions.
  2. Create a dedicated function for audio processing and uploading.
  3. Implement more granular error handling for each step.
  4. Use a state management solution (e.g., React's useState or useReducer) instead of modifying global variables.

Here's a high-level example of how you might restructure this:

const generateAudio = async (text: string, config: TTSConfig): Promise<ArrayBuffer> => {
  // Handle TTS engine selection and audio generation
};

const processAndUploadAudio = async (audioBuffer: ArrayBuffer): Promise<string> => {
  // Convert to WAV, upload, and return URL
};

const openaiSpeech = async (text: string): Promise<string | undefined> => {
  try {
    setSpeechLoading(true);
    const audioBuffer = await generateAudio(text, config.ttsConfig);
    const audioUrl = await processAndUploadAudio(audioBuffer);
    await playAudio(audioBuffer);
    return audioUrl;
  } catch (error) {
    console.error("[Speech Error]", error);
    showToast(prettyObject(error));
    return undefined;
  } finally {
    setSpeechLoading(false);
  }
};

This structure would make the code more modular, easier to test, and simpler to maintain.


Line range hint 1850-1912: Approve audio message rendering with a suggestion.

The audio message rendering is implemented correctly. It's properly conditional based on the presence of an audio_url, and the audio element is well-structured with appropriate attributes.

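Consider adding an error handler to improve the user experience. Note that when the file is supplied through a <source> child, a failed load fires error on the <source> element, not on <audio>, so a handler on <audio> would never see it; setting src on <audio> directly avoids that. The uploaded file is WAV, so type="audio/mp3" was also inaccurate. Locale.Chat.AudioPlaybackError is a hypothetical locale key that would need to be added:

 <audio
   preload="auto"
   controls
   className={styles["chat-message-item-audio"]}
+  src={message.audio_url}
+  onError={(e) => {
+    console.error("Audio playback error:", e);
+    showToast(Locale.Chat.AudioPlaybackError);
+  }}
 >
-  <source type="audio/mp3" src={message.audio_url} />
   Sorry, your browser does not support HTML5 audio.
 </audio>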

This will help catch and handle any audio playback errors, providing a better user experience.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between e64a4f2 and 85e70f5.

📒 Files selected for processing (4)
  • app/components/chat.module.scss (4 hunks)
  • app/components/chat.tsx (7 hunks)
  • app/styles/globals.scss (1 hunks)
  • app/utils/audio.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • app/components/chat.module.scss
  • app/styles/globals.scss
🧰 Additional context used
🔇 Additional comments (3)
app/utils/audio.ts (2)

92-99: LGTM! Bounds checking implemented as suggested.

The writeString helper function is well-implemented and now includes the bounds checking as suggested in the previous review. This addition prevents potential buffer overruns, making the function more robust and safe to use.


Line range hint 1-99: Summary: Audio utility functions well-implemented, aligning with PR objectives.

The changes in this file, particularly the addition of the arrayBufferToWav function, directly support the PR's objective of implementing audio caching and playback features. This function will be crucial in converting the TTS audio data to a format that can be easily stored and played back.

The implementation is solid, with only minor suggestions for improvement. These changes, combined with the existing createTTSPlayer function, provide a robust foundation for the new audio-related features described in the PR objectives.

app/components/chat.tsx (1)

1810-1815: Approve audio playback integration.

The audio playback functionality is well-integrated into the existing chat actions. It correctly calls the openaiSpeech function and updates the message with the new audio URL.

Leizhenpeng (Member) commented Oct 14, 2024

The feature design is fine, but the following points need to change:

  1. Not everyone needs to download the audio, and for some users it is unnecessary. Add a toggle in the settings, because the player takes up a fair amount of space in each message.
  2. The downloadable audio player conflicts with the existing play button: when the user presses play, they are likely to hear two audio streams playing at once, with different progress.

So, at a minimum, for the same message, clicking the play button at the far right of the top toolbar should keep its playback in sync with the progress bar below, and the two should always share the same playback state.

[Video: 20241014-105549.mp4]

@Dakai
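One way to meet the second point is to route every playback (the toolbar button and the inline player) through a single controller that stops whatever is playing before starting something new, and that the UI can query for a message's playing state. The sketch is an assumption: PlaybackHandle and SinglePlayback are hypothetical names, and a real implementation would wrap the existing TTS player or an HTMLAudioElement.

```typescript
// Hypothetical handle returned by whatever actually plays audio.
interface PlaybackHandle {
  stop(): void;
}

// Ensures at most one audio clip plays at a time across the whole chat view.
export class SinglePlayback {
  private current: { key: string; handle: PlaybackHandle } | null = null;

  start(key: string, begin: () => PlaybackHandle): void {
    this.stop(); // stop the previous clip before starting a new one
    this.current = { key, handle: begin() };
  }

  stop(): void {
    this.current?.handle.stop();
    this.current = null;
  }

  // Lets both buttons render the same play/stop state for a message.
  isPlaying(key: string): boolean {
    return this.current?.key === key;
  }
}
```

A real version would also clear the state when a clip ends on its own, for example from the audio element's ended event.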

@@ -39,6 +39,7 @@ export interface MultimodalContent {
export interface RequestMessage {
role: MessageRole;
content: string | MultimodalContent[];
audio_url?: string;
Review comment (Member):

This needs more testing with different models, because the message-handling logic in the various client/platform/xxxx.ts files may not be consistent. Check across models whether adding this field has any effect.

Review comment (Member):

The ChatMessage type should be extended instead of RequestMessage. RequestMessage holds the model parameters, whereas audio_url is an extension property.
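A minimal sketch of that separation, using simplified stand-ins for the NextChat types (the real RequestMessage and ChatMessage carry more fields, and the toRequestMessages helper is hypothetical): audio_url lives only on the UI-side message, and only model fields are copied into the request, which would also keep the new field away from the per-provider message handling in client/platform.

```typescript
type MessageRole = "system" | "user" | "assistant";

// Model request payload: only what the provider API expects.
interface RequestMessage {
  role: MessageRole;
  content: string;
}

// UI-side chat message: extends the request shape with local-only metadata.
interface ChatMessage extends RequestMessage {
  id: string;
  audio_url?: string; // URL of the cached TTS audio, never sent to the model
}

function toRequestMessages(messages: ChatMessage[]): RequestMessage[] {
  // Copy only model fields so extension properties cannot leak into requests.
  return messages.map(({ role, content }) => ({ role, content }));
}
```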

const audioFile = new Blob([waveFile], { type: "audio/wav" });

const audioUrl: string = await uploadImageRemote(audioFile);
await ttsPlayer.play(audioBuffer, () => {
Review comment (Member):

The audio should not be played only after it has been saved; that adds latency and makes for a poor experience.
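A sketch of the ordering this comment asks for: start caching without awaiting it, play immediately, and record the URL only if the upload succeeds. All names are hypothetical; in the PR the pieces would roughly be arrayBufferToWav, uploadImageRemote (which takes a Blob, so the real code would wrap the WAV bytes first), ttsPlayer.play and updateMessageAudio.

```typescript
// Hypothetical dependency shapes; the PR's real functions differ in detail.
interface CacheDeps {
  toWav: (pcm: ArrayBuffer) => ArrayBuffer;
  upload: (wav: ArrayBuffer) => Promise<string>; // e.g. wrap in a Blob, then upload
  play: (pcm: ArrayBuffer) => Promise<void>;
  onCached: (url: string) => void; // e.g. updateMessageAudio(message.id, url)
}

export async function playThenCache(pcm: ArrayBuffer, deps: CacheDeps): Promise<void> {
  // Deferred to a microtask so a synchronous toWav error cannot block playback;
  // caching failures are logged, never surfaced as playback errors.
  const caching = Promise.resolve()
    .then(() => deps.upload(deps.toWav(pcm)))
    .then((url) => {
      if (url) deps.onCached(url);
    })
    .catch((e: unknown) => console.error("[TTS cache] upload failed", e));
  // Playback starts right away instead of waiting for the upload.
  await deps.play(pcm);
  await caching;
}
```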

app/components/chat.tsx (resolved)
@lloydzhou (Member)

As mentioned above, users may hear two audio streams at once.

Could this feature be made less intrusive? That is, don't display the audio player and keep only the original play button; when an audio_url exists, simply download and play that file instead of calling the LLM again. That would also remove the problem of two sounds playing at the same time.
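A minimal sketch of that suggestion: the existing play button checks for a cached audio_url first and only falls back to TTS when there is none or the stored file cannot be fetched. All names here are hypothetical; download would be an ordinary fetch of the stored file.

```typescript
interface SpeakableMessage {
  id: string;
  text: string;
  audio_url?: string;
}

// Hypothetical dependencies supplied by the chat component.
interface ReplayDeps {
  download: (url: string) => Promise<ArrayBuffer>; // fetch the cached file
  play: (audio: ArrayBuffer) => Promise<void>;
  synthesizeAndPlay: (msg: SpeakableMessage) => Promise<void>; // existing TTS path
}

export async function onPlayClicked(
  msg: SpeakableMessage,
  deps: ReplayDeps,
): Promise<"cached" | "synthesized"> {
  if (msg.audio_url) {
    let audio: ArrayBuffer | undefined;
    try {
      audio = await deps.download(msg.audio_url);
    } catch (e) {
      // Cached file missing or unreachable: fall through to a fresh TTS request.
      console.warn("[TTS] cached audio unavailable, re-synthesizing", e);
    }
    if (audio) {
      // Cache hit: replay the stored audio without another TTS request.
      await deps.play(audio);
      return "cached";
    }
  }
  await deps.synthesizeAndPlay(msg);
  return "synthesized";
}
```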

@lloydzhou (Member)

TTS and STT features may be put on hold for a while; the OpenAI Realtime API takes priority.

showToast(prettyObject(e));
try {
const waveFile = arrayBufferToWav(audioBuffer);
const audioFile = new Blob([waveFile], { type: "audio/wav" });
Review comment (Member):

WAV files are fairly large; could saving in MP3 format be added later?
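For a sense of scale, uncompressed PCM size follows directly from the format: bytes per second = sampleRate × channels × bitsPerSample / 8. With the defaults used by arrayBufferToWav in this PR (24000 Hz, mono, 16-bit) that is 48,000 bytes/s, about 2.88 MB per minute plus the 44-byte header, whereas an MP3 at an assumed 64 kbit/s would be 8,000 bytes/s, one sixth of that. A small helper for the estimate (hypothetical names):

```typescript
// Estimated size of a PCM WAV file: 44-byte header plus raw sample data.
export function wavFileBytes(
  seconds: number,
  sampleRate = 24000,
  numOfChannels = 1,
  bitsPerSample = 16,
): number {
  const bytesPerSecond = sampleRate * numOfChannels * (bitsPerSample / 8);
  return 44 + Math.round(seconds * bytesPerSecond);
}

// Size of a constant-bitrate encoding such as MP3, ignoring container overhead.
export function cbrFileBytes(seconds: number, kbps: number): number {
  return Math.round((seconds * kbps * 1000) / 8);
}
```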


6 participants