-
Notifications
You must be signed in to change notification settings - Fork 59.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: cache tts audio 缓存tts语音 #5650
base: main
Are you sure you want to change the base?
Conversation
@Dakai is attempting to deploy a commit to the NextChat Team on Vercel. A member of the Team first needs to authorize it. |
WalkthroughThe pull request introduces several enhancements across multiple files, primarily focusing on the integration of audio message support in the chat application. Key changes include the addition of an optional Changes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Your build has completed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (8)
app/utils/audio.ts (1)
47-89
: Overall implementation looks good, with some suggestions for improvement.The
arrayBufferToWav
function correctly implements the conversion from ArrayBuffer to WAV format. However, there are a few areas where it could be improved:
Consider parameterizing the audio properties (number of channels, sample rate, bits per sample) instead of hard-coding them. This would make the function more flexible for different audio formats.
Add input validation to ensure the provided ArrayBuffer is not empty and has a valid size for audio data.
Here's a suggested refactor to address these points:
export function arrayBufferToWav( buffer: ArrayBuffer, numOfChannels: number = 1, sampleRate: number = 24000, bitsPerSample: number = 16 ): ArrayBuffer { if (buffer.byteLength === 0) { throw new Error("Input buffer is empty"); } // Rest of the function remains the same, using the parameterized values // ... }This change allows for more flexibility in audio formats while maintaining the default values currently used.
app/utils.ts (1)
Line range hint
1-453
: Summary and Next StepsThe changes in this file are minimal but indicate the intention to add audio support, which aligns with the PR objectives. To fully implement the audio caching feature, consider the following next steps:
- Implement the
getMessageAudio
function correctly or remove it if not needed yet.- Enhance existing utility functions (
fetch
,adapter
,safeLocalStorage
) to support audio file caching and retrieval.- Add new utility functions specifically for audio caching if necessary.
- Ensure that all changes are consistent with the existing codebase structure and naming conventions.
- Add appropriate error handling and logging for audio-related operations.
- Update relevant components to use these utility functions for audio playback and caching.
Before proceeding with the implementation, it would be beneficial to create a detailed design document outlining the audio caching strategy, including:
- Data structure for storing cached audio files
- Caching policy (e.g., expiration, size limits)
- Error handling and fallback mechanisms
- Performance considerations for audio file retrieval and playback
This will ensure a more robust and scalable implementation of the audio caching feature.
app/components/chat.module.scss (3)
433-433
: Consider responsive design for message widthThe change from
max-width: 100%
tomax-width: 300px
might improve readability on larger screens, but it could cause issues on smaller devices or with messages containing long words or URLs.Consider using a responsive approach, such as:
max-width: min(300px, 100%);This will ensure the message width is either 300px or 100% of its container, whichever is smaller, providing better adaptability across different screen sizes.
446-448
: Ensure responsiveness for audio messagesThe new
.audio-message
class with a fixedmin-width
of 350px might cause layout issues on smaller screens.Consider using a responsive approach:
.audio-message { min-width: min(350px, 100%); width: 100%; }This will ensure the audio message takes up the full width of its container on smaller screens while maintaining the desired minimum width on larger screens.
530-530
: Clean up commented codeThe
min-width: 350px
property on hover has been commented out. If this change is intentional and final, consider removing the commented line entirely to keep the codebase clean.If you're still considering this feature, it might be helpful to add a comment explaining why it's temporarily disabled or under what conditions it might be re-enabled.
app/components/chat.tsx (3)
Line range hint
1214-1215
: Useconst
instead ofvar
for variable declarationThe use of
var
is outdated in modern TypeScript and can lead to unexpected behavior due to its function scoping. Sinceapi
is not reassigned, consider usingconst
for better readability and to prevent unintended reassignments.Apply this diff to fix the declaration:
- var api: ClientApi; - api = new ClientApi(ModelProvider.GPT); + const api = new ClientApi(ModelProvider.GPT);
Line range hint
1220-1220
: Use ES6import
instead ofrequire
for module importIn TypeScript, it's recommended to use ES6 module syntax (
import
) over CommonJSrequire
. This improves compatibility and readability.Change the
require
statement to animport
:- const { markdownToTxt } = require("markdown-to-txt"); + import { markdownToTxt } from "markdown-to-txt";Note: Ensure that this change is compatible with your build and module resolution settings.
1811-1816
: Check forurl
before updating message audioIf
openaiSpeech
fails and returnsundefined
, theupdateMessageAudio
function is still called with an undefinedurl
, which may not be necessary. Consider adding a check to ensureurl
is defined before updating the message.Here is a suggested change:
onClick={async () => { const url = await openaiSpeech( getMessageTextContent(message), ); + if (url) { updateMessageAudio(message.id, url); + } }}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (2)
app/icons/play.svg
is excluded by!**/*.svg
app/icons/stop.svg
is excluded by!**/*.svg
📒 Files selected for processing (7)
- app/client/api.ts (1 hunks)
- app/components/chat.module.scss (5 hunks)
- app/components/chat.tsx (7 hunks)
- app/styles/globals.scss (1 hunks)
- app/utils.ts (1 hunks)
- app/utils/audio.ts (1 hunks)
- package.json (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- package.json
🧰 Additional context used
🪛 Biome
app/components/chat.tsx
[error] 1127-1130: The assignment should not be in an expression.
The use of assignments in expressions is confusing.
Expressions are often considered as side-effect free.(lint/suspicious/noAssignInExpressions)
🔇 Additional comments (5)
app/utils/audio.ts (1)
Line range hint
1-96
: Overall, the audio utility functions are well-implemented with room for minor improvements.The new
arrayBufferToWav
function correctly implements the conversion from ArrayBuffer to WAV format, and thewriteString
helper function serves its purpose well. The existingcreateTTSPlayer
function remains unchanged and functional.To further enhance this file:
- Consider implementing the suggested parameterization for
arrayBufferToWav
.- Add the recommended bounds checking to
writeString
.- If possible, add unit tests to ensure the correctness of the WAV conversion process.
These changes will improve the flexibility, safety, and reliability of the audio utilities.
app/styles/globals.scss (1)
403-410
: Audio styles align with PR objectives but need minor adjustments.The addition of these audio styles supports the PR's objective of implementing audio message support in the chat application. These styles will help integrate the audio elements visually within the UI, allowing for a consistent look and feel.
However, as mentioned in the previous comment, there are minor issues with browser compatibility and CSS syntax that should be addressed. Once these adjustments are made, the audio styling will be more robust and consistent across different browsers.
app/client/api.ts (1)
42-42
: LGTM! Consider adding documentation and reviewing usage.The addition of the optional
audio_url
property to theRequestMessage
interface is a good approach to support the new TTS caching feature. It's non-breaking and aligns well with the PR objectives.To improve this further:
Consider adding a comment to explain the purpose and expected format of the
audio_url
. For example:/** URL of the cached TTS audio file associated with this message */ audio_url?: string;It's recommended to review other parts of the codebase that create or consume
RequestMessage
objects to ensure they properly handle this new property where applicable.To verify the impact and usage of this new property, you can run the following script:
✅ Verification successful
LGTM! The
audio_url
property is properly implemented and utilized throughout the codebase.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for usages of RequestMessage and potential places to update echo "Searching for RequestMessage usages:" rg --type typescript "RequestMessage" -C 3 echo "\nSearching for places where messages are created or processed:" rg --type typescript "(createMessage|processMessage)" -C 3Length of output: 393
Script:
#!/bin/bash # Search for usages of RequestMessage in TypeScript files echo "Searching for RequestMessage usages:" rg "RequestMessage" -C 3 --glob "*.ts" --glob "*.tsx" echo "\nSearching for places where messages are created or processed:" rg "(createMessage|processMessage)" -C 3 --glob "*.ts" --glob "*.tsx"Length of output: 8071
Script:
#!/bin/bash # Search for usages of audio_url in TypeScript files echo "Searching for 'audio_url' usages in RequestMessage instances:" rg "audio_url" --glob "*.ts" --glob "*.tsx"Length of output: 656
app/utils.ts (1)
Line range hint
1-453
: Consider enhancing utility functions for audio caching supportWhile the PR objectives mention implementing a feature to cache TTS audio files, the current changes in this file are minimal and don't directly support this functionality. Consider the following suggestions to enhance the utility functions for better audio caching support:
- Extend the
fetch
function to handle audio file downloads efficiently.- Modify the
adapter
function to include specific handling for audio-related API calls.- Utilize the
safeLocalStorage
function for storing and retrieving cached audio data.Here are some specific recommendations:
Add a new utility function for audio caching:
export function cacheAudioFile(url: string, audioData: ArrayBuffer): void { const storage = safeLocalStorage(); storage.setItem(`audio_cache_${url}`, arrayBufferToBase64(audioData)); } export function getCachedAudioFile(url: string): ArrayBuffer | null { const storage = safeLocalStorage(); const cachedData = storage.getItem(`audio_cache_${url}`); return cachedData ? base64ToArrayBuffer(cachedData) : null; } // Helper functions for ArrayBuffer <-> Base64 conversion function arrayBufferToBase64(buffer: ArrayBuffer): string { // Implementation here } function base64ToArrayBuffer(base64: string): ArrayBuffer { // Implementation here }Modify the
fetch
function to check for cached audio files:export async function fetch( url: string, options?: Record<string, unknown>, ): Promise<any> { if (options?.method === 'GET' && url.endsWith('.mp3')) { const cachedAudio = getCachedAudioFile(url); if (cachedAudio) { return new Response(cachedAudio, { status: 200 }); } } // Existing fetch logic here }Update the
adapter
function to handle audio caching:export function adapter(config: Record<string, unknown>) { // Existing adapter logic here return fetch(fetchUrl as string, { ...rest, body }).then(async (res) => { const { status, headers, statusText } = res; if (fetchUrl.endsWith('.mp3') && status === 200) { const audioData = await res.arrayBuffer(); cacheAudioFile(fetchUrl, audioData); } return res .text() .then((data: string) => ({ status, statusText, headers, data })); }); }These changes will provide a foundation for implementing the audio caching feature. Remember to update the relevant components that will use these utility functions for audio playback and caching.
To ensure that these changes align with the existing codebase structure and naming conventions, run the following command:
rg --type typescript 'function (fetch|adapter)' app/
This will help us verify the current implementation of
fetch
andadapter
functions across the application.app/components/chat.module.scss (1)
478-481
: LGTM: Audio message item stylingThe new
.chat-message-item-audio
class provides appropriate styling for audio elements within chat messages. The full width and top margin ensure good layout and spacing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
app/utils/audio.ts (1)
47-89
: LGTM! Well-implemented WAV conversion function.The
arrayBufferToWav
function correctly implements the conversion from raw PCM data to a WAV format. The WAV header structure is accurately constructed, and the PCM data is properly copied.Consider the following improvements:
- Add input validation for the
buffer
parameter to ensure it's not null or empty.- Make the audio properties (channels, sample rate, bits per sample) configurable parameters instead of hardcoding them. This would make the function more flexible for different audio formats.
Example implementation:
export function arrayBufferToWav( buffer: ArrayBuffer, { numOfChannels = 1, sampleRate = 24000, bitsPerSample = 16 }: { numOfChannels?: number, sampleRate?: number, bitsPerSample?: number } = {} ): ArrayBuffer { if (!buffer || buffer.byteLength === 0) { throw new Error("Invalid or empty buffer provided"); } // Rest of the function remains the same, using the parameters instead of hardcoded values // ... }This change would allow users to specify different audio formats while keeping the current values as defaults.
app/components/chat.tsx (3)
1124-1130
: Approve changes with a minor optimization suggestion.The
updateMessageAudio
function is well-implemented and correctly updates theaudio_url
property of a specific message. It follows React state update best practices by using the functional update form ofchatStore.updateCurrentSession
.Consider using the
Array.prototype.find()
method instead ofmap()
if you only need to update a single message. This could be slightly more efficient, especially for large message arrays:const updateMessageAudio = (msgId?: string, audio_url?: string) => { chatStore.updateCurrentSession((session) => { - session.messages = session.messages.map((m) => - m.id === msgId ? { ...m, audio_url } : m, - ); + const message = session.messages.find((m) => m.id === msgId); + if (message) { + message.audio_url = audio_url; + } }); };
Line range hint
1208-1253
: Refactor for improved maintainability and error handling.The
openaiSpeech
function has been significantly expanded to handle multiple TTS engines and audio processing. While it's functional, there are several areas for improvement:
- The function is handling too many responsibilities, making it complex and hard to maintain.
- Error handling could be more robust, especially for the audio processing and upload steps.
- The function is modifying global state, which can lead to unexpected behavior.
Consider refactoring this function into smaller, more focused functions:
- Split the TTS engine selection and audio generation into separate functions.
- Create a dedicated function for audio processing and uploading.
- Implement more granular error handling for each step.
- Use a state management solution (e.g., React's useState or useReducer) instead of modifying global variables.
Here's a high-level example of how you might restructure this:
const generateAudio = async (text: string, config: TTSConfig): Promise<ArrayBuffer> => { // Handle TTS engine selection and audio generation }; const processAndUploadAudio = async (audioBuffer: ArrayBuffer): Promise<string> => { // Convert to WAV, upload, and return URL }; const openaiSpeech = async (text: string): Promise<string | undefined> => { try { setSpeechLoading(true); const audioBuffer = await generateAudio(text, config.ttsConfig); const audioUrl = await processAndUploadAudio(audioBuffer); await playAudio(audioBuffer); return audioUrl; } catch (error) { console.error("[Speech Error]", error); showToast(prettyObject(error)); return undefined; } finally { setSpeechLoading(false); } };This structure would make the code more modular, easier to test, and simpler to maintain.
Line range hint
1850-1912
: Approve audio message rendering with a suggestion.The audio message rendering is implemented correctly. It's properly conditional based on the presence of an
audio_url
, and the audio element is well-structured with appropriate attributes.Consider adding an error handler for the audio element to improve user experience:
<audio preload="auto" controls className={styles["chat-message-item-audio"]} + onError={(e) => { + console.error("Audio playback error:", e); + showToast(Locale.Chat.AudioPlaybackError); + }} > <source type="audio/mp3" src={message.audio_url} /> Sorry, your browser does not support HTML5 audio. </audio>This will help catch and handle any audio playback errors, providing a better user experience.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
- app/components/chat.module.scss (4 hunks)
- app/components/chat.tsx (7 hunks)
- app/styles/globals.scss (1 hunks)
- app/utils/audio.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- app/components/chat.module.scss
- app/styles/globals.scss
🧰 Additional context used
🔇 Additional comments (3)
app/utils/audio.ts (2)
92-99
: LGTM! Bounds checking implemented as suggested.The
writeString
helper function is well-implemented and now includes the bounds checking as suggested in the previous review. This addition prevents potential buffer overruns, making the function more robust and safe to use.
Line range hint
1-99
: Summary: Audio utility functions well-implemented, aligning with PR objectives.The changes in this file, particularly the addition of the
arrayBufferToWav
function, directly support the PR's objective of implementing audio caching and playback features. This function will be crucial in converting the TTS audio data to a format that can be easily stored and played back.The implementation is solid, with only minor suggestions for improvement. These changes, combined with the existing
createTTSPlayer
function, provide a robust foundation for the new audio-related features described in the PR objectives.app/components/chat.tsx (1)
1810-1815
: Approve audio playback integration.The audio playback functionality is well-integrated into the existing chat actions. It correctly calls the
openaiSpeech
function and updates the message with the new audio URL.
这个功能设计是可以的 但是有下面几个需求需要改一下:
所以 至少要保证同一条文本 啊 在点击上面那个顶端右最右边那个播放按钮的时候 它的播放跟下面的进度条应该是同步的 而且他们两个的播放状态应该是一致的 20241014-105549.mp4 |
This functional design is possible, but there are several requirements that need to be changed:
20241014-105549.mp4 |
@@ -39,6 +39,7 @@ export interface MultimodalContent { | |||
export interface RequestMessage { | |||
role: MessageRole; | |||
content: string | MultimodalContent[]; | |||
audio_url?: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里需要多测试一下不同的模型。因为不同的client/platform/xxxx.ts里面处理消息的逻辑可能是不一致的。要测一下不同的模型,这里加了一个字段会不会有影响。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
应该去扩展ChatMessage的类型 而不是RequestMessage
RequestMessage是模型参数
audio_url属于扩展属性
const audioFile = new Blob([waveFile], { type: "audio/wav" }); | ||
|
||
const audioUrl: string = await uploadImageRemote(audioFile); | ||
await ttsPlayer.play(audioBuffer, () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
不应该在保存音频之后再播放。这样会增加延迟。体验不好
前面提到的,可能听到两个声音。 是不是可以考虑,这个功能“侵入性”不那么强(也就是说,不展示音频,还是只保留原来的播放按钮。只是,存在audio_url的情况下,直接下载,播放即可,不用再次调用llm)这样也就不存在两个播放声音的问题了。 |
As mentioned earlier, two voices may be heard. Is it possible to consider that this function is less "intrusive" (that is, the audio is not displayed, and the original play button is still retained. However, if audio_url exists, it can be downloaded and played directly without calling llm again. ) In this way, there is no problem of two playing sounds. |
tts以及stt相关功能可能会暂停一下 |
tts and stt related functions may be suspended for a while |
showToast(prettyObject(e)); | ||
try { | ||
const waveFile = arrayBufferToWav(audioBuffer); | ||
const audioFile = new Blob([waveFile], { type: "audio/wav" }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wav格式体积比较大,后面可以尝试增加mp3格式进行保存?
💻 变更类型 | Change Type
🔀 变更说明 | Description of Change
按下 tts 语音播放按钮之后会自动缓存语音文件,在消息框内可以播放、调整音量、播放速度和下载。
📝 补充信息 | Additional Information
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores