add tts #5459

Merged on Sep 18, 2024 (13 commits)

Conversation

@DDMeaqua (Contributor) commented Sep 18, 2024

💻 Change Type

  • feat
  • fix
  • refactor
  • perf
  • style
  • test
  • docs
  • ci
  • chore
  • build

🔀 Description of Change

📝 Additional Information

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced text-to-speech (TTS) functionality, allowing users to listen to chat messages.
    • Added new speech methods across various API classes to support future speech processing capabilities.
    • Enhanced TTS configurations with default settings for voice models and speeds.
    • Implemented new icons for speech actions in the chat component, improving user interaction.
  • Bug Fixes

    • Enhanced error handling for speech requests.

These updates significantly enhance user interaction and accessibility within the application.

vercel bot commented Sep 18, 2024

@DDMeaqua is attempting to deploy a commit to the NextChat Team on Vercel.

A member of the Team first needs to authorize it.


coderabbitai bot commented Sep 18, 2024

Walkthrough

The changes introduce enhancements to the application's text-to-speech (TTS) capabilities across multiple files. New constants, interfaces, and methods are added to support speech processing, including a structured approach for configuring speech options. The chat component integrates TTS functionality, allowing users to listen to messages. Additionally, a new markdown-to-txt dependency is added to convert Markdown message content to plain text.

Changes

  • app/client/api.ts: Added constants for TTS models, defined the SpeechOptions interface, updated LLMApi with a new abstract speech method, and modified the getHeaders function.
  • app/client/platforms/*.ts: Added speech methods to the platform APIs (ClaudeApi, SparkApi, MoonshotApi, and others) as unimplemented placeholders; ChatGPTApi receives a working implementation.
  • app/components/chat.tsx: Introduced TTS functionality with new icons, state management for speech playback, and an openaiSpeech function for handling speech requests.
  • app/constant.ts: Added constants related to TTS, including SpeechPath and various default settings for TTS engines, models, and voices.
  • package.json: Added the markdown-to-txt dependency for converting Markdown content to plain text.

Possibly related PRs

  • fix: safaLocalStorage #5386: This PR introduces a new abstraction for local storage management in app/components/chat.tsx, which aligns with the changes in the main PR that enhance speech processing functionality in the same file.
  • feat: add shortcut key #5396: The addition of the ShortcutKeyModal in app/components/chat.tsx relates to the main PR's updates in the same file, as both involve enhancements to user interaction and functionality within the chat component.
  • feat: summarize model customization #5426: This PR modifies the app/components/chat.tsx to improve chat session summarization, which is relevant to the main PR's focus on enhancing the API's capabilities, particularly in handling speech processing requests.

Suggested reviewers

  • Dogtiti
  • lloydzhou

🐰 In the land of code, where rabbits play,
New voices arise to brighten the day.
With TTS in tow, we hop and we cheer,
Listening to chats, oh so crystal clear!
So let’s celebrate this joyful feat,
For every new change makes our work sweet! 🎉


Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR, between commits a3b6647 and 10d7a64.

Files selected for processing (1)
  • app/client/platforms/openai.ts (2 hunks)
Additional comments not posted (2)
app/client/platforms/openai.ts (2)

145-181: LGTM!

The speech function is a well-structured and reusable method for making speech requests to the OpenAI API. It correctly constructs the request payload, handles timeouts using AbortController, and manages errors appropriately. The use of ArrayBuffer for the response data is suitable for speech data.

Great job on implementing this new capability!
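To make the pattern concrete for readers who have not opened the diff, here is a minimal, self-contained sketch of a speech request in this style. It is not the PR's code: the requestSpeech name, the hard-coded endpoint URL, and the timeout value are assumptions; only the SpeechOptions fields and the AbortController/ArrayBuffer pattern come from this review.

interface SpeechOptions {
  model: string;
  input: string;
  voice: string;
  response_format?: string;
  speed?: number;
}

async function requestSpeech(
  apiKey: string,
  options: SpeechOptions,
  timeoutMs = 60_000,
): Promise<ArrayBuffer> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch("https://api.openai.com/v1/audio/speech", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(options),
      signal: controller.signal,
    });
    if (!res.ok) {
      throw new Error(`speech request failed with status ${res.status}`);
    }
    // The endpoint returns raw audio bytes, so ArrayBuffer is the natural type.
    return await res.arrayBuffer();
  } finally {
    clearTimeout(timer);
  }
}

A caller would build the payload from the user's TTS settings and hand the resulting buffer to an audio player.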


36-36: Remove the unused model parameter.

The model parameter added to the path function is not being used anywhere in the function body. As mentioned in the previous review comments, this parameter seems to serve no purpose.

To improve code clarity and maintainability, consider removing the unused model parameter from the function signature.

Apply this diff to remove the unused parameter:

-path(path: string, model?: string): string {
+path(path: string): string {


Your build has completed!

Preview deployment

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 9

Outside diff range and nitpick comments (7)
app/components/tts.module.scss (1)

1-119: Excellent work on the plugin page styles!

The tts.module.scss file provides a well-structured and comprehensive styling framework for the plugin page layout. The styles are responsive, visually appealing, and enhance the user experience with animations and consistent design patterns.

Some key highlights:

  • The use of flexbox and media queries ensures a responsive layout that adapts to different screen sizes.
  • The BEM-like naming convention for classes promotes modularity and maintainability.
  • The styles for individual plugin items, including borders and border-radius, create a visually pleasing and organized presentation.
  • The integration of animations adds a polished and interactive feel to the user interface.

Overall, the code is clean, well-organized, and follows best practices for SCSS.

Here are a few minor suggestions for further improvement:

  1. Consider adding comments to describe the purpose of each major section or complex styles. This can help with code readability and maintainability.

  2. If the animation styles in animation.scss are specific to the plugin page, consider moving them into this file to keep the styles self-contained.

  3. To improve specificity and avoid potential conflicts with other styles, consider prefixing the class names with a unique identifier for the plugin page, such as .tts-plugin-page, .tts-plugin-item, etc.

  4. If the plugin page layout is used in multiple places, consider extracting the common styles into a separate SCSS file that can be imported and reused.

These are just minor suggestions, and the current implementation is already in great shape. Keep up the excellent work!

app/components/tts-config.tsx (1)

1-133: LGTM! The TTSConfigList component is well-structured and provides a user-friendly way to configure TTS settings.

The component is modular, using separate ListItem components for each setting, and correctly validates the selected options using TTSConfigValidator. The use of Select and InputRange components enhances the user experience.

A few suggestions to consider:

  • Add a brief description or help text for each setting to explain its purpose and impact on the TTS functionality.
  • Implement a preview feature that allows users to test the selected TTS settings with a sample text. This can help users fine-tune the settings to their preference.

app/client/platforms/iflytek.ts (1)

62-64: Approve the method signature, but remind the developer to implement the functionality.

The speech method signature looks good and matches the expected interface for speech processing based on the SpeechOptions parameter and Promise<ArrayBuffer> return type.

However, please note that the method is currently unimplemented and throws an error. Make sure to implement the actual speech processing functionality in the future to complete the feature.

app/client/platforms/alibaba.ts (1)

87-89: Speech method is a placeholder for future implementation.

The speech method is currently a placeholder that throws an unimplemented error. This is acceptable for introducing the method signature as part of the interface.

Consider adding a TODO comment to track the pending implementation. For example:

// TODO: Implement speech method
speech(options: SpeechOptions): Promise<ArrayBuffer> {
  throw new Error("Method not implemented.");
}

Let me know if you would like assistance with implementing the speech functionality or if you have any specific questions!

app/client/platforms/baidu.ts (1)

79-81: New speech method added to support text-to-speech functionality

The addition of the speech method to the ErnieApi class is a step towards supporting the text-to-speech (TTS) feature mentioned in the PR objectives. However, the current implementation throws an error, indicating that the actual functionality is not yet developed.

To facilitate future development and maintain code clarity, consider the following suggestions:

  1. Provide more information in the PR description about the planned implementation of the speech method and how it will integrate with the existing codebase.
  2. Add TODO comments in the method body to outline the expected behavior and any dependencies.
  3. Create a follow-up issue to track the remaining implementation work and ensure it aligns with the overall project goals.

By addressing these points, the PR will provide a clearer roadmap for the TTS feature and make it easier for other contributors to understand and build upon the changes.

app/utils/ms_edge_tts.ts (2)

1-1: Remove the commented out import.

The "axios" import is commented out and not being used in the code. It's a good practice to remove unused imports to keep the code clean and maintainable.

Apply this diff to remove the unused import:

-// import axios from "axios";

251-258: Remove the commented out axios implementation.

The getVoices method has two implementations: one using axios (commented out) and another using the fetch API. Since the axios implementation is not being used, it's recommended to remove it to keep the code clean and maintainable.

Apply this diff to remove the unused axios implementation:

-  // getVoices(): Promise<Voice[]> {
-  //   return new Promise((resolve, reject) => {
-  //     axios
-  //       .get(MsEdgeTTS.VOICES_URL)
-  //       .then((res) => resolve(res.data))
-  //       .catch(reject);
-  //   });
-  // }
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR, between commits b4dc4d3 and 3ae8ec1.

Files ignored due to path filters (4)
  • app/icons/speak-stop.svg is excluded by !**/*.svg
  • app/icons/speak.svg is excluded by !**/*.svg
  • app/icons/voice-white.svg is excluded by !**/*.svg
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
Files selected for processing (24)
  • app/client/api.ts (4 hunks)
  • app/client/platforms/alibaba.ts (2 hunks)
  • app/client/platforms/anthropic.ts (2 hunks)
  • app/client/platforms/baidu.ts (2 hunks)
  • app/client/platforms/bytedance.ts (2 hunks)
  • app/client/platforms/google.ts (2 hunks)
  • app/client/platforms/iflytek.ts (2 hunks)
  • app/client/platforms/moonshot.ts (2 hunks)
  • app/client/platforms/openai.ts (3 hunks)
  • app/client/platforms/tencent.ts (2 hunks)
  • app/components/chat.tsx (8 hunks)
  • app/components/settings.tsx (2 hunks)
  • app/components/tts-config.tsx (1 hunks)
  • app/components/tts.module.scss (1 hunks)
  • app/constant.ts (3 hunks)
  • app/layout.tsx (1 hunks)
  • app/locales/cn.ts (3 hunks)
  • app/locales/en.ts (3 hunks)
  • app/locales/index.ts (1 hunks)
  • app/store/access.ts (2 hunks)
  • app/store/config.ts (3 hunks)
  • app/utils/audio.ts (1 hunks)
  • app/utils/ms_edge_tts.ts (1 hunks)
  • package.json (2 hunks)
Files skipped from review due to trivial changes (2)
  • app/layout.tsx
  • package.json
Additional context used
Biome
app/utils/audio.ts

[error] 3-3: void is confusing inside a union type.

Unsafe fix: Use undefined instead.

(lint/suspicious/noConfusingVoidType)


[error] 16-16: void is confusing inside a union type.

Unsafe fix: Use undefined instead.

(lint/suspicious/noConfusingVoidType)

app/components/tts-config.tsx

[error] 28-28: The assignment should not be in an expression.

The use of assignments in expressions is confusing.
Expressions are often considered as side-effect free.

(lint/suspicious/noAssignInExpressions)


[error] 53-55: The assignment should not be in an expression.

The use of assignments in expressions is confusing.
Expressions are often considered as side-effect free.

(lint/suspicious/noAssignInExpressions)


[error] 74-76: The assignment should not be in an expression.

The use of assignments in expressions is confusing.
Expressions are often considered as side-effect free.

(lint/suspicious/noAssignInExpressions)


[error] 96-98: The assignment should not be in an expression.

The use of assignments in expressions is confusing.
Expressions are often considered as side-effect free.

(lint/suspicious/noAssignInExpressions)


[error] 122-124: The assignment should not be in an expression.

The use of assignments in expressions is confusing.
Expressions are often considered as side-effect free.

(lint/suspicious/noAssignInExpressions)

Gitleaks
app/utils/ms_edge_tts.ts

121-121: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

Additional comments not posted (47)
app/utils/audio.ts (3)

7-14: LGTM!

The createTTSPlayer function and the init method are implemented correctly. Initializing and suspending the AudioContext is a good practice to prevent unexpected audio playback.


32-42: LGTM!

The stop method is implemented correctly. It properly cleans up the AudioBufferSourceNode and the AudioContext, ensuring that resources are released when the audio playback is stopped.


44-45: LGTM!

The createTTSPlayer function correctly returns an object that implements the TTSPlayer interface, providing the necessary methods for initializing, playing, and stopping the TTS audio.
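For context, a minimal sketch of such a player is shown below. It follows the shape described in this review and in the quoted audio.ts snippets later in this thread, but it is not the repository's exact code; note that it types the onended callback as (() => void) | null, which sidesteps the Biome warning discussed further down.

type TTSPlayer = {
  init: () => void;
  play: (audioBuffer: ArrayBuffer, onended: (() => void) | null) => Promise<void>;
  stop: () => void;
};

function createTTSPlayer(): TTSPlayer {
  let audioContext: AudioContext | null = null;
  let sourceNode: AudioBufferSourceNode | null = null;

  const init = () => {
    // Create the context up front and suspend it so nothing plays unexpectedly.
    audioContext = new AudioContext();
    audioContext.suspend();
  };

  const play = async (audioBuffer: ArrayBuffer, onended: (() => void) | null) => {
    // Stop any clip that is still playing before starting a new one.
    if (sourceNode) {
      sourceNode.stop();
      sourceNode.disconnect();
    }
    const buffer = await audioContext!.decodeAudioData(audioBuffer);
    sourceNode = audioContext!.createBufferSource();
    sourceNode.buffer = buffer;
    sourceNode.connect(audioContext!.destination);
    await audioContext!.resume();
    sourceNode.start();
    sourceNode.onended = onended;
  };

  const stop = () => {
    // Release the source node and the context so resources are freed.
    sourceNode?.stop();
    sourceNode?.disconnect();
    audioContext?.close();
    sourceNode = null;
    audioContext = null;
  };

  return { init, play, stop };
}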

app/locales/index.ts (3)

138-138: LGTM!

The constant DEFAULT_STT_LANG is appropriately defined with a default value of "zh-CN" for the speech-to-text language.


139-159: LGTM!

The STT_LANG_MAP constant provides a clean and structured mapping between the application's language codes and their corresponding ISO language strings for speech-to-text functionality. The keys and values are consistent and follow the expected formats.


161-167: LGTM!

The getSTTLang() function provides a clean and robust way to retrieve the appropriate speech-to-text language based on the current language setting. It handles potential errors by falling back to the default STT language, ensuring a consistent behavior.

To verify the behavior of the getSTTLang() function, you can run the following script:

Verification successful

Verification successful: getSTTLang() function is correctly implemented

The getSTTLang() function in app/locales/index.ts is implemented as described in the original review comment. It correctly uses the STT_LANG_MAP and getLang() function to retrieve the appropriate speech-to-text language based on the current language setting. The error handling with a fallback to DEFAULT_STT_LANG ensures robust behavior.

Key observations:

  • STT_LANG_MAP is properly defined as a Record<Lang, string> with appropriate language mappings.
  • getLang() is a core function used throughout the application for language-related operations.
  • The try-catch block in getSTTLang() provides necessary error handling.

The implementation is consistent with the codebase and follows good practices for language handling in the application.
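As a quick illustration of the lookup pattern this describes, here is a simplified sketch. The Lang union and the getLang() stub stand in for the app's own locale code, and only two map entries are shown; the constant names and the try/catch fallback mirror what the review describes.

type Lang = "cn" | "en"; // the real union covers every supported locale

const DEFAULT_STT_LANG = "zh-CN";

const STT_LANG_MAP: Record<Lang, string> = {
  cn: "zh-CN",
  en: "en-US",
};

function getLang(): Lang {
  // Stand-in for the app's real locale detection.
  return "cn";
}

export function getSTTLang(): string {
  try {
    return STT_LANG_MAP[getLang()];
  } catch {
    // Fall back to the default speech-to-text language on any lookup error.
    return DEFAULT_STT_LANG;
  }
}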

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the behavior of the `getSTTLang()` function.

# Test 1: Search for the `getSTTLang()` function usage and analyze its behavior.
# Expect: The function should be used to retrieve the STT language based on the current language.
rg --type typescript -A 5 $'getSTTLang\(\)'

# Test 2: Search for the `STT_LANG_MAP` usage and analyze its integration with the `getSTTLang()` function.
# Expect: The `STT_LANG_MAP` should be used within the `getSTTLang()` function to map the current language to the corresponding STT language.
rg --type typescript -A 5 $'STT_LANG_MAP'

Length of output: 155


Script:

#!/bin/bash
# Description: Verify the behavior of the `getSTTLang()` function and STT_LANG_MAP usage

# Test 1: Search for the `getSTTLang()` function usage
echo "Searching for getSTTLang() function usage:"
fd -e ts -e tsx | xargs rg -n -A 5 'getSTTLang\(\)'

# Test 2: Search for the `STT_LANG_MAP` usage
echo "Searching for STT_LANG_MAP usage:"
fd -e ts -e tsx | xargs rg -n -A 5 'STT_LANG_MAP'

# Test 3: Search for the `getLang()` function usage (related to getSTTLang)
echo "Searching for getLang() function usage:"
fd -e ts -e tsx | xargs rg -n -A 5 'getLang\(\)'

Length of output: 3898

app/store/config.ts (6)

8-13: LGTM!

The imported constants for TTS configuration follow the existing naming convention and are relevant to the TTS feature being added.


20-22: LGTM!

The new type definitions for TTS models, voices, and engines provide type safety and follow the existing naming convention.


80-88: LGTM!

The new ttsConfig property in the DEFAULT_CONFIG object encapsulates all the TTS-related settings with descriptive property names and default values set using the imported constants.


94-94: LGTM!

The new TTSConfig type definition provides type safety for the ttsConfig property and is consistent with the existing type definitions.


109-122: LGTM!

The new TTSConfigValidator object provides validation methods for the TTS configuration properties, ensuring that the values conform to the expected types and constraints. It follows the existing pattern of configuration validation in the file.


123-123: LGTM!

The empty line after the TTSConfigValidator object improves code readability and is consistent with the existing code formatting.
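To visualize what this adds to the config store, here is a hedged sketch. Only the constant names and the validator's existence are taken from this review; the concrete field names, default values, and the speed clamp range are illustrative assumptions, not the repository's definitions.

const DEFAULT_TTS_ENGINE = "OpenAI-TTS";
const DEFAULT_TTS_MODEL = "tts-1"; // assumed default
const DEFAULT_TTS_VOICE = "alloy"; // assumed default

const DEFAULT_CONFIG = {
  // ...other application settings...
  ttsConfig: {
    enable: false,
    autoplay: false,
    engine: DEFAULT_TTS_ENGINE,
    model: DEFAULT_TTS_MODEL,
    voice: DEFAULT_TTS_VOICE,
    speed: 1.0,
  },
};

export type TTSConfig = typeof DEFAULT_CONFIG.ttsConfig;

export const TTSConfigValidator = {
  // Clamp the speed into a plausible range and fall back to defaults for
  // empty strings; the real validators may differ.
  speed(x: number): number {
    return Math.min(4.0, Math.max(0.25, x));
  },
  model(x: string): string {
    return x || DEFAULT_TTS_MODEL;
  },
  voice(x: string): string {
    return x || DEFAULT_TTS_VOICE;
  },
};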

app/store/access.ts (2)

123-125: LGTM! The addition of the edgeTTSVoiceName property enhances TTS customization.

The new edgeTTSVoiceName property in the DEFAULT_ACCESS_STATE object allows the application to specify a custom voice for text-to-speech functionality. The default value "zh-CN-YunxiNeural" suggests that a Chinese neural voice is being used as the default TTS voice, which may be suitable for Chinese-speaking users or content.

This change provides flexibility in customizing the TTS voice based on the application's requirements.


138-142: LGTM! The edgeVoiceName method provides a convenient way to retrieve the configured TTS voice name.

The new edgeVoiceName method in the useAccessStore function allows other parts of the application to fetch and retrieve the configured text-to-speech voice name. It ensures that the latest configuration is fetched from the server by calling this.fetch() before returning the value of edgeTTSVoiceName from the store.

This method can be used to customize the TTS voice based on the retrieved voice name, enhancing the flexibility and usability of the text-to-speech functionality in the application.
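A stripped-down sketch of that accessor pattern, without the store's real zustand wiring, might look like the following; fetch() here is a stand-in for the store's server-config fetch.

const accessStore = {
  edgeTTSVoiceName: "zh-CN-YunxiNeural", // default from DEFAULT_ACCESS_STATE
  // ...other access settings...

  fetch() {
    // In the app this lazily pulls the latest server-side configuration.
  },

  edgeVoiceName(): string {
    this.fetch(); // refresh the configuration before reading the voice name
    return this.edgeTTSVoiceName;
  },
};

// Example: pick the Edge TTS voice when synthesizing with the Edge engine.
const voice = accessStore.edgeVoiceName();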

app/client/platforms/moonshot.ts (1)

29-29: Verify the definition of the SpeechOptions type.

The SpeechOptions type is imported but not defined in this file. Ensure that the type is correctly defined in the ../api module.

Run the following script to verify the type definition:

Verification successful

SpeechOptions type is correctly defined in the API module

The SpeechOptions interface is properly defined in the app/client/api.ts file. It includes the following properties:

  • model: string
  • input: string
  • voice: string
  • response_format?: string (optional)
  • speed?: number (optional)

This definition aligns with the expected location and structure for use in speech synthesis operations.
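For reference, the interface as described here can be reconstructed as follows; the optional onController callback is mentioned in the api.ts review further below, and its exact signature is an assumption based on that abort-handling note.

export interface SpeechOptions {
  model: string;
  input: string; // the text to synthesize
  voice: string;
  response_format?: string; // e.g. an audio container or codec name
  speed?: number;
  onController?: (controller: AbortController) => void; // assumed signature; lets callers abort the request
}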

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the definition of the `SpeechOptions` type.

# Test: Search for the type definition. Expect: Type definition found.
rg --type typescript -A 5 $'type SpeechOptions'

Length of output: 83


Script:

#!/bin/bash
# Description: Search for SpeechOptions type or interface definition in TypeScript files

# Search for type SpeechOptions
echo "Searching for 'type SpeechOptions':"
fd -e ts -e tsx | xargs rg -A 5 'type SpeechOptions'

echo -e "\nSearching for 'interface SpeechOptions':"
fd -e ts -e tsx | xargs rg -A 5 'interface SpeechOptions'

echo -e "\nSearching for SpeechOptions in api directory:"
fd -e ts -e tsx -p api | xargs rg -A 5 'SpeechOptions'

Length of output: 1262

app/client/platforms/iflytek.ts (1)

10-16: LGTM!

The import statement for SpeechOptions is correctly added and grouped with other relevant imports from the "../api" module.

app/client/api.ts (5)

29-29: LGTM!

The addition of the TTSModels constant is a good way to introduce text-to-speech models into the codebase. The use of as const provides type safety.


58-65: LGTM!

The SpeechOptions interface is well-defined and provides a structured approach for configuring speech-related requests. The inclusion of the optional onController function is a good practice for handling abort operations.


101-101: LGTM!

The addition of the speech method to the LLMApi abstract class is a good way to ensure that all implementations of the API provide speech processing functionality. The method signature is well-defined, taking SpeechOptions as input and returning a Promise<ArrayBuffer>, which is suitable for binary data like audio.
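As a shape check, a minimal sketch of that contract and of the placeholder pattern used by the non-OpenAI providers could look like this; the real LLMApi has additional abstract members such as chat that are omitted here, and SpeechOptions is re-declared only to keep the snippet self-contained.

type SpeechOptions = {
  model: string;
  input: string;
  voice: string;
  response_format?: string;
  speed?: number;
};

abstract class LLMApi {
  // Every platform client must now provide a speech implementation.
  abstract speech(options: SpeechOptions): Promise<ArrayBuffer>;
}

class PlaceholderSpeechApi extends LLMApi {
  speech(_options: SpeechOptions): Promise<ArrayBuffer> {
    // Matches the unimplemented placeholders reviewed later in this thread.
    throw new Error("Method not implemented.");
  }
}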


220-229: LGTM!

The modification to the getHeaders function to accept an optional ignoreHeaders parameter is a good way to provide more control over the headers included in requests. This change allows the caller to bypass the default headers if necessary, providing more flexibility in request handling.


223-228: This code segment is a duplicate of the changes reviewed in the previous code segment.

app/client/platforms/google.ts (1)

2-9: LGTM!

The import statements have been updated to include the SpeechOptions type from the ../api module, which is likely used in the implementation of the speech functionality.

app/client/platforms/anthropic.ts (1)

2-8: LGTM!

The import statement has been updated correctly to include SpeechOptions from the ../api module, following the existing import pattern. This change sets the stage for introducing speech-related functionality in the ClaudeApi class.

app/constant.ts (5)

155-155: LGTM!

The addition of the SpeechPath property to the OpenaiPath object is a valid extension to support speech-related functionalities using the OpenAI API.


262-262: Looks good!

The introduction of the DEFAULT_TTS_ENGINE constant with the value "OpenAI-TTS" is a suitable way to specify the default text-to-speech engine for the application.


263-265: Approved!

The introduction of the DEFAULT_TTS_ENGINES, DEFAULT_TTS_MODEL, and DEFAULT_TTS_VOICE constants is a good way to define default settings for TTS engines, models, and voices. This centralized configuration approach enhances maintainability and flexibility.


266-274: Looks good to me!

The introduction of the DEFAULT_TTS_MODELS and DEFAULT_TTS_VOICES constants is a suitable way to define the available options for TTS models and voices. This centralized configuration approach improves maintainability and allows for easy extension of supported models and voices.
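Putting those constants together, a hedged sketch is shown below. Only the names and the "OpenAI-TTS" engine value appear in this review; the path string, the "Edge-TTS" engine label, and the model and voice lists are illustrative values based on OpenAI's public TTS API, not the repository's actual definitions.

export const OpenaiPath = {
  // ...existing chat, image, and usage paths...
  SpeechPath: "v1/audio/speech",
};

export const DEFAULT_TTS_ENGINE = "OpenAI-TTS";
export const DEFAULT_TTS_ENGINES = ["OpenAI-TTS", "Edge-TTS"];
export const DEFAULT_TTS_MODEL = "tts-1";
export const DEFAULT_TTS_VOICE = "alloy";
export const DEFAULT_TTS_MODELS = ["tts-1", "tts-1-hd"];
export const DEFAULT_TTS_VOICES = [
  "alloy",
  "echo",
  "fable",
  "onyx",
  "nova",
  "shimmer",
];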


297-297: Approved!

The addition of the "o1-preview" model to the openaiModels array is a valid way to include support for a new OpenAI model in the application.

app/client/platforms/openai.ts (4)

36-36: LGTM!

The import statement for SpeechOptions is syntactically correct and likely used in the new speech method.


151-187: LGTM!

The speech method implementation looks good:

  • It constructs the request payload correctly based on the provided options.
  • It uses AbortController for proper request timeout management.
  • It makes the POST request to the correct OpenAI speech endpoint.
  • It returns the response as an ArrayBuffer, which is the expected format.
  • It catches and logs any errors that occur during the request.
  • It logs the request payload for debugging purposes.

166-166: LGTM!

The path method is called with the correct arguments:

  • OpenaiPath.SpeechPath is likely a constant representing the speech endpoint path.
  • options.model is passed to construct the URL based on the model being used.

88-88: Verify the path method usage in the codebase.

The method signature change is syntactically correct. However, ensure that all calls to the path method have been updated to match the new signature.

Run the following script to verify the path method usage:

Verification successful

The path method usage in ChatGPTApi is consistent with the new signature.

The verification process confirms that the path method in the ChatGPTApi class (app/client/platforms/openai.ts) has been updated to path(path: string, model?: string): string, and its usage within the class is consistent with this new signature. Other classes have their own implementations of the path method, which is expected behavior for different API clients.

No issues were found regarding the usage of the modified path method in the codebase.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all calls to the `path` method match the new signature.

# Test: Search for the method usage. Expect: No occurrences of the old signature.
rg --type typescript -A 5 $'path\(.*\)'

Length of output: 75


Script:

#!/bin/bash
# Description: Verify all calls to the `path` method match the new signature.

# Search for .ts files
echo "Searching in .ts files:"
fd -e ts -x rg -n -C 3 'path\(' {}

# Search for .js files
echo "Searching in .js files:"
fd -e js -x rg -n -C 3 'path\(' {}

Length of output: 6897

app/locales/cn.ts (5)

49-50: LGTM!

The added localization strings for speech actions look good. They are consistent with the existing format and language.


84-85: LGTM!

The added localization strings for speech commands look good. They are consistent with the existing format and language.


503-522: LGTM!

The added TTS section in the settings looks great. It provides a comprehensive set of options for users to configure text-to-speech functionality. The entries are well-structured and include clear titles and subtitles.


504-507: LGTM!

The Enable entry for enabling text-to-speech service looks good. The title and subtitle provide clear information about the functionality.


508-511: LGTM!

The Autoplay entry for enabling automatic speech playback looks good. The title and subtitle provide clear information about the functionality and its dependency on the TTS switch.

app/utils/ms_edge_tts.ts (2)

8-390: LGTM!

The MsEdgeTTS class and its associated enums and types are well-structured and provide a clean interface for interacting with Microsoft's Edge Text-to-Speech service. The code follows best practices, such as:

  • Using enums for clear parameter specification
  • Defining structured types for voice properties and speech synthesis options
  • Encapsulating WebSocket connection logic within the class
  • Providing a clean public interface for fetching voices, setting metadata, and synthesizing speech
  • Organizing the code in a modular and maintainable manner

Overall, the code changes look good and are ready to be merged.



339-355: LGTM!

The toArrayBuffer method provides a useful utility to obtain the synthesized audio data as an ArrayBuffer. It leverages the existing toStream method and efficiently collects the audio data chunks from the readable stream.

The implementation looks good and follows best practices:

  • It uses the data event to collect the audio data chunks in an array.
  • It concatenates the chunks into a single ArrayBuffer using Buffer.concat once the stream ends.
  • It handles errors by rejecting the promise if an error occurs during streaming.

The code is readable, maintainable, and provides a convenient way to work with the synthesized audio data in the ArrayBuffer format.
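A standalone sketch of that collection pattern (not the class's actual method) is shown below; it assumes the stream yields Node.js Buffer chunks, as the review implies.

import { Readable } from "node:stream";

function streamToArrayBuffer(stream: Readable): Promise<ArrayBuffer> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    stream.on("data", (chunk: Buffer) => chunks.push(chunk));
    stream.on("end", () => {
      const data = Buffer.concat(chunks);
      // Copy out the exact byte range so the result is a plain ArrayBuffer.
      resolve(
        data.buffer.slice(data.byteOffset, data.byteOffset + data.byteLength) as ArrayBuffer,
      );
    });
    stream.on("error", reject);
  });
}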

app/locales/en.ts (2)

50-51: LGTM!

The new entries Speech: "Play" and StopSpeech: "Stop" are appropriately added under the Actions section. The entry names and corresponding values are clear and align with the intended functionality of controlling speech playback.


509-529: LGTM!

The addition of the TTS section under the Settings object is a great improvement for accessibility and user experience. The sub-entries within the TTS section are well-structured and provide clear options for users to configure text-to-speech functionality.

app/components/settings.tsx (1)

1650-1660: LGTM!

The addition of the <TTSConfigList> component and the updateConfig function looks good. It allows users to manage their text-to-speech (TTS) configurations directly from the settings interface.

The updateConfig function correctly creates a copy of the existing ttsConfig, applies the updates, and then updates the config object with the modified ttsConfig.

Overall, this change enhances the functionality and user experience of the settings page.
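The copy-then-swap pattern described here can be sketched in isolation as follows; the TTSConfig shape and the function name are simplified stand-ins, not the component's actual code.

type TTSConfig = {
  enable: boolean;
  autoplay: boolean;
  model: string;
  voice: string;
  speed: number;
};

function updateTTSConfig(
  config: { ttsConfig: TTSConfig },
  updater: (tts: TTSConfig) => void,
): void {
  const ttsConfig = { ...config.ttsConfig }; // work on a shallow copy
  updater(ttsConfig); // let the caller mutate the copy
  config.ttsConfig = ttsConfig; // swap the modified copy back in
}

// Example: toggle autoplay from a settings control.
// updateTTSConfig(appConfig, (tts) => { tts.autoplay = !tts.autoplay; });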

app/components/chat.tsx (5)

1201-1242: LGTM! Nicely implemented text-to-speech functionality.

The openaiSpeech function handles the core TTS functionality effectively. It integrates well with the UI to reflect the speech status and supports fallback to a default TTS engine. Error handling is in place to log and surface any issues during playback.

Suggestions for minor improvements:

  • Add JSDoc comments to document the function parameters and return value.
  • Consider extracting the TTS engine selection logic into a separate function for better readability.
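In the same spirit, here is a hedged sketch of what a handler like this does at a high level. The dependency names (speech, play, setSpeechStatus) are stand-ins rather than the PR's identifiers, and the engine-selection and fallback logic mentioned above is omitted for brevity.

async function speakMessage(
  text: string,
  deps: {
    speech: (options: { model: string; voice: string; input: string }) => Promise<ArrayBuffer>;
    play: (audio: ArrayBuffer, onended: () => void) => Promise<void>;
    setSpeechStatus: (speaking: boolean) => void;
    ttsConfig: { model: string; voice: string };
  },
): Promise<void> {
  deps.setSpeechStatus(true); // reflect playback state in the UI immediately
  try {
    const audio = await deps.speech({
      model: deps.ttsConfig.model,
      voice: deps.ttsConfig.voice,
      input: text,
    });
    await deps.play(audio, () => deps.setSpeechStatus(false));
  } catch (e) {
    console.error("[TTS] playback failed", e);
    deps.setSpeechStatus(false); // reset the UI state on failure
  }
}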

1783-1801: LGTM! The TTS chat action is well-integrated.

This code block effectively integrates the TTS functionality into the chat message actions. It provides a user-friendly way to start and stop speech playback for individual messages. The dynamic rendering based on the speechStatus state ensures that the appropriate action and icon are displayed.

Suggestion for minor improvement:

  • Consider extracting the speechStatus condition into a separate variable for better readability.

Line range hint 1920-1960: LGTM! Image pasting functionality is well-implemented.

The handlePaste function enables users to conveniently paste images directly into the chat input. It integrates seamlessly with the existing attachImages state and ensures that the total number of attached images is limited to 3. The function handles the image upload process asynchronously using the uploadImageRemote function.

Suggestion for improvement:

  • Consider adding error handling for the image upload process to gracefully handle any upload failures and provide appropriate feedback to the user.

Line range hint 1962-1998: LGTM! Image upload functionality is well-implemented.

The uploadImage function provides a user-friendly way to upload images by opening a file input dialog. It allows users to select multiple image files with specific accepted formats. The function integrates seamlessly with the existing attachImages state and ensures that the total number of attached images is limited to 3. It handles the image upload process asynchronously using the uploadImageRemote function.

Suggestion for improvement:

  • Consider adding error handling for the image upload process to gracefully handle any upload failures and provide appropriate feedback to the user.

Line range hint 1920-1960: LGTM! Attached images rendering and deletion functionality is well-implemented.

This code block provides a visual representation of the attached images within the chat input panel. It allows users to see the images they have attached before sending the message. Each attached image is displayed as a div with the image set as the background. The code block includes a delete functionality for each attached image, allowing users to remove individual images from the attachImages array using the DeleteImageButton component.

Suggestion for improvement:

  • Consider adding a confirmation dialog or undo functionality when deleting an attached image to prevent accidental deletions.

Comment on lines +1 to +5
type TTSPlayer = {
  init: () => void;
  play: (audioBuffer: ArrayBuffer, onended: () => void | null) => Promise<void>;
  stop: () => void;
};

Consider using undefined instead of void in the callback function's return type.

The static analysis tool has flagged the usage of void in the play method's callback function as confusing when used in a union type with null. To align with the TypeScript style guide and improve clarity, consider changing void to undefined.

Apply this diff to fix the issue:

-play: (audioBuffer: ArrayBuffer, onended: () => void | null) => Promise<void>;
+play: (audioBuffer: ArrayBuffer, onended: () => undefined | null) => Promise<void>;

Comment on lines +16 to +30
const play = async (audioBuffer: ArrayBuffer, onended: () => void | null) => {
  if (audioBufferSourceNode) {
    audioBufferSourceNode.stop();
    audioBufferSourceNode.disconnect();
  }

  const buffer = await audioContext!.decodeAudioData(audioBuffer);
  audioBufferSourceNode = audioContext!.createBufferSource();
  audioBufferSourceNode.buffer = buffer;
  audioBufferSourceNode.connect(audioContext!.destination);
  audioContext!.resume().then(() => {
    audioBufferSourceNode!.start();
  });
  audioBufferSourceNode.onended = onended;
};

Consider using undefined instead of void in the onended callback function's return type.

The static analysis tool has flagged the usage of void in the onended callback function as confusing when used in a union type with null. To align with the TypeScript style guide and improve clarity, consider changing void to undefined.

Apply this diff to fix the issue:

-const play = async (audioBuffer: ArrayBuffer, onended: () => void | null) => {
+const play = async (audioBuffer: ArrayBuffer, onended: () => undefined | null) => {

Comment on lines +76 to +78
speech(options: SpeechOptions): Promise<ArrayBuffer> {
  throw new Error("Method not implemented.");
}

Complete the implementation of the speech function.

The speech function is currently unimplemented and throws an error. This could lead to runtime exceptions if the function is called.

Consider adding a TODO comment to track the pending implementation:

+  // TODO: Implement speech function
   speech(options: SpeechOptions): Promise<ArrayBuffer> {
     throw new Error("Method not implemented.");
   }

Comment on lines +81 to +83
speech(options: SpeechOptions): Promise<ArrayBuffer> {
  throw new Error("Method not implemented.");
}

Speech method added but not implemented.

The speech method has been added to the DoubaoApi class, suggesting an intention to support speech-related functionality in the future. However, the method is currently not implemented and throws an error.

Consider the following suggestions:

  • Provide a clear timeline for when the speech functionality will be implemented, and update the method accordingly.
  • If the speech functionality is not planned for the near future, consider removing the method until it is fully implemented to avoid confusion and potential errors if the method is called.

Comment on lines +93 to +95
speech(options: SpeechOptions): Promise<ArrayBuffer> {
  throw new Error("Method not implemented.");
}

Implement the speech function or remove the placeholder.

The speech function has been added to the HunyuanApi class, but it is not yet implemented. The function currently throws an error indicating that the method is not implemented.

Please consider the following:

  • Review the SpeechOptions type imported from ../api to understand the expected input for this function.
  • Implement the function logic to handle speech-related operations as intended.
  • If speech functionality is not needed, remove the placeholder function to avoid runtime exceptions.

Comment on lines +66 to +68
speech(options: SpeechOptions): Promise<ArrayBuffer> {
  throw new Error("Method not implemented.");
}

Complete the implementation of the speech method.

The speech method has been added to the GeminiProApi class, but the method body is currently throwing an error indicating that it is not implemented. To enable the speech functionality, please complete the implementation of the method body.

Consider the following:

  • Implement the necessary logic to handle the speech-related functionality based on the provided SpeechOptions.
  • Ensure that the method returns a Promise<ArrayBuffer> as per the method signature.
  • Test the implementation to verify that it works as expected.
  • Update the documentation to reflect the new speech functionality and provide usage instructions.

Comment on lines +89 to +91
speech(options: SpeechOptions): Promise<ArrayBuffer> {
  throw new Error("Method not implemented.");
}

Implement the speech method.

The speech method has been added to the ClaudeApi class, but it is not yet implemented and throws an error. To provide the intended speech functionality, please implement the method body to handle the SpeechOptions parameter and return a Promise<ArrayBuffer>.

Do you want me to generate a sample implementation for the speech method or open a GitHub issue to track this task?


export class MsEdgeTTS {
  static OUTPUT_FORMAT = OUTPUT_FORMAT;
  private static TRUSTED_CLIENT_TOKEN = "6A5AA1D4EAFF4E9FB37E23D68491D6F4";

Security Issue: Avoid hardcoding sensitive information.

The TRUSTED_CLIENT_TOKEN property contains a hardcoded value that appears to be an API key or token. Hardcoding sensitive information directly in the source code is a security risk. If the code is pushed to a public repository or shared with others, the API key would be exposed, potentially allowing unauthorized access to the Text-to-Speech service.

To address this security concern, consider the following:

  1. Remove the hardcoded API key from the source code.
  2. Store the API key securely in an environment variable or a configuration file that is not tracked by version control.
  3. Modify the code to read the API key from the environment variable or configuration file at runtime.

Here's an example of how you can modify the code to read the API key from an environment variable:

private static TRUSTED_CLIENT_TOKEN = process.env.MS_EDGE_TTS_TOKEN;

Make sure to set the MS_EDGE_TTS_TOKEN environment variable with the actual API key value in your runtime environment.

By following this approach, you can keep sensitive information separate from the source code and reduce the risk of accidental exposure.


Comment on lines +85 to +86
StartSpeak: "Start Speak",
StopSpeak: "Stop Speak",

Remove the duplicate entries.

The StartSpeak and StopSpeak entries are redundant with the Speech and StopSpeech entries added earlier under the Actions section. Having duplicate entries for the same functionality can lead to confusion and maintenance issues.

Consider removing these entries:

-    StartSpeak: "Start Speak",
-    StopSpeak: "Stop Speak",

app/client/api.ts: outdated review comment (resolved)
@@ -78,7 +79,7 @@ export interface DalleRequestPayload {
 export class ChatGPTApi implements LLMApi {
   private disableListModels = true;

-  path(path: string): string {
+  path(path: string, model?: string): string {
Member commented:

What is the purpose of the model parameter here? It doesn't seem to be used.

DDMeaqua (Contributor, Author) replied:

It is used in speech below.

@Dogtiti (Member) commented Sep 18, 2024:

It doesn't seem to be used inside the path function itself; I don't see it there.

