Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(tts): Implement naive response_format for tts endpoint #4035

Merged
merged 2 commits into from
Nov 2, 2024

Conversation

n-Arno
Copy link
Contributor

@n-Arno n-Arno commented Nov 2, 2024

Description

This PR fixes #2732

Notes for Reviewers

This is a naive implementation as a starting point / workaround. I coded and use it for Livekit Agent integration, since the default mp3 format is expected from OpenAI plugin. It leverage ffmpeg for conversion of the generated wav file at endpoint level, not backend level.

It is neither the best nor the prettiest but since it works, i contribute :D

Copy link

netlify bot commented Nov 2, 2024

Deploy Preview for localai ready!

Name Link
🔨 Latest commit 6b5dbfd
🔍 Latest deploy log https://app.netlify.com/sites/localai/deploys/67265138ae70380008d8bb31
😎 Deploy Preview https://deploy-preview-4035--localai.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@n-Arno n-Arno changed the title Implement naive response_format for tts endpoint feat: Implement naive response_format for tts endpoint Nov 2, 2024
@n-Arno n-Arno changed the title feat: Implement naive response_format for tts endpoint feat(tts): Implement naive response_format for tts endpoint Nov 2, 2024
@dave-gray101
Copy link
Collaborator

dave-gray101 commented Nov 2, 2024

One potential issue with this:

For license reasons, not all of our images include ffmpeg.

We'll need to verify the error handling works on systems and return an error in that situation?

To be more clear: your function already -has- error handling, I think we just need to make sure the default option is to -not- format, and add some documentation that this option requires ffmpeg

@n-Arno
Copy link
Contributor Author

n-Arno commented Nov 2, 2024

Indeed, i didn't consider the "problem" with ffmpeg licencing since i am rebuilding the image with --build-arg FFMPEG=true and it was done to leverage LocalAI (which is altogether VERY COOL) for an internal demo.

I'll add a quick note and check (like i said, it's a very naive implementation, not something rock solid yet)

@mudler mudler added the enhancement New feature or request label Nov 2, 2024
Copy link
Owner

@mudler mudler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good here, thanks @n-Arno !

@mudler mudler enabled auto-merge (squash) November 2, 2024 17:37
@mudler
Copy link
Owner

mudler commented Nov 2, 2024

To be more clear: your function already -has- error handling, I think we just need to make sure the default option is to -not- format, and add some documentation that this option requires ffmpeg

I think it does already, no ?

https://github.com/mudler/LocalAI/pull/4035/files#diff-09ebe993ca661f1de77ed47e54d46755a5189bde28f94f70f52a095b244c904eR38

If no format is specified, wav is implied, which in turn does skip calling ffmpeg completely

@n-Arno
Copy link
Contributor Author

n-Arno commented Nov 2, 2024

Indeed, if no format is given, wav is used and ffmpeg is not called. I think the idea was to avoid a failure due to its absence if a format is specified.

I am adding a "simple" function like this to test if ffmpeg is ok:

// FFmpegReady tests if ffmpeg is available by trying to print help
func FFmpegReady() bool {
        commandArgs := []string{"-h"}
        _, err := ffmpegCommand(commandArgs)
        return (err == nil)
}

Once the build is done, i'll do a quick test ok and i'll commit this "security" (if i figure how to squash two commits with a merge from master in between :D)

@mudler
Copy link
Owner

mudler commented Nov 2, 2024

Indeed, if no format is given, wav is used and ffmpeg is not called. I think the idea was to avoid a failure due to its absence if a format is specified.

gotcha, yes in this case we should error out in a sane way so the user is aware of the image limitation (no ffmpeg present).

Can be done in a follow-up tho, if tests are passing I'd merge it as is, unless you want to improve it with the error propagation in this PR.

@n-Arno
Copy link
Contributor Author

n-Arno commented Nov 2, 2024

As-Is, the error is propagated correctly, so a merge is possible. It does not fallback silently, but maybe that's is not a desired behaviour.

@mudler mudler merged commit 65c3df3 into mudler:master Nov 2, 2024
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implements response_format to the /v1/audio/speech endpoint
3 participants