Ability to configure TTS engines in the UI #151

Open
2 of 3 tasks
bertfrees opened this issue Jun 27, 2023 · 17 comments · Fixed by #154
Labels
  • engine: Issues that require something to change on the engine side
  • enhancement: New feature or request
  • ready-for-testing: An implementation is ready to be tested

Comments

@bertfrees
Member

bertfrees commented Jun 27, 2023

  • Some engines need to be configured before they can work. Examples are the cloud-based engines from Google and Microsoft. There should be a config panel for entering the available properties for each engine. A nice and simple way to present the settings could be to have one panel per available engine. Each engine would have a status to indicate whether it is working or not. (The GUI can check this by using the API to retrieve voices.) The settings would be hidden for unconfigured or disabled engines. Upon enabling an engine, the user is asked to fill in the required properties. If the properties are not filled in correctly, the engine is not enabled. The user can also disable engines that are configured correctly or that don't need configuration (e.g. the engines native to Windows and macOS).

  • Scripts with TTS support have a "Text-to-speech configuration file" option. For usability it would be nicer if this configuration were integrated into the UI somehow. #154 adds a config panel for selecting preferred voices.

  • A nice addition to the voices config panel could be a way to have a live preview of the voices (blocked on pipeline-modules#89, "Add ability to get a 'preview' of a voice").
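
The first task above can be sketched as a small state model. This is only an illustration under assumptions: the `EngineConfig` class, the `REQUIRED_PROPERTIES` table, the engine names and property keys, and the `fetch_voices` callback are all hypothetical stand-ins (the callback represents whatever API call the GUI uses to retrieve voices), not actual Pipeline names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical table of required properties per engine. The engine names and
# keys are placeholders, not the real Pipeline property names.
REQUIRED_PROPERTIES: Dict[str, List[str]] = {
    "azure": ["key", "region"],
    "google": ["apikey"],
    "system": [],  # e.g. an engine native to the OS: nothing to configure
}


@dataclass
class EngineConfig:
    name: str
    properties: Dict[str, str] = field(default_factory=dict)
    enabled: bool = False

    def missing_properties(self) -> List[str]:
        """Return the required properties that are absent or blank."""
        required = REQUIRED_PROPERTIES.get(self.name, [])
        return [p for p in required if not self.properties.get(p, "").strip()]

    def enable(self) -> bool:
        """Enable the engine only if every required property is filled in."""
        self.enabled = not self.missing_properties()
        return self.enabled

    def disable(self) -> None:
        """Disable the engine, whether or not it is configured correctly."""
        self.enabled = False


def engine_status(engine: EngineConfig,
                  fetch_voices: Callable[[str], List[str]]) -> str:
    """Derive a status by asking for the engine's voices.

    `fetch_voices` stands in for the API call that retrieves voices. It is
    assumed to return an empty list, or raise, when the engine is unusable.
    """
    if not engine.enabled:
        return "disabled"
    try:
        voices = fetch_voices(engine.name)
    except Exception:
        return "not working"
    return "working" if voices else "not working"
```

In this sketch a cloud engine with valid credentials but no network would show up as "not working" rather than "disabled", which keeps the user's configuration separate from the engine's current reachability.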

@ways2read
Member

I guess "the cloud based engines always work" isn't true if there is no internet connection.

@bertfrees
Member Author

bertfrees commented Jul 4, 2023

Some ideas from yesterday's team meeting:

@rdeltour:

One thing that could be nice is a way to have a live preview of the TTS from a config page in the UI.

@bertfrees: (see daisy/pipeline-modules#66)

I want to eliminate the "Text-to-speech configuration file" options, and replace it with the following:

  • a dedicated option to specify CSS style sheets (in addition to the possibility to attach style sheets to the input)
  • a dedicated option to specify lexicons (in addition to the possibility to attach lexicons to the input)
  • dedicated options for certain TTS properties
    • org.daisy.pipeline.tts.log: done
    • org.daisy.pipeline.tts.mp3.bitrate: to do
    • org.daisy.pipeline.tts.lame.cli.options: has been deprecated
  • it should not be possible anymore to set other TTS properties dynamically (per job) (note that org.daisy.pipeline.tts.host.protection has already been deprecated)
  • per-job voice configuration should be replaced by a system wide voice configuration
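
As an illustration of the proposal above, a per-job property filter might look like the following. Only the four property names come from this thread. The `ALLOWED_PER_JOB` and `DEPRECATED` sets, the function name, and the choice to ignore disallowed properties with a warning (rather than fail the job) are assumptions.

```python
from typing import Dict, List, Tuple

# Properties that get a dedicated option and may still be set per job.
ALLOWED_PER_JOB = {
    "org.daisy.pipeline.tts.log",
    "org.daisy.pipeline.tts.mp3.bitrate",
}

# Properties named in this thread as deprecated.
DEPRECATED = {
    "org.daisy.pipeline.tts.lame.cli.options",
    "org.daisy.pipeline.tts.host.protection",
}


def filter_job_properties(
    requested: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Split requested per-job properties into accepted ones and warnings.

    Assumption: anything not explicitly allowed is ignored with a warning.
    """
    accepted: Dict[str, str] = {}
    warnings: List[str] = []
    for key, value in requested.items():
        if key in ALLOWED_PER_JOB:
            accepted[key] = value
        elif key in DEPRECATED:
            warnings.append(f"{key} is deprecated and was ignored")
        else:
            warnings.append(f"{key} cannot be set per job and was ignored")
    return accepted, warnings
```

Everything that is filtered out here would then have to come from the system-wide configuration instead.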

@bertfrees
Member Author

This issue is not quite fixed yet I think. #154 fixes only a part of it, namely selecting preferred voices.

@marisademeglio
Member

"This issue is not quite fixed yet I think. #154 fixes only a part of it, namely selecting preferred voices."

Can you create more issues for what still needs to be done? Or is it more of keeping everyone's ideas around?

@bertfrees
Member Author

bertfrees commented Sep 26, 2023

It is more a collection of ideas. We can create some new issues with concrete things to do.

bertfrees reopened this Sep 27, 2023
@bertfrees
Member Author

On second thought, the first comment in this issue sums it up pretty well I think. Instead of creating a new issue with more or less the same in it, I'm gonna reword this one, and convert it into a list of tasks.

@marisademeglio
Member

Will there be an API to describe engines' properties? Or should I hardcode it based on the engine configuration docs?

Voice preview will come from the API too, right? Is that ready?

@bertfrees
Member Author

I think hardcoding the properties makes the most sense for now. But an API for the engines can definitely be useful too. Let's keep the idea.

Voice preview will come from the API, yes. I'm not sure yet what the API should look like though. I guess it could also be a general purpose "speak" command, that could even accept SSML. That wouldn't be so hard to do. (A while back I already wrote a mock of the Google TTS API that dispatches to the available TTS engines.)
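
To make the general-purpose "speak" idea a bit more concrete, here is a rough sketch of a dispatcher that accepts plain text or SSML and forwards it to whichever engine provides the requested voice. Nothing here reflects an existing API: `SpeakDispatcher`, `to_ssml`, and the `Synthesizer` callback type are hypothetical, and the SSML wrapper omits the version and namespace attributes for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# A synthesizer takes SSML and a voice name and returns audio bytes.
# Hypothetical signature, standing in for a real TTS engine.
Synthesizer = Callable[[str, str], bytes]


def to_ssml(text: str) -> str:
    """Wrap plain text in a minimal <speak> element, unless it already is SSML."""
    if text.lstrip().startswith("<speak"):
        return text
    speak = ET.Element("speak")
    speak.text = text  # ElementTree escapes &, < and > on serialization
    return ET.tostring(speak, encoding="unicode")


class SpeakDispatcher:
    """Routes a speak request to the engine that provides the voice."""

    def __init__(self) -> None:
        self._voices: Dict[str, Synthesizer] = {}  # voice name -> engine

    def register(self, voice: str, synthesize: Synthesizer) -> None:
        self._voices[voice] = synthesize

    def speak(self, text: str, voice: str) -> bytes:
        synthesize = self._voices.get(voice)
        if synthesize is None:
            raise LookupError(f"no available engine provides voice {voice!r}")
        return synthesize(to_ssml(text), voice)
```

A voice preview in the UI could then be a single `speak` call with a short sample sentence and the voice the user is hovering over or has selected.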

marisademeglio added the "enhancement" label Sep 29, 2023
@marisademeglio
Member

Added credential fields for Azure and Google voices in 4d97c87

Verifying the credentials is a new issue: #164

From this convo, we have now implemented all the engine settings for our current goal.

@marisademeglio
Member

Noting here that we also got a feature request from a tester:
"When selecting between voices, offer a preview."

@marisademeglio
Member

Is there anything left in the first task above that is still relevant? We have designed the settings dialog in a different way based on other convos about engine properties that we wanted to support.

And the "voice preview" task is still pending engine implementation.

@bertfrees
Member Author

bertfrees commented Apr 17, 2024

"Is there anything left in the first task above that is still relevant?"

The way the settings dialog looks now is great!

It's very minor, but one thing that would be nice is if the status (connected or disconnected) were somehow made even clearer. I don't know how though.

By the way, this note at the top:

"After configuring these engines with the required credentials, they will be available under 'Voices'. Save and reopen the settings dialog to see changes."

Is it really needed? It seems the voices are updated without closing and reopening the settings.

@marisademeglio
Member

True, that wording can be simplified, as the changes now take effect immediately.

How is this for a slightly clearer connected/disconnected status?

[Screenshot 2024-04-17 at 09 35 56]

@bertfrees
Member Author

Yes, I also thought of emphasizing it visually like that. That is indeed slightly better, however I don't know whether that fundamentally changes anything? 'Cause it will just be decoration, right?

Perhaps the engines could be grouped by connection status? There would be two main headings with the connection status, under which the subheadings "Azure" and "Google" would go. The main headings wouldn't need to be visible for sighted users, the sections could be indicated some other way.

Just thinking out loud. As I said, it is already good the way it is now.

@marisademeglio
Member

Ok I will commit this since it seems better.

We can keep this thread going for ideas. I don't like grouping it by connection status because then doesn't it get reordered when the status changes? That's visually disruptive and probably bad accessibility.
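
One way to picture the alternative to grouping: keep the engines in a fixed order and spell the status out as text next to each name, so a status change never moves anything. The sketch below is only an illustration; `ENGINE_ORDER` and `render_engine_list` are hypothetical names, and the two engine names are taken from the headings mentioned above.

```python
from typing import Dict, List

# Fixed display order, independent of connection status.
ENGINE_ORDER = ["Azure", "Google"]


def render_engine_list(connected: Dict[str, bool]) -> List[str]:
    """Return one line per engine, always in the same order.

    The status is part of the text itself, so it is not purely decorative
    and can be read by assistive technology.
    """
    lines = []
    for name in ENGINE_ORDER:
        status = "connected" if connected.get(name, False) else "disconnected"
        lines.append(f"{name}: {status}")
    return lines
```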

@bertfrees
Member Author

Yes, that's probably true.

marisademeglio added this to the 1.6 milestone Jun 17, 2024
marisademeglio added the "engine" label Jun 24, 2024
marisademeglio added a commit that referenced this issue Sep 16, 2024
marisademeglio added the "ready-for-testing" label Oct 8, 2024
Projects
Status: In Progress

3 participants