No reading tracking with Piper speech synthesis #361

patrick-emmabuntus · 2024-01-26T18:57:36Z

Hello,

I use Calibre 6.13 with ebook-speaker on Debian 12.

The goal is to allow blind people to listen to the content of ebooks.

To get better reading quality, I want to replace eSpeak-ng with Piper. Playback with Piper works well, but unlike eSpeak-ng, it does not send reading-position tracking data back to Calibre (see screenshot below).

In the eSpeak-ng synthesizer engine, the "EspeakIndexing" option is set to 1, which activates word tracking.

This feature is very important: because Calibre follows the voice reading, it can return to where the reader left off when the ebook is reopened.

Do you know if such a function is available in Piper?

And if so, how to activate it?

Thank you in advance for your advice.

SeymourNickelson · 2024-02-07T15:40:35Z

This would be an awesome feature to have. It should be possible to add code that synthesizes each word independently and provides a callback just before each word's audio is played, but I would assume the voice wouldn't sound as realistic, because the model is fed one word at a time.

I wonder if there is another way to highlight words as they are played without impacting the quality of the output.
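The word-by-word idea above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Piper's API: `synthesize`, `play` and `on_word` are hypothetical hooks standing in for "turn text into samples", "send samples to the audio device" and "tell Calibre which word is next". Because each call to `synthesize` sees a single word, the model cannot apply sentence-level prosody, which is the quality loss mentioned above.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a synthesizer maps text to a list of PCM samples.
Synthesizer = Callable[[str], List[int]]


def speak_word_by_word(
    text: str,
    synthesize: Synthesizer,
    play: Callable[[Sequence[int]], None],
    on_word: Callable[[int, str], None],
) -> None:
    """Synthesize each word separately and report it just before playback.

    `synthesize`, `play` and `on_word` are placeholders, not Piper functions.
    """
    for index, word in enumerate(text.split()):
        audio = synthesize(word)  # the model sees only one word: no sentence prosody
        on_word(index, word)      # word-boundary callback, e.g. to update Calibre
        play(audio)               # play this word's audio after reporting it
```

The callback fires before `play`, so the highlight would move at the start of each word rather than after it has been spoken.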

patrick-emmabuntus · 2024-02-11T11:18:53Z

Thank you @SeymourNickelson for your advice.

Indeed, reading the words one by one would degrade the quality of the speech synthesis.

Instead, the text should still be read normally by the speech synthesizer while a reading position is sent to Calibre; a small gap between the word being read and the word displayed in Calibre would be acceptable. The goal is to allow a blind person to return to where they were in the book during the previous reading, instead of having to reread the whole of the last chapter.
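One possible way to get such an approximate position while still synthesizing whole sentences is interpolation: if the duration of a sentence's audio is known, the word being spoken can be estimated from the elapsed playback time, weighting each word by its character count. This is an assumed heuristic, not something Piper or Calibre provides, and `estimate_word_index` is a hypothetical name. The estimate can drift inside a sentence, which matches the "small gap" described above, but it lands on the first and last word at the sentence boundaries.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List


def estimate_word_index(words: List[str], duration_s: float, elapsed_s: float) -> int:
    """Estimate which word of a sentence is being spoken after `elapsed_s` seconds.

    Assumption: speaking time is proportional to word length in characters.
    This is only an approximation, so the highlight may lag or lead slightly.
    """
    if not words:
        raise ValueError("sentence has no words")
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    # Cumulative character counts give each word's end position on a 0..total scale.
    ends = list(accumulate(len(w) for w in words))
    total = ends[-1]
    # Clamp elapsed time to the sentence and map it onto the same 0..total scale.
    fraction = min(max(elapsed_s / duration_s, 0.0), 1.0)
    position = fraction * total
    # The first word whose end lies beyond `position` is the one being spoken.
    return min(bisect_right(ends, position), len(words) - 1)
```

Saving the index of the current sentence plus this estimate would be enough to resume reading near the right place.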

contentnation · 2024-02-22T22:19:55Z

I had time to look into the way Calibre works.
Sadly, I have bad news.
Short version: Calibre uses speech-dispatcher to generate the audio. You can add custom text-to-speech tools (like Piper) to it.
But the highlighting feature requires direct support for Piper in speech-dispatcher to add the necessary "magic",
plus some work on the Piper side for the other part of the magic.

For those who want to continue development, a few notes (or TODOs):
- speech-dispatcher needs marker functionality similar to that in src/modules/espeak.c: as soon as such a marker is received, wait for the audio data and report the marker upstream.
- On the Piper side, the markers need to be used to split the input; when a marker is reached, Piper should send the timestamp and the audio data generated up to that point.
- Currently, the generic output module always filters out those markers before the text is sent to Piper (or any other external TTS).
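The Piper-side part of these notes could look roughly like this sketch. It assumes the markers arrive as SSML `<mark name="..."/>` elements (the form Speech Dispatcher uses for index marks) and that the output rate is 22050 Hz (typical for Piper voices; the real value should be read from the voice's config). `synthesize` and `emit` are hypothetical hooks, not existing Piper or speech-dispatcher functions: the text is split at each mark, each segment is synthesized, and when a mark is reached the audio up to that point is emitted together with the mark name and its timestamp.

```python
import re
from typing import Callable, List, Optional, Tuple

# Matches SSML index marks such as <mark name="w12"/> (assumed input format).
MARK_RE = re.compile(r'<mark\s+name="([^"]*)"\s*/>')

SAMPLE_RATE = 22050  # assumption: use the actual sample rate of the loaded voice


def split_at_marks(text: str) -> List[Tuple[str, Optional[str]]]:
    """Split text into (segment, mark_reached_after_segment) pairs."""
    # With one capturing group, re.split alternates: text, mark, text, ..., text.
    parts = MARK_RE.split(text)
    pairs: List[Tuple[str, Optional[str]]] = []
    for i in range(0, len(parts), 2):
        mark = parts[i + 1] if i + 1 < len(parts) else None
        pairs.append((parts[i], mark))
    return pairs


def synthesize_with_marks(
    text: str,
    synthesize: Callable[[str], List[int]],
    emit: Callable[[List[int], Optional[str], float], None],
    sample_rate: int = SAMPLE_RATE,
) -> None:
    """Synthesize segment by segment and emit (audio, mark, timestamp_s).

    The timestamp is the total playback time up to and including the segment.
    The final segment is emitted with mark=None.
    """
    samples_so_far = 0
    for segment, mark in split_at_marks(text):
        audio = synthesize(segment) if segment.strip() else []
        samples_so_far += len(audio)
        emit(audio, mark, samples_so_far / sample_rate)
```

If a mark is placed before every word, this degenerates into the word-by-word approach discussed earlier and likely loses prosody; with marks only at sentence boundaries the quality loss should be smaller, at the cost of coarser tracking.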

patrick-emmabuntus · 2024-02-24T10:40:58Z

Thank you very much @contentnation for your advice.

omega3 mentioned this issue Feb 19, 2024: Feature Request - Add Timestamp Functionality #364 (Open)

contentnation mentioned this issue Feb 23, 2024: support for alignment output in tsv format #407 (Open)

zenny mentioned this issue Aug 20, 2024: [Feature Request] not sending "word completion data" to Calibre #578 (Open)