PicoTTS offline Text-to-Speech support #841
Comments
Thanks a lot for this. In the future I plan to switch to speech-dispatcher (which has a Pico module, I think). And more importantly, Foliate needs to properly parse and extract the contents of the book, including any SSML markup (though I don't know how many books actually include those). See #829. Then we can also keep our own set of default pronunciation tweaks if no pronunciation info is included in the book.
You’re welcome! Trying spd-say, I couldn’t find a PicoTTS option, but I didn’t really look very closely. The standard spd voices sound rather robotic on my system. SSML might be a nice option, though I think almost no one uses it. As you see in my script, PicoTTS also has scripting options (which I use heavily in my home automation). Too bad they were bought and put in the drawer… they had great voices, back in the day.

Should you switch over to spd, or something else, please don’t remove the scripting possibility! There’s still much to be gained when writing some adaptations (just check the sound variation and some pronunciation help I add by "brute-force" sed). This gives Foliate a real advantage. (I’m also using Calibre’s reader, which can only handle PicoTTS unmodified, and it’s much worse.)

Is there a reason why there are no line breaks (for paragraph separation, which PicoTTS uses), and why en dashes, em dashes and ellipses are changed back to their ASCII equivalents? And why so many semicolons are added? I had to undo all of these again to make it pronounce better.
As I mentioned, Foliate currently does not parse and extract content properly. By "not properly" I mean that it uses […]. Another problem is that it speaks each page separately so that Foliate can turn to the next page when it finishes speaking. This approach obviously has many problems. So this is mainly what I want to change. For example, it could process the document and insert line breaks at block element boundaries. That would be much better than […].

Speech-dispatcher is unrelated to all the issues above. I want to switch to it for different reasons. The first is that I do not want to reinvent the wheel. Currently Foliate is already sort of a very poor man's speech-dispatcher. It has the advantage of having a much, much simpler interface, but it lacks features such as selecting a different voice, speed, etc. The second is security. In a sandbox environment, ideally you don't want to allow Foliate to run arbitrary commands outside the sandbox. Speech-dispatcher is itself configurable and extensible, so there should be no significant loss of customizability if we limit access to only speech-dispatcher in the sandbox. The last reason is that it is already used by many other apps such as Firefox or Chromium. So in a sense it might make things easier for users (no need to configure different apps separately).

But really, Foliate should not even care or know about TTS programs. Ideally it should just use the SpeechSynthesis Web API. It would help make Foliate's code more reusable and portable, as the Web API can be run on any browser on any platform. Unfortunately that's not supported by WebKitGTK, which is ideally where all this TTS code should live, and where it would also benefit other WebKitGTK apps like Epiphany. So that is why I wrote in the other issue that while it would use speech-dispatcher, we should still use the SpeechSynthesis API and only defer to speech-dispatcher under the hood.
I do understand the value in that, but really it's more of a by-product of the fact that TTS support in Foliate is extremely barebones. You can even abuse the TTS command to launch other non-TTS programs, for example. But that's not really how it's meant to be used. Design-wise, this is no different from injecting userstyles or userscripts to modify the content of the book. So ideally, if this kind of scripting is to be supported by Foliate, it should be done properly with a proper plugin or userscript API. Also it could be argued that for forcing a certain pronunciation, one should be able to configure it in the TTS program, rather than doing it specifically for Foliate (provided that the content extraction issues mentioned above are fixed in Foliate).
All your points are valuable, and correct. Let’s see how it eventually evolves, looking forward to it! And yes, of course I’m brute-forcing a lot here, because TTS on Linux is still not too great, and we sadly won’t get any more development on PicoTTS.
Thank you for Foliate and the script which works perfectly with Foliate !! Windows
Thanks a lot!
@Lume6: It may be possible I didn’t check out all languages, resulting in the file […]

    export FOLIATE_TTS_LANG_LOWER='fr'; echo "J'utilise Windows 10." | foliate-picotts

If you get something like an […], change this part in the script to have sox commands for all languages as follows:

    # use sox to make output better understandable (voices are rather muffled)
    # adding some treble in the range of +3 to +6 dB helps
    # some voices might need a little bass reduction, use s/th like "bass -6 400"
    # to avoid clipping, give headroom (gain -h) and reclaim afterwards (gain -r)
    case "${FOLIATE_TTS_LANG_LOWER:0:2}" in
      "de")
        sox /tmp/foliate.wav /tmp/foliate-sox.wav gain -h treble +6 gain -r
        ;;
      "en")
        sox /tmp/foliate.wav /tmp/foliate-sox.wav gain -h treble +3 gain -r
        ;;
      "fr")
        sox /tmp/foliate.wav /tmp/foliate-sox.wav gain -h treble +3 gain -r
        ;;
      "it")
        sox /tmp/foliate.wav /tmp/foliate-sox.wav gain -h treble +3 gain -r
        ;;
      "es")
        sox /tmp/foliate.wav /tmp/foliate-sox.wav gain -h treble +3 gain -r
        ;;
      *)
        cp /tmp/foliate.wav /tmp/foliate-sox.wav
        ;;
    esac

(You can adjust the […].)

For "Windows" (the operating system), it might even be better to use X-SAMPA phonemes, which PicoTTS supports. Try something like:

      "fr")
        lang="fr-FR"
        # "Windows" (the operating system)
        text=$(echo "$text" | sed 's|\bwindows\b|<phoneme ph=\"win.doz\"/>|gI')
        ;;

Sounds like: […]

Happy experimenting!
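The pronunciation substitution above can be tried in isolation, without PicoTTS or sox installed. This is only a minimal sketch, assuming GNU sed (the `\b` word boundary and the `I` case-insensitive flag are GNU extensions); the function name `fix_pronunciation_fr` is hypothetical and not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): replace the word
# "Windows" with the X-SAMPA phoneme tag suggested in the comment above.
# Assumes GNU sed: \b and the I flag are not available in every sed.
fix_pronunciation_fr() {
    sed 's|\bwindows\b|<phoneme ph="win.doz"/>|gI'
}

# Show the text as it would be handed on to the TTS engine.
echo "Je préfère Linux à Windows." | fix_pronunciation_fr
```

Because the pattern is anchored on word boundaries, a word such as "windowsill" is left untouched.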
Updated version of the script: […]

Try:

    export FOLIATE_TTS_LANG_LOWER='fr'; echo "Je préfère Linux à Windows." | foliate-picotts

;-)
Meanwhile, it should be possible to use gTTS, but BTW, it should be mentioned that this is an ISO 639-1 language code, not a three-letter ISO 639-3 code (as used, e.g., by Tesseract).
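To make the distinction concrete: a two-letter ISO 639-1 code such as `fr` corresponds to a three-letter code such as `fra`. The sketch below covers only the five languages PicoTTS supports; the function name `iso639_1_to_3` is hypothetical and does not appear in any script in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a two-letter ISO 639-1 code to the matching
# three-letter ISO 639-3 code, for the five PicoTTS languages only.
iso639_1_to_3() {
    local code
    # Lowercase the input and keep its first two characters, so that
    # locale-style values such as "fr-FR" are accepted as well.
    code=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | cut -c1-2)
    case "$code" in
        en) echo "eng" ;;
        de) echo "deu" ;;
        fr) echo "fra" ;;
        it) echo "ita" ;;
        es) echo "spa" ;;
        *)  echo "unknown language code: $1" >&2; return 1 ;;
    esac
}

# Usage: convert the value Foliate would pass for a French book.
iso639_1_to_3 "fr-FR"
```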
Hi!
As you notice, it stops line with the […]
Good evening,
The GTK 4 version now uses speech-dispatcher exclusively. Probably one can still add back the scripting ability. But it should have a better interface that works similarly to how it currently works with speech-dispatcher:
Is your feature request related to a problem? Please describe.
I wanted to add offline TTS (Text-to-Speech), and I’m not happy with eSpeak or Festival, but use PicoTTS for many other things already (it supports EN, DE, FR, IT, ES).
Describe the solution you'd like
Assuming that most modern Linux systems already have sox and use PulseAudio, I wrote a little output script to be used with Foliate. Just copy it into your ~/bin folder or another appropriate location and make it executable (chmod +x foliate-picotts).
Describe alternatives you've considered
eSpeak, gTTS
Additional context
Here is my script – feel free to include it with your software and/or website!
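The installation step described above (copy the script into `~/bin` and make it executable) can be sketched as follows. This is not the script itself: the function name `install_tts_script` is hypothetical, and the defaults assume the script was saved as `foliate-picotts` in the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a script into a bin directory and make it
# executable. Defaults follow the description above (~/bin, foliate-picotts).
install_tts_script() {
    local src="${1:-foliate-picotts}"
    local dest_dir="${2:-$HOME/bin}"

    if [ ! -f "$src" ]; then
        echo "script not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/" || return 1
    chmod +x "$dest_dir/$(basename "$src")"
}

# Usage (assumes foliate-picotts is in the current directory):
#   install_tts_script foliate-picotts "$HOME/bin"
```

The chosen directory needs to be on your `PATH` for the command to be found by name.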