mbrola languages (Windows support) #2

Open
klvbdmh opened this issue Feb 2, 2017 · 19 comments

@klvbdmh
Contributor

klvbdmh commented Feb 2, 2017

Since mbrola and espeak both provide Windows binaries, I'd like to give voxpopuli a shot and see if I can make it work on Windows.

However, the mbrola project website seems to be unavailable, and I can't download either the binary or the language files. Is there any mirror that hosts those files?

@hadware
Owner

hadware commented Feb 2, 2017

Ooh, my first contributor. First, I really appreciate your help. However, I've always programmed Python on Linux, so for me, porting to Windows is dark magic.

I can try to set up a Python dev environment on Windows to help you, but I really don't know how the subprocess library behaves on Windows, nor how the PyAudio sound backend does, so those two things should be considered.
Otherwise, since voxpopuli is basically a wrapper for espeak and mbrola I/O, everything should be the same.

EDIT: mbrola's website seems to be down...

@hadware
Owner

hadware commented Feb 3, 2017

Here are the windows binaries, btw: http://tcts.fpms.ac.be/synthesis/mbrola/bin/pcwin/MbrolaTools35.exe

@klvbdmh
Contributor Author

klvbdmh commented Feb 6, 2017

Oh thanks! Yeah, the site was down but it looks like it works now. I'll download the binaries and language files and I'll give it a go later (and of course I'll report my findings).

@hadware
Owner

hadware commented Feb 6, 2017

Nice. I'll set up a python dev env to test out your discoveries :)

@klvbdmh
Contributor Author

klvbdmh commented Feb 7, 2017

Ok, the first obvious problem is that the mbrola_voices_folder variable in the Voice class is hardcoded to /usr/share/mbrola. A folder like this doesn't exist on Windows. In fact, there's no default location for voice databases and it's left up to the user. So anyone who wants to use voxpopuli on Windows needs to provide their own path. I've done it with

from voxpopuli import Voice
# Raw string, so the backslashes are not treated as escape sequences.
Voice.mbrola_voices_folder = r'D:\Downloads\mbrola-voices'

The next problem (kind of a recurring theme) is that the private methods in the Voice class related to mbrola and espeak have hardcoded strings. Since espeak and mbrola aren't likely to be included in the Windows PATH, we need to spell out the whole paths to the executables. I think falling back to the default paths created during installation (C:\Program Files (x86)\Mbrola Tools and C:\Program Files (x86)\eSpeak\command_line\espeak) is a sensible approach.
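A minimal sketch of that lookup, not the project's actual code: try the PATH first, then fall back to a known install location. The function name `find_binary`, the `WINDOWS_DEFAULTS` table and the `.exe` suffix are assumptions made for illustration; only the eSpeak directory comes from the comment above.

```python
import os
import shutil

# Install location quoted in the comment above; the ".exe" suffix is an assumption.
WINDOWS_DEFAULTS = {
    "espeak": r"C:\Program Files (x86)\eSpeak\command_line\espeak.exe",
}


def find_binary(name, defaults=WINDOWS_DEFAULTS):
    """Return a usable path for `name`, preferring whatever is on the PATH."""
    found = shutil.which(name)
    if found:
        return found
    # Not on the PATH: try the default install location, if one is known.
    fallback = defaults.get(name)
    if fallback and os.path.isfile(fallback):
        return fallback
    raise FileNotFoundError(
        "%s was not found on the PATH or in a default location" % name
    )
```

Raising early with a clear message is a design choice here: it avoids a confusing subprocess failure later when the binary is missing.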

Another Windows-specific issue is that mbrola.exe doesn't exit. Mbrola Tools have a bunch of executables - two of them are GUIs which are not that interesting. phoplayer.exe is a command line client and I was able to successfully generate a .wav file from a .pho file. As far as I tested, espeak works without any problems.

I've noticed that most of the run commands are formatted as strings, which makes them less flexible and harder to maintain. I think using an argument list adds clarity and makes it easier to include platform-specific settings like MALLOC_CHECK (btw a comment in the source code explaining why it's required would be great).
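To illustrate the argument-list idea, here is a hedged sketch; `run` and `platform_env` are hypothetical helpers, not voxpopuli functions. Passing a list to `subprocess.run` means paths with spaces need no manual quoting, and platform-specific environment variables can be merged in one place. The `MALLOC_CHECK_=0` value mirrors the setting discussed in this thread and is only applied on Linux.

```python
import os
import subprocess
import sys


def platform_env(platform=sys.platform):
    """Extra environment variables for the current platform (illustrative)."""
    # MALLOC_CHECK_ is a glibc setting, so it is only added on Linux.
    if platform.startswith("linux"):
        return {"MALLOC_CHECK_": "0"}
    return {}


def run(args, stdin_data=b"", platform=sys.platform):
    """Run a command given as a list of arguments and return its stdout."""
    env = dict(os.environ)
    env.update(platform_env(platform))
    result = subprocess.run(
        args,
        input=stdin_data,
        stdout=subprocess.PIPE,
        env=env,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

With a list, an argument such as a path under `Program Files` is passed to the child process as a single argument, without any shell quoting rules getting involved.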

I've also noticed that a functionality of _str_to_audio method is duplicated - it's basically the same as _str_to_phonems and _phonems_to_audio. Can we combine those two in Python instead of piping the outputs?

Anyway, I managed to fork the project locally and played with it a bit. So far I got TestStrToPhonems tests to pass (except test_german, for some reason de1 mbrola voice file is not detected - but de2 is). Can you check my fork and see if there's no regression on Linux?

@hadware
Owner

hadware commented Feb 7, 2017

Aw, well, I was a bit too tired and I merged your fork without looking at this very exhaustive message. I'm sorry for that 🤕 . It doesn't matter much anyway, since I haven't pushed the current version to the PyPI repo. I'm sorry, it's my first package, and I'm a bit clumsy with these things.

I'm currently fixing some regressions that have been introduced, but nothing bad, don't worry.

Concerning the problems linked to the windows version of mbrola, I truly have no clue about what to do. I should be able to set up a python windows env by the end of the week to see what I can do.

> I've also noticed that a functionality of _str_to_audio method is duplicated - it's basically the same as _str_to_phonems and _phonems_to_audio. Can we combine those two in Python instead of piping the outputs?

Oh, that's on purpose. It's a small optimisation. Since, most of the time, people don't care about phonemes and just want some audio out of some text (I have some very strong statistics to support this claim!), I figured it's more efficient to have this pipeline:

python -> espeak -> mbrola -> python

instead of this pipeline (which is used when editing phonemes):

python -> espeak -> python -> mbrola -> python

It avoids parsing espeak's output into phoneme objects only to immediately serialize them back to a string. Hope this makes sense to you.
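The direct pipeline can be sketched with two chained `subprocess.Popen` calls, where the first process writes straight into the second and Python only reads the final output. This is an illustration of the technique, not the library's code: `pipe_through` is a hypothetical helper, and the actual espeak and mbrola argument lists are deliberately left to the caller.

```python
import subprocess


def pipe_through(first_args, second_args):
    """Run `first_args | second_args` and return the second process's stdout."""
    first = subprocess.Popen(first_args, stdout=subprocess.PIPE)
    second = subprocess.Popen(
        second_args, stdin=first.stdout, stdout=subprocess.PIPE
    )
    # Close our copy of the pipe so `first` gets a broken pipe
    # if `second` exits before reading everything.
    first.stdout.close()
    output, _ = second.communicate()
    first.wait()
    return output
```

In the second pipeline, Python would instead read the first command's output, parse and possibly edit it, and then feed the re-serialized text to the second command.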

@klvbdmh
Contributor Author

klvbdmh commented Feb 8, 2017

It totally makes sense. At the same time it looks like premature optimization to me. How big are the gains exactly? Also it leads to code duplication - we have big chunks of code doing exactly the same thing.

In other news, I got a working .wav file with voice.to_audio()! I had to add .wav to the stdout parameter. Otherwise the produced wav files were unplayable. However, I noticed this warning in mbrola readme:

Never use .wav format when you pipe the ouput (mbrola can't rewind the
file to write the audio size in the header). Wav format was not
developped for Unix (on the contrary Au format let you specify in the
header "we're on a pipe, read until end of file").

Can you check if it's going to be a problem?

Also, I can't get de1 database to work on my system no matter what, so I can't complete two tests. I even downloaded the Ubuntu package and extracted voice files from it (they were the same).

@hadware
Owner

hadware commented Feb 9, 2017

I have no idea about the de1 database either. Maybe try getting a trace of the error somehow (although I don't think mbrola has a --verbose option).

Regarding the .wav stuff, I have to admit I copied someone else's code to "package" mbrola's output into a "real" wav file (cf. the _wav_format method). My reasoning was that, even if the .wav format is old and comes from Microsoft, it works perfectly on Linux, and since it's uncompressed it leaves the choice of final encoding to the user. Moreover, wav is widely used (.au and .aiff are mostly used by audio professionals, if I'm not mistaken). Finally, we've been using mbrola's output (as wav) in this exact format very intensively for about a year on https://loult.family/, and it has proved reliable.

Your concern is a valid one though.

We could, however, add an output_format option that defaults to wav.

PS: it'd be good though if we could comment that mysterious line from _wav_format so regular people understand it. I'll probably investigate.
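For readers wondering what such a repackaging step might look like, here is a sketch under stated assumptions; it is not a copy of `_wav_format`. It assumes the piped stream starts with the canonical 44-byte PCM header (RIFF size at bytes 4-8, data chunk size at bytes 40-44), whose size fields could not be filled in because a pipe cannot be rewound. `fix_wav_sizes` is a hypothetical name.

```python
import struct


def fix_wav_sizes(data):
    """Rewrite the two size fields of a canonical 44-byte-header WAV stream."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE" or data[36:40] != b"data":
        raise ValueError("not a WAV stream with a canonical 44-byte header")
    # RIFF chunk size: everything after the 8-byte "RIFF" + size prefix.
    riff_size = struct.pack("<I", len(data) - 8)
    # data chunk size: everything after the 44-byte header.
    data_size = struct.pack("<I", len(data) - 44)
    return data[:4] + riff_size + data[8:40] + data_size + data[44:]
```

Both sizes are little-endian unsigned 32-bit integers, hence the `"<I"` format string.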

@klvbdmh
Contributor Author

klvbdmh commented Feb 14, 2017

I can't even run it from the command line. It simply says Failed to read voice 'mb-de1'.

I see. So there's no problem with using wav as default.

Fully agree on commenting the _wav_format. Perhaps WAVE PCM soundfile format could be a good start.

@hadware
Owner

hadware commented Feb 14, 2017

Nice, I'm going to dig into it to figure out how that bytes-packing sorcery works. It's the kind of stuff I tend to like.

Regarding the wav discussion, I think keeping only wav is a good option for now, but if someone asks for more formats, we could add those. In the meantime, I'm probably going to add some examples of how to use that .wav bytes object to do other kinds of things, a bit like in the README.md.

Regarding the mb-de1 problem, is the error coming from espeak or from mbrola?

Thanks again for your help.

EDIT: I figured out how the bytes-packing sorcery works and, thanks to the text you quoted from the mbrola readme, why the author of that code wrote it that way. I'll add comments to _wav_format as soon as possible.

@klvbdmh
Contributor Author

klvbdmh commented Feb 14, 2017

Good call on the examples. And it's great that you figured out the bytes method.

The error is coming from espeak.

@hadware
Owner

hadware commented Feb 14, 2017

Hm. I'll check again on linux this evening, see if it's related to windows.

@klvbdmh
Contributor Author

klvbdmh commented Feb 15, 2017

Ok, we're almost there. There's one more problem with running tests on Windows. For some reason I still fail all ToAudio tests. They all raise AssertionError when comparing bytes of .wav files. Example:

Failure
Traceback (most recent call last):
  File "D:\lib\env\py35\lib\unittest\case.py", line 58, in testPartExecutor
    yield
  File "D:\lib\env\py35\lib\unittest\case.py", line 600, in run
    testMethod()
  File "D:\Dev\voicesynth\tests\tests.py", line 65, in test_en
    self.assertEqual(wavfile.read(), wav_byte)
  File "D:\lib\env\py35\lib\unittest\case.py", line 820, in assertEqual
    assertion_func(first, second, msg=msg)
  File "D:\lib\env\py35\lib\unittest\case.py", line 813, in _baseAssertEqual
    raise self.failureException(msg)
AssertionError: b'RIF[6565 chars]03\x9d\x03a\x03\x06\x03$\x03G\x03\x14\x04\xe5\[156127 chars]\x00' != b'RIF[6565 chars]03\x9e\x03a\x03\x06\x03$\x03G\x03\x14\x04\xe5\[156127 chars]\x00'
Failure
Traceback (most recent call last):
  File "D:\lib\env\py35\lib\unittest\case.py", line 58, in testPartExecutor
    yield
  File "D:\lib\env\py35\lib\unittest\case.py", line 600, in run
    testMethod()
  File "D:\Dev\voicesynth\tests\tests.py", line 39, in test_salut
    self.assertEqual(wavfile.read(), wav_byte)
  File "D:\lib\env\py35\lib\unittest\case.py", line 820, in assertEqual
    assertion_func(first, second, msg=msg)
  File "D:\lib\env\py35\lib\unittest\case.py", line 813, in _baseAssertEqual
    raise self.failureException(msg)
AssertionError: b'RIFF2\xad\x00\x00WAVEfmt \x10\x00\x00\x00\x01[159028 chars]\x00' != b'RIFF\x00}\x00\x00WAVEfmt \x10\x00\x00\x00\x01[114443 chars]\x00'
-------------------- >> begin captured stdout << ---------------------
'Salut les amis'

--------------------- >> end captured stdout << ----------------------

Interestingly, voice.say("Salut les amis") plays the sentence without any problems.
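In the first traceback the two byte strings differ by a single sample value, while in the second the lengths themselves differ. One looser alternative to byte-for-byte equality, sketched here as an assumption and not as the project's test code, is to compare the audio parameters exactly and the frame count within a tolerance. The helper names and the 5% default tolerance are made up for illustration.

```python
import io
import wave


def wav_summary(data):
    """Return (channels, sample width, frame rate, frame count) of WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return (
            wav.getnchannels(),
            wav.getsampwidth(),
            wav.getframerate(),
            wav.getnframes(),
        )


def wavs_roughly_equal(first, second, duration_tolerance=0.05):
    """True if both files share a format and have a similar duration."""
    ch_a, width_a, rate_a, frames_a = wav_summary(first)
    ch_b, width_b, rate_b, frames_b = wav_summary(second)
    if (ch_a, width_a, rate_a) != (ch_b, width_b, rate_b):
        return False
    longest = max(frames_a, frames_b, 1)  # avoid dividing by zero
    return abs(frames_a - frames_b) / longest <= duration_tolerance
```

A check like this would tolerate small numerical differences between platforms, but it would still flag a case like test_salut, where the two outputs differ substantially in length.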

@hadware
Owner

hadware commented Feb 16, 2017

I think I know what causes these errors: it probably has to do with the quoting you introduced to fix your previous error on Windows. I'll have to test this at home this evening.

Btw, I had some problems installing PyAudio on Travis, but it's fixed. Once the unit tests are OK on Ubuntu 14.04, we should be good to go for a 0.2 release. \o/

@klvbdmh
Contributor Author

klvbdmh commented Feb 16, 2017

Awesome news about travis!

The error happened before I made those changes too (and I confirmed it by reverting to an old version on my local repo).

I added a default voice folder location on Windows (it will be empty at first, of course).

Also, I figured out the cause of my mb-de1 problem. My espeak installation didn't have mb-de1 file in eSpeak\espeak-data\voices\mb. Simply copying and renaming mb-de2 solved it. It's very peculiar; I checked the latest source version (1.48.04) from SourceForge and it is indeed missing from there too. Do you have that file on your Linux installation?
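A quick way to spot this kind of mismatch is to check which `mb-*` voice definitions are present in the espeak data directory. The sketch below assumes the `voices/mb/mb-<name>` layout described in this comment; `missing_mbrola_voice_files` is a hypothetical helper and the data directory must be supplied by the caller.

```python
from pathlib import Path


def missing_mbrola_voice_files(espeak_data_dir, voice_names):
    """List the voices that have no mb-<name> file under voices/mb."""
    mb_dir = Path(espeak_data_dir) / "voices" / "mb"
    return [
        name for name in voice_names
        if not (mb_dir / ("mb-" + name)).is_file()
    ]
```

For example, calling it with `["de1", "de2"]` on an installation like the one described above would be expected to report `de1` as missing.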

@hadware
Owner

hadware commented Feb 21, 2017

Sorry, I was away for some time. I checked my installation on an Ubuntu 14.04 machine, and it did not have the de2 mbrola voice file. You can see all of those in this listing:
http://packages.ubuntu.com/trusty/amd64/espeak-data/filelist (directory /voice/mb/).

This is probably a problem to be solved using the self.sex variable, which I should probably rename to self.espeak_voice_id. I also more or less copied the weird logic that handles espeak voices from some other code, and it's a bit tricky to understand what happens there. Maybe this should also be clarified.

PS: I've just noticed that, in theory, it's possible to make espeak say stuff in Greek with a German accent: /usr/lib/x86_64-linux-gnu/espeak-data/voices/mb/mb-de6-grc . How ironic is that?

@klvbdmh
Contributor Author

klvbdmh commented Feb 22, 2017

Then how are you able to pass the tests with de1 voice?

@Rachine

Rachine commented Nov 7, 2018

> Ok, the first obvious problem is that the mbrola_voices_folder variable in the Voice class is hardcoded to /usr/share/mbrola. A folder like this doesn't exist on Windows. In fact, there's no default location for voice databases and it's left up to the user. So anyone who wants to use voxpopuli on Windows needs to provide their own path. I've done it with
>
> from voxpopuli import Voice
> Voice.mbrola_voices_folder = r'D:\Downloads\mbrola-voices'
>
> The next problem (kind of a recurring theme) is that the private methods in the Voice class related to mbrola and espeak have hardcoded strings. Since espeak and mbrola aren't likely to be included in the Windows PATH, we need to spell out the whole paths to the executables. I think falling back to the default paths created during installation (C:\Program Files (x86)\Mbrola Tools and C:\Program Files (x86)\eSpeak\command_line\espeak) is a sensible approach.
>
> Another Windows-specific issue is that mbrola.exe doesn't exit. Mbrola Tools have a bunch of executables - two of them are GUIs which are not that interesting. phoplayer.exe is a command line client and I was able to successfully generate a .wav file from a .pho file. As far as I tested, espeak works without any problems.
>
> I've noticed that most of the run commands are formatted as strings, which makes them less flexible and harder to maintain. I think using an argument list adds clarity and makes it easier to include platform-specific settings like MALLOC_CHECK (btw a comment in the source code explaining why it's required would be great).
>
> I've also noticed that a functionality of _str_to_audio method is duplicated - it's basically the same as _str_to_phonems and _phonems_to_audio. Can we combine those two in Python instead of piping the outputs?
>
> Anyway, I managed to fork the project locally and played with it a bit. So far I got TestStrToPhonems tests to pass (except test_german, for some reason de1 mbrola voice file is not detected - but de2 is). Can you check my fork and see if there's no regression on Linux?

mbrola_voices_folder should be customizable: without sudo rights, ordinary users cannot install voices into the system folder, so they cannot use any voices, even on Linux.
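One possible shape for such a setting, offered as a sketch and not as an existing voxpopuli feature: an explicit argument wins, then an environment variable, then the system folder if it exists, then a per-user folder. The variable name `MBROLA_VOICES_FOLDER` and the function `resolve_voices_folder` are hypothetical; the two default locations are the ones mentioned in this thread.

```python
import os

SYSTEM_VOICES = "/usr/share/mbrola"  # system-wide default mentioned in the thread
USER_VOICES = os.path.expanduser(os.path.join("~", ".mbrola"))
ENV_VAR = "MBROLA_VOICES_FOLDER"  # hypothetical variable name


def resolve_voices_folder(explicit=None, environ=None):
    """Pick a voices folder without requiring write access to system paths."""
    environ = os.environ if environ is None else environ
    if explicit:
        return explicit
    if environ.get(ENV_VAR):
        return environ[ENV_VAR]
    if os.path.isdir(SYSTEM_VOICES):
        return SYSTEM_VOICES
    # Per-user fallback: writable without sudo rights.
    return USER_VOICES
```

A user without sudo rights could then unpack voices anywhere in their home directory and point the variable at that folder.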

@PierreOrhan

PierreOrhan commented Dec 26, 2023

For anyone looking for Windows support in 2023: you can install mbrola and espeak through WSL (Windows Subsystem for Linux), the Linux environment provided by Microsoft. In my setup I run Python from the Windows side and simply call the Linux mbrola and espeak. (I guess one could also do everything inside WSL, which would require no change to this software.)

Effectively, one just needs to change a few lines of code in this library:

    if platform in ('linux', 'darwin'):
        espeak_binary = 'espeak'
        mbrola_binary = 'mbrola'
        mbrola_voices_folder = "/usr/share/mbrola"
    elif platform == 'win32':
        # If the path has spaces it needs to be enclosed in double quotes.
        espeak_binary = '"C:\\Program Files (x86)\\eSpeak\\command_line\\espeak"'
        mbrola_binary = '"C:\\Program Files (x86)\\Mbrola Tools\\mbrola"'
        mbrola_voices_folder = os.path.expanduser('~\\.mbrola\\')
        if not os.path.exists(mbrola_voices_folder):
            os.makedirs(mbrola_voices_folder)
        if not os.path.exists(espeak_binary):
            espeak_binary = "wsl espeak"
            mbrola_binary = "wsl MALLOC_CHECK_=0 mbrola"
            mbrola_voices_folder = "/usr/share/mbrola"
        # if (Path(self.mbrola_voices_folder)
        #     / Path(voice_name)
        #     / Path(voice_name)).is_file():
        #     self.lang = lang
        #     self.voice_id = voice_id
        # else:
        #     raise self.InvalidVoiceParameters(
        #         "Voice %s not found. Check language and voice id, or install "
        #         "by running 'sudo apt install mbrola-%s'. On Windows download "
        #         "voices from https://github.com/numediart/MBROLA-voices"
        #         % (voice_name, voice_name))
        self.lang = lang
        self.voice_id = voice_id
    def _phonemes_to_audio(self, phonemes: PhonemeList) -> bytes:
        # voice_path_template = ('%s/%s%d/%s%d'
        #                        if platform in ("linux", "darwin")
        #                        else '%s\\%s%d\\%s%d')
        voice_path_template = "%s/%s%d/%s%d"
        voice_phonemic_db = (voice_path_template
                             % (self.mbrola_voices_folder, self.lang,
                                self.voice_id, self.lang, self.voice_id))

I commented out the path check and switched the path template to the Linux form because I am not sure how to properly handle path formatting toward the WSL filesystem in my situation.
So it's a quick and dirty fix!
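If the voices lived on the Windows side instead, one way to hand their location to a binary running under WSL would be to translate the drive path into its mounted form. The sketch below assumes WSL's default automount root of `/mnt` (it can be configured differently) and uses a hypothetical helper name, `windows_to_wsl_path`; it is pure string manipulation and does not call WSL.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl_path(path, mount_root="/mnt"):
    """Translate an absolute drive path like C:\\a\\b into /mnt/c/a/b."""
    win = PureWindowsPath(path)
    # Only plain drive-letter paths ("C:") are handled, not UNC shares
    # or relative paths.
    if len(win.drive) != 2 or not win.is_absolute():
        raise ValueError("expected an absolute drive path, got %r" % path)
    drive_letter = win.drive[0].lower()
    # parts[0] is the drive anchor ("C:\\"); the rest are path components.
    return str(PurePosixPath(mount_root, drive_letter, *win.parts[1:]))
```

Using the pure path classes keeps the separators correct regardless of which side Python itself runs on.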
