This issue was moved to a discussion.

Ability to Read Longer Audio (ie Audiobooks) #54

Closed
fakerybakery opened this issue Nov 22, 2023 · 10 comments
Labels
enhancement New feature or request

Comments

@fakerybakery
Contributor

fakerybakery commented Nov 22, 2023

Hi,
Might it be possible to implement a tqdm progress bar for longer text? This would make it possible to easily narrate entire audiobooks!
Thanks!
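
A minimal sketch of what such a progress bar could look like, assuming the text is synthesized one sentence at a time. `synthesize` is a hypothetical stand-in for the model's inference call, not the project's actual API, and the sentence splitter is deliberately naive. `tqdm` is a third-party package; the sketch falls back to a plain loop if it is not installed.

```python
import re

try:
    from tqdm import tqdm  # third-party: pip install tqdm
except ImportError:
    def tqdm(iterable, **kwargs):
        # Fallback when tqdm is missing: iterate without a progress bar.
        return iterable


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def synthesize(sentence):
    """Hypothetical stand-in for the model's inference call.

    Returns fake audio: one silent sample per input character.
    """
    return [0.0] * len(sentence)


def narrate(text):
    """Synthesize text sentence by sentence, showing progress as it goes."""
    audio = []
    for sentence in tqdm(split_sentences(text), desc="Narrating", unit="sent"):
        audio.extend(synthesize(sentence))
    return audio
```

Calling `narrate(long_text)` would then advance the bar once per sentence, which is usually enough feedback for book-length input.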

@yl4579
Owner

yl4579 commented Nov 22, 2023

I think so far it's not very good at narrating an entire audiobook, because the training data isn't entire audiobooks. The training data consists purely of independent clips taken from amateur audiobook readings. It won't be like ElevenLabs, which is trained on professional audiobook datasets; such data is usually not in the public domain. However, if we did have the data, the model could easily be changed to train on it by conditioning on the previous style when sampling the current style. This would probably reproduce the effect of ElevenLabs, especially for dialogue.

The closest dataset to an entire audiobook is LJSpeech, but it is entirely non-fiction, so it won't be good for fiction reading (there is no dialogue), and it might produce unnatural intonation because each clip was treated independently during training.

@fakerybakery
Contributor Author

Hmm. Thanks. LibriVox seems like a good place to get public domain audiobooks. Are there any plans to add this capability in the future?

@yl4579
Owner

yl4579 commented Nov 22, 2023

LibriTTS is already taken from LibriVox, but for some reason it isn't complete audiobook narration, only very fragmented clips taken from complete narrations. I don't know why they removed so many clips.

@fakerybakery
Contributor Author

fakerybakery commented Nov 22, 2023

I feel like the quality would be lower if you trained it on an entire audiobook, right? I don't know, I guess it just feels like the longer the samples are the worse it will be (I might be wrong). Maybe we can use Tortoise TTS's splitting script with this?

However, if it's possible to train a TTS model on long text without degrading quality, it shouldn't be too hard to write a script to scrape LibriVox based on readers (they have an API). I was able to make this dataset a while back using their API, but I didn't include readers at that time.

@fakerybakery fakerybakery changed the title from "Progress bar for long text" to "Ability to Read Longer Audio (ie Audiobooks)" Nov 22, 2023
@yl4579
Owner

yl4579 commented Nov 22, 2023

No, we do have to train on audio clips, but the idea is to condition the current style sampling on the previous text and style, so the output will be more continuous and possibly handle dialogue better (if the audio clips are split according to dialogue). It won't work if we train on entire audiobooks, because we don't have enough RAM.
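
To make the idea of carrying style from one clip to the next concrete, here is a rough sketch. It is an inference-time approximation (blending each freshly sampled style vector with the previous one), not the training-time conditioning described above. `sample_style`, the vector size, and the blending weight `t` are all hypothetical names and values chosen for illustration.

```python
import random


def sample_style(text, dim=4, rng=random):
    """Hypothetical stand-in for the model's style sampler.

    Ignores the text and returns a random style vector.
    """
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def blend(prev, cur, t):
    """Linear interpolation: t=0 keeps cur, t=1 keeps prev."""
    return [t * p + (1.0 - t) * c for p, c in zip(prev, cur)]


def styles_for_clips(clips, t=0.7, seed=0):
    """Sample one style per clip, pulling each toward the previous style."""
    rng = random.Random(seed)
    styles = []
    prev = None
    for clip in clips:
        cur = sample_style(clip, rng=rng)
        if prev is not None:
            # Carry over part of the previous style for continuity.
            cur = blend(prev, cur, t)
        styles.append(cur)
        prev = cur
    return styles
```

A higher `t` keeps the narration style steadier across clips; a lower `t` lets each clip vary more freely.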

@fakerybakery
Contributor Author

Hmm interesting! Are you planning to implement something like this in the future?

@yl4579
Owner

yl4579 commented Nov 22, 2023

Yeah, probably, but I don't think it'll be that simple. If the effort is more than trivial concatenation, it could be a different project or paper, but for now the difference probably won't be big enough on the LibriTTS dataset because there is no dialogue. It would be more useful if we could get some fiction audiobook datasets that are separated by character.

@fakerybakery
Contributor Author

Hmm. Hypothetically, if there was a long audiobook dataset available, how difficult do you think it would be to implement?

@fakerybakery
Contributor Author

fakerybakery commented Nov 23, 2023

I implemented a basic long-text reader on the online demo by splitting text, but it isn't perfect yet. (update: I removed it because someone said it made it harder to clone with Docker)

@MariasStory

MariasStory commented Nov 23, 2023

> I implemented a basic long-text reader on the online demo by splitting text, but it isn't perfect yet. (update: I removed it because someone said it made it harder to clone with Docker)

I am fine with removing the long-text option, because I think it should be the default behavior for every task.
In my view, long text can and should be split and processed automatically.
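
As a sketch of what automatic splitting might look like, the function below packs whole sentences into chunks under a character limit. The limit of 200 and the function name are assumptions for illustration, not values taken from the demo.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized on its own and the resulting audio concatenated, so the caller never has to choose a separate long-text mode.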

@yl4579 yl4579 added the discussion and enhancement (New feature or request) labels and removed the discussion label Nov 24, 2023
Repository owner locked and limited conversation to collaborators Nov 25, 2023
@yl4579 yl4579 converted this issue into discussion #83 Nov 25, 2023

3 participants