Ability to Read Longer Audio (ie Audiobooks) #54

fakerybakery · 2023-11-22T01:56:34Z

Hi,
Might it be possible to implement a tqdm progress bar for longer text? This would make it possible to easily narrate entire audiobooks!
Thanks!
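
A rough sketch of what such a progress bar could look like, assuming the long text has already been split into chunks. `narrate` and `synthesize` are hypothetical names, not part of this repository; `synthesize` stands in for whatever inference call turns one chunk of text into audio samples.

```python
from typing import Callable, List

try:
    from tqdm import tqdm  # third-party progress bar
except ImportError:
    # Fall back to a plain pass-through if tqdm is not installed.
    def tqdm(iterable, **kwargs):
        return iterable


def narrate(chunks: List[str], synthesize: Callable[[str], List[float]]) -> List[float]:
    """Synthesize each chunk in order and concatenate the samples.

    `synthesize` is a hypothetical stand-in for the model's inference call.
    """
    audio: List[float] = []
    for chunk in tqdm(chunks, desc="Narrating", unit="chunk"):
        audio.extend(synthesize(chunk))
    return audio
```

The bar advances once per chunk, so its granularity depends on how the text was split.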

yl4579 · 2023-11-22T03:38:46Z

I think so far it’s not very good at narrating an entire audiobook because the training data isn’t entire audiobooks. The training data consists purely of independent clips taken from amateur audiobook readings, rather than an entire audiobook. It won’t be like ElevenLabs, which is trained on professional audiobook datasets, as such data is usually not in the public domain. However, if we do have the data, it can easily be changed to train on this sort of data by conditioning on the previous style to sample the current style. This would probably reproduce the effect of ElevenLabs, especially for dialogue.

The closest dataset to an entire audiobook is LJSpeech, but again it’s completely non-fiction, so it won’t be good for any fiction reading (no dialogue), and it might produce unnatural intonations because each clip was treated independently during training.

fakerybakery · 2023-11-22T03:46:35Z

Hmm. Thanks. LibriVox seems like a good place to get public domain audiobooks. Are there any plans to add this capability in the future?

yl4579 · 2023-11-22T03:51:19Z

LibriTTS is already taken from LibriVox, but for some reason it isn’t complete audiobook narrations, just very fragmented clips taken from complete audiobook narrations. I don’t know why they removed a lot of clips.

fakerybakery · 2023-11-22T16:40:28Z

I feel like the quality would be lower if you trained it on an entire audiobook, right? I don't know, I guess it just feels like the longer the samples are the worse it will be (I might be wrong). Maybe we can use Tortoise TTS's splitting script with this?

However, if it's possible to train a TTS model on long text without degrading quality, it shouldn't be too hard to write a script to scrape LibriVox based on readers (they have an API). I was able to make this dataset a while back using their API, but I didn't include readers at that time.

yl4579 · 2023-11-22T17:16:10Z

No, we do have to train on audio clips, but the idea is that we condition the current style sampling on the previous text and style, so it will be more continuous and possibly also handle dialogue better (if the audio clips are split according to dialogue). It won’t work if we train on entire audiobooks because we don’t have enough RAM.
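
To make the idea of carrying style across clips concrete, here is a minimal sketch of an inference-time stand-in: each chunk's freshly sampled style vector is blended with the previous chunk's style by exponential smoothing. This is not the training-time conditioning described above, only an illustration of the same intuition. `sample_style`, `blend_styles`, `styles_for_chunks`, and the `alpha` weight are all hypothetical names and values.

```python
from typing import Callable, List, Optional, Sequence


def blend_styles(prev: Sequence[float], new: Sequence[float], alpha: float) -> List[float]:
    """Weighted average: alpha keeps the previous style, (1 - alpha) takes the new one."""
    return [alpha * p + (1.0 - alpha) * n for p, n in zip(prev, new)]


def styles_for_chunks(
    chunks: Sequence[str],
    sample_style: Callable[[str], Sequence[float]],
    alpha: float = 0.7,
) -> List[List[float]]:
    """Return one style vector per chunk, each smoothed toward the previous one.

    `sample_style` is a hypothetical stand-in for sampling a style from text.
    """
    styles: List[List[float]] = []
    prev: Optional[List[float]] = None
    for chunk in chunks:
        new = list(sample_style(chunk))
        # The first chunk has no history, so its sampled style is used as-is.
        current = new if prev is None else blend_styles(prev, new, alpha)
        styles.append(current)
        prev = current
    return styles
```

A larger `alpha` keeps the voice more consistent between chunks, at the cost of reacting more slowly to changes such as a switch between narration and dialogue.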

fakerybakery · 2023-11-22T17:21:16Z

Hmm interesting! Are you planning to implement something like this in the future?

yl4579 · 2023-11-22T20:21:48Z

Yeah, probably, but I don't think it'll be that simple. If the effort is more than trivial concatenation, it could be a different project or paper, but for now the difference probably won't be big enough on the LibriTTS dataset because there is no dialogue. It would be more useful if we could get some fiction audiobook datasets that are separated by character.

fakerybakery · 2023-11-22T23:18:47Z

Hmm. Hypothetically, if there was a long audiobook dataset available, how difficult do you think it would be to implement?

fakerybakery · 2023-11-23T00:10:11Z

I implemented a basic long-text reader on the online demo by splitting text, but it isn't perfect yet. (update: I removed it because someone said it made it harder to clone with Docker)

MariasStory · 2023-11-23T13:34:03Z

> I implemented a basic long-text reader on the online demo by splitting text, but it isn't perfect yet. (update: I removed it because someone said it made it harder to clone with Docker)

I am fine with removing the long-text option, because I think it should be the default behavior for every task.
In my view, long text can and should be split and processed automatically.
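
As a rough illustration of automatic splitting, the sketch below breaks text at sentence-ending punctuation and packs sentences into chunks under a character budget. It is not the splitter used in the demo; `split_text` and the `max_chars` default are hypothetical, and the regex is a simple heuristic that will mis-split abbreviations such as "Dr.".

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 300) -> List[str]:
    """Split text at sentence boundaries and pack sentences into chunks.

    Chunks stay within max_chars where possible; a single sentence longer
    than max_chars is kept whole rather than cut mid-sentence.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

For example, `split_text("One. Two! Three?", max_chars=9)` returns `["One. Two!", "Three?"]`.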

fakerybakery changed the title ~~Progress bar for long text~~ Ability to Read Longer Audio (ie Audiobooks) Nov 22, 2023

yl4579 added the discussion and enhancement (New feature or request) labels and removed the discussion label Nov 24, 2023

fakerybakery mentioned this issue Nov 24, 2023 in "Add importables + Gradio GUI #78" (closed)

Repository owner locked and limited conversation to collaborators Nov 25, 2023

yl4579 converted this issue into discussion #83 Nov 25, 2023

This issue was moved to a discussion.
