
Resolve issue #23 #26

Closed
wants to merge 2 commits into from

go-dockly

This PR makes astideepspeech compatible with Coqui's speech-to-text library, or DeepSpeech 1.0.

The linked issue #23 can be closed.

@asticode
Owner

There's something I don't understand: why would we remove all references to Mozilla's Deepspeech project? Has the project been terminated? If so, could you link some articles about it? And why Coqui?

@go-dockly
Author

go-dockly commented Oct 24, 2021

Hi, that's a good point.
What initially got me into this was issue 3693.
There has been some activity on it since yesterday, which is new to me. I'm not sure what is going to come out of this. Nevertheless, I wanted to try Coqui after reading "Best EVER 🐸 STT English model". The accuracy of the model trained on Common Voice v7 feels improved, while the model being loaded is much smaller.

@bobkleiner

The accuracy trained on common voice v7 is really improved while loading a much smaller model.

@go-dockly Did you actually try it? This new model is just 1% better on real-life data and extremely slow (10 times slower than realtime).

@go-dockly
Author

go-dockly commented Oct 25, 2021

Yes, I tried it. How else would I have created the PR?
The smaller model got subjectively better at understanding me, yes.
Not perfect but definitely improved and running on gpu not too slow.
If it was faster and more accurate I'd be delighted. Give or take another year I guess.
I certainly like to try out the contributions you mentioned :)

@bobkleiner

Not perfect but definitely improved and running on gpu not too slow.

Haha, GPU is the case here. On CPU you have to wait for ages.

Did you ever try https://gitlab.com/Jaco-Assistant/Scribosermo ? It is much more accurate.

@bobkleiner

I certainly like to try out the contributions you mentioned :)

We are also uncertain on this just like you, waiting for Mozilla decision and very confused

@reuben

reuben commented Oct 25, 2021

We initially released unquantized versions of the acoustic model with the 1.0.0 release. This has since been fixed; performance should be equivalent to or better than 0.9.3. If you're seeing something different I'd love to hear about it!

@reuben

reuben commented Oct 25, 2021

There's something I don't understand: why would we remove all references to Mozilla's Deepspeech project? Has the project been terminated? If so, could you link some articles about it? And why Coqui?

Mozilla's lack of clear communication on this is above my pay grade, but as is clearly visible from the activity in the repository, the DeepSpeech project is no longer maintained. I'm one of the main authors of DeepSpeech and am a co-founder at Coqui. We continue to build upon the project in the new repo, https://github.com/coqui-ai/STT, together with most of the community, who are now hanging out in our Gitter room instead of Matrix, and discussions are now happening on GitHub instead of Discourse.

@go-dockly
Author

@reuben thanks for the insight and the continued improvements on DeepSpeech!

@go-dockly
Author

I certainly like to try out the contributions you mentioned :)

We are also uncertain on this just like you, waiting for Mozilla decision and very confused

@bobkleiner I am curious what would stop you guys from publishing those contributions in a separate repo?

@go-dockly
Author

go-dockly commented Oct 25, 2021

Not perfect but definitely improved and running on gpu not too slow.

Haha, GPU is the case here. On CPU you have to wait for ages.

Did you ever try https://gitlab.com/Jaco-Assistant/Scribosermo ? It is much more accurate.

@bobkleiner Thanks for sharing, I will give it a try.

@go-dockly
Author

go-dockly commented Oct 26, 2021

Recognition gets much better if you speak like the snippets in Common Voice.
I often contribute voice samples to that project and know that many sentence snippets relate to people and places.
So phrases like "he lived in iowa" are well understood, whereas terms like "alpha echo bravo charlie" or "punct comma enter" are just not in its grasp. It would also be beneficial if the source of those sentences in the Common Voice app were more casual, for instance by using subtitle snippets from movies.

@reuben do you consider changing the sentence source in the Common Voice app realistically doable? What is the future of the Common Voice project? Do you think Mozilla might shut it down?

@reuben

reuben commented Oct 26, 2021

I no longer have any influence on the Common Voice project, so I can't comment on their project direction. It's also possible that the language model text source in our release models doesn't include such phrases as "alpha echo bravo charlie" and so on. Experimenting on the text side of things is cheaper and faster than augmenting the labeled speech data and re-training or fine-tuning the acoustic model, so I encourage trying that first. And please report results! :D

@@ -1,4 +1,4 @@
-.DS_Store
+.STT_Store

Spurious replacement, .DS_Store is a macOS system file.

@@ -1,62 +1,58 @@
[![GoReportCard](http://goreportcard.com/badge/github.com/asticode/go-astideepspeech)](http://goreportcard.com/report/github.com/asticode/go-astideepspeech)
[![GoDoc](https://godoc.org/github.com/asticode/go-astideepspeech?status.svg)](https://godoc.org/github.com/asticode/go-astideepspeech)

-Golang bindings for Mozilla's [DeepSpeech](https://github.com/mozilla/DeepSpeech) speech-to-text library.
+Golang bindings for Mozilla's/Coqui's [STT](https://github.com/coqui-ai/STT) speech-to-text library.

Mozilla has no involvement in Coqui STT.

@asticode
Owner

Thanks for all the clarifications.

I've decided to create a new repo specifically for Coqui since I'm not comfortable using this one to do so. I've started renaming files and packages so that Coqui is mentioned instead of Deepspeech.

@go-dockly could you:

  1. Create a new PR in this new repo with almost the same changes you submitted here (minus what @reuben's comments flagged, and there shouldn't be any mention of Deepspeech or Mozilla anymore)?
  2. Use this PR to only indicate at the top of the README that a new repo has been created for Coqui bindings?

Cheers

@go-dockly
Author

let's do it :)

Development

Successfully merging this pull request may close these issues.

Upcoming 1.0 and renaming
4 participants