
Print best and worst results in a WER report. #2724

Merged (12 commits into mozilla:master) on Feb 7, 2020

Conversation

@DanBmh (Contributor) commented Feb 5, 2020

This also corrects the error that the results with the highest WER were printed instead of those with the lowest WER.
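
For readers skimming the thread, here is a minimal sketch of the bug class being fixed; the `Sample` structure and the example values are assumptions for illustration, not the actual DeepSpeech code:

```python
from collections import namedtuple

# Hypothetical sample record; the real report works on test-set transcriptions.
Sample = namedtuple("Sample", ["wer", "src", "res"])

samples = [
    Sample(0.00, "hello world", "hello world"),
    Sample(0.50, "good morning", "good mourning sir"),
    Sample(1.00, "see you soon", "sea shoe moon"),
]

# Lower WER is better, so an ascending sort puts the best results first.
samples.sort(key=lambda s: s.wer)

best = samples[:2]    # lowest WER: best transcriptions
worst = samples[-2:]  # highest WER: worst transcriptions

# The error: slicing from the wrong end (or mislabeling the slice)
# prints the highest-WER results under a "best" heading.
```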

@community-tc-integration

No Taskcluster jobs started for this pull request. The `allowPullRequests` configuration for this repository (in `.taskcluster.yml` on the default branch) does not allow starting tasks for this pull request.

@lissyx (Collaborator) commented Feb 5, 2020

@DanBmh Can I ask why you think it's useful to print the best ones?

@DanBmh (Contributor, Author) commented Feb 5, 2020

For some time I didn't know that the examples are the worst predictions from the test set. I thought they were chosen randomly and wondered why they were always so bad, so after a while I started ignoring them. Once I found the flag description (which, by the way, says they are the best results, since lower is better) I understood they are the worst results. That makes more sense than printing random ones.

So I think printing both the best and the worst will give new users an intuitive way to see that we output the worst results. Another benefit is that even with a badly performing network you can still see that it learnt something :)

Maybe the best idea is to also print median results, so that you get a more realistic estimate of the prediction quality? Like this (see the sketch after the list):
1
2
[...]
5
6
[...]
9
10
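
A possible shape for that best/median/worst selection, sketched with a hypothetical helper name; the merged code may differ, and the `samples` objects are only assumed to carry a `.wer` field:

```python
def pick_report_samples(samples, count):
    """Return (best, median, worst) slices of samples, sorted by WER.

    Hypothetical sketch: on short lists the three slices may overlap.
    """
    samples = sorted(samples, key=lambda s: s.wer)  # best (lowest WER) first
    mid = len(samples) // 2
    half = count // 2
    best = samples[:count]
    median = samples[max(0, mid - half):mid + half + count % 2]
    worst = samples[-count:]
    return best, median, worst
```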

@victornoriega
I think that when you're modeling and trying a lot of configurations, you want to know the flaws of your model, but also the best it can show and the cases in which it can excel. I also think this should be an additional flag to DeepSpeech.py, not something enabled by default.
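
Sketching that suggestion: DeepSpeech's util/flags.py defines its options through absl, so an opt-in report size could look roughly like this (the flag name, default, and help text here are assumptions, not necessarily what was merged):

```python
from absl import flags

FLAGS = flags.FLAGS

# Hypothetical flag in the style of util/flags.py.
flags.DEFINE_integer(
    'report_count', 10,
    'number of samples to print in the WER report (worst, median and best)')
```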

@lissyx (Collaborator) commented Feb 6, 2020

> Maybe the best idea is to also print median results, so that you get a more realistic estimate of the prediction quality?

It makes sense, but I have to admit this is not something that ever crossed our minds.

@lissyx (Collaborator) commented Feb 6, 2020

@DanBmh You might need to factorize / apply that to evaluate.py as well as evaluate_tflite.py.
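
One way such a factoring could look: a shared printer in util/evaluate_tools.py that both scripts import. The function name and the sample fields below are assumptions for illustration:

```python
# Hypothetical shared helper, e.g. in util/evaluate_tools.py.
def print_report(samples, count, title):
    print('-' * 80)
    print(title)
    for sample in samples[:count]:
        print('WER: {:.2f}, CER: {:.2f}'.format(sample.wer, sample.cer))
        print(' - src: "{}"'.format(sample.src))
        print(' - res: "{}"'.format(sample.res))

# Both evaluate.py and evaluate_tflite.py could then call, for example:
# print_report(sorted_samples[:count], count, 'Best samples:')
# print_report(sorted_samples[-count:], count, 'Worst samples:')
```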

@DanBmh (Contributor, Author) commented Feb 6, 2020

evaluate_tflite.py does not print any samples. Shall I add it?

@reuben (Contributor) commented Feb 6, 2020

Thanks for the PR!

@lissyx (Collaborator) commented Feb 6, 2020

> evaluate_tflite.py does not print any samples. Shall I add it?

I'd say no then.

@lissyx (Collaborator) left a review:

LGTM, thanks!

@lissyx (Collaborator) commented Feb 6, 2020

I've triggered some tasks to have TaskCluster running on that.

@lissyx requested a review from @reuben on Feb 6, 2020, 14:07
@lissyx merged commit 33efd9b into mozilla:master on Feb 7, 2020
@lissyx (Collaborator) commented Feb 7, 2020

Thanks @DanBmh!

@lock (bot) commented Mar 10, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

The lock bot locked and limited conversation to collaborators on Mar 10, 2020.