Releases: mozilla/DeepSpeech
v0.5.0-alpha.5
Merge pull request #2019 from lissyx/bump-v0.5.0-alpha-5 Bump VERSION to 0.5.0-alpha.5
v0.5.0-alpha.4
Merge pull request #1969 from lissyx/bump-v0.5.0-alpha.4 Bump VERSION to v0.5.0-alpha.4
v0.5.0-alpha.3
Merge pull request #1966 from lissyx/bump-v0.5.0-alpha.3 Bump VERSION to v0.5.0-alpha.3
v0.5.0-alpha.2
Merge pull request #1951 from lissyx/force-rebuild Force rebuild
v0.5.0-alpha.1
Merge pull request #1851 from lissyx/bump-v0.5.0-alpha.1 Bump VERSION to 0.5.0-alpha.1
v0.5.0-alpha.0
Merge pull request #1846 from lissyx/bump-v0.5.0-alpha.0 Bump VERSION to 0.5.0-alpha.0
Deep Speech 0.4.1
General
This is the 0.4.1 release of Deep Speech, an open speech-to-text engine. This release includes source code and a trained model, `deepspeech-0.4.1-models.tar.gz`, trained on American English, which achieves an 8.26% word error rate on the LibriSpeech clean test corpus. (Models with "rounded" in their file names have rounded weights, and those with a `.pbmm` extension are memory mapped and much more memory efficient.) It also includes example audio, which can be used to test the engine, and checkpoint files, `deepspeech-0.4.1-checkpoint.tar.gz`, which can be used as the basis for further fine-tuning.
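For reference, word error rate is the word-level edit distance between the recognized transcript and the reference, divided by the number of reference words. A minimal sketch of the metric (not the evaluation code used for this release):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion in six words
```

An 8.26% WER therefore means roughly one word-level error per twelve reference words on the test set.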
Notable changes from the previous release
- Removed `initialize_from_frozen_model` flag and support code (46d1cec)
- Updated to TensorFlow r1.12 (26f7bc3, d243839)
- Switched the CTC decoder from TensorFlow's implementation to ctcdecode (#1679, #1693, #1696)
- Removed the old AOT model (1540fa3)
- Introduced NodeJS v11.x support (5536e54, 4867d6b)
- Directly export a TFLite-ready model (5d30afd, d6642da)
- Added streaming API support to the GUI tool (851fb4e)
- Fixed Fisher + Switchboard importers (9aa23ed, f036691, b37f1a7, b2f967a, 6c903c8, ba62289, 81b1600)
- Fixes in the Switchboard importer (8507f99, e47e344, 4efc5c6)
- Added an example of Python streaming from a microphone with VAD (74cebb8)
- Fixed MFCC window size and stride (1df9602, 2a8128b)
- Renamed `LM_WEIGHT` and `VALID_WORD_COUNT_WEIGHT` to `LM_ALPHA` and `LM_BETA` respectively, and changed their values (fc46f43)
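The renamed `LM_ALPHA` and `LM_BETA` weights control how the language model influences beam-search decoding: candidate transcripts are commonly scored as the acoustic log-probability plus `lm_alpha` times the language-model log-probability plus `lm_beta` times the word count. The sketch below only illustrates the roles of the two weights with hypothetical scores; it is not the ctcdecode implementation:

```python
def combined_score(acoustic_logprob: float,
                   lm_logprob: float,
                   word_count: int,
                   lm_alpha: float = 0.75,
                   lm_beta: float = 1.85) -> float:
    """Score a candidate transcript during beam-search decoding.

    lm_alpha scales the language model's opinion; lm_beta rewards
    longer transcripts to offset the LM's per-word penalty.
    """
    return acoustic_logprob + lm_alpha * lm_logprob + lm_beta * word_count

# A candidate favored by the language model can outrank one with a
# slightly better acoustic score.
print(combined_score(-10.0, -4.0, 3))
print(combined_score(-9.0, -8.0, 3))
```

The default values match the `lm_alpha` of 0.75 and `lm_beta` of 1.85 documented in the hyperparameters below.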
Hyperparameters for fine-tuning
The hyperparameters used to train the model are useful for fine-tuning, so we document them here, along with the hardware used: a server with 8 TitanX Pascal GPUs (12GB of VRAM).

- `train_files`: Fisher, LibriSpeech, and Switchboard training corpora, as well as a pre-release snapshot of the English Common Voice training corpus
- `dev_files`: LibriSpeech clean and other dev corpora, as well as a pre-release snapshot of the English Common Voice validation corpus
- `test_files`: LibriSpeech clean test corpus
- `train_batch_size`: 24
- `dev_batch_size`: 48
- `test_batch_size`: 48
- `epoch`: 30
- `learning_rate`: 0.0001
- `display_step`: 0
- `validation_step`: 1
- `dropout_rate`: 0.15
- `checkpoint_step`: 1
- `n_hidden`: 2048
- `lm_alpha`: 0.75
- `lm_beta`: 1.85
The weights with the best validation loss were selected at the end of the 30 epochs.
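That selection rule is simply an argmin over per-epoch validation losses; sketched below with hypothetical loss values:

```python
def best_checkpoint(dev_losses):
    """Pick the epoch with the lowest validation loss, as described above."""
    best = min(range(len(dev_losses)), key=lambda i: dev_losses[i])
    return best, dev_losses[best]

# Hypothetical per-epoch validation losses; epoch indices start at 0.
print(best_checkpoint([23.1, 18.4, 16.9, 17.2]))  # (2, 16.9)
```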
Bindings
This release also includes a Python based command line tool, `deepspeech`, installed through:

pip install deepspeech

Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU specific package:

pip install deepspeech-gpu

It also exposes bindings for the following languages:
- Python (versions 2.7, 3.4, 3.5, 3.6, and 3.7), installed via `pip install deepspeech`, or `pip install deepspeech-gpu` for the GPU package
- NodeJS (versions 4.x, 5.x, 6.x, 7.x, 8.x, 9.x, 10.x, and 11.x), installed via `npm install deepspeech`, or `npm install deepspeech-gpu` for the GPU package
- C++, which requires that the appropriate shared objects are installed from `native_client.tar.xz`. (See the section in the main README which describes `native_client.tar.xz` installation.)
In addition there are third party bindings that are supported by external developers, for example
- Rust which is installed by following the instructions on the external Rust repo.
- Go which is installed by following the instructions on the external Go repo.
Supported Platforms
- OS X 10.10, 10.11, 10.12, 10.13 and 10.14
- Linux x86 64 bit with a modern CPU (Needs at least AVX/FMA)
- Linux x86 64 bit with a modern CPU + NVIDIA GPU (Compute Capability at least 3.0, see NVIDIA docs)
- Raspbian Stretch on Raspberry Pi 3
- ARM64 built against Debian/ARMbian Stretch and tested on LePotato boards
- Java Android bindings / demo app. Early preview, tested only on Pixel 2 device, TF Lite model only
Known Issues
- Feature caching speeds training but increases memory usage
- Current `v2 TRIE` handling still triggers ~600MB of memory usage
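On the first point: feature caching keeps each clip's computed input features in memory so later epochs skip recomputation, which is where the extra memory goes. The idea in miniature (the real cache holds audio features such as MFCCs, not this stand-in):

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded: every cached clip stays resident, hence the memory cost
def features(clip_path: str):
    # Stand-in for the expensive per-clip feature computation.
    return tuple((len(clip_path) + i) ** 2 for i in range(5))

features("clip_0001.wav")  # computed on the first epoch
features("clip_0001.wav")  # cache hit on every later epoch
print(features.cache_info())
```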
Contact/Getting Help
- FAQ - We have a list of common questions, and their answers, in our FAQ. When just getting started, it's best to first check the FAQ to see if your question is addressed.
- Discourse Forums - If your question is not addressed in the FAQ, the Discourse Forums are the next place to look. They contain conversations on General Topics, Using Deep Speech, Alternative Platforms, and Deep Speech Development.
- IRC - If your question is not addressed by either the FAQ or Discourse Forums, you can contact us on the `#machinelearning` channel on Mozilla IRC; people there can try to answer/help
- Issues - Finally, if all else fails, you can open an issue in our repo if there is a bug with the current code base.
Contributors to 0.4.1 release
Deep Speech 0.4.0
General
This is the 0.4.0 release of Deep Speech, an open speech-to-text engine. This release includes source code and a trained model, `deepspeech-0.4.0-models.tar.gz`, trained on American English, which achieves an 8.26% word error rate on the LibriSpeech clean test corpus. (The incorrect model was uploaded; this will be fixed in 0.4.1.) (Models with "rounded" in their file names have rounded weights, and those with a `.pbmm` extension are memory mapped and much more memory efficient.) It also includes example audio, which can be used to test the engine, and checkpoint files, `deepspeech-0.4.0-checkpoint.tar.gz`, which can be used as the basis for further fine-tuning.
Notable changes from the previous release
- Removed `initialize_from_frozen_model` flag and support code (46d1cec)
- Updated to TensorFlow r1.12 (26f7bc3, d243839)
- Switched the CTC decoder from TensorFlow's implementation to ctcdecode (#1679, #1693, #1696)
- Removed the old AOT model (1540fa3)
- Introduced NodeJS v11.x support (5536e54, 4867d6b)
- Directly export a TFLite-ready model (5d30afd, d6642da)
- Added streaming API support to the GUI tool (851fb4e)
- Fixed Fisher + Switchboard importers (9aa23ed, f036691, b37f1a7, b2f967a, 6c903c8, ba62289, 81b1600)
- Fixes in the Switchboard importer (8507f99, e47e344, 4efc5c6)
- Added an example of Python streaming from a microphone with VAD (74cebb8)
- Fixed MFCC window size and stride (1df9602, 2a8128b)
Hyperparameters for fine-tuning
The hyperparameters used to train the model are useful for fine-tuning, so we document them here, along with the hardware used: a server with 8 TitanX Pascal GPUs (12GB of VRAM).

- `train_files`: Fisher, LibriSpeech, and Switchboard training corpora, as well as a pre-release snapshot of the English Common Voice training corpus
- `dev_files`: LibriSpeech clean and other dev corpora, as well as a pre-release snapshot of the English Common Voice validation corpus
- `test_files`: LibriSpeech clean test corpus
- `train_batch_size`: 24
- `dev_batch_size`: 48
- `test_batch_size`: 48
- `epoch`: 30
- `learning_rate`: 0.0001
- `display_step`: 0
- `validation_step`: 1
- `dropout_rate`: 0.15
- `checkpoint_step`: 1
- `n_hidden`: 2048
- `lm_alpha`: 0.75
- `lm_beta`: 1.85
The weights with the best validation loss were selected at the end of the 30 epochs.
Bindings
This release also includes a Python based command line tool, `deepspeech`, installed through:

pip install deepspeech

Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU specific package:

pip install deepspeech-gpu

It also exposes bindings for the following languages:
- Python (versions 2.7, 3.4, 3.5, 3.6, and 3.7), installed via `pip install deepspeech`, or `pip install deepspeech-gpu` for the GPU package
- NodeJS (versions 4.x, 5.x, 6.x, 7.x, 8.x, 9.x, 10.x, and 11.x), installed via `npm install deepspeech`, or `npm install deepspeech-gpu` for the GPU package
- C++, which requires that the appropriate shared objects are installed from `native_client.tar.xz`. (See the section in the main README which describes `native_client.tar.xz` installation.)
In addition there are third party bindings that are supported by external developers.
Supported Platforms
- OS X 10.10, 10.11, 10.12, 10.13 and 10.14
- Linux x86 64 bit with a modern CPU (Needs at least AVX/FMA)
- Linux x86 64 bit with a modern CPU + NVIDIA GPU (Compute Capability at least 3.0, see NVIDIA docs)
- Raspbian Stretch on Raspberry Pi 3
- ARM64 built against Debian/ARMbian Stretch and tested on LePotato boards
Known Issues
- Feature caching speeds training but increases memory usage
- Current `v2 TRIE` handling still triggers ~600MB of memory usage
- The incorrect model was uploaded to the release; this will be fixed in 0.4.1
Contact/Getting Help
- FAQ - We have a list of common questions, and their answers, in our FAQ. When just getting started, it's best to first check the FAQ to see if your question is addressed.
- Discourse Forums - If your question is not addressed in the FAQ, the Discourse Forums are the next place to look. They contain conversations on General Topics, Using Deep Speech, Alternative Platforms, and Deep Speech Development.
- IRC - If your question is not addressed by either the FAQ or Discourse Forums, you can contact us on the `#machinelearning` channel on Mozilla IRC; people there can try to answer/help
- Issues - Finally, if all else fails, you can open an issue in our repo if there is a bug with the current code base.
Contributors to 0.4.0 release
v0.4.0-alpha.3
Merge pull request #1800 from lissyx/bump-v0.4.0-alpha.3 Bump VERSION to 0.4.0-alpha.3
v0.4.0-alpha.2
Merge pull request #1789 from lissyx/fix-nc-asset-name Move nc_asset_name to extra