Swedish changes (#1242) #7
Merged
Made the same modifications to sprakbanken as @jtrmal suggested for sprakbanken_swe and removed deprecated commands from run.sh
Modified the Python scripts called by sprak_data_prep.sh so they work with Python 2 and 3, at the request of @jtrmal (I think they are slower now because we use more regexes). Changed the preprocessing so case is not normalised, and altered the default behaviour to delete a sentence-final '.' rather than convert it to a token, because it is usually not spoken aloud.
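For illustration only, here is a minimal sketch of the sentence-final '.' handling described above. It is not the code from the scripts called by sprak_data_prep.sh; the function name `drop_final_period` and the exact regex are assumptions made for this example.

```python
import re

# Hypothetical helper, not taken from the recipe: matches one sentence-final
# period together with any whitespace around it.
_FINAL_PERIOD = re.compile(r"\s*\.\s*$")


def drop_final_period(sentence):
    """Delete a sentence-final '.' instead of turning it into a token.

    Case is left untouched, and periods inside the sentence
    (e.g. in abbreviations) are not affected.
    """
    return _FINAL_PERIOD.sub("", sentence)


if __name__ == "__main__":
    print(drop_final_period("Det er godt."))
    print(drop_final_period("Kl. 5 er fint ."))
```

The pattern is anchored at the end of the string, so only the last period is removed and sentences without a final period pass through unchanged.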
Modified run.sh and tuned #leaves and #Gauss on the dev set for the GMM-based systems. Changed the scoring scripts in local/ to be similar to those in WSJ to get better analyses, and changed the local/wer* scripts to fit this recipe.
Modified the filters in local/wer_* so they remove accents and umlauts, but not the particular Danish characters. Corrected an error in the previous commit that changed the openfst version in tools/Makefile.
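As a rough sketch of what such a filter could look like, assuming the intent is to strip accents and umlauts while leaving the Danish letters æ, ø and å untouched. The actual local/wer_* filters may be implemented quite differently (for example with sed or perl); the names `KEEP` and `strip_diacritics` are hypothetical.

```python
import unicodedata

# Assumption for this example: these are the Danish characters to preserve.
KEEP = set("æøåÆØÅ")


def strip_diacritics(text, keep=KEEP):
    """Remove combining marks (accents, umlauts) except on protected letters."""
    # Compose first so that 'a' + combining ring is seen as a single 'å'.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ch in keep:
            out.append(ch)
            continue
        # Decompose the character and drop any combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


if __name__ == "__main__":
    print(strip_diacritics("café über Århus smørrebrød"))
```

The explicit keep-set matters because 'å' decomposes into 'a' plus a combining ring and would otherwise be reduced to 'a', whereas 'æ' and 'ø' have no decomposition and survive either way.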
Added new lexicon from openslr to copy_dict.sh