This is an unofficial implementation of the vocoder part of HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement.
python train.py --config config_v2.json
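Before launching a run, it can help to check which settings the config file defines. The following is a minimal sketch, not part of this repository: `load_config` and `summarize` are hypothetical helper names, and the only assumption about `config_v2.json` is that it holds a single JSON object. No specific keys are assumed.

```python
import json
from pathlib import Path


def load_config(path):
    """Read a JSON config file and return its contents as a dict."""
    with Path(path).open("r", encoding="utf-8") as f:
        config = json.load(f)
    # Assumption: the config is a JSON object at the top level.
    if not isinstance(config, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    return config


def summarize(config):
    """Return one 'key = value' line per top-level setting, sorted by key."""
    return [f"{key} = {config[key]!r}" for key in sorted(config)]
```

Calling `print("\n".join(summarize(load_config("config_v2.json"))))` from the repository root would list the top-level settings that the training command above receives.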
- iSTFTNet (https://github.com/rishikksh20/iSTFTNet-pytorch) produces better-sounding audio than HiFi++.
@misc{https://doi.org/10.48550/arxiv.2203.13086,
doi = {10.48550/ARXIV.2203.13086},
url = {https://arxiv.org/abs/2203.13086},
author = {Andreev, Pavel and Alanov, Aibek and Ivanov, Oleg and Vetrov, Dmitry},
keywords = {Sound (cs.SD), Machine Learning (cs.LG), Audio and Speech Processing (eess.AS), FOS: Computer and information sciences, FOS: Electrical engineering, electronic engineering, information engineering},
title = {HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}