* add audio doc * fix typo * fix code link && punctuation * fix typo * fix features overivew link * add example * fix mfcc doc * add get_window * update code example * rm example * format * rm code example in cn
Showing 13 changed files with 396 additions and 0 deletions.
.. _cn_overview_callbacks:

paddle.audio
---------------------

The paddle.audio directory contains PaddlePaddle's high-level APIs for the audio and speech domain. They are grouped as follows:

- :ref:`Audio feature APIs <about_features>`
- :ref:`Basic audio processing function APIs <about_functional>`

.. _about_features:

Audio feature APIs
::::::::::::::::::::

.. csv-table::
    :header: "API name", "Description"
    :widths: 10, 30

    " :ref:`LogMelSpectrogram <cn_api_audio_features_LogMelSpectrogram>` ", "Computes the LogMelSpectrogram audio feature"
    " :ref:`MelSpectrogram <cn_api_audio_features_MelSpectrogram>` ", "Computes the MelSpectrogram audio feature"
    " :ref:`MFCC <cn_api_audio_features_MFCC>` ", "Computes the MFCC audio feature"
    " :ref:`Spectrogram <cn_api_audio_features_Spectrogram>` ", "Computes the Spectrogram audio feature"

.. _about_functional:

Basic audio processing function APIs
::::::::::::::::::::::::::::::::::::::::

.. csv-table::
    :header: "API name", "Description"
    :widths: 10, 30

    " :ref:`compute_fbank_matrix <cn_api_audio_functional_compute_fbank_matrix>` ", "Computes the fbank matrix"
    " :ref:`create_dct <cn_api_audio_functional_create_dct>` ", "Computes the discrete cosine transform matrix"
    " :ref:`fft_frequencies <cn_api_audio_functional_fft_frequencies>` ", "Computes the discrete Fourier transform sample frequencies"
    " :ref:`hz_to_mel <cn_api_audio_functional_hz_to_mel>` ", "Converts Hz frequencies to mel frequencies"
    " :ref:`mel_to_hz <cn_api_audio_functional_mel_to_hz>` ", "Converts mel frequencies to Hz frequencies"
    " :ref:`mel_frequencies <cn_api_audio_functional_mel_frequencies>` ", "Computes mel frequencies"
    " :ref:`power_to_db <cn_api_audio_functional_power_to_db>` ", "Converts a power spectrogram to decibels"
    " :ref:`get_window <cn_api_audio_functional_get_window>` ", "Returns one of several window functions"
.. _cn_api_audio_features_LogMelSpectrogram:

LogMelSpectrogram
-------------------------------

.. py:class:: paddle.audio.features.LogMelSpectrogram(sr=22050, n_fft=2048, hop_length=512, win_length=None, window='hann', power=2.0, center=True, pad_mode='reflect', n_mels=64, f_min=50.0, f_max=None, htk=False, norm='slaney', ref_value=1.0, amin=1e-10, top_db=None, dtype='float32')

Computes the log-mel spectrogram of a given signal.

Parameters
::::::::::::

- **sr** (int) - Sample rate. Default: 22050.
- **n_fft** (int) - Size of the frequency window used in the discrete Fourier transform. Default: 2048.
- **hop_length** (int, optional) - Frame shift (hop size). Default: 512.
- **win_length** (int, optional) - Window length of the short-time FFT. Default: None.
- **window** (str) - Name of the window function. Default: 'hann'.
- **power** (float) - Exponent applied to the magnitude spectrogram. Default: 2.0.
- **center** (bool) - Whether to pad the input signal. If True, frame t is centered at t*hop_length; if False, frame t starts at t*hop_length.
- **pad_mode** (str) - Padding mode used when center is True. Default: 'reflect'.
- **n_mels** (int) - Number of mel bins. Default: 64.
- **f_min** (float, optional) - Minimum frequency in Hz. Default: 50.0.
- **f_max** (float, optional) - Maximum frequency in Hz. Default: None.
- **htk** (bool, optional) - Whether to use the HTK formula for scaling when computing the fbank matrix. Default: False.
- **norm** (Union[str, float], optional) - Type of normalization used when computing the fbank matrix. Default: 'slaney'. A float such as norm=0.5 selects p-norm normalization.
- **ref_value** (float) - Reference value. If it is less than 1.0, the dB level of the signal is raised; otherwise the dB level is lowered. Default: 1.0.
- **amin** (float) - Minimum value of the input magnitude. Default: 1e-10.
- **top_db** (float, optional) - Maximum value of the log-mel spectrogram, in dB. Default: None.
- **dtype** (str) - Data type of the input and the window. Default: 'float32'.

Returns
:::::::::

A callable object that computes ``LogMelSpectrogram``.

Code Example
::::::::::::

COPY-FROM: paddle.audio.features.layers.LogMelSpectrogram
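
The roles of ``ref_value``, ``amin`` and ``top_db`` can be illustrated with a small standalone sketch of the usual power-to-decibel conversion. This is not Paddle's implementation: the function name ``power_to_db_sketch`` is hypothetical, it works on a plain Python list rather than a Tensor, and it assumes the common convention in which ``top_db`` clips values more than ``top_db`` dB below the peak.

```python
import math


def power_to_db_sketch(power, ref_value=1.0, amin=1e-10, top_db=None):
    """Convert a list of power values to decibels (illustrative only)."""
    # Level of the reference value; a ref_value below 1.0 raises every output.
    log_ref = 10.0 * math.log10(max(amin, ref_value))
    # amin keeps the logarithm away from zero and negative inputs.
    db = [10.0 * math.log10(max(amin, p)) - log_ref for p in power]
    if top_db is not None:
        # Assumed convention: nothing may fall more than top_db below the peak.
        floor = max(db) - top_db
        db = [max(d, floor) for d in db]
    return db


levels = power_to_db_sketch([1.0, 0.1, 1e-12], top_db=80.0)
raised = power_to_db_sketch([1.0], ref_value=0.1)
```

Here the third input is first clamped to ``amin`` and then limited by ``top_db``, while lowering ``ref_value`` to 0.1 shifts the result up by 10 dB.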
.. _cn_api_audio_features_MFCC:

MFCC
-------------------------------

.. py:class:: paddle.audio.features.MFCC(sr=22050, n_mfcc=40, n_fft=2048, hop_length=512, win_length=None, window='hann', power=2.0, center=True, pad_mode='reflect', n_mels=64, f_min=50.0, f_max=None, htk=False, norm='slaney', ref_value=1.0, amin=1e-10, top_db=None, dtype='float32')

Computes the MFCC (mel-frequency cepstral coefficients) of a given signal.

Parameters
::::::::::::

- **sr** (int, optional) - Sample rate. Default: 22050.
- **n_mfcc** (int, optional) - Number of MFCC dimensions. Default: 40.
- **n_fft** (int) - Size of the frequency window used in the discrete Fourier transform. Default: 2048.
- **hop_length** (int, optional) - Frame shift (hop size). Default: 512.
- **win_length** (int, optional) - Window length of the short-time FFT. Default: None.
- **window** (str) - Name of the window function. Default: 'hann'.
- **power** (float) - Exponent applied to the magnitude spectrogram. Default: 2.0.
- **center** (bool) - Whether to pad the input signal. If True, frame t is centered at t*hop_length; if False, frame t starts at t*hop_length.
- **pad_mode** (str) - Padding mode used when center is True. Default: 'reflect'.
- **n_mels** (int) - Number of mel bins. Default: 64.
- **f_min** (float, optional) - Minimum frequency in Hz. Default: 50.0.
- **f_max** (float, optional) - Maximum frequency in Hz. Default: None.
- **htk** (bool, optional) - Whether to use the HTK formula for scaling when computing the fbank matrix. Default: False.
- **norm** (Union[str, float], optional) - Type of normalization used when computing the fbank matrix. Default: 'slaney'. A float such as norm=0.5 selects p-norm normalization.
- **ref_value** (float) - Reference value. If it is less than 1.0, the dB level of the signal is raised; otherwise the dB level is lowered. Default: 1.0.
- **amin** (float) - Minimum value of the input magnitude. Default: 1e-10.
- **top_db** (float, optional) - Maximum value of the log-mel spectrogram, in dB. Default: None.
- **dtype** (str) - Data type of the input and the window. Default: 'float32'.

Returns
:::::::::

A callable object that computes ``MFCC``.

Code Example
::::::::::::

COPY-FROM: paddle.audio.features.layers.MFCC
.. _cn_api_audio_features_MelSpectrogram:

MelSpectrogram
-------------------------------

.. py:class:: paddle.audio.features.MelSpectrogram(sr=22050, n_fft=2048, hop_length=512, win_length=None, window='hann', power=2.0, center=True, pad_mode='reflect', n_mels=64, f_min=50.0, f_max=None, htk=False, norm='slaney', dtype='float32')

Computes the mel spectrogram of a given signal.

Parameters
::::::::::::

- **sr** (int, optional) - Sample rate. Default: 22050.
- **n_fft** (int) - Size of the frequency window used in the discrete Fourier transform. Default: 2048.
- **hop_length** (int, optional) - Frame shift (hop size). Default: 512.
- **win_length** (int, optional) - Window length of the short-time FFT. Default: None.
- **window** (str) - Name of the window function. Default: 'hann'.
- **power** (float) - Exponent applied to the magnitude spectrogram. Default: 2.0.
- **center** (bool) - Whether to pad the input signal. If True, frame t is centered at t*hop_length; if False, frame t starts at t*hop_length.
- **pad_mode** (str) - Padding mode used when center is True. Default: 'reflect'.
- **n_mels** (int) - Number of mel bins. Default: 64.
- **f_min** (float, optional) - Minimum frequency in Hz. Default: 50.0.
- **f_max** (float, optional) - Maximum frequency in Hz. Default: None.
- **htk** (bool, optional) - Whether to use the HTK formula for scaling when computing the fbank matrix. Default: False.
- **norm** (Union[str, float], optional) - Type of normalization used when computing the fbank matrix. Default: 'slaney'. A float such as norm=0.5 selects p-norm normalization.
- **dtype** (str) - Data type of the input and the window. Default: 'float32'.

Returns
:::::::::

A callable object that computes ``MelSpectrogram``.

Code Example
::::::::::::

COPY-FROM: paddle.audio.features.MelSpectrogram
.. _cn_api_audio_features_Spectrogram:

Spectrogram
-------------------------------

.. py:class:: paddle.audio.features.Spectrogram(n_fft=512, hop_length=512, win_length=None, window='hann', power=1.0, center=True, pad_mode='reflect', dtype='float32')

Computes the spectrogram of a given signal via the short-time Fourier transform.

Parameters
::::::::::::

- **n_fft** (int) - Size of the frequency window used in the discrete Fourier transform. Default: 512.
- **hop_length** (int, optional) - Frame shift (hop size). Default: 512.
- **win_length** (int, optional) - Window length of the short-time FFT. Default: None.
- **window** (str) - Name of the window function. Default: 'hann'.
- **power** (float) - Exponent applied to the magnitude spectrogram. Default: 1.0.
- **center** (bool) - Whether to pad the input signal. If True, frame t is centered at t*hop_length; if False, frame t starts at t*hop_length.
- **pad_mode** (str) - Padding mode used when center is True. Default: 'reflect'.
- **dtype** (str) - Data type of the input and the window. Default: 'float32'.

Returns
:::::::::

A callable object that computes ``Spectrogram``.

Code Example
::::::::::::

COPY-FROM: paddle.audio.features.Spectrogram
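
How ``n_fft``, ``hop_length`` and ``center`` determine the size of the output can be sketched with plain arithmetic. The helper below is hypothetical (``spectrogram_shape`` is not a Paddle function) and assumes the usual STFT convention: a one-sided spectrum with ``n_fft // 2 + 1`` frequency bins, a window as long as ``n_fft``, and, when ``center`` is True, padding of ``n_fft // 2`` samples on each side.

```python
def spectrogram_shape(num_samples, n_fft=512, hop_length=512, center=True):
    """Return (frequency bins, frames) for a 1-D signal (illustrative only)."""
    # A one-sided spectrum keeps the bins from 0 Hz up to the Nyquist frequency.
    freq_bins = n_fft // 2 + 1
    if center:
        # Assumed padding of n_fft // 2 on both sides, so frame t is centered
        # at sample t * hop_length.
        padded = num_samples + 2 * (n_fft // 2)
    else:
        # Without padding, frame t starts at sample t * hop_length.
        padded = num_samples
    frames = 1 + (padded - n_fft) // hop_length
    return freq_bins, frames


centered = spectrogram_shape(16000, n_fft=512, hop_length=512, center=True)
uncentered = spectrogram_shape(16000, n_fft=512, hop_length=512, center=False)
```

Under these assumptions, one second of 16 kHz audio yields one more frame with centering than without, because the padded edges contribute an extra window position.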
docs/api/paddle/audio/functional/compute_fbank_matrix_cn.rst
.. _cn_api_audio_functional_compute_fbank_matrix:

compute_fbank_matrix
-------------------------------

.. py:function:: paddle.audio.functional.compute_fbank_matrix(sr, n_fft, n_mels=64, f_min=0.0, f_max=None, htk=False, norm='slaney', dtype='float32')

Computes the mel transform (filterbank) matrix.

Parameters
::::::::::::

- **sr** (int) - Sample rate.
- **n_fft** (int) - Number of FFT bins.
- **n_mels** (int) - Number of mel bins. Default: 64.
- **f_min** (float) - Minimum frequency in Hz. Default: 0.0.
- **f_max** (Optional[float]) - Maximum frequency in Hz. Default: None.
- **htk** (bool) - Whether to use HTK scaling. Default: False.
- **norm** (Union[str, float]) - Type of normalization. Default: 'slaney'.
- **dtype** (str) - Data type of the returned matrix. Default: 'float32'.

Returns
:::::::::

``paddle.Tensor``, a Tensor of shape (n_mels, n_fft//2 + 1).

Code Example
::::::::::::

COPY-FROM: paddle.audio.functional.compute_fbank_matrix
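
To make the returned shape (n_mels, n_fft//2 + 1) concrete, the following is a minimal pure-Python sketch of a triangular mel filterbank. It is an illustration under stated assumptions, not Paddle's code: the names ``fbank_matrix_sketch``, ``hz_to_mel_htk`` and ``mel_to_hz_htk`` are hypothetical, only the HTK mel formula is used, ``f_max`` is assumed to fall back to ``sr / 2`` when it is None, and no normalization (such as 'slaney') is applied.

```python
import math


def hz_to_mel_htk(freq):
    # HTK-style mel formula.
    return 2595.0 * math.log10(1.0 + freq / 700.0)


def mel_to_hz_htk(mel):
    # Inverse of the HTK-style mel formula.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def fbank_matrix_sketch(sr, n_fft, n_mels=64, f_min=0.0, f_max=None):
    """Build an unnormalized triangular filterbank as nested lists."""
    f_max = sr / 2.0 if f_max is None else f_max  # assumed fallback
    n_bins = n_fft // 2 + 1
    fft_freqs = [i * sr / n_fft for i in range(n_bins)]
    # n_mels + 2 band edges, evenly spaced on the mel scale.
    mel_lo, mel_hi = hz_to_mel_htk(f_min), hz_to_mel_htk(f_max)
    step = (mel_hi - mel_lo) / (n_mels + 1)
    edges = [mel_to_hz_htk(mel_lo + i * step) for i in range(n_mels + 2)]
    weights = []
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in fft_freqs:
            rising = (f - left) / (center - left)
            falling = (right - f) / (right - center)
            # Each filter is a triangle that peaks at its center frequency.
            row.append(max(0.0, min(rising, falling)))
        weights.append(row)
    return weights


fbank = fbank_matrix_sketch(sr=16000, n_fft=512, n_mels=40)
```

Multiplying such a matrix with a power spectrogram of shape (n_fft//2 + 1, frames) gives a mel spectrogram of shape (n_mels, frames).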
.. _cn_api_audio_functional_create_dct:

create_dct
-------------------------------

.. py:function:: paddle.audio.functional.create_dct(n_mfcc, n_mels, norm='ortho', dtype='float32')

Computes the discrete cosine transform matrix.

Parameters
::::::::::::

- **n_mfcc** (int) - Number of mel-frequency cepstral coefficients.
- **n_mels** (int) - Number of mel filterbanks.
- **norm** (str) - Type of normalization. Default: 'ortho'.
- **dtype** (str) - Data type of the returned matrix. Default: 'float32'.

Returns
:::::::::

``paddle.Tensor``, a Tensor of shape (n_mels, n_mfcc).

Code Example
::::::::::::

COPY-FROM: paddle.audio.functional.create_dct
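
As an illustration of what an orthonormal DCT matrix of shape (n_mels, n_mfcc) looks like, here is a small pure-Python sketch of the standard DCT-II basis with 'ortho' scaling. The function name ``dct_matrix_sketch`` is hypothetical, and the code is only assumed to correspond to the ``norm='ortho'`` case; it is not taken from Paddle.

```python
import math


def dct_matrix_sketch(n_mfcc, n_mels):
    """Return a DCT-II matrix with orthonormal columns, shape (n_mels, n_mfcc)."""
    matrix = []
    for n in range(n_mels):          # row index: mel bin
        row = []
        for k in range(n_mfcc):      # column index: cepstral coefficient
            value = math.cos(math.pi / n_mels * (n + 0.5) * k)
            value *= math.sqrt(2.0 / n_mels)
            if k == 0:
                # Extra scaling of the constant basis vector for orthonormality.
                value *= 1.0 / math.sqrt(2.0)
            row.append(value)
        matrix.append(row)
    return matrix


def column_dot(matrix, i, j):
    # Inner product of columns i and j.
    return sum(row[i] * row[j] for row in matrix)


dct = dct_matrix_sketch(n_mfcc=4, n_mels=8)
```

With this layout, projecting a log-mel frame of length n_mels onto the columns yields n_mfcc cepstral coefficients.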
.. _cn_api_audio_functional_fft_frequencies:

fft_frequencies
-------------------------------

.. py:function:: paddle.audio.functional.fft_frequencies(sr, n_fft, dtype='float32')

Computes the FFT bin frequencies.

Parameters
::::::::::::

- **sr** (int) - Sample rate.
- **n_fft** (int) - Number of FFT bins.
- **dtype** (str) - Data type of the returned Tensor. Default: 'float32'.

Returns
:::::::::

``paddle.Tensor``, a Tensor of shape (n_fft//2 + 1,).

Code Example
::::::::::::

COPY-FROM: paddle.audio.functional.fft_frequencies
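
The (n_fft//2 + 1,) shape follows from keeping only the non-negative frequencies of a real-valued FFT, spaced ``sr / n_fft`` apart from 0 Hz up to the Nyquist frequency. A minimal sketch, using a hypothetical helper name and plain Python lists instead of Tensors:

```python
def fft_frequencies_sketch(sr, n_fft):
    """Center frequency in Hz of each one-sided FFT bin (illustrative only)."""
    # Bin i corresponds to i * sr / n_fft; the last bin is the Nyquist frequency.
    return [i * sr / n_fft for i in range(n_fft // 2 + 1)]


freqs = fft_frequencies_sketch(sr=16000, n_fft=512)
```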
.. _cn_api_audio_functional_get_window:

get_window
-------------------------------

.. py:function:: paddle.audio.functional.get_window(window, win_length, fftbins=True, dtype='float64')

Returns a window function of the given type and length.

Parameters
::::::::::::

- **window** (str or Tuple[str, float]) - Window type, or a tuple of (window type, window parameter). Supported window types: 'hamming', 'hann', 'kaiser', 'gaussian', 'exponential', 'triang', 'bohman', 'blackman', 'cosine', 'tukey', 'taylor'.
- **win_length** (int) - Number of samples in the window.
- **fftbins** (bool) - If True, returns a periodic window; if False, returns a symmetric window. Default: True.
- **dtype** (str) - Data type of the returned Tensor. Default: 'float64'.

Returns
:::::::::

``paddle.Tensor``, a Tensor representing the requested window.

Code Example
::::::::::::

COPY-FROM: paddle.audio.functional.get_window
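
The difference between a periodic and a symmetric window can be seen on the Hann window. The sketch below is a standalone illustration with a hypothetical function name, assuming the common definition in which the periodic variant divides by ``win_length`` and the symmetric variant by ``win_length - 1``; it does not call Paddle.

```python
import math


def hann_sketch(win_length, fftbins=True):
    """Hann window as a list of floats (illustrative only, win_length >= 2)."""
    # Periodic windows suit FFT analysis; symmetric windows suit filter design.
    denom = win_length if fftbins else win_length - 1
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / denom)
            for n in range(win_length)]


periodic = hann_sketch(8, fftbins=True)
symmetric = hann_sketch(5, fftbins=False)
```

The symmetric window starts and ends at zero, whereas the periodic one starts at zero but does not return to zero at its last sample, so that repeating it gives a seamless period.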
.. _cn_api_audio_functional_hz_to_mel:

hz_to_mel
-------------------------------

.. py:function:: paddle.audio.functional.hz_to_mel(freq, htk=False)

Converts Hz to mels.

Parameters
::::::::::::

- **freq** (Tensor, float) - Input frequency in Hz, given as a Tensor or a float.
- **htk** (bool) - Whether to use HTK scaling. Default: False.

Returns
:::::::::

``paddle.Tensor`` or ``float``, the value in mels.

Code Example
::::::::::::

COPY-FROM: paddle.audio.functional.hz_to_mel
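
For reference, the widely used HTK mel formula is ``mel = 2595 * log10(1 + hz / 700)``. The sketch below implements it and its inverse for plain floats. The function names are hypothetical, and it is only assumed, not confirmed by this document, that ``htk=True`` selects this formula.

```python
import math


def hz_to_mel_htk(freq):
    """HTK-style conversion from Hz to mels for a single float."""
    return 2595.0 * math.log10(1.0 + freq / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


mel_at_1khz = hz_to_mel_htk(1000.0)
round_trip = mel_to_hz_htk(hz_to_mel_htk(440.0))
```

The scale is built so that 1000 Hz lands very close to 1000 mels, and converting back and forth recovers the original frequency up to floating-point error.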
.. _cn_api_audio_functional_mel_frequencies:

mel_frequencies
-------------------------------

.. py:function:: paddle.audio.functional.mel_frequencies(n_mels=64, f_min=0.0, f_max=11025, htk=False, dtype='float32')

Computes mel frequencies.

Parameters
::::::::::::

- **n_mels** (int) - Number of mel bins. Default: 64.
- **f_min** (float) - Minimum frequency in Hz. Default: 0.0.
- **f_max** (float) - Maximum frequency in Hz. Default: 11025.0.
- **htk** (bool) - Whether to use HTK scaling. Default: False.
- **dtype** (str) - Data type of the returned Tensor. Default: 'float32'.

Returns
:::::::::

``paddle.Tensor``, a Tensor of shape (n_mels,).

Code Example
::::::::::::

COPY-FROM: paddle.audio.functional.mel_frequencies
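
Conceptually, the result is a set of ``n_mels`` frequencies in Hz that are evenly spaced on the mel scale between ``f_min`` and ``f_max``. A minimal sketch of that idea follows, using the HTK formula for simplicity; the helper names are hypothetical and the default ``htk=False`` behaviour of the real function may use a different mel formula.

```python
import math


def _hz_to_mel_htk(freq):
    return 2595.0 * math.log10(1.0 + freq / 700.0)


def _mel_to_hz_htk(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_frequencies_sketch(n_mels=64, f_min=0.0, f_max=11025.0):
    """Frequencies in Hz, evenly spaced in mels (illustrative, n_mels >= 2)."""
    if n_mels < 2:
        raise ValueError("this sketch needs at least two points")
    lo, hi = _hz_to_mel_htk(f_min), _hz_to_mel_htk(f_max)
    step = (hi - lo) / (n_mels - 1)
    # Equal steps in mels become progressively wider steps in Hz.
    return [_mel_to_hz_htk(lo + i * step) for i in range(n_mels)]


mel_points = mel_frequencies_sketch()
```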
.. _cn_api_audio_functional_mel_to_hz:

mel_to_hz
-------------------------------

.. py:function:: paddle.audio.functional.mel_to_hz(mel, htk=False)

Converts mels to Hz.

Parameters
::::::::::::

- **mel** (Tensor, float) - Input value in mels, given as a Tensor or a float.
- **htk** (bool) - Whether to use HTK scaling. Default: False.

Returns
:::::::::

``paddle.Tensor`` or ``float``, the frequency in Hz.

Code Example
::::::::::::

COPY-FROM: paddle.audio.functional.mel_to_hz
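
When HTK scaling is not used, audio libraries commonly apply the Slaney-style mapping from the Auditory Toolbox, which is linear below 1000 Hz and logarithmic above it. The sketch below shows that mapping for plain floats. It is an assumption that ``htk=False`` corresponds to this mapping; the function names are hypothetical and the code is not taken from Paddle.

```python
import math

F_SP = 200.0 / 3.0                  # Hz per mel in the linear region
MIN_LOG_HZ = 1000.0                 # start of the logarithmic region
MIN_LOG_MEL = MIN_LOG_HZ / F_SP     # the same boundary in mels (15.0)
LOG_STEP = math.log(6.4) / 27.0     # step size in the logarithmic region


def slaney_mel_to_hz(mel):
    """Slaney-style conversion from mels to Hz for a single float."""
    if mel >= MIN_LOG_MEL:
        return MIN_LOG_HZ * math.exp(LOG_STEP * (mel - MIN_LOG_MEL))
    return F_SP * mel


def slaney_hz_to_mel(freq):
    """Slaney-style conversion from Hz to mels for a single float."""
    if freq >= MIN_LOG_HZ:
        return MIN_LOG_MEL + math.log(freq / MIN_LOG_HZ) / LOG_STEP
    return freq / F_SP


low = slaney_mel_to_hz(3.0)
boundary = slaney_mel_to_hz(15.0)
restored = slaney_mel_to_hz(slaney_hz_to_mel(4000.0))
```

In this mapping, 3 mels correspond to 200 Hz in the linear region, and the boundary of roughly 15 mels corresponds to 1000 Hz.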