add user defined kaldi feature type #1101

Merged
merged 8 commits into from
Jul 21, 2023
Changes from 6 commits
10 changes: 10 additions & 0 deletions lhotse/bin/modes/kaldi.py
@@ -39,6 +39,14 @@ def kaldi():
type=int,
help="Number of jobs for computing recording durations.",
)
@click.option(
"-t",
"--feature-type",
default="kaldi-fbank",
show_default=True,
type=click.Choice(["kaldi-fbank", "kaldi-mfcc"]),
help="Feature type when importing precomputed features from feats.scp",
)
@click.option(
"-d",
"--compute-durations",
@@ -55,6 +63,7 @@ def import_(
frame_shift: float,
map_string_to_underscores: Optional[str],
num_jobs: int,
feature_type: str,
compute_durations: bool,
):
"""
@@ -70,6 +79,7 @@ def import_(
map_string_to_underscores=map_string_to_underscores,
num_jobs=num_jobs,
use_reco2dur=not compute_durations,
feature_type=feature_type,
)
manifest_dir = Path(manifest_dir)
manifest_dir.mkdir(parents=True, exist_ok=True)
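The CLI change above relies on each declared option arriving at the command function as a keyword argument, which is why the option and the `feature_type` parameter have to be added together. A minimal sketch of that wiring follows. It is an assumption-laden illustration: it uses the standard library's `argparse` in place of `click` so it runs without third-party packages, and `build_parser`, `import_sketch`, and `run` are hypothetical names, not part of lhotse.

```python
import argparse
from typing import Dict, List, Union

# Choices copied from the diff above.
FEATURE_TYPES = ["kaldi-fbank", "kaldi-mfcc"]


def build_parser() -> argparse.ArgumentParser:
    """Declare the options (argparse stand-in for the click decorators)."""
    parser = argparse.ArgumentParser(prog="import-sketch")
    parser.add_argument("-j", "--num-jobs", type=int, default=1)
    parser.add_argument(
        "-t",
        "--feature-type",
        choices=FEATURE_TYPES,
        default="kaldi-fbank",
        help="Feature type when importing precomputed features from feats.scp",
    )
    return parser


def import_sketch(num_jobs: int, feature_type: str) -> Dict[str, Union[int, str]]:
    """Hypothetical stand-in for the command body.

    Every declared option must appear here as a parameter; otherwise the
    keyword-argument call in run() raises TypeError.
    """
    return {"num_jobs": num_jobs, "feature_type": feature_type}


def run(argv: List[str]) -> Dict[str, Union[int, str]]:
    args = build_parser().parse_args(argv)
    # "--feature-type" is exposed as the attribute/keyword "feature_type".
    return import_sketch(**vars(args))
```

Calling `run(["-t", "kaldi-mfcc"])` forwards the chosen value to the function, while a value outside the declared choices is rejected by the parser before the function is reached.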
3 changes: 2 additions & 1 deletion lhotse/kaldi.py
@@ -80,6 +80,7 @@ def load_kaldi_data_dir(
map_string_to_underscores: Optional[str] = None,
use_reco2dur: bool = True,
num_jobs: int = 1,
feature_type: str = 'kaldi-fbank'
Contributor

You also need to update the CLI options:

@click.option(
"-j",
"--num-jobs",
default=1,
type=int,
help="Number of jobs for computing recording durations.",
)

You can add another option for feature_type.

Contributor Author

updated

Contributor

You forgot to update the Python function that follows the options.

You have to add feature_type as an argument to that function.

Contributor Author

Sorry for my carelessness. Please let me know if any further modifications are needed.

) -> Tuple[RecordingSet, Optional[SupervisionSet], Optional[FeatureSet]]:
"""
Load a Kaldi data directory and convert it to a Lhotse RecordingSet and
@@ -239,7 +240,7 @@ def fix_id(t: str) -> str:

features.append(
Features(
- type="kaldi_native_io",
+ type=feature_type,
num_frames=mat_shape.num_rows,
num_features=mat_shape.num_cols,
frame_shift=frame_shift,
Binary file modified test/fixtures/mini_librispeech2/lhotse/features.jsonl.gz
Binary file not shown.
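
As a rough illustration of what the `lhotse/kaldi.py` hunk changes, the sketch below shows a caller-supplied feature type being stamped onto each feature manifest in place of a hard-coded string. `FeaturesSketch` and `build_features` are hypothetical stand-ins that carry only the fields visible in the diff; they are not lhotse's actual `Features` class or loader.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FeaturesSketch:
    """Hypothetical stand-in with only the fields shown in the diff."""

    type: str
    num_frames: int
    num_features: int
    frame_shift: float


def build_features(
    shapes: List[Tuple[int, int]],
    frame_shift: float,
    feature_type: str = "kaldi-fbank",
) -> List[FeaturesSketch]:
    """Create one manifest per (num_rows, num_cols) matrix shape.

    The type field comes from the caller instead of a fixed string.
    """
    return [
        FeaturesSketch(
            type=feature_type,
            num_frames=rows,
            num_features=cols,
            frame_shift=frame_shift,
        )
        for rows, cols in shapes
    ]
```

With this shape, importing MFCC features amounts to passing `feature_type="kaldi-mfcc"`, and callers that omit the argument get the `kaldi-fbank` default from the diff.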