Alignment "doubling" #31

Open
aolney opened this issue May 30, 2019 · 3 comments

aolney commented May 30, 2019

align.sh is doubling output, and the times are way off. Here is the STM, which was generated from the SRT subtitles (CC) from FFMPEG:

11	A	FakeSpeaker	3.103	5.606	and now a fireside chat
11	A	FakeSpeaker	5.606	6.607	with the creators of comedy central's south park,
11	A	FakeSpeaker	6.607	8.609	matt stone and trey parker.
11	A	FakeSpeaker	13.614	15.116	hi. i'm trey parker.
11	A	FakeSpeaker	15.115	16.617	and i'm matt stone.

Here is the ali file:

1-1-S0---0025.380-0032.830 1 25.38 0.06 now
1-1-S0---0025.380-0032.830 1 25.44 0.00 a
1-1-S0---0025.380-0032.830 1 25.44 0.00 fireside
1-1-S0---0025.380-0032.830 1 25.44 1.77 chat
1-1-S0---0025.380-0032.830 1 27.21 0.09 now
1-1-S0---0025.380-0032.830 1 27.30 0.36 a
1-1-S0---0025.380-0032.830 1 27.66 0.87 fireside
1-1-S0---0025.380-0032.830 1 28.53 0.24 chat
1-1-S0---0025.380-0032.830 1 28.77 0.06 now
1-1-S0---0025.380-0032.830 1 28.83 0.12 a
1-1-S0---0025.380-0032.830 1 28.95 1.08 fireside
1-1-S0---0025.380-0032.830 1 30.03 0.63 chat
1-1-S0---0025.380-0032.830 1 30.66 0.12 now
1-1-S0---0025.380-0032.830 1 30.78 0.45 a
1-1-S0---0025.380-0032.830 1 31.23 0.39 fireside
1-1-S0---0025.380-0032.830 1 31.62 1.20 chat
1-1-S0---0032.830-0051.960 1 32.83 0.03 the
1-1-S0---0032.830-0051.960 1 32.86 0.99 creators
1-1-S0---0032.830-0051.960 1 33.85 0.12 of
1-1-S0---0032.830-0051.960 1 33.97 1.29 comedy
1-1-S0---0032.830-0051.960 1 35.26 4.26 central's
1-1-S0---0032.830-0051.960 1 39.52 0.00 south
1-1-S0---0032.830-0051.960 1 39.52 3.51 <unk>
1-1-S0---0032.830-0051.960 1 43.03 0.24 the
1-1-S0---0032.830-0051.960 1 43.27 0.87 creators
1-1-S0---0032.830-0051.960 1 44.14 0.06 of
1-1-S0---0032.830-0051.960 1 44.20 1.08 comedy
1-1-S0---0032.830-0051.960 1 45.28 1.47 central's
1-1-S0---0032.830-0051.960 1 46.75 0.00 south
1-1-S0---0032.830-0051.960 1 46.75 3.09 <unk>
1-1-S0---0032.830-0051.960 1 49.84 0.18 the
1-1-S0---0032.830-0051.960 1 50.02 0.54 creators
1-1-S0---0032.830-0051.960 1 50.56 0.03 of
1-1-S0---0032.830-0051.960 1 50.59 0.00 comedy
1-1-S0---0032.830-0051.960 1 50.59 1.11 central's
1-1-S0---0051.960-0064.490 1 51.96 0.42 stone
1-1-S0---0051.960-0064.490 1 52.38 0.09 and
1-1-S0---0051.960-0064.490 1 52.47 0.33 trey
1-1-S0---0051.960-0064.490 1 52.80 0.00 <unk>
1-1-S0---0051.960-0064.490 1 52.80 0.00 stone
1-1-S0---0051.960-0064.490 1 52.80 0.27 and
1-1-S0---0051.960-0064.490 1 53.07 0.00 trey
1-1-S0---0051.960-0064.490 1 53.07 0.00 <unk>
1-1-S0---0051.960-0064.490 1 53.07 0.00 stone
1-1-S0---0051.960-0064.490 1 53.07 0.00 and
1-1-S0---0051.960-0064.490 1 53.07 0.00 trey
1-1-S0---0051.960-0064.490 1 53.07 0.00 <unk>
1-1-S0---0051.960-0064.490 1 53.07 0.00 stone

Any suggestions would be appreciated. Regular ASR functionality (with Kaldi) is working fine. FWIW, my steps and utils are linked to Kaldi and not to Eesen.
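
One rough way to quantify the doubling is to count how many aligned words each utterance ID carries in the .ali output and compare that with the word count of the corresponding STM line. The following is only a sketch, assuming the five-column layout shown in the listing above (utterance ID, channel, start, duration, word); the function name `ali_word_counts` is made up for illustration and is not part of the transcriber scripts.

```shell
# Hypothetical helper: count aligned words per utterance ID in an .ali file.
# Assumes the column layout shown above: <utt-id> <channel> <start> <dur> <word>
ali_word_counts() {
  awk '{ n[$1]++ } END { for (id in n) print id, n[id] }' "$1" | sort
}

# Example call (path is illustrative):
# ali_word_counts build/output/myfile.ali
```

In the listing above, the first utterance ID appears on 16 lines, which is the same four-word phrase aligned four times over.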

Author

aolney commented May 31, 2019

The problem seemed to be in the Makefile. Instead of using the STM, it was running LIUM. Below is my align.sh that seems to have fixed this problem:

#!/bin/bash

# Copyright 2016  er1k
# Apache 2.0

# Prepare data for, and run align_ctc_utts.sh script that generates word-level alignments
# in an "Eesen Transcriber-centric" way; output is found in build/output/<basename>.ali

# Required inputs:
#
# * a 'hypothesis' text file for which to compute alignments, extension .txt
#   one utterance per line. If no hypothesis text is found, text
#   is obtained from the STM file below
# * an STM file with utterance/segment timings - 'perfect' transcription
# * an audio file, extension can vary (.mp3, .wav, .mp4 etc)

BASEDIR=$(dirname $0)
EESEN_ROOT=~/eesen

# Change these if you're using different models 
#GRAPH_DIR=$EESEN_ROOT/asr_egs/tedlium/v2-30ms/data/lang_phn_test_test_newlm
GRAPH_DIR=$EESEN_ROOT/asr_egs/tedlium/v2-30ms/data/lang_phn_test
MODEL_DIR=$EESEN_ROOT/asr_egs/tedlium/v2-30ms/exp/train_phn_l5_c320_v1s

# Defaults
frame_shift=0.03  # 30 ms frames
lm_weight=0.8     # same as best setting for 30ms eesen tedlium transcriber

. path.sh
. $BASEDIR/utils/parse_options.sh

filename=$(basename "$1")
basename="${filename%.*}"
dirname=$(dirname "$1")
extension="${filename##*.}"

cd $BASEDIR
echo "In $BASEDIR"

if [ $# -ne 1 ]; then
  echo "Usage: align.sh <basename>.{wav,mp3,mp4,sph}"
  echo " in same folder is test text named <basename>.txt"
  echo " and STM file named <basename>.stm (for segments)"
  echo " ./align.sh /vagrant/GaryFlake_2010.wav"
  echo " output is build/output/<basename>.ali"
  exit 1;
fi

mkdir -p $BASEDIR/build/audio/base $BASEDIR/build/output

# un-shorten-ify SPH files
#if [ $extension == "sph" ]; then
#    sph2pipe $1 > build/audio/base/$basename.unshorten
#    sox build/audio/base/$basename.unshorten -c 1 build/audio/base/$basename.wav rate -v 16k
#fi

mkdir -p $BASEDIR/src-audio
cp $1 $BASEDIR/src-audio
#prefixing with BASEDIR throws off make rule?
#make $BASEDIR/build/audio/base/$basename.wav
make build/audio/base/$basename.wav

# 8k
# sox $1 -c 1 -e signed-integer build/audio/base/$basename.wav rate -v 8k

mkdir -p $BASEDIR/build/diarization/$basename
# make STM from cha
if [ -f $dirname/$basename.cha -a ! -f $dirname/$basename.stm ]; then
  local/cha2stm.sh $dirname/$basename.cha | sed 's/xxx/\<unk\>/g' > build/output/$basename.stm
elif [ -f $dirname/$basename.stm ]; then
  cp $dirname/$basename.stm build/output/
elif [ ! -f $dirname/$basename.stm ]; then
  echo "Needs either a .cha or .stm file to get utterances"
  exit 1
fi

#if [ ! -f $dirname/$basename.txt ]; then
#  echo "Needs .txt file with utterance per line as reference text to align"
#  exit 1
#fi

# make segments from $1.stm
cat build/output/$basename.stm | grep -v ';;' | grep -v "inter_segment_gap" | grep -v "ignore_time_segment_in_scoring" | awk '{OFMT = "%.0f"; print $1,$2,$4*100,($5-$4)*100,"M S U",$2}' > build/diarization/$basename/show.seg


# Generate features
cd $BASEDIR
rm -rf build/trans/$basename

make SEGMENTS=show.seg build/trans/$basename/fbank

# Expect test text in format with utterance IDs per line
uttdata=build/trans/$basename
#if [ -f $dirname/$basename.txt ];
#  then
#    echo "Aligning text found at $dirname/$basename.txt"
#    cat $dirname/$basename.txt | awk '{print NR" "$0}' > $uttdata/text
#  else
    echo "Aligning text found in build/output/$basename.stm"
    cat build/output/$basename.stm | awk '{$1="";$2="";$3="";$4="";$5="";$6=""; print NR$0}' \
	| sed 's/ \+/ /' > $uttdata/text
#fi
cp build/diarization/$basename/show.seg $uttdata

#local/align_ctc_multi_utts.sh --acoustic_scale 0.8 $GRAPH_DIR $GRAPH_DIR $uttdata  $MODEL_DIR $uttdata/align
#                                                   <langdir>  <data>     <uttdata> <mdldir>   <dir>
local/align_ctc_multi_utts.sh --acoustic_scale $lm_weight $GRAPH_DIR $GRAPH_DIR $uttdata  $MODEL_DIR $uttdata/align

# Copy results to someplace useful
cp $uttdata/align/ali build/output/$basename.ali
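
To see what the "make segments" step in the script above writes to show.seg, the same grep/awk pipeline can be run on its own. This is a sketch that wraps that pipeline in a function; the name `stm_to_seg` is hypothetical, and the pipeline itself is copied from the script without behavioural changes.

```shell
# Hypothetical wrapper around the awk conversion used in align.sh above.
# Start ($4) and duration ($5-$4) are multiplied by 100, so they are written
# in hundredths of a second; OFMT="%.0f" rounds them to whole numbers.
stm_to_seg() {
  grep -v ';;' "$1" \
    | grep -v 'inter_segment_gap' \
    | grep -v 'ignore_time_segment_in_scoring' \
    | awk '{ OFMT = "%.0f"; print $1, $2, $4*100, ($5-$4)*100, "M S U", $2 }'
}

# Example call (path is illustrative):
# stm_to_seg build/output/myfile.stm
```

For the first STM line posted in the opening comment (start 3.103, end 5.606) this gives `11 A 310 250 M S U A`, which makes it easy to check by eye whether the segment times match the subtitles.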

@aolney aolney closed this as completed May 31, 2019
@aolney aolney reopened this May 31, 2019
Contributor

fmetze commented Jun 4, 2019

Will need to look into this some other time. Please let me know if you have other information or updates.

Author

aolney commented Jun 4, 2019

Only that once the STM was properly used, the doubling issue went away. However, the alignments still seemed off.
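
One possible contributor, judging only from the files posted in this thread: the STM lines above have five fields before the transcript (file, channel, speaker, start, end), while the text-extraction awk in align.sh blanks fields 1 through 6. That would drop the first word of every utterance, which is consistent with the ali listing starting at "now", "the" and "stone" rather than "and", "with" and "matt". Below is a sketch of an extraction that only skips field 6 when it looks like a label. The function name `stm_to_text`, the label pattern, and the filtering of `;;` comment lines before numbering are all assumptions, not something taken from the transcriber scripts.

```shell
# Hypothetical helper: print "<line-number> <transcript>" for each STM line.
# Assumption: a label field, when present, is field 6 and looks like
# <o,f0,male> (angle brackets containing a comma); otherwise the transcript
# starts at field 6. Lines starting with ";;" are dropped before numbering.
stm_to_text() {
  grep -v '^;;' "$1" | awk '{
    first = 6
    if ($6 ~ /^<[^>]*,[^>]*>$/) first = 7   # skip the optional label field
    line = NR
    for (i = first; i <= NF; i++) line = line " " $i
    print line
  }'
}

# Example call (paths are illustrative):
# stm_to_text build/output/myfile.stm > build/trans/myfile/text
```

Requiring a comma inside the brackets is meant to keep a leading `<unk>` token from being mistaken for a label. Whether this changes the alignment quality has not been tested here.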
