@naxingyu
Contributor

@naxingyu naxingyu commented Oct 10, 2016

Placeholder for addressing #1031. WIP log:

  1. self_loop_pdf_class added to HmmState, done
  2. self_loop_pdf added to Tuple in TransitionModel. done
  3. another branch of ContextDependencyInterface::GetPdfInfo. ugly done
  4. create test code for new structures. done
  5. backward compatibility for all read code. done
  6. normal HMM validation using RM. done
  7. chain code modification. done
  8. chain validation using RM. done
  9. iterate 2nd version of GetPdfInfo. done
  10. documents and comments. tbd...

@danpovey
Contributor

@naxingyu, any progress on this? Any intermediate work to push?

@naxingyu
Contributor Author

Yes, I'll commit something today.

X.

@naxingyu
Contributor Author

@danpovey It's still a rough draft with lots of TODOs (tuple-related modifications, testing, documentation, etc.), but the basic idea is in place. The 2nd version of GetPdfInfo is implemented using brute-force enumeration. Please check whether I got the intention right.

@danpovey danpovey left a comment

I notice from the travis build that hmm-utils.cc failed.

entries_[i].resize(thist_sz);
for (int32 j = 0 ; j < thist_sz; j++) {
ReadBasicType(is, binary, &(entries_[i][j].pdf_class));
ReadBasicType(is, binary, &(entries_[i][j].self_loop_pdf_class));

Remember that the reading code needs to be backward compatible with previously written models.
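
One way to keep old models readable is to treat the new field as optional when parsing. The following is a minimal sketch, not the code in this PR: `StateSketch` and `ParseStateSketch` are hypothetical names, plain `int` stands in for `int32`, and it only assumes the text layout shown later in this thread (`<PdfClass> N`, optionally followed by `<SelfLoopPdfClass> M`).

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for HmmTopology::HmmState; not the real Kaldi class.
struct StateSketch {
  int forward_pdf_class = -1;    // -1 plays the role of kNoPdf
  int self_loop_pdf_class = -1;
};

// Parses one text-format <State> line. Old-format lines have no
// <SelfLoopPdfClass> token, so the self-loop pdf-class falls back to the
// forward pdf-class. Returns false only if a class value is not an integer.
bool ParseStateSketch(const std::string &line, StateSketch *state) {
  std::istringstream is(line);
  std::string token;
  bool saw_self_loop = false;
  while (is >> token) {
    if (token == "<PdfClass>") {
      if (!(is >> state->forward_pdf_class)) return false;
    } else if (token == "<SelfLoopPdfClass>") {
      if (!(is >> state->self_loop_pdf_class)) return false;
      saw_self_loop = true;
    }
    // Other tokens (<State>, <Transition> and their arguments) are skipped.
  }
  if (!saw_self_loop) state->self_loop_pdf_class = state->forward_pdf_class;
  return true;
}
```

With this shape, a line written before the change yields equal forward and self-loop classes, and a non-emitting state such as `<State> 1 </State>` leaves both at -1.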

KALDI_ERR << "HmmTopology::Check(), duplicate transition found.";
if (dst_state == k) { // self_loop...
KALDI_ASSERT(entries_[i][j].pdf_class != kNoPdf && "Nonemitting states cannot have self-loops.");
KALDI_ASSERT(entries_[i][j].self_loop_pdf_class != kNoPdf && "Nonemitting states cannot have self-loops.");

line length.

/// but may be different to enable us to hardwire sharing of state, and may be
/// equal to \ref kNoPdf == -1 in order to specify nonemitting states (unusual).
int32 pdf_class;
int32 self_loop_pdf_class;

you should probably have a comment for this.


explicit HmmState(int32 p): pdf_class(p) { }
explicit HmmState(int32 p): pdf_class(p),self_loop_pdf_class(p) { }
explicit HmmState(int32 p, int32 self_p) {

Better to rename these to pdf_class and self_loop_pdf_class [the constructor still works even when the parameter names are the same as the member variable names.]
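
The naming point can be illustrated with a small sketch. `HmmStateSketch` is a hypothetical, simplified stand-in for the real struct, and `int` replaces `int32`; it only shows that a member initializer list resolves the name outside the parentheses as the member and the name inside as the parameter.

```cpp
#include <cassert>

// Hypothetical simplified version of HmmState, for illustration only.
struct HmmStateSketch {
  int pdf_class;
  int self_loop_pdf_class;

  // Parameter names deliberately match the member names. In the initializer
  // list, 'pdf_class(pdf_class)' initializes the member from the parameter.
  explicit HmmStateSketch(int pdf_class)
      : pdf_class(pdf_class), self_loop_pdf_class(pdf_class) { }
  HmmStateSketch(int pdf_class, int self_loop_pdf_class)
      : pdf_class(pdf_class), self_loop_pdf_class(self_loop_pdf_class) { }
  HmmStateSketch() : pdf_class(-1), self_loop_pdf_class(-1) { }
};
```

Note that this only holds in the initializer list; inside the constructor body the parameter would shadow the member, so assignments there would need `this->`.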

}

HmmState(): pdf_class(-1) { }
HmmState(): pdf_class(-1),self_loop_pdf_class(-1) { }

space after comma, and put 2 spaces before HmmState().

to_hmm_state_list[std::make_pair(phone, pdf_class)].push_back(j);

if (SelfLoopEqualsForward()) {
// this branch deals with when self_loop_pdf_class is always the same as the pdf_class

you could comment that this is the case for normal models, but not for chain models.

for (int32 j = 0; j < static_cast<int32>(pdf_info[i].size()); j++) {
int32 pdf_class = pdf_class_pairs[i][j].first,
self_loop_pdf_class = pdf_class_pairs[i][j].second;
const std::vector<int32> &state_vec = to_hmm_state_list[phone][std::make_pair(pdf_class, self_loop_pdf_class)];

line length

}
} else {
std::vector<std::vector<std::vector<std::pair<int32, int32> > > > pdf_info;
std::vector<std::vector<std::pair<int32, int32> > > pdf_class_pairs;

need a comment saying what pdf_info represents.-- and of course you have to figure out pdf_class_pairs, and also explain what it is.

@naxingyu naxingyu force-pushed the modify-transition-model branch 3 times, most recently from 2511125 to aaeb2e4 Compare October 21, 2016 02:15
@naxingyu
Contributor Author

@danpovey comments are resolved. And I have updated the task list here.

@danpovey
Contributor

Thanks!
I'll check it more thoroughly when it's done, or closer to done.

@naxingyu naxingyu force-pushed the modify-transition-model branch from aaeb2e4 to 4a3a250 Compare October 22, 2016 02:30
@naxingyu
Contributor Author

naxingyu commented Oct 25, 2016

While validating the GMM models, I found that this line only works in Perl 5.12 and later. Modifying the while loop to

@ans = readdir $dh;

will work for most versions of Perl.

@naxingyu
Contributor Author

@danpovey comments are resolved.
I have updated the task list here
#1105 (comment)
Please see if it makes sense.

@danpovey
Contributor

Regarding readdir, do you have time to make a separate PR to fix that? Or is the fix in this PR?
Check the Travis logs, the checks failed. I'll look into this when it's all finished-- sorry, I'm just super busy right now. I'm sure any problems will be minor.

@naxingyu
Contributor Author

I'm running the RM recipe. The modified version created exactly the same model,
graph and alignments as the original version on the mono and tri1 stages.
Differences occur on the tri2b model. Debugging...

X.

@danpovey
Contributor

As long as the WER is in the same ballpark it's OK. None of this is
completely deterministic, esp. if you have machines with different OS
versions or hardware in your cluster. [differences in BLAS, for instance.]

@naxingyu
Contributor Author

I'll make a separate PR for readdir.
About the Travis failure: yes, that was a trial commit. Now there is this weird thing; please check it when you are not busy. We have this inline member function. Defined this way, it returns the wrong pdf_id, even though tuples_ stores the correct transition information, as I checked. This results in incorrect LDA accumulators, leading to the tri2b flaw that I reported. If I instead define it as a non-inline member function and put the definition in the .cc file, the whole toolkit compiles flawlessly and the RM results of all non-chain stages are correct when running on my cluster. However, when I push the change, Travis throws this failure.

/home/travis/build/kaldi-asr/kaldi/src/transform/fmpe.cc:540: undefined reference to `kaldi::TransitionModel::TransitionIdToPdf(int) const'
collect2: error: ld returned 1 exit status

@danpovey
Contributor

Probably it didn't work on travis because you left 'inline' in the declaration.
In any case, your original problem could have been fixed by doing 'make depend' and recompiling. This kind of thing happens when a class size changes due to new members, and code is not recompiled.

@naxingyu naxingyu force-pushed the modify-transition-model branch from 703cb90 to bbfef0a Compare October 26, 2016 04:40
@naxingyu
Contributor Author

On 2016/10/26 10:47, Daniel Povey wrote:

> Probably it didn't work on travis because you left 'inline' in the
> declaration.

No, I removed 'inline' in the declaration. Otherwise it won't work on my
cluster.

> In any case, your original problem could have been fixed by doing
> 'make depend' and recompiling. This kind of thing happens when a class
> size changes due to new members, and code is not recompiled.

OK, I'll try that. Thanks. Will keep you posted.


@naxingyu
Contributor Author

naxingyu commented Nov 1, 2016

I've been stuck on building the chain tree for a few days. I started with this topology.

<Topology>
<TopologyEntry>
<ForPhones>
1 2 3 4 5 6 7 8 9 10 11 12 13 ... 192 193
</ForPhones>
<State> 0 <PdfClass> 0 <SelfLoopPdfClass> 1 <Transition> 0 0.5 <Transition> 1 0.5 </State>
<State> 1 </State>
</TopologyEntry>
</Topology>

The mono.{mdl,tree} are correctly initialized with tuple-type transitions.

Transition-state 1: phone = sil hmm-state = 0 pdf = 0 self-loop-pdf = 1
 Transition-id = 1 p = 0.5 [self-loop]
 Transition-id = 2 p = 0.5 [0 -> 1]
Transition-state 2: phone = sil_B hmm-state = 0 pdf = 0 self-loop-pdf = 1
 Transition-id = 3 p = 0.5 [self-loop]
 Transition-id = 4 p = 0.5 [0 -> 1]
Transition-state 3: phone = sil_E hmm-state = 0 pdf = 0 self-loop-pdf = 1
 Transition-id = 5 p = 0.5 [self-loop]
 Transition-id = 6 p = 0.5 [0 -> 1]
Transition-state 4: phone = sil_I hmm-state = 0 pdf = 0 self-loop-pdf = 1
 Transition-id = 7 p = 0.5 [self-loop]
 Transition-id = 8 p = 0.5 [0 -> 1]
Transition-state 5: phone = sil_S hmm-state = 0 pdf = 0 self-loop-pdf = 1
 Transition-id = 9 p = 0.5 [self-loop]
 Transition-id = 10 p = 0.5 [0 -> 1]
Transition-state 6: phone = aa_B hmm-state = 0 pdf = 2 self-loop-pdf = 3
 Transition-id = 11 p = 0.5 [self-loop]
 Transition-id = 12 p = 0.5 [0 -> 1]
Transition-state 7: phone = aa_E hmm-state = 0 pdf = 2 self-loop-pdf = 3
 Transition-id = 13 p = 0.5 [self-loop]
...

And the converted alignments seem good. However, the output tree does not contain SE -1 nodes, meaning that the pdf-class is never queried in the final tree. As a result, the new GetPdfInfo pools the same vectors of pdf-ids for pdf-class and self-loop-pdf-class. Currently I'm tracing the BuildTree code, and mostly the EventMap code, to see what went wrong.

@danpovey
Contributor

danpovey commented Nov 1, 2016

The place where it's going wrong might be in acc-tree-stats.
Look at tree-accu.cc line 90 where it does:

    int32 pdf_class = trans_model.TransitionIdToPdfClass(
        split_alignment[i+info.central_position][j]);
    // pdf_class will normally be 0, 1 or 2 for 3-state HMM.
    std::pair<EventKeyType, EventValueType> pr(kPdfClass, pdf_class);
    evec_more.push_back(pr);

The problem might be in the function TransitionIdToPdfClass().
Remember that you can no longer look up the pdf-class via the
transition-state, you have to take into account whether it's a self loop.

Dan
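
The lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the TransitionModel code: `TupleSketch`, `TransitionSketch` and `TransitionModelSketch` are hypothetical types, indices are 0-based here for brevity (Kaldi's transition-ids and transition-states are 1-based), and each transition-id is assumed to record whether it is a self-loop.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-transition-state record holding both pdf-classes.
struct TupleSketch {
  int forward_pdf_class;
  int self_loop_pdf_class;
};

// Hypothetical per-transition-id record.
struct TransitionSketch {
  int trans_state;    // index into 'tuples'
  bool is_self_loop;  // true if source and destination HMM-state coincide
};

struct TransitionModelSketch {
  std::vector<TupleSketch> tuples;            // indexed by transition-state
  std::vector<TransitionSketch> transitions;  // indexed by transition-id

  // The transition-state alone is no longer enough: the answer depends on
  // whether this particular transition-id is the self-loop.
  int TransitionIdToPdfClass(int trans_id) const {
    const TransitionSketch &t = transitions[trans_id];
    const TupleSketch &tuple = tuples[t.trans_state];
    return t.is_self_loop ? tuple.self_loop_pdf_class
                          : tuple.forward_pdf_class;
  }
};
```

For the topology quoted earlier (forward pdf-class 0, self-loop pdf-class 1), the self-loop transition-id would map to class 1 and the forward one to class 0, so the tree statistics see both classes.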

@naxingyu
Contributor Author

naxingyu commented Nov 2, 2016

Nice. A condition of IsSelfLoop() before getting pdf_class works. :)

X.

@naxingyu
Contributor Author

naxingyu commented Nov 2, 2016

Now that I get pdf_ids and self_loop_pdf_ids separately for the central
phone, and the brute-force combination makes more transition states
than necessary, I see what your proposal really means...
That's smart. Thanks.

X.

@naxingyu
Contributor Author

naxingyu commented Nov 2, 2016

Suppose, for phone aa_B, I have 14 forward pdfs and 12 self-loop pdfs. In the old format, there will be about 26 transition states. In the new format, I need to find possible combinations in the contexts of this phone from the 168 candidates. A naive progressive algorithm is

for phones:
    query forward_pdfs
    if forward_pdf more than 1:
        for phones:
            add phone to right context
            query forward_pdfs
            if forward_pdf more than 1:
                for phones:
                    add phone to left context
                    query forward_pdf (singleton)
                    query self_loop_pdf (singleton)
                    push (forward_pdf, self_loop_pdf) pair to pdf_info
            else:
                query self_loop_pdfs using partial context
                push (forward_pdfs, self_loop_pdfs) pairs to pdf_info
    else:
        query self_loop_pdfs using partial (only central) context
        push (forward_pdfs, self_loop_pdfs) pairs to pdf_info
    sort and uniq pdf_info of this phone

Do you think that this could shrink the number of pdf pairs from 168 to nearly half of the old format, i.e. 13? I'm trying it anyway...

@naxingyu
Contributor Author

naxingyu commented Nov 2, 2016

The example above ended with 53 transition states, compared with 26 transition states using the old topology. And that will double the size of the HMM fst. Is there something wrong with the algorithm?

@danpovey
Contributor

danpovey commented Nov 2, 2016

A larger number of transition-states is expected. It may not impact the
size of the graph as much as you think-- most states of the graph have
known left and right context as they are parts of linear chains, and having
more transition states does not necessarily blow the graph up
proportionally.

Regarding your algorithm-- I have a couple of issues with it: it treats the
forward and self-loop pdf differently; and it seems to make some
assumptions about the left and right context being symmetric.

I would use a recursive algorithm, something like the following.
(I'm not adding all args to the function; assume the central position P,
the phone list, the tree, etc., are known or are class members.)

// 'context' is the context-window of phones, of length N, with -1 for
// those positions where the phone is currently unknown (treated as
// wildcards); at least the central phone [position P] must be a real
// phone, i.e. not -1.
// This function inserts any allowed pairs (forward_pdf, self_loop_pdf)
// into the set "pairs".
void EnumeratePairs(int32 self_loop_pdf_class,
                    int32 forward_pdf_class,
                    const std::vector<int32> &context,
                    std::unordered_set<pair<int32,int32>, PairHasher> *pairs) {
  int32 position;
  // Choose 'position' as a phone position in 'context' that's currently
  // -1, and that is as close as possible to the central position P.
  std::vector<int32> new_context(context);
  for each phone {
    new_context[position] = phone;
    std::vector<int32> forward_pdfs, self_loop_pdfs;
    // Query the tree twice using MultiMap to get the lists of possible
    // pdfs 'forward_pdfs' and 'self_loop_pdfs' [only the pdf-class will
    // differ in the 2 calls.]
    if (forward_pdfs.size() == 1 || self_loop_pdfs.size() == 1) {
      // trivially enumerate the pairs and insert them into 'pairs'.
    } else {
      // call EnumeratePairs with 'new_context'
    }
  }
}
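
A compilable variation of the outline above is sketched here, under several assumptions that are not part of the PR. The tree query is abstracted as a hypothetical `TreeQuery` callback standing in for a MultiMap-style lookup, `std::set` replaces the unordered set to avoid needing a pair hasher, the queries are done on entry to the function instead of inside the loop, and a fully specified context is treated as a base case so that the recursion always terminates.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <set>
#include <utility>
#include <vector>

typedef std::vector<int> Context;  // -1 marks an unknown (wildcard) position

// Hypothetical stand-in for the tree lookup: returns every pdf-id that
// 'pdf_class' can map to given a partially specified context.
typedef std::function<std::set<int>(const Context &, int)> TreeQuery;

void EnumeratePairs(const TreeQuery &query, const std::vector<int> &phones,
                    int central_position, int forward_pdf_class,
                    int self_loop_pdf_class, const Context &context,
                    std::set<std::pair<int, int> > *pairs) {
  const std::set<int> forward_pdfs = query(context, forward_pdf_class);
  const std::set<int> self_loop_pdfs = query(context, self_loop_pdf_class);

  // Pick the unknown position closest to the central position, if any.
  int position = -1;
  for (int i = 0; i < static_cast<int>(context.size()); i++) {
    if (context[i] != -1) continue;
    if (position == -1 || std::abs(i - central_position) <
                              std::abs(position - central_position))
      position = i;
  }

  // If either side is already unique (or nothing is left to expand), all
  // combinations of the two sets are valid pairs.
  if (forward_pdfs.size() == 1 || self_loop_pdfs.size() == 1 ||
      position == -1) {
    for (int f : forward_pdfs)
      for (int s : self_loop_pdfs)
        pairs->insert(std::make_pair(f, s));
    return;
  }

  // Otherwise fix one more phone of context and recurse.
  Context new_context(context);
  for (int phone : phones) {
    new_context[position] = phone;
    EnumeratePairs(query, phones, central_position, forward_pdf_class,
                   self_loop_pdf_class, new_context, pairs);
  }
}
```

In a toy setting where both pdfs depend only on the left phone, this returns one pair per left phone, whereas taking the product of the two central-phone-only lists would return every combination.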

@danpovey danpovey left a comment

Some comments on this work in progress commit.

leftmost_questions_truncate=10
tree_stats_opts=
cluster_phones_opts=
cluster_phones_opts="--pdf-class-list=0"

Was it necessary for some reason to change this?
This will have the effect that for finding the phone classes, it just uses the 1st transition of the phone, rather than the self-loop. it's not clear to me that this would be better.
Actually, I can see why this would have been necessary when you had that bug and the tree was not being built right, but this is probably not needed now.

cp $alidir/cmvn_opts $dir 2>/dev/null # cmn/cmvn option.
cp $alidir/delta_opts $dir 2>/dev/null # delta option.

utils/lang/check_phones_compatible.sh $lang/phones.txt $alidir/phones.txt || exit 1;

is there a reason you deleted these lines? Perhaps something went wrong in merge?

print("<State> 1 <PdfClass> 1 <Transition> 1 0.5 <Transition> 2 0.5 </State>")
print("<State> 2 </State>")
print("<State> 0 <PdfClass> 0 <SelfLoopPdfClass> 1 <Transition> 0 0.5 <Transition> 1 0.5 </State>")
#print("<State> 1 <PdfClass> 1 <Transition> 1 0.5 <Transition> 2 0.5 </State>")

you can remove this commented-out line. But you might want to put a comment at the top saying that this script was modified around such-and-such a date, when the code was extended to support having a different pdf-class on the self loop.

} else {
WriteIntegerVector(os, binary, phones_);
WriteIntegerVector(os, binary, phone2idx_);
if (!is_hmm) WriteBasicType(os, binary, static_cast<int32>(-1));

comment that this -1 is a signal that the object has the new, extended format with SelfLoopPdfClass.
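
The sentinel idea can be sketched in isolation. This is a simplified text-mode illustration, not the HmmTopology I/O code: `WriteSketch` and `ReadSketch` are hypothetical names, and it assumes the -1 marker is written in the position where an old-format file would have a count that can never be negative.

```cpp
#include <cassert>
#include <sstream>
#include <utility>
#include <vector>

// Each entry is (forward pdf-class, self-loop pdf-class).
typedef std::vector<std::pair<int, int> > Classes;

void WriteSketch(std::ostream &os, const Classes &classes, bool extended) {
  // -1 signals the extended format with a separate self-loop pdf-class.
  // An old-format file starts directly with the (non-negative) count.
  if (extended) os << -1 << ' ';
  os << classes.size() << ' ';
  for (const std::pair<int, int> &c : classes) {
    os << c.first << ' ';
    if (extended) os << c.second << ' ';
  }
}

bool ReadSketch(std::istream &is, Classes *classes) {
  int size;
  if (!(is >> size)) return false;
  const bool extended = (size == -1);
  if (extended && !(is >> size)) return false;  // real count follows marker
  if (size < 0) return false;
  classes->assign(size, std::make_pair(-1, -1));
  for (std::pair<int, int> &c : *classes) {
    if (!(is >> c.first)) return false;
    c.second = c.first;  // old format: self-loop class equals forward class
    if (extended && !(is >> c.second)) return false;
  }
  return true;
}
```

Writing the marker only when the two classes actually differ keeps files for ordinary models byte-compatible with older readers, which matches the `if (!is_hmm)` condition in the snippet under review.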

if (pdf_class != kNoPdf) {
to_hmm_state_list[std::make_pair(phone, pdf_class)].push_back(j);

if (IsHmm()) {

This function is too long.. you should probably break it out into a couple of different functions at least. If you don't need to export something to the header you can declare the function static.

@@ -178,6 +178,50 @@ void ContextDependency::Read (std::istream &is, bool binary) {
to_pdf_ = to_pdf;
}


put a newline after ( and move things to the left so that you don't go so much above 80 characters.

std::vector<EventAnswerType> self_loop_pdfs; // self-loop pdfs that can be at this pos as this phone.
to_pdf_->MultiMap(vec, &self_loop_pdfs);
SortAndUniq(&self_loop_pdfs);


Let's not use the term 'brute force' for this.. what I meant by brute force was, finding the exact list of pairs by enumerating all triphones (or n-phones, in general). Let's call this 'overgeneration' of pairs.
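To make the distinction concrete, here is a small sketch with a toy stand-in for the tree (`toy_tree` is hypothetical and has nothing to do with the real `to_pdf_` map). Exact enumeration pairs the forward and self-loop pdfs context by context, while overgeneration collects the two sets separately and takes their cross product:

```python
from itertools import product


def toy_tree(left, center, pdf_class):
    """Hypothetical decision tree: the answer depends on the parity of `left`."""
    return 10 * center + 2 * (left % 2) + pdf_class


def exact_pairs(center, left_contexts):
    # Pair the two pdfs that really co-occur in each context.
    return {(toy_tree(l, center, 0), toy_tree(l, center, 1))
            for l in left_contexts}


def overgenerated_pairs(center, left_contexts):
    # Collect each side independently, then pair everything with everything.
    forward = {toy_tree(l, center, 0) for l in left_contexts}
    self_loop = {toy_tree(l, center, 1) for l in left_contexts}
    return set(product(forward, self_loop))


exact = exact_pairs(3, [1, 2, 3, 4])
over = overgenerated_pairs(3, [1, 2, 3, 4])
```

In this toy case the exact set has two pairs and the overgenerated set has four; the overgenerated set always contains the exact one, at the cost of pairs that can never occur.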

@danpovey
Contributor

@naxingyu, any progress with this? I'm looking forward to finding out how much smaller the graph is (and hopefully there will be a reduction in decoding time as well, and maybe even small WER improvements due to better pruning).

@naxingyu naxingyu closed this Nov 11, 2016
@naxingyu
Contributor Author

I'm testing the recursive algorithm and found that there might be a problem with it. The algorithm might not find all possible pdf pairs, e.g. the pairs that only appear at phone sequence borders. This will cause a convert-ali failure. For instance, the alignment of this utterance in the RM training set

ahh05_sr301_trn  [ 2 8 5 18 17 17 17 17 17 ] [ 11392 11391 11391 11391 11400 11399 11399 11399 11420 11419 ] [ 824 823 823 852 851 874 ] [ 10290 10289 10302 10328 10327 ] [ 5422 5462 5461 5461 5461 5461 5461 5461 5510 5509 5509 5509 5509 ] [ 11852 11858 11857 11880 11879 ] [ 3166 3165 3184 3192 3191 ] [ 1634 1640 1639 1746 ] [ 7670 7669 7692 7691 7691 7691 7691 7691 7746 7745 7745 7745 7745 ] [ 4750 4749 4749 4749 4749 4749 4749 4749 4749 4749 4749 4749 4749 4749 4798 4797 4797 4806 4805 4805 ] [ 7390 7389 7438 7466 7465 7465 7465 ] [ 346 345 376 375 382 ] [ 7756 7755 7796 7818 7817 ] [ 11160 11174 11173 11173 11173 11173 11194 11193 11193 11193 ] [ 3742 3741 3741 3782 3781 3781 3810 3809 3809 3809 3809 3809 3809 ] [ 9050 9049 9162 9161 9161 9161 9166 9165 9165 ] [ 6156 6155 6220 6219 6262 6261 ] [ 1870 1908 1907 2006 ] [ 9532 9531 9531 9531 9548 9547 9547 9600 9599 9599 9599 ] [ 8748 8747 8747 8747 8747 8747 8747 8747 8808 8807 8807 8807 8807 8854 8853 8853 ] [ 4748 4747 4747 4747 4747 4786 4785 4810 ] [ 3428 3427 3448 3447 3466 ] [ 5656 5655 5702 5701 5701 5701 5752 5751 ] [ 8130 8129 8134 8133 8138 8137 8137 8137 8137 8137 8137 ] [ 11850 11849 11849 11849 11862 11861 11861 11861 11878 11877 11877 11877 11877 11877 ] [ 2 1 1 1 8 18 ] [ 1396 1395 1395 1395 1395 1400 1399 1399 1399 1399 1399 1399 1399 1496 1495 1495 1495 1495 1495 1495 ] [ 11216 11222 11221 11258 11257 11257 ] [ 3170 3180 3192 3191 ] [ 1606 1644 1643 1643 1748 ] [ 6512 6511 6511 6511 6511 6532 6531 6531 6531 6546 6545 6545 6545 6545 6545 6545 6545 ] [ 9074 9073 9136 9135 9210 9209 9209 9209 9209 ] [ 10982 10981 10981 11016 11015 11038 11037 11037 ] [ 11930 11929 11929 11954 11953 11964 11963 11963 ] [ 4230 4229 4229 4229 4310 4309 4309 4340 4339 4339 4339 4339 ] [ 2 8 18 ] [ 5424 5423 5423 5474 5473 5473 5473 5473 5473 5473 5490 5489 5489 5489 ] [ 7756 7755 7755 7796 7818 7817 7817 ] [ 2468 2467 2486 2485 2520 2519 2519 2519 2519 ] [ 520 519 519 519 519 519 519 519 519 519 519 519 519 519 519 578 
577 577 598 597 597 ] [ 9506 9505 9505 9552 9551 9616 ] [ 9422 9458 9457 9457 9457 9457 9496 9495 ] [ 10372 10371 10371 10406 10448 10447 ] [ 2772 2780 2779 2779 2792 ] [ 9054 9164 9163 9163 9163 9163 9163 9163 9163 9194 9193 ] [ 4748 4747 4747 4780 4779 4779 4779 4779 4779 4779 4779 4820 4819 4819 ] [ 10276 10300 10299 10299 10336 10335 10335 10335 10335 10335 10335 ]
ahh05_sr301_trn  sil                         w_B                                                             ah_I                        td_E                              ih_B                                                                 z_E                               dh_B                         ax_E                    n_B                                                                  ey_I                                                                                                    m_E                                    ae_B                    n_E                          v_B                                                             eh_I                                                                 r_I                                              iy_I                              ax_I                    s_E                                                        r_B                                                                                 ey_I                                        dx_I                         ih_I                                        ng_I                                                       z_E                                                                                     sil              ax_B                                                                                                    v_E                                     dh_B                    ax_E                         k_B                                                                                      r_I                                              uw_I                                                z_I                                                 er_E                                                            sil        ih_B                                                                      n_E                                    b_B                                              ae_I                                                                
                    s_E                               s_B                                         td_I                                    ch_I                         r_I                                                        ey_I                                                                      td_E

does not end with sil. The final phone window is (76, 159, 0), and its pdf pair is (216, 373). That pair only appears in this biphone context. However, the recursive algorithm only generates the pairs from full triphones, (76, 159, 1) to (96, 159, 194).

@naxingyu naxingyu force-pushed the modify-transition-model branch from 2baaf04 to 7788327 Compare November 21, 2016 03:31
@naxingyu
Contributor Author

@danpovey rebase works. ready now.

// length N, with -1 for those positions where phones
// that are currently unknown, treated as wildcards; at least
// the central phone [position P] must be a real phone, i.e.
// not -1.
Contributor

I'm a little confused now about how you solved the issue about word-boundaries.
The comment says that -1 is used for positions where we have not yet expanded the phone.
However, the actual code seems to be using 0 for the "not-expanded"/wildcard phone.
But for the left and right context when we hit the edge of the file, the phone should be set to zero. Not including zero as an option for positions other than position P_ would cause a problem. But I don't see how this can happen, given that you are (in the code) using 0 for the wildcard.

int32 pdf_class = pdf_class_pairs[phone][j].first,
self_loop_pdf_class = pdf_class_pairs[phone][j].second;
for (size_t win_start = 0; win_start < phone_window.size(); win_start++) {
if (win_start != P_)
Contributor

wrong indentation... and I think you should use -1 for the wildcard positions here, so you can use 0 for the context. And then you should probably modify EnumeratePairs() so that when it loops over the phones, it also includes 0 [meaning: hitting the edge of the file] in the context. I'm surprised this hasn't caused a crash... I thought you had been hit by this issue before and fixed it, so now I wonder what the fix was.
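The suggestion above can be sketched roughly as follows, assuming a triphone window and a toy tree (`toy_pdf`, `expand_windows` and `enumerate_pairs` are hypothetical names, not the PR's `EnumeratePairs()`). -1 marks a position that has not been expanded yet, and 0 is offered as a candidate for every context position so that edge-of-file windows are covered:

```python
WILDCARD = -1  # position not expanded yet
BOUNDARY = 0   # context fell off the edge of the file


def expand_windows(window, phones, central_pos):
    """Yield every fully specified window reachable from `window`."""
    if WILDCARD not in window:
        yield tuple(window)
        return
    pos = window.index(WILDCARD)
    # The central phone must already be a real phone, never a wildcard.
    assert pos != central_pos
    # Context positions may be a real phone or the boundary symbol 0.
    for candidate in [BOUNDARY] + list(phones):
        filled = list(window)
        filled[pos] = candidate
        yield from expand_windows(filled, phones, central_pos)


def toy_pdf(window, pdf_class):
    """Hypothetical tree: a boundary on the right gets its own pdfs."""
    left, center, right = window
    offset = 100 if right == BOUNDARY else 0
    return offset + 2 * center + pdf_class


def enumerate_pairs(center, phones, forward_class=0, self_loop_class=1):
    start = [WILDCARD, center, WILDCARD]
    return {(toy_pdf(w, forward_class), toy_pdf(w, self_loop_class))
            for w in expand_windows(start, phones, central_pos=1)}


pairs = enumerate_pairs(5, phones=[1, 2, 3])
windows = list(expand_windows([WILDCARD, 5, WILDCARD], [1, 2, 3], 1))
```

If 0 were left out of the candidate list, the pair that exists only at the right boundary would never be produced, which is the kind of miss reported earlier in the thread.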

@naxingyu
Contributor Author

I forgot to update this comment... Yes, I chose to use 0 for wildcards so
that pdfs on boundaries can be included by simply using Map rather than
MultiMap. I'll update this comment.

X.


@danpovey
Contributor

No, the code is not right... it will lead to a crash for some trees. You
need to make the wildcard phone -1, and include 0 as one of the possible
phones when you recurse at line 239.


std::vector<EventAnswerType> forward_pdfs, self_loop_pdfs;

int32 forward_pdf, self_loop_pdf;
if (Compute(phone_window, forward_pdf_class, &forward_pdf) &&
Contributor

Oh I see what you meant, with this code here, that's what prevented it from crashing, and it could be right.
But still, I think it would be better if you did it as I said-- that would be clearer, and may prevent unnecessary recursion.

Contributor

... and no need to re-test fully. It should lead to an identical transition-model, and you can do a 'diff' after rerunning the stage when you get the transition-model from the tree and the topology.

Contributor Author

Oh, I remember why it was written this way. When 0 is included in the event vector, MultiMap doesn't work correctly. I tried again and hit the same problem.

@naxingyu
Contributor Author

Got it.

@danpovey
Contributor

Also, after that, as one final check before I merge, can you please run the RM setup with the left-biphone-related options removed? I want to make sure it doesn't crash for triphone. No need to check any results for that; just run it and make sure there are no crashes.

@danpovey
Contributor

What happens specifically?

@danpovey
Contributor

... BTW, you shouldn't include 0 for the P'th position, only for the other
positions.

@naxingyu
Contributor Author

naxingyu commented Nov 21, 2016

No, I didn't include 0 for the P'th position. When I use an event vector
such as
(-1,0),(0,76),(1,159),(2,0) to hit the edge, it returns a vector of pdfs
that looks like
(-1683789, 1, 79, 2589367, 167, ....)
But when I do it using (-1,0),(0,76),(1,159), it correctly returns
(1, 79, 167, ...).
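For readers unfamiliar with the notation, the event vectors above are lists of (key, value) pairs. As far as I understand the convention, key -1 carries the pdf-class and keys 0..N-1 carry the phones of the context window. A small sketch of that layout (`make_event` is a hypothetical helper, not a Kaldi function, and it does not reproduce the MultiMap behaviour being reported):

```python
K_PDF_CLASS = -1  # key reserved for the pdf-class


def make_event(phone_window, pdf_class):
    """Build a key-sorted list of (key, value) pairs for one context window."""
    event = [(K_PDF_CLASS, pdf_class)]
    # Position i of the window becomes key i.
    event.extend(enumerate(phone_window))
    return sorted(event)


full = make_event((76, 159, 0), pdf_class=0)   # right context at the edge
partial = make_event((76, 159), pdf_class=0)   # right context left unspecified
```

Here `full` has the same shape as the first vector quoted above and `partial` as the second, where the missing key leaves that position open to any value.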


@naxingyu
Contributor Author

The RM setup is triphone. I've tested a left-biphone system on swbd, as
shown in the table above.

X.


@naxingyu naxingyu force-pushed the modify-transition-model branch from 365fea3 to 620e516 Compare November 21, 2016 07:46
@danpovey
Contributor

Merging. Thanks!!

@danpovey danpovey merged commit 45d53f1 into kaldi-asr:master Nov 21, 2016
@naxingyu naxingyu deleted the modify-transition-model branch November 23, 2016 01:41