@KarelVesely84
Contributor

  • concatenates all vector-fsts with the same 'key' from the sorted
    'rspecifier' in the order as they appear in I/O,
  • handy for concatenating HCLG graphs: 'phn-loop . some_word . phn-loop',

@danpovey
Contributor

I don't think I want to merge this with its current design.
The reason is that I'm not crazy about the way it concatenates all FSTs with the same key in a single input.
Typically if you want to concatenate things, you'd have several pipelines producing different types of thing that you'd then want to concatenate. The most natural usage would be something like

fsts-concat 'ark:command1|' 'ark:command2|' 'ark:command3|' ... 'ark:-' | ...

in my opinion, which would require different code.
But if you don't have the time to change it, it's OK; we can just wait till it's needed for something that needs to be checked in.
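
As a rough illustration of the per-key logic that a multi-input interface like this implies (a minimal sketch, not the code from this PR): the names `Table` and `ConcatByKey` are hypothetical, a `std::map` stands in for a Kaldi table reader, and a `std::string` stands in for an FST, with string joining playing the role of FST concatenation. Keys are taken from the first input, and a key missing from any later input is skipped.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one table of FSTs (key -> FST).
// A std::string plays the role of an FST in this sketch.
using Table = std::map<std::string, std::string>;

// Concatenates, per key, the entries of all input tables in argument order.
// Keys come from the first table; a key missing from any later table is
// skipped entirely and, if 'skipped' is non-null, recorded there.
Table ConcatByKey(const std::vector<Table> &inputs,
                  std::vector<std::string> *skipped) {
  assert(!inputs.empty());
  Table output;
  const int num_inputs = static_cast<int>(inputs.size());
  for (const auto &entry : inputs[0]) {
    const std::string &key = entry.first;

    // Check that the key exists in all remaining inputs.
    bool skip_key = false;
    for (int i = 1; i < num_inputs; i++) {
      if (inputs[i].count(key) == 0) {
        skip_key = true;
        break;
      }
    }
    if (skip_key) {
      if (skipped != nullptr) skipped->push_back(key);
      continue;
    }

    // Order matters: concatenation is not order invariant, unlike union.
    std::string result = entry.second;
    for (int i = 1; i < num_inputs; i++)
      result += " . " + inputs[i].at(key);
    output[key] = result;
  }
  return output;
}
```

With three inputs holding 'phn-loop', 'some_word' and 'phn-loop' under the same key, the sketch yields 'phn-loop . some_word . phn-loop' for that key, mirroring the use case in the PR description.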

@KarelVesely84
Contributor Author

KarelVesely84 commented Jul 23, 2018

Well, I was thinking of what you just proposed, but then I decided to use the original API that was used for 'fsts-union' (I understand the difference between union and concatenation, union is 'order invariant', concatenation isn't)...

And then I realized it might be less flexible than the "current" solution in the PR. Imagine a situation in which one would like to concatenate a different number of FSTs for individual keys... This is easily doable with scps, but not with arks. The "current" version in this PR is easy to use by preparing an 'scp' file from any scripting language. Typically, some FSTs will "change" on a per-key basis and some will be "fixed".

To check this in, is it absolutely necessary to have some calling script?
(I am developing this for a company, and I am not sure how these intellectual property things work...)
K.

@danpovey
Contributor

danpovey commented Jul 23, 2018 via email

@KarelVesely84
Contributor Author

Okay, in this case, I will change the interface, and I'll send an update when it's done... Thank you, Karel.

@KarelVesely84
Contributor Author

KarelVesely84 commented Jul 24, 2018

The API was changed as Dan suggested. (and the commit is rebased on top of the 'main master'...)

@danpovey danpovey left a comment


thanks. some very cosmetic comments.

bool skip_key = false;
for (int32 i=0; i<fst_readers.size(); i++) {
  if (!fst_readers[i]->HasKey(key)) {
    KALDI_WARN << "Skippng '" << key << "'"

typo: "Skipping"

for (int32 i=0; i<fst_readers.size(); i++) {
  if (!fst_readers[i]->HasKey(key)) {
    KALDI_WARN << "Skippng '" << key << "'"
               << " due to missing the fst in " << i+2 << ". <rspecifier> : "

please add parens around i+2, just to avoid doubt about precedence; and maybe replace . with 'th.
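
For illustration, the message with both suggestions applied might be built roughly as below. This is a sketch with a hypothetical helper, `MissingKeyMessage`, writing to a `std::ostringstream` instead of `KALDI_WARN`; it is not the merged code. Since `+` binds more tightly than `<<` in C++, the parentheses do not change the result; they only spare the reader from having to recall that.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: formats the warning for a key that is missing from
// the reader at zero-based position 'reader_index'. The offset of 2 follows
// the quoted snippet, where the reported position is i+2.
std::string MissingKeyMessage(const std::string &key, int reader_index) {
  assert(reader_index >= 0);
  std::ostringstream os;
  os << "Skipping '" << key << "'"
     << " due to missing the fst in " << (reader_index + 2)
     << "'th <rspecifier>";
  return os.str();
}
```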


// check that the key exists in all 'fst_readers',
bool skip_key = false;
for (int32 i=0; i<fst_readers.size(); i++) {

some compilers warn about signed/unsigned comparisons... I think it would be helpful to have a variable
int32 fst_readers_size = (int32)fst_readers.size();
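
A small standalone sketch of that pattern, assuming `int32_t` from `<cstdint>` as a stand-in for Kaldi's `int32` and a hypothetical function `CountPositive`: the container size is converted to a signed variable once, so the loop condition compares two signed values and sign-comparison warnings are not triggered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical example function: counts the strictly positive entries,
// iterating with a signed index.
int32_t CountPositive(const std::vector<int32_t> &values) {
  // The cast below is only safe if the size fits into int32_t.
  assert(values.size() <=
         static_cast<std::size_t>(std::numeric_limits<int32_t>::max()));
  // Cache the size as a signed value so that 'i < values_size' is a
  // signed/signed comparison.
  const int32_t values_size = static_cast<int32_t>(values.size());
  int32_t count = 0;
  for (int32_t i = 0; i < values_size; i++) {
    if (values[i] > 0) count++;
  }
  return count;
}
```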

n_done++;
}

// cleanup,

please start comments with a capital letter and end them with a period, not a comma. It's the Google style guide... sorry to be picky.

std::vector<RandomAccessTableReader<VectorFstHolder>*> fst_readers;
TableWriter<VectorFstHolder> fst_writer(fsts_wspecifier);

for (int32 i=2; i<po.NumArgs(); i++)

please add space around operators = and < (also later on in this file).

@KarelVesely84
Contributor Author

okay, everything is incorporated... (and rebased...)

@danpovey danpovey merged commit d4d968c into kaldi-asr:master Jul 26, 2018
dpriver pushed a commit to dpriver/kaldi that referenced this pull request Sep 13, 2018
Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018
@KarelVesely84 KarelVesely84 deleted the fsts_concat branch January 16, 2019 19:22