Skip to content

fixing arpa2fst to allow traling whitespace in the headers#1191

Merged
danpovey merged 3 commits intokaldi-asr:masterfrom
jtrmal:arpa2fst_fix
Nov 16, 2016
Merged

fixing arpa2fst to allow traling whitespace in the headers#1191
danpovey merged 3 commits intokaldi-asr:masterfrom
jtrmal:arpa2fst_fix

Conversation

@jtrmal
Copy link
Contributor

@jtrmal jtrmal commented Nov 14, 2016

Adresses #1189

ArpaFileParser::~ArpaFileParser() {
}

std::string rtrim(const std::string &str) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please rename this to TrimTrailingWhitespace?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... actually, why not make it take a pointer to string and return void, and just use resize to remove the trailing whitespace?

int32 ngram_count = 0;
while (++line_number_, getline(is, current_line_) && !is.eof()) {
if (current_line_.empty()) continue;
current_line_ = rtrim(current_line_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this line will incur a big performance penalty and I don't think it's necessary to solve the current issue. The ARPA file format is not well defined, but I don't think a reasonable person would put leading space before the \1-grams: marker.

while (++line_number_, getline(is, current_line_) && !is.eof()) {
if (current_line_.empty()) continue;
current_line_ = rtrim(current_line_);
if (current_line_[0] == '\\') break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are looking at this... this code doesn't seem to be checking that the n-gram orders are as expected, it's just checking that the first character is backslash. That's not very nice, and would lead to unexpected behavior if someone, say, skipped an ngram order.


std::string rtrim(const std::string &str) {
std::string ret = str;
ret.erase(ret.find_last_not_of(" \n\r\t")+1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space around '+'

@jtrmal
Copy link
Contributor Author

jtrmal commented Nov 14, 2016

I thought about it just a minute ago. I could also iterate from
backwards, so it would be almost no-op for proper strings.
Y

On Nov 14, 2016 18:01, "Daniel Povey" notifications@github.com wrote:

@danpovey commented on this pull request.

In src/lm/arpa-file-parser.cc
#1191:

@@ -38,6 +38,12 @@ ArpaFileParser::ArpaFileParser(ArpaParseOptions options,
ArpaFileParser::~ArpaFileParser() {
}

+std::string rtrim(const std::string &str) {

... actually, why not make it take a pointer to string and return void,
and just use resize to remove the trailing whitespace?


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#1191, or mute the thread
https://github.com/notifications/unsubscribe-auth/AKisX0L9G5UOYpYKq67e8fs4rXRD7v66ks5q-OhDgaJpZM4KxuPE
.

@jtrmal
Copy link
Contributor Author

jtrmal commented Nov 15, 2016

updated. how about this?

@danpovey
Copy link
Contributor

Looks good-- you can merge once tests finish.

@danpovey danpovey merged commit 91233b8 into kaldi-asr:master Nov 16, 2016
@jtrmal jtrmal deleted the arpa2fst_fix branch February 10, 2017 03:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants