fixing arpa2fst to allow trailing whitespace in the headers #1191
danpovey merged 3 commits into kaldi-asr:master
src/lm/arpa-file-parser.cc
Outdated
ArpaFileParser::~ArpaFileParser() {
}

std::string rtrim(const std::string &str) {
Can you please rename this to TrimTrailingWhitespace?
... actually, why not make it take a pointer to string and return void, and just use resize to remove the trailing whitespace?
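A minimal sketch of what that suggestion could look like, assuming the function is called `TrimTrailingWhitespace` as requested above and uses the same whitespace set as the original `rtrim`. It illustrates the idea and is not necessarily the code that was merged.

```cpp
#include <string>

// Hypothetical in-place variant of rtrim(): takes a pointer, returns void,
// and shrinks the string with resize() instead of returning a copy.
void TrimTrailingWhitespace(std::string *str) {
  // find_last_not_of() returns npos when the string is empty or consists
  // only of whitespace; in that case the result is the empty string.
  const std::string::size_type pos = str->find_last_not_of(" \n\r\t");
  str->resize(pos == std::string::npos ? 0 : pos + 1);
}
```

Because the string is modified in place, a call site would read `TrimTrailingWhitespace(&current_line_);` and avoid the temporary copy that `current_line_ = rtrim(current_line_);` creates.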
src/lm/arpa-file-parser.cc
Outdated
int32 ngram_count = 0;
while (++line_number_, getline(is, current_line_) && !is.eof()) {
  if (current_line_.empty()) continue;
  current_line_ = rtrim(current_line_);
Adding this line will incur a big performance penalty and I don't think it's necessary to solve the current issue. The ARPA file format is not well defined, but I don't think a reasonable person would put leading space before the \1-grams: marker.
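One possible way to keep the cost down, sketched here under the assumption that only section markers (lines whose first character is a backslash) need trimming. The helper name `TrimIfSectionMarker` is hypothetical and does not come from the PR.

```cpp
#include <string>

// Hypothetical helper: strips trailing whitespace only from lines that start
// with a backslash (section markers such as "\1-grams:"). Ordinary n-gram
// entry lines are returned untouched, so the per-line cost in the hot loop
// is a single character comparison.
void TrimIfSectionMarker(std::string *line) {
  if (line->empty() || (*line)[0] != '\\') return;
  // The first character is a backslash, so find_last_not_of() cannot
  // return npos here and pos + 1 is always a valid length.
  const std::string::size_type pos = line->find_last_not_of(" \n\r\t");
  line->resize(pos + 1);
}
```

This leaves the n-gram entry lines untouched and still tolerates trailing whitespace after a marker.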
src/lm/arpa-file-parser.cc
Outdated
while (++line_number_, getline(is, current_line_) && !is.eof()) {
  if (current_line_.empty()) continue;
  current_line_ = rtrim(current_line_);
  if (current_line_[0] == '\\') break;
Since you are looking at this... this code doesn't seem to be checking that the n-gram orders are as expected, it's just checking that the first character is backslash. That's not very nice, and would lead to unexpected behavior if someone, say, skipped an ngram order.
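A rough sketch of the stricter check described here, assuming the marker line has already had trailing whitespace removed and that C++11 `std::to_string` is available. The helper name `IsExpectedNgramHeader` is hypothetical and not part of the parser.

```cpp
#include <string>

// Hypothetical helper: returns true only if `line` is exactly the section
// marker for the n-gram order the parser expects next, e.g. "\2-grams:"
// when expected_order == 2. A skipped or repeated order fails the check.
bool IsExpectedNgramHeader(const std::string &line, int expected_order) {
  const std::string expected =
      "\\" + std::to_string(expected_order) + "-grams:";
  return line == expected;
}
```

A caller that sees a backslash-initial line could then report an error when this returns false for the expected order, rather than silently accepting any marker. The `\end\` marker would still need separate handling.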
src/lm/arpa-file-parser.cc
Outdated
std::string rtrim(const std::string &str) {
  std::string ret = str;
  ret.erase(ret.find_last_not_of(" \n\r\t")+1);

I thought about it just a minute ago. I could also iterate from
On Nov 14, 2016 18:01, "Daniel Povey" notifications@github.com wrote:
Updated. How about this?

Looks good; you can merge once tests finish.
Addresses #1189