Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need tests to check join-piece and join-bpe output. #511

Open
philip30 opened this issue Aug 10, 2018 · 3 comments
Open

Need tests to check join-piece and join-bpe output. #511

philip30 opened this issue Aug 10, 2018 · 3 comments

Comments

@philip30
Copy link
Contributor

philip30 commented Aug 10, 2018

I investigate this problem and found that recent merge (#477) broke the sentence piece output. This is because of the line number 94 in the xnmt/sent.py

 84   def sent_str(self, custom_output_procs=None, **kwargs) -> str:
 85     """
 86     Return a single string containing the readable version of the sentence.
 87 
 88     Args:
 89       custom_output_procs: if not None, overwrite the sentence's default output processors
 90       **kwargs: should accept arbitrary keyword args
 91 
 92     Returns: readable string
 93     """
 94     out_str = " ".join(self.str_tokens(**kwargs))
 95     pps = self.output_procs
 96     if custom_output_procs is not None:
 97       pps = custom_output_procs
 98     if isinstance(pps, OutputProcessor): pps = [pps]
 99     for pp in pps:
100       out_str = pp.process(out_str)
101       # TODO: change output processor interface accordingly
102     return out_str

This creates a 'space' separated sentence piece output like:
so steve lo pe z , a column of los angeles times , street streets in los angeles , when he hear d a beautiful music .

In the sentence piece, we have to join that by empty string, not space.

How to correctly solve this @msperber, @neubig ?

@philip30
Copy link
Contributor Author

Hmm, I guess we can just add str.replace(" ", "") at the join-piece processor?

@philip30
Copy link
Contributor Author

I guess the same problem will occur on BPE processor.

@philip30 philip30 mentioned this issue Aug 10, 2018
@neubig
Copy link
Contributor

neubig commented Aug 10, 2018

Let me re-open this because we need to:

  • Check if BPE is fixed
  • Add tests to make sure this doesn't happen again

@neubig neubig reopened this Aug 10, 2018
@neubig neubig changed the title SentencePiece output is broken! Need tests to check join-piece and join-bpe output. Aug 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants