Small vocab_size raises division by zero in DocEmbSim #32

Open
remidomingues opened this issue Mar 21, 2019 · 0 comments

Feeding the following real training dataset to a SeqGAN works perfectly:

X = np.random.randint(0, 20, (80, 20))

However, the following dataset with the same dimensionality but 6 symbols instead of 20 raises an error.

X = np.random.randint(0, 6, (80, 20))

In both cases, we set vocab_size to the number of unique symbols plus one, as suggested in text_process.text_precess(). Here is the traceback for the second dataset:

Traceback (most recent call last):
  File "texygen/texygen.py", line 85, in train
    gan_func(X)
  File "texygen/models/seqgan/Seqgan.py", line 331, in train_real
    self.evaluate()
  File "texygen/models/seqgan/Seqgan.py", line 80, in evaluate
    scores = super().evaluate()
  File "texygen/models/Gan.py", line 55, in evaluate
    score = metric.get_score()
  File "texygen/utils/metrics/DocEmbSim.py", line 33, in get_score
    return self.get_dis_corr()
  File "texygen/utils/metrics/DocEmbSim.py", line 164, in get_dis_corr
    return np.log10(corr / len(self.oracle_sim))
ZeroDivisionError: division by zero
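
The last frame divides by len(self.oracle_sim), so the error implies that this list was empty when get_dis_corr ran. Below is a minimal sketch of a guard for that step. It assumes the metric averages cosine similarities over paired rows of two lists, which the traceback does not confirm. The function name log_mean_cosine_similarity is hypothetical and not part of Texygen, and returning NaN for empty input is one possible choice, not the maintainers' fix.

```python
import math

import numpy as np


def log_mean_cosine_similarity(oracle_sim, gen_sim):
    """Hypothetical stand-in for the final step of get_dis_corr.

    Assumption: both arguments are equal-length lists of similarity
    vectors. This is an illustration, not Texygen's implementation.
    """
    if len(oracle_sim) != len(gen_sim):
        raise ValueError("similarity lists must have the same length")
    if len(oracle_sim) == 0:
        # Nothing to average: report NaN instead of dividing by zero.
        return math.nan
    total = 0.0
    for a, b in zip(oracle_sim, gen_sim):
        a = np.asarray(a, dtype=float)
        b = np.asarray(b, dtype=float)
        # Cosine similarity of the paired rows.
        total += float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.log10(total / len(oracle_sim)))
```

With a guard like this, an empty similarity list yields NaN for the metric instead of aborting training. Why the list ends up empty for a small vocab_size would still need to be checked in DocEmbSim.py.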