
Floating point exception (core dumped) problem #273

Open
wykk00 opened this issue Dec 29, 2023 · 0 comments
Labels
bug Something isn't working


wykk00 commented Dec 29, 2023

Description

I ran into a problem while trying to reproduce the code for the GIANT paper. I used my own text-attributed graph dataset and followed GIANT's data processing instructions.

Strangely, the crash occurs at training level 1, even though training at level 0 completes normally.
I tried to track down the cause, and the only lead I found is that the crash may happen in the sparse_matmul() function called from matcher._predict().
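On x86 Linux, "Floating point exception" is how the shell reports SIGFPE, which native code most commonly raises on integer division (or modulo) by zero; floating-point division by zero does not trap by default. If the crash is indeed inside sparse_matmul(), one way to narrow it down is to inspect both operands right before the call. The sketch below is a hypothetical diagnostic: `check_matmul_operands` is not part of PECOS, and the report does not show which matrices matcher._predict() passes. It uses only SciPy/NumPy to flag malformed CSR structure, non-finite values, empty dimensions, matrices with no stored entries, and shape mismatches. Treating an empty operand as a possible route to such a division is an assumption, not something the log confirms.

```python
import numpy as np
import scipy.sparse as sp


def check_matmul_operands(A, B):
    """Hypothetical diagnostic (not part of PECOS): list problems in A and B
    that could make a native sparse A @ B misbehave."""
    problems = []
    for name, M in (("A", A), ("B", B)):
        M = sp.csr_matrix(M)
        try:
            # full_check=True also verifies that every column index is in
            # bounds and that indptr is non-decreasing.
            M.check_format(full_check=True)
        except ValueError as err:
            problems.append(f"{name}: invalid CSR structure ({err})")
        if not np.all(np.isfinite(M.data)):
            problems.append(f"{name}: contains NaN or Inf values")
        if 0 in M.shape:
            problems.append(f"{name}: has an empty dimension {M.shape}")
        elif M.nnz == 0:
            # Assumption: an all-empty operand is one plausible way to reach
            # an integer division by zero in native code.
            problems.append(f"{name}: has no stored entries")
    if A.shape[1] != B.shape[0]:
        problems.append(f"shape mismatch: {A.shape} @ {B.shape}")
    return problems
```

An empty list means none of these particular problems was found. It does not prove the operands are safe for PECOS's own kernel.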

Steps to reproduce

The command is

CUDA_VISIBLE_DEVICES=1 python3 -m pecos.xmc.xtransformer.train -t X.trn.txt -x X.trn.tfidf.npz -y Y.trn.npz -m xrt_models --batch-gen-workers 0
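Because the dataset is custom, a quick consistency check on the three files passed via `-t`, `-x` and `-y` can rule out malformed inputs before digging into PECOS internals. This is a minimal sketch, assuming `X.trn.txt` holds one instance per line and both `.npz` files were written with `scipy.sparse.save_npz`. `check_xr_inputs` and `load_and_check` are hypothetical helpers, and whether empty labels or unlabeled instances actually trigger this crash is not established by the report.

```python
import numpy as np
import scipy.sparse as sp


def check_xr_inputs(texts, X, Y):
    """Hypothetical pre-flight check (not part of PECOS): the texts, the
    feature matrix X and the label matrix Y must describe the same instances."""
    problems = []
    n = len(texts)
    for name, M in (("X", X), ("Y", Y)):
        if M.shape[0] != n:
            problems.append(f"{name} has {M.shape[0]} rows but there are {n} texts")
    # Copy before dropping explicitly stored zeros so the caller's Y is untouched.
    Y_csc = sp.csc_matrix(Y, copy=True)
    Y_csc.eliminate_zeros()
    empty_labels = np.flatnonzero(np.diff(Y_csc.indptr) == 0)
    if empty_labels.size:
        problems.append(f"{empty_labels.size} labels have no positive instances")
    Y_csr = sp.csr_matrix(Y, copy=True)
    Y_csr.eliminate_zeros()
    empty_rows = np.flatnonzero(np.diff(Y_csr.indptr) == 0)
    if empty_rows.size:
        problems.append(f"{empty_rows.size} instances have no labels")
    return problems


def load_and_check(txt_path, x_path, y_path):
    # Assumes one instance per line in the text file.
    with open(txt_path, encoding="utf-8") as f:
        texts = f.read().splitlines()
    return check_xr_inputs(texts, sp.load_npz(x_path), sp.load_npz(y_path))
```

For example, `print(load_and_check("X.trn.txt", "X.trn.tfidf.npz", "Y.trn.npz"))` run in the training directory prints the list of problems found, or `[]`.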

Error message or code output

12/29/2023 13:02:58 - INFO - pecos.xmc.xtransformer.matcher - | [   5/   5][  7150/  7220] | 1373/1444 batches | ms/batch 451.6586 | train_loss 7.300417e-01 | lr 9.695291e-07
12/29/2023 13:03:24 - INFO - pecos.xmc.xtransformer.matcher - | [   5/   5][  7200/  7220] | 1423/1444 batches | ms/batch 451.0563 | train_loss 7.260027e-01 | lr 2.770083e-07
12/29/2023 13:03:24 - INFO - pecos.xmc.xtransformer.matcher - | **** saving model (avg_prec=0) to /tmp/tmpo8wg3j8h at global_step 7200 ****
12/29/2023 13:03:26 - INFO - pecos.xmc.xtransformer.matcher - -----------------------------------------------------------------------------------------
12/29/2023 13:03:36 - INFO - pecos.xmc.xtransformer.matcher - Reload the best checkpoint from /tmp/tmpo8wg3j8h
Floating point exception (core dumped)

Environment

  • Operating system: Ubuntu-22.04.1 (X86)
  • Python version: 3.9.18
  • PECOS version: 1.2.2
  • torch: 1.13.1
  • numpy: 1.26.2
  • scipy: 1.11.4
  • transformers: 4.36.2
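The version list above can be regenerated on another machine with a short script like the one below, which makes it easier to compare environments with maintainers. This is a sketch, not a PECOS utility: `collect_versions` is a hypothetical helper built on the standard library's `importlib.metadata`, and it assumes PECOS was installed from PyPI under the distribution name `libpecos`.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Return {name: version} for the given distributions, plus OS and Python."""
    info = {"os": platform.platform(), "python": platform.python_version()}
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    # "libpecos" is assumed to be the PyPI distribution name of PECOS.
    for key, value in collect_versions(
        ["libpecos", "torch", "numpy", "scipy", "transformers"]
    ).items():
        print(f"{key}: {value}")
```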
wykk00 added the bug (Something isn't working) label on Dec 29, 2023
wykk00 closed this as completed on Dec 29, 2023
wykk00 reopened this on Dec 29, 2023