
Bug report on MaskedOnlyTransformerEncoder #9

Open

JizeZhangCS opened this issue Mar 2, 2023 · 0 comments

Trigger method:

  • Run python main.py --configs ./configs/NCI1/gnn-transformer/no-virtual/gd=128+gdp=0.1+tdp=0.1+l=3+cosine.yml (i.e. NCI1, small GCN).
  • In that config file, change num_encoder_layers: 3 to num_encoder_layers: 0.
  • Add a new line: num_encoder_layers_masked: 3.

These changes mean I'd use MaskedOnlyTransformerEncoder instead of the default encoder built from torch.nn transformer modules.

Traceback

Traceback (most recent call last):
  File "/storage_fast/jzzhang/graphtrans/main.py", line 280, in <module>
    main()
  File "/storage_fast/jzzhang/graphtrans/main.py", line 271, in main
    best_val, final_test = run(run_id)
  File "/storage_fast/jzzhang/graphtrans/main.py", line 216, in run
    loss = train(model, device, train_loader, optimizer, args, calc_loss, scheduler if args.scheduler != "plateau" else None)
  File "/storage_fast/jzzhang/graphtrans/trainers/base_trainer.py", line 29, in train
    pred_list = model(batch)
  File "/storage/jzzhang/miniconda3/envs/general/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/storage_fast/jzzhang/graphtrans/models/gnn_transformer.py", line 107, in forward
    padded_adj_list[idx, 0:N, 0:N] = torch.from_numpy(adj_list_item)
RuntimeError: The expanded size of the tensor (78) must match the existing size (91) at non-singleton dimension 1. Target sizes: [78, 78]. Tensor sizes: [91, 91]
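
For reference, the failure is just a shape-mismatched slice assignment. A minimal standalone reproduction with the sizes from the traceback (everything else is made up):

    import torch

    # padded_adj_list was allocated with max_num_nodes = 78 (taken from the
    # batch vector), but this graph's adjacency matrix is 91 x 91.
    padded_adj_list = torch.zeros(1, 78, 78)
    adj_list_item = torch.zeros(91, 91)
    padded_adj_list[0, 0:78, 0:78] = adj_list_item  # raises the RuntimeError above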

Phenomenon

When stopped at the breakpoint I set at line 104 of ./models/gnn_transformer.py, i.e.:

    padded_adj_list = torch.zeros((len(adj_list), max_num_nodes, max_num_nodes), device=h_node.device)

some of the variables in my debug console show:

batched_data.batch.eq(0).sum(): tensor(75, device='cuda:0')

adj_list[0].shape: (21, 21)

Analysis

Note that since I'm new to GNNs and not familiar with PyG, the following analysis may be based on wrong assumptions.

  • batched_data.batch.eq(0).sum() should give the node count of the 0th graph in the batch; its value indicates 75 nodes.
  • adj_list[0] is the adjacency list for the edges of the same graph, and its shape indicates 21 nodes.

So why do the node count and the adjacency-list size disagree for the same graph?
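
To illustrate the expectation, here is a hedged sketch with two made-up toy graphs (graph sizes and feature dims are hypothetical; only the Batch/batch-vector usage mirrors PyG):

    import torch
    from torch_geometric.data import Data, Batch

    # Two toy graphs with 75 and 40 nodes (sizes made up for illustration).
    g0 = Data(x=torch.randn(75, 8), edge_index=torch.randint(0, 75, (2, 200)))
    g1 = Data(x=torch.randn(40, 8), edge_index=torch.randint(0, 40, (2, 100)))
    batched_data = Batch.from_data_list([g0, g1])

    # Node count of graph 0 in the batch -- the 75 I see in the debug console.
    print(batched_data.batch.eq(0).sum())  # tensor(75)
    # If adj_list[0] described the same graph, I'd expect shape (75, 75), not (21, 21).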

Also, in the traceback, 78 is the maximum of batched_data.batch.eq(i).sum() over the graphs i in the batch, while 91 is the size of the corresponding adjacency list.

Since batched_data.adj_list is never used anywhere except where it is defined and inside MaskedOnlyTransformerEncoder, I suspect something is wrong with its definition in adj_list.py.
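
For comparison, PyG's built-in to_dense_adj derives per-graph adjacency matrix sizes from the batch vector, so they match batch.eq(i).sum() by construction (a sketch continuing the toy batch above, not the repo's actual adj_list.py logic):

    from torch_geometric.utils import to_dense_adj

    # Shape: (num_graphs, max_num_nodes, max_num_nodes), zero-padded per graph.
    adj = to_dense_adj(batched_data.edge_index, batch=batched_data.batch)
    print(adj.shape)  # torch.Size([2, 75, 75]) for the toy batch above

    # adj[0, :75, :75] covers exactly the 75 nodes that batch.eq(0).sum()
    # reports, so the padded assignment in gnn_transformer.py could not
    # hit a size mismatch.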
