
Bug report on MaskedOnlyTransformerEncoder #9

Open

JizeZhangCS opened this issue Mar 2, 2023 · 0 comments

Trigger method:

  • Run python main.py --configs ./configs/NCI1/gnn-transformer/no-virtual/gd=128+gdp=0.1+tdp=0.1+l=3+cosine.yml (i.e. NCI1, small GCN).
  • In that config file, change num_encoder_layers: 3 to num_encoder_layers: 0.
  • Add a new line: num_encoder_layers_masked: 3.

These changes mean I'd use MaskedOnlyTransformerEncoder instead of the default encoder built from torch.nn transformer modules.

Traceback

Traceback (most recent call last):
  File "/storage_fast/jzzhang/graphtrans/main.py", line 280, in <module>
    main()
  File "/storage_fast/jzzhang/graphtrans/main.py", line 271, in main
    best_val, final_test = run(run_id)
  File "/storage_fast/jzzhang/graphtrans/main.py", line 216, in run
    loss = train(model, device, train_loader, optimizer, args, calc_loss, scheduler if args.scheduler != "plateau" else None)
  File "/storage_fast/jzzhang/graphtrans/trainers/base_trainer.py", line 29, in train
    pred_list = model(batch)
  File "/storage/jzzhang/miniconda3/envs/general/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/storage_fast/jzzhang/graphtrans/models/gnn_transformer.py", line 107, in forward
    padded_adj_list[idx, 0:N, 0:N] = torch.from_numpy(adj_list_item)
RuntimeError: The expanded size of the tensor (78) must match the existing size (91) at non-singleton dimension 1. Target sizes: [78, 78]. Tensor sizes: [91, 91]
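
For reference, the failure is just a shape-mismatched slice assignment. A minimal standalone reproduction with the sizes from the traceback (everything else is made up):

    import torch

    # padded_adj_list was allocated with max_num_nodes = 78 (taken from the
    # batch vector), but this graph's adjacency matrix is 91 x 91.
    padded_adj_list = torch.zeros(1, 78, 78)
    adj_list_item = torch.zeros(91, 91)
    padded_adj_list[0, 0:78, 0:78] = adj_list_item  # raises the RuntimeError above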

Phenomenon

When stopped at the breakpoint I set at line 104 of ./models/gnn_transformer.py, i.e.:

    padded_adj_list = torch.zeros((len(adj_list), max_num_nodes, max_num_nodes), device=h_node.device)

some of the variables in my debug console show:

batched_data.batch.eq(0).sum(): tensor(75, device='cuda:0')

adj_list[0].shape: (21, 21)

Analysis

Note that since I'm new to GNNs and not familiar with PyG, the following analysis may be based on wrong assumptions.

  • batched_data.batch.eq(0).sum() should give the node count of the 0th graph in the batch; its value indicates 75 nodes.
  • adj_list[0] is the adjacency list for the edges of the same graph, and its shape indicates 21 nodes.

So why do the node count and the adjacency-list size disagree for the same graph?
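
To illustrate the expectation, here is a hedged sketch with two made-up toy graphs (graph sizes and feature dims are hypothetical; only the Batch/batch-vector usage mirrors PyG):

    import torch
    from torch_geometric.data import Data, Batch

    # Two toy graphs with 75 and 40 nodes (sizes made up for illustration).
    g0 = Data(x=torch.randn(75, 8), edge_index=torch.randint(0, 75, (2, 200)))
    g1 = Data(x=torch.randn(40, 8), edge_index=torch.randint(0, 40, (2, 100)))
    batched_data = Batch.from_data_list([g0, g1])

    # Node count of graph 0 in the batch -- the 75 I see in the debug console.
    print(batched_data.batch.eq(0).sum())  # tensor(75)
    # If adj_list[0] described the same graph, I'd expect shape (75, 75), not (21, 21).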

Also, in the traceback, 78 is the maximum of batched_data.batch.eq(i).sum() over the graphs i in the batch, while 91 is the size of the corresponding adjacency list.

Since batched_data.adj_list is never used anywhere except where it is defined and inside MaskedOnlyTransformerEncoder, I suspect something is wrong with its definition in adj_list.py.
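
For comparison, PyG's built-in to_dense_adj derives per-graph adjacency matrix sizes from the batch vector, so they match batch.eq(i).sum() by construction (a sketch continuing the toy batch above, not the repo's actual adj_list.py logic):

    from torch_geometric.utils import to_dense_adj

    # Shape: (num_graphs, max_num_nodes, max_num_nodes), zero-padded per graph.
    adj = to_dense_adj(batched_data.edge_index, batch=batched_data.batch)
    print(adj.shape)  # torch.Size([2, 75, 75]) for the toy batch above

    # adj[0, :75, :75] covers exactly the 75 nodes that batch.eq(0).sum()
    # reports, so the padded assignment in gnn_transformer.py could not
    # hit a size mismatch.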
