[Example] GCMC with sampling #1296
Conversation
@classicsong The new RGCN example has been merged. You may check it out and adjust this PR accordingly.
I see. Will modify the GCMC implementation accordingly.
@classicsong Hi, I tried your sampling-based GCMC model, but it raised an error: "AttributeError: module 'dgl' has no attribute 'in_subgraph'". I have installed dgl 0.4.2, but the error persists.
You should use the nightly-build DGL.
Thanks. I reinstalled the nightly-build DGL for CPU, but it didn't work. Does only the nightly-build DGL for GPU work?
Did you install DGL like this: pip install --pre dgl
I installed DGL via pip install --pre dgl, but as you said, my platform is Mac. I will reinstall DGL from source. Thanks for your help.
@classicsong I reinstalled DGL from source and it worked. Thanks.
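For anyone hitting the same AttributeError, a quick sanity check (a minimal sketch, nothing PR-specific) is to confirm the installed build actually exposes in_subgraph:

import dgl

# in_subgraph is missing from the 0.4.2 release mentioned above, so check
# both the version string and the attribute itself.
print(dgl.__version__)
print(hasattr(dgl, 'in_subgraph'))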
def __init__(self, num_edges, minibatch_size, dataset):
    self.num_edges = num_edges
    self.minibatch_size = minibatch_size
    self.seed = th.randperm(num_edges)
remove?
""" | ||
def __init__(self, num_edges, minibatch_size, dataset): | ||
self.num_edges = num_edges | ||
self.minibatch_size = minibatch_size |
remove
self.minibatch_size = minibatch_size
self.seed = th.randperm(num_edges)
self.dataset = dataset
self.sample_idx = 0
remove
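For context, a minimal self-contained sketch (class and method names hypothetical) of the edge-minibatch pattern the snippet above implements: shuffle all edge ids once with randperm, then serve them in fixed-size chunks.

import torch as th

class EdgeMinibatchSampler:
    def __init__(self, num_edges, minibatch_size):
        self.seed = th.randperm(num_edges)  # one global shuffle of edge ids
        self.minibatch_size = minibatch_size
        self.sample_idx = 0                 # cursor into the shuffled ids

    def next_batch(self):
        # return the next chunk of shuffled edge ids
        batch = self.seed[self.sample_idx:self.sample_idx + self.minibatch_size]
        self.sample_idx += self.minibatch_size
        return batch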
etype=str(possible_rating_value),
vtype=vtype,
card=(dataset.train_enc_graph.number_of_nodes(utype),
      dataset.train_enc_graph.number_of_nodes(vtype))))
card -> num_nodes
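A toy sketch of the suggested rename, with hypothetical edge lists and node counts, assuming dgl.bipartite accepts the num_nodes keyword in the nightly builds this PR targets:

import dgl
import torch as th

# Build one rating relation as a bipartite graph; the (num_user, num_movie)
# pair goes through num_nodes instead of the old card keyword.
user_ids = th.tensor([0, 1, 2])
movie_ids = th.tensor([1, 0, 2])
subg = dgl.bipartite((user_ids, movie_ids), utype='user', etype='5',
                     vtype='movie', num_nodes=(3, 3))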
seed_nodes[ntype] = g.nodes[ntype].data[dgl.NID]

frontiner = dgl.in_subgraph(dataset.train_enc_graph, seed_nodes)
frontiner = dgl.to_block(frontiner, seed_nodes)
Typo: frontiner -> frontier.
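A self-contained toy version of the sampling pattern under review may help: in_subgraph keeps only the in-edges of the seed nodes, and to_block turns that frontier into a bipartite block with separate src/dst node spaces.

import dgl
import torch as th

g = dgl.graph((th.tensor([0, 1, 2, 3]), th.tensor([1, 2, 3, 0])))
seed_nodes = th.tensor([1, 2])
frontier = dgl.in_subgraph(g, seed_nodes)   # only edges entering the seeds
block = dgl.to_block(frontier, seed_nodes)  # src = frontier nodes, dst = seeds
print(block.srcdata[dgl.NID], block.dstdata[dgl.NID])  # original node ids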
frontiner.srcnodes['user'].data['cj'] = \
    dataset.train_enc_graph.nodes['user'].data['cj'][frontiner.srcnodes['user'].data[dgl.NID]]
frontiner.dstnodes['movie'].data['ci'] = \
    dataset.train_enc_graph.nodes['movie'].data['ci'][frontiner.dstnodes['movie'].data[dgl.NID]]
The ci and cj normalizers will be calculated automatically by the GraphConv module if you plan to follow the new HeteroGraphConv approach. However, this means they will be computed from each sample rather than from the whole graph. Some field tests are needed to see the influence on accuracy.
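A hypothetical sketch of that HeteroGraphConv route (feature sizes and rating values made up): with norm='both', GraphConv applies the degree normalization itself, so ci/cj would not need to be copied into the block by hand.

import dgl.nn as dglnn

# One GraphConv per rating relation; norm='both' computes the symmetric
# degree normalizers internally. Caveat from the comment above: on a
# sampled block these reflect sampled degrees, not full-graph degrees.
rating_values = ['1', '2', '3', '4', '5']
conv = dglnn.HeteroGraphConv(
    {r: dglnn.GraphConv(64, 64, norm='both') for r in rating_values},
    aggregate='stack')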
args.gcn_agg_accum,
agg_act=self._act,
share_user_item_param=args.share_param,
device=dev_id)
You mention it's hard to unify the full graph and mini-batch versions. What is the problem here?
For mini-batch training, I suppose we always use the mix_cpu_gpu implementation, which is more relevant for large datasets. We could merge the two code paths, but I think the result would be harder to understand. Since this is a baseline model that some users from industry may borrow, I want to keep the multi-gpu/mix_cpu_gpu example code simple.
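To illustrate the mix_cpu_gpu idea (a hypothetical sketch, not code from this PR): the full feature tensor stays in host memory, and only the rows a minibatch needs are copied to the device.

import torch as th

node_feats = th.randn(1_000_000, 64)  # large feature table; kept on CPU

def gather_to_device(nids, device):
    # index on CPU, then ship only the minibatch slice to the GPU
    return node_feats[nids].to(device, non_blocking=True)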
enc_graph.number_of_nodes(vtype))))

g = dgl.hetero_from_relations(subg)
g = dgl.compact_graphs(g)
Reuse the sampler code?
Yes. For eval, we can also use the dataloader.
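A toy version of the evaluation-graph construction above: compact_graphs drops nodes that have no edges in any relation, so each per-batch graph only carries the users and movies it actually touches (node counts here are made up).

import dgl
import torch as th

g = dgl.heterograph({('user', '5', 'movie'): (th.tensor([0]), th.tensor([7]))},
                    num_nodes_dict={'user': 100, 'movie': 100})
g = dgl.compact_graphs(g)   # removes the 99 isolated users and movies
print(g.number_of_nodes('user'), g.number_of_nodes('movie'))  # -> 1 1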
self.reset_parameters()

def partial_to(self, device):
Why not put all parameters on the GPU?
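A hypothetical sketch of why a partial_to can make sense: a very large embedding table stays on the CPU while the small compute layers move to the GPU, trading some transfer time for GPU memory.

import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, num_nodes, hidden):
        super().__init__()
        self.embed = nn.Embedding(num_nodes, hidden)  # large; stays on CPU
        self.proj = nn.Linear(hidden, hidden)

    def partial_to(self, device):
        # move only the small compute layers; the embedding table stays put
        self.proj = self.proj.to(device)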
Description
This PR contains a sampling-based GCMC implementation using the new sampling APIs.
#957
Checklist
Please feel free to remove inapplicable items for your PR.
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change.
Changes