[Feat] Remove redundancy in storage and computation caused by NeighborLoader and new model implementation in PyG.
🚀 The feature, motivation and pitch
We found that the NeighborLoader version of the GraphSAGE model in PyG takes the full multi-hop sampled subgraph as input and updates all of its nodes at every layer. For an L-layer model, layer i only needs the nodes still reachable from the seed nodes within the remaining L - i hops, so this causes a lot of redundant storage and computation, see pyg-team/pytorch_geometric#3799. This part becomes a bottleneck when using GLT's GPU sampling; a condensed sketch of the pattern is shown below.
We should remove this redundancy to improve the performance of end-to-end (e2e) training.
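To make the redundancy concrete, here is a condensed sketch (not the exact example code) of the classic NeighborLoader-style GraphSAGE model in PyG:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

class SAGE(torch.nn.Module):
    """Condensed sketch of the classic NeighborLoader-based GraphSAGE model."""
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.convs = torch.nn.ModuleList([
            SAGEConv(in_channels, hidden_channels),
            SAGEConv(hidden_channels, out_channels),
        ])

    def forward(self, x, edge_index):
        for i, conv in enumerate(self.convs):
            # Redundancy: every layer aggregates over the *entire* multi-hop
            # subgraph, even though embeddings computed for the outermost hop
            # here can never reach the seed nodes' outputs.
            x = conv(x, edge_index)
            if i < len(self.convs) - 1:
                x = F.relu(x)
        return x
```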
Alternatives
Since version 2.3, PyG supports Hierarchical Neighborhood Sampling, which extends classical neighborhood sampling by additionally collecting the number of sampled nodes and edges per hop, and adds `num_sampled_nodes_per_hop` and `num_sampled_edges_per_hop` arguments to the basic GNN models so that the layer data can be trimmed hop by hop. This reduces the redundancy in storage and computation, see pyg-team/pytorch_geometric#7331.
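PyG exposes this trimming through the `torch_geometric.utils.trim_to_layer` utility, and `NeighborLoader` attaches the per-hop counts to each mini-batch (as `num_sampled_nodes` / `num_sampled_edges`). A minimal sketch of a trimmed forward pass, replacing `SAGE.forward` from the sketch above:

```python
import torch.nn.functional as F
from torch_geometric.utils import trim_to_layer

def forward(self, x, edge_index, num_sampled_nodes_per_hop,
            num_sampled_edges_per_hop):
    for i, conv in enumerate(self.convs):
        # Before layer i, slice off the outermost hop of nodes and edges:
        # they can no longer influence the seed-node embeddings.
        x, edge_index, _ = trim_to_layer(
            layer=i,
            num_sampled_nodes_per_hop=num_sampled_nodes_per_hop,
            num_sampled_edges_per_hop=num_sampled_edges_per_hop,
            x=x,
            edge_index=edge_index,
        )
        x = conv(x, edge_index)
        if i < len(self.convs) - 1:
            x = F.relu(x)
    return x
```

In the training loop this would be called as `model(batch.x, batch.edge_index, batch.num_sampled_nodes, batch.num_sampled_edges)`.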
We can also support Hierarchical Neighborhood Sampling in GLT's `NeighborSampler` and `DistNeighborSampler`, and update the examples accordingly, to avoid the performance loss caused by redundant computation; a rough sketch of the sampler-side bookkeeping follows.
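On the sampler side, the change amounts to recording the per-hop counts while sampling and attaching them to the produced mini-batch. The sketch below is purely illustrative: `sample_one_hop`, `fanouts`, and the return layout are hypothetical names, not GLT's actual sampler API.

```python
def sample_with_hop_counts(graph, seeds, fanouts, sample_one_hop):
    # Hypothetical sketch: per-hop bookkeeping inside a neighbor-sampling
    # loop, so the counts can be forwarded to the model for trimming.
    num_sampled_nodes_per_hop = [len(seeds)]  # hop 0: the seed nodes
    num_sampled_edges_per_hop = []
    nodes, edges = [seeds], []
    frontier = seeds
    for fanout in fanouts:
        new_nodes, new_edges = sample_one_hop(graph, frontier, fanout)
        num_sampled_nodes_per_hop.append(len(new_nodes))
        num_sampled_edges_per_hop.append(len(new_edges))
        nodes.append(new_nodes)
        edges.append(new_edges)
        frontier = new_nodes
    return nodes, edges, num_sampled_nodes_per_hop, num_sampled_edges_per_hop
```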
Additional context