[Feat] Remove redundancy in storage and computation caused by NeighborLoader and new model implementation in PyG.
🚀 The feature, motivation and pitch
We found that the NeighborLoader version of the GraphSAGE model in PyG takes the full multi-hop sampled subgraph as input and updates all of its nodes at every layer. For an L-layer model, layer i only needs the nodes still reachable from the seed nodes within the remaining L - i hops, so this causes a lot of redundant storage and computation, see pyg-team/pytorch_geometric#3799. This part becomes a bottleneck when using GLT's GPU sampling; a condensed sketch of the pattern is shown below.
We should remove this redundancy to improve the performance of end-to-end (e2e) training.
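To make the redundancy concrete, here is a condensed sketch (not the exact example code) of the classic NeighborLoader-style GraphSAGE model in PyG:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

class SAGE(torch.nn.Module):
    """Condensed sketch of the classic NeighborLoader-based GraphSAGE model."""
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.convs = torch.nn.ModuleList([
            SAGEConv(in_channels, hidden_channels),
            SAGEConv(hidden_channels, out_channels),
        ])

    def forward(self, x, edge_index):
        for i, conv in enumerate(self.convs):
            # Redundancy: every layer aggregates over the *entire* multi-hop
            # subgraph, even though embeddings computed for the outermost hop
            # here can never reach the seed nodes' outputs.
            x = conv(x, edge_index)
            if i < len(self.convs) - 1:
                x = F.relu(x)
        return x
```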
Alternatives
Since version 2.3, PyG supports Hierarchical Neighborhood Sampling, which extends classical neighborhood sampling by additionally collecting the number of sampled nodes and edges per hop, and adds `num_sampled_nodes_per_hop` and `num_sampled_edges_per_hop` arguments to the basic GNN models so that the layer data can be trimmed hop by hop. This reduces the redundancy in storage and computation, see pyg-team/pytorch_geometric#7331.
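PyG exposes this trimming through the `torch_geometric.utils.trim_to_layer` utility, and `NeighborLoader` attaches the per-hop counts to each mini-batch (as `num_sampled_nodes` / `num_sampled_edges`). A minimal sketch of a trimmed forward pass, replacing `SAGE.forward` from the sketch above:

```python
import torch.nn.functional as F
from torch_geometric.utils import trim_to_layer

def forward(self, x, edge_index, num_sampled_nodes_per_hop,
            num_sampled_edges_per_hop):
    for i, conv in enumerate(self.convs):
        # Before layer i, slice off the outermost hop of nodes and edges:
        # they can no longer influence the seed-node embeddings.
        x, edge_index, _ = trim_to_layer(
            layer=i,
            num_sampled_nodes_per_hop=num_sampled_nodes_per_hop,
            num_sampled_edges_per_hop=num_sampled_edges_per_hop,
            x=x,
            edge_index=edge_index,
        )
        x = conv(x, edge_index)
        if i < len(self.convs) - 1:
            x = F.relu(x)
    return x
```

In the training loop this would be called as `model(batch.x, batch.edge_index, batch.num_sampled_nodes, batch.num_sampled_edges)`.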
We can also support Hierarchical Neighborhood Sampling in GLT's `NeighborSampler` and `DistNeighborSampler`, and update the examples accordingly, to avoid the performance loss caused by redundant computation; a rough sketch of the sampler-side bookkeeping follows.
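On the sampler side, the change amounts to recording the per-hop counts while sampling and attaching them to the produced mini-batch. The sketch below is purely illustrative: `sample_one_hop`, `fanouts`, and the return layout are hypothetical names, not GLT's actual sampler API.

```python
def sample_with_hop_counts(graph, seeds, fanouts, sample_one_hop):
    # Hypothetical sketch: per-hop bookkeeping inside a neighbor-sampling
    # loop, so the counts can be forwarded to the model for trimming.
    num_sampled_nodes_per_hop = [len(seeds)]  # hop 0: the seed nodes
    num_sampled_edges_per_hop = []
    nodes, edges = [seeds], []
    frontier = seeds
    for fanout in fanouts:
        new_nodes, new_edges = sample_one_hop(graph, frontier, fanout)
        num_sampled_nodes_per_hop.append(len(new_nodes))
        num_sampled_edges_per_hop.append(len(new_edges))
        nodes.append(new_nodes)
        edges.append(new_edges)
        frontier = new_nodes
    return nodes, edges, num_sampled_nodes_per_hop, num_sampled_edges_per_hop
```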
Additional context