
[Feat] Remove redundancy in storage and computation caused by NeighborLoader and new model implementation in PyG. #38

Closed
baoleai opened this issue May 24, 2023 · 0 comments · Fixed by #43
Labels: feature (New feature or request), PyG

baoleai (Collaborator) commented May 24, 2023

🚀 The feature, motivation and pitch

We found that the NeighborLoader-based GraphSAGE model in PyG takes the whole subgraph produced by multi-hop sampling as its input and updates every node of that subgraph at each layer. This causes a lot of redundant storage and computation (see pyg-team/pytorch_geometric#3799), and it becomes a bottleneck when GLT's GPU sampling is used.
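To make the redundancy concrete, the sketch below counts the nodes and edges each layer processes with and without per-hop trimming. It is a plain-Python illustration, not PyG or GLT code; the function name `layer_workloads` and the hop layout it assumes (hop 0 holds the seed nodes, and there is one edge hop per GNN layer) are hypothetical choices made for this example.

```python
from typing import List, Tuple


def layer_workloads(
    num_sampled_nodes_per_hop: List[int],
    num_sampled_edges_per_hop: List[int],
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Return (untrimmed, trimmed) lists of (num_nodes, num_edges) per layer.

    Hypothetical layout: hop 0 holds the seed nodes, and the edges of hop k
    point from hop-k nodes to hop-(k-1) nodes, so one edge hop per layer.
    """
    num_layers = len(num_sampled_edges_per_hop)
    if len(num_sampled_nodes_per_hop) != num_layers + 1:
        raise ValueError("expected exactly one more node hop than edge hops")

    total_nodes = sum(num_sampled_nodes_per_hop)
    total_edges = sum(num_sampled_edges_per_hop)

    # Without trimming, every layer runs over the whole multi-hop subgraph.
    untrimmed = [(total_nodes, total_edges)] * num_layers

    # With trimming, layer i drops the outermost i hops: their outputs can
    # no longer reach the seed nodes in the remaining layers.
    trimmed = []
    for i in range(num_layers):
        nodes = sum(num_sampled_nodes_per_hop[: num_layers + 1 - i])
        edges = sum(num_sampled_edges_per_hop[: num_layers - i])
        trimmed.append((nodes, edges))
    return untrimmed, trimmed
```

For a hypothetical 2-layer model with 2 seed nodes and a fanout of 3 (node counts `[2, 6, 18]`, edge counts `[6, 18]`), both untrimmed layers touch 26 nodes and 24 edges, while a trimmed second layer only needs 8 nodes and 6 edges.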

We should remove this redundancy to improve end-to-end (e2e) training performance.

Alternatives

Since version 2.3, PyG supports Hierarchical Neighborhood Sampling, which extends classical neighborhood sampling by also recording how many nodes and edges were sampled at each hop. PyG's basic GNN models accept these counts as num_sampled_nodes_per_hop and num_sampled_edges_per_hop and use them to trim the input data layer by layer, which reduces the redundant storage and computation (see pyg-team/pytorch_geometric#7331).
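The trimming step can be sketched as follows, assuming nodes and edges are stored hop by hop (hop 0 first), so that each layer simply keeps a prefix of both. This is a simplified plain-Python stand-in for the idea, not PyG's implementation; `trim_layer_inputs`, `sum_aggregate`, and `forward` are hypothetical names, and `sum_aggregate` is a toy layer chosen only so the result is easy to check by hand.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]  # (source, target) as local node indices


def trim_layer_inputs(
    layer: int,
    num_sampled_nodes_per_hop: Sequence[int],
    num_sampled_edges_per_hop: Sequence[int],
    x: List[List[float]],
    edges: List[Edge],
) -> Tuple[List[List[float]], List[Edge]]:
    """Drop the outermost `layer` hops of nodes and edges (hop-ordered storage)."""
    if layer <= 0:
        return x, edges
    keep_nodes = sum(num_sampled_nodes_per_hop[: len(num_sampled_nodes_per_hop) - layer])
    keep_edges = sum(num_sampled_edges_per_hop[: len(num_sampled_edges_per_hop) - layer])
    # Hop-k edges point from hop <= k nodes to hop-(k-1) nodes, so every kept
    # edge still refers to kept nodes.
    return x[:keep_nodes], edges[:keep_edges]


def sum_aggregate(x: List[List[float]], edges: List[Edge]) -> List[List[float]]:
    """Toy message-passing layer: h_v = x_v + sum of x_u over edges u -> v."""
    out = [row[:] for row in x]
    for src, dst in edges:
        for d, val in enumerate(x[src]):
            out[dst][d] += val
    return out


def forward(
    x: List[List[float]],
    edges: List[Edge],
    nodes_per_hop: Sequence[int],
    edges_per_hop: Sequence[int],
) -> List[List[float]]:
    """Run one toy layer per edge hop, trimming before each layer."""
    for layer in range(len(edges_per_hop)):
        x, edges = trim_layer_inputs(layer, nodes_per_hop, edges_per_hop, x, edges)
        x = sum_aggregate(x, edges)
    return x[: nodes_per_hop[0]]  # only the seed-node outputs are used
```

On a small hop-ordered graph, the trimmed `forward` returns the same seed-node outputs as applying `sum_aggregate` to the full subgraph at every layer; it just skips the nodes and edges whose results would be discarded.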

We can likewise support Hierarchical Neighborhood Sampling in GLT's NeighborSampler and DistNeighborSampler, and update the examples to avoid the performance loss caused by redundant computation.
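On the sampler side, the essential change is to emit nodes and edges grouped by hop and to return the per-hop counts alongside them. The following is a minimal single-machine sketch on an adjacency dict; it is not GLT's NeighborSampler or DistNeighborSampler API, and `hierarchical_sample`, its arguments, and its return layout are assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def hierarchical_sample(
    adj: Dict[int, List[int]],
    seeds: Sequence[int],
    fanouts: Sequence[int],
    rng: Optional[random.Random] = None,
):
    """Sample neighbors hop by hop, keeping nodes and edges grouped by hop.

    Returns (nodes, edges, num_sampled_nodes_per_hop, num_sampled_edges_per_hop),
    where `edges` are (source, target) pairs of local indices into `nodes`.
    """
    rng = rng or random.Random(0)
    local: Dict[int, int] = {}  # global node id -> local index
    nodes: List[int] = []
    for s in seeds:
        if s not in local:
            local[s] = len(nodes)
            nodes.append(s)
    nodes_per_hop = [len(nodes)]
    edges: List[Tuple[int, int]] = []
    edges_per_hop: List[int] = []

    frontier = list(nodes)
    for fanout in fanouts:
        next_frontier: List[int] = []
        num_edges_before = len(edges)
        for dst in frontier:
            nbrs = adj.get(dst, [])
            # Take all neighbors if there are few enough, else sample `fanout`.
            picked = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
            for src in picked:
                if src not in local:  # first time seen: it belongs to this hop
                    local[src] = len(nodes)
                    nodes.append(src)
                    next_frontier.append(src)
                edges.append((local[src], local[dst]))
        nodes_per_hop.append(len(next_frontier))
        edges_per_hop.append(len(edges) - num_edges_before)
        frontier = next_frontier
    return nodes, edges, nodes_per_hop, edges_per_hop
```

Because each node is assigned to the hop in which it is first seen, every hop-k edge targets a hop-(k-1) node, which is the layout that per-hop trimming relies on.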

Additional context

baoleai added the feature (New feature or request) and PyG labels May 24, 2023
baoleai changed the title Support Hierarchical Neighborhood sampling since PyG2.3 to [Feat] Support Hierarchical Neighborhood sampling since PyG2.3 May 24, 2023
baoleai changed the title [Feat] Support Hierarchical Neighborhood sampling since PyG2.3 to [Feat] Remove redundancy in storage and computation caused by NeighborLoader and new model implementation in PyG. May 24, 2023
husimplicity self-assigned this May 24, 2023
LiSu closed this as completed Aug 28, 2023

3 participants