Skip to content

Project implementing GraphSage for link prediction and node classification. This project reproduces results from a comparative study, exploring embedding methods and evaluating performance on various datasets.

Notifications You must be signed in to change notification settings

rosameliacarioni/graphsage

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Project for course DD2434: Advanced Machine Learning at KTH Royale Institute of Technology

Description

This project implements GraphSage as described in the following papers:

  • "A Comparative Study for Unsupervised Network Representation Learning" by Khosla, Setty, and Anand.
  • "Inductive Representation Learning on Large Graphs" by Hamilton, Ying, and Leskovec.

Report

The report can be found here

Installing steps for requirements on a MACOS 14.1 - 14.2

These steps are based on the solution provided in this issue.

Creating a Conda Environment

  1. Create a new environment: conda create -n graph_sage_env python=3.9
  2. Activate the environment: conda activate graph_sage_env

Installing Packages

  1. conda install -y clang_osx-arm64 clangxx_osx-arm64 gfortran_osx-arm64
  2. python -m pip --no-cache-dir install torch torchvision torchaudio
  3. Verify Torch installation: python -c "import torch; print(torch.__version__)"
  4. Install torch-scatter: python -m pip --no-cache-dir install torch-scatter -f https://data.pyg.org/whl/torch-1.11.0+${cpu}.html
  5. Install torch-sparse: python -m pip --no-cache-dir install torch-sparse -f https://data.pyg.org/whl/torch-1.11.0+${cpu}.html
  6. Install torch-geometric: python -m pip --no-cache-dir install torch-geometric
  7. Install other requirements: pip install -r requirements.txt

Additional Configuration

To ensure compatibility with macOS's GPU limitations, set the environment variable to fall back to CPU when GPU methods are not implemented. Also, ensure the .env file is added to your project root.

  • Execute in the command line: export PYTORCH_ENABLE_MPS_FALLBACK=1

Note: These instructions are tailored for macOS 14.1 - 14.2. Adjustments might be needed for other versions or operating systems.

File Structure Overview

This project's file structure is organized to facilitate understanding and interaction with the various components involved in the machine learning process. Here's the breakdown:

Python Scripts

The core methods used throughout the project are encapsulated within .py files, each serving a specific purpose:

  • read_data.py: Handles the information retrieval of files from the Arizona State University data repository in order to create graphs.

  • graph_information.py: This script is a utility for graph analytics. It visualizes general information about a graph and it's loader (used to sample its neighbors), providing insights into the structure and composition of your networks.

  • test_embeddings.py: Central to evaluating the performance of the models, this file contains functions for node classification and edge prediction, allowing for the assessment of the embeddings generated.

  • graphsage_calculate_embeddings.py: This contains the model used to learn and derive the embedding matrix from the datasets. It offers flexibility by allowing the use of a local model (as applied in this study) or the inbuilt GraphSage from torch_geometric.nn.

Jupyter Notebooks

For a more interactive and exploratory approach, .ipynb notebooks are used, particularly for experimenting with the datasets:

  • Dataset Notebooks: Each of the datasets employed in this study has an associated notebook. These notebooks are where the data manipulation, experimentation, and initial analysis occur.

Files under not_used folder

These files can be ignored. They were used in order to obtain my current code.

About

Project implementing GraphSage for link prediction and node classification. This project reproduces results from a comparative study, exploring embedding methods and evaluating performance on various datasets.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published