Project for course DD2434: Advanced Machine Learning at KTH Royal Institute of Technology

Description

This project implements GraphSage as described in the following papers:

"A Comparative Study for Unsupervised Network Representation Learning" by Khosla, Setty, and Anand.
"Inductive Representation Learning on Large Graphs" by Hamilton, Ying, and Leskovec.

Report

The report can be found here

Installation steps for requirements on macOS 14.1 - 14.2

These steps are based on the solution provided in this issue.

Creating a Conda Environment

Create a new environment: conda create -n graph_sage_env python=3.9
Activate the environment: conda activate graph_sage_env

Installing Packages

Install the compiler toolchain: conda install -y clang_osx-arm64 clangxx_osx-arm64 gfortran_osx-arm64
Install PyTorch: python -m pip --no-cache-dir install torch torchvision torchaudio
Verify Torch installation: python -c "import torch; print(torch.__version__)"
Install torch-scatter: python -m pip --no-cache-dir install torch-scatter -f https://data.pyg.org/whl/torch-1.11.0+cpu.html
Install torch-sparse: python -m pip --no-cache-dir install torch-sparse -f https://data.pyg.org/whl/torch-1.11.0+cpu.html
Install torch-geometric: python -m pip --no-cache-dir install torch-geometric
Install other requirements: pip install -r requirements.txt
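
After the steps above, a quick check that the packages are importable can save a failed notebook run later. The following is a minimal sketch using only the standard library. The helper name check_packages is hypothetical and not part of this repository, and the listed module names are assumed to be the import names of the pip packages installed above.

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Import names assumed to correspond to the pip packages installed above.
    required = ["torch", "torch_scatter", "torch_sparse", "torch_geometric"]
    for name, found in check_packages(required).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Any module reported as MISSING points to the installation step that should be repeated.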

Additional Configuration

Some operations are not yet implemented on PyTorch's MPS (Apple GPU) backend. Set the PYTORCH_ENABLE_MPS_FALLBACK environment variable so that those operations fall back to the CPU. Also, ensure the .env file is present in your project root.

Execute in the command line: export PYTORCH_ENABLE_MPS_FALLBACK=1
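
A shell export may not reach a Jupyter kernel that was started from another session. As a hedged alternative, the same variable can be set from Python at the top of a script or notebook. This is a minimal sketch: the helper name enable_mps_fallback is hypothetical, and it assumes the call runs before torch is first imported.

```python
import os


def enable_mps_fallback():
    """Set PYTORCH_ENABLE_MPS_FALLBACK=1 unless it is already defined."""
    # setdefault keeps any value that was already exported in the shell.
    return os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


enable_mps_fallback()
# import torch  # assumption: torch is imported only after the variable is set
```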

Note: These instructions are tailored for macOS 14.1 - 14.2. Adjustments might be needed for other versions or operating systems.

File Structure Overview

The project is split into Python scripts, which hold the core methods, and Jupyter notebooks, which run the experiments on each dataset. Here's the breakdown:

Python Scripts

The core methods used throughout the project are encapsulated within .py files, each serving a specific purpose:

read_data.py: Retrieves dataset files from the Arizona State University data repository and builds graphs from them.
graph_information.py: A utility for graph analytics. It visualizes general information about a graph and its loader (used to sample its neighbors), giving insight into the structure and composition of the networks.
test_embeddings.py: Contains functions for node classification and edge prediction, which are used to evaluate the generated embeddings.
graphsage_calculate_embeddings.py: Contains the model used to learn the embedding matrix from the datasets. Either a local model (as applied in this study) or the built-in GraphSage from torch_geometric.nn can be used.
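
For orientation, the core idea of GraphSage in the Hamilton, Ying, and Leskovec paper is to sample a fixed number of neighbors per node and aggregate their features. The sketch below illustrates one sampling-plus-mean-aggregation step in plain Python. It omits the learned weight matrices and the nonlinearity, and it is not the code in graphsage_calculate_embeddings.py. All names (sample_neighbors, mean_aggregate, the toy graph) are hypothetical.

```python
import random


def sample_neighbors(adj, node, k, rng):
    """Sample up to k neighbors of `node` uniformly without replacement."""
    neighbors = adj.get(node, [])
    if len(neighbors) <= k:
        return list(neighbors)
    return rng.sample(neighbors, k)


def mean_aggregate(features, adj, node, k, rng):
    """Concatenate a node's features with the mean of its sampled neighbors' features."""
    sampled = sample_neighbors(adj, node, k, rng)
    dim = len(features[node])
    if sampled:
        mean = [sum(features[u][i] for u in sampled) / len(sampled) for i in range(dim)]
    else:
        mean = [0.0] * dim  # isolated node: no neighbor information available
    return features[node] + mean


# Toy undirected graph as an adjacency list, with 2-dimensional node features.
adj = {0: [1, 2], 1: [0], 2: [0]}
features = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [4.0, 2.0]}
rng = random.Random(0)
print(mean_aggregate(features, adj, 0, k=2, rng=rng))  # [1.0, 0.0, 2.0, 2.0]
```

In a full model, the concatenated vector would then pass through a learned linear layer and a nonlinearity, and the step would be repeated once per layer.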

Jupyter Notebooks

For a more interactive and exploratory approach, .ipynb notebooks are used, particularly for experimenting with the datasets:

Dataset Notebooks: Each of the datasets employed in this study has an associated notebook. These notebooks are where the data manipulation, experimentation, and initial analysis occur.

Files under `not_used` folder

These files can be ignored. They were used during development to arrive at the current code.


rosameliacarioni/graphsage
