Explore the use of Neo4j for potential applications to ATRIAGraph.

joestubbs/astria-neo4j

In this project, we explore the use of Neo4j for a potential collaboration with the ATRIAGraph project.

Initial Commands
----------------------

1) Start the neo4j container:
 docker run -it --rm -p 7474:7474 -p 7687:7687 -v $(pwd)/data:/data -e NEO4J_AUTH=neo4j/tapis4ever --name=neo neo4j:5.22.0

2) Monitor CPU and memory usage:
  docker stats neo

3) Generate test CSV files to import into the database (this has already been done, but the script can be rerun as needed):
  python generate_test_data.py

3b) Copy new files into the directory that step 4 mounts into the container as /import:
  sudo cp test_data_100k.csv import/
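
The contents of generate_test_data.py are not reproduced in this README. As a rough sketch only, a generator consistent with the import results below (three properties per node) and with the queries in step 5 (a unique string property_1 such as 'prp_1' and a random integer property_2) might look like the following. The function name, the third property, the value range, and the `:int` type suffix in the header are assumptions made for illustration, not the actual script.

```python
import csv
import random


def write_test_csv(path, n_nodes, n_values=1000, seed=0):
    """Write one node per row with three properties (hypothetical layout)."""
    rng = random.Random(seed)  # seeded so that reruns produce the same file
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # neo4j-admin import takes property names (and optional types)
        # from the header row; ":int" makes property_2 an integer.
        writer.writerow(["property_1", "property_2:int", "property_3"])
        for i in range(1, n_nodes + 1):
            writer.writerow([
                f"prp_{i}",               # unique for each node
                rng.randrange(n_values),  # random, so values repeat across nodes
                f"value_{i}",             # assumed third property
            ])
```

For example, `write_test_csv("test_data_100k.csv", 100_000)` would write a file with the name used in the steps here. Typing property_2 as an integer matters because the step 5 query compares it against the number 10 rather than a string.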

4) Load csv file into Neo4j -- NOTE: the neo4j container should NOT be running. The following runs a new container just to do the import:

 docker run -it --rm -p 7474:7474 -p 7687:7687 -v $(pwd)/data:/data -v $(pwd)/import:/import -e NEO4J_AUTH=neo4j/tapis4ever neo4j:5.22.0 neo4j-admin database import full neo4j --nodes=/import/test_data_100k.csv --overwrite-destination

# 100k example:
 Imported:
  100000 nodes
  0 relationships
  300000 properties
Peak memory usage: 518.0MiB


# 500k example:
IMPORT DONE in 2s 597ms.
Imported:
  500000 nodes
  0 relationships
  1500000 properties
Peak memory usage: 519.5MiB


# 10M example:
IMPORT DONE in 11s 580ms. 
Imported:
  10000000 nodes
  0 relationships
  30000000 properties
Peak memory usage: 632.1MiB
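
To put the timings above in perspective, the short sketch below converts the two reported "IMPORT DONE" lines into nodes per second. It only handles the "Ns Mms" form shown here (the 100k run has no timing line in this README), and the helper names are illustrative.

```python
import re


def import_seconds(line):
    """Convert a line such as 'IMPORT DONE in 11s 580ms.' to seconds."""
    m = re.search(r"IMPORT DONE in (\d+)s (\d+)ms", line)
    if m is None:
        raise ValueError(f"unrecognized timing line: {line!r}")
    return int(m.group(1)) + int(m.group(2)) / 1000


def throughput(nodes, line):
    """Nodes imported per second for one run."""
    return nodes / import_seconds(line)


# Node counts and timing lines copied from the import output above.
runs = {
    "500k": (500_000, "IMPORT DONE in 2s 597ms."),
    "10M": (10_000_000, "IMPORT DONE in 11s 580ms."),
}

for name, (nodes, line) in runs.items():
    print(f"{name}: {throughput(nodes, line):,.0f} nodes/s")
```

This works out to roughly 190 thousand nodes per second for the 500k run and roughly 860 thousand for the 10M run, which suggests a fixed startup cost dominates the smaller import. Peak memory grew only from 519.5MiB to 632.1MiB across a 20x increase in node count.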



5) Run some queries (the container from step 1 must be running again):

docker exec -it neo bash

# (inside container)

# return total number of nodes 
cypher-shell --database=neo4j -u neo4j -p tapis4ever "MATCH (n) RETURN count(n) as nodes"

# return the distinct lists of property keys found across all nodes
cypher-shell --database=neo4j -u neo4j -p tapis4ever "MATCH (n) RETURN distinct keys(n)"

# property_1 is unique for each node 
cypher-shell --database=neo4j -u neo4j -p tapis4ever "MATCH (n) WHERE n.property_1 = 'prp_1' RETURN count(n) as nodes"

# property_2 is random, so any given value should match a number of nodes
cypher-shell --database=neo4j -u neo4j -p tapis4ever "MATCH (n) WHERE n.property_2 = 10 RETURN count(n) as nodes"
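
The count queries above can also be run from the host without an interactive shell. The following is a minimal sketch, assuming the container name and credentials from step 1 and assuming that cypher-shell's `--format plain` output is a header line followed by one row holding the value. The helper names are hypothetical.

```python
import subprocess


def cypher_shell_argv(query, container="neo", user="neo4j",
                      password="tapis4ever", database="neo4j"):
    """Build the docker exec command line for one Cypher query."""
    return [
        "docker", "exec", container,
        "cypher-shell", f"--database={database}",
        "-u", user, "-p", password,
        "--format", "plain",  # plain output is easier to parse than the table view
        query,
    ]


def parse_count(output):
    """Take the last non-empty line of the output as an integer count.

    Assumes a header line followed by a single value row.
    """
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    return int(lines[-1])


def run_count(query, **kwargs):
    """Run a query that returns a single count (needs the running container)."""
    result = subprocess.run(cypher_shell_argv(query, **kwargs),
                            capture_output=True, text=True, check=True)
    return parse_count(result.stdout)
```

For example, `run_count("MATCH (n) RETURN count(n) as nodes")` should return the total number of nodes while the container from step 1 is up.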



Memory Usage
==============

Usage started out a little under 500MB after server startup, even with the 10M data set.
Queries temporarily increased usage, and usage did not always return to the startup level afterwards. For example:
  * The simple count of all nodes raised usage to a peak of about 850MB; it then dropped back to around 550MB.
  * The distinct keys query and the property_1 query raised usage to a peak of about 1.62GB; it then dropped to around 1.41GB.
  * The property_2 query raised usage to a peak of almost 1.7GB; it then dropped back to around 1.45GB.
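
Peak figures like these are easy to miss when watching `docker stats` by eye. As a sketch only, the following polls `docker stats --no-stream` for the container from step 1 and keeps the highest memory reading. The function names, sample count, and interval are assumptions, and the script needs the docker CLI on the host.

```python
import subprocess
import time

# Multipliers for size suffixes; docker stats reports binary units (MiB, GiB).
_UNITS = {
    "B": 1,
    "kB": 1000, "KB": 1000, "MB": 1000**2, "GB": 1000**3,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3,
}


def parse_size(text):
    """Convert a size string such as '518.0MiB' to a number of bytes."""
    text = text.strip()
    # Try the longest suffixes first so 'MiB' is not mistaken for 'B'.
    for unit in sorted(_UNITS, key=len, reverse=True):
        if text.endswith(unit):
            return float(text[:-len(unit)]) * _UNITS[unit]
    raise ValueError(f"unrecognized size: {text!r}")


def current_usage_bytes(container="neo"):
    """Read the container's current memory usage from docker stats."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.MemUsage}}", container],
        capture_output=True, text=True, check=True,
    ).stdout
    used, _limit = out.split("/")  # output looks like '518MiB / 15.5GiB'
    return parse_size(used)


def watch_peak(container="neo", samples=30, interval=1.0):
    """Sample usage repeatedly and return the highest value seen, in bytes."""
    peak = 0.0
    for _ in range(samples):
        peak = max(peak, current_usage_bytes(container))
        time.sleep(interval)
    return peak
```

Starting `watch_peak()` in one terminal just before running a query in another would give a peak comparable to the numbers listed above, within the limits of the sampling interval.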
