-
Notifications
You must be signed in to change notification settings - Fork 0
joestubbs/astria-neo4j
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
In this project, we explore the use of Neo4j for the purposes of a potential collaboration with the ATRIAGraph project. Initial Commands ---------------------- 1) Start the neo4j container: docker run -it --rm -p 7474:7474 -p 7687:7687 -v $(pwd)/data:/data -e NEO4J_AUTH=neo4j/tapis4ever --name=neo neo4j:5.22.0 2) Monitor cpu and memory usage: docker stats neo 3) Generate test csv files to import into the db (this was already done but can be run again as needed) python generate_test_data.py 3b) Copy new files into the container directory sudo cp test_data_100k.csv data/ 4) Load csv file into Neo4j -- NOTE: neo4 container should NOT be running. The following runs a new container just to do the import: docker run -it --rm -p 7474:7474 -p 7687:7687 -v $(pwd)/data:/data -v $(pwd)/import:/import -e NEO4J_AUTH=neo4j/tapis4ever neo4j:5.22.0 neo4j-admin database import full neo4j --nodes=/import/test_data_100k.csv --overwrite-destination # 100K example: Imported: 100000 nodes 0 relationships 300000 properties Peak memory usage: 518.0MiB # 500k example: IMPORT DONE in 2s 597ms. Imported: 500000 nodes 0 relationships 1500000 properties Peak memory usage: 519.5MiB # 10M example: IMPORT DONE in 11s 580ms. Imported: 10000000 nodes 0 relationships 30000000 properties Peak memory usage: 632.1MiB 5) do some queries: docker exec -it neo # (inside container) # return total number of nodes cypher-shell --database=neo4j -u neo4j -p tapis4ever "MATCH (n) RETURN count(n) as nodes" # return all distinct properties across all nodes cypher-shell --database=neo4j -u neo4j -p tapis4ever "MATCH (n) RETURN distinct keys(n)" # property_1 is unique for each node cypher-shell --database=neo4j -u neo4j -p tapis4ever "MATCH (n) WHERE n.property_1 = 'prp_1' RETURN count(n) as nodes" # property 2 is random so there should be a number of nodes returned for any given value cypher-shell --database=neo4j -u neo4j -p tapis4ever "MATCH (n) WHERE n.property_2 = 10 RETURN count(n) as nodes" Memory Usage ============== Usage started out a little under 500MB after server start up, even with the 10M data set. Queries did increase the usage temporarily though. For example, * The simple count of all nodes increased usage to a peak of around ~850MB; usage went back down to around 550MB * The return distinct keys query and property_1 query increased usage to a peak of ~1.62GB; usage went down to around 1.41GB * The property_2 query increased usage to a peak of almost 1.7GB; usage went back down to around 1.45GB
About
Explore the use of Neo4j for potential applications to ATRIAGraph.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published