
IOException: no space left on device #78

Open
puredevotion opened this issue Jan 17, 2014 · 10 comments

Comments

@puredevotion

I've been trying to import nodes on CentOS 6.5/64-bit (in VMware), but I'm getting an IOException. At first I figured this could be an issue with VMware not being able to expand the disk fast enough, or that I really had run out of space.
That doesn't seem to be the case: I made a 500 GB pre-allocated disk, and my dataset (nodes.csv + rels.csv) is 14.5 GB combined.

Unfortunately, in a separate incident my mouse died, so I had to type over whatever I could see, but I couldn't scroll my window :(
These are the abbreviated last lines; the (##) is the line number in the source file.

storedFieldsWriter.flush(55) 
docFieldProcessor.flush(59)
documentsWriter.flush(581) 
apache.lucene.index.indexWriter.doFlush(3587)
apache.lucene.index.indexWriter.flush(3552)
apache.lucene.index.indexWriter.forceMerge(2516)
apache.lucene.index.indexWriter.optimize(2424)
neo4j.index.impl.lucene.LuceneBatchInserterIndex.closeWriter(299)
… 7 more
Suppressed: java.io.IOException: no space left on device ... 25 more

If I run df -h I see the following:

Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_neo4jraid-lv_root   50G   47G  275M 100% /
tmpfs                             7.8G     0  7.8G   0% /dev/shm
/dev/sda1                         485M   54M  407M  12% /boot
/dev/mapper/vg_neo4jraid-lv_home  435G  199M  413G   1% /home

I'm not really sure what I'm seeing, nor how I can fix it...
messages.log seems empty :( but maybe it's been overwritten...

@jexp
Owner

jexp commented Jan 17, 2014

Perhaps it is a misleading error message and it is actually related to open file handles?

Can you put a watch on df -h while the import runs and see how it looks in the meantime?
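
For example (a minimal sketch; the 30-second interval is an arbitrary choice):

watch -n 30 df -h

or, to keep a record you can read back later even without a scrollable window:

while true; do df -h >> df-watch.log; sleep 30; done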

@puredevotion
Author

Sure, that will take some time; when this failed it was 8+ hours in.
Oh, perhaps unclear, but neo4jraid is the name of my VM...

I'm starting the process as follows:

rm -rf autopredict.db
./import.sh autopredict.db

my batch.properties:

dump_configuration=false
cache_type=none
use_memory_mapped_buffers=true
# 9 bytes per node
neostore.nodestore.db.mapped_memory=6G
# 33 bytes per relationship
neostore.relationshipstore.db.mapped_memory=6G
# 38 bytes per property
neostore.propertystore.db.mapped_memory=200M
neostore.propertystore.db.index.keys.mapped_memory=50M
neostore.propertystore.db.index.mapped_memory=50M
neostore.propertystore.db.strings.mapped_memory=200M
batch_array_separator=,
batch_import.node_index.words=exact
batch_import.nodes_files=a_nodes.csv,ba_nodes.csv,de_nodes.csv,ev_nodes.csv,he_$
batch_import.rels_files=a_rels.csv,ba_rels.csv,de_rels.csv,ev_rels.csv,he_rels.$

df -h at start:

Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_neo4jraid-lv_root   50G   15G   33G  31% /
tmpfs                             7.8G     0  7.8G   0% /dev/shm
/dev/sda1                         485M   54M  407M  12% /boot
/dev/mapper/vg_neo4jraid-lv_home  435G  199M  413G   1% /home

@jexp
Owner

jexp commented Jan 17, 2014

How many nodes, rels and indexed properties do you have in your data?

@jexp
Owner

jexp commented Jan 17, 2014

What is the command line you call for the import?

@puredevotion
Author

I just had a brainwave: I'm running this as root (too lazy to add another user account), and the root filesystem is filling up by the looks of it.
Although all my files (the import stuff) are placed in /opt. Should switching to another user help?

@puredevotion
Author

Made an edit.
I don't know the exact number of nodes, but it's at least 2 billion; each node has 1 relationship (always the same type, no properties) to another node, and 4 properties per node.
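
A rough back-of-the-envelope estimate, using the per-record sizes from the batch.properties comments above:

2,000,000,000 nodes × 9 B  ≈  18 GB  (node store)
2,000,000,000 rels  × 33 B ≈  66 GB  (relationship store)
8,000,000,000 props × 38 B ≈ 304 GB  (property store)

That is already far more than the 50 GB available on /, even before the Lucene index is counted.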

After some imports (16,166,371 nodes in total):

Importing 5814208 Nodes took 349 seconds
[...]
Importing 1275803 Nodes took 120 seconds
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_neo4jraid-lv_root   50G   18G   30G  37% /
tmpfs                             7.8G     0  7.8G   0% /dev/shm
/dev/sda1                         485M   54M  407M  12% /boot
/dev/mapper/vg_neo4jraid-lv_home  435G  199M  413G   1% /home

@jexp
Owner

jexp commented Jan 17, 2014

Then point the database to the correct absolute path (e.g. somewhere on the large /home volume),

e.g. import.sh /path/to/db nodes.csv rels.csv

If you import such a large set, make sure to have enough RAM (as much as possible, e.g. 32-256 GB) specified as HEAP in import.sh, and
also in batch.properties, especially for the node and relationship store mappings.
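
For example (a sketch only; the exact java line inside import.sh may differ, and 32G is a placeholder to adapt to your machine's RAM):

java -Xms32G -Xmx32G -classpath "lib/*" org.neo4j.batchimport.Importer batch.properties "$@"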

@jexp
Owner

jexp commented Jan 17, 2014

Oh, and perhaps it is your /tmp that is filling up? It could be used for temporary caches/files.

One can specify the tempdir for the jvm with -Djava.io.tmpdir=/path/to/big_tmp

Oh, and you don't seem to have a separate /opt partition according to your df :)

@puredevotion
Author

How do I use -Djava.io.tmpdir=/path/to/big_tmp?
Can I just do ./import.sh -Djava.io.tmpdir=/path/to/big_tmp?

I'm running CentOS, but normally I'm a Debian guy. If you do a minimal Debian install on a full disk, it doesn't bother with separate partitions (which is great for testing), but this seems to be different on CentOS; one of those learn-by-shame things...

@jexp
Owner

jexp commented Jan 17, 2014

Edit the file, add it in there.
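
i.e. on the java line in import.sh, something like this (a sketch; /home/tmp is just an example directory on the big volume, create it first with mkdir -p /home/tmp, and keep whatever other flags the script already passes):

java -Djava.io.tmpdir=/home/tmp ... org.neo4j.batchimport.Importer batch.properties "$@"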
