When I restart one of my jobTree scripts that uses the global and local temp directories after some of its jobs have run, I get a crash from the internal jobTree code complaining about some directories not existing.
Can the code be made to handle the lack of existence of those directories?
log.txt: ---JOBTREE SLAVE OUTPUT LOG---
log.txt: Traceback (most recent call last):
log.txt: File "/cluster/home/anovak/.local/lib/python2.7/site-packages/jobTree-1.0-py2.7.egg/jobTree/src/jobTreeSlave.py", line 271, in main
log.txt: defaultMemory=defaultMemory, defaultCpu=defaultCpu, depth=depth)
log.txt: File "/cluster/home/anovak/.local/lib/python2.7/site-packages/jobTree-1.0-py2.7.egg/jobTree/scriptTree/stack.py", line 153, in execute
log.txt: self.target.run()
log.txt: File "/cluster/home/anovak/hive/sgdev/mhc/targets.py", line 244, in run
log.txt: index_dir = sonLib.bioio.getTempFile(rootDir=self.getGlobalTempDir())
log.txt: File "/cluster/home/anovak/.local/lib/python2.7/site-packages/jobTree-1.0-py2.7.egg/jobTree/scriptTree/target.py", line 103, in getGlobalTempDir
log.txt: self.globalTempDir = self.stack.getGlobalTempDir()
log.txt: File "/cluster/home/anovak/.local/lib/python2.7/site-packages/jobTree-1.0-py2.7.egg/jobTree/scriptTree/stack.py", line 129, in getGlobalTempDir
log.txt: return getTempDirectory(rootDir=self.globalTempDir)
log.txt: File "/cluster/home/anovak/.local/lib/python2.7/site-packages/sonLib-1.0-py2.7.egg/sonLib/bioio.py", line 457, in getTempDirectory
log.txt: os.mkdir(rootDir)
log.txt: OSError: [Errno 20] Not a directory: '/cluster/home/anovak/hive/sgdev/mhc/tree7/jobs/t2/t3/t1/t0/gTD0/tmp_Zss3uyl5X6/tmp_45OevDhWor/tmp_vxiVIbzGSw'
log.txt: Exiting the slave because of a failed job on host ku-1-21.local
log.txt: Due to failure we are reducing the remaining retry count of job /cluster/home/anovak/hive/sgdev/mhc/tree7/jobs/t2/t3/t1/t0/job to 0
Is this from a run that crashed over the weekend? Something went majorly wrong with the cluster, and the jobTree might not be recoverable. The directory that it's trying to make a subdirectory of is a 0-length file, which is totally screwed up. I don't see any code path that could lead to that being a file.
In my jobTree, I also had 0-length pickle files show up when it was (according to the posix spec) impossible for them to, so I think this isn't a jobTree error, but a fluke caused by our cluster trouble.
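To check which component of a failing path is a file rather than a directory, something like the following might help. It is only a diagnostic sketch: `first_non_directory` is a hypothetical helper, not part of jobTree or sonLib, and it uses nothing beyond the standard `os.path` calls.

```python
import os


def first_non_directory(path):
    """Return the shortest prefix of `path` that exists but is not a
    directory, or None if there is no such prefix.

    Hypothetical helper for diagnosis only; not jobTree or sonLib code.
    """
    path = os.path.abspath(path)

    # Collect the path and all of its ancestors, deepest first.
    prefixes = []
    head = path
    while True:
        prefixes.append(head)
        parent = os.path.dirname(head)
        if parent == head:  # reached the filesystem root
            break
        head = parent

    # Walk from the root downwards and stop at the first offender.
    for prefix in reversed(prefixes):
        if os.path.exists(prefix) and not os.path.isdir(prefix):
            return prefix
    return None
```

Calling it on the path from the `OSError` message should return the offending entry if one exists, and `os.path.getsize` on the result would confirm whether it is a 0-length file.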
No, I wiped the tree and started this yesterday. It could be due to the
fact that one of the cluster nodes has a hung filesystem mount of some sort
and has been taking jobs and doing who knows what with them.
I noticed that sonLib's temp directory function, when it picks a temp
directory name that is already taken, tries to make a subdirectory under
that name instead of picking a new name in the root. I've changed that in
my copy, and jobTree can now run my target and start up my C++ code.
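Roughly, the idea of the change is something like the sketch below. This is not the actual sonLib code or my exact patch: `make_temp_dir` and `random_name` are made-up names for illustration, and the `tmp_` plus ten characters naming is only assumed from the paths in the traceback.

```python
import errno
import os
import random
import string


def random_name(length=10):
    """Return a name like 'tmp_Zss3uyl5X6' (naming scheme is an assumption)."""
    alphabet = string.ascii_letters + string.digits
    return "tmp_" + "".join(random.choice(alphabet) for _ in range(length))


def make_temp_dir(root_dir, max_attempts=100):
    """Create and return a fresh subdirectory of root_dir.

    Hypothetical stand-in for a temp directory helper. If a candidate
    name is already taken, by a directory or by a stray file, pick a
    different name in root_dir instead of descending into the existing
    entry.
    """
    if not os.path.isdir(root_dir):
        os.makedirs(root_dir)

    for _ in range(max_attempts):
        candidate = os.path.join(root_dir, random_name())
        try:
            # mkdir fails if the name exists, so there is no window
            # between checking for the name and creating it.
            os.mkdir(candidate)
            return candidate
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise  # anything other than a name collision is a real error
    raise RuntimeError("no free temp directory name found in %s" % root_dir)
```

With this approach a leftover file with a colliding name is just skipped. It would not help if the root directory itself has turned into a file; in that case `os.makedirs` still raises.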
(In reply to Joel Armstrong's comment above, sent Thu, Feb 5, 2015 at 11:05 AM.)