<html>
<head>
<title>
MATLAB_CONDOR - Running MATLAB Under the CONDOR Batch Queueing System
</title>
</head>
<body bgcolor="#eeeeee" link="#cc0000" alink="#ff3300" vlink="#000055">
<h1 align = "center">
MATLAB_CONDOR <br> Running MATLAB Under the CONDOR Batch Queueing System
</h1>
<hr>
<p>
<b>MATLAB_CONDOR</b>
is a directory of examples which
demonstrate how a MATLAB program can be submitted to the CONDOR batch
queueing system.
</p>
<p>
CONDOR allows a user to submit jobs for batch execution on an informal
cluster composed of various computers that often have idle time.
Based on information from the user's submission file, CONDOR chooses
one or more appropriate and available computers, transfers files to the
target systems, executes the program, and returns data to the user.
</p>
<p>
CONDOR has many features, and its proper use varies from site to site.
The information in this document was inspired by the CONDOR system
supported by the FSU Research Computing Center (RCC). Some of the
information therefore is peculiar to this local installation.
</p>
<p>
The first thing to note is that executing MATLAB is done indirectly.
The user has a MATLAB program or script to run, of course. Let's
say the main user script is called "program.m". In order for this
script to be run through CONDOR, we need to write a BASH shell script
that "knows" where MATLAB is stored, knows how to invoke MATLAB
for a noninteractive job, and knows the name of the user
script. Such a shell script might be called "program_run.sh", and
look like this:
<pre>
#!/bin/bash
/opt/matlab/current/bin/matlab -nosplash -nodesktop -nojvm -r "run('./program.m'); quit"
</pre>
</p>
<p>
Finally, the user must write a CONDOR script that copies necessary files
to an unknown machine, executes the shell script, which executes MATLAB,
which executes the user's MATLAB commands, and then copies the output
files back. The script might be called "program.condor".
</p>
<p>
The user must then log into the CONDOR submit node interactively:
<blockquote>
ssh condor-login.rcc.fsu.edu
</blockquote>
and, if necessary, transfer the CONDOR script, the BASH script, and
the MATLAB files to this node using SFTP, and then submit the CONDOR
script with a command like:
<blockquote>
condor_submit program.condor
</blockquote>
The user can check on the status of the job with the command
<pre>
condor_q
</pre>
If all goes well, the job output will be returned to the CONDOR
submit node.
However, if things do not go well, or the job is taking too much
time, user "username" can delete all jobs in the condor queue with
the command
<pre>
condor_rm username
</pre>
</p>
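<p>
The submit-and-check steps above can be wrapped in a small shell function.
This is only a sketch: the name "submit_job" is hypothetical, and the only
CONDOR commands it uses are "condor_submit" and "condor_q", which are
described above. It refuses to run if the submit file is missing or if the
CONDOR tools are not on the current machine.
</p>

```shell
#!/bin/bash
# submit_job: hypothetical helper, not part of CONDOR itself.
# Usage: submit_job program.condor
submit_job() {
  local submit_file="$1"

  # Refuse to continue if the submit file is missing.
  if [ ! -f "$submit_file" ]; then
    echo "submit_job: cannot find '$submit_file'" >&2
    return 2
  fi

  # Refuse to continue if the CONDOR tools are not on the PATH.
  if ! command -v condor_submit >/dev/null 2>&1; then
    echo "submit_job: condor_submit not found; are you on the submit node?" >&2
    return 3
  fi

  # Submit the job, then show the queue so the new job can be seen.
  condor_submit "$submit_file" && condor_q
}
```

<p>
After sourcing this file on the submit node, the command
"submit_job program.condor" would submit the job and list the queue.
</p>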
<h3 align = "center">
Using Files:
</h3>
<p>
On the FSU RCC Condor cluster, you must first copy your files to the
CONDOR login machine. When you submit your job to the CONDOR queue,
however, the program execution will take place on some unknown machine,
which initially does not have any of your files - and may not even
have the executable program you want to use, unless it is MATLAB,
for instance. Therefore, an important part of using CONDOR is
making sure that all the files needed for input are copied to the remote
machine, that the remote machine either already has the executable or
is sent a copy, and that all your output files are copied back.
</p>
<p>
Because the file system is not shared, the following commands should
appear in your CONDOR script:
<pre>
should_transfer_files = yes
when_to_transfer_output = on_exit
</pre>
The first command turns on CONDOR's file transfer mechanism, and the
second asks that output files be copied back when the job exits.
</p>
<p>
If your executable reads from "standard input", then your CONDOR
job will need a file containing that information. CONDOR includes a
command of the form
<pre>
input = filename
</pre>
that allows you to specify the name of this file. Similarly, if
your program writes to "standard output", CONDOR allows you to
specify the name of a file where this information will go:
<pre>
output = filename
</pre>
and if your program writes to the "standard error" device,
you can specify this with
<pre>
error = filename
</pre>
The input file must exist on your CONDOR login node before you submit
the job. The output and error files are created during the run, and
will automatically be copied back to your CONDOR login node when the
job is completed.
</p>
<p>
Your job may require many more files to run than simply the standard
input file. In particular, a MATLAB job will usually need one or more
M files. You need to tell CONDOR the names of these files, in a
comma-separated list:
<pre>
transfer_input_files = file1, file2, ..., file99
</pre>
</p>
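<p>
Since every file named in "transfer_input_files" must already exist on the
login node, it may help to check the list before submitting. The following
is a minimal sketch, assuming the list is written on a single line in lower
case, exactly as shown above. The function name "check_inputs" is
hypothetical and is not a CONDOR command.
</p>

```shell
#!/bin/bash
# check_inputs: hypothetical pre-flight check, not a CONDOR command.
# Reads the "transfer_input_files" line of a submit file and reports
# any listed file that does not exist in the current directory.
# Usage: check_inputs program.condor
check_inputs() {
  local submit_file="$1" list name missing=0

  # Take the text after "=" on the transfer_input_files line.
  list=$(sed -n 's/^[[:space:]]*transfer_input_files[[:space:]]*=[[:space:]]*//p' "$submit_file")

  # Split the comma-separated list and test each name.
  local IFS=','
  for name in $list; do
    # Trim the blanks left over from "file1, file2".
    name=$(echo "$name" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    if [ -z "$name" ]; then
      continue
    fi
    if [ ! -f "$name" ]; then
      echo "missing input file: $name"
      missing=1
    fi
  done

  # Return 0 if every file was found, 1 otherwise.
  return "$missing"
}
```

<p>
Run from the directory that holds the job files, "check_inputs program.condor"
prints one line for each missing file and returns a nonzero status if
anything is missing.
</p>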
<p>
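Your job may create many files aside from simply standard output.
Luckily, with the settings above, all files that the run creates in the
top level of its working directory will be automatically copied back.
Files written into subdirectories are not copied back unless you list
them with the "transfer_output_files" command.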
</p>
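<p>
Once the job has finished, it can be worth confirming that the expected
files really did come back. Here is a small sketch of such a check. The
function name "check_results" is hypothetical, and treating any text in the
standard error file as a sign of trouble is an assumption that may be too
strict for some jobs.
</p>

```shell
#!/bin/bash
# check_results: hypothetical post-run check, not a CONDOR command.
# Usage: check_results error.txt result1 [result2 ...]
# Succeeds only if the standard error file is absent or empty and
# every named result file exists and is non-empty.
check_results() {
  local error_file="$1" status=0 f
  shift

  # A non-empty standard error file suggests something went wrong.
  if [ -s "$error_file" ]; then
    echo "standard error is not empty: $error_file"
    status=1
  fi

  # Each expected result must exist and contain some data.
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty result: $f"
      status=1
    fi
  done

  return "$status"
}
```

<p>
For example, "check_results error.txt output.txt" reports a problem if
"output.txt" did not come back or if "error.txt" contains any text.
</p>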
<p>
We happen to know that MATLAB is installed on certain CONDOR nodes.
To guarantee that CONDOR sends our job to such a node, we use a
command like the following:
<pre>
requirements = ( OpSys == "LINUX" && Arch == "X86_64" && Matlab == "true" )
</pre>
</p>
<p>
To run a MATLAB job on the remote machine, we have to use a special
form of the MATLAB command that specifies where the program is,
how it is to be run, and what M file it is to execute. This is done
by writing a short BASH shell script. If our M file is called
"my_prog.m", then the script could be called "run_my_prog.sh",
and could look like this:
<pre>
#!/bin/bash
/opt/matlab/current/bin/matlab -nosplash -nodesktop -nojvm -r "run('./my_prog.m'); quit"
</pre>
Essentially, CONDOR will treat this shell script as your "executable",
so your CONDOR script must include the statement:
<pre>
executable = run_my_prog.sh
</pre>
</p>
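<p>
The wrapper above names one particular M file. As a sketch of a more general
form, the script below takes the M file name as its first argument, which
CONDOR would supply from the "arguments" line of the submit file. The
function names and the DRY_RUN variable are hypothetical, and the MATLAB path
is the one used on this page, so it may need to be changed at other sites.
</p>

```shell
#!/bin/bash
# run_matlab.sh: hypothetical generic wrapper (a sketch, not the author's script).
# The MATLAB path below is the one quoted in this document; adjust for your site.
MATLAB=/opt/matlab/current/bin/matlab

# Build the text passed to MATLAB's -r option for a given M file.
matlab_command() {
  echo "run('./$1'); quit"
}

# Run the M file named by the first argument, or only print the
# command line when the (hypothetical) DRY_RUN variable is set to 1.
run_m_file() {
  local m_file="$1"
  if [ -z "$m_file" ]; then
    echo "usage: run_matlab.sh file.m" >&2
    return 2
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$MATLAB -nosplash -nodesktop -nojvm -r \"$(matlab_command "$m_file")\""
  else
    "$MATLAB" -nosplash -nodesktop -nojvm -r "$(matlab_command "$m_file")"
  fi
}

# When CONDOR runs this file as the executable, "$1" comes from the
# "arguments" line of the submit file.
if [ "$#" -gt 0 ]; then
  run_m_file "$1"
fi
```

<p>
With this wrapper, the submit file would contain "executable = run_matlab.sh"
and "arguments = my_prog.m". Setting DRY_RUN to 1 prints the MATLAB command
line without running it, which is a cheap way to check the quoting.
</p>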
<h3 align = "center">
A Sample CONDOR Script for MATLAB
</h3>
<p>
Here is a file called "my_prog.condor":
<pre>
universe = vanilla
executable = run_my_prog.sh
arguments =
input =
requirements = ( OpSys == "LINUX" && Arch == "X86_64" && Matlab == "true" )
should_transfer_files = yes
transfer_input_files = my_prog.m
when_to_transfer_output = on_exit
notification = never
output = output.txt
log = log.txt
error = error.txt
queue
</pre>
</p>
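<p>
If you run several MATLAB programs this way, the submit files differ only in
a few names, so they can be generated. The sketch below writes "NAME.condor"
for a program "NAME.m", assuming the wrapper is called "run_NAME.sh". The
function name "make_submit_file" is hypothetical, and the "Matlab" attribute
in the requirements line is specific to the FSU installation described here.
</p>

```shell
#!/bin/bash
# make_submit_file: hypothetical helper that writes NAME.condor for NAME.m,
# assuming the wrapper script is called run_NAME.sh.
# Usage: make_submit_file my_prog
make_submit_file() {
  local name="$1"

  # The here-document below becomes the submit file; ${name} is
  # replaced by the program name given on the command line.
  cat > "${name}.condor" <<EOF
universe = vanilla
executable = run_${name}.sh
# The Matlab attribute is site-specific; other sites may not define it.
requirements = ( OpSys == "LINUX" && Arch == "X86_64" && Matlab == "true" )
should_transfer_files = yes
transfer_input_files = ${name}.m
when_to_transfer_output = on_exit
notification = never
output = ${name}_output.txt
error = ${name}_error.txt
log = ${name}_log.txt
queue
EOF
}
```

<p>
For example, "make_submit_file my_prog" creates "my_prog.condor", which can
then be submitted with "condor_submit my_prog.condor".
</p>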
<p>
A few comments are in order.
<ul>
<li>
The "universe" command is required,
and on the FSU CONDOR system, we only have the "vanilla" universe.
</li>
<li>
The "arguments" command allows you to pass commandline arguments to
the executable program.
</li>
<li>
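The "notification" command controls email from CONDOR. The value
"never", used here, sends none. Setting it to "complete" will cause
CONDOR to send you email when the job completes, and "always" will
send email at some other stages as well.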
</li>
<li>
The "log" command specifies a name to use for the file in which CONDOR
records the progress of the job.
</li>
<li>
The "error" command allows you to capture output to standard error.
</li>
<li>
The "queue" command is necessary, and tells CONDOR to actually
place your job in the queue, using the settings that precede it.
</li>
</ul>
</p>
<h3 align = "center">
Licensing:
</h3>
<p>
The computer code and data files made available on this web page
are distributed under
<a href = "../../txt/gnu_lgpl.txt">the GNU LGPL license.</a>
</p>
<h3 align = "center">
Languages:
</h3>
<p>
<b>MATLAB_CONDOR</b> is available in
<a href = "../../c_src/c_condor/c_condor.html">a C version</a>,
<a href = "../../cpp_src/c++_condor/c++_condor.html">a C++ version</a>,
<a href = "../../f77_src/f77_condor/f77_condor.html">a FORTRAN77 version</a>,
<a href = "../../f_src/f90_condor/f90_condor.html">a FORTRAN90 version</a> and
<a href = "../../m_src/matlab_condor/matlab_condor.html">a MATLAB version</a>.
</p>
<h3 align = "center">
Related Data and Programs:
</h3>
<p>
<a href = "../../c_src/c_condor/c_condor.html">
C_CONDOR</a>,
C programs which
illustrate how a C program can be run in batch mode using the condor
queueing system.
</p>
<p>
<a href = "../../cpp_src/c++_condor/c++_condor.html">
C++_CONDOR</a>,
C++ programs which
illustrate how a C++ program can be run in batch mode using the condor
queueing system.
</p>
<p>
<a href = "../../examples/condor/condor.html">
CONDOR</a>,
examples which
demonstrate the use of the CONDOR queueing system to submit jobs
that run on one or more remote machines.
</p>
<p>
<a href = "../../f77_src/f77_condor/f77_condor.html">
F77_CONDOR</a>,
FORTRAN77 programs which
illustrate how a FORTRAN77 program can be run in batch mode using the condor
queueing system.
</p>
<p>
<a href = "../../f_src/f90_condor/f90_condor.html">
F90_CONDOR</a>,
FORTRAN90 programs which
illustrate how a FORTRAN90 program can be run in batch mode using the condor
queueing system.
</p>
<p>
<a href = "../../m_src/matlab_commandline/matlab_commandline.html">
MATLAB_COMMANDLINE</a>,
programs which
illustrate how MATLAB can be run from the UNIX command line, that is,
not with the usual MATLAB command window.
</p>
<p>
<a href = "../../m_src/matlab_compiler/matlab_compiler.html">
MATLAB_COMPILER</a>,
MATLAB programs which
illustrate the use of the Matlab compiler, which allows you
to run a Matlab application outside the Matlab environment.
</p>
<h3 align = "center">
Reference:
</h3>
<p>
<ol>
<li>
<a href = "../../pdf/condor.pdf">condor.pdf</a>,<br>
Condor Team,<br>
University of Wisconsin, Madison,<br>
Condor Version 8.0.2 Manual;
</li>
<li>
<a href = "http://www.cs.wisc.edu/htcondor/">
http://www.cs.wisc.edu/htcondor/</a>,<br>
The HTCondor home page;
</li>
</ol>
</p>
<h3 align = "center">
Examples and Tests:
</h3>
<p>
<b>SIMPLE</b> is a simple example, in which a MATLAB function is to
be called with certain input.
<ul>
<li>
<a href = "simple.condor">simple.condor</a>,
the CONDOR submission file. This is used by issuing the command
"condor_submit simple.condor".
</li>
<li>
<a href = "simple_run.sh">simple_run.sh</a>,
the BASH script which invokes MATLAB to run the user's main
MATLAB function.
</li>
<li>
<a href = "simple_script.m">simple_script.m</a>,
the user's main MATLAB function.
</li>
<li>
<a href = "simple_function.m">simple_function.m</a>,
a lower-level MATLAB function, which simply adds the
two input arguments, returning the sum.
</li>
<li>
<a href = "simple.mat">simple.mat</a>,
the MATLAB MAT file which contains the workspace at the end of
the computation. Because we use a "save" command to create
this file during the run, it is automatically copied back to
the login node when the job is completed.
</li>
<li>
<a href = "simple_output.txt">simple_output.txt</a>,
the output printed by MATLAB.
</li>
<li>
<a href = "simple_log.txt">simple_log.txt</a>,
CONDOR's log file (records the job submission, execution, and completion).
</li>
</ul>
</p>
<p>
<b>PRIMES</b> is an example which tries to count the prime numbers
from 1 to some power of 10.
<ul>
<li>
<a href = "primes.condor">primes.condor</a>,
the CONDOR submission file. This is used by issuing the command
"condor_submit primes.condor".
</li>
<li>
<a href = "primes_run.sh">primes_run.sh</a>,
the BASH script which invokes MATLAB to run the user's main
MATLAB function.
</li>
<li>
<a href = "primes_script.m">primes_script.m</a>,
the user's main MATLAB function.
</li>
<li>
<a href = "primes_report.txt">primes_report.txt</a>,
a report file created by the user's MATLAB function,
and automatically copied back to the CONDOR login node.
</li>
<li>
<a href = "primes_output.txt">primes_output.txt</a>,
the output printed by MATLAB.
</li>
<li>
<a href = "primes_log.txt">primes_log.txt</a>,
CONDOR's log file (records the job submission, execution, and completion).
</li>
</ul>
</p>
<p>
You can go up one level to <a href = "../m_src.html">
the MATLAB source codes</a>.
</p>
<hr>
<i>
Last modified on 28 August 2013.
</i>
<!-- John Burkardt -->
</body>
</html>