Cluster computing - wait for already finished jobs
I launch a PBS script once other jobs have completed. I use these commands:
$ job1=$(qsub job1.pbs)
$ jobn=$(qsub jobn.pbs)
$ qsub -W depend=afterok:$job1:$jobn join.pbs
This works in most cases. However, if I run the joining script when job1 and jobn have already finished, it goes idle indefinitely, waiting for already-finished jobs to finish. It sounds insane, but it happens. If I run qstat, I can see the joining job being held ('H'):
$ qstat -u me
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1990613         me       workq    join.pbs       --   1   1     --    -- H    --
However, if at least one of the jobs is still running while the other has finished, the joining script does not go idle and finishes normally.
So are there any solutions for dealing with jobs that are already over? I need the join job to finish.
When the join job starts, the server still needs to know about the depended-upon jobs; if either of them is already gone from qstat, you'll need to increase keep_completed in qmgr. Otherwise, when the join job is ready to run, the dependency is never satisfied, and the hold is never released.
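If changing the server configuration is not an option, one possible workaround is to list only those jobs the server still knows about in the dependency. The sketch below is an illustration under assumptions, not a tested recipe: known_jobs, build_depend and submit_join are hypothetical helper names, it assumes that `qstat <job_id>` exits non-zero for a job the server has already purged, and it treats a purged job as having finished successfully, which afterok would normally verify for you.

```shell
#!/usr/bin/env bash
# Sketch only: known_jobs, build_depend and submit_join are hypothetical
# helper names, not part of PBS/Torque.

# Print the job IDs that the server still reports, one per line.
# Assumption: `qstat <job_id>` exits non-zero once the job has been purged.
known_jobs() {
    local id
    for id in "$@"; do
        if qstat "$id" >/dev/null 2>&1; then
            printf '%s\n' "$id"
        fi
    done
}

# Build the argument for `qsub -W`; print nothing if no job IDs are given.
build_depend() {
    [ "$#" -gt 0 ] || return 0
    local IFS=:
    printf 'depend=afterok:%s\n' "$*"
}

# Submit a script that depends only on jobs the server still knows about.
# Assumption: a job that is no longer known is treated as finished OK.
submit_join() {
    local script=$1
    shift
    local ids dep
    mapfile -t ids < <(known_jobs "$@")
    dep=$(build_depend "${ids[@]}")
    if [ -n "$dep" ]; then
        qsub -W "$dep" "$script"
    else
        qsub "$script"
    fi
}
```

With these helpers sourced, the last command from the question would become `submit_join join.pbs "$job1" "$jobn"`. There is still a small window between the qstat check and the qsub call in which a job can be purged, so raising keep_completed remains the more reliable fix.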
To check the current value of keep_completed:
$ qmgr -c 'print server keep_completed'
To add or modify it:
$ qmgr -c 'set server keep_completed=300'
(I believe you can also set keep_completed on queues.)
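A submission script could check the setting before relying on job dependencies. The following is a minimal sketch, assuming that qmgr prints the attribute as a line of the form `set server keep_completed = 300`; parse_keep_completed and check_keep_completed are hypothetical helper names, and the exact output format may differ between PBS/Torque versions.

```shell
#!/usr/bin/env bash
# Sketch only: parse_keep_completed and check_keep_completed are
# hypothetical helper names, not part of PBS/Torque.

# Read qmgr output on stdin and print the server-level keep_completed value.
# Assumption: the value appears on a line such as
#   set server keep_completed = 300
parse_keep_completed() {
    awk '/^set server keep_completed/ { print $NF; exit }'
}

# Warn (and return non-zero) if keep_completed is unset or zero.
check_keep_completed() {
    local value
    value=$(qmgr -c 'print server keep_completed' | parse_keep_completed)
    if [ -z "$value" ] || [ "$value" = "0" ]; then
        echo "keep_completed is unset or 0; finished jobs may vanish before dependencies are evaluated" >&2
        return 1
    fi
    echo "keep_completed=$value"
}
```

Calling check_keep_completed before the qsub commands gives an early warning instead of a join job that sits in the held state.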