cluster computing - Waiting for already finished jobs


I want to launch a PBS script once some other jobs have completed. To do that, I use these commands:

$ job1=$(qsub job1.pbs)
$ jobn=$(qsub jobn.pbs)
$ qsub -W depend=afterok:$job1:$jobn join.pbs
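With more than a couple of jobs, the dependency string for `qsub -W` can be assembled in a loop instead of typed by hand. This is only a minimal sketch: `build_depend` is a hypothetical helper name, not part of PBS, and the job file names are the placeholders from the commands above.

```shell
#!/bin/bash
# Hypothetical helper (not part of PBS): join the given job ids into a
# "depend=afterok:id1:id2:..." string suitable for "qsub -W".
build_depend() {
    local dep="depend=afterok"
    local id
    for id in "$@"; do
        dep="${dep}:${id}"
    done
    printf '%s\n' "$dep"
}

# Usage sketch (commented out because it needs a PBS server):
# ids=()
# for script in job1.pbs jobn.pbs; do
#     ids+=("$(qsub "$script")")
# done
# qsub -W "$(build_depend "${ids[@]}")" join.pbs
```

The helper only formats the string; it does not check whether the server still knows the jobs, so the held-job problem described here is unchanged.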

This works in most cases. However, if I submit the joining script when job1 and jobn have already finished, it goes idle indefinitely, waiting for already-finished jobs to finish. It sounds insane, but it happens. If I run qstat, I can see that the joining job is being held ('H'):

$ qstat -u me
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1990613         me       workq    join.pbs       --   1   1     --    -- H    --

However, if at least one of the jobs is still running while the other has finished, the joining script does not go idle, and it does finish.

So, what are the solutions for dealing with jobs that are already over? I need the joining job to finish.

When the join job starts, the server still needs to know about the depended-upon jobs; if either of them is gone from qstat, you'll need to increase keep_completed in qmgr. Otherwise, when the join job is ready to run, the dependency is never satisfied, and the hold is never released.

To check:
$ qmgr -c 'print server keep_completed'

To add/modify:
$ qmgr -c 'set server keep_completed=300'

(I believe you can also set keep_completed on queues.)
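To check the setting from a script before submitting the join job, something like the following may help. It is a sketch under assumptions: it expects the qmgr print output to contain a line of the form `set server keep_completed = N`, and `parse_keep_completed` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read "qmgr -c 'print server ...'" output on stdin and
# print the keep_completed value, or "unset" if no matching line is found.
# Assumes a line shaped like: set server keep_completed = 300
parse_keep_completed() {
    awk '$1 == "set" && $2 == "server" && $3 ~ /^keep_completed/ {
             sub(/^.*=[ \t]*/, "")   # drop everything up to and including "="
             print; found = 1; exit
         }
         END { if (!found) print "unset" }'
}

# Usage sketch (commented out because it needs a PBS server):
# value=$(qmgr -c 'print server keep_completed' | parse_keep_completed)
# echo "keep_completed: $value"
```

If this prints "unset" or a value shorter than the gap between the last job finishing and the join job being submitted, the finished jobs may already be gone from the server by then.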

