
Opened Apr 29, 2016 by André Anjos (@andre.anjos)

Incomplete list of jobs leads to jobs that do not get submitted

Created by: sc-jumper

I have a program that submits a number of jobs via `JobManagerSGE.submit`.

Each job is relatively small, but each one requires slightly different parameters. Rather than create an array job, I submit hundreds of smaller jobs. Of the 300 jobs I submit, only about 295 complete.

The standard output shows:

```
'jman'> | all.q : failure (70) -- ... ' was not executed successfully (maybe a time-out happened). Please check the log files
```

I checked the logs for the failed job and found this traceback:

```
  File "/opt/gridengine/ots/spool/chl-compute1237-ib0/job_scripts/6965", line 29, in <module>
    sys.exit(gridtk.script.jman.main())
  File ".../gridtk/gridtk/script/jman.py", line 381, in main
    args.func(args)
  File ".../gridtk/gridtk/script/jman.py", line 224, in run_job
    jm.run_job(job_id, array_id)
  File ".../gridtk/gridtk/sge.py", line 179, in run_job
    raise ValueError("Could not find job id '%d' in the database'" % job_id)
ValueError: Could not find job id '12345' in the database'
```

After receiving the above error, I queried the SQLite database directly, and job id 12345 is there.
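The report does not establish why the job could not find a row that was present when queried later. One SQLite behavior consistent with that symptom (an assumption, not something confirmed in this issue) is transaction isolation: a row inserted on one connection is invisible to other connections until the inserting transaction commits. The sketch below demonstrates only that SQLite behavior with the standard `sqlite3` module; the `job` table and its columns are hypothetical and not gridtk's actual schema.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema for illustration only; gridtk's real tables may differ.
SCHEMA = "CREATE TABLE job (id INTEGER PRIMARY KEY, command TEXT)"


def demo_uncommitted_visibility():
    """Return (before, after): what a second connection sees for job 12345
    before and after the first connection commits its INSERT."""
    fd, path = tempfile.mkstemp(suffix=".sqlite3")
    os.close(fd)
    try:
        writer = sqlite3.connect(path)
        writer.execute(SCHEMA)
        writer.commit()

        reader = sqlite3.connect(path)

        # The INSERT implicitly opens a transaction on `writer`; it is not committed yet.
        writer.execute("INSERT INTO job (id, command) VALUES (?, ?)", (12345, "echo hi"))

        cur = reader.execute("SELECT id FROM job WHERE id = ?", (12345,))
        before = cur.fetchone()  # the uncommitted row is not visible here
        cur.close()              # release the reader's shared lock

        writer.commit()

        cur = reader.execute("SELECT id FROM job WHERE id = ?", (12345,))
        after = cur.fetchone()   # now the committed row is visible
        cur.close()

        reader.close()
        writer.close()
        return before, after
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo_uncommitted_visibility())
```

If this is the cause, committing the new job row before calling `qsub` would close the window that the one-second sleep only narrows; whether gridtk's `submit` path already does so is not established by this report.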

I added a sleep immediately after the `job = add_job(self.session...` line in `gridtk/gridtk/sge.py`:

```python
import time
time.sleep(1)
```

This fixed the problem: with the sleep in place, every job is submitted (slowly) and every job completes. I could not figure out why the job id was not written to the database "in time" for the job launched by `qsub` to find it, nor whether my sleep hack affects anything else.
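A fixed one-second sleep slows down every submission, even when the job row is already available. A possible alternative, assuming the job id does eventually become visible to the running job, is to retry the lookup on the job side with a short, growing delay before raising. `find_with_retry` is a hypothetical helper, not part of gridtk; `lookup` stands in for whatever database query `run_job` performs and should return `None` when the job is not found.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def find_with_retry(lookup: Callable[[], Optional[T]], attempts: int = 5,
                    delay: float = 0.5, backoff: float = 2.0) -> T:
    """Call `lookup` until it returns something other than None.

    Sleeps `delay` seconds after the first miss, multiplying the wait by
    `backoff` after each further miss, and raises ValueError if all
    `attempts` calls return None.
    """
    wait = delay
    for attempt in range(attempts):
        result = lookup()
        if result is not None:
            return result
        if attempt < attempts - 1:  # no need to sleep after the final attempt
            time.sleep(wait)
            wait *= backoff
    raise ValueError("lookup still returned None after %d attempts" % attempts)
```

With the defaults, the worst case waits 0.5 + 1 + 2 + 4 = 7.5 seconds before giving up, and only for a job whose row is actually late; jobs whose rows are already visible pay no delay at all.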

Reference: bob/gridtk#6