  • #6
Closed
Open
Created Apr 29, 2016 by André Anjos@andre.anjos💬Owner

Incomplete list of jobs leads to jobs that do not get submitted

Created by: sc-jumper

I have a program that submits a number of jobs via `JobManagerSGE.submit`.

Each job is relatively small, but each one requires slightly different parameters. Rather than creating an array job, I submit hundreds of smaller jobs individually. Of the 300 jobs I submit, only about 295 complete.

The standard output shows:

```
... 'jman'> | all.q : failure (70) -- ... ' was not executed successfully (maybe a time-out happened). Please check the log files
```

I checked the log of a failed job and found:

```
  File "/opt/gridengine/ots/spool/chl-compute1237-ib0/job_scripts/6965", line 29, in <module>
    sys.exit(gridtk.script.jman.main())
  File ".../gridtk/gridtk/script/jman.py", line 381, in main
    args.func(args)
  File ".../gridtk/gridtk/script/jman.py", line 224, in run_job
    jm.run_job(job_id, array_id)
  File ".../gridtk/gridtk/sge.py", line 179, in run_job
    raise ValueError("Could not find job id '%d' in the database'" % job_id)
ValueError: Could not find job id '12345' in the database'
```

After receiving the above error, I queried the SQLite database directly, and job id 12345 is there.
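One hypothesis worth ruling out (the report does not confirm it) is that the job started by qsub looks up its row before the submitting process has committed it. SQLite never shows a connection's uncommitted writes to other connections, so such a lookup fails even though a later manual query finds the row. The sketch below reproduces that behavior with Python's standard `sqlite3` module; the `job` table, its columns, and the file name are hypothetical stand-ins, not gridtk's actual schema.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "jobs.sql3")  # hypothetical file name

    # "Submitting" side: a toy job table (hypothetical schema, not gridtk's).
    writer = sqlite3.connect(db_path)
    writer.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, command TEXT)")
    writer.commit()

    # Insert the job; Python's sqlite3 implicitly opens a transaction here,
    # and we deliberately do not commit yet.
    writer.execute("INSERT INTO job (id, command) VALUES (?, ?)", (12345, "echo hi"))

    # "Job" side: a separate connection, standing in for the process qsub starts.
    reader = sqlite3.connect(db_path)

    def lookup(job_id):
        cur = reader.execute("SELECT id FROM job WHERE id = ?", (job_id,))
        row = cur.fetchone()
        cur.close()  # release the read lock so the writer is able to commit
        return row

    before = lookup(12345)  # uncommitted row is invisible to this connection
    writer.commit()
    after = lookup(12345)   # visible once the writer has committed

    reader.close()
    writer.close()

print("before commit:", before)  # before commit: None
print("after commit:", after)    # after commit: (12345,)
```

If the real submission path only commits after qsub has been called, this is exactly the failure the log shows; if it commits first, the cause lies elsewhere.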

I added a sleep immediately after the `job = add_job(self.session...` line in `gridtk/gridtk/sge.py`:

```python
import time
time.sleep(1)
```

This corrected the issue: with the sleep in place, every job gets submitted (slowly) and every job completes. I was not able to figure out why the job id was not yet in the database "in time" for the job launched by qsub, or whether my sleep hack affects anything else.
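Rather than a fixed one-second sleep on the submitting side for every job, a more targeted workaround might be for the job side to retry the lookup for a short while before giving up, so only the rare late jobs pay the delay. This is a minimal sketch under assumptions: `lookup` is a hypothetical callable that returns the job or `None`, standing in for whatever query `run_job` performs, and `find_job_with_retry` and `JobNotFound` are illustrative names, not part of gridtk.

```python
import time


class JobNotFound(Exception):
    """Raised when the job row is still missing after all retries."""


def find_job_with_retry(lookup, job_id, attempts=5, first_delay=0.5):
    """Call lookup(job_id) until it returns a job, backing off between tries.

    `lookup` is a hypothetical callable returning the job or None.
    """
    delay = first_delay
    for attempt in range(attempts):
        job = lookup(job_id)
        if job is not None:
            return job
        if attempt < attempts - 1:
            time.sleep(delay)
            delay *= 2  # exponential backoff: 0.5 s, 1 s, 2 s, ...
    raise JobNotFound("Could not find job id '%d' in the database" % job_id)


# Usage: a fake lookup whose row only "appears" on the third call.
calls = {"n": 0}


def flaky_lookup(job_id):
    calls["n"] += 1
    return {"id": job_id} if calls["n"] >= 3 else None


job = find_job_with_retry(flaky_lookup, 12345, first_delay=0.01)
print(job, calls["n"])  # {'id': 12345} 3
```

With the defaults, a job that is never found waits about 7.5 s in total (0.5 + 1 + 2 + 4) before failing, which is negligible next to SGE scheduling latency.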
