Saturday, December 18, 2010

Automatically Retry Failed Jobs in Quartz

How do you handle job failures in Quartz? There are a few things you can do:
  • Do nothing. Let the job fail and log the error.
  • Retry continuously until the job succeeds.
  • Retry n times and then disable the job.
In this post, I will describe how you can configure your jobs to be retried on failure.

Retrying continuously until success:
If you want to keep trying over and over again until the job succeeds, all you have to do is throw a JobExecutionException with a flag to tell the scheduler to fire it again when it fails. The following code shows how:

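import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;

public class MyJob implements Job {

  public MyJob() {
  }

  public void execute(JobExecutionContext context)
                  throws JobExecutionException {
    try{
        //do something
    }
    catch(Exception e){
        JobExecutionException e2 = new JobExecutionException(e);

        try{
            Thread.sleep(10000); //sleep for 10 secs before retrying
        }
        catch(InterruptedException ie){
            //interrupted (e.g. on shutdown): restore the flag, don't refire
            Thread.currentThread().interrupt();
            throw e2;
        }

        //tell the scheduler to fire the job again right away
        e2.setRefireImmediately(true);
        throw e2;
    }
  }
}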

Retrying n times:
It gets a bit more complicated if you only want to retry a certain number of times. You have to use a StatefulJob and hold a retry counter in its JobDataMap (stored under the key "count" in the code that follows), which you increment each time the job fails. Once the counter reaches the maximum, you can unschedule the job so that it does not run again.

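import org.quartz.JobDataMap;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.StatefulJob;

public class MyJob implements StatefulJob {

  public MyJob() {
  }

  public void execute(JobExecutionContext context)
                                 throws JobExecutionException {
    JobDataMap dataMap = context.getJobDetail().getJobDataMap();
    //the counter is absent on the first run, so default to 0
    int count = dataMap.containsKey("count") ? dataMap.getIntValue("count") : 0;

    // give up once 5 attempts have failed
    if(count >= 5){
        JobExecutionException e = new JobExecutionException("Retries exceeded");
        //unschedule it so that it doesn't run again
        e.setUnscheduleAllTriggers(true);
        throw e;
    }

    try{
        //do something

        //reset counter back to 0
        dataMap.putAsString("count", 0);
    }
    catch(Exception e){
        //record the failure so the next execution sees the new count
        count++;
        dataMap.putAsString("count", count);
        JobExecutionException e2 = new JobExecutionException(e);

        try{
            Thread.sleep(10000); //sleep for 10 secs
        }
        catch(InterruptedException ie){
            //interrupted (e.g. on shutdown): restore the flag, don't refire
            Thread.currentThread().interrupt();
            throw e2;
        }

        //fire it again
        e2.setRefireImmediately(true);
        throw e2;
    }
  }
}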
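
One thing to keep in mind: both examples above call Thread.sleep() inside execute(), which keeps a scheduler worker thread busy for the whole delay. As a rough sketch of the non-blocking alternative, here is the same bounded-retry idea written against java.util.concurrent's ScheduledExecutorService rather than Quartz. DelayedRetry and its run method are hypothetical names made up for this illustration and are not part of the Quartz API. In Quartz itself you would presumably get the same effect by scheduling a one-off trigger with a later start time.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of Quartz): retries a task after a delay without sleeping. */
final class DelayedRetry {

    private DelayedRetry() {
    }

    /**
     * Runs the task on the executor. On failure, schedules another attempt
     * after delayMillis. Gives up after maxRetries retries and completes the
     * returned future with the last exception.
     */
    static <T> CompletableFuture<T> run(ScheduledExecutorService executor,
                                        Callable<T> task,
                                        int maxRetries,
                                        long delayMillis) {
        CompletableFuture<T> result = new CompletableFuture<>();
        executor.execute(() -> attempt(executor, task, maxRetries, delayMillis, 0, result));
        return result;
    }

    private static <T> void attempt(ScheduledExecutorService executor,
                                    Callable<T> task,
                                    int maxRetries,
                                    long delayMillis,
                                    int retriesSoFar,
                                    CompletableFuture<T> result) {
        try {
            result.complete(task.call());
        } catch (Exception e) {
            if (retriesSoFar >= maxRetries) {
                // Retries exhausted: report the last failure to the caller.
                result.completeExceptionally(e);
                return;
            }
            // Hand the retry back to the executor; no thread is blocked during the delay.
            executor.schedule(
                    () -> attempt(executor, task, maxRetries, delayMillis, retriesSoFar + 1, result),
                    delayMillis, TimeUnit.MILLISECONDS);
        }
    }
}
```

For example, DelayedRetry.run(executor, task, 5, 10000) would make at most six attempts in total (the first run plus five retries), ten seconds apart, and the returned future completes with either the task's result or the last exception.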

3 comments:

  1. Anonymous, 4:52 PM

    e2.refireImmediately();
    should be
    e2.setRefireImmediately(true);

  2. Anonymous, 11:03 AM

    Thanks for the idea, but this solution can be quite dangerous if you have lots of jobs that can run into such a condition, as you block a thread for each job with your Thread.sleep().

    This may clog up the Scheduler and hence lead to misfires for other jobs.

    That's why I opted for a delayed reschedule via the scheduler itself, which frees up the Scheduler's thread immediately and cannot lead as easily to thread pool congestion.

  3. Hi,
    I have a job that runs every 15 minutes. If it fails at 10:15 am, does it run again at 10:30 am? I am just catching the exception, not re-firing it.

    Does a failed job automatically run at its next interval?
