Friday, February 27, 2015

Quartz Jobs lock up application

We have an application that schedules jobs based on a user's response to a question. At peak times we receive many more responses than usual. It is a Java 7 application running under Tomcat, using Apache Wicket, Spring, spring-amqp, Hibernate, Quartz, and c3p0.


The problem is that at peak times, when the application starts running multiple jobs at once, it locks up completely. We ran jstack to get a thread dump while the problem was happening, and all 5 of our Quartz threads were in a BLOCKED state, with these as the last two calls on each stack:



- java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
- com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable(long) @bci=171, line=1315 (Interpreted frame)


Unfortunately, our connection pool had a max pool size of 5, so our entire application, including Tomcat, locked up. We had to kill it from the Windows Task Manager. The application then failed every time we started it back up, because Quartz immediately executed all the misfired jobs, which triggered the problem again. The Quartz best practices documentation says the pool's max connection size should be at least 3 more than the Quartz thread count. I increased the max size to 50, which seems to have got the application back up and running, but I don't know for how long. However, I would expect a shortage of connections to make the application run slowly, not to lock up everything, including Tomcat.


In c3p0 I set debugUnreturnedConnectionStackTraces=true and unreturnedConnectionTimeout=10000, hoping to find where a connection is being checked out and never returned, but I never see anything in the logs.
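
For reference, a minimal sketch of how the c3p0 settings described above might be declared in the same Spring context. It assumes a `ComboPooledDataSource` bean named `dataSource` (matching the `ref` used by the scheduler); the driver class, JDBC URL, user, and password are placeholders for illustration and are not taken from the real application.

```xml
<!-- Hypothetical c3p0 data source. Driver, URL, and credentials are placeholders. -->
<bean id="dataSource" class="com.mchange.v2.c3p0.ComboPooledDataSource" destroy-method="close">
    <property name="driverClass" value="com.microsoft.sqlserver.jdbc.SQLServerDriver" />
    <property name="jdbcUrl" value="jdbc:sqlserver://localhost:1433;databaseName=appdb" />
    <property name="user" value="app_user" />
    <property name="password" value="changeit" />
    <!-- Quartz threadCount is 5, so the guideline gives 5 + 3 = 8 as a minimum; raised from 5 to 50. -->
    <property name="maxPoolSize" value="50" />
    <!-- Log the stack trace of the checkout when a connection is reclaimed as unreturned. -->
    <property name="debugUnreturnedConnectionStackTraces" value="true" />
    <!-- c3p0 documents this value in seconds, so 10000 is roughly 2.8 hours. -->
    <property name="unreturnedConnectionTimeout" value="10000" />
</bean>
```

If the unit really is seconds, as the c3p0 documentation states, a connection would have to stay checked out for hours before anything is logged, which may be why the logs stay empty.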


So my question is, what is happening here? Why is our application locking up when quartz runs all of its threads?


I'm using Quartz 2.2.1.


Here is my Quartz configuration:



<bean id="mainScheduler" class="org.springframework.scheduling.quartz.SchedulerFactoryBean">
    <property name="schedulerName" value="MainScheduler" />
    <property name="overwriteExistingJobs" value="true" />
    <property name="startupDelay" value="50" />
    <property name="dataSource" ref="dataSource" />
    <property name="quartzProperties">
        <props>
            <prop key="org.quartz.scheduler.skipUpdateCheck">true</prop>
            <prop key="org.quartz.jobStore.class">org.quartz.impl.jdbcjobstore.JobStoreTX</prop>
            <prop key="org.quartz.jobStore.driverDelegateClass">org.quartz.impl.jdbcjobstore.MSSQLDelegate</prop>
            <prop key="org.quartz.jobStore.misfireThreshold">60000</prop>
            <prop key="org.quartz.jobStore.selectWithLockSQL">SELECT * FROM {0}LOCKS UPDLOCK WHERE LOCK_NAME = ?</prop>
            <prop key="org.quartz.plugin.triggHistory.class">org.quartz.plugins.history.LoggingTriggerHistoryPlugin</prop>
            <prop key="org.quartz.plugin.triggHistory.triggerFiredMessage">Trigger {1}.{0} fired job {6}.{5} at: {4, date, HH:mm:ss dd/MM/yyyy}</prop>
            <prop key="org.quartz.plugin.triggHistory.triggerCompleteMessage">Trigger {1}.{0} completed firing job {6}.{5} at {4, date, HH:mm:ss dd/MM/yyyy} with resulting trigger instruction code: {9}</prop>
            <prop key="org.quartz.plugin.jobHistory.class">org.quartz.plugins.history.LoggingJobHistoryPlugin</prop>
            <prop key="org.quartz.plugin.jobHistory.jobSuccessMessage">Job {1}.{0} fired at: {2, date, dd/MM/yyyy HH:mm:ss} result=OK</prop>
            <prop key="org.quartz.plugin.jobHistory.jobFailedMessage">Job {1}.{0} fired at: {2, date, dd/MM/yyyy HH:mm:ss} result=ERROR</prop>
            <prop key="org.quartz.threadPool.threadCount">5</prop>
        </props>
    </property>
    <property name="applicationContextSchedulerContextKey">
        <value>applicationContext</value>
    </property>
</bean>
