I use mod_jk 1.2.15 in a failover configuration with session stickiness:

# List of available workers
worker.list=failover

# Master worker
# Take care that the jvmRoute attribute in the Engine tag is set to master
# for the Tomcat addressed by MASTER_HOST and MASTER_PORT
worker.master.port=MASTER_PORT
worker.master.host=MASTER_HOST
worker.master.type=ajp13
worker.master.cachesize=10
worker.master.cache_timeout=600
worker.master.socket_keepalive=1
worker.master.prepost_timeout=300
worker.master.reply_timeout=120000
worker.master.recovery_options=3
# redirect to backup if master fails
worker.master.redirect=backup

# Backup worker for failover
# Take care that the jvmRoute attribute in the Engine tag is set to backup
# for the Tomcat addressed by BACKUP_HOST and BACKUP_PORT
worker.backup.port=BACKUP_PORT
worker.backup.host=BACKUP_HOST
worker.backup.type=ajp13
worker.backup.cachesize=10
worker.backup.cache_timeout=600
worker.backup.socket_keepalive=1
worker.backup.prepost_timeout=300
worker.backup.reply_timeout=120000
worker.backup.recovery_options=3
# Set worker to disabled. This means it gets only requests in the case that
# - The session route points to this worker
# - In the failover case (see redirect setting for master above)
worker.backup.disabled=1

# Failover worker
worker.failover.type=lb
worker.failover.balanced_workers=master, backup

Once I have a session from the backup worker, the session stays on this disabled worker, which is correct and expected. But if the backup server goes into the error state, it never recovers from it, because disabled workers are not retried. This is bad when the disabled worker was chosen because of session stickiness. The attached patch fixes this.
Created attachment 17807: Patch against 1.2.15
Right, it makes sense to retry the disabled worker as well. Try changing the macro

#define JK_WORKER_IN_ERROR(w) ((w)->in_error_state && !(w)->is_disabled && !(w)->is_busy)

to:

#define JK_WORKER_IN_ERROR(w) ((w)->in_error_state && !(w)->is_busy)

Your patch only addresses the byreq lb methods, while the others should be treated in the same way.
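To make the effect of the macro change concrete, here is a minimal sketch. It is not the mod_jk source: the struct `worker_flags_t`, the `_OLD`/`_NEW` macro suffixes, and the two helper functions are hypothetical names introduced only for illustration, and the struct models nothing but the three flags the macro reads.

```c
#include <stdbool.h>

/* Hypothetical, reduced stand-in for a mod_jk worker record.
 * Only the three flags used by JK_WORKER_IN_ERROR are modelled. */
typedef struct {
    bool in_error_state;
    bool is_disabled;
    bool is_busy;
} worker_flags_t;

/* Check as quoted in this report: a disabled worker never counts as
 * "in error", so it is never picked up for a retry. */
#define JK_WORKER_IN_ERROR_OLD(w) \
    ((w)->in_error_state && !(w)->is_disabled && !(w)->is_busy)

/* Proposed check: the disabled flag no longer hides the error state. */
#define JK_WORKER_IN_ERROR_NEW(w) \
    ((w)->in_error_state && !(w)->is_busy)

/* Returns 1 if the worker would be treated as an errored worker
 * (and so be eligible for recovery) under the old check. */
int retry_candidate_old(const worker_flags_t *w)
{
    return JK_WORKER_IN_ERROR_OLD(w) ? 1 : 0;
}

/* Same question under the proposed check. */
int retry_candidate_new(const worker_flags_t *w)
{
    return JK_WORKER_IN_ERROR_NEW(w) ? 1 : 0;
}
```

For a worker that is disabled and in the error state (the backup worker in the configuration above), the old check evaluates to false and the new one to true. Workers that are not disabled, or that are busy, get the same result from both checks.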
Ok. I just wasn't sure if adjusting JK_WORKER_IN_ERROR was the right thing to do, so I limited the change to find_bysession_route. Do we really care about disabled workers in find_best_byrequests, find_best_bytraffic and get_most_suitable_worker (here only the one worker case)? I don't think so.
Right, we don't care about disabled workers in the single-worker case, because a disabled single worker is an oxymoron. Anyhow, adjusting JK_WORKER_IN_ERROR should do the trick. I really cannot remember why I put that check there in the first place.
Fixed in SVN. Thanks for spotting that.