Celery connection to RabbitMQ errors at start or after running for a few minutes
Reported by: Fernando Gutierrez
When Celery is starting, or after it has been running for a few minutes, I get many "connection reset" errors in the log.
After these errors Celery does not recover: it appears to be running but does not process any tasks.
There are two root causes for this problem:
1) The systemd service file provided as an example in the deployment guide does not specify that Celery must be started after RabbitMQ. The OS is free to start the two services in any order, and when Celery starts before RabbitMQ it cannot connect.
2) In mediagoblin/init/celery/__init__.py, BROKER_HEARTBEAT is set to 1 second. On slower machines or under heavy load, this setting causes missed heartbeats and, eventually, a connection reset.
I think this setting was meant to make task scheduling more granular, but as far as I understand the Celery docs, the heartbeat has nothing to do with task scheduling. CELERYBEAT_SCHEDULE only configures periodic tasks and is unrelated to BROKER_HEARTBEAT.
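For the first root cause (service start order), the fix is to declare the dependency in the Celery unit with `After=` (ordering) and `Requires=` (dependency). The sketch below embeds a hypothetical unit file and uses Python's `configparser` to check that both are present. Assumptions: the RabbitMQ unit is called `rabbitmq-server.service` (the usual package name, but check `systemctl list-units` on your system), and `User=`/`ExecStart=` are placeholders, not copied from the deployment guide. Note that `After=` only waits until RabbitMQ's unit reports that it has started; how soon that means "accepting connections" depends on how the RabbitMQ unit is written.

```python
import configparser

# Hypothetical Celery unit file. The After=/Requires= lines are the point;
# User= and ExecStart= are placeholders, not taken from the deployment guide.
CELERY_UNIT = """\
[Unit]
Description=MediaGoblin Celery worker
# Ordering: do not start until RabbitMQ's unit has finished starting.
After=network.target rabbitmq-server.service
# Dependency: start RabbitMQ too, and stop Celery if RabbitMQ is stopped.
Requires=rabbitmq-server.service

[Service]
User=mediagoblin
ExecStart=/path/to/celery worker

[Install]
WantedBy=multi-user.target
"""


def orders_after(unit_text: str, dependency: str) -> bool:
    """Return True if the [Unit] section both orders this service after
    *dependency* (After=) and pulls it in (Requires= or Wants=)."""
    # interpolation=None: unit files may contain '%' specifiers.
    # strict=False: tolerate repeated keys, but configparser keeps only the
    # last value, whereas systemd merges repeated After= lines.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep systemd's CamelCase key names
    parser.read_string(unit_text)
    if not parser.has_section("Unit"):
        return False
    unit = parser["Unit"]
    after = unit.get("After", "").split()
    pulled_in = unit.get("Requires", "").split() + unit.get("Wants", "").split()
    return dependency in after and dependency in pulled_in


if __name__ == "__main__":
    print(orders_after(CELERY_UNIT, "rabbitmq-server.service"))  # True
```

`Requires=` without `After=` is not enough: systemd would start both units in parallel, which is exactly the race described above.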
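For the second root cause, the following sketch shows why a 1-second heartbeat is fragile and keeps the two settings apart. It assumes the AMQP 0-9-1 rule that a peer may close the connection after seeing no traffic for about two heartbeat intervals (RabbitMQ's own docs describe the window slightly differently, so treat the multiplier as approximate). The settings dict is hypothetical: MediaGoblin's real setup code may build it differently, the heartbeat value of 60 is illustrative only, and the periodic task name is made up.

```python
from datetime import timedelta

# Approximation of the AMQP 0-9-1 rule: no traffic for ~2 intervals -> close.
MISSED_HEARTBEATS_BEFORE_RESET = 2


def max_silence_seconds(heartbeat_seconds: float) -> float:
    """Approximate time the worker may go quiet before the broker drops it."""
    return heartbeat_seconds * MISSED_HEARTBEATS_BEFORE_RESET


def survives_stall(heartbeat_seconds: float, stall_seconds: float) -> bool:
    """True if a stall (e.g. the worker starved of CPU) fits in the window."""
    return stall_seconds < max_silence_seconds(heartbeat_seconds)


# Hypothetical settings dict, not MediaGoblin's actual code.
celery_settings = {
    # Connection liveness only: how often client and broker exchange
    # heartbeat frames. 60 is an illustrative value.
    "BROKER_HEARTBEAT": 60,
    # Periodic tasks run by celery beat; unrelated to the heartbeat above.
    # The task name is hypothetical.
    "CELERYBEAT_SCHEDULE": {
        "example-hourly-cleanup": {
            "task": "example.tasks.cleanup",
            "schedule": timedelta(hours=1),
        },
    },
}

if __name__ == "__main__":
    stall = 3.0  # the worker is busy for three seconds
    print(survives_stall(1, stall))  # False: 3 s exceeds the ~2 s window
    print(survives_stall(celery_settings["BROKER_HEARTBEAT"], stall))  # True
```

Changing the heartbeat leaves `CELERYBEAT_SCHEDULE` untouched, which is the point of the distinction: periodic task timing is driven by celery beat, not by the broker heartbeat.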