Ran into an odd scenario with RabbitMQ recently. Messages were not being published to queues, yet no errors were reported. Queues were recreated once deleted, so the broker was clearly still hearing from our application; the messages just weren't making it into the queues.
There were a couple of indications in the web panel that something was amiss:
- Disk space was highlighted red (showed ~800MB free),
- All connections were marked as blocking.
I don't have much experience with RabbitMQ, so I misinterpreted these as:
- Disk space: probably a warning - it surely can't need nearly 1GB to operate,
- Blocking connections: I assumed this meant that the threads were waiting for messages - blocking in the threading context.
Turns out those connections really were blocking new messages, and the disk space "warning" was the cause. RabbitMQ monitors free memory and free disk space so it can cope with an influx of messages. When either crosses its configured limit (vm_memory_high_watermark for memory, disk_free_limit for disk), it raises an alarm and blocks the connections that publish messages, a form of flow control. Consumers keep working, but publishers stall until the alarm clears.
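To make the rule concrete, here is a simplified model of the decision. It is a sketch of the behaviour described above, not RabbitMQ's actual implementation: the field names mirror the disk_free and disk_free_limit values shown in the management UI, while `mem_limit` is a hypothetical absolute value standing in for whatever vm_memory_high_watermark resolves to. The 1GB disk limit in the usage lines is also a hypothetical value chosen for illustration, not necessarily what our broker was configured with.

```python
from dataclasses import dataclass

MB = 1024 ** 2
GB = 1024 ** 3


@dataclass
class NodeResources:
    """Snapshot of a broker node's resources, all values in bytes."""
    disk_free: int
    disk_free_limit: int
    mem_used: int
    mem_limit: int  # hypothetical: absolute limit derived from vm_memory_high_watermark


def active_alarms(node: NodeResources) -> list[str]:
    """Return the resource alarms this simplified model would raise."""
    alarms = []
    # Disk alarm: free space has dropped below the configured limit
    if node.disk_free < node.disk_free_limit:
        alarms.append("disk")
    # Memory alarm: usage has climbed above the high watermark
    if node.mem_used > node.mem_limit:
        alarms.append("memory")
    return alarms


def publishing_allowed(node: NodeResources) -> bool:
    """While any alarm is active, publishing connections are blocked."""
    return not active_alarms(node)


# Roughly our situation: ~800MB free against a (hypothetical) 1GB limit
node = NodeResources(disk_free=800 * MB, disk_free_limit=1 * GB,
                     mem_used=200 * MB, mem_limit=1600 * MB)
print(active_alarms(node))       # ['disk']
print(publishing_allowed(node))  # False
```

Memory was nowhere near its limit here; the disk check alone was enough to stop every publisher.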
The worst part was that nothing in the log files indicated a problem; a single "free disk space is too low" line would have been enough. The AMQP library we use should also have detected the blocked connection (RabbitMQ can send a connection.blocked notification to clients that support it), and I'm looking into whether it checks for it.
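What I'd like from the client side is roughly the following: a minimal sketch, assuming a library that lets you register callbacks for connection.blocked and connection.unblocked. `BlockAwarePublisher`, its `send` argument and the "low on disk" reason string are all hypothetical and only for illustration. The point is to log a warning and fail loudly instead of letting publishes stall silently.

```python
import logging

logger = logging.getLogger("publisher")


class BlockAwarePublisher:
    """Hypothetical wrapper that makes a blocked broker connection visible.

    on_blocked/on_unblocked are meant to be wired up to the client
    library's connection.blocked / connection.unblocked callbacks.
    """

    def __init__(self, send):
        self._send = send  # function that actually publishes one message
        self.blocked = False
        self.block_reason = None

    def on_blocked(self, reason: str) -> None:
        self.blocked = True
        self.block_reason = reason
        # Even without logging configured, Python's last-resort handler
        # prints warnings to stderr, so this cannot go unnoticed.
        logger.warning("Broker blocked this connection: %s", reason)

    def on_unblocked(self) -> None:
        self.blocked = False
        self.block_reason = None
        logger.info("Broker unblocked this connection")

    def publish(self, message: bytes) -> None:
        if self.blocked:
            # Fail loudly rather than letting the publish hang silently
            raise RuntimeError(f"connection blocked by broker: {self.block_reason}")
        self._send(message)


# Usage with a list standing in for the real broker
sent = []
pub = BlockAwarePublisher(sent.append)
pub.publish(b"hello")
pub.on_blocked("low on disk")
try:
    pub.publish(b"world")
except RuntimeError as exc:
    print(exc)  # connection blocked by broker: low on disk
pub.on_unblocked()
pub.publish(b"world")
print(sent)  # [b'hello', b'world']
```

In pika (Python), for example, the connection exposes add_on_connection_blocked_callback and add_on_connection_unblocked_callback; their callbacks receive the connection and a method frame rather than a plain string, so a thin adapter would be needed.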
My main takeaway from this is to read up a bit more on the software we're running.