256
Logo

Producer Consumer Thread Race Conditions

Background

Recently, on StackOverFlow.com, a question was asked about why java.util.concurrent.ArrayBlockingQueue use while loops instead of if around calls to await(). The accepted answer was that it was due to spurious wakeups. I strongly disagree with this answer and instead believe that it is due to bad programming around race conditions. This is an all too common problem with producer/consumer models from threaded/reentrant applications. See the wikipedia section on it.

I'm not saying that spurious wakeups aren't an issue and the while protects against them as well, but the incidence rate of them is 3-4 orders of magnitude less than the race conditions described here. As you can see running the following Java code, you can get the race conditions to happen extremely reliably.

The Problem

In producer/comsumer applications you have threads which are producing some items and threads which are consuming the items and working on them. This can be a website with servlet threads queueing up aynchronous database requests, a thread reading lines from a file and queueing up work that needs to be done on each line, etc..

Typically you have some sort of blocking queue so that when a producer has a item, it adds it to the end of the queue, possible blocking if the queue has too many items in it already. The consumer, when it needs a new item, gets from the front of the queue, possibly blocking if the queue is empty. It is the fact that both sides are blocking and need to signal the other side is where the complexity and confusion originate.

In the code listed at the bottom of the page, the consumer locks the lock and tests whether the items list is empty. If it is empty then it waits until a producer signals the hasItems condition. However, when an item is added to the list, and the condition is signaled, the consumer goes to re-lock the lock and may end up behind another consumer waiting for the same lock. That other comsumer will get the lock first and de-queue the item that the first consumer was signaled for. This race to the dequeue is why what we call a race condition and it is the real source of the problem.

The Race

race condition

Here is a more specific enumeration of the race condition. It shows the interaction of 2 consumer threads (getter-A and getter-C) a single producer thread (putter-B).

  1. getter A is waiting for there to be items in the queue (line #31)
  2. putter B locks the lock (line #54)
  3. getter C finishes the last item, calls get(), tries to lock the lock, waits for B to unlock (line #25)
  4. putter B adds an item into the queue and signals that there is an item there (line #72)
  5. getter A is awoken and tries to lock the lock, has to wait for B to unlock (after line #31)
  6. putter B unlocks (line #74)
  7. getter C is ahead of getter A waiting for lock so it is given the lock first (line #25)
  8. getter C dequeues the item that getter A was notified for (!!), and then unlocks (line #48)
  9. getter A is finally able to lock, goes forward and calls remove on an empty list (line #40)

The race is between getter A and C on who is able to get the lock first to dequeue the new item. The race condition would be solved by changing lines 26 and 51 from if statements to while statements. Once getter A gets the lock it would check again to see if items is empty and would go back to waiting if C had already dequeued. The producer would also check again to see if the list is at capacity.

This is the real problem with many producer/comsumer models instead of spurious wakeups which I'm sure do happen but not nearly as frequently. As an interesting historical note, I had submit corrections to the original O'Reilly Pthreads book which suffered from the same problems -- errata had to be issued and a new version published.

Other Reading

Sample Code

Click to download sample code.

Free Spam Protection   Eggnog Recipe   Android ORM   Simple Java Magic   JMX using HTTP