Appendix: CoDel pseudocode

Data types

time_t is an integer time value in units convenient for the system. Resolution to at least a millisecond is required, and better resolution is useful up to the minimum possible packet time on the output link; 64- or 32-bit widths are acceptable but with 32 bits the resolution should be no finer than 2-16, so there's enough dynamic range to represent a wide range of queue waiting times. Narrower widths also have implementation issues due to overflow (wrapping) and underflow (limit cycles because of truncation to zero) that are not addressed in this pseudocode. The code presented here uses 0 as a flag value to indicate "no time set."

packet_t* is a pointer to a packet descriptor. We assume it has a tstamp field capable of holding a time_t and that field is available for use by CoDel (it will be set by the enque routine and used by the deque routine).

queue_t is a base class for queue objects (the parent class for codel_queue_t objects). We assume it has enque() and deque() methods that can be implemented in child classes. We assume it has a bytes() method that returns the current queue size in bytes. This can be an approximate value. The method is invoked in the deque() method but shouldn't require a lock with the enque() method.

flag_t is a Boolean.


Per-queue state (codel_queue_t instance variables)

time_t first_above_time; Time when we'll declare we're above target (0 if below)
time_t drop_next; Time to drop next packet
uint32_t count; Packets dropped since going into drop state
flag_t dropping; Equal to 1 if in drop state

Constants

time_t target = MS2TIME(5); Target queue delay (5 ms)
time_t interval = MS2TIME(100); Sliding minimum time window width (100 ms)
u_int maxpacket = 512; Maximum packet size in bytes (should use interface MTU)

Enque uses the generic system enqueue routine after it puts the current time in the packet's tstamp field so that the deque routine can compute the packet's sojourn time.

void codel_queue_t::enque(packet_t* pkt)
{
              pkt->timestamp() = clock();
              queue_t::enque(pkt);
}

Since the degree of multiplexing and nature of the traffic sources is unknown, CoDel acts as a closed-loop servo system that gradually increases the frequency of dropping until the queue is controlled (sojourn time goes below target). This is the control law that governs the servo. It has this form because of the sqrt(p) dependence of TCP throughput on drop probability.1 Note that for embedded systems or kernel implementation the inverse sqrt can be computed efficiently using only integer multiplication.


time_t codel_queue_t::control_law(time_t t)
{
              return t + interval / sqrt(count);
}

This is a helper routine the does the actual packet dequeue and tracks whether the sojourn time is above or below target and, if above, if it has remained above continuously for at least interval. It returns two values, a Boolean indicating whether it is OK to drop (sojourn time above target for at least interval) and the packet dequeued.


typedef struct {
      packet_t* p;
      flag_t ok_to_drop;
} dodeque_result;

dodeque_result codel_queue_t::dodeque(time_t now)
{
      dodeque_result r = { 0, queue::deque() };
      if (r.p == NULL) {
            first_above_time = 0;
      } else {
            time_t sojourn_time = now - r.p->tstamp;
            if (sojourn_time < target || bytes() < maxpacket) {
                  // went below so we'll stay below for at least interval
                  first_above_time = 0;
            } else {
                  if (first_above_time == 0) {
                        // just went above from below. if we stay above
                        // for at least interval we'll say it's ok to drop
                        first_above_time = now + interval;
                  } else if (now >= first_above_time) {
                        r.ok_to_drop = 1;
                  }
            }
      }
      return r;
}

All of the work of CoDel is done here. There are two branches: if we're in packet-dropping state (meaning that the queue-sojourn time has gone above target and hasn't come down yet), then we need to check if it's time to leave or if it's time for the next drop(s); if we're not in dropping state, then we need to decide if it's time to enter and do the initial drop.


packet_t* codel_queue_t::deque()
{
      time_t now = clock();
      dodeque_result r = dodeque();
      if (r.p == NULL) {
            // an empty queue takes us out of dropping state
            dropping = 0;
            return r.p;
      }
      if (dropping) {
            if (! r.ok_to_drop) {
                  // sojourn time below target - leave dropping state
                  dropping = 0;
            } else if (now >= drop_next) {

It's time for the next drop. Drop the current packet and dequeue the next. The dequeue might take us out of dropping state. If not, schedule the next drop. A large backlog might result in drop rates so high that the next drop should happen now; hence, the while loop.


                  while (now >= drop_next && dropping) {
                        drop(r.p);
                        ++count;
                        r = dodeque();
                        if (! r.ok_to_drop)
                              // leave dropping state
                              dropping = 0;
                        else
                              // schedule the next drop.
                              drop_next = control_law(drop_next);
                  }
            }

If we get here, then we're not in dropping state. If the sojourn time has been above target for interval, then we decide whether it's time to enter dropping state. We do so if we've been either in dropping state recently or above target for a relatively long time. The "recently" check helps ensure that when we're successfully controlling the queue we react quickly (in one interval) and start with the drop rate that controlled the queue last time rather than relearn the correct rate from scratch. If we haven't been dropping recently, the "long time above" check adds some hysteresis to the state entry so we don't drop on a slightly bigger-than-normal traffic pulse into an otherwise quiet queue.


      } else if (r.ok_to_drop &&
                         ((now - drop_next < interval) ||
                          (now - first_above_time >= interval))) {
                   drop(r.p);
                   r = dodeque();
                   dropping = 1;

                   // If we're in a drop cycle, the drop rate that controlled the queue
                   // on the last cycle is a good starting point to control it now.
                   if (now - drop_next < interval)
                           count = count>2? count-2 : 1;
                   else
                           count = 1;
                   drop_next = control_law(now);
             }
             return (r.p);
}

Reference

1. Mathis, M., Semke, J., Mahdavi, J. 1997. The macroscopic behavior of the TCP congestion avoidance algorithm. ACM SIGCOMM Computer Communication Review 27(3).