(newest first)

  • Kent Peacock | Tue, 13 Mar 2012 05:22:08 UTC

I would suggest that FAST TCP could be a viable approach to solving this control system problem. Unfortunately, I see that the developers of the solution have patented it. That's an almost guaranteed way to prevent wide deployment.
  • mike | Thu, 15 Dec 2011 22:15:33 UTC

Sorry, but complaining about the network conditions is a bit like cell phone network designers complaining about people moving too quickly, buildings going up, and trees growing. You can't shift the buildings and the trees. You need a protocol which can cope.
Cell networks constantly monitor and analyse huge data sets. Many transmission parameters are constantly adjusted accordingly. Cell phones must not only maintain real-time bandwidth and latency sufficient for voice telephony, but they must deal with constantly changing signal conditions, and even more extreme, maintain and switch calls between base stations. And all this is in a small battery-powered unit. Compared to this, TCP's task is trivial.
Endpoint TCPs are closed-loop feedback control systems. The control signal is the RTT, and the controlled parameter is the window size. Feedback always involves a time constant, or period. In this case, there is an inherent time constant associated with a TCP expanding and contracting its window.
TCP window control based on RTT is about 20 years behind the times. And even back then, it was pretty crude. But it was good enough, because the behaviour of the network was simple. The RTT of any particular packet was a good predictor of all packets over a reasonable time period. And change tended not to be periodic. And so a packets-in-flight control system using RTT as a control signal worked well.
Today's networks (for reasons described in the article) exhibit vastly more complex behaviour, with periodicity inherent in every step of the link. So the RTT can vary with multiple periodicities. These can move in and out of phase with the periods of the endpoint TCPs. So the feedback can shift from negative to positive and back, with all values in between. In other words, the control system can become completely unstable. The result is the massive and unpredictable variation in latency and bandwidth observed.
But surely the answer is not to try to control the network conditions; a chain is only as strong as its weakest link, and all that. Cell phone network designers knew they had a hard job to do. And they designed an appropriate protocol.
So surely we need a protocol which can cope. This must, just as a cellular network does, constantly monitor network conditions in detail, and react accordingly. This may best be done in a firmware layer in the network card. Until that's done properly (and that means hard sums, not hacking), the problems will persist, and almost certainly get worse. The internet needs to grow up a little.
  • Eugen Dedu | Wed, 14 Dec 2011 13:38:53 UTC

The advantage of my proposition is that there is no incentive for the sender to choose one or the other; it just depends on what the sender application prefers, i.e. the two queues are more or less equal in terms of service. Is this the case for ToS (at least as it was implemented) too?
  • Alex | Tue, 13 Dec 2011 19:18:25 UTC

    Eugen Dedu> Flows with the bit set have low latencies but smaller throughput ....
These flags have been there from the very beginning -- ToS flags in the IP header. However, at some point a Windows 90-something release came with ToS set to interactive for all traffic and broke everything.
  • Eugen Dedu | Fri, 09 Dec 2011 11:45:18 UTC

What about the following method. Divide the router buffer into two parts, 90% and 10%. Each flow has a bit in the IP header, set by the application. When the bit is 1, the packet is put in the 10% queue, otherwise in the 90% queue. Flows with the bit set have low latencies but smaller throughput, while the others have high latencies but higher throughput.
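A minimal sketch of this two-queue idea. The 90/10 split follows the comment; the capacities, tail-drop policy, and the serve-the-small-queue-first discipline are illustrative assumptions, not part of the proposal:

```python
from collections import deque

class TwoQueueBuffer:
    """Split buffer: a small low-latency queue and a large bulk queue,
    selected by one bit that the application sets in the IP header."""

    def __init__(self, capacity=100):
        self.low_latency = deque(maxlen=capacity // 10)       # the 10% queue
        self.bulk = deque(maxlen=capacity - capacity // 10)   # the 90% queue

    def enqueue(self, packet):
        queue = self.low_latency if packet["low_latency_bit"] else self.bulk
        if len(queue) == queue.maxlen:
            return False  # tail drop: the small queue fills (and drops) sooner
        queue.append(packet)
        return True

    def dequeue(self):
        # Serving the short queue first keeps its waiting time low; its
        # smaller capacity is what costs those flows throughput under load.
        for queue in (self.low_latency, self.bulk):
            if queue:
                return queue.popleft()
        return None
```

Packets with the bit set wait behind at most ~10% of the buffer, which is the low-latency/low-throughput trade the comment describes.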
  • Eugen Dedu | Fri, 09 Dec 2011 11:42:04 UTC

    "... wherein routers along the path drop the path's bit rate into the initial packet if that rate is lower than what's there already. The idea is to tell both ends of the slowest link in the route."
Since links are shared, what a sender needs is not the bandwidth, but the available bandwidth, which depends on the flows traversing the link. As a simple example, for a 10Mb/s link with 10 flows, the router needs to put 1Mb/s in the packet, not 10Mb/s.
  • David | Thu, 08 Dec 2011 00:34:39 UTC

    @ Neal Murphy 
    "As part of my efforts to modernize Smoothwall (search for 'phaeton/roadster'), I've been developing a traffic control app that uses Linux Traffic Control and HTB to smooth out the flow."
    Can I get that app, or something similar, for installation onto an OpenWRT router?
  • Ayie Qeusyirie | Tue, 06 Dec 2011 19:23:51 UTC

  • Lennie | Tue, 06 Dec 2011 00:14:42 UTC

    @Neal Murphy "The idea is to tell both ends of the slowest link in the route." I think you should look up Explicit Congestion Notification
  • Martin Fick | Mon, 05 Dec 2011 23:52:59 UTC

    This is surely a really dumb solution, but I wonder how latency would be affected if all buffers (no matter their size) were turned into LIFOs instead of FIFOs?  
It seems moronic on the surface, but perhaps if you have to wait to transmit, and there is more than one packet in the buffer, you might as well let the latest one through first. For latencies, it might be similar to having a single-packet buffer; perhaps the latencies would stay low. But unlike simply dropping all other packets, it might help avoid drops in the TCP case, which might somehow still allow decent throughput?
Of course, you would get all sorts of packet order inversions and other weird stuff. Perhaps some of the packet order inversions would get reordered properly as they make their way through the system and it wouldn't be too bad.
Perhaps a simple variation on this theme would work better (depending on the medium): make it a LIFO of smaller FIFO buffers (say 10 packets per FIFO)...
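The LIFO-of-FIFOs variation can be sketched in a few lines. The chunk size of 10 comes from the comment; filling and draining the newest chunk concurrently is an assumption about details the comment leaves open:

```python
from collections import deque

class LifoOfFifos:
    """Buffer organized as a LIFO stack of small FIFO chunks: the newest
    chunk is served first, but order is preserved inside each chunk."""

    def __init__(self, chunk_size=10):
        self.chunk_size = chunk_size
        self.stack = []  # list of deques; newest chunk is stack[-1]

    def enqueue(self, packet):
        if not self.stack or len(self.stack[-1]) == self.chunk_size:
            self.stack.append(deque())  # current chunk full: start a new one
        self.stack[-1].append(packet)

    def dequeue(self):
        if not self.stack:
            return None
        packet = self.stack[-1].popleft()  # LIFO across chunks, FIFO within
        if not self.stack[-1]:
            self.stack.pop()               # discard the emptied chunk
        return packet
```

Under light load (everything fits in one chunk) this behaves as a plain FIFO; only when the buffer grows do the newest arrivals jump the queue, which is where the reordering weirdness would show up.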
  • Neal Murphy | Mon, 05 Dec 2011 22:25:45 UTC

    I encountered this with my Comcast connection: a large, long transfer made all other use dicey. As part of my efforts to modernize Smoothwall (search for 'phaeton/roadster'), I've been developing a traffic control app that uses Linux Traffic Control and HTB to smooth out the flow. It does two major things: (1) it limits outbound bit rate to that of the cable link's observed rate (2.8Mb/s in my case), and (2) it reorders packets to force polite bandwidth sharing. It also gives DNS and VoIP slight priority over other traffic, but limited to one packet at a time and to a pretty low bit rate; other streams are allotted a percentage of the bandwidth but are allowed to use 100% in the absence of contention.
    As soon as I activated it, all the dodgy problems went away, like night v. day. The active bit rate climbs to 2.8Mb/s and stays there. I can have a 100MB HTTP upload (from a wireless node) and a 500MB FTP download (to a wired node) going simultaneously and be blissfully unaware of that traffic as I send and receive email and browse the web.
    I've seen it mentioned (possibly in Lucian Gheorghe's "Linux Firewalls and QoS") that it's better for packets to be dropped earlier in the path than later, which tends to make sense. If you have a 92Mb/s link (maximal saturation of a 100Mb link) feeding into a 2.8Mb/s link, it makes more sense to drop the excess packets before they become enqueued upstream.
    I've sometimes wondered if it would be possible to add a bit of functionality to TCP, wherein routers along the path drop the path's bit rate into the initial packet if that rate is lower than what's there already. The idea is to tell both ends of the slowest link in the route. But that would likely be of little value, since we already know where the biggest bottleneck is: between the ISP and the home/SOHO.
    The most workable solution may be similar to what I did: match the outbound bit rate to the slower link, and enforce reasonable bandwidth sharing.
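The core of the approach above (cap the outbound rate just under the slow link's observed rate, so queuing happens where you control it rather than in the modem) is essentially a token bucket. This is a hypothetical sketch, not the actual Smoothwall/HTB configuration; the 2.8Mb/s figure is from the comment, and the burst size is an assumption:

```python
import time

class TokenBucketShaper:
    """Token-bucket rate limiter: tokens (bytes) accrue at the target
    rate; a packet may be sent only if enough tokens are available."""

    def __init__(self, rate_bps=2_800_000, burst_bytes=1514, now=None):
        self.rate = rate_bps / 8.0   # refill rate in bytes per second
        self.burst = burst_bytes     # bucket depth: bounds any burst
        self.tokens = burst_bytes
        self.last = now if now is not None else time.monotonic()

    def try_send(self, size_bytes, now=None):
        now = now if now is not None else time.monotonic()
        # Refill tokens for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True   # transmit now
        return False      # hold the packet; the queue stays local and visible
```

Because packets that exceed the budget wait here instead of in the cable modem's buffer, a local scheduler can still reorder them for fairness and priority, which is what HTB's classes do.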
  • Jim Gettys | Sat, 03 Dec 2011 21:43:28 UTC

The problem is not buffer size per se, but unmanaged buffers. You can't easily predict in advance how much buffering is ever going to be needed, and you want your buffers to normally be running nearly empty, even at a bottleneck (just enough to keep that bottleneck filled).
    So if we had an effective AQM algorithm everywhere, that would occur.  Without it, there is no "right answer" for buffering (except in exceptional cases, seldom if ever encountered).  
  • Lincoln Yeoh | Sat, 03 Dec 2011 19:38:56 UTC

I haven't checked in recent times, but it used to be that getting a good timing source for x86 servers/desktops wasn't so simple. The CPU hardware people said don't use the TSC (with good reasons), but the other time sources were slow to read from, not always available, or had some other problems. With billions of transistors now, you should be able to get decent timing (and locking/serialization too, for that matter) somewhere, right? For a typical cheap embedded Linux network device, it might be worse though.
    Anyway, once you take packet age into account, it doesn't matter even if your buffer is infinite in size. Someone smarter and more knowledgeable than me in this field can go figure out if it's really better to do things this way. I suspect it'll be better, but I'm no expert in this or any specific field :).
  • Richard Neill | Sat, 03 Dec 2011 16:09:48 UTC

    There is an additional problem with WiFi networks, which can have the opposite problem. Instead of a cat5 cable (where what comes out is essentially a perfectly recovered copy of what goes in), the radio environment is noisy. A wireless link can sometimes be congested (and this is usually well handled by the AP), but is often just suffering from RF interference or a weak signal. 
In this case, perhaps only 50% of the packets get through. The sender wrongly infers [because it is tuned for Cat5, where the only source of lost packets is drops from congestion] that the network is congested, and backs off the transmission rate, slowing the link altogether. This is exactly the wrong thing to do.
    As I understand it, the solution for the weak-signal zone is to reduce the TCP retransmit timeout, and to do forward error-correction (i.e. anticipate many packet drops, and send every packet twice).
[For example, a lightly loaded wifi link in the middle of a field, no other wireless devices in range, but we are near the extreme edge of the signal strength. We might have a nominal 1Mbit/s of bandwidth, and only be using it for 10 pings a second, yet suffer 20% packet loss. This is especially severe for UDP, as used by DHCP for example. You can easily observe this by trying to connect to, and then use, a wifi network near the edge of its range. At certain distances, it's impossible to connect to the network (because too many UDP packets are dropped), but if you are already on the network, web browsing remains just about usable (because TCP does retransmit).]
  • Dave Täht | Sat, 03 Dec 2011 16:07:08 UTC

    The concept of relying far more heavily on "Time in Queue", and expiring old packets, has great potential. 
    As you point out, monotonic time is highly desirable for such a packet scheduler. I don't feel it needs extremely high clock resolutions, as we are talking about delays that in many cases are equivalent to a detour between your couch and the moon. Getting that down to mere ms per hop would be a marked improvement.
However, as wireless is one of the biggest sources of wildly variable latency, and all hosts that do wireless are effectively routers, the domains where such a 'TiQ' scheduler would need to be applied extend out to home gateways, PCs (Windows, Mac, Linux, etc.), handhelds, 3G and 4G devices, and so on. (And then there is the question of 'how old is too old'.)
    Other forms of shared edge network devices (cable to a reasonable extent, fiber far less so) have similar issues with buffering and latency that this approach would help also.
    Getting a fix like this out to all these devices is not within the budget of a Cisco or a Juniper (certainly not mine!) - but getting the idea out and code published proving it could help a lot across the industry.
    Even in a post-TiQ universe, the application of things like fair queuing and AQM on and around it is highly desirable, in fact, would make such a scheduler work much better.
  • Lincoln Yeoh | Sat, 03 Dec 2011 14:09:38 UTC

In my opinion, the actual solution to latency is not a reduction in buffer sizes, because the real problem isn't actually large buffers. The problem is devices holding on to packets longer than they should. Given the wide variations in bandwidth, it is easier to define "too high a delay" than it is to define "too large a buffer".
So the real solution would be for routers (and other similar devices) to drop, and not forward, packets that are older than X milliseconds (where X could be 1, 5, 50 or 100, depending on the desired maximum hop latency and the output bandwidths). X would be measured from the time the packet arrives at that router. Routers may have different values of X for different output paths/ports or even per TCP connection (more expensive computationally), or a single hop-wide value (cheaper to calculate).
To do such a thing may require better and more efficient timing systems than some devices have. But Cisco and Juniper charge big bucks, right? By the way, you may wish to ensure the timekeeping you use for this is always incrementing (monotonic) and is not affected by system clock changes (whether backwards or forwards) ;).
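The drop-if-older-than-X rule above can be sketched with a monotonic clock, as suggested. The 50ms default is just one of the example values of X from the comment; checking age at dequeue time (rather than with a background sweep) is an implementation assumption:

```python
import time
from collections import deque

class AgeLimitedQueue:
    """FIFO that stamps each packet with a monotonic arrival time and
    discards, instead of forwarding, anything older than max_age_ms."""

    def __init__(self, max_age_ms=50):
        self.max_age = max_age_ms / 1000.0  # seconds
        self.queue = deque()

    def enqueue(self, packet, now=None):
        now = now if now is not None else time.monotonic()
        self.queue.append((now, packet))  # X is measured from arrival here

    def dequeue(self, now=None):
        now = now if now is not None else time.monotonic()
        while self.queue:
            arrived, packet = self.queue.popleft()
            if now - arrived <= self.max_age:
                return packet
            # Packet exceeded its age budget: drop it and try the next one.
        return None
```

Note that `time.monotonic()` satisfies the always-incrementing requirement: unlike the wall clock, it is unaffected by NTP steps or manual clock changes. With this rule, the physical buffer can be as large as you like; per-hop delay is still bounded by X.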
  • Dave Täht | Sat, 03 Dec 2011 10:18:28 UTC

The publication deadline for this missed the adoption of Byte Queue Limits (BQL) into the net-next tree of the Linux kernel. This holds promise to improve latency inside the driver portion of that stack by an order of magnitude.
    For more details, see:
    And the kernel developers are busily adding further driver support for it and fixing related bugs, including some found in the RED implementation.
It's my hope that similar techniques will be adopted by other operating systems. I note that BQL takes out one order of magnitude of the problem... there are at least three more orders of magnitude (and the 10GigE folk are busily adding a fourth) that need to be addressed and managed properly. Worse, superpackets generated by the TSO and GSO offload mechanisms are increasingly escaping the datacenter and being injected into the rest of the internet from more and more sources.
    It's an uphill battle, all the way, that needs to be defeated in detail.
    Seeing what GSO and TSO can do to a slow link, in particular, scares me.