
2008: TCP Issues in the Data Center


Presentation for the Stanford seminar "The Future of TCP: Train-Wreck or Evolution?"
Raises the issue of bufferbloat in servers (before the term "bufferbloat" was coined)
By Tom Lyon of Nuova Systems

Tom Lyon

April 01, 2008

Transcript

  1. 03/12/08 Nuova Systems Inc. Page 1
     TCP Issues in the Data Center
     Tom Lyon
     The Future of TCP: Train-wreck or Evolution?
     Stanford University, 2008-04-01
  2. TCP: Not Just for “The Internet”
     • Essentially all network software relies on TCP/IP semantics
     • “The network is the data center”
     • In the data center, gigabits are “free”
       • 10^5 times cheaper than WAN bandwidth
       • Terabit-class switches
       • 10Gb endpoints
     • TCP needs:
       • High bandwidth
       • Low latency
       • Predictability & fairness
  3. Storage Networks
     • Storage access slowly evolving from hardware bus to open network
     • NAS vs SAN
     • NFS & CIFS vs SCSI's many flavors
     • Ethernet vs Fibre Channel vs Infiniband
  4. Storage Networks: Ethernet vs EtherNot
     • Ethernet: iSCSI, NFS, CIFS
       • TCP & Ethernet
       • Congestion loss
       • Stream oriented
       • Software transport
       • High CPU overhead
     • EtherNot: SCSI-FCP, SCSI-SRP
       • F.C. and Infiniband
       • Credit flow control
       • Block oriented
       • Hardware transport
       • Low CPU overhead
  5. Storage Networks: Convergence
     • Data Center Ethernet
     • Choice of congestion classes
       • Lossy vs lossless
     • Choice of storage transports
       • TCP or F.C. (FCoE)
     • Choice of hardware or software transport
       • TOE w/ TCP, software FCoE, ...
  6. TCP: Time Out of Joint
     • TCP was standardized in a much slower world
       • ½-second minimum retransmit timeout
       • 20-microsecond RTT achievable today!
     • Fast-retransmit algorithm only works for streams – more data being sent
     • Most data center traffic is request/response – often single packets
     • Packet loss hurts because TCP won't (not can't) respond fast enough
  7. Congestion in the Data Center
     • Gigantic, non-blocking switches are the norm
       • Hundreds of ports, terabits of throughput
     • Buffers and buffer management are the most costly part of the switch
     • Link-based flow control (“pause”) allows a switch to push congestion back to its upstream neighbors
     • If the upstream neighbor is the source server, then the congestion “goes away”
       • Or does it?
  8. Servers and Gigabits
     • Any current x86 server can easily saturate a 1Gb Ethernet link with TCP traffic
     • Many current servers can saturate 10Gb Ethernet links!
     • Lossless classes cause the pipe to fill faster
     • What happens when the first hop, the server's own Ethernet link, is the point of congestion?
  9. TCP and the Fat Pipe
     • If TCP doesn't “see” congestion (loss or ECN), it will continue to increase its window to try to get more bandwidth in the network
     • Lossless network => high throughput
     • But... a single streaming connection will consume all available buffers
     • Newer connections will have a hard time getting buffers => extreme unfairness
     • The server needs good congestion management
  10. Servers, Ethernet, and Queues
      • “Everyone” knows that big, simple FIFO queues are a bad idea in routers
      • What do servers have today? Big, simple FIFO queues!
      • The queues are owned and maintained by the Ethernet NIC hardware
      • Horrible unfairness can be demonstrated with only 2 TCP connections
      • Many servers deal with 1000s of TCP connections
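The two-connection unfairness is easy to reproduce in a toy model (my own sketch; the backlog size is invented): a "hog" connection keeps the shared NIC FIFO full, so a small connection's lone packet waits behind the entire backlog, whereas per-flow queues with round-robin service would send it almost immediately.

```python
# Sketch: one shared NIC FIFO vs per-flow round-robin queues.
# Toy model: a hog connection has 100 packets queued; a small
# connection then enqueues a single packet.

from collections import deque

HOG_BACKLOG = 100

# Shared FIFO: the small packet sits behind every hog packet.
fifo = deque(["hog"] * HOG_BACKLOG + ["small"])
fifo_wait = list(fifo).index("small") + 1   # transmissions until it leaves

# Per-flow queues, one packet per flow per round-robin pass.
flows = {"hog": deque(["hog"] * HOG_BACKLOG), "small": deque(["small"])}
sent, rr_wait = 0, None
while rr_wait is None:
    for name, q in flows.items():
        if q:
            pkt = q.popleft()
            sent += 1
            if pkt == "small":
                rr_wait = sent

print(f"shared FIFO: {fifo_wait} transmissions; round-robin: {rr_wait}")
```

With the shared FIFO the small connection waits out 101 transmissions; with per-flow scheduling its packet is the second one on the wire. With thousands of connections the FIFO case only gets worse.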
  11. Connection Size vs Throughput – idle 1G link
      [Chart: throughput (0 to 1 Gb/s) vs connection size (10 B to 1 GB, log scale)]
  12. Connection Size vs Throughput – busy 1G link – competing with a single “hog” connection
      [Chart: Ideal vs Actual throughput (0 to 500 Mb/s) over the same connection sizes — UNFAIR!]
  13. TCP: Rock or Hard Place?
      • With lossy Ethernet, TCP bandwidth can collapse due to stupidly high timeouts
        • => Unpredictable performance
      • With lossless Ethernet, TCP fairness can collapse due to stupid queuing policies
        • => Unpredictable performance
      • Data center managers hate unpredictability
      • Ethernet standards have evolved; TCP needs to catch up
      • TCP and Ethernet implementations must improve
  14. Why does this matter?
      • The Earth is being paved by data centers
        • Google, Microsoft, NSA, Walmart, Facebook, ...
      • Improving TCP means more overall efficiency in the data center
      • Heat, CO2, and radioactive waste are becoming measurable by-products of TCP inefficiency
      • Fix TCP => Save the World!