
TCP: The Reliable Workhorse That Built the Internet

Scott Morrison · November 15, 2025
Tags: TCP, networking protocols, congestion control, MPTCP, BBR, CUBIC, flow control, network engineering, internet protocols, protocol history
From its origins solving ARPANET's reliability problem to modern extensions like MPTCP and BBR congestion control, TCP has quietly powered the Internet for four decades. This deep dive explores the three-way handshake, window scaling, congestion control algorithms from Tahoe to CUBIC, and how TCP continues to evolve while facing challenges from newer protocols.

When you load a webpage, stream a video, or send an email, you're relying on a protocol that's been quietly doing its job for over four decades. The Transmission Control Protocol, better known as TCP, is one of those technologies that works so well we barely think about it...until something goes wrong. But beneath its apparent simplicity lies a sophisticated system of reliability mechanisms, flow control, and congestion management that has scaled from a handful of research computers to billions of devices worldwide.

Let's pull back the curtain and explore how TCP actually works, why it was designed the way it was, and how it continues to evolve to meet the demands of modern networks.

The Birth of TCP: Solving the Reliability Problem

In the early 1970s, the fledgling ARPANET faced a fundamental challenge: how do you reliably transfer data between computers when the network itself is unreliable? Packets could arrive out of order, get lost entirely, or become corrupted in transit. The solution needed to work across diverse networks with wildly different characteristics, from slow telephone lines to faster local connections.

Enter Vint Cerf and Bob Kahn, who in 1974 published their seminal paper "A Protocol for Packet Network Intercommunication," introducing what would eventually become TCP/IP. Initially, TCP and IP were a single protocol, but by 1978, they recognized the value of separation: IP would handle routing and addressing, while TCP would ensure reliable, ordered delivery.

The first complete specification, RFC 793, was published in September 1981 and remains the foundation of TCP as we know it today. The genius of the design was its simplicity: create a reliable byte stream abstraction on top of an unreliable packet network, and let applications treat network communication as if they were reading from and writing to files.

The Anatomy of a TCP Connection: Three Packets to Say Hello

Before any data can flow, TCP establishes a connection through its famous three-way handshake. This elegant dance ensures both sides are ready and establishes the initial sequence numbers that will track every byte of data.

Here's what actually happens when you open a TCP connection:

Step 1: SYN: The client sends a SYN (synchronize) packet with an initial sequence number (ISN). This ISN isn't zero; it's typically derived from a timer to prevent old packets from being mistaken for new ones if a connection is quickly closed and reopened.

Step 2: SYN-ACK: The server responds with its own SYN packet (with its own ISN) and acknowledges the client's sequence number by adding one to it. This is the ACK part of SYN-ACK.

Step 3: ACK: The client acknowledges the server's sequence number. Now both sides have agreed on starting sequence numbers and the connection is established.
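The sequence-number bookkeeping in those three steps can be sketched in a few lines of Python. This is a toy model of the exchange, not a real TCP implementation (real ISNs are chosen by the operating system, and segments carry full headers):

```python
import random

def three_way_handshake():
    """Toy model of the handshake's sequence-number bookkeeping."""
    client_isn = random.randrange(2**32)          # client picks its ISN
    server_isn = random.randrange(2**32)          # server picks its own ISN

    syn     = {"flags": {"SYN"}, "seq": client_isn}
    syn_ack = {"flags": {"SYN", "ACK"}, "seq": server_isn,
               "ack": (syn["seq"] + 1) % 2**32}   # acknowledge client's ISN + 1
    ack     = {"flags": {"ACK"}, "seq": (client_isn + 1) % 2**32,
               "ack": (syn_ack["seq"] + 1) % 2**32}
    return syn, syn_ack, ack

syn, syn_ack, ack = three_way_handshake()
```

Note the `% 2**32`: sequence numbers live in a 32-bit space and wrap around, which is why each side acknowledges "your ISN plus one" rather than an absolute position.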

Why the three-way handshake and not just two packets? The third packet prevents a critical problem: if delayed SYN packets from old connection attempts were to arrive at a server, a two-way handshake would create ghost connections. The three-way handshake ensures both sides explicitly agree on the current connection.

The TCP Header: Every Byte Counts

A TCP header is typically 20 bytes (though it can be larger with options), and every bit is carefully allocated:

  • Source and Destination Ports (4 bytes total): These 16-bit numbers identify the sending and receiving applications, enabling a single IP address to host thousands of simultaneous connections.
  • Sequence Number (4 bytes): Identifies the first byte of data in this segment. With 32 bits, TCP can track over 4 billion bytes before sequence numbers wrap around.
  • Acknowledgment Number (4 bytes): When the ACK flag is set, this tells the sender what byte the receiver expects next. TCP acknowledgments are cumulative: acknowledging byte N means "I've received everything up to byte N-1."
  • Data Offset (4 bits): Specifies the TCP header length in 32-bit words. This is needed because the Options field has variable length.
  • Flags (9 bits in modern TCP, originally 6): Control bits like SYN, ACK, FIN, RST, PSH, and URG that manage connection state and data handling. Modern TCP added ECN-related flags (ECE, CWR) and the NS flag for additional signaling.
  • Window Size (2 bytes): Arguably one of the most important fields, more on this shortly.
  • Checksum (2 bytes): Covers the header, data, and a pseudo-header with IP addresses to ensure data integrity end-to-end.
  • Urgent Pointer (2 bytes): Rarely used today, this indicated where urgent data ended when the URG flag was set.
  • Options (variable, up to 40 bytes): This extensibility mechanism has been crucial for TCP's evolution, enabling features like window scaling, timestamps, and selective acknowledgments.
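To make the layout concrete, here is a small Python sketch that packs and parses the fixed 20-byte header using the standard `struct` module. The byte layout follows RFC 793; the field and flag names are my own shorthand:

```python
import struct

def parse_tcp_header(data: bytes) -> dict:
    """Parse the fixed 20-byte portion of a TCP header (network byte order)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", data[:20])
    data_offset = (offset_flags >> 12) & 0xF   # header length in 32-bit words
    flags = offset_flags & 0x1FF               # the 9 flag bits
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "data_offset": data_offset,
        "syn": bool(flags & 0x002),
        "ack_flag": bool(flags & 0x010),
        "fin": bool(flags & 0x001),
        "window": window,
    }

# A hand-built SYN segment: port 54321 -> 80, seq 1000, 64K window
segment = struct.pack("!HHIIHHHH", 54321, 80, 1000, 0,
                      (5 << 12) | 0x002, 65535, 0, 0)
hdr = parse_tcp_header(segment)
```

The data offset of 5 here means a 20-byte header (five 32-bit words) with no options present.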

Window Size: TCP's Traffic Control System

The window size field is TCP's primary mechanism for flow control: preventing a fast sender from overwhelming a slow receiver. But there's more happening here than meets the eye.

The receiver advertises how many bytes it can currently accept by setting the window size in every ACK packet. If the receiver's buffer is filling up, it can advertise a smaller window, slowing the sender down. If it has plenty of room, it can advertise a larger window, speeding things up.

Here's the catch: the original window size field is only 16 bits, limiting the maximum window to 65,535 bytes. In 1981, that seemed like plenty. But on modern high-speed, high-latency networks (think satellite links or cross-continental fiber), this becomes a severe bottleneck.

Consider a connection with 100ms of round-trip time (RTT). With a 64KB window, the maximum throughput is limited to 64KB / 0.1s = 640KB/s, or roughly 5 megabits per second. No matter how fast your network connection is, TCP can't go faster because it's waiting for acknowledgments before sending more data.

The solution came in RFC 1323 (1992) with window scaling. During the three-way handshake, both sides can negotiate a scale factor (0-14) that's applied to the window field, effectively multiplying the maximum window size by up to 16,384. With window scaling, modern TCP can advertise windows of up to 1 gigabyte.
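The arithmetic above is easy to reproduce. A small sketch (the helper names are my own) computes the throughput ceiling a fixed window imposes, and the RFC 1323 scale factor needed to cover a given bandwidth-delay product:

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Throughput ceiling of a fixed window: at most one window per RTT."""
    return window_bytes * 8 / rtt_seconds

def required_scale_factor(target_window_bytes: int) -> int:
    """Smallest RFC 1323 shift (0-14) whose scaled 64K window covers the target."""
    for shift in range(15):
        if (65535 << shift) >= target_window_bytes:
            return shift
    raise ValueError("larger than TCP can advertise (~1 GB)")

# 64 KB window over a 100 ms path: about 5.24 Mbit/s, as in the text
print(max_throughput_bps(65535, 0.1) / 1e6)

# Filling 1 Gbit/s over the same path needs a 12.5 MB window
bdp = int(1e9 / 8 * 0.1)
print(required_scale_factor(bdp))   # scale factor 8 (64K << 8 is ~16 MB)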

But window size isn't just about receiver buffers; it interacts critically with congestion control. The actual sending window is the minimum of the receiver's advertised window and the sender's congestion window (more on this next). This is TCP's balancing act: send as fast as the receiver can handle AND as fast as the network can handle, whichever is slower.

Congestion Control: TCP's Most Sophisticated Feature

Here's a problem that wasn't obvious in TCP's early days: what happens when too many senders try to use the network simultaneously? In 1986, the Internet experienced its first "congestion collapse": throughput dropped to a tiny fraction of capacity as overwhelmed routers dropped packets, triggering retransmissions that caused still more congestion, in a vicious cycle.

Van Jacobson's 1988 paper "Congestion Avoidance and Control" introduced the algorithms that saved the Internet and remain at the heart of TCP today. The key insight: packet loss is a signal of network congestion, and senders need to back off when they detect it.

TCP Tahoe: The Original

Tahoe introduced four interrelated mechanisms:

Slow Start: Despite the name, it's actually quite aggressive. The sender starts with a small congestion window (cwnd), typically 1-10 MSS (Maximum Segment Size). For each ACK received, cwnd increases by 1 MSS, causing exponential growth. Send 1 packet, get 1 ACK, send 2 packets, get 2 ACKs, send 4 packets, and so on. This continues until reaching a threshold (ssthresh) or detecting packet loss.

Congestion Avoidance: Once cwnd reaches ssthresh, TCP enters congestion avoidance mode, increasing cwnd more slowly by roughly 1 MSS per RTT regardless of how many ACKs arrive. This linear increase probes for available bandwidth more cautiously.

Fast Retransmit: Instead of waiting for a retransmission timeout (which can be quite long), if the sender receives three duplicate ACKs for the same sequence number, it immediately retransmits the presumably lost packet.

Loss Response: When packet loss is detected, whether by timeout or by duplicate ACKs, Tahoe sets ssthresh to half of cwnd, resets cwnd to 1 segment, and re-enters slow start. (The gentler fast recovery mechanism came later, with Reno.)

TCP Reno: A Gentler Response

Reno, introduced in 1990, refined fast recovery. The insight was that receiving duplicate ACKs means packets are still flowing, so the congestion isn't as severe as a timeout would indicate. Instead of resetting cwnd to 1, Reno:

  1. Sets ssthresh to half of cwnd
  2. Sets cwnd to ssthresh (not 1)
  3. Enters congestion avoidance mode directly

This "additive increase, multiplicative decrease" (AIMD) approach became TCP's signature: increase the sending rate slowly while all is well, and cut it in half when congestion is detected. Over time, this causes competing TCP flows to converge toward fair bandwidth sharing.
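A toy simulation makes the resulting sawtooth visible. This is a deliberate simplification (one cwnd sample per RTT, losses injected at chosen rounds), not real Reno:

```python
def aimd_reno(rounds: int, ssthresh: float = 32, loss_at=frozenset()):
    """Toy Reno: one cwnd sample per RTT; loss_at holds round numbers
    where a triple-duplicate-ACK loss signal arrives."""
    cwnd = 1.0
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_at:
            ssthresh = max(cwnd / 2, 2)   # multiplicative decrease
            cwnd = ssthresh               # Reno: halve, skip slow start
        elif cwnd < ssthresh:
            cwnd *= 2                     # slow start: doubles every RTT
        else:
            cwnd += 1                     # congestion avoidance: +1 MSS per RTT
    return history

# Loss at round 6 halves the window; linear growth then resumes.
print(aimd_reno(10, ssthresh=8, loss_at={6}))
# [1.0, 2.0, 4.0, 8.0, 9.0, 10.0, 11.0, 5.5, 6.5, 7.5]
```

The exponential ramp, the linear probe, and the halving on loss are all visible in that one list.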

TCP New Reno: Handling Multiple Losses

New Reno addressed a weakness in Reno: when multiple packets are lost in a window, Reno would exit fast recovery after the first retransmitted packet was acknowledged, only to immediately re-enter fast recovery when discovering another loss. New Reno stays in fast recovery until all data outstanding at the time of the first loss has been acknowledged, handling multiple losses in a single window more efficiently.

SACK: Selective Acknowledgments

Standard TCP acknowledgments are cumulative: "I've received everything up to byte N." If packets 1, 2, 4, and 5 arrive but packet 3 is lost, the receiver can only ACK up to packet 2. The sender doesn't know if packets 4 and 5 arrived, so it might unnecessarily retransmit them.

SACK (Selective Acknowledgment), defined in RFC 2018, allows receivers to acknowledge non-contiguous blocks of data. Now the receiver can say "I have bytes 1-2 and 4-5, but I'm missing 3." The sender can retransmit just packet 3, improving efficiency significantly on lossy networks.
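The sender-side logic can be sketched simply if we represent SACK blocks as half-open byte ranges (my own simplified representation; real SACK options carry up to four blocks in the TCP header):

```python
def missing_ranges(cum_ack: int, sack_blocks, highest_sent: int):
    """Return the half-open byte ranges not yet acknowledged, given the
    cumulative ACK point and a list of SACKed (start, end) blocks."""
    gaps = []
    expected = cum_ack
    for start, end in sorted(sack_blocks):
        if start > expected:
            gaps.append((expected, start))   # a hole before this block
        expected = max(expected, end)
    if expected < highest_sent:
        gaps.append((expected, highest_sent))
    return gaps

# Bytes 0-2999 ACKed cumulatively, 4000-5999 SACKed, 6000 bytes sent:
# only the 3000-3999 hole needs retransmission.
print(missing_ranges(3000, [(4000, 6000)], 6000))   # [(3000, 4000)]
```

Without SACK, the sender in this example would only know "everything up to 2999 arrived" and might retransmit bytes 4000-5999 needlessly.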

TCP CUBIC: Optimizing for High-Speed Networks

By the 2000s, it was clear that standard TCP's AIMD approach wasn't optimal for high-bandwidth, high-latency networks. CUBIC, developed in 2005 and now the default in Linux, takes a different approach.

Instead of increasing cwnd by a fixed amount per RTT, CUBIC uses a cubic function that:

  1. After a loss, drops cwnd by a multiplicative factor (typically 0.7, less aggressive than Reno's 0.5)
  2. Increases cwnd rapidly at first, then slows as it approaches the window size where loss occurred (Wmax)
  3. After exceeding Wmax, increases slowly to probe for additional bandwidth
  4. Crucially, the growth function depends on time since the last loss, not RTT

This makes CUBIC more fair across flows with different RTTs and more efficient at utilizing high-bandwidth links.
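The cubic growth curve itself is compact enough to write down. The constants C=0.4 and β=0.7 follow common Linux defaults; this illustrates only the window function, not the full algorithm with its TCP-friendly region and fast convergence:

```python
def cubic_window(t: float, w_max: float, c: float = 0.4, beta: float = 0.7) -> float:
    """CUBIC window as a function of time t (seconds) since the last loss.

    K is the time at which the curve climbs back to w_max, the window
    size at which loss last occurred.
    """
    k = ((w_max * (1 - beta)) / c) ** (1 / 3)
    return c * (t - k) ** 3 + w_max

# Immediately after a loss the window sits at beta * w_max (here, 70),
# grows quickly, flattens as it nears w_max = 100, then probes beyond it.
for t in range(0, 8):
    print(t, round(cubic_window(t, 100), 1))
```

The plateau around w_max is the point of the design: CUBIC spends most of its time near the rate that last worked, rather than blindly ramping past it.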

BBR: Rethinking Congestion Control

In 2016, Google introduced BBR (Bottleneck Bandwidth and Round-trip propagation time), a paradigm shift in congestion control. Rather than reacting to packet loss, BBR actively measures the bottleneck bandwidth and minimum RTT to determine the optimal sending rate.

BBR cycles through four states:

  • Startup: Exponentially probe for bandwidth
  • Drain: Reduce the queue created during startup
  • ProbeBW: Oscillate around the estimated bandwidth to adapt to changes
  • ProbeRTT: Periodically reduce cwnd to measure minimum RTT

BBR can achieve higher throughput and lower latency than loss-based algorithms, especially on networks with bufferbloat (excessive buffering). However, its deployment has been cautious due to fairness concerns when competing with traditional loss-based TCPs.
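At the core of BBR's model is a simple identity: the right amount of data in flight is the bandwidth-delay product of the path. A sketch of that estimate (heavily simplified; real BBR maintains a windowed max filter over bandwidth samples and a windowed min filter over RTT samples):

```python
def bdp_bytes(btlbw_bps: float, rtprop_seconds: float) -> float:
    """Bandwidth-delay product: how many bytes the pipe itself can hold."""
    return btlbw_bps / 8 * rtprop_seconds

# A 100 Mbit/s bottleneck with 40 ms of propagation delay holds ~500 KB.
# Keeping roughly this much data in flight fills the link without
# building a standing queue, which is BBR's goal.
print(bdp_bytes(100e6, 0.040))
```

Loss-based algorithms, by contrast, keep pushing until a buffer overflows, so on deep-buffered links they sustain far more than one BDP in flight and pay for it in latency.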

Multipath TCP: One Connection, Multiple Paths

What if your device has both Wi-Fi and cellular connections? Standard TCP must choose one path and stick with it for the connection's lifetime. Enter Multipath TCP (MPTCP), standardized in RFC 6824 (2013) and significantly revised in RFC 8684 (2020).

MPTCP allows a single TCP connection to use multiple paths simultaneously, with several compelling benefits:

Resilience: If one path fails (say, you walk out of Wi-Fi range), the connection seamlessly continues on another path without disruption.

Throughput: By aggregating bandwidth across multiple paths, MPTCP can significantly increase throughput. A device with both Wi-Fi and LTE can use both simultaneously.

Efficiency: MPTCP can intelligently shift traffic to the most efficient path or balance load across paths.

Under the hood, MPTCP establishes multiple TCP subflows (regular TCP connections that appear normal to middleboxes) and coordinates them at the endpoints. A scheduler decides which data to send on which subflow, and the protocol handles reordering and reassembly at the receiving end.
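A toy version of that scheduling decision might look like this. The policy here (prefer the lowest-RTT subflow that still has congestion-window room) is my own simplification; real MPTCP schedulers are pluggable and considerably more subtle:

```python
def schedule_segments(num_segments: int, subflows: dict) -> dict:
    """Assign segments to subflows: prefer low RTT, respect each cwnd.

    subflows maps a subflow name to (rtt_ms, cwnd_in_segments).
    Returns how many segments each subflow was given.
    """
    room = {name: cwnd for name, (_, cwnd) in subflows.items()}
    by_rtt = sorted(subflows, key=lambda name: subflows[name][0])
    assignment = {name: 0 for name in subflows}
    for _ in range(num_segments):
        for name in by_rtt:
            if room[name] > 0:
                assignment[name] += 1
                room[name] -= 1
                break
    return assignment

# 10 segments over Wi-Fi (20 ms RTT, room for 6) and LTE (50 ms, room for 8):
# Wi-Fi fills first, and the remainder spills onto LTE.
print(schedule_segments(10, {"wifi": (20, 6), "lte": (50, 8)}))
# {'wifi': 6, 'lte': 4}
```

The spillover behavior is exactly the aggregation benefit described above: one logical connection quietly using both radios at once.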

Apple has deployed MPTCP in iOS since 2013 for Siri traffic, using it to seamlessly failover between Wi-Fi and cellular. More recently, it's being explored for improving performance of cloud services and mobile applications.

The challenge with MPTCP is that it must be deployed at both endpoints. While it's designed to gracefully fall back to regular TCP when the other end doesn't support it, widespread adoption has been slower than hoped. Nonetheless, it represents an important evolution in making TCP more robust for the mobile, multi-connected world.

TCP Extensions: Continuous Evolution

TCP's option field has enabled numerous extensions over the decades:

Timestamps (RFC 1323): Enable more accurate RTT measurements and protect against sequence number wraparound at very high speeds (PAWS - Protection Against Wrapped Sequences).

TCP Fast Open (RFC 7413): Allows data to be sent in the initial SYN packet using a cryptographic cookie, eliminating one RTT from connection establishment for subsequent connections.

ECN (Explicit Congestion Notification, RFC 3168): Routers can mark packets experiencing congestion rather than dropping them, giving TCP an early warning signal before loss occurs.

TCP Authentication Option (TCP-AO, RFC 5925): Provides cryptographic authentication of TCP segments, protecting against spoofing and injection attacks. It replaced the older TCP MD5 signature option.

Accurate ECN (AccECN): A recent proposal to provide more granular congestion feedback, counting not just whether congestion occurred but its severity.

The Challenges Ahead

Despite its remarkable success, TCP faces challenges in the modern Internet:

Head-of-Line Blocking: TCP's strict ordering guarantee means a single lost packet blocks delivery of all subsequent data, even if that data has arrived. HTTP/2 multiplexing made this worse by running multiple request/response streams over a single TCP connection. QUIC, built on UDP, addresses this with stream-level ordering.

Ossification: TCP is so ubiquitous that middleboxes (firewalls, load balancers, NATs) make assumptions about it, making it difficult to deploy new TCP options. Many networks silently strip unknown options, hindering innovation.

Connection Setup Overhead: The three-way handshake and TLS handshake together add 2-3 RTTs before application data flows. TCP Fast Open helps, but protocols like QUIC have pushed further with 0-RTT connection establishment.

UDP Competition: Many applications now use UDP-based protocols (QUIC, WebRTC, game protocols) to gain more control and avoid TCP's limitations. This shifts complexity from the kernel to application space.

The Legacy and Future

TCP is a testament to good protocol design. Its core mechanisms (sequence numbers, acknowledgments, windowing, retransmission) have remained essentially unchanged for 40+ years, yet it has evolved through careful extensions to handle network speeds and conditions its creators could never have imagined.

The protocol embodies key engineering principles: simplicity in core design, extensibility through options, end-to-end design philosophy (intelligence at the endpoints, not in the network), and conservative behavior (being gentle with the network, even at the cost of individual throughput).

While newer protocols like QUIC are gaining ground for specific use cases, TCP remains the foundation of most Internet communication. Every web page, email, file transfer, and database connection likely relies on TCP. It's a quiet workhorse that scaled from 56 kbps modems to 400 Gbps data center links, from four IMPs to billions of devices.

That's the real genius of TCP: not that it was perfect from the start, but that it was good enough to succeed, and flexible enough to evolve. In networking, as in biology, the ability to adapt is often more important than initial perfection.