
QUIC: When Google Got Tired of Waiting for the IETF to Fix TCP's 40-Year-Old Mistakes

Scott Morrison, December 29, 2025
QUIC, HTTP3, TCP, UDP, Google, CloudFront, Akamai, head-of-line blocking, protocol design, CDN
When Google watched HTTP/2's multiplexing get destroyed by TCP head-of-line blocking, they built an entirely new transport protocol over UDP, deployed it to billions of users before anyone could object, and forced the entire internet to adopt it. Now HTTP/3 powers over a third of websites and TCP's 40-year reign is finally facing real competition from a protocol everyone said would never work.


Let's talk about QUIC. When Google looked at HTTP/2 multiplexing over TCP and watched a single lost packet block every stream on the connection, they did what Google does best: they said "we're going to fix this ourselves" and built an entirely new transport protocol. Then they convinced the IETF to standardize it, got every major browser and CDN to implement it, and now QUIC powers over a third of the internet's websites, all before Google had a chance to pull their signature move and deprecate it. This is the story of how UDP, the protocol everyone said was only good for video games and DNS, became the foundation for HTTP/3. Spoiler alert: it involves a lot of very smart people solving problems that shouldn't have existed in the first place, and Google basically forcing the entire internet to do things their way because they could.

HTTP/0.9: When the entire protocol fit in a tweet (back when tweets were 140 characters)

Tim Berners-Lee's original 1991 protocol achieved radical simplicity by eliminating everything unnecessary for hypertext document retrieval, and also eliminating several things that turned out to be extremely necessary. The complete protocol specification fits in a single sentence: send GET /path<CRLF>, receive raw document bytes until the server closes the TCP connection. No headers, no status codes, no way to know if what you got back was the document you wanted or an error message formatted as HTML. It was beautiful in its naivety, like a child's drawing of a car that's just a rectangle with circles.

The wire format that changed the world (for better or worse)

The HTTP/0.9 request consists of exactly three components with no optional elements:

47 45 54 20 2F 69 6E 64 65 78 2E 68 74 6D 6C 0D 0A
G  E  T  SP /  i  n  d  e  x  .  h  t  m  l  CR LF

That's it. Seventeen bytes that launched the World Wide Web. No headers, no status codes, no Content-Type, no nothing. The W3C specification notes that "a well-behaved server will not require the carriage return character", meaning servers should accept a single LF (0x0A) as the line terminator because apparently even CRLF was too much to ask for consistency. This is like saying "a well-behaved human will not require punctuation when speaking" and then wondering why nobody understands each other.

The request URI must be a single word with no spaces, and the full URL scheme and host are omitted because they were "just used to make the connection." Translation: we already know who you are because you literally just connected to us via TCP. The response contained only the raw document content, no status line, no headers, no Content-Length. This design choice meant there was literally no way to distinguish a successful response from an error except by parsing the HTML content itself. Error messages were simply returned as human-readable HTML text. Good luck writing a client that could handle this programmatically.
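For the morbidly curious, the entire client side of HTTP/0.9 can be sketched in a few lines of Python (the function name is mine, not anything from the spec):

```python
def http09_request(path: str) -> bytes:
    """Build a complete HTTP/0.9 request: GET, a space, the path, CRLF.

    No headers, no version string -- that really is the whole protocol.
    The request URI must be a single word with no spaces.
    """
    if " " in path:
        raise ValueError("HTTP/0.9 request URIs cannot contain spaces")
    return b"GET " + path.encode("ascii") + b"\r\n"
```

Send those bytes over a TCP connection, read until the server closes it, and you have implemented the entire protocol.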

Why connection close as message delimiter was actually genius (until it wasn't)

The decision to signal response completion by closing the TCP connection was driven by two factors. First, without headers, there was no mechanism to specify Content-Length. Second, the protocol's stateless design meant servers freed all resources immediately after each response. The W3C specification explicitly stated: "The message is terminated by the closing of the connection by the server." This is the networking equivalent of hanging up the phone to signal you're done talking.

This imposed a hard constraint on HTTP/1.0: request bodies required Content-Length because closing the connection from the client side would prevent receiving any response. The original CERN httpd implementation used straightforward socket programming, reading into a fixed buffer until EOF, with a 15-second inactivity timeout. Simple, elegant, and completely unsustainable once people realized they wanted to load more than one resource per page. Turns out images exist.

HTTP/1.0: Headers arrive and everything gets complicated (welcome to the 90s)

RFC 1945, published in May 1996, formalized what browsers and servers had already implemented. The addition of headers transformed HTTP from a document retrieval protocol into an extensible application protocol. It also introduced the concept of status codes, which meant servers could finally tell you they were returning an error without making you parse HTML to figure it out. Progress!

Complete request/response format with actual structure

A complete HTTP/1.0 exchange with wire format looks like:

GET /pub/WWW/TheProject.html HTTP/1.0
User-Agent: NCSA_Mosaic/2.0 (Windows 3.1)
Accept: text/html, text/plain, image/gif
[blank line, CRLF CRLF]

The response introduces the Status-Line with its 3-digit status code:

HTTP/1.0 200 OK
Date: Tue, 15 Nov 1994 08:12:31 GMT
Server: CERN/3.0 libwww/2.17
Content-Type: text/html
Content-Length: 33
[blank line]
<HTML>A page with an image</HTML>

Look at that, metadata! Headers! The ability to know what kind of content you're receiving before you try to parse it! Revolutionary stuff. It only took five years to figure out that maybe we should include some basic information about what we're sending.

The keep-alive extension that broke the rules (and nobody cared because it was faster)

Starting in late 1995, browser and server vendors implemented an unofficial extension to avoid the TCP connection-per-request overhead. The mechanism required explicit negotiation in both directions:

# Request
GET /page.html HTTP/1.0
Connection: keep-alive

# Response (if server supports)
HTTP/1.0 200 OK
Connection: keep-alive
Keep-Alive: timeout=5, max=100
Content-Length: 1234

Without keep-alive, loading a page with 10 images required 11 separate TCP connections, each incurring a 3-way handshake. That's 33 packets just to establish connections, not counting the actual data transfer. The Keep-Alive header parameters specified timeout (minimum seconds to keep connection open) and max (maximum requests before forcing close). Critically, this was never standardized in RFC 1945 and appeared only as an "additional feature" in Appendix D. But everyone implemented it anyway because the alternative was insane, and sometimes "works in practice" beats "works in theory."
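Parsing those Keep-Alive parameters is about as simple as header parsing gets. A minimal Python sketch (the helper name is invented here, and a real parser would need to tolerate quoted values and unknown tokens):

```python
def parse_keep_alive(value: str) -> dict:
    """Parse a Keep-Alive header value like 'timeout=5, max=100'
    into an int-valued dict of parameters."""
    params = {}
    for part in value.split(","):
        part = part.strip()
        if "=" in part:
            key, _, val = part.partition("=")
            params[key.strip().lower()] = int(val.strip())
    return params
```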

Virtual hosting's unsolvable problem (that HTTP/1.1 had to fix because IPv4 addresses aren't infinite)

HTTP/1.0's most significant architectural flaw was the optional Host header. Without it, servers receiving requests couldn't determine which virtual host to serve when multiple domains shared a single IP address. Apache's documentation explicitly warned: "Old HTTP/1.0 clients do not send such a header and Apache has no clue what vhost the client tried to reach."

This limitation drove IPv4 address exhaustion as shared hosting required dedicated IPs per domain until HTTP/1.1 mandated the Host header. So if you've ever wondered why we ran out of IPv4 addresses faster than we should have, you can partially blame HTTP/1.0. Thanks, Tim. We needed IPv6 anyway, right?

HTTP/1.1: Persistent connections and pipelining's spectacular failure (a lesson in why specs don't guarantee reality)

RFC 2616 (June 1999, later revised as RFCs 7230-7235 in 2014) fundamentally changed HTTP's connection semantics and introduced features that wouldn't be fully utilized until HTTP/2. It also introduced pipelining, which looked great on paper and failed spectacularly in practice, teaching an entire generation of engineers that middleboxes ruin everything.

Persistent connections become the default (finally, someone used their brain)

HTTP/1.1 inverted the connection model: persistence became default, closure became explicit. Per RFC 2616 §8.1.2, connections remain open unless either party sends Connection: close. This eliminated negotiation overhead but required all messages to have self-defined length through Content-Length or chunked encoding. You know, basic things like "tell the other side how much data you're sending" that should have been there from the start.

Server implementations typically enforced idle timeouts: Apache httpd defaulted to 15 seconds in versions 1.3/2.0, reduced to 5 seconds in 2.2+ to conserve resources. Because apparently keeping connections open forever was a bad idea. Who knew? (Everyone. Everyone knew.)

Why pipelining failed despite being standardized (a cautionary tale about trusting the spec)

Pipelining allowed clients to send multiple requests without waiting for responses, theoretically filling the TCP pipe. RFC 2616 §8.1.2.2 specified that servers MUST return responses in request order. This ordering requirement created head-of-line blocking at the application layer: a slow first response blocked all subsequent responses on that connection. Let me repeat that: we tried to solve the performance problem of serial requests by allowing parallel requests that still had to come back serially. Brilliant.

The browser implementation history reveals systematic abandonment:


- Firefox: Removed in Firefox 54 (June 2017). Used heuristics to detect misbehaving IIS servers (spoiler: most of them).
- Chrome: Never enabled by default. "Known crashing bugs and front-of-queue blocking issues" (translation: it doesn't work).
- Opera (Presto): Only browser with a working default. Abandoned when Opera switched to Chromium (and sanity).
- IE 8-11: Never supported. "Concerns regarding buggy proxies" (rare moment of Microsoft wisdom).
- Safari (iOS 5): Enabled, then disabled. Known bug randomly switching images (cat pictures turning into dog pictures).

The failure stemmed from multiple factors: broken intermediate proxies that corrupted response ordering, legacy servers that ignored HTTP/1.1 pipelining requirements, and the fundamental head-of-line blocking problem that could only be solved by multiplexing within a single connection. So HTTP/1.1 standardized a feature that literally nobody could use reliably. This is why we can't have nice things, and why protocol designers learned to encrypt everything so middleboxes can't mess with it.

Chunked transfer encoding: streaming without knowing the length (actually clever)

When response length is unknown at generation time, chunked encoding provides streaming capability:

HTTP/1.1 200 OK
Transfer-Encoding: chunked

4\r\n        (chunk size = 0x4 = 4 bytes)
Wiki\r\n     (4 bytes of data)
7\r\n        (chunk size = 0x7 = 7 bytes)
pedia i\r\n  (7 bytes of data)
B\r\n        (chunk size = 0xB = 11 bytes)
n \r\nchunks.\r\n  (11 bytes of data, including an embedded CRLF)
0\r\n        (zero-length terminating chunk, aka "I'm done, stop waiting")
\r\n         (empty line ends message)

The chunk size is specified in hexadecimal, followed by CRLF, then the data bytes, then another CRLF. The zero-length chunk signals body completion, optionally followed by trailer headers for checksums or signatures computed after body generation. It's actually a pretty clever solution to a problem that only exists because we didn't think about streaming when we designed the protocol. Credit where credit is due.
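A decoder is short enough to sketch in full. This is an illustrative Python version, not any production parser: it ignores trailer headers and assumes well-formed input:

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked transfer-encoded body.

    Each chunk is: hex size, CRLF, <size> data bytes, CRLF.
    A zero-size chunk terminates the body (trailers not handled here).
    """
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        size = int(body[pos:eol], 16)   # chunk size is hexadecimal
        pos = eol + 2
        if size == 0:
            break                       # terminating chunk
        out += body[pos:pos + size]     # data is read by size, never scanned
        pos += size + 2                 # skip data plus its trailing CRLF
    return bytes(out)
```

Note that chunk data is consumed by its declared size, which is why the embedded CRLF inside the third chunk above causes no confusion.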

The 100 Continue handshake for large uploads (rare feature that actually works)

The Expect: 100-continue mechanism prevents wasting bandwidth on rejected uploads:

PUT /media/file.mp4 HTTP/1.1
Host: api.example.org
Content-Length: 1073741824
Expect: 100-continue
[client waits before sending 1GB body]

The server can respond with 100 Continue to signal acceptance, or immediately return 401/413/etc. to reject before body transmission. curl automatically sends this header for POST/PUT bodies exceeding 1MB. This is one of those rare features that actually works as designed. Nobody tell the IETF.

Why 6 connections per domain became standard (the domain sharding hack nobody asked for)

RFC 2616's original recommendation of 2 connections per origin proved inadequate as web pages grew resource-heavy. In 2008, Firefox 3 raised the limit to 6 connections per domain, a pragmatic balance between parallelism and server load that all major browsers adopted. This created the domain sharding workaround: distributing resources across img1.example.com and img2.example.com to multiply effective parallelism. HTTP/2's multiplexing finally eliminated this hack, but not before thousands of developers wasted countless hours implementing domain sharding and buying extra domains just to load images faster. The 2000s were weird.

SPDY: Google's proving ground for HTTP/2 (aka "we'll just do it ourselves")

Google announced SPDY in November 2009 with an ambitious target: 50% reduction in page load time. The protocol shipped in Chrome 6 (September 2010) and was deployed across all Google services by January 2011. Because when you're Google, you can just deploy experimental protocols across billions of users and see what happens. Must be nice.

Binary framing replaces text parsing (goodbye grep, hello wireshark)

SPDY introduced a binary framing layer atop TLS with an 8-byte common header. For control frames, the first bit was set to 1, followed by 15 bits for version and 16 bits for frame type. For data frames, the first 32 bits encoded the 31-bit Stream-ID instead. Frame types included SYN_STREAM (opens stream), SYN_REPLY (acknowledges), RST_STREAM (aborts), SETTINGS, PING, GOAWAY (graceful shutdown), HEADERS, and WINDOW_UPDATE. Binary protocols mean you can't debug them with curl and grep anymore, but they're faster, so we traded convenience for performance like we always do.

The CRIME attack that killed zlib compression (and taught us compression + secrets = bad)

SPDY's header compression using zlib achieved 88% reduction in request header size. However, the September 2012 CRIME attack (Compression Ratio Info-leak Made Easy) demonstrated that attacker-controlled data compressed alongside secret cookies leaked information through output size variations. Somebody had to name it CRIME because apparently security researchers have a quota for dramatic acronyms.

The attack worked by injecting guesses into URL paths: shorter compressed output indicated matches with cookie values. Chrome 21 and Firefox 15 disabled SPDY header compression entirely. This vulnerability directly motivated HPACK's design for HTTP/2, using fixed Huffman coding without shared compression contexts vulnerable to oracle attacks. Sometimes security researchers ruin perfectly good optimizations, but in this case, they probably saved us from something worse. Thanks for finding it in our test environments rather than in production, I guess.

Why SPDY became HTTP/2 rather than a standard itself (Google strong-armed the IETF, change my mind)

SPDY served as an experimental testbed: the IETF's first HTTP/2 draft in November 2012 was essentially a copy of SPDY. Key differences in the final HTTP/2 specification include HPACK replacing zlib, 9-byte frame headers, ALPN replacing NPN for protocol negotiation, and dependency-based priority replacing the simple 3-bit field. Google deprecated SPDY after HTTP/2's RFC 7540 publication in May 2015; Chrome 51 removed support in May 2016. SPDY was never meant to be a standard, it was meant to be a threat that forced the IETF to move faster. And it worked.

HTTP/2: Binary framing solves application-layer head-of-line blocking (but not transport-layer, oops)

HTTP/2 (RFC 9113) solved application-layer head-of-line blocking through binary framing and multiplexing. All communication splits into frames with a 9-byte header: 24-bit length, 8-bit type, 8-bit flags, and 31-bit stream identifier. Client-initiated streams use odd IDs (1, 3, 5...), server push uses even IDs. HPACK compression maintains a static table of 61 common headers plus a dynamic table, achieving 30-40% header size reduction.
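That 9-byte header unpacks in a few lines. A Python sketch (function name is mine):

```python
def parse_h2_frame_header(header: bytes):
    """Parse the 9-byte HTTP/2 frame header (RFC 9113 §4.1):
    24-bit length, 8-bit type, 8-bit flags, 1 reserved bit,
    31-bit stream identifier."""
    assert len(header) == 9
    length = int.from_bytes(header[0:3], "big")
    frame_type = header[3]
    flags = header[4]
    # Mask off the reserved high bit to get the 31-bit stream ID.
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF
    return length, frame_type, flags, stream_id
```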

But HTTP/2's multiplexing runs over a single TCP connection, and here lies the fundamental problem that everyone saw coming but hoped would magically not be a problem: TCP guarantees ordered byte delivery. When TCP packet N is lost, the kernel buffers packets N+1, N+2, N+3 until retransmission succeeds, blocking all HTTP/2 streams simultaneously. At 2% packet loss, HTTP/1.1 with six connections often outperforms HTTP/2's single multiplexed connection. So we spent years building HTTP/2 only to discover that TCP was the bottleneck all along. Shocking. This is why Google decided to build QUIC, because clearly the solution to TCP being in the way is to just not use TCP.

QUIC packet structure enables independent stream delivery (finally, someone fixed it)

QUIC solves TCP's limitations by implementing reliability at the stream level rather than the connection level. Each stream maintains independent sequence ordering, allowing packet loss on Stream 1 to block only Stream 1 while Streams 2, 3, and 4 continue delivering data. It's the solution HTTP/2 should have been if TCP hadn't been in the way. Turns out 40 years of TCP ossification is hard to work around.

Long header packets handle connection establishment (with more bits than you knew you needed)

During handshake, QUIC uses long header packets with this structure:

Long Header Packet {
  Header Form (1) = 1,           // Bit 0: 1 indicates long header
  Fixed Bit (1) = 1,             // Must be 1 for QUIC v1 (for reasons)
  Long Packet Type (2),          // 0x00=Initial, 0x01=0-RTT, 0x02=Handshake, 0x03=Retry
  Type-Specific Bits (4),        // Reserved + packet number length
  Version (32),                  // 0x00000001 for QUIC v1
  DCID Length (8),               // Destination Connection ID length
  Destination Connection ID (0..160),
  SCID Length (8),               // Source Connection ID length  
  Source Connection ID (0..160),
  Type-Specific Payload (..),
}

Initial packets must be at least 1200 bytes (padded if necessary) to prevent amplification attacks. They carry CRYPTO frames containing the TLS ClientHello and include a Token field for address validation. The initial encryption keys derive from the Destination Connection ID using a fixed salt 0x38762cf7f55934b34d179ae6a4c80cadccbb7f0a, notably the SHA-1 collision value discovered by Google researchers. Yes, they used a SHA-1 collision as a constant. I'm sure this made sense to someone at Google who thought "hey, we already computed this hash value, might as well use it."
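Telling these packet types apart from the first byte alone is straightforward; an illustrative Python sketch (naming mine):

```python
def classify_quic_packet(first_byte: int):
    """Classify a QUIC v1 packet from its first byte (RFC 9000 §17).

    Bit 7 (Header Form) distinguishes long from short headers; for long
    headers, bits 5-4 carry the long packet type.
    """
    if first_byte & 0x80 == 0:
        return "short", None   # 1-RTT packet
    long_types = {0: "Initial", 1: "0-RTT", 2: "Handshake", 3: "Retry"}
    return "long", long_types[(first_byte & 0x30) >> 4]
```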

Short header packets optimize post-handshake communication (because bytes are expensive, apparently)

After handshake completion, 1-RTT packets use the compact short header:

1-RTT Packet {
  Header Form (1) = 0,           // 0 indicates short header
  Fixed Bit (1) = 1,
  Spin Bit (1),                  // Latency measurement (intentionally exposed)
  Reserved Bits (2),             // Protected, must be 0
  Key Phase (1),                 // Key rotation indicator
  Packet Number Length (2),      // 1-4 bytes
  Destination Connection ID (0..160),
  Packet Number (8..32),         // Variable length encoding
  Protected Payload (..),
}

The spin bit is deliberately left unencrypted to enable passive RTT measurement by network operators, a compromise between debuggability and privacy. Because sometimes you need to throw network operators a bone so they don't block your entire protocol. It's like leaving a window slightly open so the landlord doesn't kick the door down.

Variable-length integer encoding maximizes efficiency (bit twiddling at its finest)

QUIC encodes most values using a 2-bit prefix scheme that optimizes for small common values. The prefix 00 means 1 byte (range 0-63), 01 means 2 bytes (range 0-16,383), 10 means 4 bytes (range 0-1,073,741,823), and 11 means 8 bytes (range 0-4,611,686,018,427,387,903). This encoding supports 62-bit stream IDs and offsets while ensuring most values fit in 1-2 bytes. It's actually pretty clever, the kind of optimization that makes you go "huh, that's smart" and then immediately forget how it works.
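Here is roughly what that encoding looks like in Python, a sketch of RFC 9000 §16 with helper names of my own:

```python
def encode_varint(value: int) -> bytes:
    """Encode a QUIC variable-length integer: the two high bits of the
    first byte select a total length of 1, 2, 4, or 8 bytes."""
    if value < 0x40:
        return value.to_bytes(1, "big")
    if value < 0x4000:
        return (value | 0x4000).to_bytes(2, "big")
    if value < 0x40000000:
        return (value | 0x80000000).to_bytes(4, "big")
    if value < 0x4000000000000000:
        return (value | 0xC000000000000000).to_bytes(8, "big")
    raise ValueError("value exceeds 62 bits")

def decode_varint(data: bytes):
    """Return (value, bytes consumed) for a varint at the start of data."""
    length = 1 << (data[0] >> 6)                 # prefix selects 1/2/4/8 bytes
    value = int.from_bytes(data[:length], "big") & ((1 << (8 * length - 2)) - 1)
    return value, length
```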

Stream multiplexing eliminates head-of-line blocking at the transport layer (the whole point of this exercise)

QUIC streams are identified by 62-bit IDs where the two least significant bits encode type: 0x00 for client-initiated bidirectional, 0x01 for server-initiated bidirectional, 0x02 for client-initiated unidirectional, and 0x03 for server-initiated unidirectional. Four types because why have two when you can have four?
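Decoding the type from those two bits is trivial; a quick Python sketch (naming is mine):

```python
def stream_type(stream_id: int) -> str:
    """Decode the two least significant bits of a QUIC stream ID
    (RFC 9000 §2.1): bit 0 = initiator, bit 1 = directionality."""
    initiator = "client" if stream_id & 0x01 == 0 else "server"
    direction = "bidirectional" if stream_id & 0x02 == 0 else "unidirectional"
    return f"{initiator}-initiated {direction}"
```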

Each STREAM frame contains: Stream ID (variable-length integer), Offset (position in byte sequence, present if OFF bit set), Length (if LEN bit set), Data payload, and FIN bit (end of stream marker).

Here's the thing that makes QUIC actually work: the receiver reassembles each stream independently using offset values. When a packet carrying Stream 1 data is lost, only Stream 1 delivery blocks; the receiver delivers Stream 2, 3, and 4 data immediately to the application. This is fundamentally different from TCP, where the kernel blocks all data delivery until gaps are filled. This is the entire point of QUIC, and it actually works. Someone should tell the TCP maintainers.
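The per-stream reassembly logic is simple enough to sketch. This illustrative Python class (not any real implementation) buffers out-of-order frames by offset and releases bytes as soon as they become contiguous:

```python
class StreamReassembler:
    """One instance per stream: frames arrive as (offset, data) in any
    order, and data is delivered as soon as it is contiguous. Loss on
    one stream never blocks another stream's instance."""

    def __init__(self):
        self.pending = {}       # offset -> data
        self.next_offset = 0    # next byte the application expects
        self.delivered = b""

    def on_frame(self, offset: int, data: bytes):
        self.pending[offset] = data
        # Release every frame that is now contiguous with delivered data.
        while self.next_offset in self.pending:
            chunk = self.pending.pop(self.next_offset)
            self.delivered += chunk
            self.next_offset += len(chunk)
```

Lose the packet at offset 0 and only this stream waits; every other stream's reassembler keeps delivering.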

Connection establishment achieves 1-RTT and 0-RTT handshakes (eat your heart out, TCP)

The 1-RTT handshake integrates transport and TLS negotiation (two birds, one round trip)

QUIC combines what traditionally required TCP handshake plus TLS handshake into a single round trip. Flight 1 (Client to Server) sends Initial packet containing CRYPTO frame with TLS ClientHello, key share (X25519 or P-256 public key), and QUIC transport parameters as a TLS extension (0x39), padded to 1200+ bytes because amplification attacks are a thing.

Flight 2 (Server to Client) sends coalesced datagram containing Initial packet (ACK + ServerHello) and Handshake packet (EncryptedExtensions, Certificate, CertificateVerify, Finished). That's a lot of stuff in one flight.

Flight 3 (Client to Server) sends Handshake packet (client Finished) coalesced with 1-RTT application data. The server sends HANDSHAKE_DONE frame to confirm handshake completion. Total: one round trip to first application data, versus two RTT minimum for TCP + TLS 1.3. This is the kind of optimization that makes TCP look embarrassing, like showing up to a race on a bicycle while everyone else has cars.

0-RTT enables instant data transmission for returning clients (fast but dangerous)

When clients have cached session tickets from previous connections, they can send application data immediately. The client sends Initial Packet with CRYPTO frame containing ClientHello + early_data extension, followed by 0-RTT Packet with STREAM frames containing early application data. No waiting, just send.

0-RTT data encrypts using keys derived from the pre-shared key (PSK) stored in the session ticket. Critical security constraint: 0-RTT data lacks forward secrecy and is vulnerable to replay attacks. Applications must only send idempotent requests (GET, HEAD) in 0-RTT. Servers can reject 0-RTT with 425 Too Early response, forcing retry with full handshake. So 0-RTT is fast but you better know what you're doing, like driving without a seatbelt because you're only going to the corner store.

Header protection and encryption prevent ossification (because middleboxes ruin everything)

QUIC encrypts nearly everything to prevent middlebox interference that has ossified TCP evolution. Google found approximately one-third of Internet paths have middleboxes modifying TCP metadata. So QUIC's solution was simple: encrypt everything so middleboxes can't mess with it. Can't modify what you can't see, checkmate middleboxes.

What network observers can see: the packet type (Initial, Handshake, 1-RTT), the QUIC version, connection IDs, and the spin bit (intentional, because network operators demanded something). What they cannot see: packet numbers, stream IDs, and all payload data, which are encrypted. It's like putting everything in locked boxes and only labeling the outside.

Header protection algorithm prevents tracking (cryptography to the rescue)

Packet numbers and header flags are protected using a mask derived from payload ciphertext. The system takes a 16-byte sample from the ciphertext, encrypts it with the header protection key using AES-ECB (for AES-GCM) or ChaCha20 (for ChaCha20-Poly1305), then XORs the result with the header byte and packet number. This makes packet number tracking impossible without cryptographic keys, preventing correlation attacks and protocol ossification. It also makes debugging QUIC connections a nightmare, but that's the price of progress. Good luck explaining to your boss why you can't see what's in the packets anymore.

Connection migration survives network changes (WiFi to cellular and back, seamlessly)

QUIC connections survive IP address changes through connection ID-based routing rather than 4-tuple binding. This is actually useful for mobile devices that switch between WiFi and cellular, which turns out to be something people do constantly.

PATH_CHALLENGE validates new paths (prove you're really there)

When detecting a network change (WiFi to cellular transition), clients probe the new path by sending PATH_CHALLENGE with 8 random bytes from the new IP/port. The server responds with PATH_RESPONSE echoing the challenge data. Until path validation completes, the 3x amplification limit applies: servers cannot send more than three times the bytes received, preventing amplification attacks through spoofed source addresses. Because attackers ruin nice things.
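The bookkeeping behind that limit fits in a few lines; an illustrative sketch (class and method names are mine):

```python
class PathValidator:
    """Sketch of the anti-amplification limit (RFC 9000 §8): until the
    peer's new address is validated, an endpoint may send at most three
    times the bytes it has received on that path."""

    LIMIT_FACTOR = 3

    def __init__(self):
        self.received = 0
        self.sent = 0
        self.validated = False   # set True once PATH_RESPONSE verifies

    def on_receive(self, nbytes: int):
        self.received += nbytes

    def can_send(self, nbytes: int) -> bool:
        if self.validated:
            return True
        return self.sent + nbytes <= self.LIMIT_FACTOR * self.received

    def on_send(self, nbytes: int):
        self.sent += nbytes
```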

Connection ID rotation prevents linkability (privacy through changing identifiers)

Clients must use fresh Connection IDs when migrating to prevent passive observers from correlating activity across network changes. NEW_CONNECTION_ID frames provide additional CIDs with Sequence Number, Retire Prior To value (forces retirement of older CIDs), Length, Connection ID, and Stateless Reset Token (128 bits).

Each connection ID has an associated stateless reset token, enabling endpoints to terminate connections without maintaining state, critical for server restarts and load balancer failover. It's like having multiple fake IDs that you cycle through, except legal and for good reasons.

HTTP/3 maps HTTP semantics to QUIC streams (finally, HTTP over a protocol that makes sense)

HTTP/3 (RFC 9114) uses QUIC's native stream abstraction rather than reimplementing multiplexing at the application layer. Each HTTP request-response pair uses exactly one client-initiated bidirectional QUIC stream. Stream independence means a slow response on Stream 1 cannot block faster responses on Streams 3, 5, and 7. You know, the way HTTP/2 was supposed to work before TCP ruined everything.

Each endpoint creates exactly one control stream (type 0x00) carrying SETTINGS, GOAWAY, and MAX_PUSH_ID frames. SETTINGS must be the first frame sent, because ordering matters when you're establishing control. QPACK compression streams (encoder 0x02, decoder 0x03) handle out-of-order packet delivery, unlike HTTP/2's HPACK which relied on TCP's ordered delivery and promptly fell over when packets arrived out of order.

AWS CloudFront: HTTP/3 at 410+ edge locations (but not to your origin, lol)

Amazon CloudFront launched HTTP/3 support on August 15, 2022 as a general availability feature with no preceding public beta. The implementation uses AWS's open-source s2n-quic library, a Rust-based QUIC implementation designed for production deployment. Rust because apparently C wasn't causing enough arguments in code reviews.

Configuration and protocol negotiation (opt-in because AWS trusts nobody)

HTTP/3 requires explicit enablement, new distributions default to http1.1 because AWS assumes you don't know what you're doing until you prove otherwise. The CloudFormation configuration uses HttpVersion: http2and3 with valid options being http1.1, http2, http3, or http2and3. CloudFront returns the alt-svc: h3=":443"; ma=86400 header to advertise HTTP/3 availability on UDP port 443. The max-age of 86,400 seconds (1 day) is fixed and not customer-configurable, because AWS knows better than you.
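In CloudFormation terms, enabling it looks roughly like this (a minimal fragment; required properties such as origins and cache behaviors are elided):

```yaml
MyDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      HttpVersion: http2and3   # serve HTTP/3 alongside HTTP/2
      Enabled: true
      # origins, default cache behavior, etc. omitted for brevity
```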

As of December 2024, CloudFront added HTTPS DNS record support via Route 53, enabling browsers to discover HTTP/3 during DNS resolution, eliminating the Alt-Svc upgrade round-trip. Only took two years.

Critical limitation: origin connections remain HTTP/1.1 or HTTP/2 (because of course they do)

CloudFront does not use QUIC or HTTP/3 for origin connections, all origin fetches use HTTP/1.1 or HTTP/2 over TCP. This architecture means HTTP/3 benefits only the viewer-to-edge segment. Origin servers require no changes to support HTTP/3 delivery to end users. So HTTP/3 is great for your users but your origin is still stuck with TCP head-of-line blocking. Progress! Well, half progress. AWS probably figures origins are on fast networks anyway, and they're not wrong, but still.

Performance benchmarks and customer results (numbers that aren't lies)

AWS reports up to 10% reduction in Time to First Byte and 15% improvement in page load times. Snapchat documented 10% TTFB reduction and 20% lower latency compared to their previous architecture, specifically citing 0-RTT connection setup benefits. These are real improvements, not marketing fluff. Snapchat engineers actually measured this stuff.

AWS Network Load Balancer: Native QUIC support arrives November 2025 (only took three years after CloudFront)

The most significant AWS load balancing development for HTTP/3 came on November 13, 2025: NLB gained native QUIC passthrough support. Only took them three years after CloudFront got it, but who's counting?

QUIC-LB connection ID routing implementation (the right way to do load balancing)

NLB uses the IETF QUIC-LB draft specification for connection routing. Each target instance registers with an 8-byte hex Server ID. The Server ID is encoded in QUIC connection IDs, allowing the load balancer to route packets to the correct backend even when the client's IP address changes during connection migration. This is actually the right way to do load balancing for QUIC, which is refreshing.
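The plaintext variant of the draft boils down to "read the server ID out of fixed bytes of the connection ID." A heavily simplified Python sketch; the offset, lengths, and helper names here are illustrative, not the draft's actual encoding, which also covers encrypted connection IDs and config rotation:

```python
def extract_server_id(connection_id: bytes, sid_len: int = 8) -> bytes:
    """Illustrative only: read the server ID from fixed bytes of the
    connection ID (here: after a 1-byte config octet). Real deployments
    configure the offset and length; the draft also defines encrypted
    variants."""
    return connection_id[1:1 + sid_len]

def route(connection_id: bytes, backends: dict) -> str:
    """Route a packet to the backend registered for its server ID,
    regardless of the client's current source address."""
    return backends[extract_server_id(connection_id)]
```

Because the server ID travels in the connection ID rather than the 4-tuple, the same backend keeps receiving a client's packets even after a mid-connection address change.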

Protocol options and configuration (with strings attached)

NLB supports two QUIC-related protocols: QUIC (standard QUIC over UDP) and TCP_QUIC (combined listener for HTTP/3 + HTTPS fallback on port 443, recommended for web applications because browsers need fallback options).

Critical requirements include: NLB must be configured without security groups (QUIC listeners are incompatible, because reasons), target type must be instance (not IP, also because reasons), and backend servers must implement QUIC-LB compatible connection ID encoding; AWS's s2n-quic library provides this capability. So you need instance targets, no security groups, and custom connection ID encoding on your backends. Not exactly plug-and-play. There's always a catch with AWS.

What ALB cannot do (and probably never will because ALB is for HTTP/2 peasants)

Application Load Balancer does not support HTTP/3 as of late 2025, with no public roadmap. ALB's HTTP/2 support (launched 2016) includes end-to-end HTTP/2 to targets and gRPC with unary, client-streaming, server-streaming, and bidirectional modes. For HTTP/3 edge termination with ALB backend, the recommended architecture places CloudFront in front of ALB. So if you want HTTP/3 with ALB, you get to pay for CloudFront too. Thanks, AWS. Your wallet will love this solution.

Akamai: From first gQUIC deployment to HTTP/3 GA (the CDN that actually got it right)

Akamai deployed Google QUIC (gQUIC) in 2016, becoming the first third party after Google itself to run QUIC in production. By 2018, Akamai had "nearly as many QUIC endpoints as Google." The company's Principal Architect Mike Bishop served as editor of the HTTP/3 RFC (RFC 9114). So Akamai didn't just implement QUIC, they helped write the spec. This is like being in the band and also being the sound engineer.

Configuration through Property Manager (actually straightforward for once)

Enabling HTTP/3 requires: TLS 1.3 enabled on the certificate (configured in CPS deployment settings), HTTP/3 behavior added to the property's Default Rule, and HTTP/2 behavior retained for fallback because browsers need options.

Akamai generates the Alt-Svc header automatically with max-age=93600 seconds (customizable via a separate Alt-Svc Header behavior). The platform supports gradual rollout via percentage-of-clients match criteria. This is how you deploy a new protocol: carefully and with escape hatches, not by flipping a switch and hoping for the best.
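For the curious, the advertisement itself is just a response header. Here's a minimal sketch of what generating and parsing it looks like, using the 93600-second default mentioned above (the helper names are invented for illustration):

```python
import re

def alt_svc_header(port: int = 443, max_age: int = 93600) -> str:
    """Build an Alt-Svc value advertising HTTP/3 ("h3") on the given port.
    Akamai's default cache lifetime is 93600 seconds (26 hours)."""
    return f'h3=":{port}"; ma={max_age}'

def parse_alt_svc(value: str):
    # Minimal client-side parse: which protocol is offered, where, and for how long
    m = re.match(r'(?P<proto>[\w-]+)=":(?P<port>\d+)"; ma=(?P<ma>\d+)', value)
    return m["proto"], int(m["port"]), int(m["ma"])

assert alt_svc_header() == 'h3=":443"; ma=93600'
assert parse_alt_svc(alt_svc_header()) == ("h3", 443, 93600)
```

The first response arrives over TCP; the browser caches the mapping for ma seconds and races or switches to QUIC on subsequent connections. That's also why the HTTP/2 fallback behavior stays configured: the very first visit can't use HTTP/3 at all.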

mPulse real-user monitoring data (actual numbers from actual users, not lab tests)

A documented media customer case study from April 2023 (a European football match live-streamed in Latin America) showed: 96.2% of HTTP/3 requests achieved turnaround time under 25ms versus 89.7% for HTTP/2 (a 6.5 percentage point improvement), and 69% of HTTP/3 sessions achieved throughput of 5 Mbps or higher versus 56% for HTTP/2 (13 percentage points). Those are real numbers from real users watching real football, not synthetic benchmarks in a lab.

Wix testing demonstrated 33% mean improvement in connection setup time and up to 20% LCP improvement at 75th percentile (approximately 500ms reduction). These are the kind of improvements that make HTTP/3 worth the migration pain. Half a second might not sound like much until you realize users bounce if your site takes too long to load.

Protocol boundaries in Akamai's architecture (same limitation as everyone else, but they wrote the spec)

Like CloudFront, Akamai uses HTTP/3 only for the client-to-edge segment. Internal communication between edge servers, parent servers (Tiered Distribution), and Cloud Wrapper uses TCP-based protocols. Customer origins always receive HTTP/1.1 or HTTP/2. So even Akamai, who helped write the HTTP/3 spec, doesn't trust QUIC for internal communication yet. Maybe they know something we don't.

CDN implementation challenges at planetary scale (or why this is harder than it looks)

Connection ID routing across anycast networks (when IP addresses lie)

QUIC's connection migration (clients changing IP addresses mid-connection while maintaining session state) fundamentally breaks traditional Layer-4 load balancing, which routes on the source IP/port tuple. The IETF QUIC-LB draft provides the solution: encoding server routing information within connection IDs. It's like writing the destination on the package instead of relying on the return address.

Three encoding strategies exist: Plaintext CID (server ID directly embedded, AWS NLB approach, simple but visible), Encrypted CID (server ID + nonce encrypted with shared key, secure but complex), and Obfuscated CID (variable complexity algorithms, for when you want to be fancy).
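The encrypted variant hides the server ID from on-path observers while keeping it recoverable by anyone holding the shared key. The toy below masks the ID with an HMAC-derived pad; the actual QUIC-LB draft specifies AES-based algorithms, so treat this purely as an illustration of the shape, not the real construction.

```python
import hashlib
import hmac
import os

SID_LEN, NONCE_LEN = 8, 4

def encrypt_cid(key: bytes, server_id: bytes) -> bytes:
    """Backend side: mask the server ID so observers can't map clients to
    backends. Keyed-XOR stand-in for the draft's AES-based algorithms."""
    nonce = os.urandom(NONCE_LEN)
    pad = hmac.new(key, nonce, hashlib.sha256).digest()[:SID_LEN]
    masked = bytes(a ^ b for a, b in zip(server_id, pad))
    return nonce + masked

def recover_server_id(key: bytes, cid: bytes) -> bytes:
    # Load balancer side: same key, same pad, unmask the server ID
    nonce = cid[:NONCE_LEN]
    masked = cid[NONCE_LEN:NONCE_LEN + SID_LEN]
    pad = hmac.new(key, nonce, hashlib.sha256).digest()[:SID_LEN]
    return bytes(a ^ b for a, b in zip(masked, pad))

key = b"shared-between-lb-and-backends"
sid = bytes.fromhex("0a0b0c0d0e0f0102")
assert recover_server_id(key, encrypt_cid(key, sid)) == sid
```

The trade-off is exactly as advertised: the load balancer now needs key distribution and per-packet crypto, which is why AWS went with the plaintext variant.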

Meta's mvfst (open-source QUIC implementation) integrates with Katran (XDP-based load balancer) for connection-ID routing, now handling over 75% of Meta's internet traffic via QUIC/HTTP/3. When Meta trusts it enough to handle three-quarters of their traffic, including all those Instagram photos, it's probably production-ready.

Session ticket distribution for 0-RTT across global edges (synchronized chaos)

0-RTT connection resumption requires session tickets encrypted with keys that all edge servers can decrypt. Cloudflare solves this by regenerating and synchronizing session ticket keys hourly across their entire global network, with keys generated one hour ahead to handle clock skew. Servers maintain decryption capability for the past 18 hours, because time zones and network delays are messy.
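The acceptance window is easier to see in code. A hedged sketch, assuming a Cloudflare-style scheme: keys are derived deterministically here for brevity, whereas a real deployment pushes fresh random keys to every edge; the function names are invented.

```python
import hashlib
from datetime import datetime, timedelta, timezone

def ticket_key(master: bytes, t: datetime) -> bytes:
    # One key per hour bucket; deterministic derivation is a sketch-only shortcut
    return hashlib.sha256(master + t.strftime("%Y-%m-%dT%H").encode()).digest()

def acceptable_keys(master: bytes, now: datetime) -> list:
    """Keys an edge will try when decrypting a session ticket: one hour
    ahead (to absorb clock skew), the current hour, and the previous
    18 hours, mirroring the rotation scheme described above."""
    hours = [now + timedelta(hours=1)]
    hours += [now - timedelta(hours=h) for h in range(19)]
    return [ticket_key(master, h) for h in hours]

now = datetime(2025, 11, 13, 12, 0, tzinfo=timezone.utc)
keys = acceptable_keys(b"master", now)
assert len(keys) == 20  # next hour + current hour + 18 past hours
assert ticket_key(b"master", now + timedelta(hours=1)) in keys
assert ticket_key(b"master", now - timedelta(hours=19)) not in keys  # expired
```

A ticket encrypted by any edge in the last 18 hours decrypts anywhere on the network; anything older forces a fresh full handshake.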

The challenge is fundamental: atomic retrieve-and-delete operations for single-use tickets are impractical across distributed systems, so CDNs accept some replay risk for performance gains. Security versus performance, as usual. Sometimes you have to choose between perfect security and actually working.

0-RTT replay attack mitigation strategies (because fast isn't always safe)

Since attackers can copy and replay 0-RTT packets without decrypting them, CDNs implement various protections. Cloudflare adds Early-Data: 1 header; origins can return HTTP 425 (Too Early) to force full handshake retry. Akamai restricts 0-RTT to GET requests without query parameters by default; advanced configuration enables selective paths for people who know what they're doing.

TLS 1.3 specifies a 7-day maximum between original connection and 0-RTT attempt, but CDNs typically apply stricter limits. Because if your session ticket is a week old, you're probably not in that much of a hurry anyway. Go get a fresh handshake, it only takes one round trip.
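Put together, the origin-side guard is a few lines of policy. A minimal sketch combining the mitigations above (the function name and exact policy knobs are illustrative, not any CDN's actual API): reject replayable 0-RTT requests with 425 Too Early (RFC 8470) so the client retries after the full handshake, and refuse query strings by default, Akamai-style.

```python
SAFE_METHODS = {"GET", "HEAD"}     # idempotent: safe to replay
MAX_TICKET_AGE_DAYS = 7            # TLS 1.3 upper bound; CDNs often use less

def admit_early_data(method: str, has_query: bool, ticket_age_days: float) -> int:
    """Decide whether a request flagged as 0-RTT early data (e.g. via
    Cloudflare's 'Early-Data: 1' header) is served or bounced with 425."""
    if ticket_age_days > MAX_TICKET_AGE_DAYS:
        return 425  # stale ticket: force a fresh handshake
    if method not in SAFE_METHODS or has_query:
        return 425  # replaying this could have side effects
    return 200

assert admit_early_data("GET", False, 0.5) == 200
assert admit_early_data("POST", False, 0.5) == 425  # not idempotent
assert admit_early_data("GET", True, 0.5) == 425    # query params may carry state
```

On receiving 425, a well-behaved client retries the same request after the 1-RTT handshake completes, so the cost of being cautious is one round trip, not a failure.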

DDoS mitigation with encrypted UDP traffic (the new attack surface nobody wanted)

QUIC traffic encryption renders traditional UDP filtering ineffective. Specific attack patterns include handshake flooding (CPU amplification up to 4.6x versus TCP SYN cookies due to cryptographic computation, because crypto is expensive) and amplification attacks (limited by QUIC's 3x response factor for unverified clients, at least there's that).
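That 3x limit is a hard rule in RFC 9000: until the client's address is validated, a server may send at most three times the bytes it has received from that address. A minimal sketch of the bookkeeping (class and method names invented for illustration):

```python
class AntiAmplification:
    """Track the RFC 9000 section 8 send budget for an unvalidated address.
    Caps reflection attacks at a 3x amplification factor."""

    def __init__(self):
        self.received = 0
        self.sent = 0
        self.validated = False  # flips once the client proves address ownership

    def on_datagram_received(self, nbytes: int) -> None:
        self.received += nbytes

    def can_send(self, nbytes: int) -> bool:
        return self.validated or self.sent + nbytes <= 3 * self.received

    def on_datagram_sent(self, nbytes: int) -> None:
        self.sent += nbytes

limiter = AntiAmplification()
limiter.on_datagram_received(1200)   # one Initial packet from the (maybe spoofed) client
assert limiter.can_send(3600)        # 3x budget available
assert not limiter.can_send(3601)    # a spoofed-source flood stops here
```

This is also why QUIC Initial packets must be padded to at least 1200 bytes: the attacker has to spend real bandwidth to buy the server's 3x response budget.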

Cloudflare discovered and patched a broadcast address amplification vulnerability in which packets sent to broadcast IPs triggered responses from multiple server workers; a 128-core system could generate 384 replies per attack packet. Oops. That's what we call a "whoopsie" in the security industry.

Azure DDoS Protection (2024) implements QUIC-specific mitigations: protocol compliance validation, Initial packet verification with SNI checking, and source/destination rate limiting per 4-tuple. Turns out securing UDP-based protocols requires rethinking your entire DDoS strategy. Who could have predicted this? (Everyone. Everyone predicted this.)

Competitive landscape: Cloudflare, Fastly, and Google Cloud (everyone's doing it now)

Cloudflare's open-source advantage (free tier and open source, because Cloudflare understands marketing)

Cloudflare announced HTTP/3 support in September 2019 and enables it on the free tier, the only major CDN offering this capability without premium pricing. Their quiche library (Rust, open-source on GitHub) is used by curl, Mozilla's Firefox QUIC implementation (neqo), and netty. Open source means everyone can use it, which means everyone can find bugs, which means it gets better. Also great marketing.

The 2024 tokio-quiche library handles millions of HTTP/3 requests per second with low latency, powering WARP's MASQUE client. Cloudflare Radar data shows HTTP/3 at approximately 12% of global traffic, with 15 countries exceeding 33% adoption (Georgia leads at 38%, presumably because fast internet is nice). When Cloudflare gives it away for free, you know they're serious about adoption.

Fastly's measured approach (better late than never, we suppose)

Fastly launched HTTP/3 beta in April 2020 with Distinguished Engineer Jana Iyengar (IETF QUIC working group editor) leading development. As with competitors, Fastly's HTTP/3 applies only to edge connections; origin fetches remain TCP-based. Fastly took their time but at least they got it right eventually.

Google Cloud CDN's integration advantage (they invented it, after all)

Google Cloud CDN benefits from Google's decade of QUIC deployment experience. Google reports 2% reduction in Search latency, 9% reduction in YouTube rebuffer times, and 7% improvement in mobile throughput. HTTP/3 is generally available with a single console toggle. When the people who invented QUIC are running it in production across YouTube and Search, it's probably stable. Probably.

Azure Front Door: premium tier only (because Microsoft loves tiers)

Microsoft Azure Front Door supports HTTP/3 exclusively on the Premium tier, because of course Microsoft puts new features in premium tiers. Azure Application Gateway HTTP/3 remains in private preview, requiring opt-in via email, which is Microsoft's way of saying "we're working on it but we're not ready to commit." Azure DDoS Protection includes QUIC-specific mitigations enabled by default. So Microsoft has QUIC support, but you're going to pay extra for it. Tale as old as time.

The conclusion: QUIC represents protocol evolution at internet scale (and it actually worked, surprisingly)

QUIC's architectural innovations (stream-level reliability, encrypted transport headers, integrated TLS 1.3, and connection migration) address TCP's fundamental limitations while remaining deployable over UDP on today's Internet. The 1-RTT and 0-RTT handshakes eliminate connection establishment latency, while per-stream loss recovery eliminates the head-of-line blocking that plagued HTTP/2 over TCP.
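The stream-level reliability point is worth a concrete illustration. The toy reassembler below (not a real QUIC implementation, just the in-order delivery logic) shows why a lost packet on one stream no longer stalls the others: each stream tracks its own gaps.

```python
from collections import defaultdict

class StreamReassembler:
    """Per-stream in-order delivery: a gap on one stream delays only that
    stream, unlike TCP, where one lost segment stalls every HTTP/2 stream
    multiplexed on the connection."""

    def __init__(self):
        self.buffers = defaultdict(dict)     # stream id -> {offset: data}
        self.next_offset = defaultdict(int)  # stream id -> next deliverable offset

    def receive(self, stream: int, offset: int, data: bytes) -> bytes:
        """Buffer a (possibly out-of-order) frame; return whatever bytes are
        now deliverable to the application for this stream."""
        self.buffers[stream][offset] = data
        out = b""
        while self.next_offset[stream] in self.buffers[stream]:
            chunk = self.buffers[stream].pop(self.next_offset[stream])
            self.next_offset[stream] += len(chunk)
            out += chunk
        return out

r = StreamReassembler()
assert r.receive(2, 0, b"hello") == b"hello"        # stream 2 delivers immediately
assert r.receive(1, 5, b"world") == b""             # stream 1 has a gap at offset 0
assert r.receive(2, 5, b"!") == b"!"                # stream 2 unaffected by stream 1's loss
assert r.receive(1, 0, b"12345") == b"12345world"   # gap filled: both chunks deliver
```

In TCP, that second assertion is impossible: the kernel's single byte stream would hold stream 2's bytes hostage until stream 1's retransmission arrived.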

Deployment challenges remain significant: CPU overhead roughly doubles (crypto isn't free), UDP blocking affects 3-5% of paths (enterprise firewalls gonna enterprise firewall), and infrastructure tooling lags TCP's 40-year maturity (turns out four decades of tcpdump muscle memory is hard to replace). However, with 36.6% of websites supporting HTTP/3 and major providers seeing measurable latency improvements, QUIC has achieved the critical mass necessary for continued adoption.

The key engineering insight: QUIC succeeds not through incremental TCP optimization but through architectural redesign that places reliability at the stream level rather than the connection level. This enables the protocol to deliver on HTTP/2's multiplexing promise while surviving the packet loss and network transitions that characterize modern mobile networks.

When Google looked at TCP's head-of-line blocking problem and decided to solve it by building an entirely new transport protocol over UDP, the conventional wisdom said it would never work. UDP was for video games and DNS, not serious internet infrastructure. Middleboxes would block it, enterprises would reject it, the IETF would never standardize it, and the internet would continue plodding along with TCP forever. But Google did what Google does best: they built it anyway, deployed it to billions of users without asking permission, proved it worked with actual data from YouTube and Search, and then dared everyone else not to adopt it.

Now HTTP/3 powers over a third of the internet, AWS and Akamai have both implemented it at massive scale (with varying degrees of completeness), every major browser supports it, and TCP's 40-year reign as the internet's reliable transport protocol is finally facing real competition. Turns out sometimes the right solution is to ignore the committee, build the thing yourself, deploy it before anyone can stop you, and let the results speak for themselves. Google basically strong-armed the entire internet into adopting QUIC, and you know what? It was the right call. TCP head-of-line blocking is actually solved now. We did it. We finally fixed a 40-year-old problem, and all it took was Google deciding they were tired of waiting for the IETF to figure it out.