
BGP: The Terrible Protocol That Routes the Entire Internet

Scott Morrison, November 15, 2025
Border Gateway Protocol is fundamentally broken, built on trust in an era of nation-state attacks, and has caused countless Internet outages through hijacks, leaks, and misconfigurations. Yet replacing it is impossible, so we're stuck incrementally patching it with RPKI, better filtering, and hope, because BGP is terrible but absolutely essential to keeping the Internet running.

Border Gateway Protocol is arguably the most critical protocol on the Internet and simultaneously one of the worst designed. It's a routing protocol built on trust in an era when the Internet was a small academic network where everyone knew everyone. That same protocol, essentially unchanged in its trust assumptions, now routes traffic for billions of users and trillions of dollars in commerce. BGP has caused massive outages, enabled nation-state attacks, and regularly leaks routes that disrupt global connectivity. Yet we're stuck with it, because replacing the Internet's routing protocol is like replacing the foundation of a skyscraper while people are living in it. So instead, we're doing what humans do best: we're patching it, bolting on security features, and hoping the duct tape holds.

Let's explore how BGP actually works, why it's so fundamentally broken, the disasters it's caused, and the incremental improvements (RPKI, BGPSec) that are our only realistic path forward.

The Internet Numbers: RIRs and Resource Allocation

Before we can understand BGP, we need to understand how Internet resources (IP addresses and Autonomous System Numbers) are allocated. This isn't a technical free-for-all; a hierarchy of organizations manages the global pool.

IANA: The Top of the Pyramid

The Internet Assigned Numbers Authority (IANA), operated by ICANN, sits at the top. IANA manages the global pool of IP addresses and ASNs, allocating large blocks to Regional Internet Registries.

The Five RIRs

Five Regional Internet Registries (RIRs) manage resources for different geographic regions:

ARIN (American Registry for Internet Numbers): Covers the United States, Canada, and several Caribbean and North Atlantic islands. Created in 1997, ARIN allocates IPv4 and IPv6 addresses and ASNs to ISPs, enterprises, and organizations in its region.

RIPE NCC (Réseaux IP Européens Network Coordination Centre): Covers Europe, the Middle East, and parts of Central Asia. Founded in 1992, RIPE NCC is the oldest of the RIRs and serves over 20,000 members.

APNIC (Asia-Pacific Network Information Centre): Covers the Asia-Pacific region, serving some of the fastest-growing Internet regions, including China, India, Australia, and Southeast Asia.

LACNIC (Latin America and Caribbean Network Information Centre): Covers most of Latin America and the Caribbean.

AFRINIC (African Network Information Centre): Covers the African continent.

Each RIR maintains a registry of IP address allocations and ASN assignments in its region. The RIRs have different allocation policies (determined by their communities), though they coordinate to maintain consistency.

ASNs: Your Routing Identity

An Autonomous System Number (ASN) identifies a collection of IP networks under a single administrative control with a common routing policy. Think of it as your routing identity on the Internet. ASNs come in two flavors:

16-bit ASNs: The original format, covering 0-65,535. ASNs 64,512-65,534 are reserved for private use (like RFC 1918 addresses), and 0 and 65,535 are reserved outright.

32-bit ASNs: Introduced in 2007 (RFC 4893, since replaced by RFC 6793) when 16-bit ASNs were running out, providing about 4.29 billion possible numbers (though the usable range is smaller).
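To make the ranges concrete, here is a minimal sketch that sorts an ASN into the buckets above. The function name and labels are made up for illustration. The special-purpose ranges come from the IANA registry (private use per RFC 6996, documentation per RFC 5398), and the sketch skips some smaller reserved blocks.

```python
def classify_asn(asn: int) -> str:
    """Classify an ASN by width and special-purpose range (simplified)."""
    if not 0 <= asn <= 4_294_967_295:
        raise ValueError("an ASN must fit in 32 bits")
    if asn in (0, 65_535, 4_294_967_295):
        return "reserved"
    if asn == 23_456:
        # AS_TRANS: stands in for a 32-bit ASN on sessions that only speak 16-bit
        return "as_trans"
    if 64_496 <= asn <= 64_511 or 65_536 <= asn <= 65_551:
        return "documentation"
    if 64_512 <= asn <= 65_534:
        return "private (16-bit)"
    if 4_200_000_000 <= asn <= 4_294_967_294:
        return "private (32-bit)"
    return "public (16-bit)" if asn <= 65_535 else "public (32-bit)"


for asn in (701, 65001, 396531, 4_200_000_001):
    print(asn, classify_asn(asn))
```

The example ASNs used throughout this article (65001, 65002, and so on) land in the private range, which is why they are safe to use in illustrations.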

Getting an ASN requires justification, typically that you're multihomed (connected to multiple upstream providers) and need an independent routing policy. You don't get an ASN just to look cool, though the status of having your own ASN is admittedly pretty cool.

IP Address Allocation

RIRs allocate IP addresses to Local Internet Registries (LIRs), typically ISPs, who then assign them to end users. The process involves:

  1. Needs Assessment: You must justify why you need the addresses (current usage, growth plans)
  2. Allocation: For LIRs getting large blocks to redistribute
  3. Assignment: For end users who will use the addresses themselves

With IPv4 exhaustion (ARIN ran out of available IPv4 in 2015, RIPE in 2019), getting new IPv4 space is difficult. Most regions now only allocate from waiting lists or recovered addresses. IPv6, despite having effectively infinite addresses, has seen slower adoption than anyone hoped.

This registry system is important because RPKI (coming later) builds on it. The RIRs are the root of trust: they're the authoritative source for "who owns what."

BGP: How Internet Routing Actually Works

BGP is the protocol that makes the Internet work as an interconnected network of autonomous systems rather than isolated islands. Every time you access a website, stream a video, or send a message, BGP determined the path your packets took.

The Trust Problem

Here's BGP's fundamental problem: it's based on trust. When an AS announces "I can reach 192.0.2.0/24," BGP believes it. There's no authentication, no verification, no cryptographic proof. If an AS lies (intentionally or accidentally), BGP propagates that lie across the Internet.

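This made sense in 1989, when the first version of BGP (RFC 1105) was designed. The Internet was small, everyone knew each other, and there was little incentive to lie. BGP-4, the version still in use, arrived in the mid-1990s (RFC 1771, since replaced by RFC 4271) with the same trust model. Today, with tens of thousands of autonomous systems, nation-state actors, criminals, and honest mistakes, this trust model is catastrophically broken. But we're stuck with it.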

The Protocol Basics

BGP is a path-vector protocol. Unlike distance-vector protocols (like RIP) that share distance to destinations, or link-state protocols (like OSPF) that share topology maps, BGP shares entire paths (sequences of ASNs).

BGP Sessions: BGP runs over TCP port 179. Two BGP routers (peers) establish a TCP connection and exchange routing information. BGP sessions are manually configured; you don't automatically peer with everyone.

eBGP vs iBGP: External BGP (eBGP) runs between different autonomous systems. Internal BGP (iBGP) runs within an AS to distribute external routes. They follow different rules (iBGP doesn't modify the AS path, for instance).

The BGP Update Message: When routes change, BGP sends UPDATE messages containing:

  • NLRI (Network Layer Reachability Information): The prefix being announced (192.0.2.0/24)
  • AS_PATH: The sequence of ASNs the route has traversed (65001 65002 65003)
  • NEXT_HOP: The IP address to send packets to for this prefix
  • Attributes: Additional metadata for route selection
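A toy model of these fields helps show how the AS_PATH grows as a route crosses AS boundaries. This is a sketch, not a wire-format parser: the `Update` class and both helper functions are invented for illustration, and real UPDATE messages carry many more attributes.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Update:
    nlri: str                  # prefix being announced
    as_path: Tuple[int, ...]   # most recently traversed AS first
    next_hop: str              # where to send packets for this prefix


def receive(update: Update, local_asn: int) -> Optional[Update]:
    """Loop prevention: discard a route whose AS_PATH already contains us."""
    return None if local_asn in update.as_path else update


def advertise_ebgp(update: Update, local_asn: int, local_addr: str) -> Update:
    """On eBGP export, prepend our ASN and set NEXT_HOP to our own address."""
    return replace(update, as_path=(local_asn, *update.as_path), next_hop=local_addr)


# AS65003 originates the prefix; AS65002 and then AS65001 pass it along.
origin = Update("192.0.2.0/24", (65003,), "198.51.100.3")
via_65002 = advertise_ebgp(origin, 65002, "198.51.100.2")
via_65001 = advertise_ebgp(via_65002, 65001, "198.51.100.1")
print(via_65001.as_path)
```

Note that nothing in `advertise_ebgp` checks whether the sender has any right to the prefix or whether the path is true. That gap is the trust problem described earlier.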

Route Selection: When BGP receives multiple paths to the same destination, it uses a complex decision process:

  1. Highest Weight (Cisco-specific, local only)
  2. Highest Local Preference (prefer certain paths)
  3. Locally Originated Routes
  4. Shortest AS_PATH (fewest ASN hops)
  5. Lowest Origin Type (IGP < EGP < Incomplete)
  6. Lowest MED (Multi-Exit Discriminator)
  7. eBGP over iBGP
  8. Lowest IGP cost to next hop
  9. Oldest route (for stability)
  10. Lowest Router ID

This selection process is deterministic but complex, and different implementations may vary slightly.
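One way to read the list above is as a sort key: compare candidates on the first criterion and fall through to the next only on a tie. The sketch below does exactly that. The `Path` fields and default values are assumptions for illustration, and it ignores real-world subtleties (MED is normally compared only between paths from the same neighboring AS, and the "oldest route" step usually applies only among eBGP paths).

```python
from dataclasses import dataclass
from typing import Tuple

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    as_path: Tuple[int, ...]
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    origin: str = "igp"
    med: int = 0
    is_ebgp: bool = True
    igp_cost: int = 0
    age_seconds: int = 0
    router_id: int = 0


def preference_key(p: Path):
    """Smaller tuple wins; 'highest wins' criteria are negated."""
    return (
        -p.weight,                 # 1. highest weight
        -p.local_pref,             # 2. highest local preference
        not p.locally_originated,  # 3. locally originated first
        len(p.as_path),            # 4. shortest AS_PATH
        ORIGIN_RANK[p.origin],     # 5. lowest origin type
        p.med,                     # 6. lowest MED
        not p.is_ebgp,             # 7. eBGP over iBGP
        p.igp_cost,                # 8. lowest IGP cost to next hop
        -p.age_seconds,            # 9. oldest route
        p.router_id,               # 10. lowest router ID
    )


def best_path(paths):
    return min(paths, key=preference_key)


long_path = Path(as_path=(65001, 65002, 65003))
short_path = Path(as_path=(65010, 65003))
preferred = Path(as_path=(65001, 65002, 65003), local_pref=200)

print(best_path([long_path, short_path]).as_path)
print(best_path([long_path, short_path, preferred]).local_pref)
```

The second call shows why local preference is the main policy knob: it outranks AS_PATH length, so the longer path wins once an operator says it should.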

Filtering and Policy

BGP's power comes from policy-based routing. Network operators can:

  • Accept or reject routes based on prefix, AS_PATH, or other attributes
  • Modify attributes to influence selection (prepend your ASN to make paths less attractive)
  • Aggregate routes to reduce table size
  • Set communities (tags) for policy signaling

This flexibility is essential for traffic engineering, but it also means misconfigurations can have global impact.

The Route Propagation Problem

Here's a key BGP characteristic: BGP only propagates the best path. If an AS has three paths to a destination, it only advertises its best one to peers. This means if that best path fails, there's a convergence delay while everyone recomputes and propagates new best paths.

This is efficient (smaller routing tables, less churn) but slow. BGP convergence can take minutes after a topology change, during which packets may be dropped or loop.

The 512k Day: When the Internet Almost Broke

In August 2014, the Internet had a near-miss that highlighted BGP's fragility: the global BGP routing table crossed 512,000 routes, breaking thousands of routers worldwide.

The problem wasn't BGP itself but the hardware implementing it. Many routers used TCAMs (Ternary Content Addressable Memory) for fast routing lookups. TCAMs are expensive and limited in size. Cisco 6500 series routers, extremely common in ISP networks, set aside 512k TCAM entries for IPv4 routes in their default configuration.

When the routing table exceeded this limit:

  • Routers ran out of TCAM space
  • Routes overflowed into slower memory
  • Performance degraded massively or routers crashed
  • Parts of the Internet became unreachable or severely slowed

The fix involved:

  • Upgrading router memory modules
  • Changing memory allocation between IPv4 and IPv6 (most routers had plenty of unused IPv6 TCAM space)
  • More aggressive route aggregation and filtering

The incident was dubbed "512k day." (The similarly named "768k day" was a separate scare in 2019, when the table approached 768,000 routes, the next common hardware limit.) It demonstrated that the Internet's growth could be limited not by protocol design but by hardware economics. Today's routing table has over 950,000 IPv4 routes, and we're approaching similar limits on newer hardware generations.

The deeper problem: BGP has no mechanism to control routing table growth. Any AS can announce any prefix length (subject to filtering), and the global table just grows. Efforts to encourage aggregation are voluntary and often conflict with traffic engineering needs.

BGP Hijacks: A Greatest Hits Collection

BGP's trust-based model has enabled countless hijacks, where an AS announces prefixes it doesn't own. Sometimes these are accidents, sometimes malicious. Here are the highlights:

The AS7007 Incident (1997): The Original Internet Meltdown

In April 1997, AS7007 (a small ISP called MAI Network Services, later bought by Sprint) accidentally announced thousands of specific routes to the entire Internet, including routes for networks it had no connection to.

A configuration error caused the AS to originate:

  • More specific prefixes than the legitimate owners advertised
  • Routes through AS7007 for major parts of the Internet

Routers forward on the longest matching prefix, so a more specific route always wins (192.0.2.0/25 beats 192.0.2.0/24) no matter what BGP's path selection would otherwise prefer. AS7007's announcements were therefore preferred globally. Traffic destined for huge portions of the Internet was redirected to AS7007, which couldn't handle it. Their small network was overwhelmed, and much of the Internet became unreachable.
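The "more specific wins" rule is easy to demonstrate with Python's standard `ipaddress` module. This is a minimal sketch of longest-prefix matching over a two-entry table. Real routers use tries or TCAM rather than a linear scan, and the table contents here are illustrative.

```python
import ipaddress


def lookup(table, destination):
    """Return the (prefix, label) entry with the longest prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, label) for net, label in table if addr in net]
    return max(matches, key=lambda entry: entry[0].prefixlen, default=None)


table = [
    (ipaddress.ip_network("192.0.2.0/24"), "legitimate origin"),
    (ipaddress.ip_network("192.0.2.0/25"), "more-specific announcer"),
]

print(lookup(table, "192.0.2.10"))   # covered by both; the /25 wins
print(lookup(table, "192.0.2.200"))  # only the /24 covers this address
```

The AS_PATH never enters into it. The two prefixes are different routes as far as BGP is concerned, so the decision process never gets a chance to compare them.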

The incident lasted several hours while Sprint (AS7007's upstream) identified and fixed the problem. It was the first major demonstration of BGP's fragility and introduced the term "BGP hijack" to the networking vocabulary.

Lessons learned: None, apparently. Similar incidents have happened dozens of times since.

Pakistan Telecom vs. YouTube (2008): Censorship Gone Global

In February 2008, Pakistan's government ordered ISPs to block YouTube nationwide. YouTube announced its address space as 208.65.152.0/22. Pakistan Telecom (AS17557) implemented the block by announcing a more specific route inside that space, 208.65.153.0/24, pointing to a null route.

So far, so good for censorship: this would work within Pakistan. But Pakistan Telecom accidentally (or incompetently) announced this route to their upstream provider, PCCW (AS3491), which accepted it and propagated it globally.

For about two hours, YouTube was unreachable worldwide because BGP preferred Pakistan's more-specific route, which went to a blackhole. YouTube was down globally because of one country's censorship attempt.

YouTube's response was to announce even more specific routes (208.65.153.0/25 and 208.65.153.128/25), which BGP preferred over Pakistan's /24. This is the "longest-prefix match wins" principle being weaponized for defense.
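The defensive move amounts to splitting the hijacked prefix in half. Here is a small sketch with the standard `ipaddress` module; the helper name is made up for illustration.

```python
import ipaddress


def more_specifics(prefix: str):
    """Split a prefix into the two halves that together cover the same addresses."""
    return list(ipaddress.ip_network(prefix).subnets(prefixlen_diff=1))


hijacked = ipaddress.ip_network("208.65.153.0/24")
defensive = more_specifics("208.65.153.0/24")
print([str(net) for net in defensive])

# Together the halves cover every address in the hijacked /24.
assert sum(net.num_addresses for net in defensive) == hijacked.num_addresses
```

One caveat: many networks filter out announcements longer than /24, so /25s do not reach everyone. That limits how far this defense can go, and it does not work at all if the hijacker has already announced the longest prefix that others will accept.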

The incident demonstrated:

  • How easily BGP mistakes can go global
  • How censorship infrastructure can affect others
  • That the only immediate defense against a BGP hijack was to announce more specifics
  • The lack of authentication means anyone can hijack anyone

China Telecom's Wandering Routes (2010, 2014, 2017)

China Telecom has been involved in multiple incidents where they announced thousands of foreign prefixes, routing significant global traffic through China.

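In April 2010, a China Telecom subsidiary network (AS23724) originated tens of thousands of prefixes belonging to others (contemporary reports range from about 37,000 to 50,000), and China Telecom's backbone (AS4134) propagated them for about 18 minutes. Traffic for major sites, US military networks, and ISPs worldwide was redirected through China before being forwarded to legitimate destinations.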

Was it a hijack? A mistake? Intelligence gathering? The traffic did reach its destination (eventually), so it could have been route optimization gone wrong. Or it could have been intentional traffic interception. We'll never know because BGP provides no audit trail.

Similar incidents in 2014 and 2017 showed the pattern continued. Each time, the explanation was "configuration error," but the pattern of repeatedly misconfiguring in ways that route global traffic through your country seems suspect.

The geopolitical implications are serious: BGP's lack of authentication means nation-states can redirect traffic for surveillance or disruption, and proving intent is nearly impossible.

Cloudflare/Verizon Leak (June 2019): When Big Networks Mess Up

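On June 24, 2019, a small ISP in Pennsylvania, DQE Communications (AS33154), leaked about 20,000 routes, many of them more-specifics generated by a BGP optimizer, to its customer Allegheny Technologies (AS396531). Allegheny passed them on to its other upstream provider, Verizon (AS701). Verizon accepted these routes and propagated them globally.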

This made Verizon claim it had better paths to prefixes actually owned by Cloudflare, Amazon, Facebook, and many others. Traffic was redirected through Verizon, where it hit capacity limits and was dropped. Major parts of the Internet were unreachable for about two hours.

The problem wasn't the original leak (mistakes happen); it was that Verizon, a Tier 1 ISP, accepted and propagated obviously bogus routes. Verizon should have filtered these routes based on:

  • The size of their customer (a small customer network shouldn't be announcing routes for Cloudflare)
  • Resource Public Key Infrastructure (RPKI) validation (more on this shortly)
  • Basic sanity checking

Instead, Verizon's lack of filtering turned a small mistake into a global outage. This incident particularly infuriated the networking community because RPKI could have prevented it, and Verizon was ignoring available security mechanisms.

Hurricane Electric's Fat-Finger (2020)

In July 2020, Hurricane Electric (AS6939), a major transit provider, accidentally announced about 20,000 routes they shouldn't have. The routes were accepted by peers and propagated globally, causing widespread outages for about an hour.

The interesting part: many of these routes had valid RPKI Route Origin Authorizations (ROAs) pointing to the legitimate owners, yet networks still accepted Hurricane Electric's invalid announcements. This showed that even with security infrastructure in place, many networks weren't using it.

The Pattern: Trust, Accidents, and No Accountability

Every BGP incident follows the same pattern:

  1. Someone announces routes they shouldn't (accident or malice)
  2. Peers accept these routes without verification
  3. Routes propagate globally within minutes
  4. Traffic is disrupted or redirected
  5. Hours later, someone notices and fixes it
  6. Retrospectives blame "misconfiguration"
  7. Nothing fundamentally changes

BGP has arguably caused more Internet outages than any other single protocol, yet we keep using it because there's no alternative.

The Old Ways: Prefix Lists and Hope

Before RPKI, the only BGP security was manual filtering:

Prefix Filters: Operators maintained lists of which prefixes their peers should announce. If AS65001 is your customer and owns 192.0.2.0/24, you configure a filter allowing only that prefix (and perhaps more specifics). Anything else gets dropped.

This works but:

  • It's entirely manual (someone must update filters when customers get new space)
  • It doesn't scale (large ISPs have thousands of customers)
  • It only protects your immediate peers (downstream problems still propagate)
  • It's error-prone (misconfigurations are common)
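In code, a per-customer prefix filter is little more than a lookup table. The sketch below assumes a hypothetical `CUSTOMER_FILTERS` mapping that someone has to maintain by hand, which is exactly the scaling problem listed above. Each entry pairs an allowed prefix with the longest prefix length accepted inside it.

```python
import ipaddress

# Hypothetical hand-maintained allow list: peer ASN -> [(prefix, max length)]
CUSTOMER_FILTERS = {
    65001: [(ipaddress.ip_network("192.0.2.0/24"), 24)],
}


def accept(peer_asn: int, announced: str) -> bool:
    """Accept only prefixes inside the peer's registered space, up to max length."""
    net = ipaddress.ip_network(announced)
    for allowed, max_len in CUSTOMER_FILTERS.get(peer_asn, []):
        if (net.version == allowed.version
                and net.subnet_of(allowed)
                and net.prefixlen <= max_len):
            return True
    return False  # default deny: anything not listed is dropped


print(accept(65001, "192.0.2.0/24"))     # registered prefix
print(accept(65001, "192.0.2.0/25"))     # longer than the allowed length
print(accept(65001, "198.51.100.0/24"))  # not this customer's space
```

Raising the second tuple element to 25 or 26 is how you would allow "perhaps more specifics" for a customer who needs them for traffic engineering.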

AS_PATH Filters: Filtering based on AS path patterns. For instance, you might reject routes with your own ASN in the path (prevents loops) or very long AS paths (likely bogus).
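Both checks fit in a few lines. In this sketch the local ASN and the length threshold are arbitrary illustrative values, not recommendations.

```python
MY_ASN = 64500         # hypothetical local ASN (from the documentation range)
MAX_PATH_LENGTH = 50   # arbitrary illustrative threshold


def path_acceptable(as_path, my_asn=MY_ASN, max_len=MAX_PATH_LENGTH) -> bool:
    """Reject looping paths and implausibly long ones."""
    if my_asn in as_path:
        return False  # our own ASN in the path means the route has looped back
    if len(as_path) > max_len:
        return False  # very long paths are usually misconfiguration or abuse
    return True


print(path_acceptable((65001, 65002, 65003)))
print(path_acceptable((65001, 64500, 65003)))
print(path_acceptable((65001,) * 60))
```

Neither check says anything about whether the path is true, only whether it is plausible.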

Route Registries (IRR): Organizations like RADB maintain databases of routing policy. Operators register their prefixes and routing policies, and others can query these to build filters.

IRR had severe problems:

  • No strong authentication (anyone could register almost anything)
  • Often out of date (no requirement to maintain records)
  • Inconsistent between registries
  • Many networks didn't use it

Bogon Lists: Lists of IP ranges that should never appear in BGP (RFC 1918 private space, unallocated space, etc.). These helped catch obvious mistakes but not subtle hijacks.

All these methods shared a fatal flaw: they were optional and inconsistently applied. A chain is only as strong as its weakest link, and BGP has thousands of weak links.

RPKI: Cryptographically Proving Ownership

Resource Public Key Infrastructure (RPKI), developed in the IETF from the mid-2000s, standardized in 2012 (RFC 6480 and its companion RFCs), and gaining adoption through the 2010s, provides cryptographic proof of who holds IP address and ASN resources.

How RPKI Works

RPKI is built on the existing RIR hierarchy:

Trust Anchors: Each RIR operates a trust anchor (a root certificate). These are distributed to networks that want to validate BGP routes.

Resource Certificates: The RIR issues certificates to organizations, binding IP prefixes and ASNs to public keys. This certificate chain mirrors the resource allocation hierarchy.

Route Origin Authorizations (ROAs): The resource holder creates ROAs specifying which ASNs are authorized to originate which prefixes. For example, a ROA might say "AS65001 is authorized to originate 192.0.2.0/24 and any more specific prefix up to /26."

Validation: When a BGP router receives a route announcement, it checks:

  1. Does a valid ROA exist for this prefix?
  2. Does the origin ASN match the ROA?
  3. Is the announced prefix length within the ROA's maximum length?

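If validation passes, the route is marked "Valid." If a ROA covers the prefix but the origin ASN or prefix length doesn't match, it's "Invalid." If no ROA covers the prefix at all, it's "Unknown" (RFC 6811 calls this state "NotFound").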
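The three checks above can be sketched in a few lines, following the origin validation procedure in RFC 6811. The `ROA` tuple and function name are illustrative, and the ROAs are assumed to have already passed cryptographic validation. The third state is returned as "not-found", which is what this article calls Unknown.

```python
import ipaddress
from typing import NamedTuple


class ROA(NamedTuple):
    prefix: str
    max_length: int
    asn: int


def validate_origin(prefix: str, origin_asn: int, roas) -> str:
    """Return 'valid', 'invalid', or 'not-found' for an announcement."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue              # this ROA says nothing about the route
        covered = True            # at least one ROA covers the prefix
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"        # any single matching ROA is enough
    return "invalid" if covered else "not-found"


# The example ROA from the text: AS65001 may originate 192.0.2.0/24 up to /26.
roas = [ROA("192.0.2.0/24", 26, 65001)]

print(validate_origin("192.0.2.0/26", 65001, roas))     # within max length
print(validate_origin("192.0.2.0/27", 65001, roas))     # too specific
print(validate_origin("192.0.2.0/24", 65002, roas))     # wrong origin
print(validate_origin("198.51.100.0/24", 65001, roas))  # no covering ROA
```

The asymmetry matters. A route becomes Invalid only when some ROA covers it and none matches, so publishing a ROA is what turns a hijack of your prefix from Unknown into Invalid.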

Relying Party Software: Routers don't validate RPKI directly. Instead, dedicated servers (validators) download certificates and ROAs from the RPKI repositories, validate them cryptographically, and feed routers the resulting list of validated prefix-to-origin pairs, typically over the RPKI-to-Router (RTR) protocol. This separation keeps the cryptography off the router and improves performance.

The RPKI Decision

With RPKI validation, networks can make policy decisions:

  • Drop Invalid: Reject routes marked Invalid (strong security, potential connectivity issues)
  • Prefer Valid: Prefer Valid routes over Unknown or Invalid in selection
  • Monitor Only: Log Invalid routes but accept them (learning mode)

The networking community is gradually moving toward "Drop Invalid" as RPKI adoption increases.
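The three policies differ only in what they do with each validation state. Here is a sketch with made-up mode names and arbitrary local-preference adjustments. On real routers this is expressed in vendor-specific route-policy configuration.

```python
import logging

log = logging.getLogger("rov-policy")

# Hypothetical local-preference adjustments for "prefer valid" mode.
PREFERENCE_BONUS = {"valid": 10, "not-found": 0, "invalid": -10}


def apply_policy(state: str, mode: str, base_local_pref: int = 100):
    """Return the local preference to install, or None to reject the route."""
    if state not in PREFERENCE_BONUS:
        raise ValueError(f"unknown validation state: {state}")
    if mode == "drop-invalid":
        return None if state == "invalid" else base_local_pref
    if mode == "prefer-valid":
        return base_local_pref + PREFERENCE_BONUS[state]
    if mode == "monitor-only":
        if state == "invalid":
            log.warning("accepting a route that drop-invalid would reject")
        return base_local_pref
    raise ValueError(f"unknown mode: {mode}")


print(apply_policy("invalid", "drop-invalid"))
print(apply_policy("valid", "prefer-valid"))
print(apply_policy("invalid", "monitor-only"))
```

"Prefer valid" is weaker than it looks. A hijacker's more-specific prefix is a different route from the legitimate one, so it is never compared against the Valid route, and lowering its preference does not stop it from attracting traffic.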

RPKI's Limitations

RPKI isn't perfect:

Only Validates Origin: RPKI only verifies that the originating AS is authorized to announce the prefix. It doesn't validate the AS path or detect route leaks (where an AS announces routes from a peer to another peer, violating policy).

Deployment Lag: As of 2025, RPKI adoption is improving but incomplete. Many networks still don't create ROAs, and many don't validate BGP announcements against RPKI.

Operational Complexity: Creating ROAs, managing certificates, and operating validators adds complexity. Mistakes can make your own prefixes Invalid.

Trust in RIRs: RPKI requires trusting the RIR hierarchy. If an RIR is compromised or malicious, they could issue fraudulent certificates.

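No Protection Against Leaks: A ROA says nothing about where a route may be re-advertised. If a customer of two providers re-announces one provider's routes to the other, the origin AS in those routes is still the legitimate one, so origin validation marks them Valid. RPKI can't detect this.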

Despite these limitations, RPKI is a massive improvement over "trust everyone." It's probably prevented thousands of hijacks we never heard about because attackers couldn't get valid ROAs.

BGPSec: The Path Validation We Need But Can't Deploy

RPKI solves origin validation, but what about path validation? That's where BGPSec comes in, and where BGP security efforts largely stall out.

BGPSec (RFC 8205, 2017) provides cryptographic validation of the entire AS path:

Path Signatures: Each AS along the path signs the route announcement, including the next-hop ASN. The signature chain proves that AS1 sent it to AS2, AS2 sent it to AS3, and so on.

Tamper-Proof Paths: With BGPSec, you can't fake an AS path or inject your AS into a path. This prevents sophisticated hijacks where attackers claim to have shorter paths or transit through specific ASes.

Resource Certificates: BGPSec uses the same RPKI infrastructure, so it builds on existing deployment.
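The chaining idea can be shown with a toy model. Loud caveat: the sketch below uses HMAC with shared secrets as a stand-in for BGPSec's public-key signatures, so it illustrates only how each hop's signature covers the target AS and the previous signature. The key table, message layout, and function names are all invented and do not follow the RFC 8205 wire format.

```python
import hashlib
import hmac

# Toy per-AS secrets. Real BGPSec uses router key pairs certified in RPKI.
KEYS = {65001: b"key-65001", 65002: b"key-65002", 65003: b"key-65003"}


def sign_hop(prefix, signer, target, prev_sig):
    """Sign (prefix, signer, target AS, previous signature)."""
    msg = f"{prefix}|{signer}|{target}|".encode() + prev_sig
    return hmac.new(KEYS[signer], msg, hashlib.sha256).digest()


def build_path(prefix, hops):
    """hops: ASNs in propagation order, origin first, receiver last."""
    segments, prev = [], b""
    for signer, target in zip(hops, hops[1:]):
        prev = sign_hop(prefix, signer, target, prev)
        segments.append((signer, target, prev))
    return segments


def verify_path(prefix, segments):
    prev, expected_signer = b"", None
    for signer, target, sig in segments:
        if expected_signer is not None and signer != expected_signer:
            return False  # chain broken: this hop was not the previous target
        if not hmac.compare_digest(sign_hop(prefix, signer, target, prev), sig):
            return False  # signature does not match the claimed hop
        prev, expected_signer = sig, target
    return True


segments = build_path("192.0.2.0/24", [65003, 65002, 65001, 64500])
print(verify_path("192.0.2.0/24", segments))                     # intact chain
print(verify_path("192.0.2.0/24", [segments[0], segments[2]]))   # hop removed
print(verify_path("198.51.100.0/24", segments))                  # prefix swapped
```

Because every signature covers the one before it, removing a hop, shortening the path, or swapping the prefix invalidates the chain. The cost is one verification per hop, per route, per update.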

Why BGPSec Has Failed

Despite being standardized for years, BGPSec deployment is essentially zero. The problems:

Performance: Each BGP update requires cryptographic verification of potentially dozens of signatures. At Internet scale, this is computationally expensive. Modern routers struggle to handle full BGP table churn while validating every signature.

Incremental Deployment: BGPSec only provides security if the entire path uses it. If even one AS in the path doesn't support BGPSec, the security guarantees break down. This "all or nothing" property makes gradual deployment nearly impossible.

Memory Requirements: Storing signatures for hundreds of thousands of routes requires significant memory. Many routers can't handle it without expensive upgrades.

Little Clear Benefit: RPKI origin validation catches most hijacks. The incremental security benefit of path validation doesn't justify the cost for most operators.

Network Policies: BGP's flexibility in manipulating paths (prepending, communities, etc.) can conflict with BGPSec's requirement for accurate path signatures.

BGPSec is technically elegant and would significantly improve BGP security. But it's a perfect example of letting perfect be the enemy of good. The networking community has largely concluded that RPKI (with maybe some additional leak prevention) is good enough, and BGPSec's deployment barriers are insurmountable.

Some researchers are exploring "BGPSec-lite" variants with reduced cryptographic overhead, but even these face deployment challenges.

ASPA and Leak Prevention: The Next Frontier

Since BGPSec isn't happening, the focus has shifted to preventing route leaks:

ASPA (Autonomous System Provider Authorization): A new RPKI object type in which an AS lists its upstream providers. It lets you say "AS65001 is my provider," so a validator can spot paths that climb back up to a provider after already coming down from one, which is the signature of a leak.

ASPA provides:

  • Detection of route leaks (AS announcing routes from one provider to another)
  • Validation that paths respect business relationships
  • Better than nothing, not as good as BGPSec

Early deployment is showing promise, and it's simpler than BGPSec while addressing the route leak problem that RPKI origin validation misses.
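The core intuition is that a legitimate path is "valley-free": it climbs from customer to provider, crosses at most one lateral link, then descends from provider to customer. The sketch below checks that property against a table of declared providers. It is a simplification for illustration, not the IETF verification procedure, which also has to handle ASes that have published no ASPA and can return an "unknown" result. The `providers` table and ASNs are made up.

```python
def is_valley_free(path, providers) -> bool:
    """path: ASNs in propagation order (origin first).
    providers: maps each AS to the set of ASes it declares as its providers."""
    phase = "up"
    for a, b in zip(path, path[1:]):
        if b in providers.get(a, set()):
            hop = "up"        # customer hands the route to its provider
        elif a in providers.get(b, set()):
            hop = "down"      # provider hands the route to its customer
        else:
            hop = "lateral"   # no declared relationship (treated as a peer link)
        if hop in ("up", "lateral") and phase != "up":
            return False      # climbing or crossing after the peak: a leak
        if hop == "lateral":
            phase = "peak"    # at most one lateral hop is allowed
        elif hop == "down":
            phase = "down"
    return True


# AS65010 buys transit from both AS65001 and AS65002; AS65020 from AS65001.
providers = {65010: {65001, 65002}, 65020: {65001}}

print(is_valley_free([65020, 65001, 65010], providers))         # normal delivery
print(is_valley_free([65020, 65001, 65010, 65002], providers))  # 65010 leaks upward
```

The second path is the leak pattern from the 2019 Verizon incident: a customer takes routes learned from one provider and hands them to another.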

The Uncomfortable Truth About BGP

Let's be honest: BGP is terrible. It's a routing protocol designed for a few dozen academic networks that now handles billions of users, enables global surveillance, and regularly breaks the Internet. Its security model is "trust everyone," which would be laughable if it weren't so critical.

We've spent decades trying to fix it:

  • RPKI helps but doesn't solve everything
  • BGPSec is too expensive to deploy
  • ASPA might help with leaks
  • Manual filtering works but is inconsistent

None of these are complete solutions. We're applying band-aids to a protocol that needs major surgery.

Why don't we replace it?

Because you can't. BGP is the foundation of Internet routing. Replacing it requires:

  • Every router on the Internet to upgrade
  • Protocols to interoperate during transition
  • Years of testing and gradual deployment
  • Cooperation from thousands of autonomous organizations
  • No disruption to the global Internet during transition

It's not technically impossible, just practically impossible. The Internet is too big, too critical, and too distributed to replace a core protocol. We're locked in.

The path forward is incremental:

  • Deploy RPKI and drop Invalid routes (happening slowly)
  • Implement better filtering practices (preached constantly, practiced inconsistently)
  • Deploy ASPA for leak prevention (early days)
  • Improve monitoring and rapid response to incidents
  • Maybe, someday, get networks to care about security over convenience

BGP will continue causing outages, enabling hijacks, and making network engineers curse. We'll keep patching it, because the alternative, replacing it, is even worse. This is the Internet we've built, held together with BGP, duct tape, and hope.

Living With BGP's Flaws

BGP is fundamentally a trust-based protocol operating in a zero-trust world. Every hijack, every leak, every outage proves this. Yet the Internet keeps running, mostly, because:

  1. Most networks are honest: Accidents are more common than attacks
  2. Economic incentives: Breaking the Internet is bad for business
  3. Monitoring improves: Detection of anomalies is faster
  4. Security adoption grows: RPKI deployment is accelerating
  5. Best practices spread: After enough disasters, people learn

BGP's security model is "eventual consistency enforced by embarrassment." Someone hijacks routes, the networking community notices, calls them out on mailing lists and Twitter, and they fix it (usually). It's not great, but it's what we have.

The next major BGP disaster is inevitable. It might be an accident (fat-fingered configuration), an attack (nation-state surveillance or sabotage), or a vulnerability we haven't discovered yet. When it happens, we'll patch BGP a bit more, improve monitoring, and move on.

Because that's all we can do. BGP is terrible, BGP is essential, and BGP isn't going anywhere. Welcome to the Internet.