Why You're Having Problems Reaching DSLReports (and Other Sites)
by Karl Bode 02:53PM Wednesday Aug 13 2014 Tipped by sortofageek
If you've been having problems accessing DSLReports.com and a flood of other websites this week, you're not alone. The problem, as it turns out, was experienced by tier-one and last-mile ISPs alike across much of North America. The cause? Border Gateway Protocol (BGP) routing tables this week finally became too large for some top-level Internet routers to handle, leaving those routers unable to manage the Internet's traffic.

Our hosting provider LiquidWeb was similarly impacted, and discussed the problems on Twitter (most users should no longer be having problems). A blog post over at Renesys offers some excellent detail into what occurred, calling this "more of an annoyance than a real Internet-wide threat":
quote:
There was minor consternation in Internet engineering circles today, as the number of IPv4 networks worldwide briefly touched another magic “power of 2” size limit. As it turns out, 512K (524,288 to be exact, or 2-to-the-19th power) is the maximum number of routes supported by the default TCAM configuration on certain aging hardware platforms.
Some additional excellent explanations can be found at BGPMon, which notes that Verizon quite unintentionally caused the ripple effect across un-upgraded routers:
quote:
Looking at our data we quickly see that the new prefixes being announced at that time were almost all originated by the Verizon Autonomous systems 701 and 705. All of the new routing entries appear to be more specific announcements for their larger aggregate blocks. For example BGPmon detected 170 more specific /24 routes for the larger 72.69.0.0/16 block.

So whatever happened internally at Verizon caused aggregation for these prefixes to fail which resulted in the introduction of thousands of new /24 routes into the global routing table. This caused the routing table to temporarily reach 515,000 prefixes and that caused issues for older Cisco routers.
The problem isn't insurmountable: older Cisco gear for example can have the 512,000 route limitation increased. It's worth noting that some engineers saw this problem coming back in May.
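
To make the numbers in the quotes above concrete, here is a minimal Python sketch using only the standard-library `ipaddress` module. The 512K ceiling and the 72.69.0.0/16 block come from the Renesys and BGPMon quotes; the names `TCAM_LIMIT` and `deaggregate` are illustrative assumptions and do not correspond to anything in a router's software.

```python
import ipaddress

# 512K as quoted by Renesys: 2 to the 19th power.
TCAM_LIMIT = 2 ** 19


def deaggregate(block: str, new_prefix: int = 24):
    """Return every more-specific subnet of `block` at length `new_prefix`."""
    network = ipaddress.ip_network(block)
    return list(network.subnets(new_prefix=new_prefix))


# The aggregate block that BGPMon cites as an example.
more_specifics = deaggregate("72.69.0.0/16")

print(TCAM_LIMIT)            # 524288
print(len(more_specifics))   # 256 possible /24s inside one /16
print(more_specifics[0])     # 72.69.0.0/24
```

A single /16 can show up as anywhere from one route to 256 separate /24 routes, depending on how it is announced. BGPMon reports seeing 170 of those /24s for this one block, which is how a few failed aggregates can add thousands of entries to the table.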


telcodad
Premium
join:2011-09-16
Lincroft, NJ
kudos:15

1 recommendation

ZDNet article on this problem

Internet hiccups today? You're not alone. Here's why
Summary: It's not just you. Many Internet providers have been having trouble as they run into long expected (but not adequately prepared for) routing table problems.
By Steven J. Vaughan-Nichols, ZDNet - August 12, 2014
»www.zdnet.com/internet-hiccups-t···0032566/

stvnbrs
Premium
join:2009-03-17
Cary, NC
kudos:5

1 recommendation

Re: ZDNet article on this problem

If only they had enough money to upgrade their aging platforms... sounds like it is time for rate hikes
--
No trees were harmed by this posting, but a large number of electrons were terribly inconvenienced.

fiosultimate

join:2014-06-09
San Antonio, TX

Y2K14

Apocalypse. !!!!!! It has begun

telcodad
Premium
join:2011-09-16
Lincroft, NJ
kudos:15

Re: Y2K14

said by fiosultimate:

Apocalypse. !!!!!!

Actually, it's the i-Apocalypse.

Coelispex

join:2013-06-03
Wilmette, IL

Re: Y2K14

Dad jokes :P

OhHell

@66.249.83.x
Well that explains why my Comcast-supplied IP address failed to connect to dslreports earlier this week. I had to go thru a proxy to connect. DNS resolved dslreports correctly but the return path just wouldn't work.

telcodad
Premium
join:2011-09-16
Lincroft, NJ
kudos:15

1 edit

2 recommendations

The "Echoes of Y2K"

In today's Wall Street Journal:

Echoes of Y2K: Engineers Buzz That Internet Is Outgrowing Its Gear
Routers That Send Data Online Could Become Overloaded as Number of Internet Routes Hits '512K'
By Drew FitzGerald, The Wall Street Journal - August 13, 2014
»online.wsj.com/articles/y2k-meet···07937617

EDIT: aka "512KDay":

512KDay: How the internet BROKE (Next time, big biz, listen to your network admin)
We failed the internet's management challenge
By Trevor Pott, The Register - August 13, 2014
»www.theregister.co.uk/2014/08/13···to_play/

y2k14

@24.243.21.x

apocalypse

It has begun.......

DaveDude
No Fear

join:1999-09-01
New Jersey
kudos:1

1 recommendation

Re: apocalypse

i am sure Nostradamus predicted it.

battleop

join:2005-09-28
00000

1 recommendation

What a huge pain in the ass....

I either adjusted the TCAM or dropped back from full tables to default+connected on all the gear I'm responsible for when this issue first showed up on NANOG. Even with all of our stuff being prepared for this I'm still hearing tons of people bitch us out because other people's shit is broken far upstream from us.
--
I do not, have not, and will not work for AT&T/Comcast/Verizon/Charter or similar sized company.
BlueC

join:2009-11-26
Minneapolis, MN
kudos:1

Re: What a huge pain in the ass....

Right. This isn't an issue for us, but I recall seeing others discussing this months back. It was an inevitable issue and it seems a number of networks dropped the ball on being prepared.

battleop

join:2005-09-28
00000

Re: What a huge pain in the ass....

I was expecting some rogue nations to get together and split their aggregated announcements into /24s to run the size of the routing table up and disrupt traffic.
--
I do not, have not, and will not work for AT&T/Comcast/Verizon/Charter or similar sized company.

hello123454
Premium
join:2002-02-02
Wilmington, DE
kudos:1

2 recommendations

Proactive much?

So they saw this issue back in May? Why didn't they see this issue back in 2000? You can't make this stuff up.

woody7
Premium
join:2000-10-13
Torrance, CA

Re: Proactive much?


Rogue Wolf
Mourns the Loss of lilhurricane

join:2003-08-12
Troy, NY

3 recommendations

These days, "proactive" means "solving the problem while it's smoking but before it explodes".
--
I may have been born yesterday, but I've spent all afternoon downtown.

zpm

join:2009-03-23
Columbus, GA

1 recommendation

Typical

Good Job Verizon.......
HeadSpinning
MNSi Internet

join:2005-05-29
Windsor, ON
kudos:5

Re: Typical

said by zpm:

Good Job Verizon.......

Really, we were so close to the 512k limit that it would have been hit eventually - Verizon just tripped it earlier. Net effect would have been the same. Sooner or later, older Cisco routers would have barfed unless they'd been adjusted to handle the larger table.
--
MNSi Internet - »www.mnsi.net
rebus9

join:2002-03-26
Tampa Bay
Reviews:
·Verizon FiOS
·Bright House

Re: Typical

said by HeadSpinning:

said by zpm:

Good Job Verizon.......

Really, we were so close to the 512k limit that it would have been hit eventually - Verizon just tripped it earlier. Net effect would have been the same. Sooner or later, older Cisco routers would have barfed unless they'd been adjusted to handle the larger table.

IMO it was a good thing. It brought more attention to the problem most people don't know about, and that many in-the-know have tried to ignore. This is a very real problem.

Thankfully this was only a dress rehearsal. If this had been an actual emergency, you wouldn't be able to read this post.

Some day soon, it WILL be TCAM exhaustion from that many REAL routes, and not a temporary de-aggregation screwup.

jlivingood
Premium,VIP
join:2007-10-28
Philadelphia, PA
kudos:2
You can't blame Verizon -- it's more people that knew this was coming and didn't upgrade their gear.
--
JL
Comcast

PA23

join:2001-12-12
East Hanover, NJ

1 recommendation

Lack of maintenance

There is a work around available but the work around requires a restart. Unfortunately many places don't allow for preemptive maintenance just reactive. I remember working at a place that was running code that was years out of date, with known bugs that had since been fixed. These bugs would occasionally cause havoc in our network, but could I perform a software upgrade??? NO! The business could never agree upon a time to allow the upgrade.
--
It's the end of the world as we know it, and I feel fine

tubbynet
reminds me of the danse russe
Premium,MVM
join:2008-01-16
Chandler, AZ
kudos:1

Re: Lack of maintenance

said by PA23:

There is a work around available but the work around requires a restart. Unfortunately many places don't allow for preemptive maintenance just reactive.

the larger issue at hand is this *only* affects the providers/carriers/customers that are taking a full dfz (default-free zone) on their gear. generally -- as a carrier -- you have multiple edges/peering points where you can engineer traffic during slow periods in your network. using bgp it is entirely possible to divert traffic away from the edge that you're restarting using policy.

as a hosting provider/customer -- the *only* reason you would take a dfz is if you're multihoming your internet edge -- and even then it's only if you have multiple carriers at the edge (i.e. if you dual-home to the same provider using diverse paths -- they'll send you local a/s plus a 0/0).

this has been approaching for a while -- and i've seen an uptick in traffic on [c-nsp] about prefix allocation in tcam, etc for pfc3xl-based boxen. the sad fact is that the reason this happened was that people were either (a) not aware of the hardware limitations of the platform or (b) thought they'd be safe by running on the razor's edge. either way -- it's a self-inflicted problem -- especially because if you overflow the tcam on a sup720 -- you can watch the additional prefixes be switched in software -- and if the traffic loads are high enough -- you can see the box puke all over itself and roll around in it.

q.
--
"...if I in my north room dance naked, grotesquely before my mirror waving my shirt round my head and singing softly to myself..."

Cheeze_It

@73.181.84.x

The platform is running on a Sup720-3BXL that had this issue...certain of it

Certain aging Cisco 6500/7600 platforms with the Sup720-3BXL platforms that have been in production since the late 1990s....that could just be upgraded to a Sup 2T and none of this would have happened.

Oh wait...that would mean for them to spend money on infrastructure.

Can't have that...now can we...

tubbynet
reminds me of the danse russe
Premium,MVM
join:2008-01-16
Chandler, AZ
kudos:1

Re: The platform is running on a Sup720-3BXL that had this issue...certain of it

said by Cheeze_It :

Certain aging Cisco 6500/7600 platforms with the Sup720-3BXL platforms that have been in production since the late 1990s....that could just be upgraded to a Sup 2T and none of this would have happened.

except of course -- to try and prove your point -- you neglected to include the vs-720 and the rsp720 -- each of which was introduced in mid-2007 and suffers from the same issue. in fact -- the s2t was released mid-2011 and has potential drawbacks that could prevent this card from being adopted in each carrier's network.

that being said -- if i have an edge/core box that is handling services correctly and the infrastructure needs (i.e. correct number of 1/10g interfaces) are met by the s720 -- why would i need to upgrade a box that still has life left in it? upgrading for upgrading's sake is a poor way to spend your capex dollars.

q.
--
"...if I in my north room dance naked, grotesquely before my mirror waving my shirt round my head and singing softly to myself..."

NormanS
I gave her time to steal my mind away
Premium,MVM
join:2001-02-14
San Jose, CA
kudos:12
Reviews:
·SONIC.NET
·Pacific Bell - SBC

1 recommendation

said by Cheeze_It :

Oh wait...that would mean for them to spend money on infrastructure.

Can't have that...now can we...

Of course not! It is much more important to spend the money to buy off regulation!
--
Norman
~Oh Lord, why have you come
~To Konnyu, with the Lion and the Drum

novaflare
The Dragon Was Here
Premium
join:2002-01-24
Barberton, OH

Was wondering what was going on.

i used a web proxy to get here while it was screwed up

CosmicDebri
Still looking for intelligent life

join:2001-09-01
Port Saint Lucie, FL

I just waited

I just waited for the site to connect. I knew it would come back unless Justin didn't pay the hosting bill.....
--
Follow Your Bliss -- Joseph Campbell
I reject your Reality and substitute my own! -- Adam Savage, Mythbuster

firephoto
We the people
Premium
join:2003-03-18
Brewster, WA

2 recommendations

Profit vs infrastructure

Making money is more important than upgrading aging hardware that is so costly it's probably been paid for 100 times over its life so far. And these are literally the pillars of the internet. Imagine how old those ISP edge routers are, the ones whose costs the shills like to cry about all the time.

It's always the money.
--
Say no to those that ‘inadvertently make false representations’.
Cobra11M

join:2010-12-23
Mineral Wells, TX

1 recommendation

Re: Profit vs infrastructure

they would prob save a fortune just replacing them with newer equipment in the long run.. especially on electricity!
biochemistry
Premium
join:2003-05-09
92361

1 recommendation

CDN

I was about to ask Verizon to host the DSLReports CDN.

KrK
Heavy Artillery For The Little Guy
Premium
join:2000-01-17
Tulsa, OK

Verizon crashed the Interwebz

Seriously though, this was a problem that was known to be coming.

Still caught people with their pants down, however.
WhatNow
Premium
join:2009-05-06
Charlotte, NC

1 recommendation

Re: Verizon crashed the Interwebz

Normal for most American business. The bigger they get the less they do because there are more people in charge to make a decision.

cork1958
Cork
Premium
join:2000-02-26

1 recommendation

Re: Verizon crashed the Interwebz

said by WhatNow:

Normal for most American business. The bigger they get the less they do because there are more people in charge to make a decision.

Isn't that just amazing?!

I noticed that way back as a young kid even!
--
The Firefox alternative.
»www.mozilla.org/projects/seamonkey/
tmc8080

join:2004-04-24
Brooklyn, NY

ipv6

ipv6 was supposed to be fully integrated into (tier-1 to isp side/edge) dns routers by now..

so much for the Skynet theory..
AVonGauss
Premium
join:2007-11-01
Boynton Beach, FL

1 recommendation

Re: ipv6

IPv6 was not affected by this event per se, though that didn't help you reaching this site as it still hasn't been IPv6 enabled....

KA0OUV
Premium
join:2010-02-17
Jefferson City, MO

The Exaflood has begun

Well, it finally arrived.....

»Nemertes Still Pushing Exaflood Nonsense

»The Exaflood Myth Just Won't Die

Johna Till Johnson was right.....?

whfsdude
Premium
join:2003-04-05
Washington, DC
Reviews:
·Comcast

Only Going to Get Worse

This is only going to get worse as networks get smaller address blocks due to IPv4 exhaustion.

Solution: Freakin' add IPv6 to the site already and a large number of your user base won't come in over v4.
floydb1982

join:2004-08-25
Kent, WA

2 recommendations

Simply upgrade to IPv6

Upgrade to IPv6 and the problem is solved.
YDC

join:2007-11-13
Hewlett, NY

Re: Simply upgrade to IPv6

IPv6 is NOT ready to be deployed, even though it is currently running alongside the IPv4 networks now. Take a good look at how it works, then cringe! It will cause more headaches than a broken IPv4 network does now. The whole thing needs a revamp BEFORE it goes mainstream. That means now..
Madtown
Premium
join:2008-04-26
Madera, CA

Is this why my router went out

Is this why my router has been going out lately? My modem stays online, as I check the status page to make sure, but my router goes out.
InvalidError

join:2008-02-03
kudos:5

In other words...

Aggregate your netblocks and do not spread fragmented netblocks around.

The *RINs should be increasing incentives and pressure for everyone who owns multiple netblocks to aggregate as much as they can - the address space needs a big defrag.

Obviously, that only works as long as whoever owns those blocks advertises them in a few large lumps.
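
For anyone curious what "aggregate your netblocks" looks like in practice, here is a rough Python sketch using the standard-library `ipaddress.collapse_addresses` function. The 10.20.x.0/24 prefixes are made-up private-range examples, not real announcements, and the variable names are only illustrative.

```python
import ipaddress

# Hypothetical fragmented announcements: four contiguous /24s...
fragments = [ipaddress.ip_network(f"10.20.{i}.0/24") for i in range(4)]
# ...plus one /24 that is not adjacent to the others.
fragments.append(ipaddress.ip_network("10.20.8.0/24"))

# collapse_addresses merges adjacent and overlapping networks
# into the smallest equivalent set of prefixes.
aggregated = list(ipaddress.collapse_addresses(fragments))

print(len(fragments))                 # 5 routes before
print([str(n) for n in aggregated])   # ['10.20.0.0/22', '10.20.8.0/24']
```

The four contiguous /24s fold into one /22, so five table entries become two. The stray 10.20.8.0/24 cannot be merged, which is the point about fragmentation: only blocks that sit next to each other in the address space can be advertised as one lump.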