lilarry
Premium Member
join:2010-04-06

1 edit

lilarry

Premium Member

[Voip.ms] Voip.ms New York Servers Down

All New York Servers down once again - portal too! I've been begging them to get the heck out of Internap. Maybe this time?
taoman
Premium Member
join:2013-09-13
Seattle, WA

taoman

Premium Member

Re: Voip.ms New York Servers Down

said by lilarry:

All New York Servers down once again - portal too! I've been begging them to get the heck out of Internap. Maybe this time?

And I just ported in today. Portal is down for me but my SIP server (Seattle2) is still up.

crazyk4952
Premium Member
join:2002-02-04
united state
Ubiquiti EdgeRouter Lite
Ubiquiti UniFi AP-LR
Polycom VVX300

crazyk4952 to lilarry

Premium Member

to lilarry
From their twitter account:

New York Data Center issues, Website not accessible, we're working on it with Internap.

— VoIP.ms (@voipms) December 19, 2014
lilarry
Premium Member
join:2010-04-06

lilarry

Premium Member

Outbound calls on working servers are failing too - as are some inbound calls. This is a big one.
taoman
Premium Member
join:2013-09-13
Seattle, WA

taoman to lilarry

Premium Member

to lilarry
Yep. Inbound calls are now all busy and outbound calls are dead air........
GusHerb
join:2011-11-04
Chicago, IL

GusHerb to lilarry

Member

to lilarry
Well this sucks, I'm on the Chicago server and I can only dial toll free calls, but incoming is ok. And I just ported in...God dammit.
mts
join:2000-10-06
Lansing, MI

mts

Member

So now I'm curious... do all outbound calls route through New York on the backend somehow?

mackey
Premium Member
join:2007-08-20

mackey

Premium Member

said by mts:

So now I'm curious... do all outbound calls route through New York on the backend somehow?

No, I'm in Los Angeles and the latency on local calls isn't enough for it to be going to NY and back. My guess is something backend (DNS? CDR accounting?) tries to contact the NY servers.

/M
lilarry
Premium Member
join:2010-04-06

lilarry to mts

Premium Member

to mts
said by mts:

So now I'm curious... do all outbound calls route through New York on the backend somehow?

Just speculating, but I believe the main database and CDRs are at Internap, so even if calls don't route through New York, data entries still need to be made there.
lilarry

lilarry to taoman

Premium Member

to taoman
said by taoman:

And I just ported in today.

No worries. This looks like a bad one, but Voip.ms is usually pretty good.
lilarry

lilarry

Premium Member

From what I'm seeing on the Voip.ms Twitter feed, it looks like Voip.ms may be having some difficulty working with Internap tonight.

Aside from my angst as a powerless reseller with way too many angry customers screaming at me just now, one of the things that is really bugging me about this is that for weeks I've been repeatedly reporting to Voip.ms issues with the New York servers and the portal - via tickets and live chat. I even posted some portal issues here in the past couple of weeks. Pathping tests a couple of weeks ago showed 100% packet loss at Voxel (Internap) on the last hop before the portal server. I'm troubled that while support staff acknowledges my reports, I have no idea what they do with that info. And while I don't know whether the issues I've related to them have anything to do with tonight's fiasco, I do know that Internap genuinely sucks - and I know THEY know that Internap sucks. It is absolutely time for them to get the heck out of there. I wonder if or when they'll take action and move to someplace (anyplace) more reliable.
lilarry

lilarry

Premium Member

Re: [Voip.ms] Voip.ms New York Servers Down

Servers appear back up at 2330 eastern - but outbound calls are still taking 60 seconds or longer to go through. I think they're switching to a mirror site.
GusHerb
join:2011-11-04
Chicago, IL

GusHerb to lilarry

Member

to lilarry
Outbound calls are still not working here.
Mango
Use DMZ and you get a kick in the dick.
Premium Member
join:2008-12-25
www.toao.net

2 edits

1 recommendation

Mango

Premium Member

Thirty-three PoPs and they don't even have proper outbound failover?

EDIT: My post is no longer accurate; see below.
PX Eliezer1
Premium Member
join:2013-03-10
Zubrowka USA

PX Eliezer1

Premium Member

And when I hear something that sounds like "Enter Nap" of course I will go to sleep.

That's how Locutus defeated the Borg, with an "Enter Nap" command.
MartinM
VoIP.ms
Premium Member
join:2008-07-21

1 recommendation

MartinM to lilarry

Premium Member

to lilarry
Dad jokes aside,

We're aware of the issue, and many of us are working on that Internap mess.

There was one design flaw that caused some POPs to fail regardless. We're fixing that, and all POPs should be working properly in a few minutes regardless of the New York issues.
MartinM

MartinM to Mango

Premium Member

to Mango
said by Mango:

Thirty-three PoPs and they don't even have proper outbound failover?

Yes Mango. Strike us while we are down. Just kidding. We're up and running.
GusHerb
join:2011-11-04
Chicago, IL

GusHerb

Member

I see that, I can call out again! Do any of those fixes you guys are working on involve fixing the part where those of us with POPs that didn't fail still couldn't call out?
MartinM
VoIP.ms
Premium Member
join:2008-07-21

MartinM

Premium Member

said by GusHerb:

Do any of those fixes you guys are working on involve fixing the fact that those of us with POPs that didn't fail still couldn't call out?

Indeed, it exposed major flaws that will be fixed tomorrow. Individual geographical data centres should never be affected by an outage elsewhere. This will be addressed with a whip if necessary. Let's say some of us are really pissed, pardon my language.
Mango
Use DMZ and you get a kick in the dick.
Premium Member
join:2008-12-25
www.toao.net

Mango

Premium Member

said by MartinM:

This will be addressed with a whip if necessary.

I know you are not amused with the situation right now, but your above quote made me laugh!
lilarry
Premium Member
join:2010-04-06

lilarry

Premium Member

said by Mango:

I know you are not amused with the situation right now, but your above quote made me laugh!

I found myself chuckling too - and as anyone reading this thread can tell, I'm not necessarily in the greatest mood either.
VoIP2Go
join:2013-12-14

VoIP2Go to MartinM

Member

to MartinM
said by MartinM:

Indeed, it exposed major flaws that will be fixed tomorrow

As usual, you guys handled this flaw.....well, flawlessly. I'm glad it happened as I know your platform will be improved as a result.
PX Eliezer1
Premium Member
join:2013-03-10
Zubrowka USA

PX Eliezer1 to lilarry

Premium Member

to lilarry
At least this happened after the end of the East Coast workday, and before the weekend, and before Christmas.

-----

In a few hours everyone can sit down with a Labatt Blue, Anchor Steam Beer, or Montejo.
MartinM
VoIP.ms
Premium Member
join:2008-07-21

5 recommendations

MartinM

Premium Member

said by PX Eliezer1:

At least this happened after the end of the East Coast workday, and before the weekend, and before Christmas

Indeed. It wasn't a busy night, well, for us it was.

---

I've waited a bit to post this, to post in a calm manner once the storm is over.

Let's say that a series of unfortunate events led to this interruption of service.

- A core router in Internap's LGA6 data center in New York went down. This is not a piece of equipment we have control over.

- www2-mirror.voip.ms, an independent replica hosted in Chicago for the case where New York goes down, didn't take over immediately. The DNS took longer to update than expected. We'll be addressing that Monday in a meeting with the technical staff. This website should take over in a matter of minutes when the main site goes down. It's an exact live replica of our whole system. It was deployed years ago, and its hardware is regularly updated and kept up to date, to take over in events just like this. The website was eventually up and running at its mirror location. It has served well many times in the past, specifically during the Sandy storm and the few times when Internap took a nap. (Bad pun intended.)

- Regarding other geographical locations that experienced long call delays: Our programmers found a deprecated piece of code that was in place to increase security with our customer accounts. Without going into specifics, this was deprecated in favor of a completely independent "per-POP" system to ensure that each individual POP has no point of failure other than itself. Some servers still used a connection to our old system, located in New York, which was down, preventing outgoing calls.

We just spent the whole night with the programming team to ensure no traces of this code are left and that each POP is now fully independent. We'll continue refining and conducting emergency exercises on test POPs next week. We have many servers that we use to conduct emergency test procedures.

- We've moved traffic back to our main website in New York, but let's say we'll start moving all of our core infrastructure (websites, wiki, tickets, various databases) out of Internap in January. The datacenter is in Chicago and has had zero downtime in years.

As for the New York POPs, we're actively looking for a replacement for Internap. Their Voxel days, when they were flawless, are long gone.

On behalf of the whole team, I truly apologize. Internap's data center failure should have resulted in a simple, quick relocation of our website to our mirror site and POP redirection, and should never have had any kind of impact on other geo-locations. As always, we'll use this incident as free education for all of our staff, including management, and we'll conduct more emergency exercises to reduce events like this to, at most, minutes of downtime, not an hour.

Regards,
lilarry
Premium Member
join:2010-04-06

lilarry

Premium Member

Thank you Martin for taking the time to elaborate on this. It means a lot. I know you guys have had a long night. This is one of many reasons we route so much of our traffic through VoIP.ms
iamhere
join:2013-01-26
canada

iamhere to MartinM

Member

to MartinM
said by MartinM:

On behalf of the whole team, I truly apologize. Internap's data center failure should have resulted in a simple, quick relocation of our website to our mirror site and POP redirection, and should never have had any kind of impact on other geo-locations. As always, we'll use this incident as free education for all of our staff, including management, and we'll conduct more emergency exercises to reduce events like this to, at most, minutes of downtime, not an hour.

It's nice to see companies actually take responsibility when things go wrong. I know "stuff" happens to everybody but not everybody would explain the issue(s) in this kind of detail in a very public forum.

Kudos to you and your team!
Mango
Use DMZ and you get a kick in the dick.
Premium Member
join:2008-12-25
www.toao.net

Mango to MartinM

Premium Member

to MartinM
said by MartinM:

Their Voxel days when they were flawless are long gone.

It is so frustrating when suppliers used to be awesome. Thanks for the post.
PX Eliezer1
Premium Member
join:2013-03-10
Zubrowka USA

PX Eliezer1

Premium Member

said by Mango:

It is so frustrating when suppliers used to be awesome.

Internap has had issues.

Unrelated UPS Failures Cause Three NYC Outages for Internap

In an unfortunate series of unrelated equipment failures, Internap recently experienced three outages at its Manhattan data centers in one week’s time.

The May 16 outage at 111 8th Avenue we reported on earlier was followed by two outages of the hosting service provider’s data center at 75 Broad Street. All three were caused by component failures in uninterruptible power supply systems....

»www.datacenterknowledge. ··· nternap/

Maybe Internap needs some generators.

-----

Kudos indeed to Voip.MS' MartinM for the detailed explanations and analysis during a very hard day's night.
NefCanuck
join:2007-06-26
Mississauga, ON

NefCanuck to lilarry

Member

to lilarry
I think this serves as a reminder that, as flawlessly as we want everything in our lives to work, a lot of things can (and sometimes do) fail outside of anyone's control.

Kudos to Martin and the staff at voip.ms for being able to deal with this as quickly as you did.

NefCanuck
voxframe
join:2010-08-02

4 recommendations

voxframe to lilarry

Member

to lilarry
Kudos Martin and the team.

Martin, I know this is the wrong time to bring this up, and I feel it's akin to kicking someone while they are down, but I'm hoping this will be a bit of a wakeup call for the team to start working more on redundancy and failovers. DNS redirection is not an acceptable solution, as it takes a long time to propagate, and I've also seen tons of ATAs and PBX software that seem to completely ignore DNS TTL and such. Granted, that's not how it's supposed to be, but it still is.

I'm glad this nasty piece of code was located, so each server can operate independently should a large portion of the network go down. But there's gotta be something more done for failover in cases like this. I know I'm told many times when I bring this up that "voip.ms offers a robust network with many servers etc etc", essentially "We don't need failover since our network is so strong". It's gotta get looked into guys, and a little faster than the current priorities have it set at.

I'd really like to see automatic failover using tools such as DNS SRV, as well as maybe primary and backup servers for DIDs etc. It's not much use having DNS SRV if you still can't migrate your DID to a different server. Plus you lose voicemail etc in the dance.

I realize this is like asking to completely rebuild the infrastructure, but it's the "Achilles Heel" of the voip.ms network right now.
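
For anyone wondering what DNS SRV failover would look like from the client side, here is a minimal sketch, not anything VoIP.ms actually runs. It assumes the SRV records have already been looked up, and the hostnames, priorities, and the `is_up` check are all hypothetical. The ordering is a simplified version of the RFC 2782 rule: lowest priority value first, then a weighted random pick among records that share a priority.

```python
import random
from typing import Callable, List, NamedTuple, Optional


class SrvRecord(NamedTuple):
    priority: int  # lower value is tried first
    weight: int    # relative share among records of equal priority
    port: int
    target: str


def order_srv(records: List[SrvRecord], rng=random) -> List[SrvRecord]:
    """Return records in the order a client should try them.

    Simplified RFC 2782 ordering: ascending priority, weighted random
    order inside each priority group.
    """
    ordered: List[SrvRecord] = []
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        while group:
            weights = [r.weight for r in group]
            if sum(weights) > 0:
                pick = rng.choices(range(len(group)), weights=weights)[0]
            else:
                # All remaining weights are zero: pick uniformly.
                pick = rng.randrange(len(group))
            ordered.append(group.pop(pick))
    return ordered


def first_reachable(
    records: List[SrvRecord], is_up: Callable[[SrvRecord], bool]
) -> Optional[SrvRecord]:
    """Walk the ordered list and return the first server that answers."""
    for record in order_srv(records):
        if is_up(record):
            return record
    return None


# Hypothetical records: New York preferred, Chicago as backup.
records = [
    SrvRecord(10, 50, 5060, "newyork.example.net"),
    SrvRecord(20, 50, 5060, "chicago.example.net"),
]
down = {"newyork.example.net"}  # pretend New York is unreachable
chosen = first_reachable(records, lambda r: r.target not in down)
print(chosen.target)  # chicago.example.net
```

The point is that the client falls through to the backup on its own, without waiting for anyone to change a DNS record. It does nothing for the DID and voicemail side, which still needs a server-side answer.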