CallCentric Suffers National VoIP Outage
Company's aware of outage, but no explanation yet
Users in our VoIP forum indicate that VoIP provider Callcentric has been having a national outage since around 11AM EST, something users say is fairly uncommon for the operator. The company's website has also been up and down for much of the day. "All our engineers and developers were notified about 30 seconds after it started by both internal and external monitoring systems," says the company. "We sincerely apologize for this outage and once we have restored service we will be investigating the cause further." These are the kinds of outages VoIP operators can't afford to have with cable operators consistently gobbling up the lion's share of VoIP customers.

OOPS
@comcast.net

OOPS

Anon

Callcentric Down in Miami Florida

More than two hours have passed since the outage began, and my TA still can't register with Callcentric.
cyclone_z
join:2006-06-19
Ames, IA

cyclone_z

Member

DDoS?

Is it a denial of service attack?

From what I've heard cable VoIP isn't much better in the reliability department.

swintec
Premium Member
join:2003-12-19
Alfred, ME

swintec

Premium Member

I don't think so!

"These are the kinds of outages VoIP operators can't afford to have with cable operators consistently gobbling up the lion's share of VoIP customers."

I have had far more downtime with TW THIS YEAR than I have ever had with Voicepulse in the 5 or so years we have had them.

dcurrey
Premium Member
join:2004-06-29
Mason, OH

dcurrey

Premium Member

Re: I don't think so!

Absolutely no danger of me going to cable phone options. Too many good voip providers out there with better features and far better pricing.

Sly
Premium Member
join:2004-02-20
Tennessee

Sly

Premium Member

Re: I don't think so!

Agreed... My Comcast goes out more often than my VoIP does...

Now I realize that you can't get VoIP without internet, but I have had about 7-8 outages this year with Comcast while this is the first one for Callcentric.
AVonGauss
Premium Member
join:2007-11-01
Boynton Beach, FL

AVonGauss

Premium Member

To me...

I don't believe anybody outside CallCentric knows what happened at this point, but to me, this just illustrates in general the need for good multiple paths of customer communication and probably a bit of additional disaster recovery planning. What concerns me about this event is not so much the event itself, but the lack of ability for the customer to get information regarding the event.
caco
Premium Member
join:2005-03-10
Whittier, AK

caco

Premium Member

Re: To me...

This happened 2 hours ago. Aren't you jumping the gun a little?
AVonGauss
Premium Member
join:2007-11-01
Boynton Beach, FL

AVonGauss

Premium Member

Re: To me...

How so?
caco
Premium Member
join:2005-03-10
Whittier, AK

caco

Premium Member

Re: To me...

How much info do they need to provide besides "We are sorry and we are working on it, no ETA at this time. Once we know more, you'll know more"? The priority should be fixing the problem, and constant communication takes away from that.
AVonGauss
Premium Member
join:2007-11-01
Boynton Beach, FL

AVonGauss

Premium Member

Re: To me...

Until they posted on this forum over an hour later, most customers did not even know that much information.

ArrayList
DevOps
Premium Member
join:2005-03-19
Mullica Hill, NJ

ArrayList

Premium Member

Re: To me...

Don't you just have to try to make a call and have it not work to know there's an outage?

Mainah
@rr.com

Mainah

Anon

Still working here

I just started with Callcentric this week, due in part to reviews on BBR. I have not been down today - have continued to get inbound calls. I don't use them for outbound, though.

crazyk4952
Premium Member
join:2002-02-04
united state

crazyk4952

Premium Member

not good

This is really hurting their reputation. This is really too bad, since it seems like they had spent a lot of time and effort to actually own a lot of their systems instead of contracting out to someone else.

I know that when I look for a VOIP provider, reliability is at the top of my list. If I see forum posts with users experiencing unreliable service, I no longer consider that VOIP provider a viable option for me...
Fisamo
Premium Member
join:2004-02-20
Apex, NC

Fisamo

Premium Member

Re: not good

So one outage is automatically considered 'unreliable service'? Seems to me that CallCentric has done an excellent job keeping their systems up and running for quite a while (almost NO forum complaints that I've EVER seen, not counting anything associated with today's event).

Agreed--a multi-hour outage is not a good thing, for anyone. However, as has been stated by others, cable operators, telcos (POTS), and others have experienced multi-hour and multi-day outages, and you don't generally hear much about it. However, this one event does not cause me to automatically rate this provider as 'unreliable', especially given their track record of UPTIME, customer service, good value, etc.

FWIW, I say this as a non-customer... For my needs, I felt that a different provider (Voipo) would be best. But I would still recommend CallCentric without hesitation.

crazyk4952
Premium Member
join:2002-02-04
united state

crazyk4952

Premium Member

Re: not good

If this is their only outage in the next few months, then I do not think I would allow this incident to affect my opinion of them. However, if another incident happens soon, then I will really start to question their reliability.
PX Eliezer704
Premium Member
join:2008-08-09
Hutt River

PX Eliezer704 to crazyk4952

Premium Member

to crazyk4952
One swallow does not make a summer, and one (or two) outages do not make for a bad reputation.

Most of the Voip companies have had outages at times, even AT&T's own CallVantage.

And outages are even more common among internet service providers. It just does not become "newsworthy".

Tweak
Premium Member
join:2002-06-08
Colonial Heights, VA

Tweak to crazyk4952

Premium Member

to crazyk4952
I am a Callcentric customer with an international direct dial number, and I have been very pleased with Callcentric. If you need reliability, go with a true CLEC or ILEC: your local cable company or telco. Due to the massive complexity of these systems, it's too much to ask for 100% uptime. Even a government-regulated telco or cableco isn't expected to have that much uptime, due to the nature of how these services operate. If you need more uptime, you might want to consider a carrier-class, non-Internet VoIP provider. Yes, it's more expensive, but you get what you pay for.

crazyk4952
Premium Member
join:2002-02-04
united state

crazyk4952

Premium Member

Re: You expect too much from Internet voip

Maybe you expect too little. As we move more toward IP-based communications and away from switched circuits, people are going to expect these types of communications to have the same reliability they are used to having.

Ten years ago, I can remember having cell phone outages all of the time. Now, I cannot remember the last time my cell phone was not working.

I am not trying to be argumentative; this is just my opinion. It does not mean that it is right, or that everyone has to agree with me.

Tweak
Premium Member
join:2002-06-08
Colonial Heights, VA

Tweak

Premium Member

Re: You expect too much from Internet voip

You can't expect anything to have 100% uptime. You pay more for the higher reliability.
beachnik
join:2004-01-03
Manhattan Beach, CA

beachnik

Member

today's outage...

I've been with CALLcentric since around March 09. While today's outage was a little annoying, my overall experience with them has been positive.

Tweak
Premium Member
join:2002-06-08
Colonial Heights, VA

Tweak

Premium Member

Wow, I'm impressed with the level of detail

Many customers have asked us about what caused the outage today, and what we are doing to prevent future outages. Below we will provide a summary of the cause of the outage, and what we are doing to improve our network. As the cause of this issue was technical, we have tried to provide a basic overview of the issue that occurred.

As background relevant to this outage: Callcentric developed and designed our systems in-house, and we continue to maintain and improve our services in-house as well. When we launched Callcentric publicly in July 2005, we built our systems in a scalable fashion to accommodate the growth of customers and traffic on our network. Most of the systems and hardware on our network, including our core database infrastructure (which runs on Sun Solaris Cluster), have been operational since September 2004, as we developed our service and opened it to the public. While we spent a great deal of time building our infrastructure before we launched our service, we've found over the last 4+ years that some areas of our network contain bottlenecks, which we've been working to resolve over the last year as our customer base and traffic have grown. This is in addition to having an open network and supporting an ever-growing list of customer-provided software and hardware, as well as customer network architectures, which has put additional strain on our network over the years. While we should have anticipated some of these bottlenecks better in the past, the incredible and unexpected growth of our customer base has exceeded the expectations we had when our systems architecture was designed 5 years ago.

As was announced a few weeks ago, we plan to perform a maintenance window on Monday October 5, 2009 from 03:00 AM to 07:00 AM US Eastern time (07:00 GMT/UTC to 11:00 GMT/UTC). This maintenance window is being done to replace a core component of our systems - our primary database. Due to the complexity involved in this change it requires our network to be taken offline while this work is performed. This maintenance is the first of a total of three major changes we plan to make to our core network infrastructure over the coming months.
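The time conversion in the maintenance announcement above can be double-checked with a few lines of Python. This is only a minimal sketch: it assumes US Eastern Daylight Time (UTC-4) applies on October 5, 2009, and the helper name `eastern_to_utc` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: early October falls under Eastern Daylight Time (UTC-4).
EDT = timezone(timedelta(hours=-4), "EDT")


def eastern_to_utc(year, month, day, hour, minute=0):
    """Convert a wall-clock time in EDT to the equivalent UTC time."""
    local = datetime(year, month, day, hour, minute, tzinfo=EDT)
    return local.astimezone(timezone.utc)


# The announced window: 03:00 AM to 07:00 AM US Eastern on October 5, 2009.
window_start = eastern_to_utc(2009, 10, 5, 3)
window_end = eastern_to_utc(2009, 10, 5, 7)

print(window_start.strftime("%Y-%m-%d %H:%M UTC"))
print(window_end.strftime("%Y-%m-%d %H:%M UTC"))
```

Under that assumption, the two printed times line up with the 07:00 and 11:00 GMT/UTC figures given in the announcement.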

The reason we are performing this maintenance window is related to the outage that occurred today. As was mentioned above, our core database has been running since 2004 with very few issues over the years. However, about 1 year ago we began planning to replace this infrastructure for many reasons including processing power, memory, and storage space. We've spent the last year moving many of our systems around and adding additional systems for non-real-time database activities in order to off-load our primary and "real-time" database. The changes and planning we've done over the last year were also for the purpose of being able to perform the changes that will occur during the maintenance window in a timely manner.

Unfortunately, over the last few days we've begun to have some serious issues with our core database. This included 3 other temporary (less than 1 minute) failures which went unnoticed by customers because redundant systems took over. Our engineers and developers have spent the last few days trying to decipher the causes of these failures, and this morning, about 20 minutes before the outage started, they identified the issue as a memory leak in one of our applications. Our engineers were planning how best to mitigate this memory leak when the service outage began, before they were able to take action to correct it.

Because we are approaching what's known as an "edge" condition on parts of our systems (which our maintenance window next week is designed to resolve), there was a rolling and cascading effect caused by one system's failure, which affected our core database and, in turn, our application servers, proxy servers, web servers, load balancers, and session border controllers. In essence, loads constantly shifted from one part of our network to another, causing incredibly high loads on our infrastructure, which quickly took everything down.
The outage lasted as long as it did due to the complexity of the load shifts that were occurring while we were trying to stabilize each part of the system enough to bring the entire system back online.
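The kind of rolling, cascading load shift described above can be illustrated with a toy model. To be clear, this is a hypothetical sketch and not a description of Callcentric's actual architecture: the node count, capacities, loads, and the function name `simulate_cascade` are all assumptions chosen for illustration. Each failed node's load is split evenly among the survivors, and any survivor pushed past its capacity fails in turn.

```python
def simulate_cascade(capacities, loads, first_failure):
    """Return the set of node indices that end up failed.

    When a node fails, its load is redistributed evenly across the
    surviving nodes. Any survivor whose load then exceeds its capacity
    is queued to fail as well.
    """
    loads = list(loads)
    failed = set()
    pending = [first_failure]
    while pending:
        node = pending.pop()
        if node in failed:
            continue
        failed.add(node)
        survivors = [i for i in range(len(loads)) if i not in failed]
        if not survivors:
            break  # nothing left to absorb the load
        share = loads[node] / len(survivors)
        loads[node] = 0.0
        for i in survivors:
            loads[i] += share
        pending.extend(i for i in survivors if loads[i] > capacities[i])
    return failed


# Five hypothetical nodes with a capacity of 100 units each.
near_edge = simulate_cascade([100] * 5, [85] * 5, first_failure=0)
with_headroom = simulate_cascade([100] * 5, [60] * 5, first_failure=0)

print("near the edge:", sorted(near_edge))
print("with headroom:", sorted(with_headroom))
```

In this toy setup, the same single failure stays contained when the nodes have headroom but spreads to every node when they are already running close to capacity, which is roughly the behavior the statement attributes to the "edge" condition.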

We believe at this time that we have our systems back online in a way that lets us get through the next few days without service-affecting issues until we perform the maintenance window on October 5th, which should mitigate the issues that occurred today and allow for the growth of our customer base and traffic over the long term.

In addition to the outage itself, one other item we did not perform well on today was customer notification. Due to the way our systems failed, we couldn't get our web site up quickly to display a message that we were experiencing a service outage and were working on the problem. While we were immediately aware of the outage due to both internal and external (third-party) monitoring, we didn't have a good way to notify our customers that we were aware of the issue and working to correct it. As the service outages we've had in the past have not been frequent and generally did not last as long as this one, customer notification is an issue we didn't spend enough time considering, even though we should have. We have a number of internal ideas, as well as customer-suggested ideas, that we will begin investigating so that we can provide better and more timely information to customers in the event of future outages or problems, which of course we are also trying to prevent in general.

Finally, thank you to both the customers that sent in polite and encouraging comments today during and after this outage, as well as to the customers that were furious and used fairly explicit language. Both groups provided us with some good ideas and motivation to work even harder. We sincerely apologize for the outage that occurred today. We work very hard to try and avoid any downtime on our services, and will continue to try and do an even better job going forward. We greatly appreciate all of our customers' business and hope to keep your business going forward.

Sincerely,
Greg Blumstein
VP Operations
Callcentric, Inc.
34574589 (banned)
join:2009-09-05

34574589 (banned)

Member

No Excuse

Still does not excuse them from not having an update on their site or at least posting somewhere what was going on. All they did was release a statement after the fact. At least when PP was down, their site was still up with a status report, live chat was available, and there was a post in the PP forum.

Tweak
Premium Member
join:2002-06-08
Colonial Heights, VA

Tweak

Premium Member

Re: No Excuse

If you read the statement, they apologized about that. You know, I really would like a company to focus on fixing a problem rather than having to post an update. When we start demanding carrier-class service from these types of companies, we will start paying the higher prices.
34574589 (banned)
join:2009-09-05

34574589 (banned)

Member

Re: No Excuse

said by Tweak:

If you read the statement, they apologized about that. You know, I really would like a company to focus on fixing a problem rather than having to post an update. When we start demanding carrier-class service from these types of companies, we will start paying the higher prices.
Still not an EXCUSE... When PP had a meltdown, don't you think they rectified the problem as well? Only difference is they were smart enough not to put all their eggs in one basket. CC could have done better, and I feel they really let us down. Especially since they never gave an update until after the outage. Sorry, not feeling much love for CC right now...

Tweak
Premium Member
join:2002-06-08
Colonial Heights, VA

Tweak

Premium Member

Re: No Excuse

You actually don't know that Callcentric put all its eggs in one basket. Really, what good does posting an outage notification do for the customer? You can pick up the phone and realize you have no dial tone. The 2 or 3 minutes it takes to type up the notification is 2 or 3 minutes less spent working on resolving the problem. Really, what good does it do to know about an update? What is PP?

ptrowski
Got Helix?
Premium Member
join:2005-03-14
Woodstock, CT

ptrowski to 34574589

Premium Member

to 34574589
said by 34574589:
said by Tweak:

If you read the statement, they apologized about that. You know, I really would like a company to focus on fixing a problem rather than having to post an update. When we start demanding carrier-class service from these types of companies, we will start paying the higher prices.
Still not an EXCUSE... When PP had a meltdown, don't you think they rectified the problem as well? Only difference is they were smart enough not to put all their eggs in one basket. CC could have done better, and I feel they really let us down. Especially since they never gave an update until after the outage. Sorry, not feeling much love for CC right now...
You forgot that it was TWO outages in TWO days for Phonepower.
34574589 (banned)
join:2009-09-05

34574589 (banned)

Member

Re: No Excuse

said by ptrowski:

said by 34574589:
said by Tweak:

If you read the statement, they apologized about that. You know, I really would like a company to focus on fixing a problem rather than having to post an update. When we start demanding carrier-class service from these types of companies, we will start paying the higher prices.
Still not an EXCUSE... When PP had a meltdown, don't you think they rectified the problem as well? Only difference is they were smart enough not to put all their eggs in one basket. CC could have done better, and I feel they really let us down. Especially since they never gave an update until after the outage. Sorry, not feeling much love for CC right now...
You forgot that it was TWO outages in TWO days for Phonepower.
I'm not concerned with the outage so much as the lack of communication.