vabello
join:2011-05-05
Allamuchy, NJ

vabello

Member

SamKnows packet loss

What do other people using SamKnows on Optimum see for UDP packet loss? I've always measured a low amount of packet loss on my Optimum connection using Smokeping both running inside of my home testing Internet destinations as well as from an external Smokeping instance testing my Optimum connection directly. I have been running a SamKnows device for a little while now and it seems to agree with my own test results. The problem is nobody at Optimum wants to believe it's a problem or something to fix. I've gone to the direct forums and they ask if I'm getting proper speeds, which I am... and there are no downstream uncorrectable errors occurring when there is loss. I've gone as far as to suggest the errors are occurring on the upstream to the CMTS and asked them to check if there are errors being seen on that end, but they don't seem to be able or willing to check that. From their perspective there is no issue but I have all this data saying there is... by my standard at least.

I expect 0% loss most of the time. Instead I see something like 0.1% to 0.25% all the time for all SamKnows test points. SamKnows is also registering failed page loads once in a while. I sometimes need to refresh a page after requesting it for it to work, but that's so infrequent that I never even assumed it was related.

Anyway, what are other people seeing on their Optimum connections? I don't know how to get support to look into this or if I'm expecting too much from a DOCSIS connection. On the networks I maintain myself, I expect 0% loss and measure it thoroughly and frequently.
vabello

vabello

Member

[Image: SamKnows Packet Loss]
[Image: Signal Levels]
Here are some screenshots of the SamKnows data and my signal levels.
vabello

vabello

Member

Just a traceroute to 8.8.8.8 to see the path, it starts out:

r1#traceroute 8.8.8.8
Type escape sequence to abort.
Tracing the route to google-public-dns-a.google.com (8.8.8.8)
VRF info: (vrf in name/id, vrf out name/id)
1 10.240.184.73 12 msec 12 msec 12 msec
2 67.59.240.113 8 msec 8 msec 8 msec
3 rtr2-ge1-11.mhe.prnynj.cv.net (67.83.249.133) 12 msec
ool-4353f98d.dyn.optonline.net (67.83.249.141) 12 msec
rtr2-ge1-11.mhe.prnynj.cv.net (67.83.249.133) 12 msec
4 65.19.119.197 12 msec
64.15.7.49 8 msec
64.15.7.41 16 msec
5 64.15.3.226 12 msec
451be0c6.cst.lightpath.net (65.19.120.198) 16 msec
64.15.3.226 28 msec

So the first point I can run a ping to is 67.59.240.113. Running 10k pings to it right now gives me:

Success rate is 99 percent (9984/10000), round-trip min/avg/max = 4/9/56 ms

or 0.16% packet loss. I'm thinking either the upstream is taking errors or the problem is in the Optimum network for my node, and I'm the only network engineer around here who notices or cares. I'd love to be able to test from a neighbor's house, but I don't know anyone well enough to do that.
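
For anyone who wants to check the arithmetic, here is a minimal Python sketch that pulls the received/sent counts out of a Cisco-style ping summary line and turns them into a loss percentage. The helper name `loss_percent` is made up for illustration, and the regex assumes the `(received/sent)` format shown in the output above.

```python
import re


def loss_percent(summary: str) -> float:
    """Return packet loss in percent from a ping summary line.

    Assumes the line contains the counts as "(received/sent)",
    as in the router output quoted above.
    """
    match = re.search(r"\((\d+)/(\d+)\)", summary)
    if match is None:
        raise ValueError("no (received/sent) counts found in summary line")
    received, sent = int(match.group(1)), int(match.group(2))
    return 100.0 * (sent - received) / sent


line = "Success rate is 99 percent (9984/10000), round-trip min/avg/max = 4/9/56 ms"
print(f"{loss_percent(line):.2f}% loss")
```

The router rounds the success rate down to a whole percent, so "99 percent" hides the difference between 0.16% and nearly 1% loss; working from the raw counts avoids that.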

MxxCon
join:1999-11-19
Brooklyn, NY
ARRIS TM822
Actiontec MI424WR Rev. I

1 edit

MxxCon to vabello

Member

to vabello
[4 images attached]
SK does show some issues, but I wouldn't say I noticed them myself..

momcat1
No Relation To The Bobcat
join:2002-10-21
Wappingers Falls, NY

momcat1 to vabello

Member

to vabello
[4 images attached]
vabello
join:2011-05-05
Allamuchy, NJ

vabello

Member

Thanks both for sharing. I don't see your graphs for actual packet loss, just latency. What does your UDP packet loss look like over time?

momcat1
No Relation To The Bobcat
join:2002-10-21
Wappingers Falls, NY

momcat1 to vabello

Member

to vabello
[Image: UDP packet loss 8/23-8/29]
Here's our UDP packet loss. Since our box hasn't been online for even a full week yet, there's not too much to be seen.
vabello
join:2011-05-05
Allamuchy, NJ

vabello

Member

Thanks. I've attached just 2 days. Yours looks much better than mine. I have more consistent loss on each test. In fact, I'd say it's almost inverted from your experience: I rarely see 0% loss. This is what I've complained about, but I don't know how to find a tech who will actually look into the underlying cause or escalate it. They only know how to troubleshoot what they know. I've swapped modems (going from the older Arris model to the new one improved things slightly), tried different devices, and tested directly from them plugged into the cable modem. There are no downstream errors. I think the errors are occurring on the upstream or between the CMTS and the Optimum router it connects to. I lean toward the upstream, considering I saw a slight difference with a different, newer Arris modem. I've got the TM822G now. The only thing I can take out of the equation is the very first split, where one side goes to the cable modem and the rest goes to other splits for my TVs. I never have pixelation issues on my TVs, which coincides with the 0 uncorrectables on my modem. It's just annoying because I know there is a problem somewhere but can't fix it.
frdrizzt
join:2008-05-03
Ronkonkoma, NY

frdrizzt

Member

If you take out the SK router do you see packet loss on extended pings from your PC? Before you go crazy trying to convince CV to help, just be sure that's not the case.
vabello
join:2011-05-05
Allamuchy, NJ

vabello

Member

said by frdrizzt:

If you take out the SK router do you see packet loss on extended pings from your PC? Before you go crazy trying to convince CV to help, just be sure that's not the case.

Yup, two different PCs and 3 different routers.

DAOWAce
join:2006-10-25
Flanders, NJ

2 edits

DAOWAce to vabello

Member

to vabello
said by vabello:

The problem is nobody at Optimum wants to believe it's a problem or something to fix.

They really do go out of their way to try and wash their hands of issues.

Hell, I even had my case closed for the reason "it's the game servers you're connecting to" after giving a game related example of the issue, despite stating it happens globally on the internet and that our OV line goes out every time it happens.

Sometimes they seemed completely oblivious there were even any problems; not sure how much of that was playing stupid or not.
said by vabello:

I don't know how to get support to look into this or if I'm expecting too much from a DOCSIS connection.

You go to the head of their IT department: Wilt.

Talking to him forced the supervisors to contact me and do something about it.

Unfortunately, in my case, it was never resolved through weeks of truck rolls, because the issue was node oversaturation of all things.. at least that's what I concluded when it suddenly resolved itself when they added more (downstream) bonding channels, then came back a year or two later, then disappeared again after more (upstream) channels were added, yadda yadda. I've asked numerous times for the node to be split, and they're still refusing after all these years. (If FiOS was on my street, I'd have jumped long ago..)

And yes, the packet loss is upstream; it's always upstream. Rarely will it ever be downstream.

FWIW, I disconnect from Steam multiple times a night when I leave it up. Not sure if that's Steam, OOL, or Samknows saturating my internet while it runs tests. My 2 day report looks like this:


mbernste
MVM
join:2001-06-30
Piscataway, NJ

mbernste to vabello

MVM

to vabello
Except for August 28th (unsure what happened there), my packet loss has been very low since I've gotten the SK box.

andrewc2
join:2011-06-05
Matamoras, PA

andrewc2

Member

I had some large packet loss Friday night into Saturday morning. It was bad enough that I contacted CV at 2am to make sure it was reported and fixed by morning.
cablewizzard
join:2009-06-14
Woodbury, NY

cablewizzard to vabello

Member

to vabello
said by vabello:

I expect 0% loss most of the time. Instead I see something like 0.1% to 0.25% all the time for all SamKnows test points.

You're complaining about 0.25% packet loss, measured over full 24-hour data collection periods?

Do you know how ridiculous this sounds in the eyes of those here that are running real-world WAN networks of any kind?

Do you have the wildest guess what your packet loss to a typical far-away (15-20 hops half way across the country) service is? It's well over an order of magnitude higher than this...

This has all the hallmarks of you complaining that your personal NASCAR favorite is only getting a 192mph lap average, when he should be getting 193mph...

CV will ignore you, and for good reason.
cablewizzard

cablewizzard to MxxCon

Member

to MxxCon
said by MxxCon:

SK does show some issues, but I wouldn't say I noticed them myself..

You have a high number of failed Web requests starting around July 27th, jumping to >20% - anything happen around that time? It's completely inconsistent with the Website load times, so I am wondering if SK has some problems with their tests.

Your DNS latency graph doesn't look so good: a jump from 10 to ~22ms on Jan 1st, and then steady increase into the 40-50ms range up to now. Did SamKnows change their testing method?

That 15-day period in Sep 2013 looks rather strange, too, from 10 to 70-90ms, and then straight back to 10ms.

Anyone else here see something like this in their SK graphs?

MxxCon
join:1999-11-19
Brooklyn, NY

MxxCon

Member

I moved apartments(in the same building, different section) on Dec 30th.
vabello
join:2011-05-05
Allamuchy, NJ

vabello to cablewizzard

Member

to cablewizzard
said by cablewizzard:

said by vabello:

I expect 0% loss most of the time. Instead I see something like 0.1% to 0.25% all the time for all SamKnows test points.

You're complaining about 0.25% packet loss, measured over full 24-hour data collection periods?

Do you know how ridiculous this sounds in the eyes of those here that are running real-world WAN networks of any kind?

Do you have the wildest guess what your packet loss to a typical far-away (15-20 hops half way across the country) service is? It's well over an order of magnitude higher than this...

This has all the hallmarks of you complaining that your personal NASCAR favorite is only getting a 192mph lap average, when he should be getting 193mph...

CV will ignore you, and for good reason.

Ridiculous? Doesn't sound ridiculous to me at all or most people I work with in the networking community. Yes, 0.25% packet loss is problematic. I think the way you're describing it is misleading though. This is an average of .25% packet loss that is constant almost all the time. It's not a sudden amount of loss that happened at 4AM that skewed the data to show .25% loss in a 24 hour period. I guess you don't mind TCP retransmissions causing delays while typing via SSH or Remote Desktop, or dropped UDP packets for DNS lookups causing delays to resolve things.

The bigger reason I'm laughing is because I built, run and maintain a national network for a Fortune 50 company and do not tolerate anything near this no matter how many hops it goes through. Loss should be 0% all the time. If it's not, then you have a problem that has to be fixed. There are numerous explanations for loss, but there is always a cause and it can be fixed. Just because something goes a long way isn't an excuse to accept data loss. I actually don't need to guess what packet loss to a destination half way across the country is. I have full telemetry and statistics of what it is all the way across the country between many major cities on my own network, and it's almost always 0%.

I don't agree with the NASCAR reference. I'm not complaining about speed. It's not like my Ultra101 is running at 100.75Mbps. The packet loss would be equivalent to the car having a cylinder misfire about 22 times a minute at 9000RPM. I'd certainly want that fixed as well.
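
The misfire figure is just the loss rate applied to the engine speed. A quick Python sketch of that arithmetic follows; it treats each revolution as one event, which is what the 22-per-minute figure implies, and the 1000 packets/second rate in the second half is a made-up number purely for comparison. The function name `events_per_minute` is also just illustrative.

```python
def events_per_minute(rate_per_minute: float, loss_rate: float) -> float:
    """Expected number of failed events per minute at a given failure rate."""
    return rate_per_minute * loss_rate


LOSS_RATE = 0.0025  # 0.25%, the upper end of the loss reported in this thread

# Engine analogy: 9000 RPM, one event counted per revolution (assumption).
misfires = events_per_minute(9000, LOSS_RATE)
print(f"{misfires:.1f} misfires per minute")

# Same rate applied to a hypothetical stream of 1000 packets per second.
lost_packets = events_per_minute(1000 * 60, LOSS_RATE)
print(f"{lost_packets:.0f} lost packets per minute")
```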

I will say that in the past 43 days my uncorrectable rate on every downstream channel has remained at 0 and the most my correctable count is on any channel is just shy of 2000. I've never seen the stats that good on my connection, so I'm unsure if anything is being done in the background since I have this Samknows box.

I respect your opinion that it's not a problem, and maybe to you that is acceptable in your own real world networks, but it's not for me personally. I'm just trying to isolate and fix the cause because that's what I do. Thanks for your input though.
flash123
join:2001-12-26
Piscataway, NJ

flash123 to cablewizzard

Member

to cablewizzard
I agree with vabello. I have been tracking down issues like these:

1) A credit card transaction appears to fail when a customer support person clicks the send button, so they run it again. The first send actually went through, but the application never got a response because of a dropped packet across the Internet, so it could not tell them. Try debugging this when a >=0.25% error rate happens and you do not control the destination application, which uses https. Now a 2nd transaction happens and the client sees two attempts on the credit card.
2) A client needs to download a massive amount of data within an SLA. A 0.25% error rate over that time period pushes the throughput under 10mbps over a >100mbps bandwidth pathway. How would you feel if you knew the smallest bandwidth along the path was 200mbps but you were only getting 10mbps of throughput, outside the rules of physics and TCP window size? See »www.silver-peak.com/prod ··· itioning for what errors do to throughput on a communication link.
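
As a rough sanity check on point 2, the well-known Mathis approximation bounds the steady-state throughput of a single long-lived, Reno-style TCP flow by (MSS / RTT) * (C / sqrt(p)), with C around sqrt(3/2). The sketch below is only that model, not a measurement: the 1460-byte MSS and 51 ms RTT are assumed values, the function name is made up, and modern stacks or parallel connections can behave quite differently.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on single-flow TCP throughput, in bits/second.

    Mathis et al. model: rate <= (MSS / RTT) * (C / sqrt(p)), C = sqrt(3/2).
    Assumes a Reno-style sender and random, independent loss.
    """
    if not 0 < loss_rate < 1:
        raise ValueError("loss_rate must be between 0 and 1 (exclusive)")
    c = math.sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


# Assumed inputs: 1460-byte MSS, 51 ms RTT, 0.25% loss.
bound = mathis_throughput_bps(mss_bytes=1460, rtt_s=0.051, loss_rate=0.0025)
print(f"~{bound / 1e6:.1f} Mbps per flow")
```

Under those assumptions the bound works out to single-digit Mbps per flow regardless of link speed, and it scales with 1/RTT, so a short path to a nearby server is hurt far less by the same loss rate.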

I cannot easily prove that optonline is not dropping packets, but I do see drops at the Cogent and Comcast peerings that make me question what starts happening across peering points during daytime activity. I do not see issues on-net, but with the optonline.net/Lightpath peering partners, one has to start questioning who is at fault.

vabello
join:2011-05-05
Allamuchy, NJ

vabello

Member

So just an update on this... I decided to search out Wilt and just send him a message asking if he could have someone take a look. I gave him thorough information and tests I was doing to show the issue. This was late at night. I surprisingly got a reply from him at around 11:30PM that night from his iPhone saying "You got it. We'll get it right." I replied and thanked him for his prompt response.

I wasn't expecting anything extraordinary, but the next day while I was at work, my wife called me and said there were Optimum techs at my house checking things outside. They said they couldn't find any issue and someone would be in touch with me. I figured as much and just assumed it wasn't getting fixed anytime soon. Still impressed with the rapid response though.

The following day, beginning around 12PM, the packet loss suddenly disappeared on my connection! Like, totally gone. I've been running tests since then for the past day and get perfect results every time.

Success rate is 100 percent (10000/10000), round-trip min/avg/max = 4/8/52 ms

Samknows also shows 0% loss almost constantly to all test points. Not sure what was done, but it seems something was fixed somewhere.

Anyway, if Wilt reads these forums, he's the man and thanks! I'll keep an eye on it and report back if I see any difference. My experience is very consistent now with what I would expect on my connection and everything is working great.
vabello

vabello

Member

One other thing... after the techs came out, I noticed my downstream power increased a bit. At that point the problem was still there. I now notice my upstream power is higher than usual. Rather than indicating a problem, I wonder if something was reconfigured to tell the modems to transmit at a higher power level and if that is possibly what fixed the problem. Just random speculation...
frdrizzt
join:2008-05-03
Ronkonkoma, NY

frdrizzt

Member

The downstream is the level received by the modem, not transmitted (that would be the upstream signal). If the downstream improved, they replaced/repaired something that caused more signal loss than expected.

Boooost
@24.190.186.x

Boooost to vabello

Anon

to vabello
Go find a crow and send it to cablewizzard.
vabello
join:2011-05-05
Allamuchy, NJ

vabello to frdrizzt

Member

to frdrizzt
said by frdrizzt:

The downstream is the level received by the modem, not transmitted (that would be the upstream signal). If the downstream improved, they replaced/repaired something that caused more signal loss than expected.

Yep, I totally understand that. I was just mentioning it as an interesting side fact that I noticed after they visited, but this didn't fix it at the time. The problem was still there even with that change. It is actually almost too high now. Most downstream channels are at 10 dBmV (and that's after going through a splitter) and my upstream is around 48 dBmV. I always thought it was a little unbalanced.
frdrizzt
join:2008-05-03
Ronkonkoma, NY

frdrizzt to vabello

Member

to vabello
Oh, ok. The modems all get the same tftp config based on type/area, so they didn't change that. And configuring the CMTS to balance to a higher power level (which would lead to modems locking an equal amount of dB higher) is a bad idea in almost all cases, as it will simply lead to a higher number of max transmit devices. In the cases where it's not bad (the upstreams are not transmitting at the same level), it's generally a better idea to find the cause of the imbalance and fix that. Generally it's ingress, but sometimes a bad cable/connector.
vabello
join:2011-05-05
Allamuchy, NJ

vabello

Member

Things are still great. 0% loss on all my own testing to points inside and outside Cablevision's network. Wilt has been in contact and reiterated several times if anything goes south with my connection to let him know immediately. It seems case is closed. I'm very happy.
cablewizzard
join:2009-06-14
Woodbury, NY

1 recommendation

cablewizzard

Member

said by vabello:

Things are still great. 0% loss on all my own testing to points inside and outside Cablevision's network. Wilt has been in contact and reiterated several times if anything goes south with my connection to let him know immediately. It seems case is closed. I'm very happy.

Your problem sure sounds like it was a minor problem at the CMTS (RF) port side, close to you, and too small to set off alarms anywhere - I assume that's the point where it was fixed. The story would have been different if there was a problem on upstream links and switch ports upstream of the CMTS - 10G links don't just show a fixed percentile of loss, for any reason.

Now as far as your Fortune-50 network is concerned, which I presume is in the financial industry: you control every link, use no public peering points, use no other networks that are contracted for anything but layer-1 with 100% committed data rates, have no guesswork about application traffic makeup and schedule, and work in an industry that, to top it all off, is paranoid about 100% availability, and gets all upset if RTT rises by fractions of a millisecond on a trading network. That's not a public network carrying general Internet traffic, and is not comparable to one by a long shot, and has entirely different operating margins (like: no links over 50% full at peak to keep link latencies within much smaller ranges).

And while we're at it, a comment about flash123's post:
You're providing a service (file downloads) that is SLA'd for speed, over the Optimum Online service (even business)? Hate to break it to you: there's no SLA on the Optimum side, except for dispatch of business repair service (4 hrs?). Last time I checked, providing file download services (= file UPLOAD services, as far as OOL is concerned, e.g.: the direction where bandwidth costs 7-15 times as much as in the OTHER direction) to others was NOT permitted by the T&C.

That calculator is a joke - a self-serving scare-mongering product placement if I ever saw one: it turns 108 Mbps @51ms RTT and 0% loss into an effective 10 Mbps link. Is that entire performance based on CIFS (Windows file sharing)? Jesus Christ, what a bunch of crock. And the "with shady product" speed is exceeding the link speed by a factor of 20? Compressing much?

And: An HTTPS conn is just like every other TCP connection: it uses SACK and fast-retransmit to make up for packet loss (even a few packets in a row, up to a loss rate of a small number of PERCENT) with absolutely NO slowdown in transfer speed (or collapsing the congestion window). It takes a LOT more to stall a TCP conn than a few packets dropped here and there: obviously, you have no end-to-end visibility into your particular problem - but for it to become observable often, and for that to become a timeout where the client OS side (or the browser) gives up, there has to be tens of seconds of no connectivity end-to-end.
vabello
join:2011-05-05
Allamuchy, NJ

vabello

Member

said by cablewizzard:

said by vabello:

Things are still great. 0% loss on all my own testing to points inside and outside Cablevision's network. Wilt has been in contact and reiterated several times if anything goes south with my connection to let him know immediately. It seems case is closed. I'm very happy.

Your problem sure sounds like it was a minor problem at the CMTS (RF) port side, close to you, and too small to set off alarms anywhere - I assume that's the point where it was fixed. The story would have been different if there was a problem on upstream links and switch ports upstream of the CMTS - 10G links don't just show a fixed percentile of loss, for any reason.

Now as far as your Fortune-50 network is concerned, which I presume is in the financial industry: you control every link, use no public peering points, use no other networks that are contracted for anything but layer-1 with 100% committed data rates, have no guesswork about application traffic makeup and schedule, and work in an industry that, to top it all off, is paranoid about 100% availability, and gets all upset if RTT rises by fractions of a millisecond on a trading network. That's not a public network carrying general Internet traffic, and is not comparable to one by a long shot, and has entirely different operating margins (like: no links over 50% full at peak to keep link latencies within much smaller ranges).

Hi. It's possible it was at the CMTS port. I have no idea as the resolution to the issue wasn't shared with me. I'm not sure what you mean by 10G links don't just show a fixed percentile of loss, for any reason. Dirty fiber, faulty optics, bad splices or faulty ports on patch panels, etc. etc., can all cause a fixed amount of packet loss on a 10G link due to signal loss. I work with 1/10/40Gb links every week and do troubleshoot and fix these things if they occur.

No, my employer is not in the financial industry. We are present on 6 public peering exchanges and also peer with some networks privately in multiple cities across the US. Most of our network is comprised of waves and dark fiber. We have nothing to do with trading, but our availability and RTT are critical to the response time of the applications we host. We have SLAs to meet and pay out if we don't meet them. This network is all Internet traffic and is public facing, and yes, we don't exceed 50% of capacity on any link with the way we operate our network.

I'm not sure what all of this has to do with anything. You sound very angry about something, as if I can't possibly do what I'm doing for some reason, so I apologize if it's something I said that upset you. That wasn't my intent and wasn't the intent of this thread. I was just looking to get an obvious problem fixed, which it now is, and for this I'm grateful to Wilt for listening and getting it corrected. But thanks again for your contribution to the topic and insight regarding the possible problem at the CMTS RF port.