dslreports
 
voxframe
join:2010-08-02

Member

Apple TV strange problem

Hello Everyone,

We seem to be running into a strange problem with a large handful of ATV users.

In a nutshell, they can play 1 movie/show, then they need to reboot the ATV box.

The user can watch one show/movie/preview without a problem. After that (even if it isn't finished), when they try to start a new video stream, they get thrown an error saying they can't connect to iTunes. No matter what you do, you always wind up with that error. Reboot the box via a power cycle or the Restart option in the menu, and the problem is solved... for one single video stream. Once that's done, you're back to square one and need to reboot again.

When it plays, it plays flawlessly. No jittering, lag, etc.
It also does not seem to be speed-related, as we have different clients with different speed profiles (some completely unlocked for testing) and they all get the same issue.

I have tried a bench test setup at our shop and cannot reproduce the problem. I can lower the speeds and induce jitter etc, and it will cause buffering issues as expected, but this seems to be something different. I cannot trigger a "Can't connect to iTunes" error.

Is there some stupid connection that these things establish to iTunes (probably for the sake of DRM) that can't be interrupted, and that doesn't re-establish if it is interrupted?

Anyone else run across this?

Inssomniak
The Glitch
Premium Member
join:2005-04-06
Cayuga, ON

Do you NAT? My guess is it's something to do with NAT.
voxframe
join:2010-08-02

Member

We do.

I'm just curious what it could be with NAT.

What's weirder is that I have the same NAT setup on my bench and it doesn't cause a problem.
VCWireless
join:2010-11-17
Valley Center, CA

VCWireless to voxframe

Member

I've had horrible luck with any Apple product and NAT. Maybe try disabling NAT for one of the problem customers to see if that resolves it.

Inssomniak
The Glitch
Premium Member
join:2005-04-06
Cayuga, ON

Inssomniak to voxframe

said by voxframe:

We do.

I'm just curious what it could be with NAT.

What's weirder is that I have the same NAT setup on my bench and it doesn't cause a problem.

Are you on the same NAT subnet as all the other Apple TV devices, or isolated from them? Do you go through the same connection-tracking table/router as your customers? Same NATed IP?
voxframe
join:2010-08-02

Member

Hmmm, interesting idea.

All of our NATed clients get an address from their CPE in the 192.168.50.0/24 subnet.

Strange, though, as my bench test uses the same 192.168.50.0/24 subnet, just as the clients would, and I don't see the issue. Normally I can reproduce every variable I can think of on the bench, but I can't catch this one.

Inssomniak
The Glitch
Premium Member
join:2005-04-06
Cayuga, ON

You might not catch this problem unless others are using their boxes at the same time, even assuming your bench setup is exactly like being at a customer's house.

The only thing I can think of is this: say a customer is watching a movie, and another customer turns on their box and starts a movie, rewriting your conntrack table so the NAT points to the new customer and leaving your old customer with no tracked entry.

Sooo: start a movie, call a customer to start one after you, then see what happens to your box.
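A toy model of the overwrite theory above, assuming (hypothetically) a NAT that reuses the inside source port as the external port and never checks whether that port is already taken. All addresses are made up for illustration (203.0.113.10 is a documentation address). It only shows what the symptom would look like; it says nothing about how the MikroTik conntrack actually behaves, and a correct NAT would pick another port on collision.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class NaiveNat:
    """Toy port-preserving NAT that never checks for collisions (hypothetical bug)."""

    def __init__(self, public_ip: str) -> None:
        self.public_ip = public_ip
        # external port -> inside endpoint; one slot per port, so a later flow replaces an earlier one
        self.table: Dict[int, Endpoint] = {}

    def outbound(self, inside: Endpoint) -> Endpoint:
        """Translate an outgoing flow, keeping the inside source port as the external port."""
        _, port = inside
        self.table[port] = inside  # no collision check: this is the modelled bug
        return (self.public_ip, port)

    def inbound(self, ext_port: int) -> Optional[Endpoint]:
        """Which inside host a reply arriving on ext_port would be forwarded to."""
        return self.table.get(ext_port)


nat = NaiveNat("203.0.113.10")
nat.outbound(("10.0.0.20", 49152))  # hypothetical customer A starts a movie
nat.outbound(("10.0.0.31", 49152))  # hypothetical customer B happens to pick the same source port
print(nat.inbound(49152))  # ('10.0.0.31', 49152): A's replies now go to B
```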

TomS_
Git-r-done
MVM
join:2002-07-19
London, UK

TomS_ to voxframe

I could imagine something like this if there were a bunch of ATVs on a common network segment, all requesting the same port to be forwarded via UPnP.

But if it's just plain overloading on a single/pool of IPs, then I would expect it to support n ATVs.

I suspect watching the NAT table to see if there are any translations being taken over, dropping off, or whatever would be a useful exercise.
jcremin
join:2009-12-22
Siren, WI

Member

said by TomS_:

But if it's just plain overloading on a single/pool of IPs...

At what point (or how many customers) would you consider a single IP being overloaded for NAT?

TomS_
Git-r-done
MVM
join:2002-07-19
London, UK

Overloading is a type of NAT.

Typically you have a single IP address, behind which you have many clients. When each client tries to establish a connection, it chooses a random source port and sends a SYN to the destination host on the port it wants to connect to. The NAT box then translates the source IP to its own WAN address, and may also change the source port. On the way back, packets are matched up using the translated address/port combination, and the original source IP (and port) is restored as the new destination.

This is usually referred to as NAPT, but can also be called overloading.
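The overloading described above can be sketched as a two-way translation table. This is a simplified toy, not how any particular router implements it: it keys mappings on the inside IP/port only (real connection tracking also uses the protocol and destination, and expires entries), keeps the client's source port when it's free, and otherwise takes the next free one. Addresses are hypothetical.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class Napt:
    """Toy NAPT ("overloading"): many inside hosts share one public IP."""

    def __init__(self, public_ip: str, low: int = 1024, high: int = 65535) -> None:
        self.public_ip = public_ip
        self.low, self.high = low, high
        self.out_map: Dict[Endpoint, int] = {}  # inside endpoint -> external port
        self.in_map: Dict[int, Endpoint] = {}   # external port -> inside endpoint

    def _pick_port(self, preferred: int) -> int:
        # Keep the client's source port if it's free, otherwise scan for the next free one
        span = self.high - self.low + 1
        start = preferred if self.low <= preferred <= self.high else self.low
        for i in range(span):
            port = self.low + (start - self.low + i) % span
            if port not in self.in_map:
                return port
        raise RuntimeError("no free external ports")

    def outbound(self, inside: Endpoint) -> Endpoint:
        """Rewrite the source of an outgoing packet to the public IP and a tracked port."""
        if inside not in self.out_map:
            port = self._pick_port(inside[1])
            self.out_map[inside] = port
            self.in_map[port] = inside
        return (self.public_ip, self.out_map[inside])

    def inbound(self, ext_port: int) -> Optional[Endpoint]:
        """Restore the original source IP/port as the destination of a reply."""
        return self.in_map.get(ext_port)


nat = Napt("203.0.113.10")
print(nat.outbound(("10.0.0.20", 49152)))  # ('203.0.113.10', 49152): port preserved
print(nat.outbound(("10.0.0.31", 49152)))  # ('203.0.113.10', 49153): collision, next free port
print(nat.inbound(49152), nat.inbound(49153))
```

Unlike a table that overwrites on collision, both customers keep working here because the second flow simply gets a different external port.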

Semaphore
Premium Member
join:2003-11-18
101010

Semaphore to voxframe

...sniffer time whoot whoot!
thewisperer
Premium Member
join:2008-01-16

thewisperer to voxframe

Justin:

You running fiber on the back end?

What do you have coming in, and what kind of client ratio?
voxframe
join:2010-08-02

Member

Fiber back end, Ubnt M-series AP/CPE, 5.8 GHz.

The CPE connects via PPPoE to an MT concentrator at the POP; from there it is backhauled to the head end where needed.

Speed profiles:
3/1 - No Netflix/ATV, period
5/1 - Standard definition works fine
8/1 - HD works fine
10/2 - HD works fine with room to play

This problem shows up regardless of profile.
Some clients are NATed behind a single address, some are assigned unique external addresses, and some are assigned static external addresses. All of them seem to exhibit the problem at some point, which is why I'm hesitant to say NAT is the issue.

My thought is that it's perhaps recursive DNS. We run DNS from our two core Mikrotik routers, which grab it from our two main recursive servers, which query the root servers.
I realize this isn't ideal, but the reason for the extra topology hop is that it gives me a quick way to swap DNS from ours to a public DNS server, should ours fail. The CPE radios pull DNS from our core MTs, which normally query our main servers, or public ones if we have a problem. Normally, though, when I have DNS issues I see them pretty clearly on our graphs, with queries failing or doing something odd. I haven't seen that.
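If the swap to public DNS described above were automated rather than done by hand, the forwarding logic would look roughly like this. The `Resolver` callables are hypothetical stand-ins for real upstream queries; this is not RouterOS's actual DNS code.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for an upstream DNS server: takes a name, returns an address,
# and raises OSError (e.g. a timeout) when that server is unreachable.
Resolver = Callable[[str], str]


def resolve_with_failover(name: str, ours: List[Resolver], public: List[Resolver]) -> Optional[str]:
    """Ask our own recursive servers first, then fall back to public resolvers."""
    for upstream in ours + public:
        try:
            return upstream(name)
        except OSError:
            continue  # this upstream failed; try the next one
    return None  # every upstream failed: the client sees a resolution error
```

Note that with this shape, a slow-but-alive primary still delays every answer, which is one reason a failing tier can show up as intermittent app errors rather than clean failures on the graphs.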

I really need to do some packet work on this; the problem is catching it in a live situation without annoying a client during troubleshooting. Live on the bench, going through an entire client mock-up network, I can't reproduce it, so I have nothing to do packet captures on.

Speed-wise, I can lower and raise the profiles and see the effects on the ATV, but even when dropped to dial-up speed, the box doesn't produce these errors; it just gets slower.

My hunch is that the ATV box isn't even attempting to talk to the Apple store when this problem comes up. Another hunch: the main ports used to talk to the Apple store are 80 and 443, which are also the ports normally used by the Ubnt radios. I don't use 80 for management, but lately the goddamn Ubnt firmware keeps enabling 443 as the default web port. Could this have anything to do with it? I realize these are outgoing vs. incoming ports and normally shouldn't be related, but maybe something is going on with SSL on the radio that would cause weirdness like this? I will double-check which radios have it enabled, etc.; that might be of use.

Another fun issue, which I haven't fully confirmed either but is my hunch: the ATV will fail, but Netflix on the same box will run. So if the ATV can't reach the iTunes store, it will still run Netflix just fine, indicating that speed/packet loss/etc. isn't the issue. I haven't confirmed this 100% yet, but it's looking like it.

Another stumper for the NAT theories... VoIP works flawlessly, regardless of carrier, protocol, equipment, etc. Normally SIP/VoIP is the most NAT-intolerant thing I know, and we have hundreds of users on it with every carrier you can think of, with no NAT-esque problems (one-way audio, disconnects, registration problems, etc.).

TomS_
Git-r-done
MVM
join:2002-07-19
London, UK

I wouldn't imagine it's related to port 443; as you say, it's incoming vs. outgoing.

And besides, the radio should only be listening for incoming packets destined for itself with a destination port of 443 that are either SYNs opening a new session or part of an already established TCP session with itself. If it's trying to ingest packets from other sessions, then it's got a pretty nasty bug. But in that situation you'd expect to see web pages failing as well, especially sites like Google and Facebook, which default to HTTPS these days.

You could rule out DNS-related problems caused by the MT by removing that extra hop, maybe for a subset of customers who, as a group, hit the problem often enough to notice, even if no single one hits it often enough to make debugging easy.

Ideally the ATV should resolve once and cache the result for an extended period, to avoid having to resolve every couple of minutes.
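Resolving once and caching, as suggested above, amounts to honoring each answer's TTL. A minimal sketch, assuming a hypothetical `lookup` function that returns `(address, ttl_seconds)`; the clock is injectable so the behavior can be checked without waiting. The name and address are placeholders, not Apple's real hostnames.

```python
import time
from typing import Callable, Dict, Tuple


class TtlCache:
    """Cache name -> address answers until their TTL runs out."""

    def __init__(self, lookup: Callable[[str], Tuple[str, int]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lookup = lookup  # hypothetical upstream query returning (address, ttl_seconds)
        self.clock = clock
        self.entries: Dict[str, Tuple[str, float]] = {}  # name -> (address, expiry time)
        self.queries = 0  # how many times we actually went upstream

    def resolve(self, name: str) -> str:
        now = self.clock()
        cached = self.entries.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]  # still fresh: no upstream traffic
        address, ttl = self.lookup(name)
        self.queries += 1
        self.entries[name] = (address, now + ttl)
        return address


t = [0.0]  # fake clock so the example runs instantly
cache = TtlCache(lambda name: ("198.51.100.7", 300), clock=lambda: t[0])
cache.resolve("example.com")  # miss: goes upstream
t[0] = 120.0
cache.resolve("example.com")  # hit: served from cache
print(cache.queries)  # 1
```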

battleop
join:2005-09-28
00000

battleop to jcremin

Member

One.
jcremin
join:2009-12-22
Siren, WI

jcremin to TomS_

Member

said by TomS_:

Overloading is a type of NAT.

Thanks for the info. Learned something new today.
jcremin

jcremin to battleop

Member

said by battleop:

One.

In a perfect world...

John Galt6
Forward, March
Premium Member
join:2004-09-30
Happy Camp

John Galt6 to voxframe

Is this issue confined to one portion of your network topology, or is it pervasive?
voxframe
join:2010-08-02

Member

It seems to be widespread.

I haven't been able to pin down any common/uncommon denominator yet. We are a PPPoE shop with a very small handful of DHCP clients. I don't think we even have any ATV users in the DHCP range (it's that small).

What about PPPoE/MTU? It's a long shot, but maybe? Currently I don't do any MTU manipulation at all, aside from what the MT routers do on their own when PPPoE sessions are established.

Semaphore
Premium Member
join:2003-11-18
101010

Semaphore to voxframe

So, just yappin' here... the Apple box is probably opening sequential ports for the outbound SYN that initiates each session. When you reboot the Apple TV box, you reset that sequence counter so it starts at the bottom again. If the ATV service on the remote side is listening on port 80, and a customer who cannot get to ATV can still get to Netflix etc., which is also port 80 (?), then it seems to me like a problem on the source-port side, not the outside NAT overload... maybe the inside NAT overload. It's not likely to be the radios: since you're encapsulating inside a PPPoE tunnel, the radios shouldn't see anything other than PPPoE. Do you tunnel from the PPPoE concentrator across your wireless network? Maybe it's "inside" your PPPoE concentrator?

What version of ROS are you running on the PPPoE concentrators? What model of RB are you using? (Please don't say RB435/RB435 or RB450 without the G.)

Do you limit the number of threads per CPE ?

S
voxframe
join:2010-08-02

Member

LMAO! No, not RB-450. (We do use the G for small relay setups, lol.)
RB-1100AHx2 as PPPoE concentrators. ROS 5.25.

Normally the CPE terminates PPPoE at the POP (tower/AP), and from that location it is backhauled as regular routed IP to the head end, NATed where needed, and sent out on fiber. We don't bother encapsulating our backhaul traffic, at the moment anyway (never had a use/reason for it).

We currently don't limit the number of threads per CPE, unless there are limits by default, which I don't think there are. (We probably really SHOULD be, and I'm looking into it.)

Right now I'm at a stuck point as I don't have a suitable location with the problem that I can take some time to really break down the issue and packet sniff. Damn.
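Limiting threads per CPE, as discussed above, boils down to counting tracked connections per customer and refusing new ones over a cap. RouterOS's firewall offers a connection-limit matcher for this; the toy below only shows the bookkeeping, with a hypothetical cap and made-up flows, and is not that feature's implementation.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

Flow = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


class ConnLimiter:
    """Track open flows per customer address and refuse new ones above a cap."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.flows: DefaultDict[str, Set[Flow]] = defaultdict(set)

    def open(self, flow: Flow) -> bool:
        """Return True if the flow may proceed, False if it should be dropped."""
        tracked = self.flows[flow[0]]
        if flow in tracked:
            return True  # existing connection: never cut it off
        if len(tracked) >= self.limit:
            return False  # new connection over the cap: refuse the SYN
        tracked.add(flow)
        return True

    def close(self, flow: Flow) -> None:
        """Forget a flow once it ends or times out."""
        self.flows[flow[0]].discard(flow)


lim = ConnLimiter(limit=2)  # hypothetical cap; real per-CPE limits would be far higher
flows = [("10.0.0.20", 50000 + i, "198.51.100.7", 443) for i in range(3)]
print([lim.open(f) for f in flows])  # [True, True, False]
```

A cap like this protects the shared table from one runaway client, but set too low it would produce exactly the "works once, then fails" pattern, so it is worth checking any existing limits too.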

TomS_
Git-r-done
MVM
join:2002-07-19
London, UK

TomS_ to voxframe

oooooo PPPoE ....

What MTU do you set on your PPP sessions for your PPPoE customers?

Unfortunately, PPPoE tends to require a small reduction in MTU on the PPP session to accommodate some extra headers. This is typically 8 bytes, so you need to set the MTU to 1492 instead of leaving it at the default of 1500. This should be set on both sides: the customer CPE and your BRAS/LNS.

There is a reasonable chance this is contributing to, if not causing, these issues, though it is typically also associated with web pages that can't be opened or that seem to stall after partially loading.
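The 1492 figure falls out of simple arithmetic: PPPoE adds a 6-byte PPPoE header plus a 2-byte PPP protocol field inside each Ethernet frame. The sketch below also derives the matching TCP MSS, assuming 20-byte IPv4 and TCP headers with no options.

```python
IPV4_HEADER = 20     # bytes, assuming no IP options
TCP_HEADER = 20      # bytes, assuming no TCP options
PPPOE_OVERHEAD = 8   # 6-byte PPPoE header + 2-byte PPP protocol field


def pppoe_mtu(ethernet_mtu: int = 1500) -> int:
    """Largest IP packet that fits in one PPPoE session frame."""
    return ethernet_mtu - PPPOE_OVERHEAD


def tcp_mss(ip_mtu: int) -> int:
    """Largest TCP payload per segment for a given IPv4 MTU."""
    return ip_mtu - IPV4_HEADER - TCP_HEADER


mtu = pppoe_mtu()
print(mtu, tcp_mss(mtu))   # 1492 1452
print(tcp_mss(1500))       # 1460: what a host assumes on plain Ethernet
print(tcp_mss(1400))       # 1360: jcremin's more conservative setting
```

A host that doesn't know about the PPPoE hop will advertise an MSS of 1460, which is why clamping the MSS on SYNs at the concentrator is a common companion to lowering the MTU.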
jcremin
join:2009-12-22
Siren, WI

Member

said by TomS_:

This is typically 8 bytes, so you need to set the MTU to 1492 instead of leaving it at the default of 1500.

I find that I still have issues with PPPoE even when I set it to 1492. Certain things just don't load properly. I tried a bunch of numbers, and finally just decided to set every client's MTU to 1400 and haven't had an issue since.

TomS_
Git-r-done
MVM
join:2002-07-19
London, UK

I guess it depends on whether you have other layers of encapsulation within your network, and/or beyond your network.

But under normal conditions, PPPoE = MTU 1492.

I used to provision my network to support the highest MTU possible. For most of my routers this was 1600 bytes, which made things like MPLS possible without eating into the payload MTU. Unfortunately, most cheapie broadband CPE doesn't seem to grasp the concept that MTU can be higher than 1500, so we had to live with 1492 for them.

Inssomniak
The Glitch
Premium Member
join:2005-04-06
Cayuga, ON

I'm at 1480 for MTU and have been for years with PPPoE. That leaves me with 4 bytes of wiggle room after MPLS/VPLS and PPPoE encapsulation.
Some of the hardware I have is low in the L2MTU department.
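When encapsulations stack up (PPPoE, MPLS labels, VPLS), the MTU budget is just subtraction. A rough sketch that counts every layer against one link MTU figure and ignores the outer Ethernet header; the two-label MPLS stack and the link sizes in the example are illustrative assumptions, not anyone's actual network.

```python
from typing import Dict

MPLS_LABEL = 4  # bytes per MPLS label
PPPOE = 8       # bytes: PPPoE header plus PPP protocol field


def headroom(link_mtu: int, ip_mtu: int, overheads: Dict[str, int]) -> int:
    """Bytes left on a link after an ip_mtu-sized packet plus every encapsulation layer.

    A negative result means full-size packets won't fit and will be fragmented or dropped.
    """
    return link_mtu - ip_mtu - sum(overheads.values())


# A 1600-byte core carries full 1500-byte packets plus a (hypothetical) two-label MPLS stack
print(headroom(1600, 1500, {"mpls_labels": 2 * MPLS_LABEL}))  # 92
# A 1500-byte link can't carry a 1492 session plus PPPoE and two labels
print(headroom(1500, 1492, {"pppoe": PPPOE, "mpls_labels": 2 * MPLS_LABEL}))  # -8
```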

Semaphore
Premium Member
join:2003-11-18
101010

Semaphore to voxframe

said by voxframe:

LMAO!
Right now I'm at a stuck point as I don't have a suitable location with the problem that I can take some time to really break down the issue and packet sniff. Damn.

Good to hear. I use the 1100AHx2 at our POPs and it works well. 5.24 everywhere.

For sniffing, I use Mikrotik's built-in Packet Sniffer with streaming to a Linux box running Wireshark. You need to disable TZSP interpretation in Wireshark. I disable promiscuous mode on the sniffer too. »wiki.mikrotik.com/wiki/E ··· ireshark It works like a champ: I can stream packet captures from anywhere right to my desktop.
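The streaming sniffer sends each captured frame wrapped in TZSP over UDP. If you want to inspect the stream without Wireshark, a minimal decapsulation sketch looks like this. It assumes the default TZSP port 37008 and handles only the basic header: a version byte, a type byte, a 2-byte encapsulated-protocol field (1 is Ethernet), then tagged fields terminated by an END tag.

```python
import socket
import struct
from typing import Tuple

TZSP_PORT = 37008  # assumption: the sniffer's streaming target left at the default TZSP port
TAG_PADDING, TAG_END = 0x00, 0x01


def parse_tzsp(datagram: bytes) -> Tuple[int, bytes]:
    """Strip the TZSP header and return (encapsulated_protocol, inner_frame)."""
    if len(datagram) < 4:
        raise ValueError("short TZSP header")
    version, _type, proto = struct.unpack("!BBH", datagram[:4])
    if version != 1:
        raise ValueError(f"unexpected TZSP version {version}")
    i = 4
    while i < len(datagram):
        tag = datagram[i]
        if tag == TAG_END:
            return proto, datagram[i + 1:]  # everything after END is the captured frame
        if tag == TAG_PADDING:
            i += 1  # padding is a single byte with no length field
        else:
            if i + 1 >= len(datagram):
                raise ValueError("truncated tag")
            i += 2 + datagram[i + 1]  # skip tag byte, length byte, and value
    raise ValueError("no TZSP end tag")


def listen(port: int = TZSP_PORT) -> None:
    """Print a one-line summary of each mirrored frame (Ctrl-C to stop)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    while True:
        data, (sender, _) = sock.recvfrom(65535)
        proto, frame = parse_tzsp(data)
        print(f"{sender}: encapsulated proto {proto}, {len(frame)} bytes")
```

Calling `listen()` on the box the sniffer streams to gives a quick sanity check that captures are arriving before digging into them in Wireshark.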

I'm sure we have a few hundred Apple TV customers. Hulu and Netflix too. (Gawd it used to be soooo much easier to do this job and make a buck before all this streaming crap).

Tiered cached DNS like you do, plus cascaded HTTP caches. Almost everything is Mikrotik.

One leg of the network still backhauls over UBNT Rockets 'naked'. No PPPoE anywhere, though: MPLS and some VLANs. I haven't heard about this type of problem from any of our subs.

I'd be makin' ugly faces towards the CPE or the PPPoE concentrator myself.

I would think an MTU problem would manifest itself full-time. This looks to me like either DNS recursion or port exhaustion... mind you, I could be blowin' smoke out my tailpipe too.

BTW, back when I used PPPoE, I found that 1492 was the magic number too.
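Port exhaustion, one of the two suspects above, can be illustrated with a toy pool in which each flow holds a port until it has been idle for a timeout. The pool size and timeout are deliberately tiny, hypothetical values; real tables hold far more entries, with timeouts that vary by protocol and connection state.

```python
from typing import Dict, Tuple

Flow = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


class PortPool:
    """Toy NAT port pool: each flow holds a port until it has been idle for `timeout` seconds."""

    def __init__(self, size: int, timeout: float) -> None:
        self.size = size
        self.timeout = timeout
        self.last_seen: Dict[Flow, float] = {}  # flow -> time of its last packet

    def _expire(self, now: float) -> None:
        stale = [f for f, last in self.last_seen.items() if now - last >= self.timeout]
        for f in stale:
            del self.last_seen[f]

    def allocate(self, flow: Flow, now: float) -> bool:
        """True if the flow has (or just got) a port, False if the pool is exhausted."""
        self._expire(now)
        if flow in self.last_seen or len(self.last_seen) < self.size:
            self.last_seen[flow] = now
            return True
        return False


pool = PortPool(size=3, timeout=60.0)  # hypothetical, deliberately tiny numbers
flows = [("10.0.0.20", 50000 + i, "198.51.100.7", 443) for i in range(4)]
print([pool.allocate(f, now=0.0) for f in flows[:3]])  # [True, True, True]
print(pool.allocate(flows[3], now=10.0))  # False: every port is still held
print(pool.allocate(flows[3], now=61.0))  # True: the idle entries have expired
```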

battleop
join:2005-09-28
00000

battleop to jcremin

Member

If you are a service provider, you should not be using NAT for any reason. Time and time again people come here with all kinds of problems that are directly related to a provider trying to cheap out and NAT their customers. Until recently, IPs have always been cheap, and if you are really doing this right, as a business and not a hobby trying to make a few bucks, they have never been that hard to come by.

Semaphore
Premium Member
join:2003-11-18
101010

said by battleop:

If you are a service provider, you should not be using NAT for any reason. Time and time again people come here with all kinds of problems that are directly related to a provider trying to cheap out and NAT their customers. Until recently, IPs have always been cheap, and if you are really doing this right, as a business and not a hobby trying to make a few bucks, they have never been that hard to come by.

So you what? Bridge their router and assign a separate ARIN IP to every device on their premises? That must be a nightmare to manage. In one form or another, NAT happens. I've seen BIG companies with global-scale B2B connections running quadruple bidirectional NAT. It blows trying to troubleshoot that, but it works. NAT's been around for over 20 years. The ALGs are often not up to date with the protocols, and that sucks, or worse, vendors' implementations are not RFC-compliant, but it's a reality of life. It's not an excuse to say it doesn't work, although it may be a reason.
voxframe
join:2010-08-02

Member

I don't want this to degenerate into a "NAT sucks" thread.

I hate NAT to be honest, but it's what I'm stuck with, like it or not.
As I said though, this happens to clients that we assign static external addresses to as well.
BUT we do that through a 1:1 NAT to the CPE. The CPE still has its own NAT to the client's internal network, but it's not a shared NAT, so in theory the overloading issue shouldn't exist...

Going to try some NAT-related stuff today and see if I can knock off some other issues.
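For contrast with the shared overload case, a 1:1 NAT is just a static address swap: each inside address owns one public address and ports pass through untouched, so customers can't collide on ports. A minimal sketch with made-up addresses; the CPE's own NAT to the client LAN would sit behind this.

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class OneToOneNat:
    """Static 1:1 NAT: each inside address owns one public address; ports pass through."""

    def __init__(self, mapping: Dict[str, str]) -> None:
        self.out_map = dict(mapping)  # inside IP -> public IP
        self.in_map = {public: inside for inside, public in mapping.items()}

    def outbound(self, src: Endpoint) -> Endpoint:
        ip, port = src
        return (self.out_map.get(ip, ip), port)  # unmapped addresses are left alone

    def inbound(self, dst: Endpoint) -> Endpoint:
        ip, port = dst
        return (self.in_map.get(ip, ip), port)


# Made-up addresses: one customer CPE and the public address assigned to it
nat = OneToOneNat({"10.0.0.20": "203.0.113.50"})
print(nat.outbound(("10.0.0.20", 49152)))   # ('203.0.113.50', 49152)
print(nat.inbound(("203.0.113.50", 49152)))  # ('10.0.0.20', 49152)
```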
jcremin
join:2009-12-22
Siren, WI

jcremin to battleop

Member

said by battleop:

If you are a service provider, you should not be using NAT for any reason.

That's a pretty strong opinion. I'm guessing I'll probably go to hell for running a bridged network too.

The fact of the matter is that not all of us can easily get huge blocks of IP addresses, and MOST customers have no reason for a dedicated IP. Out of my 300ish accounts, about 280 are running behind a single public IP. If I had thousands of addresses, I'd think about giving everyone their own IP, but I don't. I have one /24, and I use 50% of it for my network equipment, which is far more critical for me to be able to access remotely than some grandma who wants to get on Facebook. Almost every customer in the world has NAT on their wireless router. Many of us just add a second layer of NAT on top of that, and I've not seen or heard of any issues other than an occasional gamer or someone who wants to access their security system or weather station remotely.
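jcremin's numbers give a rough sense of the port budget behind one public IP. A back-of-the-envelope sketch, assuming the NAT may use ports 1024-65535 and that every concurrent flow needs its own external port; some NATs can reuse an external port for flows to different destinations, so this is closer to a worst case. TCP and UDP each have their own port space.

```python
def ports_per_customer(customers: int, public_ips: int = 1,
                       low: int = 1024, high: int = 65535) -> int:
    """Concurrent flows per customer if one protocol's NAT port range is split evenly.

    Worst case: assumes every flow needs its own external port on each public IP.
    """
    usable_ports = (high - low + 1) * public_ips
    return usable_ports // customers


print(ports_per_customer(280))  # 230 simultaneous TCP flows each, behind one IP
```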