andys03
join:2014-01-13
usa

1 edit

andys03

Member

Is AT&T blocking VOIP traffic?

I have a customer who has a VOIP PBX connected to a Level 3 fiber connection. He has offices all across the country using different ISPs. Two of those offices use AT&T, both in different states. One is on a T1, the other on DSL. For the past week, every day around noon EST, both AT&T sites have one-way voice issues where they cannot hear the other side. It seems to last the rest of the day, and the next morning, things work fine again until noon. The phone logs show they are not receiving the RTP stream. All the other non-AT&T sites work fine. I had them try connecting their phones to other systems on other ISPs (one on Comcast, one on Level 3, and one on Megapath) with no success. I had them plug a phone directly into the T1 router with a public IP, bypassing the firewall, with no success. I've changed the SIP and RTP ports to non-standard ports without success.

I'm coordinating with them to set up a packet capture on the far end while I do a packet capture on the PBX, but I was hoping to find out if others have encountered one-way voice issues with AT&T recently, and if so, how you were able to resolve the problem.

garys_2k
Premium Member
join:2004-05-07
Farmington, MI

garys_2k

Premium Member

said by andys03:

The phone logs show they are not receiving the RTP stream.

So, is it one way audio, or no audio at all?
said by andys03:

I had them try connecting their phones to other systems on other ISPs with no success.

Different ISP, same problem?
jobr
join:2004-10-21
Halifax, NS

jobr to andys03

Member

to andys03
Are you saying there are only audio problems around noon? If so, how long do they last? And do the problems start exactly at noon? (Try a call at 11:59 and one at 12:01.)

Is there anything about the two problem sites that's unique to them other than AT&T? Same router, same type of phones, etc?
jobr

jobr to garys_2k

Member

to garys_2k
said by garys_2k:

Different ISP, same problem?

I interpreted this to mean that the phones were connected to a different PBX, which is not on Level 3.
andys03
join:2014-01-13
usa

andys03

Member

Sorry, to clarify, the problem starts at noon and goes on until the next business day. The next morning, things are fine again until noon.

We also had them connect to PBXes on different ISPs. One on Comcast, one on Megapath and one on Level 3, all with the same result.
Mango
Use DMZ and you get a kick in the dick.
Premium Member
join:2008-12-25
www.toao.net

Mango

Premium Member

That is a very curious problem.

I'm sorry I don't have any advice (I would have suggested a packet capture if you weren't already planning that) but please do come back and let us know what the problem was, if you have the time.
andys03
join:2014-01-13
usa

andys03

Member


I've been doing this for 8 years and never saw anything quite like this. I've run into ISPs enabling SIP ALG in the past, but changing ports and/or talking to the NOCs fixed the problem. I guess I won't know for sure until I compare packet captures of the PBX with the phone.
Stewart
join:2005-07-13

Stewart to andys03

Member

to andys03
If a packet trace does not immediately reveal the problem, please post:

On a normal external call, does the incoming RTP come from the PBX, the trunking provider, or the carrier? Is it initially set up that way, or are re-invites involved?

On a normal internal call, does the audio come directly from the other phone, get hairpinned in a NAT router, or get relayed by the PBX? Are internal calls affected by this problem?

Does the problem always start at a precise time (presumably related to a scheduled routing or other policy change), or is it approximate (possibly related to exhaustion of some resource)?

When the problem starts, are calls already in progress also affected?

It would be useful to see a pre-failure trace, for comparison with the failing case.
andys03
join:2014-01-13
usa

1 recommendation

andys03

Member

I was finally able to get 2 sets of packet captures today, taken simultaneously on the phone and the PBX. To make a long story short, it is not an ALG issue. Everything was normal in the morning, but in the afternoon, the phone did not see the incoming RTP stream from the PBX. It received all other traffic from the PBX, just not RTP. The captures from the phone system did show incoming and outgoing RTP, which is what you'd expect. In any case, something upstream from their network is blocking the incoming RTP stream in the afternoons and evenings. Very weird.
kaila
join:2000-10-11
Lincolnshire, IL

kaila

Member

said by andys03:

...In any case, Something upstream from their network is blocking the incoming RTP stream in the afternoons and evenings. Very weird.

Have you gone over their LAN router/firewall settings? Also, do they have an employee that comes in around noon?

It's possible they have some other device (or app) inside the LAN which reliably wakes up around noon and tells the router/firewall (via port triggering) to aim RTP media at said device/app. In the evening, when the device or app goes home/shuts down/sleeps, the router/firewall eventually closes the dynamic forwarding and all is well, at least until noon the next day.
Stewart
join:2005-07-13

Stewart to andys03

Member

to andys03
It should be easy to see what's going wrong (though it may not be easy to fix).

My assumptions (please correct those that are wrong):
1. PBX always proxies media.
2. No re-invites during call setup.
3. PBX on a static public IP address.
4. Phones behind a NAT, with one static public IP shared by all phones at the location.
5. No VPNs involved.
6. The phone's capture would show all RTP sent to its IP address, even if the port numbers, payload type, etc. were wrong.

There are several ways this system could have been designed. Please state which one, or explain in detail if otherwise.

1. Phones are not NAT aware; NAT router has a SIP ALG.
2. Phones are not NAT aware; PBX sends RTP back to the address and port from which it came.
3. Phones do NAT mapping and are assigned unique RTP port ranges. Phones have static private addresses and the NAT router forwards each port range to the respective phone.
4. Phones do NAT mapping and are assigned unique RTP port ranges. The router does not translate the UDP source port number, unless that port is already in use. Other devices sharing the NAT are well behaved, such that the port is never already in use.

On a failing call, for both the phone and PBX captures, please post:
1. Media IP address and port specified by the SDP in the final INVITE.
2. Source address and port of the outbound RTP.
3. Destination address and port of the inbound RTP (at PBX only).
(Mask public IP addresses and anything else you consider sensitive.)
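For anyone following along, the comparison asked for above can be sketched in a few lines of Python. This is only an illustration, not anything from the Allworx or Wireshark tooling: the function name `sdp_audio_endpoint` and the sample address 203.0.113.10 are made up, and it assumes you have copied the SDP body out of the INVITE or 200 OK as text.

```python
def sdp_audio_endpoint(sdp):
    """Return (ip, port) that an SDP body asks the far end to send audio to.

    Hypothetical helper: a media-level c= line inside the first audio
    section overrides the session-level c= line. Returns None if the
    SDP has no audio section.
    """
    session_ip = media_ip = port = None
    section = None  # None until the first m= line is seen
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <formats...>
            fields = line[2:].split()
            section = fields[0]
            if section == "audio" and port is None:
                port = int(fields[1].split("/")[0])
            elif section == "audio":
                section = "audio-extra"  # ignore any later audio sections
        elif line.startswith("c="):
            # c=IN IP4 <address>
            addr = line[2:].split()[2].split("/")[0]
            if section is None:
                session_ip = addr
            elif section == "audio" and media_ip is None:
                media_ip = addr
    if port is None:
        return None
    return (media_ip or session_ip, port)


def matches_rtp_source(sdp, rtp_src_ip, rtp_src_port):
    """True if the advertised media endpoint equals the observed RTP source."""
    return sdp_audio_endpoint(sdp) == (rtp_src_ip, rtp_src_port)
```

If the advertised endpoint and the source of the outbound RTP (as seen at the PBX) disagree, the PBX may be sending media somewhere the NAT will not deliver it, which is one way to get exactly this kind of one-way audio.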

Does RTP for internal calls go through the NAT? Through the PBX? Do they also show the trouble?
andys03
join:2014-01-13
usa

1 edit

andys03

Member

@Stewart and Kaila:
I've tested phones outside the firewall, connected directly to the T1 router and giving it a public address. It is definitely not a NAT or firewall issue.
Further: it happens in 2 different offices in 2 different states every day at the same time. Both offices are on AT&T's network. One has DSL and the other has a T1 (no other broadband is available at either location). The former uses the AT&T modem as its router, the other uses a Linksys firewall. None of their other offices, which have broadband with other carriers, ever have this problem.

Comparing packet captures from when there was two way audio and one way audio, they are almost identical in every way except that the phone sees the RTP stream when it works and does not see the RTP stream when it doesn't.

Also, connecting them to other PBXes on other ISPs does not help.

The packet captures are between two extensions, one in an AT&T office and one in another office. The PBX is the media server. However, the issue is 100% repeatable whether extension to extension, external, incoming, outgoing, checking voicemail, etc.
I've eliminated every single network, NAT and firewall issue I can think of. I also spent a good amount of time with engineers for the PBX's manufacturer poring over the packet capture. It has to be something on AT&T's network. It is the only thing these two offices have in common.
Stewart
join:2005-07-13

Stewart to andys03

Member

to andys03
I am certain that this problem does not affect all of AT&T; I have several contacts using Callcentric over AT&T DSL and U-Verse, from whom I have successfully received calls during afternoon hours, last week and this.

I am reasonably certain that the problem does not affect much of AT&T; if it did there would be numerous complaints in this forum. Though nontechnical users might not recognize such trouble as an ISP issue, we would have seen lots of posts "Is provider xxx down?" or "My IP phone model yyy stopped working."

I'm pretty sure that Hanlon's razor applies here; given your testing with non-standard ports, intentional blocking would require DPI. Even then, it would be simpler to just affect the signaling.

IMHO, one or more of your early tests were performed and/or observed incorrectly. The folks in the remote office are probably not very VoIP savvy and were under a lot of pressure to restore service. It's easy to make a mistake.

A hypothetical example of a flawed test: The phones normally do NAT traversal, with the external public IP part of the provisioning info. When one was tested on a (different) public IP, they forgot to turn off NAT mapping in the phone. The signaling works normally, but there is no incoming audio, because the phone is telling the PBX to send it to the wrong address.

Your test descriptions are too vague to audit. For example, when you tested with a different PBX, if that was another one of yours, it could have shared a faulty database, firewall config, etc. If I suspected that the ISP was blocking or corrupting all SIP, I'd test e.g. from a softphone to a Callcentric account, i.e. a setup with almost nothing else in common. Or, I'd rig a mobile hotspot or other temporary Internet connection, to test with only the ISP changed.

Answering some of the questions in my previous post may help you find the trouble.
Stewart

Stewart to andys03

Member

to andys03
Another thought: Have you tried using a VPN from an AT&T-fed site to the PBX or to a non-AT&T site as a workaround? If it works, it is fairly strong evidence that the problem lies with AT&T and should allow tracking it down at your leisure.

If VPN doesn't help, in addition to pointing the finger away from AT&T, it provides a way to capture traffic "on the WAN side of the router", which is a useful intermediate point in this situation.
andys03
join:2014-01-13
usa

andys03

Member

Yeah, but unfortunately none of their computers in either location have speakers or microphones. I'm waiting for them to get a headset so we can test. I did try testing with X-Lite and sending DTMF, but the results were inconclusive. Troubleshooting in general has been a nightmare as it took a week before someone in either office could plug a computer into the back of a phone so I could do packet captures.

I'll try and upload my pcaps and answer your questions later but I'm out of the office today.
andys03

andys03 to Stewart

Member

to Stewart
To answer your questions, most situations are 4: Phones do NAT mapping and are assigned unique RTP port ranges. The router does not translate the UDP source port number, unless that port is already in use. Other devices sharing the NAT are well behaved, such that the port is never already in use.
We do 3 (Phones do NAT mapping and are assigned unique RTP port ranges. Phones have static private addresses and the NAT router forwards each port range to the respective phone.) only when we encounter rare situations with a router in which SIP ALG cannot be disabled. Across our customers, we have nearly 1000 phones connecting remotely to their respective PBXes (more specifically, Allworx). The phones themselves are NAT friendly and rarely have issues. There has never been an issue with taking a phone from behind a NAT router, assigning it a static, public IP, and connecting it directly to the Internet connection for troubleshooting purposes. The phones are NAT aware and no settings need to be changed on either the phone or the PBX. You'll see a line in the phone's log file that will say something like:
tSuaMain: SUA: Public (NAT) address for 5118 - 50.248.x.x:5060
if it detects it is behind a NAT router.

When I finally was able to do the packet capture, we did two sets of tests. The first in the morning with two way audio, the next in the afternoon when things got funky. Both were from an extension at one of the AT&T offices to one of the extensions in the main office. That second phone is local to the PBX. I'll post results for both tests.

Working call:
1. From the 200 OK sent by the phone: IP: 12.12.x.x. Port: 16384
2. Source IP: 192.168.1.100 Port: 16384
3. From the PBX's capture, the Incoming RTP (received by PBX from phone) IP and source is: 12.12.x.x and 16384. The outgoing RTP (sent to phone by PBX) IP and destination is: 12.12.238.138 and 16384

One way audio call:
1. From the 200 OK sent by the phone: IP: 12.12.x.x port: 16384
2. Source: 192.168.1.100 Port: 16384
3. From the PBX's capture, the Incoming RTP (received by PBX from phone) IP and source is: 12.12.x.x and 16384. The outgoing RTP (sent to phone by PBX) IP and destination is: 12.12.238.138 and 16384

I've also attached a screenshot of the VOIP flow from Wireshark. Working example on the left, broken on the right.

Now, I'm well aware of the improbability of this affecting only my customer. I've seen things like ISPs enabling SIP ALG on their modems without informing customers. I've also seen Comcast ship out a modem that drops packets and adds latency if the customer has a /29 block of IPs*. However, I've never seen anything quite like this. I've tested every possibility I could both think of and walk the customer through (again, it took a week just to get them to plug a computer into the phone).
I also had the Allworx engineers look over the pcaps several times. The only thing we can say is that it appears that in the afternoon, something upstream from both offices is blocking the RTP packets from the PBX.

In any case, our customer has a ticket open with AT&T and we hope to be troubleshooting with them soon. It's already been escalated up a few notches.

*The Netgear cg3000dcr, if you are interested. Even pinging the router locally would return latency up over 100ms and 5% packet loss. Comcast stopped using them after a year or so.
Stewart
join:2005-07-13

Stewart to andys03

Member

to andys03
Thanks for the captures. Boy, I'm stumped.
said by andys03:

Both were from an extension at one of the AT&T offices to one of the extensions in the main office.

I'm guessing you meant to say the opposite, which would be consistent with the rest of your post. Otherwise, the captures were taken in the wrong place!

While I can certainly imagine a faulty ALG being installed in a DSL modem/router, the fact that the T1 is also affected boggles the mind. Is any of that CPE even under AT&T control? Could they possibly be doing it at the AT&T end of the T1?

Tell me more about the inconclusive test. If the X-Lite user heard the IVR before sending his DTMF, then the discussed trouble did not affect him (though there could have been another problem with the call). Or, was he lacking both speaker and mic? If a desktop doesn't have a speaker or mic connected, surely someone in the office has earbuds that they use with their smartphone. You can plug them into the green speaker jack. Confirm they are working by hearing sounds from the OS. Confirm that the softphone audio is working by hearing ringback tone, etc. For a makeshift mic, plug another pair of earbuds into the red microphone jack. Speak loudly into the left earpiece.

How does the "plugging a computer into the phone" work? Does all traffic from the network get replicated to that port? For example, if the RTP got sent to a port other than 16384, would you see it?

Do the packets sent by the phone "smell" as if coming from behind a NAT? For example, are there private IP addresses in the Via or Contact headers?
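As a rough illustration of that check, here is a minimal Python sketch that scans the Via and Contact headers of a SIP message for RFC 1918 addresses. The function name `private_addrs_in_sip` is invented for this example, it only looks at IPv4, and it assumes the message has been exported from the capture as plain text.

```python
import ipaddress
import re

# RFC 1918 private ranges, listed explicitly rather than relying on
# ipaddress.is_private, which also covers other special-purpose blocks.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
HEADERS = ("via", "v", "contact", "m")  # "v" and "m" are the compact forms


def private_addrs_in_sip(message):
    """Return [(header_name, address)] for private IPv4 addresses found
    in the Via/Contact headers of a SIP message given as text."""
    found = []
    for line in message.splitlines():
        if not line.strip():
            break  # a blank line ends the headers; the SDP body follows
        name, sep, value = line.partition(":")
        if not sep or name.strip().lower() not in HEADERS:
            continue
        for text in IPV4.findall(value):
            try:
                addr = ipaddress.ip_address(text)
            except ValueError:
                continue  # e.g. an octet above 255
            if any(addr in net for net in RFC1918):
                found.append((name.strip(), text))
    return found
```

A private address showing up here in a message captured at the PBX would suggest nothing in the path rewrote the signaling; a public address where the phone itself sent a private one would point toward an ALG.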

Have you tried a VPN as a workaround? Or, do you have some other workaround in place? Can you rig a way to capture traffic on the DSL or T1 line?

I'll post again if I can think of anything. This case is truly bizarre. I don't believe that AT&T is being malicious; they would have little to gain and lots to lose. While a dysfunctional ALG is plausible, it seems very strange to have been installed on two such different networks at the same time, yet not be affecting lots of other customers.

battleop
join:2005-09-28
00000

battleop to andys03

Member

to andys03
If it's a problem on AT&T's side it's not intentional. I have hundreds of Adtran TA900s on AT&T MIS Circuits that are terminating SIP accounts without trouble.
battleop

battleop to andys03

Member

to andys03
I've worked around the Comcast issue with a GRE tunnel in the past. Do you have the ability to create a tunnel from the Cust Prem back to the PBX's network?
andys03
join:2014-01-13
usa

andys03 to Stewart

Member

to Stewart
The packet captures were from an extension in the main office dialing the extension in one of the AT&T offices. I did simultaneous captures from the phone in the AT&T office and the WAN port of the PBX. There's a second Ethernet port on the back of the phone. We can mirror all traffic on the phone's network port to that second port and capture it by plugging a computer into the phone, installing Wireshark, etc. It's functionally the same as plugging in a hub or managed switch and mirroring ports. As long as traffic is directed to that phone's IP address, I'll see it, no matter what port it uses. What makes it strange is that when we stop receiving RTP traffic, I still see other traffic from the PBX, such as on UDP 2088, all SIP messaging, etc. It's ONLY the RTP packets that disappear. Also, I've tried defining non-standard RTP port ranges to use, like (25111 - 25120), and it made no difference.
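To illustrate what "looks like RTP regardless of port" means, here is a small Python sketch of the fixed 12-byte RTP header check. It is a heuristic only, since RTP has no magic number: it tests for version 2 and skips payload types 72-76, which collide with RTCP packet types 200-204. The name `parse_rtp_header` is made up for this example, and it assumes you already have the raw UDP payload bytes from a capture.

```python
import struct


def parse_rtp_header(payload):
    """Return the fixed RTP header fields as a dict, or None if the UDP
    payload does not look like an RTP version 2 packet (heuristic)."""
    if len(payload) < 12:
        return None  # the fixed RTP header is 12 bytes
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", payload[:12])
    if b0 >> 6 != 2:
        return None  # top two bits carry the version, which must be 2
    payload_type = b1 & 0x7F
    if 72 <= payload_type <= 76:
        return None  # second byte 200-204 is an RTCP packet type
    return {
        "payload_type": payload_type,
        "marker": bool(b1 >> 7),
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

Running every inbound UDP payload addressed to the phone through a check like this would show whether the stream is arriving on an unexpected port, as opposed to not arriving at all.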

The working and non-working RTP packets are nearly identical as far as I can tell. I can dig deeper if you like.

Regarding X-Lite, they do not have speakers or microphones. When I tried the DTMF test, I saw no RTP traffic at all after connecting into an IVR. I tried the same test on my workstation, which does have speakers but no mic, and saw the same thing. It may be a settings issue, so I can't rely on that test yet.

If we get the softphone thing worked out, I can set up a PPTP VPN from the workstation to the PBX. If that works, then it seems even more likely that something upstream is filtering out the RTP traffic.
andys03

andys03 to battleop

Member

to battleop
In theory yes, but not easily or quickly at the moment.
said by battleop:

I've worked around the Comcast issue with a GRE tunnel in the past. Do you have the ability to create a tunnel from the Cust Prem back to the PBX's network?

andys03

andys03 to Stewart

Member

to Stewart
Here's a comparison of the 200 OK packets sent by the phone. The working example is on the left.

I don't think that AT&T is being malicious per se. If that were the case, I'd expect that they would have this issue all the time, not just in the afternoon and evenings.
Stewart
join:2005-07-13

Stewart to andys03

Member

to andys03
Sorry, I can't see any meaningful difference.

Can you capture traffic on the WAN side of the NAT? Or, can you temporarily forward the RTP port range for one phone? Either will show if the NAT association for the RTP has somehow been lost (or is not being honored because it's address or port restricted and one of those somehow changed).

Do you have remote desktop or similar access to the PC behind the phone? That would probably be quicker and less error prone than depending on the staff at the remote site.

Please clarify: Are there multiple phones sharing the same 12.12 public IP? If so, is it just a coincidence that we are seeing ports 5060 and 16384, i.e. that the phone used for testing is the first one in the block?

Can you please post the graph and 200 ok for the failing call, captured at the PBX?

If you suspect that an ALG has somehow appeared, try setting up the softphone with no NAT traversal (private IP address in all SIP/SDP). Whether the call works or not, it's very likely that some fields would be butchered to a public IP, which a capture at the PBX would show. Even if it doesn't help you find the trouble, it would be a smoking gun to show AT&T.
Stewart

Stewart to andys03

Member

to andys03
It would be interesting to see what happens to a call that is in progress at noon. The RTP might suddenly stop flowing, stop flowing at the next re-invite (assuming that you are using SIP timers), or continue normally.
andys03
join:2014-01-13
usa

andys03 to Stewart

Member

to Stewart
I could capture traffic on the WAN side if I could get on site. It'd be a simple matter of throwing a switch between the NAT and the T1 router. I may try to get them to place a phone outside the router with a public IP again, but the first time was a disaster. The one laptop they had access to had a bad NIC port so we couldn't plug in to do a packet capture. In any case, the phones have enough logging capabilities that I can see that it was not receiving RTP when it was configured with a public IP.

We have remote access to PCs there. If they get a USB headset, I can set up a softphone per our previous conversations and do more testing without depending on the staff there.

All of their offices, both AT&T and others, have multiple phones, anywhere from 2 to 10. The two AT&T locations have 2 or 3. The first phone registers on 5060, the others use random ports. All phones stop working at the same time.

16384 is the first port in the RTP media range. Since that was the only call at the time, it was using that port.

Attached are the graph and 200 OK.

I'll try disabling NAT traversal next time we test. It may not be until Tuesday as I'll be at other customers until then. Hopefully we'll hear from AT&T before then.
andys03

andys03 to Stewart

Member

to Stewart
I'm guessing around noon, but we know for sure that calls work up until at least 11:30 AM EST and are not working as early as 1:00 PM. They do not have a heavy enough call volume at either office to know the exact time, unfortunately. If we could figure it out, it'd be great to park a call and pick it back up, transfer, etc. during that time.
Just for the hell of it, I'm running traceroutes in the AM and PM to see if the route changes.
andys03

andys03

Member

I found an area to investigate. Below are 3 traceroutes. The first two are from a computer sharing the same connection as the Allworx down to the Texas location, the first during the morning, the second in the afternoon. The 3rd is after the phones stopped working from Texas back to the Allworx.

Day

1 1 ms 1 ms 1 ms 192.168.250.1
2 2 ms 2 ms 2 ms 72.237.x.x
3 6 ms 6 ms 6 ms 4.26.22.105
4 7 ms 7 ms 6 ms ae-3-80.edge3.newyork1.level3.net [4.69.155.145]

5 8 ms 6 ms 7 ms ae-3-80.edge3.newyork1.level3.net [4.69.155.145]

6 14 ms 15 ms 18 ms att-level3.newyork1.level3.net [4.68.63.142]
7 45 ms 43 ms 43 ms cr2.n54ny.ip.att.net [12.122.130.170]
8 45 ms 40 ms 43 ms cr2.wswdc.ip.att.net [12.122.3.38]
9 44 ms 43 ms 44 ms cr1.attga.ip.att.net [12.122.1.173]
10 43 ms 43 ms 43 ms 12.122.2.146
11 * * * Request timed out.
12 * * * Request timed out.

Afternoon

C:\Users\andy>tracert 12.12.x.x

Tracing route to 12.12.238.138 over a maximum of 30 hops

1 1 ms 1 ms 1 ms 192.168.250.1
2 45 ms 2 ms 2 ms 72.237.x.x
3 6 ms 6 ms 6 ms 4.26.x.x
4 7 ms 6 ms 6 ms ae-3-80.edge3.newyork1.level3.net [4.69.155.145]

5 6 ms 6 ms 6 ms ae-3-80.edge3.newyork1.level3.net [4.69.155.145]

6 27 ms 28 ms 25 ms att-level3.newyork1.level3.net [4.68.63.142]
7 58 ms 55 ms 55 ms cr2.n54ny.ip.att.net [12.122.130.170]
8 56 ms 55 ms 54 ms cr2.wswdc.ip.att.net [12.122.3.38]
9 50 ms 54 ms 51 ms cr1.attga.ip.att.net [12.122.1.173]
10 61 ms 59 ms 59 ms 12.122.2.146
11 52 ms 53 ms 53 ms 12.123.153.105
12 77 ms 77 ms 78 ms 12.252.237.166
13 * * * Request timed out.
14 * * * Request timed out.

Afternoon Reverse
C:\Users\user51>tracert 72.237.x.x

Tracing route to 72.237.102.131 over a maximum of 30 hops

1 1 ms 1 ms 1 ms 192.168.1.254
2 1 ms 1 ms 1 ms 12.12.223.137
3 5 ms 5 ms 5 ms 12.89.57.117
4 7 ms 5 ms 6 ms cr1.dlstx.ip.att.net [12.122.100.26]
5 8 ms 6 ms 6 ms gar26.dlstx.ip.att.net [12.123.16.85]
6 * * 2264 ms 4.68.62.229
7 53 ms * * ae-2-52.edge2.Newark1.Level3.net [4.69.156.41]
8 58 ms 57 ms * ae-2-52.edge2.Newark1.Level3.net [4.69.156.41]
9 63 ms 63 ms 62 ms ADVANTAGE-V.edge2.Newark1.Level3.net [4.26.x.x6]
10 68 ms 68 ms 68 ms 72.237.x.x

The route from the PBX to the phone stays the same, going through AT&T in Atlanta, GA. However, the route back from the phone goes through Dallas, TX. I'm having them do traceroutes on Monday morning to see if the route is different when the phones are working.
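Comparing the morning and afternoon traces by eye gets tedious, so here is a small Python sketch that parses Windows `tracert` output like the above and lists the hops that differ. The function names `parse_tracert` and `diff_routes` are made up for illustration, and the parser assumes the address is the last thing on each hop line (masked octets written as `x` are accepted).

```python
import re

HOP = re.compile(r"^\s*(\d+)\s+(.*)$")
ADDR = re.compile(r"\[?((?:\d{1,3}|x)(?:\.(?:\d{1,3}|x)){3})\]?\s*$")


def parse_tracert(text):
    """Map hop number -> address string, or None if the hop timed out."""
    hops = {}
    for line in text.splitlines():
        m = HOP.match(line)
        if not m:
            continue  # banner, prompt, or blank line
        ttl, rest = int(m.group(1)), m.group(2)
        addr = ADDR.search(rest)
        hops[ttl] = addr.group(1) if addr else None
    return hops


def diff_routes(a, b):
    """Return [(hop, addr_in_a, addr_in_b)] for hops that differ."""
    changed = []
    for ttl in sorted(set(a) | set(b)):
        if a.get(ttl) != b.get(ttl):
            changed.append((ttl, a.get(ttl), b.get(ttl)))
    return changed
```

Keep in mind that a hop answering in one trace and timing out in another does not by itself prove the route changed, and that `tracert` probes with ICMP, so the path it reports is not guaranteed to be the one the RTP (UDP) packets take.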
Stewart
join:2005-07-13

Stewart to andys03

Member

to andys03
Did you solve this mystery? We all want to know whodunit.

If not, any more clues?
andys03
join:2014-01-13
usa

andys03

Member

Nothing yet. They finally got a headset for one of the computers and we tried testing with X-Lite. It didn't work, even when VPNing in, but it was hard to tell if that was due to a settings issue in X-Lite or something else. It seemed to indicate a codec mismatch, but I was on the road and didn't have time to do much more. We are going to follow up on Friday morning, when the phones are working and I've had more time to play with a SIP client on my laptop and confirm it works. I've also been begging their IT guy to follow up with AT&T but he hasn't done that yet.
Stewart
join:2005-07-13

Stewart

Member

For testing with X-Lite, I recommend using an old version 3.0, downloadable from many sites. IMO the newer "cloud based" stuff tries to do too much automatically, which makes it difficult to use for troubleshooting.

3.0 gives you clean control over NAT traversal (see screenshot) and codec selection (Options, Advanced tab).

If you cannot rule out the possibility that the PBXes are somehow part of the problem, try testing with one or more VoIP providers. Callcentric deals well with most NAT issues, including sending media back to whatever IP and port it comes from. A free account should suffice for this test; call 17771234567 to test outbound and use SIPBroker to test incoming.

AnveoDirect is the opposite, requiring proper NAT traversal at the client end. A free trial account will allow you many brief test calls to your mobile. When you view CDRs, there is a link to a SIP trace for each call. If you suspect a SIP ALG or other NAT traversal logic trying to "help" you, try setting X-Lite to "Use local IP address". The call should of course not work, because X-Lite will be telling Anveo to send media to a private address. However, you can then look at their SIP trace, which should be identical to what you capture at the client with Wireshark. If it's not, you'll have some solid evidence to show AT&T.