11.0 Diagnostic Tools and Procedures
Normally this test is intended to check the ping response of your first hop (your ATTIS aggregation router) over a period of time to look for high latency and/or packet loss. It is frequently requested by the techs in AT&T Direct.
1. The first step is to determine the IP of your first hop (that is the first hop outside of your LAN). To do this, you need to run a traceroute to a site such as yahoo.com. Excellent instructions on how to run a traceroute can be found in this FAQ: »Site FAQ »How do I run a traceroute and post the results in the forums?.
In the tracert example below, the first hop IP is 184.108.40.206.
2. Then your first hop IP must be entered into the command prompt or MS-DOS prompt window as ping -n 100 220.127.116.11 followed by the ENTER key. Your computer will ping that IP 100 times, and the ping results will scroll upwards and off your command prompt window.
3. Once the pinging has completed, you should copy the last four summary lines (the individual ping results are not needed) to your clipboard by left-clicking on the icon in the upper left hand corner of the command prompt window to open a menu. From the "Edit" sub menu, choose Mark. Using your mouse, drag the selection box from one corner to the other of the four lines of text you wish to copy. Once you have selected the text, press the ENTER key to copy the text to the clipboard.
An example of the summary lines might look like this:
4. You can now paste the extended ping summary results into a forum post by pressing CTRL-V while you are creating a new post.
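If you would rather pull the numbers out of the summary than read them by eye, the following is a minimal Python sketch that parses the four summary lines. The sample text is hypothetical (a documentation address and made-up values), and the function name parse_ping_summary is ours, not part of any tool mentioned in this FAQ. It assumes the English-language layout of the Windows ping summary.

```python
import re

# Hypothetical sample of the summary lines printed by Windows ping.
# The address and all numbers are made up for illustration.
SAMPLE_SUMMARY = """\
Ping statistics for 192.0.2.1:
    Packets: Sent = 100, Received = 98, Lost = 2 (2% loss),
Approximate round trip times in milli-seconds:
    Minimum = 8ms, Maximum = 41ms, Average = 11ms
"""


def parse_ping_summary(text):
    """Return packet counts, loss percentage and latency figures (ms)."""
    packets = re.search(r"Sent = (\d+), Received = (\d+), Lost = (\d+)", text)
    if packets is None:
        raise ValueError("no packet statistics found")
    sent, received, lost = (int(n) for n in packets.groups())
    result = {
        "sent": sent,
        "received": received,
        "lost": lost,
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
    }
    # The latency line may be missing, e.g. when no replies came back.
    times = re.search(
        r"Minimum = (\d+)ms, Maximum = (\d+)ms, Average = (\d+)ms", text
    )
    if times is not None:
        low, high, avg = (int(n) for n in times.groups())
        result.update(min_ms=low, max_ms=high, avg_ms=avg)
    return result


if __name__ == "__main__":
    print(parse_ping_summary(SAMPLE_SUMMARY))
```

Paste your own summary lines in place of SAMPLE_SUMMARY to get the loss percentage and latency figures as numbers.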
Pathping Gateway Test.
The gateway ping test described above assumes that there isn't a problem with packet transit to the gateway. If you suspect that the computer's network interface card (NIC), the router before the modem, the modem's Ethernet port, or the cables between these devices are defective, a gateway pathping test can be run to check for packet loss along that path.
To do this test, enter your first hop IP into the command prompt or MS-DOS prompt window as pathping -n 18.104.22.168 followed by the ENTER key. Your computer will then ping your gateway and all the hops in between 100 times each. The result of such a pathping test might look like this:
PingPlotter is a commercial program that can be very useful for monitoring latency and packet loss problems on the network. The paid versions allow a 30-day evaluation period. There is also a reduced-functionality freeware version of the program that can be used to record network problems as well. Its main limitations compared to the paid versions are its lack of latency-over-time graphs and the inability to save an image capture of the results from within the program.
Before PingPlotter can be used, there are some setting changes that need to be made to maximize the value of the data obtained. These three settings should be changed:
1. The number of times to trace should be set to Unlimited. You can manually stop the program when you have enough data to demonstrate the network problem.
2. The trace interval should be set to 1 - 5 seconds if a short duration (say up to 15 minutes) PingPlotter test is to be run. If an extended duration test is planned, say for hours or overnight, then a trace interval duration of 15 - 60 seconds would be more appropriate.
3. The samples to include should be set to ALL in order to show all the data captured by the entire run of the program.
Other useful setting changes:
4 and 5. The number of errors received and the Min and Max latencies can also be shown on the trace list by right-clicking on it and choosing Customize View and changing those settings.
6. On the paid versions of the program, a time chart for any hop can be added at the bottom of the display by double left-clicking on the hop. Right click on a time chart to choose Hide Graph. The first hop and the last hop graphs are usually of most interest.
To post an image of your results on a forum, you will have to save a copy of the image by selecting Save Image from the drop down menu of File if you are using a commercial version of the program. If you are using the freeware version, you can capture a copy of your computer screen to your clipboard by hitting the CTRL-PRT SC buttons simultaneously. Then open the Paint program, paste the screen image into it, and use Paint's cut-and-paste to edit the image to just a shot of the PingPlotter window. Whichever method you use, save the image in PNG format for an attachment to a thread.
Note: Packet loss on one intermediate-hop router does not usually cause packet loss at the end point router. An intermediate router showing packet loss may be set to give low priority to ICMP packets or to not respond at all. Packet loss that starts at an intermediate-hop router and then persists through the remaining hops to include the end point router does indicate a problem!
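The rule in this note can be sketched in a few lines of Python. This is only an illustration of the reasoning, not something PingPlotter does for you: the function name first_persistent_loss_hop and the idea of typing the per-hop loss percentages into a list are our own assumptions.

```python
def first_persistent_loss_hop(loss_by_hop, threshold_pct=0.0):
    """Return the 1-based hop where loss starts and then continues through
    every remaining hop including the last one, or None if the loss does
    not persist to the end point."""
    start = None
    for hop, loss in enumerate(loss_by_hop, start=1):
        if loss > threshold_pct:
            if start is None:
                start = hop
        else:
            # A clean later hop means the earlier loss did not persist.
            start = None
    return start


if __name__ == "__main__":
    # Hop 2 alone shows loss: probably just low ICMP priority on that router.
    print(first_persistent_loss_hop([0, 30, 0, 0, 0]))
    # Loss starts at hop 3 and carries through to the end point.
    print(first_persistent_loss_hop([0, 0, 5, 6, 4]))
```

A result of None means the loss is isolated to intermediate routers. A hop number means loss begins there and carries through to the destination, which is the pattern worth reporting.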
AT&T Speedtest and others are great for measuring the average or aggregate throughput speed over a period of time. NetMeter is good for showing the big picture of real-time, streaming speed changes averaged over 10-second intervals.
To measure throughput speeds on the individual TCP/IP packets, to analyse just how those speeds change over time, and to try to infer what might be causing slowdowns, you need to use the Wireshark program (formerly known as Ethereal). Wireshark is the world's foremost network protocol analyzer and can do much more than just throughput speed measurements, but the scope of this FAQ is limited to this small aspect of the program's capabilities.
1. Download and install the latest build for Wireshark and WinPCap (WinPCap, a helper program, is now built into the Wireshark download and installs as part of the whole package). If you have an older Ethereal and WinPCap version, the procedures presented in this FAQ will probably work, but with the program improvements to this network protocol analyzer, you should really consider upgrading.
2. Then you must decide which network interface you will use to capture packets. Select the Capture choice from the menu bar and then choose Interfaces. If you are lucky, you have a limited number of choices. This FAQ is intended for tests using wired connections, but wireless connections may also be tested. If you are confused by which one of multiple choices in the Wireshark: Capture Interfaces window to use, normally the interface your computer is connected to will be showing live packet traffic. Also the Details buttons can help you decide which NIC to use.
3. Once you decide on the NIC, you can set it to be the default NIC for the program by selecting it in the Capture -> Options -> Interface: choice box.
4. Start a packet capture by selecting Capture -> Start, which launches a packet capture window that shows real-time capture activity on various packet types. This window can be minimized after it is launched.
5. Then while the packet capture window is running, you must start a download or upload from or to a high speed site. One good site to test with is the Optimum Online 16 MB FTP download test.
6. Once the download or upload test is finished, terminate the packet capture window with the Stop button, and Wireshark will display a listing of all packets captured.
7. If you have captured packets from an FTP download or upload, scroll through the packet list until you find a line that has FTP-DATA listed under the Protocol column and highlight it by single clicking on the line.
8. Then from the menu bar select Statistics -> TCP Stream Graph -> Throughput graph to generate a graph of the data's throughput speeds. There is a separate graph control window that will be opened behind the graph. Each individual "+" mark is a single packet of data. Note: The throughput speeds are measured in bytes/second.
9. Wireshark doesn't have a built in image capture function to export the graphs to a file. You must either use an image capture program or hit the CTRL-PRT SCR buttons simultaneously. Then open the Paint program, paste the screen image into it, and use Paint's cut-and-paste to edit the image to just a shot of the throughput graph inside the window borders. Whichever method you use, save the image in PNG format for an attachment to a thread.
10. Save the Wireshark data file. Other tests can be run on the data at a later time.
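Because the graph reports bytes/second while line speeds are quoted in kbps, a quick conversion helps when reading it. The Python sketch below shows the conversion and a simple way to total captured bytes into one-second bins. It is a simplified illustration and not how Wireshark computes its graph. The function names are ours, the packet list is made up, and 1 kbps is assumed to mean 1000 bits per second.

```python
from collections import defaultdict


def bytes_per_sec_to_kbps(bytes_per_second):
    """Convert bytes/second to kbps, assuming 1 kbps = 1000 bits/second."""
    return bytes_per_second * 8 / 1000


def binned_throughput(packets, bin_seconds=1.0):
    """packets: list of (timestamp_seconds, size_bytes) tuples.
    Returns bytes/second for each consecutive time bin, starting at the
    first packet's timestamp. Empty bins are reported as 0.0."""
    packets = sorted(packets)
    if not packets:
        return []
    start = packets[0][0]
    totals = defaultdict(int)
    for timestamp, size in packets:
        totals[int((timestamp - start) // bin_seconds)] += size
    last_bin = max(totals)
    return [totals[i] / bin_seconds for i in range(last_bin + 1)]


if __name__ == "__main__":
    # 752000 bytes/second corresponds to a 6016 kbps line rate.
    print(bytes_per_sec_to_kbps(752000))
    # Hypothetical packets: (seconds since capture start, bytes).
    print(binned_throughput([(0.0, 1000), (0.5, 1000), (1.2, 500), (3.1, 1500)]))
```

So a graph baseline near 750000 bytes/second is what a 6016 kbps download profile looks like before protocol overhead is considered.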
Some examples of Wireshark throughput graphs collected on Elite (6016/768) speed lines with various conditions.
1. This is a normal test with no throughput problems. This level of "fuzziness" of the throughput speed distribution is normal and may reflect slight timing inaccuracies in the computer's clock. If the graph is zoomed in, the "fuzziness" will resolve into several jittery speed bands.
2. A packet delay (RTT increase) over the internet backbone from the OOL FTP speed test site interrupted the transmission. The resumption of the normal RTT caused the packets to have the characteristic curved ramp (recovery curve) back up to the full speed baseline after the sharp speed dip.
3. Test data captured during an evening peak usage, "exhausted" router slowdown. Throughput speeds were probably in the 3500-4000 kbps range at the time of this test. The distinctive feature during the slowdowns was the vertical "dripping" pattern of the successive packets slowing to even slower speeds. Note that there also appears to have been at least one packet delay or RTT increase that would have added to the throughput slowdown.
4. A test of a line syncing at 6016/768 kbps, but its aggregation or gateway router (BRAS or Redback) was incorrectly set to the 3000 kbps profile. This throttling in the gateway router reduces the average throughput on the line to about 2550 kbps, but the throughput speeds have periodic "drips" below and "spikes" above that "baseline" speed every second or so. This may be caused by the gateway router's buffer first filling and then emptying out over that drip-spike cycle time. Normally a Wireshark throughput test wouldn't have to be run to identify this particular problem of mismatched DSLAM and gateway speeds as it is well understood and widely known. This particular test was run to examine the action of the gateway router's buffers.
5. This is a test on a line with a defective DSLAM port card. The chart has the appearance of repeated packet delays and/or packet losses that are normally caused by TCP/IP layer problems. Ping test times on the line were normal and ping packet losses were 3% or less, but download throughput speeds were less than 20 kbps at times. An FTP download throughput speed test on this line was only about 2800 kbps. This problem was cured when the card was replaced. Notice how the line periodically manages to get up to the full speed baseline about every 12 - 15 seconds or so. This periodicity may be a clue to the problem cause.
6. Numerous packet interruptions were caused by intensive CRC error activity in this test. These CRCs are believed to have occurred when the ATM traffic rate was set faster than the ADSL rate causing ATM cells to be dropped. This then corrupts the AAL5 logical packets. See: »www.cisco.com/univercd/cc/td/doc ··· nfig.htm . This unusual CRC activity only occurred while using a non-AT&T supplied modem and the errors were not caused by impulse noise. Note the anomalously high speed "spikes". Maybe a buffer within the modem was emptying to give these +16000 kbps "spike" speeds?
7. A self-induced, throughput slowdown caused by saturating the upload stream on an obsolete modem, the 5360 Speedstream, which doesn't have ACK prioritization. The download speeds are throttled back due to ACK starvation which occurs when download packets have to be re-sent after the modem fails to ACK the FTP server in time. These download throughput speeds were only about 700-800 kbps during the upload duration and this is typical for a modem without any degree of ACK prioritization.
Primarily what you should be doing here is pattern matching your throughput graph to the example graphs that have been associated with known problems. Interpretation of these throughput graphs is a dark art and the understanding of them is evolving! It can be seen from some of the examples that different problems can cause similar looking graphs. Probably no graph by itself is conclusive proof of a particular problem, but the graphs should be used with other data to help narrow down the causes of a throughput problem.
Thanks to mktanamachi and tsarath for contributing charts with throughput problems.
Occasionally DNS name servers can have lookup delays even for commonly accessed (cached) addresses. Of course, if a sluggish name server has a cache miss and has to check for a name at the root server, then the lookup delay will be even longer. The lookup delay for common addresses consists of two parts: the RTT (the latency to the server) and the query time from the name server's cache. Because a ping measures only the first part, simply pinging a slow name server will not always show a problem when the server is sluggish in serving up IP addresses to your browser.
Even worse than a simple delay is a lookup query timeout, when the name server fails to respond within a second or two, or ever. Various OSes handle this differently, but WinXP will retry the lookup after one second and then begin trying additional nameservers with increasingly longer wait times. These timeouts can be a result of UDP packet loss en route to the nameserver, so they are not always due to problems at the nameserver itself; the routers before it may be at fault.
deblin has put together a small Windows (non-Windows users see Note 1), command line program (751 KB when zipped), ns_bench that tests the total lookup time using adns. It performs this lookup on a cached common address, google.com, through five iterations using the name server IPs supplied by you and then provides the average and the standard deviation of the five queries. It also reports any query timeouts (retries) that it sees during the test. Due to the way adns is programmed, a timeout is considered to be a lack of response within 2000 ms.
Ns_bench can be used to tell you if your primary or your secondary DNS servers are sluggish by comparing their lookup times to their ping latencies. It can also be used to compare the lookup times for AT&T Anycast name servers versus those of some other DNS providers such as OpenDNS.
If you wish, you could do some of what ns_bench does by installing dig for Windows instead, then manually sending the "dig A www.google.com @nameserver IP" command five times in a row at the command line for it, and then averaging the query times; however, because WinXP is limited to a 10 ms time resolution, you will not be able to measure the query time with the precision of ns_bench. Now dig is also a useful tool for more advanced DNS testing and everyone should consider adding it to their toolkit for that reason, but with ns_bench, you are not required to install dig.
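If you do go the manual dig route, the averaging can be done with a short Python sketch like the one below. It assumes you have saved the output of your five dig runs to text and that each run contains dig's usual ";; Query time: N msec" line. The function names are ours, the sample times are made up, and whether ns_bench itself uses the sample or the population standard deviation is not known to us, so the sample form used here is an assumption.

```python
import re
import statistics

# dig prints a line of this form at the end of each answer.
QUERY_TIME = re.compile(r";; Query time: (\d+) msec")


def query_times_ms(dig_output):
    """Extract every reported query time (in ms) from saved dig output."""
    return [int(ms) for ms in QUERY_TIME.findall(dig_output)]


def summarize(times_ms):
    """Return (average, sample standard deviation) of the query times."""
    average = statistics.mean(times_ms)
    spread = statistics.stdev(times_ms) if len(times_ms) > 1 else 0.0
    return average, spread


if __name__ == "__main__":
    # Hypothetical output from five runs; the times are made up.
    sample = "\n".join(
        f";; Query time: {ms} msec" for ms in (20, 22, 19, 21, 23)
    )
    print(summarize(query_times_ms(sample)))
```

Keep in mind the 10 ms timer resolution mentioned above: on WinXP the individual times fed into this calculation are already coarse.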
How to install and use ns_bench:
Download the ns_bench zipped file to a folder, such as My Documents, and unzip it. It will make an ns_bench\win32 sub-folder containing its executable program, ns_bench, and the cygwin1.dll file. The program and its components are self-contained within the sub-folder and do not install to or affect your registry.
This is a command line tool, and once it is unzipped, you can run it by right clicking on the ns_bench\win32 folder and selecting "Open Command Window Here" from the list if you have that particular Microsoft power toy installed (or see instruction 4a on the dig for Windows page for how to install "CMD Prompt Here" to your right-click menu).
Alternatively, click Start, then select Run, enter 'cmd' (or 'command' for Win98/ME) in the entry box, and hit the ENTER key. Then change the directory to where you unzipped it, e.g., if you downloaded to My Documents, enter 'cd \My Documents\ns_bench\win32' at the command line and hit the ENTER key.
If you see the "'ns_bench.exe' is not recognized as an internal or external command, operable program or batch file" error, then you are not running it from the directory (folder) that the ns_bench program is in.
To run the program, you simply enter 'ns_bench' followed by one or more DNS server IPs at the command line. An example of the program run to test the AT&T Anycast servers would be ns_bench 22.214.171.124 126.96.36.199 which would give results something like this:
A head-to-head shootout with OpenDNS nameservers would be entered as ns_bench 188.8.131.52 184.108.40.206 220.127.116.11 18.104.22.168 which would give results something like this:
proving that in this instance, the OpenDNS servers are NOT faster.
Proxy nameservers such as your router gateway IP or your Speedstream modem at 192.168.0.1 can also be tested with this tool and the latency they add can be compared to the direct access latency of primary/secondary nameservers.
If you make a txt file and place the following line of text in it:
"ns_bench 22.214.171.124 126.96.36.199 188.8.131.52 184.108.40.206"
and cut-and-paste the text between the quotes from the text file into the ns_bench command line, you can run the program by repeatedly pasting from the clipboard or you can install the batch program in Note 2.
Uses of ns_bench and interpretation of the results:
The use of ns_bench may fall into three categories:
1) Testing and evaluating properly operating nameservers to be able to select the fastest from among several for use as a primary or secondary nameserver;
2) Proving that a nameserver is sluggishly responding to queries (several hundreds of ms delay). This delay would be much greater than its ping latency, the RTT time to it; or
3) Showing that there is a major problem with a nameserver when it either fails to respond after 2000 ms OR that the UDP packets to the nameserver are simply lost in transit. This problem would be indicated by the number of retries that the tests show. Occasionally even the best nameservers will have a retry, especially during busy times of the day. If a nameserver has one or more retries on consecutive ns_bench tests, then that would indicate a problem and could cause browsing delays. The route to the nameserver should be tested for packet loss if it consistently times out (retries).
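The three categories can be turned into a rough decision rule, sketched below in Python. This is our own illustration and not part of ns_bench: the function name, the idea of entering results by hand, and the 200 ms margin for "sluggish" (standing in for "several hundreds of ms") are all assumptions you should adjust.

```python
def classify_nameserver(avg_lookup_ms, ping_rtt_ms, retries_per_run,
                        sluggish_margin_ms=200):
    """Classify one nameserver from hand-entered test results.

    avg_lookup_ms   -- average lookup time reported for the server
    ping_rtt_ms     -- ping latency (RTT) to the same server
    retries_per_run -- retry counts from consecutive test runs
    """
    # Retries on back-to-back runs point to timeouts or lost UDP packets.
    repeated = any(
        earlier > 0 and later > 0
        for earlier, later in zip(retries_per_run, retries_per_run[1:])
    )
    if repeated:
        return "timeouts"
    # Lookup time far above the ping latency points to a slow server.
    if avg_lookup_ms - ping_rtt_ms >= sluggish_margin_ms:
        return "sluggish"
    return "ok"


if __name__ == "__main__":
    print(classify_nameserver(25, 20, [0, 0, 0]))
    print(classify_nameserver(320, 20, [0, 1, 0]))
    print(classify_nameserver(25, 20, [1, 2]))
```

A single isolated retry is deliberately not flagged, since even good nameservers will occasionally have one during busy times.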
The effect of DNS lookup time on browsing:
Now a difference of say 20 ms versus 200 ms may not seem significant, but when an average web page might have 15 links that must be checked through the name server, the difference in browse delay time added by the name server lookup can jump from a relatively unnoticeable 0.3 seconds (15 links * 0.02 seconds) to a DNS cost wait of 3 seconds if serial connections are used!
Even worse can be the delay caused by DNS query timeouts. WinXP expects that the nameserver will respond within a second or it re-transmits the request and waits another two seconds. It then queries all the available DNS servers listed in the TCP/IP properties settings and waits two more seconds. After that, it will re-query all the name servers and wait four seconds and finally it will re-query all the servers and wait eight seconds, after which the WinXP DNS client times out: DNS query procedure. So if DNS queries to the primary name server failed every time on an average web page, the worst-case total delay could exceed 45 seconds under serial connections.
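The arithmetic in this section can be checked with a short Python sketch. The 15 lookups per page and the wait schedule are taken from the text above; the function names are ours. One reading of the 45-second figure, offered here as an assumption, is that each of the 15 lookups waits out the first two steps (1 + 2 = 3 seconds) before the other name servers are tried.

```python
LOOKUPS_PER_PAGE = 15  # the "15 links" figure used in the text

# Wait after each query step, in seconds, as described in the text.
WINXP_WAITS_S = [1, 2, 2, 4, 8]


def serial_dns_cost_ms(lookup_ms, lookups=LOOKUPS_PER_PAGE):
    """Total DNS wait for one page if lookups happen one after another."""
    return lookup_ms * lookups


def wait_before_step(step):
    """Seconds elapsed before query step number `step` (0-based) is sent."""
    return sum(WINXP_WAITS_S[:step])


if __name__ == "__main__":
    print(serial_dns_cost_ms(20))    # fast name server, result in ms
    print(serial_dns_cost_ms(200))   # slow name server, result in ms
    # Time lost per lookup before the other listed servers are queried.
    print(wait_before_step(2))
    # Time for one lookup to give up entirely.
    print(sum(WINXP_WAITS_S))
    # Page-level delay if every lookup loses those first two steps.
    print(LOOKUPS_PER_PAGE * wait_before_step(2))
```

If every lookup ran through the whole schedule without any answer, the per-lookup wait would be the full sum of the schedule rather than just the first two steps, so the page-level delay would be far longer still.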
Thanks to deblin for his time in programming this applet. It started simple, but became a little more involved than was originally envisioned.
Note 1. A link to a binary for OSX (Leopard / x86 only) is available here: http://pflog.net/ns_bench/. The source code is also available for compilation at the above link and has been tested on FreeBSD and Linux.
Note 2. If you are a Windows user and you dislike running ns_bench from a command-line window every time you suspect a DNS slowdown, you can unzip the attached ns_bench.bat.zip file to the folder your ns_bench.exe file is in. Then right click on the batch file (ns_bench.bat) and choose "Send to Desktop" to create a shortcut. The ns_bench batch file is pre-configured with eight commonly used DNS servers:
These eight servers can be added to or edited as necessary. There is also an ns_bench.ico file that can be substituted for the default batch file icon. This batch file then puts the ns_bench test only a double click away on your desktop.