nbinont
join:2011-03-13

nbinont to AkFubar

Member

to AkFubar

Re: Google DNS versus ours

said by AkFubar:

Posted in the direct forum... no can read it...

Sorry everyone - the problem occurs when the specific address is NOT in the TSI DNS cache. If I posted the site everyone here would check it out and TSI would not be able to see the effect on the first request. Hence the Direct forum post.

TSI Gabe
Router of Packets
Premium Member
join:2007-01-03
Gatineau, QC

TSI Gabe to TSI Marc

Premium Member

to TSI Marc
Are you saying this is still happening now?
nbinont
join:2011-03-13

nbinont to TSI Marc

Member

to TSI Marc
said by TSI Marc:

said by mlord:

said by TSI Gabe:

The first one, he's using 192.168.1.1 as his DNS server, not ours, hence why it's not working.

That's probably his router, which probably IS using TSI's server. Very standard setup, that.

Gabe appears to have looked that one up at the time and wrote this later in the thread:

»Re: [Cable] Slow DNS resolution for site on Teksavvy

Yes, but the first time I believe the dig command was not going to the right server: »Re: [Cable] Slow DNS resolution for site on Teksavvy

And this problem happens regularly, but only on TSI. Very repeatable.
nbinont

nbinont to TSI Gabe

Member

to TSI Gabe
said by TSI Gabe:

Are you saying this is still happening now?

Yep, for the past 8 months. Every day.

TSI Gabe
Router of Packets
Premium Member
join:2007-01-03
Gatineau, QC

TSI Gabe

Premium Member

K...honestly this is going to be hard to reproduce now that we are running other DNS servers...I guess keep me posted if it happens again and I'll take another look. Since right now I'm able to resolve those domains just fine.
jstory
join:2011-02-05
New Westminster, BC

jstory to TSI Gabe

Member

to TSI Gabe
Just did.

Got a query response time of 76 msec and 75 msec, respectively, compared to 33 msec for 8.8.8.8.

mtr reports a ping of 13 msec, so maybe the server is just under heavy load.

TSI Marc
Premium Member
join:2006-06-23
Chatham, ON

TSI Marc to nbinont

Premium Member

to nbinont
k, I've asked Gabe to look into it more closely. I'm sure it's something logical; we just need to find what it is.
nbinont
join:2011-03-13

nbinont to TSI Gabe

Member

to TSI Gabe
said by TSI Gabe:

K...honestly this is going to be hard to reproduce now that we are running other DNS servers...I guess keep me posted if it happens again and I'll take another look. Since right now I'm able to resolve those domains just fine.

Thanks! I'll follow up and try to reproduce it.

TSI Marc
Premium Member
join:2006-06-23
Chatham, ON

TSI Marc to jstory

Premium Member

to jstory
said by jstory:

Just did.

Got a query response time of 76 msec and 75 msec, respectively, compared to 33 msec for 8.8.8.8.

mtr reports a ping of 13 msec, so maybe the server is just under heavy load.

Gabe responded to you earlier. You have to use the Vancouver DNS servers since you're in BC.

That's because Vancouver has separate DNS servers.

You need to use 76.10.191.198 & 199 (these servers are not yet updated though)

we too can make our IPs respond no matter where you are but we just haven't done that.. there's no real need.

Google has that 8.8.8.8 block anycasted.. it's a routing trick.. that makes all the routers think that IP is really close but in fact there are servers everywhere with the same IP...

Mike2009
join:2009-01-13
Ottawa, ON
TP-Link Archer C7
Technicolor DCM476
Grandstream HT701

Mike2009 to HiVolt

Member

to HiVolt
said by HiVolt:

said by mlord:

Same here.

Same thing I've experienced...

Me too, that's why I switched to using Google a couple of years ago. I'll try out the TSI ones again.

NytOwl
join:2012-09-27
canada

NytOwl to TSI Gabe

Member

to TSI Gabe
A nifty tool for those interested in benchmarking DNS servers from their own connection(s), and/or comparing more alternatives to TSI's own:

»www.grc.com/dns/benchmark.htm

I haven't run it yet, myself, but I will once I eventually get my network setup all sorted out.

Guspaz
MVM
join:2001-11-05
Montreal, QC

1 recommendation

Guspaz to TSI Marc

MVM

to TSI Marc
said by TSI Marc:

Google has that 8.8.8.8 block anycasted.. it's a routing trick.. that makes all the routers think that IP is really close but in fact there are servers everywhere with the same IP...

There's no reason TSI couldn't anycast the DNS server IPs so that the same DNS IPs are used anywhere in TSI's territory :P Of course, that's kind of pointless since the vast majority of people use DHCP/PPPoE to automatically set the DNS servers anyhow. Most of the people setting theirs by hand are DNS ricers :P

TSI Marc
Premium Member
join:2006-06-23
Chatham, ON

TSI Marc

Premium Member

well.. there is one good reason.. and well, it's that it would take a bit of lifting to do it and make sure it's done right and for what? not much value..

we'd have to move everything else off that class C...
nbinont
join:2011-03-13

nbinont

Member

said by nbinont:

said by TSI Gabe:

K...honestly this is going to be hard to reproduce now that we are running other DNS servers...I guess keep me posted if it happens again and I'll take another look. Since right now I'm able to resolve those domains just fine.

Thanks! I'll follow up and try to reproduce it.

Well, it seems like whatever Gabe did over the last week has fixed it for me! I verified that it was still acting up a few days ago (and it was), but tonight it seems to be resolving correctly the first time.

I waited for the cached entry to expire in TSI's DNS server, then asked it to resolve the site again. Last week it would fail a few times before finally getting something for the cache, and then be good until the cache expired again.

Tonight, after the cache expired it worked the first time. Waited for the cache to expire again (30 min expiry in this case), and tried again. Success again!
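For anyone who wants to repeat this kind of first-request test, here is a minimal Python sketch of timing one lookup against a specific resolver, so the query cannot silently go to the router or some other server. It is not what I actually ran; the function names `build_query` and `time_query` are made up for illustration, and it only handles a plain A-record query over UDP.

```python
import socket
import struct
import time


def build_query(name, query_id):
    """Build a minimal DNS query packet for an A record (recursion desired)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = b""
    for label in name.rstrip(".").split("."):
        encoded = label.encode("ascii")
        # Each label is sent as a length byte followed by the label bytes.
        question += bytes([len(encoded)]) + encoded
    # Terminating root label, then QTYPE=1 (A) and QCLASS=1 (IN).
    question += b"\x00" + struct.pack("!HH", 1, 1)
    return header + question


def time_query(server_ip, name, timeout=5.0, query_id=0x1234):
    """Send one query to server_ip and return (elapsed_ms, rcode).

    Raises socket.timeout if the server does not answer within `timeout`.
    """
    packet = build_query(name, query_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(packet, (server_ip, 53))
        response, _ = sock.recvfrom(512)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
    # The low 4 bits of the fourth header byte hold the response code (0 = no error).
    rcode = response[3] & 0x0F
    return elapsed_ms, rcode
```

Calling `time_query` once after the cached entry has expired and again straight afterwards should show the uncached versus cached difference, assuming the resolver address passed in is the one you actually mean to test.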

I assume I must be on one of Gabe's new DNS servers - and they seem to be working well!

Guess I'll have to go update my review...
Bugblndr
join:2010-03-02
Burlington, ON

Bugblndr

Member

Good news, time to change things up on my setup a bit and see for myself.
mlord
join:2006-11-05
Kanata, ON

mlord to nbinont

Member

to nbinont
said by TSI Marc:

Yes, but the first time I believe the dig command was not going to the right server:
»Re: [Cable] Slow DNS resolution for site on Teksavvy

You do realize that only the originator (and TSI) can read threads in TSI Direct, right? Not the rest of us, so posting links to those threads doesn't help anyone here.

TSI Marc
Premium Member
join:2006-06-23
Chatham, ON

TSI Marc

Premium Member

said by mlord:

said by TSI Marc:

Yes, but the first time I believe the dig command was not going to the right server:
»Re: [Cable] Slow DNS resolution for site on Teksavvy

You do realize that only the originator (and TSI) can read threads in TSI Direct, right? Not the rest of us, so posting links to those threads doesn't help anyone here.

yeah hehe AkFubar pointed that out to me too.. hehe I didn't realise at the time.. looks like nbinont's issue is solved now too. so it's all good

koitsu
MVM
join:2002-07-16
Mountain View, CA
Humax BGW320-500

koitsu to TSI Gabe

MVM

to TSI Gabe
I'm not a Teksavvy customer, but I have a question -- one which after scouring the Internet (assuming what you're using is namebench) nobody has asked.

What on earth does the vertical axis represent? It just says "%". Percentage of what? Packet loss? DNS query rejection (NXDOMAIN, or other error)?

Basically that graph doesn't mean anything unless it's explained 1) where the data is coming from, 2) the exact test being used, and 3) what each axis represents.

For example, namebench.py can send 250 queries to a DNS server. That's nice -- is that 250 concurrent lookups per second? Is that 250 queries total and then it graphs the response time? If the latter, then shouldn't the X axis be query number and the Y axis be response time (in milliseconds), with the visual results being a "scatter graph" followed by a line drawn which indicates the average median?

Surely I can't be the only one questioning what on earth that thing is actually showing.

Otherwise, if I take it to mean "percentage of queries and how long they took to be answered", it looks to me like TSI's servers are taking between 60-200ms about 70% of the time, and 10-20ms the remaining 30% of the time. By comparison, Google's servers are taking between 60-200ms about 80% of the time, and 20-30ms the remaining 20% of the time. And to me, that isn't impressive (if anything the results should be the opposite -- first-time query should be slow, but subsequent queries for the same NS/A/PTR/etc. should return almost instantly due to record caching, assuming all recursive records involved don't have stupid TTLs like 1 second).

Let me show you what actual kernel developers working on UDP stacks tend to graph when it comes to nameserver performance:

»people.freebsd.org/~kris ··· d-pt.png
»people.freebsd.org/~kris ··· gige.png
»people.freebsd.org/~kris ··· pt-2.png
»people.freebsd.org/~kris ··· -nsd.png

Welcome to why just blindly dumping "pretty pictures" isn't helpful without concise (and precise) documentation alongside.

TSI Marc
Premium Member
join:2006-06-23
Chatham, ON

TSI Marc

Premium Member

I'm sure Gabe will chime in but I think it's pretty straight forward what the graph says...

85-90% of queries take 10ms to return a request and all requests always take less than 200ms...

your graphs show queries per second and load.. we're highlighting how quickly a query is returned not how many it can return which is also an important stat no doubt but given we have 4 servers.. load is less of an issue for us.

koitsu
MVM
join:2002-07-16
Mountain View, CA
Humax BGW320-500

koitsu

MVM

I don't find this graph straight-forward in any way shape or form.

"85-90% of all queries take 10ms to get a result". Okay, that's because you look at the graph and see that the point where the graph "shoots off horizontally" starts at 85%, with the vertical axis being at 10ms, correct? That's the only way I can see how you reached that conclusion.

Except if you apply the same logic to the data shown on the right side of the graph, you could safely say that 97% of all queries took 200ms to get a result...

The following graph (X axis = duration, Y axis = nameserver IP) makes perfect sense but doesn't really provide any hard data, though as I said, that one does make sense. It's the first graph that doesn't.

NightMayor
join:2010-04-28
York, ON

NightMayor to NytOwl

Member

to NytOwl
said by NytOwl:

A nifty tool for those interested in benchmarking DNS servers from their own connection(s), and/or comparing more alternatives to TSI's own:

»www.grc.com/dns/benchmark.htm

I haven't run it yet, myself, but I will once I eventually get my network setup all sorted out.

This is a great program and for me it helped decrease ping a bit in online game play. For me it showed that TSI's DNS servers were second only to Rogers'. (I haven't tested it in a couple of days.) I still use OpenDNS though because of their Web Filters and overall better security.

TSI Marc
Premium Member
join:2006-06-23
Chatham, ON

TSI Marc to koitsu

Premium Member

to koitsu
it's a simple graph...

x axis = time in ms
y axis = % of queries..

if 100 queries were sent, order the results by shortest amount of time and put a dot along the y axis and how much time it took and that's the distribution you would get.
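To make that concrete, here is a rough Python sketch of the ordering described above. This is not namebench's code; the function names and the sample numbers in the comment are invented for illustration only.

```python
def response_distribution(times_ms):
    """Return (time_ms, percent_of_queries) points, ordered by response time.

    Each point says: this percentage of queries finished within this many ms.
    """
    ordered = sorted(times_ms)
    total = len(ordered)
    return [(t, 100.0 * (rank + 1) / total) for rank, t in enumerate(ordered)]


def percent_within(times_ms, limit_ms):
    """Percentage of queries answered in limit_ms or less."""
    if not times_ms:
        return 0.0
    answered = sum(1 for t in times_ms if t <= limit_ms)
    return 100.0 * answered / len(times_ms)


# Made-up sample, not measured data: mostly fast answers plus a few slow ones.
# sample = [9, 10, 8, 10, 150, 9, 10, 60, 10, 200]
# response_distribution(sample) gives the dots; time goes on x, percent on y.
```

Reading a claim like "N% of queries return within 10ms" off the graph is the same as calling `percent_within(times, 10)` on the raw timings.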
TSI Marc

TSI Marc to koitsu

Premium Member

to koitsu
and no axis of evil

koitsu
MVM
join:2002-07-16
Mountain View, CA
Humax BGW320-500

koitsu to TSI Marc

MVM

to TSI Marc
Marc, politely: I've had two other senior systems engineers (like myself) look at the graph. Both of them are equally as perplexed, and in the same way I am.

I'll let Gabe respond from here on out, but I'll explain more verbosely:

What you've described *makes sense* (as in conceptually what you want is doable), but what your first graph actually shows doesn't jibe with what you claim the results are -- and it's because of the type of graph being used + how the data is being graphed.

(Readers should note I ABSOLUTELY believe Teksavvy's claims that their nameservers take ~9ms on average vs. Google's 20-30ms. And the reason for that is quite honestly network round trip time between TekSavvy customer and Google's DNS servers, also taking into consideration authoritative nameservers on the Internet who do not work with large EDNS packets (this adds time to the response)).

I believe the data you have is confusing because you're using a line graph rather than a scatter graph or scatter plot.

Honestly what should be happening under the hood:

Loop iteration #1:

1. Issue 100 DNS queries and keep track of the response time of each query. Query types will vary (different zones, TLDs, A vs. NS vs. PTR etc.), and response times will vary (some will be cached results, some won't be -- those which aren't should be much higher in response time)

2. Get an average response time: add up all 100 query response times, divide by 100. Result: average response time of 100 queries.

3. Graph result on Y axis, with Y axis label "average response time (in ms) of 100 DNS queries". X axis should be incremental based on time, or simply an incrementing variable ($loopcount++).

Loop iteration #2: repeat step 1/2/3, except in step 3, the X axis location should be further to the right than before, and that you can draw a line from iteration plot data point #1 to iteration plot data point #2.

The resulting graph would look roughly something like this.

The first loop iteration -- assuming all the nameservers its querying have *no cached records* -- should be very slow (high response times due to recursive, non-cached lookups). The 2nd loop iteration should be much faster (cached results), the 3rd as well, etc. etc...

The 2nd to Nth results should be "roughly" all within the same amount of time -- however, this greatly depends on the data set being measured (more specifically: what the per-record TTL is of something being resolved, or the SOA TTL associated with that record's zone).

If you were to take all the graphed averages (how many depends on how many loop iterations you let things run for -- it matters! If just one loop, then the results are worthless!) and put them in their own data set, you could then graph those using a bar graph or bar chart, where each bar would represent response time sections, e.g. 0-10ms, 11-20ms, 21-30ms, etc. and let people see what the "general average" response time is for everything. This is akin (mostly) to the 2nd graph you listed in your post (the blue horizontal bars), except with more granularity.
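A rough Python sketch of the loop and the bucketing, for illustration only. The resolver here is simulated (first lookup of a name is slow, later ones are fast) and its 80ms/9ms figures are invented; every function name is hypothetical, and a real run would swap in an actual timed DNS query.

```python
import math
from collections import Counter


def make_simulated_resolver(uncached_ms=80.0, cached_ms=9.0):
    """Stand-in for a real timed DNS query; the timings are assumptions."""
    cache = set()

    def timed_query(name):
        if name in cache:
            return cached_ms
        cache.add(name)
        return uncached_ms

    return timed_query


def average_response_ms(timed_query, names):
    """Steps 1 and 2: time every query, then average the response times."""
    times = [timed_query(name) for name in names]
    return sum(times) / len(times)


def run_iterations(timed_query, names, iterations):
    """Step 3 data: one average per loop iteration, in iteration order."""
    return [average_response_ms(timed_query, names) for _ in range(iterations)]


def bucket_label(avg_ms, width=10):
    """Map an average onto the 0-10ms, 11-20ms, 21-30ms, ... sections."""
    index = max(0, math.ceil(avg_ms / width) - 1)
    low = 0 if index == 0 else index * width + 1
    return f"{low}-{(index + 1) * width}ms"


def histogram(averages, width=10):
    """Count how many iteration averages fall into each section."""
    return Counter(bucket_label(avg, width) for avg in averages)
```

With the simulated resolver, the first iteration's average is the slow uncached figure and every later iteration lands in the fast bucket, which is the shape I'd expect from a real caching nameserver as long as the TTLs are sane.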

And trust me, I am quite familiar with data/metrics graphing -- I wrote all of what you see there, sans the dygraphs library, and have had to write an entire code base (all perl + dealing with the mess that is RRDTool) to graph VirtualHost bandwidth usage on Apache (using no third-party modules). Not trying to troll or give you a headache, mate!

TSI Gabe
Router of Packets
Premium Member
join:2007-01-03
Gatineau, QC

TSI Gabe

Premium Member

The graph is being generated by a tool called namebench; I believe it's Google themselves that released it. This isn't something I created.
TSI Gabe

TSI Gabe

Premium Member

I understand what you are saying though, there are more details the namebench report spews out that are missing here and I didn't necessarily want to publish them for fear of releasing internal network info.

koitsu
MVM
join:2002-07-16
Mountain View, CA
Humax BGW320-500

koitsu

MVM

Understood. And yeah, in my original/first reply, I linked to the namebench site -- their graphs are identical in layout (see "Response Distribution Chart"), meaning the use of a line plot model.

I had a 4th colleague of mine (better educated than myself, especially in mathematics) look at the graphs as well, and he agrees the presentation model is incorrect for what kind of data is trying to be plotted (not that the data itself is wrong!). There are better presentation/layout models (scatter, etc.) that would present the information in a way that makes more sense, but that's not your fault -- it's the fault of namebench. Although since it uses the Google Chart API, the HTTP arguments could be changed to refer to a different model.

The part that shocks me the most is that namebench was written by a pair of Google employees. I'm surprised that someone would write such a useful tool then completely botch the visual representation part. "It's open source, so go fix it, koitsu!" Yeah, and it's Python; I'd rather swallow hot coals.

Anyway, thanks for chiming in and clarifying a bit, TSI Gabe, very much appreciated!

AkFubar
Admittedly, A Teksavvy Fan
join:2005-02-28
Toronto CAN.

1 edit

AkFubar to TSI Gabe

Member

to TSI Gabe
Congrats Gabe/Marc et al. Internet access seems much more snappy here on new page loads.

Cheers!

jasmo34
join:2008-03-20
~ London ~

2 edits

jasmo34 to TSI Gabe

Member

to TSI Gabe
Question... On the "NAMEBENCH" tool...

Some of the DNS servers I tested are coming back with the message "Unable to get uncached results for: namebench2802998020.wordpress.com. ...".

They are then excluded from the results rankings, although some raw response times are still posted by namebench.

What exactly does that message mean, and what is its significance in regards to those servers?
edit1: And are there any steps to take to eliminate this situation? Flushing DNS cache, etc.?

edit2: NEVER MIND!!! Duh!
MaynardKrebs
We did it. We heaved Steve. Yipee.
Premium Member
join:2009-06-17

MaynardKrebs to TSI Gabe

Premium Member

to TSI Gabe
Gabe,

You might want to invest in a copy of this bible
»www.edwardtufte.com/tuft ··· oks_vdqi