1.1 How To
The first thing that you will need to do is decide how your site is going to be accessed. You can either access the site via IP address, or domain name. If you choose to use a domain name, you will have a few more decisions to make. You can get a free subdomain, but we recommend that you spend about $10/year to get your own domain.
After that you will need DNS (Domain Name System) service. You can pay for DNS, use a free DNS provider, or host your own DNS server. If you already have a hosting plan, your provider may include DNS with it.
The last thing that you have to worry about is hosting. As with DNS, you can use free hosting, use paid hosting, or host the site yourself.
For more information on any aspect of hosting check out our Webhosting Chat forum and also the Webhosting FAQs.
If you are interested in a free HTML editor, please see our list of free HTML or WYSIWYG editors.
For country-code domains, see http://www.norid.no/domreg.html, which has long listed all the two-letter country-code domains.
Validate your HTML and test your site using Firefox (www.mozilla.org/). You should also test your site with IE 5.5.
Although Internet Explorer 6.0 has been out for a while, there are still a fair number of people running IE 5.5, and the differences between IE 5.5 and IE 6.0 are significant.
The following page will give you instructions on how to run IE 5.5 (and other older versions of IE) on your system. You just download a ZIP file, expand it somewhere, and double-click IEXPLORE.EXE to run IE 5.5.
Another place to find old browser versions is kindly provided by evolt.org, where they seem to have a version of every browser you have never heard of.
If you look at your web statistics, you will see that most people still use IE. Apart from people who have to run another browser because of their operating system, most non-IE visitors browse with Firefox (available on almost all operating systems) or Safari (Apple OS X). Smaller numbers of visitors will be using browsers like Opera or Konqueror.
As a web developer, you want to write your code to one set of rules called Web Standards. In principle, browsers implement those same rules. In a perfect world, you would write perfectly valid code, and all the browsers would render your site perfectly. But this isn't a perfect world.
After you make lots of pages, you will learn what every experienced web developer knows: browsers suck. You will have to account for various bugs in various browsers.
So, what's your best chance of having your site be visible to the widest audience? It's simple. Just validate your markup and styles.
Valid markup (HTML/XHTML) and styles (CSS) will most likely let your site render reliably in all browsers.
It is a good idea to use Firefox as your browser while you develop your site. Not only is it an excellent browser, it also has a great extension called "Web Developer" that will validate the markup you are looking at and help you in lots of other ways. If you use a standards-compliant browser for development, you will develop markup and styles that are close to web standards.
Each time you come to a stopping point, validate your code. There are free online tools to validate your styles and your markup. Validated code will usually work fine in all those non-IE browsers. Congratulations!
After you have validated markup and styles, look at your site in IE. Most of the time IE 6 will also work. The markup should be OK, but the styles may not be. IE 6 has quirks of its own that you may have to work around (most famously, the broken box model). And if you care about IE 5.5 (most do), you have to account for a whole bunch of other bugs. You can handle all that with stylesheets that account for different browser versions.
18.104.22.168 - - [14/Nov/2004:04:51:13 -0500] "GET /robots.txt HTTP/1.0" 200 69 "-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"
This is an example of a search engine spider (msnbot, in this case) requesting files from your server. But how do you prevent a bot from doing this, or how do you direct a bot to only index certain portions of your site?
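To see which robots are visiting, it can help to pull the fields out of each log line. The following is a minimal sketch, assuming the server writes the common "combined" log format shown above; the pattern and the function name parse_log_line are illustrative choices, not part of any server's tooling.

```python
import re

# Assumed layout: host ident user [time] "request" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

SAMPLE = (
    '18.104.22.168 - - [14/Nov/2004:04:51:13 -0500] '
    '"GET /robots.txt HTTP/1.0" 200 69 "-" '
    '"msnbot/0.3 (+http://search.msn.com/msnbot.htm)"'
)

entry = parse_log_line(SAMPLE)
print(entry["host"], entry["request"], entry["agent"])
```

The "agent" field is the one that identifies the robot; the "request" field shows which file it asked for.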
To define what a spider can and cannot do on your site, you can use the Robots Exclusion Standard. Notice that the example above is a request for /robots.txt. When a robot visits your site, it checks for this file first to find out what it is allowed to look at.
Start by creating a file called robots.txt and placing it in the root directory of your webserver. In the following example, we will block all bots from the entire site:
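A minimal sketch of such a file, held in a Python string so the rules can be checked with the standard library's urllib.robotparser. The helper name allowed and the path /index.html are assumptions made for illustration; only the two lines inside the string belong in robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Contents of robots.txt: every robot ("*") is kept out of every path ("/").
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

def allowed(robots_txt, user_agent, path):
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

print(allowed(BLOCK_ALL, "msnbot", "/index.html"))
```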
To block access only to a special directory on your site (in this case, /secret):
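A sketch of the directory case, again held in a Python string and checked with urllib.robotparser. The trailing slash in /secret/ and the file name plans.html are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Contents of robots.txt: all robots are kept out of /secret/ only.
BLOCK_SECRET = """\
User-agent: *
Disallow: /secret/
"""

def allowed_in(robots_txt, user_agent, path):
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

print(allowed_in(BLOCK_SECRET, "msnbot", "/secret/plans.html"))
print(allowed_in(BLOCK_SECRET, "msnbot", "/index.html"))
```

Paths that do not start with the disallowed prefix remain open to robots.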
You can also block only one bot (again we'll do msnbot):
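A sketch of a rule aimed at a single robot, checked the same way with urllib.robotparser. The second robot name (googlebot) is used here only to show that other robots are unaffected.

```python
from urllib.robotparser import RobotFileParser

# Contents of robots.txt: only msnbot is kept out; no rule applies to other robots.
BLOCK_MSNBOT = """\
User-agent: msnbot
Disallow: /
"""

def may_fetch(robots_txt, user_agent, path):
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

print(may_fetch(BLOCK_MSNBOT, "msnbot/0.3", "/index.html"))
print(may_fetch(BLOCK_MSNBOT, "googlebot", "/index.html"))
```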
Are you getting 404 (not found) errors when robots try to find robots.txt?
Even if you don't want to block any robots at all, create a robots.txt with the following, which allows all robots access to your entire site.
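A sketch of an allow-everything file, checked with urllib.robotparser. An empty Disallow line means that nothing is disallowed; the robot name and paths in the checks are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Contents of robots.txt: the rule applies to every robot and disallows nothing.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

def is_open(robots_txt, user_agent, path):
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

print(is_open(ALLOW_ALL, "msnbot", "/index.html"))
```

Serving this file also stops the 404 entries that appear in your logs when robots request a robots.txt that does not exist.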
Most of the more popular search engine spiders (msnbot, Yahoo! Slurp, googlebot) are well-behaved and will obey your directives. Keep an eye on your logs to make sure that they do. If you believe a bot is not doing as it should, report it to the bot's owner. There is usually a URL in the bot's user-agent string that you can visit to find out who is running it, how to contact them, and so forth.
Since misbehaving robots don't pay attention to robots.txt, you may have to block the offending robot's traffic. You can do this by examining your logs to see what sort of signature the robot can be identified by when it comes to your site. You may choose to block the robot's traffic by IP address or "user-agent" (what a robot calls itself).
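The decision itself can be sketched as a small function. This is only an illustration of the logic: the IP address, the user-agent fragment "badbot", and the name is_blocked are all hypothetical, and in practice the block is usually enforced in the web server configuration or a firewall rather than in application code.

```python
# Hypothetical signatures taken from your own logs; replace with real ones.
BLOCKED_IPS = {"192.0.2.10"}          # address from a documentation-only range
BLOCKED_AGENT_FRAGMENTS = ("badbot",)  # lowercase pieces of user-agent strings

def is_blocked(ip, user_agent):
    """Return True if the request matches a blocked IP or user-agent fragment."""
    if ip in BLOCKED_IPS:
        return True
    agent = user_agent.lower()
    return any(fragment in agent for fragment in BLOCKED_AGENT_FRAGMENTS)

print(is_blocked("192.0.2.10", "Mozilla/5.0"))
print(is_blocked("198.51.100.7", "BadBot/1.0"))
print(is_blocked("198.51.100.7", "msnbot/0.3"))
```

Blocking by user-agent is easy to evade, since a robot can call itself anything; blocking by IP address is harder to evade but needs updating when the robot moves.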
The robots meta tag
There is another method of controlling access to your content. This one works on a page-by-page basis. Add the following line inside the head section:
<meta name="robots" content="noindex, nofollow" />
This tells well-behaved robots not to include this page in their search results and not to follow any links on the page.
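To confirm that a page carries the directives you intended, you can read the tag back out of the HTML. A minimal sketch using Python's standard html.parser; the class name RobotsMetaParser and the function robots_directives are illustrative, not part of any library.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())

def robots_directives(html):
    """Return the set of robots directives found in the page's meta tags."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

PAGE = '<html><head><meta name="robots" content="noindex, nofollow" /></head></html>'
print(sorted(robots_directives(PAGE)))
```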
For more information on web robots and robots.txt, visit The Web Robots Pages.