SEO - HTML Blog - Tutorial

From time to time we will post information we have found useful as a guide to Search Engine Optimization, beginning with basic HTML principles.



Tuesday, May 23, 2006

Search Engine Robots

Every search engine has a complicated, continually refined process for updating its search database. The "big three" (Google, Yahoo, and MSN) each employ a combination of methods, but all three use search engine "spiders" (also called crawlers, robots, or any of a dozen other nicknames). These spiders are largely autonomous programs that travel the web, jumping from page to page by following links, the same way a regular user might. As the spiders sweep across the web, they collect a variety of data about the pages they encounter, including modification dates, descriptions, and other information contained in a site's meta tags.
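To make the link-following idea concrete, here is a minimal sketch in Python. It is not how any real search engine works (those details are proprietary); it simply walks a small, made-up link graph breadth-first. The `SITE` dictionary and the `crawl` function are hypothetical names invented for this illustration.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
SITE = {
    "/": ["/about", "/products"],
    "/about": ["/"],
    "/products": ["/products/widget", "/about"],
    "/products/widget": [],
    "/orphan": ["/"],  # nothing links here, so a spider starting at "/" never finds it
}

def crawl(link_graph, start):
    """Follow links breadth-first from start; return pages in visit order."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        visited.append(page)
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited

print(crawl(SITE, "/"))
```

Note that "/orphan" is never visited: a page that nothing links to is invisible to a spider that discovers pages only through links.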

Upon reaching a page, a spider might index the page for the first time if it isn't already in the database, or it might update its existing record of the page, depending on how much has changed. The frequency of these spider visits is determined by a variety of factors, including how "static" the content of the page is and the relative importance of the page itself (PageRank is a measure of this, in Google's world).
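The "index it or update it" decision can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the index is a plain dictionary, and a change is detected by comparing a hash of the page content. The `record_visit` function is a hypothetical name, and real engines certainly weigh far more than this.

```python
import hashlib

def record_visit(index, url, content):
    """Store a content fingerprint for url and report what happened."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    previous = index.get(url)
    index[url] = digest
    if previous is None:
        return "indexed"    # first time the spider has seen this page
    if previous != digest:
        return "updated"    # page changed since the last visit
    return "unchanged"      # nothing new; a spider might visit less often

index = {}
print(record_visit(index, "/about", "We build websites."))
print(record_visit(index, "/about", "We build websites."))
print(record_visit(index, "/about", "We build and promote websites."))
```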

Ideally, if you wish to promote your website, you need the spiders to visit often and "see" all the important sections of your site, so that the index stays up to date and serves your goals. Accomplishing this is a major goal of SEO as a whole, and is by no means simple. For one, the ways in which spiders move around and collect data are proprietary, and can only really be guessed at by crunching log data and measuring how quickly a site is indexed and visited.

A separate blog entry will be devoted to each of the big three engines and their methods of indexing, but one general standard for instructing the basic behavior of spiders is a simple text file called "robots.txt", which is placed in the top-level directory of the web server.

A number of directives can be specified within this file, most notably which areas of your site are "off limits" to search engine spiders. For instance, adding the following lines to robots.txt...

User-agent: webcrawler
Disallow: /

...will tell webcrawler not to index or collect information on any part of your website. A wildcard can be used in the User-agent line, as illustrated in the following lines...

User-agent: *
Disallow: /secret
Disallow: /logs

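...which keeps every search engine spider (that follows this standard) out of the noted folders. Note that the original standard doesn't support wildcards in the file path itself, so instead of /secret/*, just use /secret/. Paths are matched as prefixes, so "Disallow: /secret" without the trailing slash also blocks a page such as /secret.html.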
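If you want to check how a well-behaved spider would read these rules, Python's standard library includes a robots.txt parser, `urllib.robotparser`. The sketch below feeds it the two examples from this post as strings (so nothing is fetched from the web). The helper name `build_parser`, the user agent "googlebot", and the test paths are made up for illustration, and individual search engines may interpret rules somewhat differently.

```python
from urllib.robotparser import RobotFileParser

def build_parser(robots_txt):
    """Parse robots.txt rules held in a string rather than fetched from a server."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

# First example: shut one named spider out of the whole site.
BLOCK_ONE = """User-agent: webcrawler
Disallow: /
"""

# Second example: keep every compliant spider out of two folders.
BLOCK_ALL = """User-agent: *
Disallow: /secret
Disallow: /logs
"""

one = build_parser(BLOCK_ONE)
print(one.can_fetch("webcrawler", "/index.html"))  # the named spider is blocked
print(one.can_fetch("googlebot", "/index.html"))   # other spiders are unaffected

everyone = build_parser(BLOCK_ALL)
print(everyone.can_fetch("googlebot", "/secret/plans.html"))
print(everyone.can_fetch("googlebot", "/public/index.html"))
```

Keep in mind that robots.txt is only a request. It keeps out spiders that choose to follow the standard, and it is not a way to protect private files.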

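There is also a META equivalent to this method. If you don't want a page indexed, add the following line to the head section of its HTML file:

<meta name="robots" content="noindex">

If you'd like the page indexed but not the links contained within followed, use:

<meta name="robots" content="index,nofollow">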
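To see which robots META directives a page actually carries, you can read them out with Python's built-in `html.parser` module. This is a rough sketch, assuming the directives live in a tag of the form `<meta name="robots" content="...">`. The class `RobotsMetaParser`, the function `robots_directives`, and the sample page are hypothetical names invented for this example.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)

def robots_directives(html):
    """Return the set of robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

page = '<html><head><meta name="robots" content="index, nofollow"></head></html>'
print(sorted(robots_directives(page)))
```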


toronto web services
25 dunblaine avenue
toronto, ontario canada
m5m 2r6


Mission Statement

In a world dominated by the internet, design and compatibility are crucial to a company's success. Care must be taken to code a webpage properly and formally, so that it works on a variety of systems and remains fully compatible with the latest search and information management technologies. This weblog will teach designers the fundamentals of such coding, including the crucial need for proper validation, ways to write code that requires less debugging after the fact, and ways Search Engine Optimization can raise a company's standing within a page of otherwise nondescript search results.





© 2003 - 2009 TorontoWebServices™ Toronto | Ontario | Canada. All rights reserved


Branding and graphic design: Basis | Photography: Sai Sivanesan
