Directories, Search Engines & Traffic…

Want to know the difference between a search engine and a directory, and why each is so important to your success? This section will give you all of that information and more.

You already use search engines, so you know how they work from the user’s perspective. From your own experience, you also know that only the results listed near the top are likely to attract you. It is no help to learn that your search yielded 44316 results; even number 50 on the list is unlikely to get your custom or your attention. Getting listed at the top, or as near to the top as possible, is therefore crucial. Since most search engine traffic is free, it is usually worth your time to learn a few tricks that maximize the results of your effort. In the next section, you will see how a search engine works from your perspective as a website owner.

Web Crawlers

It is the search engines that finally bring your website to the notice of the prospective customers. Hence it is better to know how these search engines actually work and how they present information to the customer initiating a search.

There are basically two types of search engines. The first type is powered by robots called crawlers or spiders.

Search Engines use spiders to index websites. When you submit your website pages to a search engine by completing their required submission page, the search engine spider will index your entire site. A ‘spider’ is an automated program that is run by the search engine system.

A spider visits a web site, reads the content on the actual site and the site’s Meta tags, and also follows the links that the site connects to. The spider then returns all that information to a central depository, where the data is indexed. It will visit each link you have on your website and index those sites as well. Some spiders will only index a certain number of pages on your site, so don’t create a site with 500 pages!
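To make the spider's behaviour concrete, here is a minimal Python sketch of a crawl. It is only an illustration under stated assumptions, not how any real search engine is built: the `SITE` dictionary stands in for real web pages (nothing is fetched from the network), and the names `LinkParser`, `crawl` and `max_pages` are invented for this example.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical three-page site held in memory instead of fetched over HTTP.
SITE = {
    "/": '<a href="/about">About</a> <a href="/products">Products</a>',
    "/about": '<a href="/">Home</a>',
    "/products": '<a href="/about">About</a> <a href="/missing">Old link</a>',
}


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start, max_pages=500):
    """Visit pages breadth-first, following links, up to max_pages pages."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = SITE.get(url)
        if html is None:  # broken link: nothing to read
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


print(crawl("/"))               # every page reachable from the home page
print(crawl("/", max_pages=2))  # a spider with a page limit stops early
```

The `max_pages` argument mirrors the point about spiders that only index a certain number of pages: anything beyond the limit is simply never visited.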

The spider will periodically return to the sites to check for any information that has changed. The frequency with which this happens is determined by the moderators of the search engine.

The index is almost like a book: it contains the table of contents, the actual content and the links and references for all the websites the spider finds during its search. A spider may index up to a million pages a day.

Examples: Excite, Lycos, AltaVista and Google.

When you ask a search engine to locate information, it is actually searching through the index which it has created and not actually searching the Web. Different search engines produce different rankings because not every search engine uses the same algorithm to search through the indices.
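The idea of searching an index rather than the Web itself can be sketched as follows. This is a simplified illustration, assuming a tiny set of made-up pages in `PAGES`; the function names `build_index` and `search` are hypothetical, and real engines store far more than a word-to-page mapping.

```python
from collections import defaultdict

# Hypothetical pages already brought back by a spider.
PAGES = {
    "site-a.example/home": "fresh coffee beans roasted daily",
    "site-b.example/shop": "coffee mugs and tea cups",
    "site-c.example/blog": "tea brewing guide",
}


def build_index(pages):
    """Map each word to the set of page addresses that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return pages containing every query word, using only the index."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


INDEX = build_index(PAGES)
print(search(INDEX, "coffee"))
print(search(INDEX, "tea cups"))
```

Notice that `search` never looks at `PAGES` again. Once the index is built, every query is answered from the index alone.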

SpamDexing

One of the things that a search engine algorithm scans for is the frequency and location of keywords on a web page, but it can also detect artificial keyword stuffing or spamdexing. The algorithms then analyze the way that pages link to other pages on the Web. By checking how pages link to each other, an engine can determine what a page is about, for example when the keywords of the linked pages are similar to the keywords on the original page.
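As a rough illustration of link analysis, the sketch below simply counts how many distinct pages link to each page. The `LINKS` data and the function name `inbound_link_counts` are assumptions made for this example; the formulas that real search engines use are unpublished and far more elaborate.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "c.example"],  # repeated link counted once
}


def inbound_link_counts(links):
    """Count how many distinct pages link to each target page."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:  # ignore links from a page to itself
                counts[target] += 1
    return counts


counts = inbound_link_counts(LINKS)
for page, n in counts.most_common():
    print(page, n)
```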

Most of the top-ranked search engines are crawler based search engines while some may be based on human compiled directories. The people behind the search engines want the same thing every webmaster wants – traffic to their site. Since their content is mainly links to other sites, the thing for them to do is to make their search engine bring up the most relevant sites to the search query, and to display the best of these results first.

In order to accomplish this, they use complex sets of rules called algorithms. When a search query is submitted to a search engine, these algorithms determine which sites are relevant to the query and then rank them so that the best matches appear first.

Search engines keep their algorithms secret and change them often in order to prevent webmasters from manipulating their databases and dominating search results. They also want to provide new sites at the top of the search results on a regular basis rather than always having the same old sites show up month after month.

An important difference to realize is that search engines and directories are not the same. Search engines use a spider to “crawl” the web and the web sites they find, as well as submitted sites. As they crawl the web, they gather the information that is used by their algorithms in order to rank your site.

Directories rely on submissions from webmasters, with live humans viewing your site to determine if it will be accepted. If accepted, directories often rank sites in alphanumeric order, with paid listings sometimes on top. Some search engines also place paid listings at the top, so it’s not always possible to get a ranking in the top three or more places unless you’re willing to pay for it.
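The directory ordering described above (paid listings on top, everything else in alphanumeric order) can be sketched like this. The `Listing` class and the sample entries are invented for illustration, and individual directories may well order their listings differently.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    title: str
    paid: bool = False


def directory_order(listings):
    """Paid listings first, then alphanumeric by title within each group."""
    return sorted(listings, key=lambda item: (not item.paid, item.title.lower()))


listings = [
    Listing("Zebra Travel"),
    Listing("Acme Tools"),
    Listing("Midtown Florist", paid=True),
    Listing("1st Choice Plumbing"),
]
for item in directory_order(listings):
    print(item.title)
```

This also shows why site names beginning with a digit or an early letter tend to appear higher in an unpaid directory category.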

Let us now look in more detail at how search engines work. Crawler-based search engines are primarily composed of three parts: the spider, the index and the search engine software.

Spidering

A search engine robot’s action is called spidering, as it resembles the movement of a multi-legged spider. The spider’s job is to go to a web page, read the contents, connect to any other pages on that web site through links, and bring back the information. From one page it will travel to several pages, and this proliferation follows several parallel and nested paths simultaneously. Spiders revisit the site at some interval, perhaps every month or every few months, and re-index the pages. This way, any changes that have occurred in your pages are reflected in the index.

The spiders automatically visit your web pages and create their listings. An important aspect is to study what factors promote “deep crawl” – the depth to which the spider will go into your website from the page it first visited. Listing (submitting or registering) with a search engine is a step that could accelerate and increase the chances of that engine “spidering” your pages.

The spider’s movement across web pages stores those pages in its memory, but the key action is in indexing. The index is a huge database containing all the information brought back by the spider. The index is constantly being updated as the spider collects more information. The entire page is not indexed and the searching and page-ranking algorithm is applied only to the index that has been created.

Indexing

Most search engines claim that they index the full visible body text of a page. In a subsequent section, we explain the key considerations to ensure that indexing of your web pages improves relevance during search. The combined understanding of the indexing and the page-ranking process will lead to developing the right strategies.

Meta Tags

The Meta tags ‘Description’ and ‘Keywords’ have a vital role as they are indexed in a specific way. Some of the top search engines do not index the keywords that they consider spam. They will also not index certain ‘stop words’ (commonly used words such as ‘a’ or ‘the’ or ‘of’) so as to save space or speed up the process. Images are obviously not indexed, but image descriptions, Alt text and “text within comments” are included in the index by some search engines.
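Dropping stop words before indexing can be sketched in a few lines. The `STOP_WORDS` set below is a tiny made-up list and `indexable_words` is a hypothetical helper; each search engine keeps its own list and its own rules.

```python
# A tiny illustrative stop-word list; real engines use their own, longer lists.
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in"}


def indexable_words(text):
    """Lower-case the text, strip simple punctuation and drop stop words."""
    words = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if word and word not in STOP_WORDS:
            words.append(word)
    return words


print(indexable_words("The history of the Internet, in a nutshell."))
```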

The search engine software or program is the final part. When a person requests a search on a keyword or phrase, the search engine software searches the index for relevant information. The software then provides a report back to the searcher with the most relevant web pages listed first. The algorithm-based processes used to determine ranking of results are discussed in greater detail later.

The second type of search engine is the human-powered directory. These directories compile listings of websites into specific industry and subject categories, and they usually carry a short description of each website. Inclusion in directories is a human task and requires submission to the directory producers. Visitors and researchers on the net quite often use these directories to locate relevant sites and information sources. Thus directories assist in structured search.

Another important reason to be listed in directories is that crawler engines quite often find websites to crawl through their listings and links in directories. Yahoo and The Open Directory are amongst the largest and most well known directories. Lycos is an example of a site that pioneered the search engine but shifted to the directory model, depending on AlltheWeb.com for its listings.

Hybrids

Hybrid search engines are both crawler based and human powered. In plain words, these search engines have two sets of listings based on both the mechanisms mentioned above. The best example of a hybrid search engine is Yahoo, which has a human-powered directory as well as a Search toolbar administered by Google. Although such engines provide both listings, they are generally dominated by one of the two mechanisms. Yahoo is known more for its directory than for its crawler-based search engine.

Keywords

Search engines rank web pages according to the software’s understanding of the web page’s relevancy to the term being searched. To determine relevancy, each search engine follows its own group of rules. The most important rules are:

The location of keywords on your web page; and
How often those keywords appear on the page (the frequency)

For example, if the keyword appears in the title of the page, then it would be considered to be far more relevant than the keyword appearing in the text at the bottom of the page.

Search engines consider keywords to be more relevant if they appear sooner on the page (like in the headline) rather than later. The idea is that you’ll be putting the most important words – the ones that really have the relevant information – on the page first.
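Here is one way the location rule could be sketched. The weights in `WEIGHTS` are pure assumptions chosen for illustration, as are the function name `location_score` and the two sample pages; no search engine publishes its actual values.

```python
# Assumed weights: a match in the title counts more than one in the
# headline, which counts more than one in the body text.
WEIGHTS = {"title": 3.0, "headline": 2.0, "body": 1.0}


def location_score(page, keyword):
    """Add up the weight of every page section that contains the keyword."""
    keyword = keyword.lower()
    score = 0.0
    for section, weight in WEIGHTS.items():
        words = page.get(section, "").lower().split()
        if keyword in words:
            score += weight
    return score


page_one = {"title": "Garden Tools", "headline": "Tools for every garden",
            "body": "We sell spades, rakes and hoes."}
page_two = {"title": "Our Shop", "headline": "Welcome",
            "body": "We also stock a few garden tools."}

print(location_score(page_one, "garden"))
print(location_score(page_two, "garden"))
```

Under these assumed weights, the page with the keyword in its title and headline scores well above the page that only mentions it in the body text.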

Search engines also consider the frequency with which keywords appear. The frequency is usually determined by how often the keywords are used out of all the words on a page. If the keyword is used 4 times out of 100 words, the frequency would be 4%.
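The frequency calculation is simple arithmetic: occurrences of the keyword divided by the total number of words, expressed as a percentage. A minimal sketch, with `keyword_frequency` as a hypothetical helper that splits words on whitespace only:

```python
def keyword_frequency(text, keyword):
    """Return the keyword's share of all words on the page, as a percentage."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = words.count(keyword.lower())
    return 100.0 * hits / len(words)


# A 100-word page in which "widget" appears 4 times.
page = " ".join(["widget"] * 4 + ["filler"] * 96)
print(keyword_frequency(page, "widget"))
```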

Of course, you can now develop the perfect relevant page with one keyword at 100% frequency – just put a single word on the page and make it the title of the page as well. Unfortunately, the search engines don’t make things that simple.

While all search engines do follow the same basic rules of relevancy, location and frequency, each search engine has its own special way of determining rankings. To make things more interesting, the search engines change the rules from time to time so that the rankings change even if the web pages have remained the same.

One method of determining relevancy used by some search engines (like HotBot and Infoseek), but not others (like Lycos), is the Meta tags. Meta tags are hidden HTML codes that provide the search engine spiders with potentially important information like the page description and the page keywords.
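To show what a spider can read from Meta tags, the sketch below pulls the description and keywords out of a page using Python's standard `html.parser` module. The sample `HTML` and the class name `MetaTagParser` are made up for this example, and how an engine then uses the values is up to that engine.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            fields = dict(attrs)
            name = fields.get("name")
            content = fields.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content


# Hypothetical page head used only for illustration.
HTML = """
<html><head>
<title>Acme Garden Tools</title>
<meta name="description" content="Hand-forged garden tools since 1950.">
<meta name="keywords" content="garden tools, spades, rakes">
</head><body>Welcome.</body></html>
"""

parser = MetaTagParser()
parser.feed(HTML)
print(parser.meta["description"])
print([k.strip() for k in parser.meta["keywords"].split(",")])
```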

Meta Tags Versus Back Link Popularity

Meta tags are often labeled as the secret to getting high rankings, but meta tags alone will not get you a top 10 ranking. On the other hand, they certainly don’t hurt. Detailed information on meta tags and other ways of improving search engine ranking is given later in this chapter. The fact is that these days, with link popularity playing a greater role in search engine positioning, meta tags do not carry as much weight as they used to.

In the early days of the web, webmasters would repeat a keyword hundreds of times in the Meta tags and then add it hundreds of times to the text on the web page by making it the same color as the background. However, now, major search engines have algorithms that may exclude a page from ranking if it has resorted to “keyword spamming”; in fact some search engines will downgrade ranking in such cases and penalize the page.

Link analysis and ‘click through’ measurement are other factors that are “off the page” and yet crucial in the ranking mechanism adopted by some leading search engines. These have emerged as the most important determinants of ranking, but before we study them, we must first look at the most popular search engines and then at the various steps you can take to improve your success at each of the stages: spidering, indexing and ranking.

Google is a privately held company that was founded in 1998 by two Stanford graduates, Larry Page and Sergey Brin. Dr. Eric Schmidt, the CEO, joined in 2001, and by the end of the year the company had shown a profit.

Google is the search engine that powers the search directory for Yahoo. This partnership started in the year 2000 and recently there was a report that the contract is being extended. Last year, Yahoo paid Google about $7.2 million for Web search services. PositionTech has been a contender too for Yahoo’s business. Google also provides an Apple-specific search engine tailored to deliver highly targeted results related to Apple Computer and the Macintosh computing platform.

The Apple-specific search engine, located at www.google.com/mac.html, makes searching for everything from Apple’s corporate information to product-related news faster and easier.

PositionTech has a robust networking business and a foothold in enterprise search. However, it recently posted deep losses. The company reported a wider net loss in the second quarter 2002, with lower revenue. Its loss broadened to $104 million or 72 cents a share, from $58.3 million, or 46 cents a share, a year earlier. Revenue fell to $30.8 million from $39.5 million a year earlier.

To stay healthy and competitive in consumer search, PositionTech introduced in the last year a program that generates fees from Web sites listed in its database. PositionTech charges companies such as Amazon.com and eBay to list more than 1,000 Web addresses; they might pay anywhere from 5 cents to 40 cents per click when Web surfers jump to their pages from PositionTech’s database. The revenue generated from paid inclusion is shared with partners such as MSN and Overture.

According to the most recent 2007 estimates from Internet World Stats, there are an estimated 232 million Internet users online in North America alone, at work or at home, 90 percent of whom are estimated to have made some type of search request during any given month.

AltaVista is one of the oldest and most well-known search engines. It was launched in December 1995. It was owned by Digital, then run by Compaq (which purchased Digital in 1998), then spun off into a separate company controlled by CMGI, and is now owned by Overture Services, Inc. Over the years it has lost its prominent position to Google and Yahoo.

In March 2002 AltaVista launched the latest release of its Enterprise Search v2.0 software, which, like Verity, it sold to the enterprise search market prior to the Overture buyout. AltaVista was the first to launch ‘important freshness and relevancy initiatives’, crawling key areas of the Internet four times per day and increasing relevancy by 40%.

Overture returns search results based on the amount of money an advertiser has paid. This system has made Overture one of the few profitable advertising-based search engine businesses. At one time Yahoo signed a three-year deal with Overture to provide paid search results. As this write-up does not cover pay-per-click advertising, we are not focusing much on Overture.

America Online signed a multiyear pact with Google for Web search results and accompanying ad-sponsored links, ending relationships with pay-for-performance service Overture Services and with PositionTech, its algorithmic search provider of nearly three years.

We thought it worth mentioning Verity here, although it is not relevant to search engine optimization. Verity is among the leading providers of enterprise search software (PositionTech is another well-known player in this field). This software is used by numerous enterprises to offer information search within their own sites, portals, intranets/extranets and e-commerce sites.

Several software OEMs providing portal or enterprise software also bundle such software in their offerings. Verity (VRTY) went public in 1995 and achieved sales worth $145 million in fiscal 2001. It is a profitable company. If you wish to provide search facilities to visitors within your site, Verity may be an option, particularly if you have a large site with lots of information. However, Google has made it extremely easy to use its search tool for individual sites, and it’s free.

Meta Search Engines

Dogpile is a meta-search engine that searches the Internet’s top search engines such as About, Ask, FAST, FindWhat, Google, LookSmart, Overture and many more. With one single, powerful search engine, you get more relevant and comprehensive results. When you use Dogpile, you are actually searching many search engines simultaneously.

Dogpile, founded in 1996, is the most popular Internet meta-search engine. The site joined the InfoSpace Network in 2000 and is owned and operated by InfoSpace, Inc.

Mamma.com is a “smart” meta-search engine – every time you type in a query Mamma simultaneously searches a variety of engines, directories, and deep content sites, properly formats the words and syntax for each, compiles their results in a virtual database, eliminates duplicates, and displays them in a uniform manner according to relevance. It’s like using multiple search engines, all at the same time.
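The merge step of a meta-search engine (combine several result lists, eliminate duplicates, order by relevance) might look something like the sketch below. The scoring rule, the function name `merge_results` and the sample result lists are all assumptions for illustration; this is not Mamma.com's or any other engine's actual method.

```python
def merge_results(result_lists):
    """Combine ranked URL lists, drop duplicates, and order by total score."""
    scores = {}
    for results in result_lists:
        for position, url in enumerate(results):
            # Assumed scoring: first place in a list earns the most points.
            scores[url] = scores.get(url, 0) + (len(results) - position)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


engine_a = ["x.example", "y.example", "z.example"]
engine_b = ["y.example", "x.example", "w.example"]
engine_c = ["y.example", "z.example"]

print(merge_results([engine_a, engine_b, engine_c]))
```

Each address appears only once in the merged list, and a page that several engines rank highly rises to the top.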

Created in 1996 as a master’s thesis, Mamma.com helped to introduce meta-search to the Internet as one of the first of its kind. Due to its quality results, and the benefits of meta-search, Mamma grew rapidly through word of mouth, and quickly became an established presence on the Internet. Mamma.com’s ability to gather the best results available from top Internet sources and to procure an impressive array of advertisers during the Internet boom of the late 1990s caught the interest of many potential investors. In late 1999 Intasys Corporation (Nasdaq: INTA) invested in a 69% stake of the company and in July 2001 bought the remaining shares and now fully owns Mamma.com.

A Search Engine Is Just A Tool

A search engine is just a tool. It’s only as smart as the people who masterminded it. If IxQuick™ is smarter and more relevant in its findings than other metasearch engines (and many highly qualified people would agree that it is), it’s because it was conceived to meet the needs of some of the most inquisitive minds in the world: those of researchers, educators, scholars and the world scientific community.

When a user enters a query at the IxQuick.com website, its powerful proprietary technology simultaneously queries ten major search engines and properly formats the words and syntax for each source being probed.

IxQuick™ then creates a virtual database, organizes the results into a uniform format and presents them by relevance and source. In this manner, IxQuick.com provides users with highly relevant and comprehensive search results.

The post Directories, Search Engines & Traffic… appeared first on Online Marketing Training Courses.