search engine
optimization
It can
sometimes be an expensive experiment to get an outside
company to carry out search engine optimization for you,
and they often use suspect methods for temporary results.
The most important thing is the actual site and the html
behind it, rather than the act of submitting to search
engines. First, here's some background on search engines
and directories.
search engines
- The user
browses the search engine's database of indexed
sites by entering keywords or phrases
into a search box
- "Crawl" web
sites using software - often called
"spidering"
- Some use the
<meta> tag information
- Index the
content of the <title> tag, meta tags and
actual page content
- Site may
be found by search engine
spidering software
- Usually
need to submit site
- Users
find sites by searching for
keywords
- Can
sometimes
change your listing
- e.g. Google
directories
- Listings where
user has to choose the category /
categories to search through
- Compiled by
human editors
- Each has very
specific standards which sites must meet
to get listed - very strict
- Site may
be found by people who
compile directory
- Usually
need to submit site
- Can also
guarantee someone will look at your site
quickly if you pay
-
Doesn't guarantee entry
to directory, but more likely
- Difficult
to change your listing
once it's there
- Some list sites in
alphabetical order so obviously this will
determine how far down the list your site will
appear
- The main
disadvantage is that users have to know which
category to choose
- And you
have to know which category users may search
for your type of site in
- e.g. Yahoo
other points
- Many search
engines / directories are both - list
sites from an indexed database as well as
from category listings
-
Directories will often
list the results from their directory
first, then display results from
partnering search engine(s)
- e.g.
Yahoo shows results
from its own database first, then uses
the Google search engine to display further
results
- Important to
submit correctly to the search engines and
directories
- So your
site comes up when a user searches for relevant
keywords/through relevant
categories
- It is crucial to
do this manually to at least the top 20
- Can use
software, such as SubmitWolf, to register
with 100s of others, but not the top
20
- They each have
different criteria to which they allow web sites
to be listed
-
Failing to comply
could result in a bad listing or being
permanently rejected
- This is becoming
increasingly important - there's no point
having a fantastic web site if no one finds it!
submission tips
html & page design
- Keep HTML
as simple as possible
- Keep the most
important textual information at the top of
the html
- For users - so
they don't have to scroll
- For Search
Engines - include keyword phrases which
are in the meta tags
- Using tables
etc can mean text is further down the html than
the user sees on screen
-
Consider using a layers
based design - you can use css to position the
layers with textual content at the top of the
html even if they're further down the visible
page
- If keywords
are also links - increases their
efficacy
- Use
keywords in your
<h1 - h6> heading and subheading tags
- Each
page should have
lots of keyword rich content for its topic
and corresponding meta tags
- Include
textual hyper links to important areas of the
site from the home page
- Call files
/ folders / domain (and subdomain)
names something descriptive -
directories like this (acts as another keyword)
- Include a site
map, linked to from the home page
- Means
the search engine will be able to index
the whole site quickly and
easily
- Put all
JavaScript & CSS in separate files
- Include
a robots.txt in the root of the website
directory detailing js and css
files to be excluded from
indexing
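As a rough illustration of the robots.txt point above, here is a minimal Python sketch that checks which urls a robots.txt file blocks, using the standard library's urllib.robotparser. The /js/ and /css/ folder names and the www.example.com address are assumptions made up for the example - substitute the folders your own scripts and stylesheets live in.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the /js/ and /css/ folder names are assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /js/
Disallow: /css/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # www.example.com is a placeholder domain.
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/js/menu.js",
        "http://www.example.com/css/site.css",
    ):
        print(url, "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked")
```

Running a check like this before uploading the file is a cheap way to make sure you haven't accidentally blocked the pages you actually want listed.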
meta tags
- The meta tags
(in the <head> tag of the HTML of each page) for
title, keywords and description
should be utilised to their full value, and match
keywords that appear in the body of the page
- Don't use
the same html meta tags across
the site - each page should
have its own
- Optimise
your meta tag information
- View
source on sites of the same genre
as yours that come up highly on search
engines and directories and see what
keywords they use
- Check your
site stats to see which words and phrases
your users are searching for
submitting
- Submit
manually to each search
engine/directory once
- Usually
submit your home page (and any other key
pages from your site on some search engines -
they each have limits)
- Be careful
(i.e. don't spam) about the ones which use
each other's databases:
- Excite,
Magellan, City.net and Webcrawler use the
same database
- AOL
Search, AltaVista, AltaVista UK,
Dictionary.com, DogPile, Go2Net, Google,
HotBot, Infospace, Lycos, Netscape Search
all use Open Directory
- If your
site hasn't appeared
after 4-6 weeks - repeat submission
- Submit to the
correct category in directories
- Some search
engines base their listings on link popularity
- how many other sites link to yours? - are there
any sites which may be willing to link to yours?
-
Referring sites have to be of good quality
and/or similar subject
area
- Use the "final
comments" box where they have one - last chance to
make an impression
- Read all the
Search Engines/Directories "How to submit your
site" sections before submitting
(most have
them)
- Consider "buying
your way in" - Yahoo, LookSmart etc have the
option to pay to have your site reviewed
within a few days
-
Doesn't guarantee listing,
just that your site will get reviewed quickly
what not to do
- Re-direct
any page (you want listed)
using JavaScript
-
Probably includes the
js to get out of frames some people put
in their <body> tag
-
Definitely includes
the JavaScript re-direct to detect
browsers for plug-ins
- Have no
real HTML content on the home page
- This is
especially true for graphics-heavy "splash
pages" and Flash enabled sites
- Include a
"hidden" paragraph in a small font size or the
same colour as the background - some search
engines/directories (e.g. Yahoo) will actually
penalise for this - considered spam
-
Especially if this hidden content is also
just a list of keywords
- Put
insufficient text links on the page, so there's
nowhere for the search engine to spider
- Put all
navigational links in layers/javascript - the
search engines can't always spider through
- This can
include rollover images as links
- Use frames
- search engines can't spider them
- In the search
engine results, the site will be separated into
the individual frames
- If you MUST
use frames - make sure there are navigational
links in each frame used
- And always use
the <noframes> tag
- Use spamming
techniques - it will get you barred from search
engines and directories
improving the listing
- Analyse site stats
- Which
keywords people are searching for - change
meta tags and content accordingly
- Which
search engines/directories are/are not
putting traffic to the site
-
Click-through rate
improves site listing
- See if
/ where users are clicking "Back" button
- negative effect on listing
- How
long users stay on
site / pages - can improve listing
- Ensure the page
design is not detrimental to search engines
spidering your site
- Possibly have
information rich pages
- About things
users would be interested in
-
Updated frequently -
visitors return for updates
- Possibly
moreover.com newsfeeds?
- Link these
info rich pages from site map
meta tags
- Meta tags are
included in the html, between the <head> and </head>
at the top of the document
- It is crucial to
get the correct information in these as this
could decide whether or not your web site is listed
- The meta tags
(in the HTML of each page) for title,
keywords and description should be
utilized to their full value, and match keywords
that appear in the body of the page
- Ensuring these
keywords and phrases are in the page
headings, body content
and links is highly beneficial
- Do not
repeat them more than five times
on a page though - search engines can pick this
up as spamming
- Some
search engines (not all)
use meta tags to obtain the information about
your site - directories don't use them
- Meta tags
are an important aspect of
search engine optimization
step by step
instructions
planning
- Think of a
short keyword intensive phrase for the page
- Keep to 90-150
characters or less (incl. spaces, punctuation
etc.)
<title>Meta tags for search engine
optimization</title>
- Use this phrase as
the basis for a short paragraph for the
description meta tag
- Do
not split the phrase
up in the paragraph - keep it intact
- It is
beneficial to include any other keyword
phrases that may be relevant for the page
- Keep to 250
characters (incl. spaces, punctuation etc.)
<meta
name="description" content="Meta tags for
search engine optimization, including page
title, keywords and description.">
- Use this
description paragraph as the opening paragraph in
the main content of the page
- n.b. There is
not much point putting
keywords in the
description meta tag if they are not included in
the actual body content anywhere
- Think of any
other possible keyword/keyword phrases which are
relevant to the page
- Use these and
the keywords from the title and description
for the keywords meta tag
<meta
name="keywords" content="meta tags search
engine optimization keywords description
page title">
title
The <title> tag
is becoming increasingly important to listings.
- It should be
descriptive, but short (about 150
characters)
- Include
keywords
- Build it
around a target phrase for the page
- It needs to
make sense as a sentence, but try to limit the
use of irrelevant words (a, the, etc.)
- Try to
make it specific to
your web site
- Never use over the
top marketing hype such as "we are the best site" etc.
- It is not
descriptive and doesn't include valuable keywords
- Can get
you kicked off certain search
engines/directories!
- What the user sees
at the top of their browser and therefore
what is shown when they bookmark a page -
make it useful!
description
The description
(<meta name="description" content="This is what the user
sees when the search engine displays the site in a list
among all other similar sites and can sway which they
click on">)
- This should be
informative and not too long (there are
recommended limits - around 250 characters) as only
a set number of characters will be displayed
- Should include
html keyword phrases
- Ideally
keywords which are repeated in the title
and body content of the page
- Put your
most important keywords at the beginning
of the
description
- Don't
use too many stop words (a, on, the, etc)
- search engines don't index them - wastes your
description
keywords
The keywords
(<meta name="keywords" content="keyword keyword
keyword">)
- These are
the possible keywords a user may
submit when searching for
information
- It is worth
putting in variations, acronyms (e.g.
employment, vacancies, jobs, careers) and spelling
mistakes of words
- Use plural
over singular
- Don't use
commas between keywords:
- Adds
characters and
separates words which may be searched for
together
- Makes
repeating of words harder (SEs don't like
repetition)
-
Experiment - if no
commas isn't helping your listing, reassess
after a few months
- 1000
characters (or less) in
total - more is just ignored
- Don't
repeat a keyword more than
3 times
- Be careful
over variations of the same word (e.g. cook,
cooked, cooking)
- Match
with important keywords in the body
content of the page
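The limits above are easy to lose track of, so here is a minimal Python sketch that checks a page's title, description and keywords against them. The function name check_meta and the constant names are made up for the example, and the limits are simply the rules of thumb quoted in this guide (150, 250 and 1000 characters, no keyword more than 3 times) - they are not published standards.

```python
from collections import Counter

# Limits quoted in the text above; rules of thumb, not published standards.
TITLE_MAX = 150
DESCRIPTION_MAX = 250
KEYWORDS_MAX = 1000
KEYWORD_REPEAT_MAX = 3


def check_meta(title, description, keywords):
    """Return a list of warning strings; an empty list means all checks passed."""
    warnings = []
    for name, text, limit in (
        ("title", title, TITLE_MAX),
        ("description", description, DESCRIPTION_MAX),
        ("keywords", keywords, KEYWORDS_MAX),
    ):
        if len(text) > limit:
            warnings.append(f"{name} is {len(text)} characters (limit {limit})")
    # Keywords are split on spaces, as the text recommends (no commas).
    counts = Counter(keywords.lower().split())
    for word, count in sorted(counts.items()):
        if count > KEYWORD_REPEAT_MAX:
            warnings.append(
                f"keyword '{word}' appears {count} times (limit {KEYWORD_REPEAT_MAX})"
            )
    return warnings


if __name__ == "__main__":
    # The example tags used earlier in this section.
    print(check_meta(
        title="Meta tags for search engine optimization",
        description="Meta tags for search engine optimization, "
                    "including page title, keywords and description.",
        keywords="meta tags search engine optimization keywords "
                 "description page title",
    ))
```

It only counts exact repeats, so variations of the same word (cook, cooked, cooking) still need checking by eye.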
Spamming Search
Engines
- Search Engines &
Directories want to have the best sites
listed
-
Increases the
user experience
- Gets them
more sponsorship etc
- They take a
harsh line against
spamming
- Spamming:
using "tricks" to try and get your site
listed
don't
- Use hidden
keywords
- Text
the same colour as the
background colour
- Use "doorway"
pages to submit with
- These are
usually pages which contain nothing more than
keywords and a link to your home page
- Use other
company names in your meta tags /
content of your site
without permission
- Considered
spamming, and could get you sued by
said company
- Use "popular"
phrases in keywords (e.g. sex, porn, MP3,
Britney Spears) to try and boost listing
- Submit
to a search engine / directory
too frequently
- Leave
4-6 weeks between
submissions
- Include hidden
links to other sites to try and improve listing
- Links to other
sites don't help unless the referring site
is very popular and preferably of the
same genre as yours
- Besides, you
need good quality links to your site
to improve your ranking
-
Hidden links will be
considered spam anyway
Frames
If you
really have to use frames (which I would never
advocate because of usability issues and search engine
listings), here's how to create a framed site which may
have a chance on search engines.
People often use frames
to provide an easy to maintain site. Navigation can be
kept in a separate frame (often on the left-hand side)
and therefore be updated just once, and it is always visible
when the rest of the page scrolls. The same is true for
a header or a footer that stays static while the content
is scrollable. Some companies like this so that the
corporate logo and contact information are always
visible and easy to find. Using frames on a
flash-enabled html site is one way to get content
indexed by search engines, as at present search
engines can't spider the content in flash.
As an argument
against frames, the easy to update debate can be
quashed by the use of server side includes which are
probably easier to maintain than frames. And while
having the company logo and contact information visible
is important, this can be achieved by having
pages which don't scroll too far down. Frames have many
usability issues, mainly in the form of the url being
destroyed and orphaned pages.
Frames are bad
for search engine optimization because:
- search
engine spiders cannot spider
the content of a site because they cannot follow the
links within a frameset
- there is also
no body text to index on the main frameset
These two factors
can be counteracted by:
- using the <noframes>
tag properly
- making sure that
the content inside your <noframes> tag contains
links to the rest of your site
The search engine will
spider this information as it would a normal html page.
other issues
Make sure there are
navigational links in the main body content of
all your internal framed pages - many sites often
put these at the bottom of the page. This is useful for
both the search engine to spider and for any user who
finds the site on a search engine, as the page will be
orphaned from its frameset.
using site traffic
stats
- Which keywords
people are searching for
-
Update your meta tag
keywords / content keywords and links
accordingly
- Which search
engines/directories are/are not putting
traffic to the site
-
Click-through rate
improves site listing
- i.e. if your
site is clicked on from a search engine listing
- Also, how far
into the site users go - i.e. how many clicks
(may be measured)
- Check site stats
to see if/where users are clicking "Back" button
- negative effect on listing
- By seeing
how long users stay on your site - search
engines can "time" how quickly a visitor returns
to their site from yours
- Also, by
seeing which pages are "single access pages"
(i.e. the visitor arrives and exits from the
same page, without visiting others)
- Which sites
have links to your site ("referring sites")
- Value
of these referring sites - more valuable if they
themselves are popular and/or in same
category as your site
Google's
PageRank Explained
Not long ago, there was just one well-known PageRank
Explained paper, to which most interested people
referred when trying to understand the way that PageRank
works. In fact, I used it myself. But when I was writing
the PageRank Calculator, I realized that the original
paper was misleading in the way that the calculations
were done. It uses its own form of PageRank, which the
author calls "mini-rank". Mini-rank changes Google's
PageRank equation for no apparent reason, making the
results of the calculations very misleading.
Even though the author abandoned mini-rank as a result
of this and another paper, the original, unchanged paper
is still available on the web. So if you come across a
PageRank Explained paper that uses "mini-rank",
it has been superseded and is best ignored.
What is Page Rank?
PageRank is a numeric value that represents how
important a page is on the web. Google figures that when
one page links to another page, it is effectively
casting a vote for the other page. The more votes that
are cast for a page, the more important the page must
be. Also, the importance of the page that is casting the
vote determines how important the vote itself is. Google
calculates a page's importance from the votes cast for
it, and the importance of each vote is taken into account
when a page's PageRank is calculated.
PageRank is Google's way of deciding a page's
importance. It matters because it is one of the factors
that determines a page's ranking in the search results.
It isn't the only factor that Google uses to rank pages,
but it is an important one. From here on in, we'll
occasionally refer to PageRank as "PR".
Notes:
Not all links are counted by Google. For instance, they
filter out links from known link farms. Some links can
cause a site to be penalized by Google. They rightly
figure that webmasters cannot control which sites link
to their sites, but they can control which sites
they link out to. For this reason, links into a site
cannot harm the site, but links from a site can be
harmful if they link to penalized sites. So be careful
which sites you link to. If a site has PR0, it is
usually a penalty, and it would be unwise to link to it.
How is Page Rank Calculated?
To calculate the PageRank for a page, all of its inbound
links are taken into account. These are links from
within the site and links from outside the site.
PR(A) = (1-d) + d(PR(t1)/C(t1) +
... + PR(tn)/C(tn))
That's the equation that calculates a page's PageRank.
It's the original one that was published when PageRank
was being developed, and it is probable that Google uses
a variation of it but they aren't telling us what it is.
It doesn't matter though, as this equation is good
enough.
In the equation 't1 - tn' are pages linking to page A,
'C' is the number of outbound links that a page has and
'd' is a damping factor, usually set to 0.85.
We can think of it in a simpler way:-
a page's PageRank = 0.15 + 0.85 *
(a "share" of the PageRank of every page that links to
it)
"share" = the linking page's PageRank divided by the
number of outbound links on the page.
A page "votes" an amount of PageRank onto each page that
it links to. The amount of PageRank that it has to vote
with is a little less than its own PageRank value (its
own value * 0.85). This value is shared equally between
all the pages that it links to.
From this, we could conclude that a link from a page
with PR4 and 5 outbound links is worth more than a link
from a page with PR8 and 100 outbound links. The
PageRank of a page that links to yours is important but
the number of links on that page is also important. The
more links there are on a page, the less PageRank value
your page will receive from it.
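The equation and the PR4 versus PR8 comparison above can be sketched in a few lines of Python. This is only an illustration of the published equation: the function names are made up for the example, and it takes the toolbar values at face value as if they were actual PageRank figures, which is the same assumption the comparison above makes.

```python
def share(pagerank_value, outbound_links):
    """The "share" a linking page passes on: its PageRank / its outbound links."""
    return pagerank_value / outbound_links


def pagerank(inbound, d=0.85):
    """PR(A) = (1 - d) + d * (PR(t1)/C(t1) + ... + PR(tn)/C(tn)).

    inbound is a list of (PR(t), C(t)) pairs, one per page linking to A.
    """
    return (1 - d) + d * sum(share(pr, links) for pr, links in inbound)


if __name__ == "__main__":
    # One inbound link from a PR4 page that has 5 outbound links.
    print("share from PR4 / 5 links:  ", share(4, 5))
    print("resulting PageRank:        ", pagerank([(4, 5)]))
    # One inbound link from a PR8 page that has 100 outbound links.
    print("share from PR8 / 100 links:", share(8, 100))
    print("resulting PageRank:        ", pagerank([(8, 100)]))
```

The PR4 link passes on a share of 0.8 and the PR8 link a share of 0.08, which is why the number of links on the linking page matters as much as its PageRank.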
If the PageRank value differences between PR1,
PR2,.....PR10 were equal then that conclusion would hold
up, but many people believe that the values between PR1
and PR10 (the maximum) are set on a logarithmic scale,
and there is very good reason for believing it. Nobody
outside Google knows for sure one way or the other, but
the chances are high that the scale is logarithmic, or
similar.
If so, it means that it takes a lot more additional
PageRank for a page to move up to the next PageRank
level than it did to move up from the previous PageRank
level. The result is that it reverses the previous
conclusion, so that a link from a PR8 page that has lots
of outbound links is worth more than a link from a PR4
page that has only a few outbound links.
Whichever scale Google uses, we can be sure of one
thing. A link from another site increases our site's
PageRank. Just remember to avoid links from link farms.
Note that when a page votes its PageRank value to other
pages, its own PageRank is not reduced by the value that
it is voting. The page doing the voting doesn't give
away its PageRank and end up with nothing. It isn't a
transfer of PageRank. It is simply a vote according to
the page's PageRank value. It's like a shareholders
meeting where each shareholder votes according to the
number of shares held, but the shares themselves aren't
given away. Even so, pages do lose some PageRank
indirectly, as we'll see later.
Ok so far? Good. Now we'll look at how the calculations
are actually done.
For a page's calculation, its existing PageRank (if it
has any) is abandoned completely and a fresh calculation
is done where the page relies solely on the PageRank
"voted" for it by its current inbound links, which may
have changed since the last time the page's PageRank was
calculated.
The equation shows clearly how a page's PageRank is
arrived at. But what isn't immediately obvious is that
it can't work if the calculation is done just once.
Suppose we have 2 pages, A and B, which link to each
other, and neither has any other links of any kind.
This is what happens:-
Step 1: Calculate page A's
PageRank from the value of its inbound links
Page A now has a new PageRank value. The calculation
used the value of the inbound link from page B. But page
B has an inbound link (from page A) and its new PageRank
value hasn't been worked out yet, so page A's new
PageRank value is based on inaccurate data and can't be
accurate.
Step 2: Calculate page B's
PageRank from the value of its inbound links
Page B now has a new PageRank value, but it can't be
accurate because the calculation used the new PageRank
value of the inbound link from page A, which is
inaccurate.
It's a Catch 22 situation. We can't work out A's
PageRank until we know B's PageRank, and we can't work
out B's PageRank until we know A's PageRank.
Now that both pages have newly calculated PageRank
values, can't we just run the calculations again to
arrive at accurate values? No. We can run the
calculations again using the new values and the results
will be more accurate, but we will always be using
inaccurate values for the calculations, so the results
will always be inaccurate.
The problem is overcome by repeating the calculations
many times. Each time produces slightly more accurate
values. In fact, total accuracy can never be achieved
because the calculations are always based on inaccurate
values. 40 to 50 iterations are sufficient to reach a
point where any further iterations wouldn't produce
enough of a change to the values to matter. This is
precisely what Google does at each update, and it's the
reason why the updates take so long.
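Here is a minimal Python sketch of that repeated calculation for the two-page example, where A and B link only to each other. The function name and the starting value of 0 are assumptions made for the illustration (the equation doesn't say what to start from), and this is of course not Google's actual code.

```python
def iterate_pagerank(links, d=0.85, iterations=50, start=0.0):
    """Repeat the published equation over every page `iterations` times.

    links maps each page to the list of pages it links to. Pages are
    recalculated one after another, so each calculation uses the newest
    values available, as in Step 1 and Step 2 above.
    """
    pr = {page: start for page in links}
    for _ in range(iterations):
        for page in links:
            inbound = sum(
                pr[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            pr[page] = (1 - d) + d * inbound
    return pr


if __name__ == "__main__":
    two_pages = {"A": ["B"], "B": ["A"]}
    for count in (1, 5, 20, 50):
        print(count, iterate_pagerank(two_pages, iterations=count))
```

Each pass moves the values closer to 1.0 for both pages, and changing the start value changes how quickly they settle but not where they end up.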
One thing to bear in mind is that the results we get
from the calculations are proportions. The
figures must then be set against a scale (known only to
Google) to arrive at each page's actual PageRank. Even
so, we can use the calculations to channel the PageRank
within a site around its pages so that certain pages
receive a higher proportion of it than others.
NOTE:
You may come across explanations of PageRank where the
same equation is stated but the result of each iteration
of the calculation is added to the page's
existing PageRank. The new value (result + existing
PageRank) is then used when sharing PageRank with other
pages. These explanations are wrong for the following
reasons:-
1. They quote the same, published equation - but
then change it from PR(A) = (1-d)
+ d(......) to PR(A) = PR(A)
+ (1-d) + d(......)
It isn't correct, and it isn't necessary.
2. We will be looking at how to organize links so
that certain pages end up with a larger proportion of
the PageRank than others. Adding to the page's existing
PageRank through the iterations produces different
proportions than when the equation is used as published.
Since the addition is not a part of the published
equation, the results are wrong and the proportioning
isn't accurate.
According to the published equation, the page being
calculated starts from scratch at each iteration. It
relies solely on its inbound links. The 'add to
the existing PageRank' idea doesn't do that, so its
results are necessarily wrong.
Internal
linking
Fact:
A website has a maximum amount of PageRank that is
distributed between its pages by internal links.
The maximum PageRank in a site equals the number of
pages in the site * 1.
The maximum is increased by inbound links from other
sites and decreased by outbound links to other sites. We
are talking about the overall PageRank in the site and
not the PageRank of any individual page. You don't have
to take my word for it. You can reach the same
conclusion by using a pencil and paper and the equation.
Fact: The maximum amount of
PageRank in a site increases as the number of pages in
the site increases.
The more pages that a site has, the more PageRank it
has. Again, by using a pencil and paper and the
equation, you can come to the same conclusion. Bear in
mind that the only pages that count are the ones that
Google knows about.
Fact: By linking poorly, it
is possible to fail to reach the site's maximum
PageRank, but it is not possible to exceed it.
Poor internal linkages can cause a site to fall short of
its maximum but no kind of internal link structure can
cause a site to exceed it. The only way to increase the
maximum is to add more inbound links and/or increase the
number of pages in the site.
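If you'd rather not use a pencil and paper, here is a minimal Python sketch of the same check on a made-up 3-page site. The page names and link structures are assumptions for the example, and the site is treated as a self-contained network with no links in or out.

```python
def site_pagerank(links, d=0.85, iterations=100):
    """Repeat the published equation over a small set of pages.

    links maps each page to the list of pages it links to.
    """
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        for page in links:
            inbound = sum(
                pr[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            pr[page] = (1 - d) + d * inbound
    return pr


# Every page links to the other two.
WELL_LINKED = {
    "home": ["about", "products"],
    "about": ["home", "products"],
    "products": ["home", "about"],
}

# Same pages, but "products" is a dead end with no links out of it.
DEAD_END = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": [],
}

if __name__ == "__main__":
    for name, structure in (("well linked", WELL_LINKED), ("dead end", DEAD_END)):
        values = site_pagerank(structure)
        print(name, values, "total:", sum(values.values()))
```

The well linked version reaches the maximum of 3 (the number of pages), while the version with a dead-end page falls a long way short of it.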
Cautions: Whilst I
thoroughly recommend creating and adding new pages to
increase a site's total PageRank so that it can be
channeled to specific pages, there are certain types of
pages that should not be added. These are pages
that are all identical or very nearly identical and are
known as cookie-cutters. Google considers them to be
spam and they can trigger an alarm that causes the
pages, and possibly the entire site, to be penalized.
Pages full of good content are a must.
Inbound links
Inbound links (links into the site from the outside) are
one way to increase a site's total PageRank. The other
is to add more pages. Where the links come from doesn't
matter. Google recognizes that a webmaster has no
control over other sites linking into a site, and so
sites are not penalized because of where the links come
from. There is an exception to this rule but it is rare
and doesn't concern this article. It isn't something
that a webmaster can accidentally do.
The linking page's PageRank is important, but so is the
number of links going from that page. For instance, if
you are the only link from a page that has a lowly PR2,
you will receive an injection of 0.15 + 0.85(2/1) = 1.85
into your site, whereas a link from a PR8 page that has
another 99 links from it will increase your site's
PageRank by 0.15 + 0.85(8/100) = 0.218. Clearly, the
PR2 link is much better - or is it?
Once the PageRank is injected into your site, the
calculations are done again and each page's PageRank is
changed. Depending on the internal link structure, some
pages' PageRank is increased, some are unchanged but no
pages lose any PageRank.
It is beneficial to have the inbound links coming to the
pages to which you are channeling your PageRank. A
PageRank injection to any other page will be spread
around the site through the internal links. The
important pages will receive an increase, but not as
much of an increase as when they are linked to directly.
The page that receives the inbound link makes the
biggest gain.
It is easy to think of our site as being a small,
self-contained network of pages. When we do the PageRank
calculations we are dealing with our small network. If
we make a link to another site, we lose some of our
network's PageRank, and if we receive a link, our
network's PageRank is added to. But it isn't like that.
For the PageRank calculations, there is only one network
- every page that Google has in its index. Each
iteration of the calculation is done on the entire
network and not on individual websites.
Because the entire network is interlinked, and every
link and every page plays its part in each iteration of
the calculations, it is impossible for us to calculate
the effect of inbound links to our site with any
realistic accuracy.
Outbound links
Outbound links are a drain on a site's total PageRank.
They leak PageRank. To counter the drain, try to ensure
that the links are reciprocated. Because of the PageRank
of the pages at each end of an external link, and the
number of links out from those pages, reciprocal links
can gain or lose PageRank. You need to take care when
choosing where to exchange links.
When PageRank leaks from a site via a link to another
site, all the pages in the internal link structure are
affected. (This doesn't always show after just 1
iteration).
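To see the leak in figures, here is a minimal Python sketch that compares a made-up 3-page site before and after its home page links out to another site. The page names and the other-site address are assumptions for the example, and it uses the simplified view of the site as its own small network, so the numbers only show the direction of the effect.

```python
def site_pagerank(links, d=0.85, iterations=100):
    """Repeat the published equation over the pages listed as keys of links.

    A link target that is not itself a key (an outside page) still counts
    towards the linking page's number of outbound links.
    """
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        for page in links:
            inbound = sum(
                pr[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            pr[page] = (1 - d) + d * inbound
    return pr


BEFORE = {
    "home": ["about", "products"],
    "about": ["home", "products"],
    "products": ["home", "about"],
}

# Same site, but the home page now also links to another site.
AFTER = dict(BEFORE, home=["about", "products", "http://www.other-site.example/"])

if __name__ == "__main__":
    before = site_pagerank(BEFORE)
    after = site_pagerank(AFTER)
    for page in BEFORE:
        print(page, round(before[page], 4), "->", round(after[page], 4))
    print("site total:", round(sum(before.values()), 4),
          "->", round(sum(after.values()), 4))
```

Every page ends up lower than before, not just the one that carries the outbound link.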
The page that you link out from makes a difference to
which pages suffer the most loss. Without a program to
perform the calculations on specific link structures, it
is difficult to decide on the right page to link out
from, but the generalization is to link from the one
with the lowest PageRank.
Many websites need to contain some outbound links that
have nothing to do with PageRank. Unfortunately, all
'normal' outbound links leak PageRank. But there are
'abnormal' ways of linking to other sites that don't
result in leaks. PageRank is leaked when Google
recognizes a link to another site. The answer is to use
links that Google doesn't recognize or count. These
include form actions and links contained in javascript
code.
Form actions
A form's 'action' attribute does not need to be the url
of a form parsing script. It can point to any html page
on any site. Try it.
Example:
<form name="myform" action="http://www.domain.com/somepage.html">
<a href="javascript:document.myform.submit()">Click
here</a>
</form>
To be really sneaky, the action attribute could
be in some javascript code rather than in the form tag,
and the javascript code could be loaded from a 'js' file
stored in a directory that is barred to Google's spider
by the robots.txt file.
Javascript
Example:
<a href="javascript:goto('wherever')">Click
here</a>
Like the form action, it is sneaky to load the
javascript code, which contains the urls, from a
seperate 'js' file, and sneakier still if the file is
stored in a directory that is barred to googlebot by the
robots.txt file.
The "rel" attribute
As of 18th January 2005, Google, together with other
search engines, is recognising a new attribute to the
anchor tag. The attribute is "rel", and it is used as
follows:-
<a href="http://www.domain.com/somepage.html"
rel="nofollow">link text</a>
The attribute tells Google to ignore the link
completely. The link won't help the target page's
PageRank, and it won't help its rankings. It is as
though the link doesn't exist. With this attribute,
there is no longer any need for javascript, forms, or
any other method of hiding links from Google.
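To check which links on one of your own pages carry the attribute, here is a minimal Python sketch using the standard library's html.parser. The class and function names are made up for the example, and it only shows how a program could separate the two kinds of link - it says nothing about how Google's spider is actually written.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values, split by whether rel contains "nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel may hold several space-separated values, in any letter case.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def split_links(html):
    """Return (followed, nofollow) lists of hrefs found in html."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.followed, collector.nofollow


if __name__ == "__main__":
    sample = (
        '<a href="http://www.domain.com/somepage.html" rel="nofollow">link text</a>'
        '<a href="/about.html">About us</a>'
    )
    followed, nofollow = split_links(sample)
    print("counted:", followed)
    print("ignored:", nofollow)
```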
So how much additional PageRank
do we need to move up the toolbar?
First, let me explain in
more detail why the values shown in the Google toolbar
are not the actual PageRank figures. According to
the equation, and to the creators of Google, the
billions of pages on the web average out to a PageRank
of 1.0 per page. So the total PageRank on the web is
equal to the number of pages on the web * 1, which
equals a lot of PageRank spread around the web.
The Google toolbar range is from 1 to 10. (They
sometimes show 0, but that figure isn't believed to be a
PageRank calculation result). What Google does is divide
the full range of actual PageRanks on the web
into 10 parts - each part is represented by a value as
shown in the toolbar. So the toolbar values only show
what part of the overall range a page's PageRank is in,
and not the actual PageRank itself. The numbers in the
toolbar are just labels.
Whether or not the overall range is divided into 10
equal parts is a matter for debate - Google aren't
saying. But because it is much harder to move up a
toolbar point at the higher end than it is at the lower
end, many people (including me) believe that the
divisions are based on a logarithmic scale, or something
very similar, rather than the equal divisions of a
linear scale.
Let's assume that it is a logarithmic, base 10 scale,
and that it takes 10 properly linked new pages to move a
site's important page up 1 toolbar point. It will take
100 new pages to move it up another point, 1000 new
pages to move it up one more, 10,000 to the next, and so
on. That's why moving up at the lower end is much easier
than at the higher end.
In reality, the base is unlikely to be 10. Some people
think it is around the 5 or 6 mark, and maybe even less.
Even so, it still gets progressively harder to move up a
toolbar point at the higher end of the scale.
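The arithmetic can be sketched in a few lines. It assumes, as above, that the scale is logarithmic and that each toolbar point costs `base` times as many new pages as the one before; the function name pagesForStep is my own, and none of the figures come from Google.

```javascript
// Number of properly linked new pages needed for the n-th toolbar step,
// assuming each step costs `base` times as many pages as the previous one.
function pagesForStep(base, step) {
  return base ** step;
}

// Compare the base 10 assumption with a guessed base of 5.
for (const base of [10, 5]) {
  const costs = [1, 2, 3, 4].map((step) => pagesForStep(base, step));
  console.log(`base ${base}: ${costs.join(", ")}`);
}
// base 10: 10, 100, 1000, 10000
// base 5: 5, 25, 125, 625
```

Even with the smaller base, the fourth step costs 125 times as much as the first.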
Note that as the number of pages on the web increases,
so does the total PageRank on the web, and as the total
PageRank increases, the positions of the divisions in
the overall scale must change. As a result, some pages
drop a toolbar point for no 'apparent' reason. If the
page's actual PageRank was only just above a division in
the scale, the addition of new pages to the web would
cause the division to move up slightly and the page
would end up just below the division. Google's index is
always increasing and they re-evaluate each of the pages
on more or less a monthly basis. It's known as the
"Google dance". When the dance is over, some pages will
have dropped a toolbar point. A number of new pages
might be all that is needed to get the point back after
the next dance.
The toolbar value is a good indicator of a page's
PageRank but it only indicates that a page is in a
certain range of the overall scale. One PR5 page could
be just above the PR5 division and another PR5 page
could be just below the PR6 division - almost a whole
division (toolbar point) between them.
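To see how two very different pages can share a toolbar value, here is a small sketch. The positions of the divisions are pure assumption (powers of a guessed base of 6), as are the function name toolbarLabel and the sample PageRank figures; Google doesn't publish any of this.

```javascript
// Map an actual PageRank to a toolbar label, assuming (hypothetically) that
// label n covers the range from base**n up to, but not including,
// base**(n+1). Label 0 catches everything below `base`, and label 10
// catches everything from base**10 upwards.
function toolbarLabel(pageRank, base) {
  let label = 0;
  let upperBound = base; // top of the range for label 0
  while (pageRank >= upperBound && label < 10) {
    upperBound *= base;
    label += 1;
  }
  return label;
}

// Two made-up pages near opposite ends of the same division.
console.log(toolbarLabel(7800, 6), toolbarLabel(46000, 6));
// 5 5
```

With these assumed divisions, both pages show as PR5 even though one has nearly six times the actual PageRank of the other.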
Tips
Domain names and Filenames
To a spider,
www.domain.com/, domain.com/,
www.domain.com/index.html and
domain.com/index.html are
different urls and, therefore, different pages. Surfers
arrive at the site's home page whichever of the urls is
used, but spiders see them as individual urls, and it
makes a difference when working out the PageRank. It is
better to standardize the url you use for the site's
home page. Otherwise each url can end up with a
different PageRank, whereas all of it should have gone
to just one url.
If you think about it, how can a spider know the
filename of the page that it gets back when requesting
www.domain.com/ ? It can't.
The filename could be index.html, index.htm, index.php,
default.html, etc. The spider doesn't know. If you link
to index.html within the site, the spider could compare
the 2 pages but that seems unlikely. So they are 2 urls
and each receives PageRank from inbound links.
Standardizing the home page's url ensures that the
PageRank it is due isn't shared with ghost urls.
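One way to keep your own links consistent is to run every url through a small helper before it goes into a page. This is only a sketch: it assumes the "www." version is the one you have chosen, that the site uses http, and that the default filenames are the ones listed above. The names standardizeUrl and DEFAULT_FILES are mine.

```javascript
// Filenames that a server might return for "/" (the examples from the text).
const DEFAULT_FILES = ["index.html", "index.htm", "index.php", "default.html"];

// Rewrite any variant of the home page url to a single preferred form:
// "www." host and no default filename. Other urls only get the "www." host.
function standardizeUrl(url) {
  const parsed = new URL(url);
  if (!parsed.hostname.startsWith("www.")) {
    parsed.hostname = "www." + parsed.hostname;
  }
  const fileName = parsed.pathname.slice(1);
  if (DEFAULT_FILES.includes(fileName)) {
    parsed.pathname = "/";
  }
  return parsed.href;
}

console.log(standardizeUrl("http://domain.com/index.html"));
// http://www.domain.com/
```

All four versions of the home page url come out as http://www.domain.com/, so inbound links built this way all point at one url.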
Example: Go to my UK Holidays and UK Holiday
Accommodation site - how's
that for a nice piece of link text ;). Notice that the
url in the browser's address bar contains "www.". If you
have the Google Toolbar installed, you will see that the
page has PR5. Now remove the "www." part of the url and
get the page again. This time it has PR1, and yet they
are the same page.
Actually, the PageRank is for the unseen frameset page.
When this article was first written, the non-www URL had
PR4 due to using different versions of the link URLs
within the site. It had the effect of sharing the page's
PageRank between the 2 pages (the 2 versions) and,
therefore, between the 2 sites. That's not the best way
to do it. Since then, I've tidied up the internal
linkages and got the non-www version down to PR1 so that
the PageRank within the site mostly stays in the "www."
version, but there must be a site somewhere that links
to it without the "www." that's causing the PR1.
Imagine the page, www.domain.com/index.html.
The index page contains links to several relative urls;
e.g. products.html and
details.html. The spider sees
those urls as www.domain.com/products.html
and www.domain.com/details.html.
Now let's add an absolute url for another page, only
this time we'll leave out the "www." part -
domain.com/anotherpage.html.
This page links back to the index.html page, so the
spider sees the index page as
domain.com/index.html. Although it's the same
index page as the first one, to a spider, it is a
different page because it's on a different domain. Now
look what happens. Each of the relative urls on the
index page is also different because it belongs to the
domain.com/ domain.
Consequently, the link structure is wasting a site's
potential PageRank by spreading it between ghost pages.
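You can see the effect by resolving the relative urls against each version of the index page, which is roughly what a spider has to do. The sketch uses the standard URL class; the http:// scheme and the function name resolveLinks are assumptions added for the example.

```javascript
// Resolve the index page's relative links against each version of its url.
const relativeLinks = ["products.html", "details.html"];
const indexVersions = [
  "http://www.domain.com/index.html",
  "http://domain.com/index.html",
];

function resolveLinks(baseUrl, links) {
  return links.map((link) => new URL(link, baseUrl).href);
}

for (const base of indexVersions) {
  console.log(base, "->", resolveLinks(base, relativeLinks).join(", "));
}
// http://www.domain.com/index.html -> http://www.domain.com/products.html, http://www.domain.com/details.html
// http://domain.com/index.html -> http://domain.com/products.html, http://domain.com/details.html
```

Two index urls produce two complete sets of page urls, even though there is only one set of files on the server.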

Adding new pages
There is a possible negative effect of adding new pages.
Take a perfectly normal site. It has some inbound links
from other sites and its pages have some PageRank. Then
a new page is added to the site and is linked to from
one or more of the existing pages. The new page will, of
course, acquire PageRank from the site's existing pages.
The effect is that, whilst the total PageRank in the
site is increased, one or more of the existing pages
will suffer a PageRank loss due to the new page making
gains. Up to a point, the more new pages that are added,
the greater is the loss to the existing pages. With
large sites, this effect is unlikely to be noticed but,
with smaller ones, it probably would. So, although
adding new pages does increase the total PageRank within
the site, some of the site's pages will lose PageRank as
a result. The answer is to link new pages in such a way
within the site that the important pages don't suffer,
or add sufficient new pages to make up for the effect
(that can sometimes mean adding a large number of new
pages), or better still, get some more inbound links.
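A small simulation shows the effect. It is a sketch only: it assumes the commonly quoted form of the equation, PR(A) = (1 - d) + d * (sum of PR(T) / C(T) over the pages T linking to A), with d = 0.85, and a made-up three-page site with no links in from or out to other sites. The function names and the site layout are mine.

```javascript
// Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over a closed set of pages.
// `links` maps each page to the pages it links to; d = 0.85 is an assumption.
function pageRank(links, d = 0.85, iterations = 100) {
  const pages = Object.keys(links);
  let ranks = Object.fromEntries(pages.map((p) => [p, 1]));
  for (let i = 0; i < iterations; i++) {
    const next = Object.fromEntries(pages.map((p) => [p, 1 - d]));
    for (const page of pages) {
      // Each page passes an equal share of its PageRank to every link.
      const share = ranks[page] / links[page].length;
      for (const target of links[page]) {
        next[target] += d * share;
      }
    }
    ranks = next;
  }
  return ranks;
}

const total = (ranks) => Object.values(ranks).reduce((a, b) => a + b, 0);

// Hypothetical site: the home page links to B and C, which link back.
const before = pageRank({ home: ["B", "C"], B: ["home"], C: ["home"] });
// The same site after adding page D, linked from home and linking back.
const after = pageRank({
  home: ["B", "C", "D"],
  B: ["home"],
  C: ["home"],
  D: ["home"],
});

console.log(total(before).toFixed(2), before.B.toFixed(2));
console.log(total(after).toFixed(2), after.B.toFixed(2));
// 3.00 0.77
// 4.00 0.69
```

In this layout the site's total goes up from 3 to 4, but pages B and C each drop from about 0.77 to about 0.69 because the home page now shares its PageRank three ways instead of two.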
Miscellaneous
The Google toolbar
If you have the Google toolbar installed in your
browser, you will be used to seeing each page's PageRank
as you browse the web. But all isn't always as it seems.
Many pages that Google displays the PageRank for haven't
been indexed in Google and certainly don't have any
PageRank in their own right. What is happening is that
one or more pages on the site have been indexed and a
PageRank has been calculated. The PageRank figure for
the site's pages that haven't been indexed is allocated
on the fly - just for your toolbar. The PageRank itself
doesn't exist.
It's important to know this so that you can avoid
exchanging links with pages that really don't have any
PageRank of their own. Before making exchanges, search
for the page on Google to make sure that it is indexed.
Sub-directories
Some people believe that Google drops a page's PageRank
by a value of 1 for each sub-directory level below the
root directory. E.g. if the value of pages in the root
directory is generally around 4, then pages in the next
directory level down will be generally around 3, and so
on down the levels. Other people (including me) don't
accept that at all. Either way, because some spiders
tend to avoid deep sub-directories, it is generally
considered to be beneficial to keep directory structures
shallow (directories one or two levels below the root).
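If you want to check an existing site against that rule of thumb, counting the directory levels in each url is enough. This is a minimal sketch; the function name directoryDepth is mine, and it assumes that the last part of a path is a filename unless the path ends in "/".

```javascript
// Count how many directory levels below the root a url's file sits.
function directoryDepth(url) {
  const path = new URL(url).pathname;
  const segments = path.split("/").filter(Boolean);
  // The last segment is treated as the filename unless the path ends in "/".
  return path.endsWith("/")
    ? segments.length
    : Math.max(segments.length - 1, 0);
}

console.log(directoryDepth("http://www.domain.com/index.html"));
// 0
console.log(directoryDepth("http://www.domain.com/a/b/page.html"));
// 2
```

Anything that comes out above 2 is deeper than the one or two levels suggested above.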
ODP and Yahoo!
It used to be thought that Google gave a PageRank boost
to sites that are listed in the Yahoo! and ODP (a.k.a.
DMOZ) directories, but these days general opinion is
that they don't. There is certainly a PageRank gain for
sites that are listed in those directories, but the
reason for it is now thought to be this:-
Google spiders the directories just like any other site
and their pages have decent PageRank and so they are
good inbound links to have. In the case of the ODP,
Google's directory is a copy of the ODP directory. Each
time that sites are added to or dropped from the ODP, they
are added to or dropped from Google's directory when it is
next updated. The entry in Google's directory is yet
another good, PageRank boosting, inbound link. Also, the
ODP data is used for searches on a myriad of websites -
more inbound links!
Listings in the ODP are free but, because sites are
reviewed by hand, it can take quite a long time to get
in. The sooner a working site is submitted, the better.
For tips on submitting to DMOZ.