Robots

Web robots (also known as bots, crawlers, or spiders) are programs that traverse the Web automatically. Search engines such as Google and Bing use them to index web content, spammers use them to scan for email addresses, and they have many other uses.

We use the robots meta tag and the robots.txt file to tell search engines what to index and what not to index.

We can place a robots meta tag in the head section of a web page. For example, to keep a page out of the index and stop robots from following its links:
<html>
<head>
<title>…</title>
<meta name="robots" content="noindex,nofollow">
</head>

or, to allow both indexing and link following:

<html>
<head>
<title>…</title>
<meta name="robots" content="index,follow,all">
</head>
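
To check which directives a page actually carries, a small script can read the robots meta tag back out of the HTML. The following is a minimal sketch using Python's standard html.parser module; the class name RobotsMetaParser, the helper robots_directives, and the sample page are made up for illustration and are not part of any SEO tool.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute names arrive lowercased; the value of name= may not be.
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for directive in content.split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.append(directive)


def robots_directives(html):
    """Return the list of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Sample page, invented for this example.
page = """<html>
<head>
<title>Example</title>
<meta name="robots" content="noindex,nofollow">
</head>
</html>"""

print(robots_directives(page))
```

Note that the tag must use plain straight quotes (") around its values, as typed on a keyboard. Curly quotes pasted from a word processor are not valid HTML quoting and may cause the tag to be misread.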

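Remember, we can also ask robots to stay out of the whole site, a directory, or even a single web page using our robots.txt file. Well-behaved robots honour these rules, but the file is a request and does not actually block access.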

The robots.txt file is hosted in the root directory of your website (e.g. public_html).

This is how we structure our robots.txt file:

User-agent: *
Disallow: /members/
Disallow: /images/
Disallow: /downloads/file1.html

There is no need to tell robots which files to allow; anything that isn't disallowed is assumed to be allowed.

User-agent: * means the rules apply to all robots.
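
Before uploading a robots.txt file it can help to test how a robot would read it. Here is a minimal sketch using Python's standard urllib.robotparser module with the rules shown above; the domain www.example.com, the extra test paths, and the helper name is_allowed are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt example above.
rules = """User-agent: *
Disallow: /members/
Disallow: /images/
Disallow: /downloads/file1.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def is_allowed(path, agent="*"):
    """Return True if the given robot may fetch the path (hypothetical helper)."""
    # www.example.com is a placeholder domain.
    return parser.can_fetch(agent, "https://www.example.com" + path)


for path in ["/members/profile.html", "/downloads/file1.html",
             "/downloads/file2.html", "/about.html"]:
    print(path, "allowed" if is_allowed(path) else "disallowed")
```

With these rules, everything under /members/ and /images/ is disallowed along with the single file /downloads/file1.html, while /downloads/file2.html and /about.html stay allowed because no rule mentions them.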

To target a specific robot, use its user-agent name, e.g.
User-agent: Googlebot
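
Rules for a named robot can sit alongside the rules for all robots. The sketch below, again using Python's standard urllib.robotparser, shows how a robot-specific group is read. It assumes Googlebot (the name Google's main crawler identifies itself with) as the targeted robot; SomeOtherBot, the paths, and the www.example.com domain are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# One group for Googlebot only, one group for every other robot.
rules = """User-agent: Googlebot
Disallow: /downloads/

User-agent: *
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

downloads_url = "https://www.example.com/downloads/file1.html"
members_url = "https://www.example.com/members/profile.html"

# Googlebot follows its own group, so /downloads/ is off limits to it.
googlebot_downloads = parser.can_fetch("Googlebot", downloads_url)

# Any other robot falls back to the * group, which only mentions /members/.
other_downloads = parser.can_fetch("SomeOtherBot", downloads_url)
other_members = parser.can_fetch("SomeOtherBot", members_url)

print(googlebot_downloads, other_downloads, other_members)
```

One thing to watch for: in this parser, a robot that matches a named group uses only that group and ignores the * group, so any rule that should also apply to the named robot has to be repeated inside its own group.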
