What is Robots.txt?
The robots exclusion protocol (REP), or robots.txt, is a text file webmasters create to instruct robots (typically search engine
robots) how to crawl and index pages on their website.
Cheat Sheet
Block all web crawlers from all content
User-agent: *
Disallow: /
Block a specific web crawler from a specific folder
User-agent: Googlebot
Disallow: /no-google/
Block a specific web crawler from a specific web page
User-agent: Googlebot
Disallow: /no-google/blocked-page.html
Allow a specific web crawler to visit a specific web page
User-agent: *
Disallow: /no-bots/block-all-bots-except-rogerbot-page.html
User-agent: rogerbot
Allow: /no-bots/block-all-bots-except-rogerbot-page.html
Sitemap Parameter
User-agent: *
Disallow:
Sitemap: http://www.example.com/none-standardlocation/sitemap.xml
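One way to sanity-check cheat sheet rules like these before deploying them is to feed them to a parser and ask which URLs a given crawler may fetch. The following is a minimal sketch using Python's standard-library urllib.robotparser. The helper name make_parser and the test page URLs are illustrative assumptions, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def make_parser(rules: str) -> RobotFileParser:
    """Build a parser from robots.txt text (hypothetical helper, not part of the library)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


# Block all web crawlers from all content.
block_all = make_parser("User-agent: *\nDisallow: /\n")

# Block a specific web crawler from a specific folder.
block_folder = make_parser("User-agent: Googlebot\nDisallow: /no-google/\n")

# Allow a specific web crawler to visit a page that is blocked for everyone else.
page = "/no-bots/block-all-bots-except-rogerbot-page.html"
allow_one = make_parser(
    f"User-agent: *\nDisallow: {page}\n"
    f"User-agent: rogerbot\nAllow: {page}\n"
)

base = "http://www.example.com"  # assumed host, matching the examples in the text
print(block_all.can_fetch("Googlebot", base + "/any-page.html"))
print(block_folder.can_fetch("Googlebot", base + "/no-google/page.html"))
print(block_folder.can_fetch("rogerbot", base + "/no-google/page.html"))
print(allow_one.can_fetch("rogerbot", base + page))
print(allow_one.can_fetch("Googlebot", base + page))
```

Under this parser, the first two checks and the last one report that fetching is not allowed, while the rogerbot checks report that it is. That matches the intent of each cheat sheet entry.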
Optimal Format
Robots.txt must be placed in the top-level directory of a web server in order to be useful. Example:
http://www.example.com/robots.txt
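Because the file lives at the top level of the server, its address can be derived from any page URL on the same host. The sketch below illustrates this with Python's urllib.parse. The function name robots_txt_url is a hypothetical helper, and the page URL is borrowed from the cheat sheet examples.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the top-level robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/no-google/blocked-page.html"))
# http://www.example.com/robots.txt
```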