Robots.txt Check

The robots.txt check validates whether your website has a proper robots.txt file that tells search engine crawlers which parts of your site they may crawl. Note that robots.txt controls crawling, not indexing; a blocked page can still appear in search results if other sites link to it.

Imagine your website is a building and search engines are visitors: robots.txt is like a sign at the entrance telling them which rooms they can explore and which are off-limits.

What this check validates

  • File exists - robots.txt file is present at /robots.txt
  • Proper format - Follows standard robots.txt syntax
  • Valid directives - Uses correct User-agent and Disallow rules
  • Accessibility - File is publicly accessible and not blocked
  • Sitemap reference - Includes sitemap location if available
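The checks above can be sketched as a small validator, assuming the robots.txt body has already been fetched as a string (retrieving it from /robots.txt is a separate HTTP request). The function name, the report shape, and the set of recognized directives here are illustrative, not part of any tool's API:

```python
# Directives from the common robots.txt vocabulary (illustrative subset).
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def validate_robots_txt(body: str) -> dict:
    """Return a small report: valid directive count, unknown lines, sitemap refs."""
    report = {"valid_lines": 0, "unknown_lines": [], "sitemaps": []}
    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, _, value = line.partition(":")
        if name.strip().lower() in KNOWN_DIRECTIVES:
            report["valid_lines"] += 1
            if name.strip().lower() == "sitemap":
                report["sitemaps"].append(value.strip())
        else:
            report["unknown_lines"].append(raw)
    return report

print(validate_robots_txt("User-agent: *\nDisallow: /admin/"))
```

A real checker would also verify the file is served from the domain root with an HTTP 200 status; this sketch only covers the syntax-level checks.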

Why robots.txt matters

  • Crawl Control: Guides search engines on which pages to crawl
  • Resource Management: Stops crawlers from wasting requests on unnecessary pages
  • SEO Optimization: Keeps crawl budget focused on your important content
  • Privacy Protection: Asks well-behaved crawlers to stay out of sensitive directories (it is advisory, not a security control)

What robots.txt looks like

Basic robots.txt file structure:

# Allow all crawlers, but keep them out of admin areas
User-agent: *
Disallow: /admin/
Disallow: /private/

# Reference to the sitemap
Sitemap: https://yoursite.com/sitemap.xml
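You can check how crawlers interpret a file like this with Python's standard-library parser. Feeding it the lines directly avoids any network request; the `yoursite.com` URLs are placeholders:

```python
from urllib.robotparser import RobotFileParser

# The same rules as the example above, minus the Sitemap line
# (RobotFileParser evaluates User-agent/Disallow/Allow rules).
rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("*", "https://yoursite.com/blog/post"))    # True
print(parser.can_fetch("*", "https://yoursite.com/admin/login"))  # False
```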

Common rules

# Allow all bots
User-agent: *
Disallow:

# Block all bots
User-agent: *
Disallow: /

# Block specific directories
User-agent: *
Disallow: /admin/
Disallow: /tmp/
Disallow: /private/
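The three rule sets above can be verified with the same stdlib parser. The key distinction is between `Disallow:` with an empty value (allows everything) and `Disallow: /` (blocks everything). The helper function and URLs here are illustrative:

```python
from urllib.robotparser import RobotFileParser

def allows(lines, url, agent="*"):
    """Parse a robots.txt given as a list of lines and test one URL."""
    p = RobotFileParser()
    p.parse(lines)
    return p.can_fetch(agent, url)

# Allow all bots: empty Disallow value means no restriction.
print(allows(["User-agent: *", "Disallow:"], "https://yoursite.com/page"))      # True

# Block all bots: "/" matches every path.
print(allows(["User-agent: *", "Disallow: /"], "https://yoursite.com/page"))    # False

# Block a specific directory: only paths under /tmp/ are affected.
print(allows(["User-agent: *", "Disallow: /tmp/"], "https://yoursite.com/tmp/x"))  # False
```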

Common issues

  • Missing File: No robots.txt file found at the root domain
  • Syntax Errors: Incorrect formatting or invalid directives
  • Over-blocking: Accidentally blocking important content
  • Wrong Location: File not placed at domain root (/robots.txt)
  • Missing Sitemap: No sitemap reference included
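The over-blocking issue in particular is easy to catch automatically: feed the parser a list of URLs that must stay crawlable and flag any the rules would block. This is a sketch; the function name and the URL list are illustrative, not part of any standard:

```python
from urllib.robotparser import RobotFileParser

def find_overblocked(lines, important_urls, agent="*"):
    """Return the subset of important_urls that these rules would block."""
    p = RobotFileParser()
    p.parse(lines)
    return [url for url in important_urls if not p.can_fetch(agent, url)]

# An accidental "block everything" rule, a classic over-blocking mistake.
rules = ["User-agent: *", "Disallow: /"]
must_crawl = ["https://yoursite.com/", "https://yoursite.com/products/"]

print(find_overblocked(rules, must_crawl))  # both URLs are blocked
```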