Robots.txt Check
The robots.txt check validates whether your website has a properly formatted robots.txt file that tells search engine crawlers which parts of your site they may crawl. Note that robots.txt controls crawling, not indexing: a blocked page can still appear in search results if other sites link to it.
Imagine your website is a building and search engines are visitors: robots.txt is the sign at the entrance telling them which rooms they may explore and which are off-limits.
What this check validates
- ✅ File exists - robots.txt file is present at /robots.txt
- ✅ Proper format - Follows standard robots.txt syntax
- ✅ Valid directives - Uses correct User-agent and Disallow rules
- ✅ Accessibility - File is publicly accessible and not blocked
- ✅ Sitemap reference - Includes sitemap location if available
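A check like this can be sketched in a few lines of Python. The function below is illustrative only, not the actual implementation behind this check, and the directive list is a common subset rather than an exhaustive one:

```python
# Hypothetical robots.txt validator sketch (names are illustrative).
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def validate_robots_txt(text):
    """Return basic findings for a robots.txt body."""
    findings = {"valid_syntax": True, "has_user_agent": False, "has_sitemap": False}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # blank lines are allowed
        if ":" not in line:
            findings["valid_syntax"] = False  # every directive is "name: value"
            continue
        directive = line.split(":", 1)[0].strip().lower()
        if directive not in KNOWN_DIRECTIVES:
            findings["valid_syntax"] = False
        elif directive == "user-agent":
            findings["has_user_agent"] = True
        elif directive == "sitemap":
            findings["has_sitemap"] = True
    return findings
```

A real check would also fetch the file over HTTP to confirm it is publicly accessible at /robots.txt; this sketch only covers the syntax portion.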
Why Robots.txt matters
- Crawl Control: Guides search engines on which pages to crawl
- Resource Management: Prevents crawling of unnecessary pages
- SEO Optimization: Conserves crawl budget so important content is crawled more reliably
- Privacy Protection: Asks compliant crawlers to stay out of sensitive directories (advisory only — robots.txt is public and is not a security control)
What Robots.txt looks like
Basic robots.txt file structure:

```
# Apply these rules to all crawlers; block only the admin areas
User-agent: *
Disallow: /admin/
Disallow: /private/

# Reference to the sitemap (may appear anywhere in the file)
Sitemap: https://yoursite.com/sitemap.xml
```

An empty `Disallow:` directive means "allow everything", so listing it in the same group as specific `Disallow` rules is redundant; the example above omits it.
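To see how a compliant crawler interprets rules like these, you can use Python's standard-library `urllib.robotparser` (the domain and paths below are placeholders):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Public pages are crawlable; blocked directories are not.
print(parser.can_fetch("*", "https://yoursite.com/blog/post"))    # True
print(parser.can_fetch("*", "https://yoursite.com/admin/users"))  # False
```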
Common Rules
```
# Allow all bots
User-agent: *
Disallow:
```

```
# Block all bots
User-agent: *
Disallow: /
```

```
# Block specific directories
User-agent: *
Disallow: /admin/
Disallow: /tmp/
Disallow: /private/
```
Common issues
- Missing File: No robots.txt file found at the root domain
- Syntax Errors: Incorrect formatting or invalid directives
- Over-blocking: Accidentally blocking important content
- Wrong Location: File not placed at domain root (/robots.txt)
- Missing Sitemap: No sitemap reference included
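Over-blocking in particular is easy to catch automatically: test a list of URLs you care about against the parsed rules. A hypothetical helper using the standard-library parser (the function name and URLs are illustrative):

```python
from urllib.robotparser import RobotFileParser

def find_blocked(rules_text, important_urls, agent="*"):
    """Return the subset of important_urls that the rules block for agent."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return [url for url in important_urls if not parser.can_fetch(agent, url)]

# "Disallow: /" shuts out all compliant crawlers from every page.
rules = "User-agent: *\nDisallow: /"
blocked = find_blocked(rules, ["https://yoursite.com/",
                               "https://yoursite.com/products/"])
print(blocked)  # both URLs are blocked
```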