LLMs.txt Check
The LLMs.txt check validates whether your website serves a properly configured llms.txt file at its root, telling AI training systems how they may use your content.
What this check validates
- ✅ File exists - llms.txt file is present at /llms.txt
- ✅ Proper format - Follows standard llms.txt syntax
- ✅ Valid directives - Uses correct User-agent and Disallow rules
- ✅ Accessibility - File is publicly accessible and not blocked
- ✅ Clear permissions - States explicitly which content is allowed or disallowed for AI training
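The format and directive checks above can be approximated offline with a small validator. This is a minimal sketch that assumes a robots.txt-style `Directive: value` syntax with `#` comments; the function name `validate_llms_txt` is illustrative, not part of any standard tooling:

```python
# Directives the check recognizes; anything else is flagged as invalid.
VALID_DIRECTIVES = {"user-agent", "allow", "disallow"}

def validate_llms_txt(text: str) -> list[str]:
    """Return a list of problems found in an llms.txt body (empty list = OK)."""
    problems = []
    current_agent = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {lineno}: not a 'Directive: value' pair")
            continue
        directive, value = (part.strip() for part in line.split(":", 1))
        key = directive.lower()
        if key not in VALID_DIRECTIVES:
            problems.append(f"line {lineno}: unknown directive {directive!r}")
        elif key == "user-agent":
            current_agent = value
        elif current_agent is None:
            problems.append(f"line {lineno}: rule appears before any User-agent")
    return problems
```

A file that passes returns an empty list; each syntax problem is reported with its line number, which maps directly onto the "Syntax errors" issue described later.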
Why LLMs.txt matters
- Content Control: Specify which content can be used for AI training
- Copyright Protection: Protect proprietary or sensitive content
- Licensing Compliance: Ensure AI systems respect your content usage terms
- Future-Proofing: Prepare for evolving AI training regulations
What LLMs.txt looks like
Basic llms.txt file structure, shown as three alternative configurations:

```
# Allow all AI systems to train on content
User-agent: *
Allow: /
```

```
# Block all AI training
User-agent: *
Disallow: /
```

```
# Allow specific content only
User-agent: *
Allow: /blog/
Allow: /docs/
Disallow: /
```
Common configurations
```
# Allow training on public content only
User-agent: *
Allow: /blog/
Allow: /docs/
Allow: /help/
Disallow: /
```

```
# Block specific AI systems
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
```

```
# Allow with restrictions
User-agent: *
Allow: /
Disallow: /private/
Disallow: /admin/
```
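Assuming robots.txt-style longest-prefix-match semantics (the llms.txt format does not pin this down, so treat it as an assumption), the "public content only" configuration above could be evaluated like this:

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Decide whether `path` may be used for training under (directive, prefix) rules.

    Assumed semantics, borrowed from robots.txt: the longest matching prefix
    wins; on a tie, Allow wins; with no matching rule, the path is allowed.
    """
    best = ("", True)  # (matched prefix, allowed?)
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        allow = directive.lower() == "allow"
        if len(prefix) > len(best[0]) or (len(prefix) == len(best[0]) and allow):
            best = (prefix, allow)
    return best[1]

# Rules from the "allow training on public content only" example above.
rules = [("Allow", "/blog/"), ("Allow", "/docs/"), ("Disallow", "/")]
```

Under these rules, `/blog/post` matches the longer `Allow: /blog/` prefix and is allowed, while `/private/x` falls through to `Disallow: /` and is blocked.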
Best practices
- Be specific: Clearly define what content is available for training
- Regular updates: Review and update permissions as your content evolves
- Legal alignment: Ensure directives align with your terms of service
- Documentation: Keep internal records of your AI training policies
Common issues
- Missing file: No llms.txt file found at the root domain
- Syntax errors: Incorrect formatting or invalid directives
- Conflicting rules: Contradictory Allow/Disallow statements
- Wrong location: File not placed at domain root (/llms.txt)
- Unclear permissions: Vague or overly broad directives
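The "conflicting rules" issue can be surfaced mechanically by looking for path prefixes that carry both an Allow and a Disallow. A minimal sketch, where the helper name `find_conflicts` is illustrative:

```python
def find_conflicts(rules: list[tuple[str, str]]) -> list[str]:
    """Return path prefixes that appear under both Allow and Disallow."""
    seen: dict[str, str] = {}
    conflicts: list[str] = []
    for directive, prefix in rules:
        d = directive.lower()
        if prefix in seen and seen[prefix] != d and prefix not in conflicts:
            conflicts.append(prefix)
        seen[prefix] = d
    return conflicts
```

Note that an `Allow: /blog/` alongside `Disallow: /` is not a conflict but an intentional carve-out; only identical prefixes with opposing directives are flagged.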