LLMs.txt Check

The LLMs.txt check validates whether your website has a properly configured llms.txt file that controls how AI training systems can use your content.

What this check validates

  • File exists: An llms.txt file is present at /llms.txt
  • Proper format: The file follows standard llms.txt syntax
  • Valid directives: Rules use correct User-agent and Disallow fields
  • Accessibility: The file is publicly accessible and not blocked by the server
  • Clear permissions: The file explicitly states which AI training uses are allowed or disallowed
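The checks above can be sketched as a small validation pass. This is a minimal illustration only; the function name and the exact rule set are assumptions, not this check's actual implementation:

```python
# Sketch of an llms.txt validation pass (illustrative; the rules
# enforced here are assumptions, not this check's exact logic).

VALID_FIELDS = {"user-agent", "allow", "disallow"}

def validate_llms_txt(status_code, body):
    """Return a list of problems found; an empty list means the file passes."""
    if status_code != 200:
        return ["file missing or not accessible at /llms.txt"]
    problems = []
    saw_agent = saw_rule = False
    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        field, sep, value = line.partition(":")
        if not sep:
            problems.append(f"syntax error: {line!r}")
            continue
        field = field.strip().lower()
        if field not in VALID_FIELDS:
            problems.append(f"invalid directive: {field!r}")
        elif field == "user-agent":
            saw_agent = True
        else:
            saw_rule = True
    if not saw_agent:
        problems.append("no User-agent directive")
    if not saw_rule:
        problems.append("no Allow/Disallow rules; permissions unclear")
    return problems
```

A file that returns HTTP 200 and contains at least one User-agent line plus one Allow or Disallow rule passes; anything else produces a finding for each issue.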

Why LLMs.txt matters

  • Content Control: Specify which content can be used for AI training
  • Copyright Protection: Protect proprietary or sensitive content
  • Licensing Compliance: Ensure AI systems respect your content usage terms
  • Future-Proofing: Prepare for evolving AI training regulations

What LLMs.txt looks like

Three basic llms.txt configurations:

# Allow all AI systems to train on content
User-agent: *
Allow: /

# Block all AI training
User-agent: *
Disallow: /

# Allow specific content only
User-agent: *
Allow: /blog/
Allow: /docs/
Disallow: /

Common configurations

# Allow training on public content only
User-agent: *
Allow: /blog/
Allow: /docs/
Allow: /help/
Disallow: /

# Block specific AI systems
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

# Allow with restrictions
User-agent: *
Allow: /
Disallow: /private/
Disallow: /admin/

Best practices

  • Be specific: Clearly define what content is available for training
  • Regular updates: Review and update permissions as your content evolves
  • Legal alignment: Ensure directives align with your terms of service
  • Documentation: Keep internal records of your AI training policies

Common issues

  • Missing file: No llms.txt file found at the root domain
  • Syntax errors: Incorrect formatting or invalid directives
  • Conflicting rules: Contradictory Allow/Disallow statements
  • Wrong location: File not placed at domain root (/llms.txt)
  • Unclear permissions: Vague or overly broad directives
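The conflicting-rules issue above can be caught with a simple lint that flags any prefix listed under both Allow and Disallow for the same user agent. This helper is hypothetical, shown only to make the issue concrete:

```python
def find_conflicts(rules):
    """rules: list of ("allow" | "disallow", prefix) pairs for one user agent.
    Return prefixes that appear with contradictory directives
    (a hypothetical lint, not this check's actual logic)."""
    seen = {}
    conflicts = []
    for kind, prefix in rules:
        prior = seen.setdefault(prefix, kind)
        if prior != kind and prefix not in conflicts:
            conflicts.append(prefix)
    return conflicts
```

For example, a file containing both `Allow: /blog/` and `Disallow: /blog/` would be flagged, while `Allow: /` alongside `Disallow: /private/` is a legitimate exception, not a conflict.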