Glossary

What Is robots.txt?

robots.txt is a text file placed at the root of a website (example.com/robots.txt) that tells search engine crawlers which pages they are allowed to crawl. It's a convention, not an enforcement mechanism: well-behaved crawlers respect it, but nothing forces them to.

Why It Matters

robots.txt controls how search engines discover and index your content. For most forwarded domains, robots.txt is irrelevant — the domain responds with a 301 redirect before any crawling happens. But understanding robots.txt helps when planning SEO strategies around domain forwarding.
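The point above can be sketched as a tiny check. This is a minimal illustration, not a product API: the helper name and the decision logic are assumptions encoding the claim that a fully forwarded domain answers with a redirect before its own robots.txt ever matters.

```python
# Sketch (illustrative, not a real API): a fully forwarded domain answers
# every request with a redirect, so its own robots.txt is never the one
# crawlers obey; the destination's robots.txt governs instead.
REDIRECT_CODES = {301, 302, 307, 308}

def own_robots_txt_applies(status_code: int) -> bool:
    """True if the domain serves content itself (its robots.txt matters);
    False if it redirects, deferring to the destination's robots.txt."""
    return status_code not in REDIRECT_CODES

print(own_robots_txt_applies(301))  # forwarded domain -> False
print(own_robots_txt_applies(200))  # active site -> True
```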

robots.txt Basics

# Allow everything
User-agent: *
Allow: /

# Block everything
User-agent: *
Disallow: /

# Block specific path
User-agent: *
Disallow: /private/

# Reference sitemap
Sitemap: https://example.com/sitemap.xml

For pages you want crawled but not indexed, use a noindex directive (a robots meta tag or X-Robots-Tag header) rather than a robots.txt Disallow: a crawler can only see the noindex if it is allowed to fetch the page. Note that excessive redirects or blocked pages waste crawl budget, since crawlers allocate limited resources to each site.
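The "block specific path" rules above can be exercised with Python's standard-library robots.txt parser; the example.com URLs are the same placeholders used in the snippet.

```python
from urllib.robotparser import RobotFileParser

# Parse the "block specific path" rules from the snippet above.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# can_fetch(useragent, url) answers the same question a crawler asks.
print(rp.can_fetch("*", "https://example.com/private/page"))  # False: disallowed
print(rp.can_fetch("*", "https://example.com/public/page"))   # True: allowed
```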

robots.txt vs Domain Forwarding

Scenario | robots.txt Needed? | Why
Domain fully forwarded (301) | No | Crawlers follow the redirect
Active website | Yes | Controls crawler behavior
URL masking (iframe) | Maybe | The masked page is served, not redirected
Partial forwarding (some paths) | Maybe | Non-forwarded paths may need crawler control

For Active Sites (Not Forwarded)

If you’re managing the destination site (where traffic is forwarded to), ensure its robots.txt is properly configured:

  • Allow crawlers to access important pages
  • Reference your XML sitemap
  • Don’t accidentally block redirected URLs from being indexed
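The checklist above can be spot-checked with the same standard-library parser. The page paths and sitemap URL below are placeholders, and `site_maps()` requires Python 3.8+.

```python
from urllib.robotparser import RobotFileParser

# A destination site's robots.txt, shaped as the checklist recommends:
# important pages allowed, one private path blocked, sitemap referenced.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
    "Sitemap: https://example.com/sitemap.xml",
])

# Important (placeholder) pages should be crawlable...
for path in ("/", "/products/", "/blog/post"):
    assert rp.can_fetch("*", "https://example.com" + path), path

# ...and the XML sitemap should be referenced (Python 3.8+).
print(rp.site_maps())
```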

Frequently Asked Questions

Does a forwarded domain need its own robots.txt?

Not typically. Since the domain redirects all traffic (including crawlers), search engines follow the redirect to your destination site, where the destination's robots.txt applies.

Still Confused? Try It Free.

Set up your first domain forward in under 5 minutes. Free plan includes 5 domains.