How to Create a robots.txt File for Your Website

Learn how to create a perfect robots.txt file for your website with examples, best practices, and SEO tips to control search engine crawling.

Published on 06 March 2026
Reading time: 4 minutes
Word count: 864


What Is a robots.txt File?

  • A robots.txt file is a plain text file placed in the root of your website that gives instructions to web crawlers (like Googlebot) about what pages or sections of your site should or should not be crawled.

  • It follows the Robots Exclusion Standard (sometimes also called the Robots Exclusion Protocol).

  • It’s not a security measure. Crawlers may ignore it, and it doesn’t prevent pages from being indexed under certain conditions. If you need to keep something private or prevent indexing, use meta tags (noindex), password protection, or other methods.


Why Use a robots.txt File

Here are common reasons you might need one:

  • To prevent crawling of duplicate content (e.g. print views, staging sites, admin or backend pages) so you don’t waste crawl budget.

  • To focus crawling by telling bots to ignore parts of your site that aren’t useful for search (e.g. scripts, styles, or images served via a CDN that don’t need to be crawled).

  • To point to your sitemap(s) so crawlers can discover all your important content.


Where to Put the robots.txt File & Basic Rules

  1. Name and Location

    • The file must be named exactly robots.txt.

    • It must be placed at the root of the host (e.g. https://www.yoursite.com/robots.txt). If it’s in a subfolder, many crawlers will ignore it.

  2. Encoding and Format

    • Plain text, UTF-8 encoded.

    • Use simple ASCII characters; avoid fancy quotes or formatting that could break parsing.

  3. Validity Scope

    • The file affects only the host, protocol (HTTP or HTTPS), and port it’s placed on. For example, rules in https://www.example.com/robots.txt won’t apply to http://example.com/ or https://subdomain.example.com/.

  4. Size Limitation

    • Google limits robots.txt files to 500 KiB (512,000 bytes). Content beyond that limit is ignored.
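The scope and size rules above can be sketched in code. This is a minimal sketch using Python's standard library; `robots_url` and `within_google_limit` are illustrative helper names, not part of any standard API.

```python
from urllib.parse import urlsplit, urlunsplit

# Google's documented robots.txt size limit: 500 KiB (512,000 bytes).
GOOGLE_LIMIT_BYTES = 500 * 1024

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs a given page URL.

    Only the scheme, host, and port matter; path and query are ignored.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

def within_google_limit(content: bytes) -> bool:
    """True if the file is small enough that Google reads all of it."""
    return len(content) <= GOOGLE_LIMIT_BYTES

print(robots_url("https://www.example.com/blog/post?id=1"))
# → https://www.example.com/robots.txt

# A different host, protocol, or port has its own robots.txt:
print(robots_url("http://sub.example.com:8080/page"))
# → http://sub.example.com:8080/robots.txt
```

Note how the second call shows the validity-scope rule: the subdomain and port produce a different robots.txt URL, so rules from `www.example.com` never apply there.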


Syntax & Key Directives

Here are the main directives you’ll see / use in robots.txt:

  • User-agent: — specifies which crawler(s) the following rules apply to (e.g. Googlebot, or * for all). Example: User-agent: *

  • Disallow: — paths the matched crawler must not crawl. Example: Disallow: /private/

  • Allow: — paths the crawler may crawl, even inside an otherwise disallowed directory. Example: Allow: /private/public-info.html

  • Sitemap: — the location of your sitemap, which helps crawlers find pages faster. Example: Sitemap: https://www.example.com/sitemap.xml

  • You can use the wildcard * for matching all crawlers.

  • Comments start with # and are ignored. Useful to annotate your file.
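Python's standard library ships a parser for this syntax, which is handy for checking how a rule set behaves before you deploy it. A sketch; real crawlers may interpret edge cases differently (Google, for instance, uses longest-match precedence between Allow and Disallow):

```python
from urllib.robotparser import RobotFileParser

rules = """\
# Block the private area for every crawler
User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The wildcard group applies to any crawler name.
print(parser.can_fetch("SomeBot", "https://www.example.com/private/report.html"))  # False
print(parser.can_fetch("SomeBot", "https://www.example.com/blog/"))                # True
```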


Step-by-Step: How to Create & Deploy a robots.txt

Here’s a practical workflow:

  1. Open a plain text editor
    Use something simple like Notepad (Windows), TextEdit (Mac in plain-text mode), or any code editor (VS Code, Sublime, etc.).

  2. Write the rules
    Decide what you want crawlers to do. Here are some example scenarios:

    • Allow everything (default behavior):

      User-agent: *
      Disallow:

      Sitemap: https://www.yoursite.com/sitemap.xml

    • Disallow a private/admin area:

      User-agent: *
      Disallow: /admin/

    • Disallow all bots completely (use only if site is under development):

      User-agent: *
      Disallow: /

    • Different rules for different bots:

      User-agent: Googlebot
      Allow: /

      User-agent: *
      Disallow: /beta/

  3. Save the file
    Save it as robots.txt, as plain text with UTF-8 encoding.

  4. Upload to your server root
    Using FTP, SFTP, control panel, or your hosting provider’s file manager. The path should make the file reachable at https://yourdomain.com/robots.txt.

  5. Test your robots.txt

    • Visit https://yourdomain.com/robots.txt in browser to verify it appears.

    • Use the robots.txt report in Google Search Console (which replaced the older robots.txt Tester) to see whether your rules work as intended.

  6. Monitor and update as needed
    If you add new sections, URLs, or change your site structure, revisit your robots.txt rules. Use Search Console to detect crawl issues.
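Steps 1–4 can be scripted for a static site. A minimal sketch in Python; the `site/` output directory and the rule set are illustrative, and actual deployment (step 4) still depends on your hosting setup:

```python
from pathlib import Path

# Step 2: the rules you decided on (illustrative example).
RULES = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.yoursite.com/sitemap.xml
"""

site_root = Path("site")  # your local web root (illustrative name)
site_root.mkdir(exist_ok=True)

# Step 3: save as exactly "robots.txt", plain UTF-8 text.
robots_path = site_root / "robots.txt"
robots_path.write_text(RULES, encoding="utf-8")

# Quick sanity checks before uploading (step 4):
assert robots_path.name == "robots.txt"            # correct filename
assert robots_path.parent == site_root             # at the root, not a subfolder
assert len(robots_path.read_bytes()) < 500 * 1024  # under Google's size limit
```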


Best Practices & Common Mistakes

  • Only block what you really need to block — overblocking (e.g. disallowing CSS/JS) can prevent Google from rendering pages correctly.

  • Don’t use robots.txt to hide URLs from search results; use noindex meta tags instead — a page disallowed from crawling can still be indexed (without a content snippet).

  • Include your sitemap(s) — helps crawlers discover content more efficiently.

  • Keep the file small and simple — stays under size limits and avoids misconfigurations.

  • Use comments to document the purpose of rules — helps long-term maintenance.

Common Mistakes

  • Placing robots.txt in the wrong location (subfolder instead of root) → crawlers ignore it.

  • Naming the file incorrectly (e.g. robot.txt, robots.text) → won’t be recognized.

  • Blocking essential CSS or JS so pages can’t render properly. This can harm SEO.

  • Relying only on robots.txt to hide sensitive content → it’s public and not enforced by all bots.


Advanced Use Cases & Recent Updates

  • Google’s interpretation of the standard continues to evolve. As of 2025, Google supports the standard directive set (User-agent, Allow, Disallow, Sitemap) and ignores lines it does not recognize.

  • Caching: Google caches robots.txt for up to ~24 hours (but may be longer in some conditions).

  • Blocking AI or data-collection bots: There is growing attention on whether AI crawlers respect robots.txt the way search engine bots do. Some services (e.g. Cloudflare) are introducing content-signal policies that let site owners state whether their content may be used for AI training, but many crawlers may ignore such signals. This area is still evolving.
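If you decide to opt specific AI crawlers out, you can add per-bot groups using user-agent strings those crawlers have publicly documented (GPTBot for OpenAI, CCBot for Common Crawl, Google-Extended for Google's AI training). A sketch; remember that compliance is voluntary:

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```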


Example robots.txt Files

Here are several example files depending on different scenarios:

  1. All pages crawlable, sitemap included

    User-agent: *
    Disallow:

    Sitemap: https://www.example.com/sitemap.xml

  2. Block entire site (development mode) except for sitemap

    User-agent: *
    Disallow: /

    Sitemap: https://www.example.com/sitemap.xml

  3. Block admin areas, allow everything else

    User-agent: *
    Disallow: /admin/
    Disallow: /user-settings/
    # allow a specific AJAX script even though it is inside /admin/
    Allow: /admin/ajax-script.js

    Sitemap: https://www.example.com/sitemap.xml

  4. Different rules for specific bots

    User-agent: Googlebot
    Allow: /

    User-agent: Bingbot
    Disallow: /private-bing/

    User-agent: *
    Disallow: /not-for-any-bot/

    Sitemap: https://www.example.com/sitemap.xml
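You can sanity-check the per-bot example with Python's standard-library parser. A sketch; it follows the same grouping rules (a crawler uses its own group if one matches, otherwise the wildcard group), though real crawlers may differ on edge cases:

```python
from urllib.robotparser import RobotFileParser

example = """\
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Disallow: /private-bing/

User-agent: *
Disallow: /not-for-any-bot/
"""

rp = RobotFileParser()
rp.parse(example.splitlines())

# Googlebot matches its own group, so the wildcard rules don't apply to it.
print(rp.can_fetch("Googlebot", "https://www.example.com/not-for-any-bot/x"))  # True
# Bingbot is blocked only from its own disallowed path.
print(rp.can_fetch("Bingbot", "https://www.example.com/private-bing/x"))       # False
# Any other bot falls into the wildcard group.
print(rp.can_fetch("OtherBot", "https://www.example.com/not-for-any-bot/x"))   # False
```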


Summary

  • A robots.txt file is essential for guiding crawlers — telling them what to crawl and what to ignore.

  • Place it at the root, name it properly, use the correct syntax, and keep it small and clean.

  • Use other tools (meta tags, sitemaps, secure access) when needed.

  • Always test via Search Console and monitor for crawl issues.

  • Be aware of new developments (AI-related policies, etc.) and adjust if required.