How to Create a robots.txt File for Your Website

Published on 22 July 2026

What is `robots.txt`

A robots.txt file is a plain text file placed in the root of your website that tells web crawlers (like Googlebot) which pages or sections of your site they should or should not crawl.
It follows the Robots Exclusion Standard (also called the Robots Exclusion Protocol).
It’s not a security measure. Crawlers may ignore it, and blocking a URL from crawling does not guarantee it stays out of the index: a blocked page can still be indexed if other pages link to it. If you need to keep something private or prevent indexing, use a noindex meta tag, password protection, or other methods.
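
To make this concrete, here is a minimal sketch of how a well-behaved crawler might consult the file before fetching a page. It uses Python’s standard-library `urllib.robotparser`; the rules and URLs are illustrative assumptions, not taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml
"""

def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt content that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# A polite crawler asks before it fetches each URL.
blog_allowed = parser.can_fetch("Googlebot", "https://www.example.com/blog/post")
admin_allowed = parser.can_fetch("Googlebot", "https://www.example.com/admin/login")
sitemaps = parser.site_maps()  # available in Python 3.8+

print(blog_allowed, admin_allowed, sitemaps)
```

Nothing forces a crawler to make this check, which is why the file is guidance rather than access control.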

Here are common reasons you might need one:

- To prevent crawling of duplicate or low-value content (e.g. print views, staging sites, admin or backend pages) so you don’t waste crawl budget.
- To focus crawling on the pages that matter by telling bots to skip parts of your site that aren’t useful for search. Be careful with scripts, styles, and images: blocking files that pages need in order to render can hurt how Google sees those pages.
- To point to your sitemap(s) so crawlers can discover all your important content.

Name and Location
- The file must be named exactly robots.txt.
- It must be placed at the root of the host (e.g. https://www.yoursite.com/robots.txt). Crawlers only request the file from the root, so a copy in a subfolder is ignored.
Encoding and Format
- Plain text, UTF-8 encoded.
- Use simple ASCII characters; avoid fancy quotes or formatting that could break parsing.
Validity Scope
- The file affects only the host, protocol (HTTP or HTTPS), and port it’s placed on. For example, rules in https://www.example.com/robots.txt won’t apply to http://example.com/ or https://subdomain.example.com/.
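
As a rough illustration of this scope rule, the sketch below derives which robots.txt governs a given page URL by keeping the scheme, host, and port and replacing the rest. The helper name `robots_url_for` and the port-8443 URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (same scheme, host, port)."""
    parts = urlsplit(page_url)
    # netloc carries the host and any explicit port; path, query, fragment are dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

for url in (
    "https://www.example.com/blog/post?id=1",
    "http://example.com/",
    "https://subdomain.example.com/docs/",
    "https://www.example.com:8443/app/",
):
    print(url, "->", robots_url_for(url))
```

Each of the four URLs maps to a different robots.txt, so each origin needs its own file.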
Size Limitation
- Google limits robots.txt to 500 KiB (512,000 bytes). Content beyond that limit is ignored.
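
The format and size rules above lend themselves to a quick pre-upload check. The following is only a sketch: the function name `preflight` and the warning wording are assumptions, and it treats any non-ASCII character as suspicious, which is stricter than crawlers require.

```python
MAX_BYTES = 500 * 1024  # 500 KiB, the Google limit mentioned above

def preflight(robots_text: str) -> list[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []
    size = len(robots_text.encode("utf-8"))
    if size > MAX_BYTES:
        warnings.append(
            f"file is {size} bytes; content past {MAX_BYTES} bytes may be ignored"
        )
    for line_no, line in enumerate(robots_text.splitlines(), start=1):
        # Curly quotes and similar characters usually come from word processors.
        bad = sorted({ch for ch in line if ord(ch) > 127})
        if bad:
            warnings.append(f"line {line_no}: non-ASCII characters {bad}")
    return warnings

clean = preflight("User-agent: *\nDisallow: /admin/\n")
suspicious = preflight("User-agent: *\nDisallow: \u201c/admin/\u201d\n")
print(clean)
print(suspicious)
```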

Here are the main directives you’ll see / use in robots.txt:

| Directive | Purpose | Example syntax |
| --- | --- | --- |
| `User-agent:` | Specifies which crawler(s) the rules apply to, e.g. `Googlebot` or `*` (all crawlers). | `User-agent: *` |
| `Disallow:` | Paths the user-agent must not crawl. | `Disallow: /private/` |
| `Allow:` | Paths the user-agent can crawl, even if the parent directory is disallowed. | `Allow: /private/public-info.html` |
| `Sitemap:` | Where your sitemap is located (helps crawlers find pages faster). | `Sitemap: https://www.example.com/sitemap.xml` |
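
When `Allow` and `Disallow` both match a URL, Google documents that the most specific (longest) matching rule wins, with `Allow` winning ties. The sketch below illustrates that precedence for plain path prefixes only. The function `is_allowed` is a hypothetical helper, wildcards (`*`, `$`) are not handled, and other parsers may resolve conflicts differently (some simply apply the first matching rule).

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Resolve Allow/Disallow by longest matching prefix; ties favour Allow.

    rules holds (directive, path_prefix) pairs for a single user-agent group.
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty Disallow matches nothing
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed

# The two example rules from the table above.
rules = [("Disallow", "/private/"), ("Allow", "/private/public-info.html")]

for path in ("/private/secret.html", "/private/public-info.html", "/blog/"):
    print(path, is_allowed(path, rules))
```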

Here’s a practical workflow:

Open a plain text editor
Use something simple like Notepad (Windows), TextEdit (Mac in plain-text mode), or any code editor (VS Code, Sublime, etc.).
Write the rules
Decide what you want crawlers to do. Here are some example scenarios:
- Allow everything (default behavior):
  
      User-agent: *
      Disallow:
      Sitemap: https://www.yoursite.com/sitemap.xml
- Disallow a private/admin area:
  
      User-agent: *
      Disallow: /admin/
- Disallow all bots completely (use only if site is under development):
  
      User-agent: *
      Disallow: /
- Different rules for different bots:
  
      User-agent: Googlebot
      Allow: /

      User-agent: *
      Disallow: /beta/
Save the file
As robots.txt, ensure it’s plain-text, UTF-8 encoded.
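
If you prefer to generate the file rather than type it, the two steps above can be sketched in a few lines of Python. The helpers `render_robots` and `save_robots` are hypothetical names. The demo writes to a temporary directory so it leaves nothing behind; in practice you would pass your own output folder.

```python
import tempfile
from pathlib import Path

def render_robots(groups, sitemaps=()):
    """Render (user_agent, [(directive, path), ...]) groups as robots.txt text."""
    blocks = []
    for user_agent, rules in groups:
        lines = [f"User-agent: {user_agent}"]
        lines += [f"{directive}: {path}" for directive, path in rules]
        blocks.append("\n".join(lines))  # one directive per line
    if sitemaps:
        blocks.append("\n".join(f"Sitemap: {url}" for url in sitemaps))
    return "\n\n".join(blocks) + "\n"  # blank line between groups

def save_robots(text: str, directory: Path) -> Path:
    """Write text as UTF-8 plain text, named exactly robots.txt."""
    target = directory / "robots.txt"
    with open(target, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(text)
    return target

text = render_robots(
    groups=[("*", [("Disallow", "/admin/")])],
    sitemaps=["https://www.yoursite.com/sitemap.xml"],
)

with tempfile.TemporaryDirectory() as tmp:
    saved = save_robots(text, Path(tmp))
    saved_name = saved.name
    round_trip = saved.read_text(encoding="utf-8")

print(text)
```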
Upload to your server root
Using FTP, SFTP, control panel, or your hosting provider’s file manager. The path should make the file reachable at https://yourdomain.com/robots.txt.
Test your robots.txt
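- Visit https://yourdomain.com/robots.txt in a browser to verify it appears.
- Use the robots.txt report in Google Search Console (it replaced the older robots.txt Tester) to confirm Google can fetch and parse the file, and the URL Inspection tool to check whether a specific URL is blocked.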
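
You can also check your rules locally before uploading. The sketch below compares a robots.txt draft against a list of URLs you expect to be crawlable or blocked. The function `find_mismatches` and the sample expectations are assumptions for illustration, and Python’s `urllib.robotparser` applies the first matching rule, so results for overlapping `Allow`/`Disallow` rules can differ from Google’s.

```python
from urllib.robotparser import RobotFileParser

def find_mismatches(robots_text, expectations, user_agent="*"):
    """Return (url, expected, actual) for every expectation the rules violate."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    mismatches = []
    for url, expected in expectations.items():
        actual = parser.can_fetch(user_agent, url)
        if actual != expected:
            mismatches.append((url, expected, actual))
    return mismatches

draft = "User-agent: *\nDisallow: /admin/\n"

# True means "should be crawlable", False means "should be blocked".
expectations = {
    "https://yourdomain.com/": True,
    "https://yourdomain.com/admin/users": False,
    "https://yourdomain.com/beta/": False,  # the draft forgot to block this
}

mismatches = find_mismatches(draft, expectations)
print(mismatches)
```

To run the same check against the deployed file, `RobotFileParser` can download it with `set_url(...)` followed by `read()` instead of `parse(...)`.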
Monitor and update as needed
If you add new sections, URLs, or change your site structure, revisit your robots.txt rules. Use Search Console to detect crawl issues.

| Best Practice | Why It’s Important |
| --- | --- |
| Only block what you really need to block | Overblocking (e.g. disallowing CSS/JS) can prevent Google from rendering pages correctly. |
| Don’t use `robots.txt` to keep URLs out of search results; use a `noindex` meta tag instead | If a page is disallowed from crawling, Google might still index the URL (without a content snippet). A `noindex` tag only works if the page can be crawled, so don’t also block that page in `robots.txt`. |
| Include your sitemap(s) | Helps crawlers discover content more efficiently. |
| Keep the file small and simple | To stay under size limits and avoid misconfigurations. |
| Use comments to document the purpose of rules | Helps long-term maintenance. |

Common mistakes to avoid:

- Placing robots.txt in the wrong location (subfolder instead of root) → crawlers ignore it.
- Naming the file incorrectly (e.g. robot.txt, robots.text) → it won’t be recognized.
- Blocking essential CSS or JS so pages can’t render properly. This can harm SEO.
- Relying only on robots.txt to hide sensitive content → it’s public and not enforced by all bots.

Google’s interpretation of the standard is continually updated. As of 2025, Google supports the standard directive set (User-agent, Allow, Disallow, Sitemap) and ignores unsupported or invalid lines, such as Crawl-delay.
Caching: Google generally caches robots.txt for up to 24 hours, but may keep the cached copy longer when it cannot refresh the file (for example, because of timeouts or server errors).
Blocking AI or Data-Collection Bots: There is increasing attention on whether AI bots respect robots.txt the way search engine bots do. Some platforms and services are introducing content-signal policies (e.g., via Cloudflare) that let site owners state whether content can be used for AI training, etc. Many crawlers may ignore such signals, and this area is still evolving.

Here are several example files depending on different scenarios:

All pages crawlable, sitemap included

    User-agent: *
    Disallow:
    Sitemap: https://www.example.com/sitemap.xml

Block the entire site (development mode) while still listing the sitemap

    User-agent: *
    Disallow: /
    Sitemap: https://www.example.com/sitemap.xml

Block admin areas, allow everything else

    User-agent: *
    Disallow: /admin/
    Disallow: /user-settings/
    # allow specific AJAX script even if inside admin
    Allow: /admin/ajax-script.js
    Sitemap: https://www.example.com/sitemap.xml

Different rules for specific bots

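    # A bot follows only its most specific group, so the shared rule is repeated in each one
    User-agent: Googlebot
    Allow: /
    Disallow: /not-for-any-bot/

    User-agent: Bingbot
    Disallow: /private-bing/
    Disallow: /not-for-any-bot/

    User-agent: *
    Disallow: /not-for-any-bot/

    Sitemap: https://www.example.com/sitemap.xml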
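
One subtlety with per-bot files is that a crawler obeys only the group that matches it most specifically; rules in the `*` group are not merged in. The sketch below illustrates this with Python’s `urllib.robotparser`. The file content and the bot name `SomeOtherBot` are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: Googlebot has its own group, everyone else falls back to "*".
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /beta/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot matches its own group, so the "*" rule for /beta/ does not apply to it.
googlebot_beta = parser.can_fetch("Googlebot", "https://www.example.com/beta/page")

# A bot without its own group falls back to "*", where /beta/ is disallowed.
other_beta = parser.can_fetch("SomeOtherBot", "https://www.example.com/beta/page")

print(googlebot_beta, other_beta)
```

If a rule should apply to every crawler, it has to appear in each named group as well as in the `*` group.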

A robots.txt file is essential for guiding crawlers: it tells them what to crawl and what to ignore.
Place it at the root, name it properly, use the correct syntax, and keep it small and clean.
Use other tools (meta tags, sitemaps, secure access) when needed.
Always test via Search Console and monitor for crawl issues.
Be aware of new developments (AI-related policies, etc.) and adjust if required.