Why did Google still index a URL I disallowed?

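Robots.txt blocks crawling, not indexing. If other sites link to a blocked URL, Google can still include it in the index (with just the URL and no snippet). To remove the URL from the index, add a meta robots noindex tag (or an X-Robots-Tag: noindex response header) and leave the URL crawlable, because Google has to fetch the page to see the directive.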

What's the difference from Google's Search Console tester?

Search Console tests against your LIVE robots.txt on Google's servers. This tool tests against any robots.txt you paste — useful for testing drafts, competitors, or historical versions you've exported.

Does this handle Crawl-delay?

No. The directive is parsed but has no effect on allow/block decisions. Google ignores Crawl-delay entirely; some other crawlers, such as Bingbot, honor it.

Why is my wildcard rule not matching?

Most common bug: forgetting that patterns anchor to the start of the path. 'Disallow: /pdf' matches /pdf and /pdfs but NOT /my/pdf. Use '/*pdf' to match anywhere in the path.

Technical SEO

Robots.txt Tester

Paste a robots.txt and any URL to test whether it's allowed or blocked — for any user-agent. Handles wildcards, end-of-URL markers, and most-specific-wins rule precedence just like Google.

How to use this tool: 3 quick steps

Get your robots.txt
Visit https://yoursite.com/robots.txt in a browser and copy the contents. Or paste the rules you are about to deploy.
Pick a URL to test
Enter the full URL you want to check for crawler access. Include the protocol (https://).
Read the verdict
We apply the User-agent + Allow/Disallow rules in priority order and tell you exactly which rule matched (or that no rule applied — meaning crawlable by default).

Input: robots.txt + URL to test. Paste robots.txt and the URL you want to check.

robots.txt contents

Paste the full robots.txt text. We parse User-agent groups, Allow/Disallow rules, and Sitemap directives.
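
As an illustration of the parsing step described above, here is a minimal sketch in Python. It is not this tool's actual implementation: the names `Group` and `parse_robots` are hypothetical, and the grouping rule (consecutive User-agent lines share one group, and a User-agent line that follows rules starts a new group) is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Group:
    """One User-agent group: the agents it names and its (directive, pattern) rules."""
    agents: List[str] = field(default_factory=list)
    rules: List[Tuple[str, str]] = field(default_factory=list)


def parse_robots(text: str) -> Tuple[List[Group], List[str]]:
    """Hypothetical parser: returns (groups, sitemap URLs)."""
    groups: List[Group] = []
    sitemaps: List[str] = []
    current: Optional[Group] = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        key, value = line.split(":", 1)
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            # Assumption: a User-agent line after rules opens a new group.
            if current is None or current.rules:
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
        elif key in ("allow", "disallow") and current is not None:
            current.rules.append((key, value))
        elif key == "sitemap":
            sitemaps.append(value)  # Sitemap lines belong to no group
        # Other directives (e.g. Crawl-delay) are ignored in this sketch.
    return groups, sitemaps
```

Directive names are lowercased because they are case-insensitive; the path patterns are stored untouched because they are not.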

Robots.txt Testing Guide

Catch crawl bugs before they kill your rankings

A broken robots.txt is one of the fastest ways to disappear from Google. One misplaced 'Disallow: /' line blocks crawling of every URL on the site, and rankings typically collapse once Google can no longer fetch the pages. Test changes before pushing to production: every senior SEO has a story about an intern shipping a robots.txt change at 5pm on a Friday.

Most-specific rule wins

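When multiple rules match, Google picks the one with the longest pattern. `Allow: /api/public/` beats `Disallow: /api/` because it's more specific. If an Allow and a Disallow pattern of equal length both match, the less restrictive rule (Allow) wins. This tool follows Google's precedence rules.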
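
The precedence rule can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's code: `verdict` is a hypothetical name, patterns are treated as literal prefixes (no wildcards), and ties between equal-length patterns go to Allow.

```python
def verdict(path, rules):
    """Return (decision, winning_pattern) for a path.

    rules: list of (directive, pattern) pairs, directive being "allow" or
    "disallow". Assumption: patterns are literal prefixes without * or $.
    """
    best = None  # (pattern length, is_allow, directive, pattern)
    for directive, pattern in rules:
        if not pattern or not path.startswith(pattern):
            continue  # empty patterns match nothing
        candidate = (len(pattern), directive == "allow", directive, pattern)
        # Longer pattern wins; at equal length, allow (True) beats disallow.
        if best is None or candidate[:2] > best[:2]:
            best = candidate
    if best is None:
        return "allow", None  # no rule applied: crawlable by default
    return best[2], best[3]


rules = [("disallow", "/api/"), ("allow", "/api/public/")]
print(verdict("/api/public/users", rules))  # ('allow', '/api/public/')
print(verdict("/api/private", rules))       # ('disallow', '/api/')
print(verdict("/blog", rules))              # ('allow', None)
```

The order of rules in the file does not matter here; only pattern length and directive type decide the winner.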

Wildcards (*)

Asterisks match any sequence of characters, including zero characters. 'Disallow: /*.pdf$' blocks every URL on the site that ends in .pdf. Use carefully; overly broad wildcards can block far more than you intend.

End-of-URL markers ($)

The dollar sign anchors to the end of the URL. 'Disallow: /*.pdf$' blocks /report.pdf but NOT /report.pdfa or /report.pdf?v=2. Essential for file-type rules.
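
One way to see how * and $ behave is to translate a robots pattern into a regular expression. The sketch below is an assumption about how such matching can be implemented, not a description of this tool's internals; `pattern_to_regex` and `matches` are hypothetical names, and the path passed in is expected to include the query string.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    # Without a trailing '$' the pattern is a prefix match.
    return re.compile(regex + (r"\Z" if anchored else ""))


def matches(pattern, path):
    # re.match anchors at the start of the path, like robots.txt patterns do.
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/*.pdf$", "/report.pdf"))      # True
print(matches("/*.pdf$", "/report.pdfa"))     # False
print(matches("/*.pdf$", "/report.pdf?v=2"))  # False
print(matches("/pdf", "/my/pdf"))             # False
print(matches("/*pdf", "/my/pdf"))            # True
```

The last two lines show the start-of-path anchoring: without a leading wildcard, '/pdf' only matches paths that begin with /pdf.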

User-agent matching

Google picks the most specific UA block that matches the crawler. 'User-agent: Googlebot' beats 'User-agent: *' for Googlebot requests. If no block matches, default is allow-all.
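
Group selection can be sketched as follows. This is a simplified illustration with a hypothetical `rules_for` helper: it assumes exact, case-insensitive matching on the crawler's product token and falls back to the * group, whereas real crawlers may also fall back to a more generic token of their own.

```python
def rules_for(user_agent, groups):
    """Pick the rules that apply to a crawler.

    groups: list of (agent_tokens, rules) pairs.
    Assumption: exact token match, case-insensitive; rules from several
    groups naming the same token are merged.
    """
    token = user_agent.lower()
    specific, wildcard = [], []
    found_specific = False
    for agents, rules in groups:
        names = [a.lower() for a in agents]
        if token in names:
            found_specific = True
            specific.extend(rules)
        elif "*" in names:
            wildcard.extend(rules)
    # A named group always beats '*'; an empty result means allow-all.
    return specific if found_specific else wildcard


groups = [
    (["*"], [("disallow", "/")]),
    (["Googlebot"], [("disallow", "/private/")]),
]
print(rules_for("googlebot", groups))  # [('disallow', '/private/')]
print(rules_for("Bingbot", groups))    # [('disallow', '/')]
```

Note that the Googlebot group replaces the * group rather than adding to it, so Googlebot is not subject to 'Disallow: /' in this example.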

Empty Disallow = allow-all

'Disallow:' with no value is an explicit allow-all (a legacy way to declare 'we have no restrictions for this UA'). 'Disallow: /' is the opposite — block everything.

Case sensitivity

Path patterns in robots rules are case-sensitive. 'Disallow: /Admin' does NOT block /admin. Mirror your actual URL casing. Wildcards do not relax case, so '/*admin' still misses /Admin; to cover both casings, add one rule per casing.

Pro Tips

Test before deploying

Any edit to robots.txt should be tested against your top 10 URLs before going live. This tool works without a network round-trip — safe to use pre-commit.

Watch for trailing slashes

'Disallow: /admin' blocks /admin and /admin/login, and also any other path that starts with /admin, such as /administrator. 'Disallow: /admin/' blocks only paths under /admin/ but allows /admin itself. Know which you want.
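
The trailing-slash difference is easy to check with plain prefix matching. The snippet below is a small sketch with a hypothetical `blocked_paths` helper and made-up sample paths; it ignores wildcards and Allow rules.

```python
def blocked_paths(pattern, paths):
    """Return the paths that a literal Disallow pattern would block."""
    return [p for p in paths if p.startswith(pattern)]


paths = ["/admin", "/admin/login", "/administrator"]
print(blocked_paths("/admin", paths))   # ['/admin', '/admin/login', '/administrator']
print(blocked_paths("/admin/", paths))  # ['/admin/login']
```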

Robots.txt is NOT a security tool

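Disallow asks compliant crawlers not to fetch URLs; it does NOT hide them from humans. The robots.txt file is itself publicly readable, so listing a sensitive path there advertises it. Sensitive paths should require authentication, not robots.txt blocking.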
