Crawlers & indexing · 9 tools
Crawlers & indexing tools
The plumbing that decides whether Google crawls, indexes, and consolidates your pages correctly. These 9 tools generate, test, and validate: robots.txt (per RFC 9309), XML sitemaps (per sitemaps.org), canonical tags, noindex directives, hreflang matrices, and redirect rule syntax for Apache/Nginx/Next.js/Netlify.
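For reference, here is a minimal Python sketch of the sitemap format these tools target (the sitemaps.org 0.9 protocol). The function name `build_sitemap` and its input shape (absolute URL plus optional last-modified date) are assumptions for illustration. This is not how our Sitemap Generator is implemented. It enforces only the 50,000-URL-per-file limit and leaves out the 50 MB size check and sitemap-index splitting.

```python
from datetime import date
from typing import Optional
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # sitemaps.org limit (files are also capped at 50 MB uncompressed)


def build_sitemap(entries: list[tuple[str, Optional[date]]]) -> str:
    """Return sitemap.xml text for (absolute URL, optional last-modified date) pairs."""
    if len(entries) > MAX_URLS_PER_FILE:
        raise ValueError("too many URLs: split into several sitemaps plus a sitemap index")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes &, < and > in text, as the protocol requires for <loc>
        ET.SubElement(url, "loc").text = loc
        if lastmod is not None:
            # date.isoformat() gives YYYY-MM-DD, a valid W3C Datetime value
            ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")
```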
Robots.txt Generator
Build crawler rules visually
Robots.txt Tester
Is this URL allowed?
Sitemap Generator
Build valid sitemap.xml
XML Sitemap Parser
Analyze sitemap structure & issues
Sitemap Validator
Validate sitemap.xml
Canonical Tag
rel=canonical link tag
Noindex Checker
Find all indexability directives
hreflang Generator
Multi-region language tags
Redirect Rule Generator
.htaccess, Nginx, Next.js & more
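The following hedged Python sketch shows what "redirect rule syntax" looks like across platforms. It emits a single permanent redirect from one path to another in four formats. The helper `redirect_rules` is hypothetical and covers only exact, one-to-one path redirects. The Redirect Rule Generator's actual output may differ.

```python
def redirect_rules(source: str, destination: str) -> dict[str, str]:
    """Emit one permanent redirect from `source` to `destination` in four syntaxes."""
    return {
        # Apache mod_alias (.htaccess or vhost). Redirect matches by path prefix, so
        # /old-page/extra is also sent to /new-page/extra.
        "apache": f"Redirect 301 {source} {destination}",
        # Nginx: exact-match location inside a server block
        "nginx": f"location = {source} {{\n    return 301 {destination};\n}}",
        # Netlify _redirects file: from, to, status
        "netlify": f"{source} {destination} 301",
        # Next.js: one object in the array returned by redirects() in next.config.js;
        # permanent: true makes Next.js respond with 308 rather than 301
        "nextjs": f"{{ source: '{source}', destination: '{destination}', permanent: true }}",
    }


for platform, rule in redirect_rules("/old-page", "/new-page").items():
    print(f"# {platform}\n{rule}\n")
```

Each output is a fragment that must be placed inside the right context, such as a `server` block for Nginx or the `redirects()` array for Next.js.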
About these tools
Crawlers & indexing questions
- What is RFC 9309 and why does it matter for robots.txt?
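- RFC 9309 (published September 2022) formalized the Robots Exclusion Protocol as an official internet standard after 28 years as a de facto convention. Key requirements it codifies: (1) the file lives at /robots.txt in the top-level path, UTF-8 encoded and served as text/plain; (2) crawlers must parse at least the first 500 KiB, so rules past that point may be ignored; (3) the most specific (longest) matching rule wins, and an Allow beats an equally specific Disallow; (4) User-agent matching is case-insensitive, while path matching is case-sensitive. Our Robots.txt Tester and Generator both honor RFC 9309 strictly.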
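The precedence rules in points (3) and (4) can be sketched in Python. This is a minimal illustration, not our tester's code. It assumes robots.txt has already been parsed into groups keyed by user-agent token and that paths arrive already normalized (percent-encoding is not handled). It also assumes each agent has a single group, whereas RFC 9309 merges multiple matching groups. The names `select_group` and `is_allowed` are hypothetical.

```python
import re


def _pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    # '*' matches any run of characters; a trailing '$' anchors the end of the path
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(part) for part in core.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def select_group(groups: dict[str, list[tuple[str, str]]], product_token: str) -> list[tuple[str, str]]:
    """Pick a crawler's rules: case-insensitive user-agent match, else the '*' group."""
    wanted = product_token.lower()
    for agent, rules in groups.items():
        if agent.lower() == wanted:
            return rules
    return groups.get("*", [])


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching pattern (in octets) wins; an equally long Allow beats Disallow."""
    best_len, best_allow = -1, True  # no matching rule at all means the path is allowed
    for directive, pattern in rules:
        if not pattern:  # an empty "Disallow:" value matches nothing
            continue
        if _pattern_to_regex(pattern).match(path):  # match() anchors at the path start
            length = len(pattern.encode("utf-8"))
            allow = directive.lower() == "allow"
            if length > best_len or (length == best_len and allow):
                best_len, best_allow = length, allow
    return best_allow


groups = {"*": [("Disallow", "/private/"), ("Allow", "/private/press/"), ("Disallow", "/*.pdf$")]}
rules = select_group(groups, "Googlebot")  # no Googlebot group, so "*" applies
print(is_allowed(rules, "/private/press/kit.html"))  # True: the longer Allow wins
```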
- Should I use robots.txt or noindex to block a page?
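- Noindex (meta robots tag or X-Robots-Tag header), always, if the goal is 'not in the index.' Robots.txt only blocks crawling, not indexing: if the page has inbound links, Google can still index the bare URL from anchor text alone, without its content. Don't combine the two on one URL: a page blocked in robots.txt is never fetched, so Google never sees its noindex. Use robots.txt to reduce crawl budget spent on low-value URLs and to keep crawlers off sensitive endpoints, remembering that the file is publicly readable and is not access control. Use noindex for filter pages, staging, and low-quality content.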
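A noindex check comes down to collecting robots directives from two places: `<meta name="robots">` (or `googlebot`) tags and X-Robots-Tag response headers. The Python sketch below does this with the standard library's `HTMLParser`. The function `check_indexability` and its `crawl_allowed` flag are hypothetical. The sketch does not interpret user-agent-prefixed header values such as `googlebot: noindex`.

```python
from html.parser import HTMLParser

ROBOTS_META_NAMES = ("robots", "googlebot")


def _split(value: str) -> list[str]:
    # Directives are comma-separated and case-insensitive
    return [d.strip().lower() for d in value.split(",") if d.strip()]


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":  # HTMLParser lowercases tag and attribute names
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ROBOTS_META_NAMES:
            self.directives += _split(attributes.get("content", ""))


def check_indexability(headers: list[tuple[str, str]], html: str, crawl_allowed: bool = True) -> dict:
    parser = RobotsMetaParser()
    parser.feed(html)
    directives = parser.directives + [
        d for name, value in headers if name.lower() == "x-robots-tag" for d in _split(value)
    ]
    noindex = "noindex" in directives or "none" in directives  # "none" = noindex, nofollow
    warning = None
    if noindex and not crawl_allowed:
        # A URL blocked by robots.txt is never fetched, so this noindex is never seen
        warning = "noindex is invisible to crawlers because robots.txt blocks the URL"
    return {"directives": directives, "noindex": noindex, "warning": warning}
```

Headers are taken as a list of pairs rather than a dict because a response may carry several X-Robots-Tag headers.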
- How do I implement hreflang for 20+ regional variants?
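- Every page must reference every regional variant in the cluster, including itself, and each annotation must be confirmed by a return link from the page it points to. An x-default fallback is recommended. For 20 variants, that's 21 <link> tags per page (20 regions + x-default). At scale, move the annotations out of the HTML into HTTP Link headers (served from Cloudflare Workers or Next.js middleware) or into the XML sitemap. Our hreflang Generator outputs both <link> tags and HTTP headers for any number of URL+region pairs. Each code is an ISO 639-1 language code, optionally followed by an ISO 3166-1 alpha-2 region (e.g. en, en-GB, pt-BR), a subset of IETF BCP 47 tags; a region code on its own is invalid.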
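Every page in a cluster carries the same complete set of annotations, so generating them means emitting one line per variant plus x-default. Below is a minimal Python sketch, assuming variants arrive as a hypothetical `{hreflang code: URL}` map. The format check accepts only `language` or `language-REGION` shapes. It does not verify codes against the ISO lists (so `en-UK` passes even though the UK's ISO 3166-1 code is GB), and it rejects script subtags such as `zh-Hant`.

```python
import html
import re

# language (ISO 639-1) optionally followed by -REGION (ISO 3166-1 alpha-2); shape check only
HREFLANG_SHAPE = re.compile(r"[a-z]{2}(-[a-z]{2})?", re.IGNORECASE)


def hreflang_annotations(variants: dict[str, str], x_default: str) -> tuple[list[str], str]:
    """Return (<link> tags, HTTP Link header value) shared by every page in the cluster."""
    for code in variants:
        if not HREFLANG_SHAPE.fullmatch(code):
            raise ValueError(f"not a language or language-region code: {code!r}")
    # The same list goes on every page, self-reference included, so links are reciprocal
    pairs = sorted(variants.items()) + [("x-default", x_default)]
    tags = [f'<link rel="alternate" hreflang="{c}" href="{html.escape(u)}" />' for c, u in pairs]
    header = ", ".join(f'<{u}>; rel="alternate"; hreflang="{c}"' for c, u in pairs)
    return tags, header
```

With 20 variants this yields 21 tags, matching the count above. The header string is the value of a single `Link` response header.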
- What's the right canonical setup for pagination?
- Self-referencing: page 2, page 3, and so on each carry a rel=canonical pointing to themselves (NOT to page 1). Google confirmed in 2019 that it ignores rel=prev/next and handles pagination automatically via other signals. Canonicalizing every page to page 1 tells Google that pages 2+ are duplicates, so their content can drop out of the index and items linked only from deeper pages may go undiscovered. Exception: if pages 2+ are near-duplicate thin content (which shouldn't exist in the first place), consolidate them.
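Here is a minimal Python sketch of the self-referencing rule. It assumes a hypothetical `?page=N` URL scheme in which page 1 lives at the bare listing URL. The helper `pagination_canonical` is illustrative only.

```python
import html


def pagination_canonical(listing_url: str, page: int) -> str:
    """Self-referencing canonical for page `page` of a listing (hypothetical ?page=N scheme)."""
    if page < 1:
        raise ValueError("pages are numbered from 1")
    if page == 1:
        href = listing_url  # page 1 is the bare listing URL, so ?page=1 is never emitted
    else:
        separator = "&" if "?" in listing_url else "?"
        href = f"{listing_url}{separator}page={page}"  # page N points to itself, not page 1
    return f'<link rel="canonical" href="{html.escape(href)}" />'
```

Normalizing page 1 to the bare URL keeps `?page=1` and the listing URL from competing as duplicates. With path-based pagination, only the `href` construction would change.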