Question 1

What is RFC 9309 and why does it matter for robots.txt?

Accepted Answer

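RFC 9309 (published September 2022) formalized the Robots Exclusion Protocol as an IETF Proposed Standard after 28 years as a de facto convention. Key requirements it codifies: (1) the file lives at /robots.txt, is served as text/plain, and is UTF-8 encoded; (2) crawlers must be able to parse at least the first 500 KiB and may ignore anything beyond their limit, so keep the file under that size; (3) Allow/Disallow precedence: the longest (most specific) matching rule wins, and Allow should win a tie; (4) User-agent matching is case-insensitive. Our Robots.txt Tester and Generator both honor RFC 9309 strictly.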
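
The precedence rule in point (3) can be sketched in a few lines of Python. This is a minimal illustration, not the code behind our Tester: the function names and the (kind, pattern) rule format are assumptions made for the example, and it only handles the rules of a single User-agent group that has already been selected.

```python
import re


def _pattern_to_regex(pattern: str) -> re.Pattern:
    # '*' matches any run of characters; a trailing '$' anchors the end of the path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """rules: ("allow" | "disallow", pattern) pairs from one User-agent group (hypothetical format)."""
    best_len = -1
    best_allow = True  # no matching rule means the path may be crawled
    for kind, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if _pattern_to_regex(pattern).match(path):
            length = len(pattern.encode("utf-8"))  # specificity measured in octets
            allow = kind.lower() == "allow"
            # Longest pattern wins; on a tie, Allow wins.
            if length > best_len or (length == best_len and allow):
                best_len, best_allow = length, allow
    return best_allow


rules = [("disallow", "/private/"), ("allow", "/private/public/")]
print(is_allowed(rules, "/private/public/page.html"))  # longer Allow rule applies
print(is_allowed(rules, "/private/secret.html"))       # only the Disallow rule matches
```

Rule order in the file does not matter in this sketch, which is the practical difference from older first-match parsers.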

Question 2

Should I use robots.txt or noindex to block a page?

Accepted Answer

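Use noindex (a robots meta tag or an X-Robots-Tag header) whenever the goal is 'not in the index.' Robots.txt only blocks crawling, not indexing: if the page has inbound links, Google can still index the URL from anchor text alone, without any content. Do not combine the two on the same URL, because a crawler blocked by robots.txt never fetches the page and so never sees the noindex. Use robots.txt for: conserving crawl budget on low-value URLs and keeping crawlers off endpoints (the file is publicly readable and is not access control, so it does not protect sensitive data). Use noindex for: filter pages, staging, low-quality content.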
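
To check which of the two noindex signals a page actually sends, something like the following Python sketch could work. It is an illustration only: has_noindex and its arguments are hypothetical names, it expects response headers and the HTML body that you have already fetched, and it ignores user-agent-scoped directives (such as "googlebot: noindex") and the "none" shorthand.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(headers: dict[str, str], html: str) -> bool:
    # 1) HTTP header: X-Robots-Tag: noindex
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            if "noindex" in [d.strip().lower() for d in value.split(",")]:
                return True
    # 2) HTML: <meta name="robots" content="noindex">
    parser = _RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


print(has_noindex({"X-Robots-Tag": "noindex, nofollow"}, ""))
print(has_noindex({}, '<meta name="robots" content="index, follow">'))
```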

Question 3

How do I implement hreflang for 20+ regional variants?

Accepted Answer

Every page must reference every other regional variant, including itself, plus an x-default fallback. For 20 variants, that's 21 tags per page (20 regions + x-default). At scale, use HTTP Link headers instead of tags (served from Cloudflare Workers or Next.js middleware). Our hreflang Generator outputs both formats for any number of URL+region pairs. Each code must be an ISO 639-1 language code, optionally followed by an ISO 3166-1 alpha-2 region code (for example en or en-GB), written in IETF BCP 47 style.
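
As a rough sketch of what generating both formats involves, the Python below builds the link tags and the equivalent Link header value from one mapping of codes to URLs. The function names and the example.com URLs are placeholders for illustration, not the Generator's actual implementation. The same full set is emitted on every variant, which is what makes each page reference itself and all the others.

```python
import html


def _entries(variants: dict[str, str], x_default: str) -> list[tuple[str, str]]:
    # Every variant plus the x-default fallback: N regions -> N + 1 entries.
    return list(variants.items()) + [("x-default", x_default)]


def hreflang_link_tags(variants: dict[str, str], x_default: str) -> list[str]:
    """HTML <link> tags for the <head> of every variant page."""
    return [
        f'<link rel="alternate" hreflang="{code}" href="{html.escape(url, quote=True)}" />'
        for code, url in _entries(variants, x_default)
    ]


def hreflang_link_header(variants: dict[str, str], x_default: str) -> str:
    """Value for a single HTTP Link response header carrying the same annotations."""
    return ", ".join(
        f'<{url}>; rel="alternate"; hreflang="{code}"'
        for code, url in _entries(variants, x_default)
    )


variants = {
    "en-us": "https://example.com/us/",
    "en-gb": "https://example.com/uk/",
    "de-de": "https://example.com/de/",
}
for tag in hreflang_link_tags(variants, "https://example.com/"):
    print(tag)
print("Link: " + hreflang_link_header(variants, "https://example.com/"))
```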

Question 4

What's the right canonical setup for pagination?

Accepted Answer

Self-referencing: each of pages 2, 3, etc. has a rel=canonical pointing to itself (NOT to page 1). Google confirmed in 2019 that it no longer uses rel=prev/next and handles pagination automatically via other signals. Pointing the canonical at page 1 tells Google that pages 2+ are duplicates of it, so it can drop them, along with the items linked only from them, from the index. Exception: if pages 2+ are near-duplicate thin content (which shouldn't exist), then consolidate.
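
A self-referencing canonical for a paginated URL might be built like this in Python. This is a sketch under stated assumptions: pagination uses a query parameter (called "page" here), every other query parameter and the fragment are treated as noise such as tracking tags and dropped, and the function names are hypothetical.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url: str, page_param: str = "page") -> str:
    """Keep the page parameter so page N canonicalizes to page N, not to page 1."""
    parts = urlsplit(url)
    # Assumption: only the pagination parameter changes the content.
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k == page_param]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url: str, page_param: str = "page") -> str:
    href = html.escape(canonical_url(url, page_param), quote=True)
    return f'<link rel="canonical" href="{href}" />'


print(canonical_tag("https://example.com/blog?page=3&utm_source=newsletter#top"))
```

If sorting or filtering parameters do change what is listed on your site, they would need to be added to the kept set rather than stripped.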

Crawlers & indexing tools

Robots.txt Generator

Robots.txt Tester

Sitemap Generator

XML Sitemap Parser

Sitemap Validator

Canonical Tag

Noindex Checker

hreflang Generator

Redirect Rule Generator
