Robots.txt

A file that tells search engines which parts of your site they can or cannot crawl.

Robots.txt is a plain text file placed in the root directory of a website (at /robots.txt) that tells search engine crawlers which parts of the site they are permitted or forbidden to crawl. It is not a security mechanism: it does not prevent access to files, it only asks crawlers that follow the Robots Exclusion Standard to skip the specified paths. Google's Googlebot respects robots.txt directives by default.

A correct robots.txt for a care home website should allow crawlers to access all publicly visible pages and explicitly disallow any admin interfaces, staging environments, or private directories. Blocking pages that should be indexed, a common misconfiguration during website migrations, prevents Google from seeing content that should rank and is one of the more damaging technical SEO errors a site can have.
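As an illustration, a file along the following lines (the /admin/ and /staging/ paths are placeholders, not standard names) keeps compliant crawlers out of private areas while leaving every public page open:

    User-agent: *
    Disallow: /admin/
    Disallow: /staging/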

The robots.txt file can also specify the location of the XML sitemap, which helps crawlers find the sitemap without having to guess its URL. A typical well-configured robots.txt for a care home website needs little more than a permissive rule for all user agents and a Sitemap directive pointing to the sitemap location.
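Assuming the sitemap sits at the conventional /sitemap.xml path, and with a placeholder domain, such a file might look like this:

    User-agent: *
    Disallow:

    Sitemap: https://www.example-carehome.co.uk/sitemap.xml

The empty Disallow line means "disallow nothing", so every crawler may fetch every page.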

Unlike the noindex tag, which prevents a page from being included in search results even after it has been crawled, robots.txt prevents crawling entirely. This distinction matters: if a page is disallowed in robots.txt but linked to from external sites, Google may still include the URL in search results as a bare listing with no description, because it knows the URL exists from the external link even though it cannot read the page content. For pages that should stay out of search results, a noindex tag is more reliable than a robots.txt disallow, provided the page is not also blocked in robots.txt, since a crawler that cannot fetch the page never sees the tag.
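A noindex instruction can be placed in the page's <head> as a robots meta tag, or, for non-HTML files such as PDFs, sent as an X-Robots-Tag HTTP response header; both forms are shown below:

    <meta name="robots" content="noindex">

    X-Robots-Tag: noindex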