How can you remove a URL from Google’s search results?
There are a number of cases where you may not want pages appearing in the SERPs, and this blog post discusses the different ways you can keep them out.
The main ways to keep a page out of the search results are:
- Noindex tags
- Blocking in robots.txt
- Deleting the page
- Google Search Console’s Removals Tool
- Canonical tags
What kind of content would we not want to appear in the SERPs?
There are a number of different types of pages we would not want to be searchable on Google or other search engines.
- PPC landing pages
- Thank you pages
- Admin pages
- Internal search results
We may also want to hide pages from Google for a number of reasons including:
- Page duplication – To prevent numerous versions of the same page from appearing in the search results.
- Keyword cannibalisation – To stop two or more similar pages from competing with each other for a particular keyword.
- Crawl budget wastage – Crawling is covered in the next section, but this refers to Google spending too much time discovering lower-value pages on your site rather than prioritising the important stuff.
How does Google find content to appear in the search results?
Before we dive into the different ways we can prevent pages from appearing in the search results, it’s worth understanding the process that Google uses to find and ultimately rank pages.
1) Crawling – This is Google’s way of discovering new content. Using programs, often referred to as spiders or crawlers, Google visits different web pages and follows the links on them to find new pages. Each site has a certain “crawl budget”: the amount of crawling resource Google allocates to that site.
2) Indexing – Once Google has found the content, it maintains a copy of that content and stores it in what is called an index.
3) Rankings – The ordering of these different pages in the search results is known as ranking. Google gets a query, figures out the search intent behind that query, and then looks to the index to return the best possible results.
Google uses a range of different calculations, known as algorithms, to determine which are the best results to serve and orders them from most relevant to least relevant.
How can we control what pages rank in the search results?
Noindex tags
Noindex tags are a directive that tells Google: “I do not want this page to be indexed, and therefore do not want it to appear in the search results.”
When Google next crawls that page and sees the noindex directive, it will remove that page from its index and therefore from the search results.
These noindex tags can be implemented in two ways:
- By including them in the page’s HTML code
- By returning an X-Robots-Tag header in the HTTP response.
Noindex tags implemented in the HTML would look something like this:
<meta name="robots" content="noindex">
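Google’s documentation also allows the noindex directive to be combined with other robots directives, or targeted at a specific crawler by naming it in the meta tag:

```html
<!-- Keep the page out of the index and don't follow its links -->
<meta name="robots" content="noindex, nofollow">

<!-- Apply noindex to Googlebot only; other crawlers are unaffected -->
<meta name="googlebot" content="noindex">
```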
Noindex tags implemented via HTTP header would look like this:
HTTP/1.1 200 OK
X-Robots-Tag: noindex
CMS platforms, such as WordPress, allow you to add noindex tags to pages, which means you wouldn’t need a developer to implement this.
Importantly, Google will need to be able to crawl these pages in order to see the “noindex” tag and then remove the page from its index.
When to use noindex tags – If there are pages on your site that still serve a purpose but that you do not want appearing in the search results, this is a good option.
Blocking in robots.txt
Robots.txt is a text file used to instruct web robots how to behave when they visit your site and can be used to dictate to search engine crawlers whether they can or cannot crawl parts of a website.
As an example, you can view Nike’s robots.txt file, which lives at https://www.nike.com/robots.txt
Using robots.txt to block certain page paths, such as /admin/, means that Googlebot and other search crawlers won’t even visit these pages – hence they won’t appear in the search results. This preserves crawl budget for your more important pages.
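As an illustrative sketch (the paths and domain below are placeholders, not taken from any real site), a robots.txt file blocking those sorts of paths might look like this:

```
# Applies to all crawlers
User-agent: *
Disallow: /admin/
Disallow: /search/

# Optional: point crawlers at the sitemap
Sitemap: https://www.example.com/sitemap.xml
```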
Note: blocking a page path in robots.txt only stops Google from crawling the page; it doesn’t delete or change anything already stored in the index. So if a page is already appearing in the search results, Google has already crawled and indexed it, and a robots.txt block won’t remove it.
If you need a page removed from the index, blocking it in robots.txt will actively prevent that from happening, because Google can no longer crawl the page to see a noindex tag. In that case, the best thing to do is add a noindex tag to remove these pages from Google’s index, and once they are all removed, you can then block the path in robots.txt.
More details can be found on this Google Search Central page.
When to block pages in robots.txt – When you have specific page paths or larger sections of your site that you do not want Google to crawl, this is your best bet.
If a page or collection of pages is already appearing in the SERPs, though, you’ll need to noindex them first and wait for them to be removed before adding the robots.txt rule.
Deleting the page
The most obvious answer, you may have thought, would be to simply delete the page, whether that’s by giving it a 404 (Not Found) or a 410 (Gone) status code.
Both status codes serve the same function in that Google will remove the page from its index when it next crawls that page, though a 410 status may be slightly quicker according to Google’s John Mueller.
From an SEO perspective, if these pages hold value, whether that be through backlinks or traffic, it would be worth 301 redirecting to a relevant page in order to consolidate that link equity on the site.
Alternatively, if the page has internal links pointing to it and you do not have an appropriate page to redirect to, those internal links should be removed or updated to point at a live (200 status) page.
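On an Apache server, for instance, a 301 redirect could be added in the site’s .htaccess file; the URLs below are placeholders, not real pages:

```
# Permanently redirect a removed page to a relevant live page (mod_alias)
Redirect 301 /old-page/ https://www.example.com/new-page/
```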
When to delete a page – If the page serves no purpose and has little value in terms of backlinks or traffic it may be worth deleting. If there is some value either from a user perspective or an SEO perspective, consider keeping it with a noindex tag or 301 redirecting to a relevant page.
Google Search Console’s Removals Tool
Google Search Console’s Removals Tool can be used to temporarily hide pages from the search results for sites you have verified in Google Search Console. It’s worth noting that this is not a permanent fix.
If you want to quickly remove a page from the search results, this is a good option. If you want to permanently remove a page Google recommends either giving it a 404 or 410 status, blocking access to the content by using a password or giving the page a noindex tag.
More details can be found on this Google Webmasters page.
When to use Google Search Console’s Removals Tool – When you need to get rid of a page quickly. If you need to remove the page permanently, use a noindex tag or give it a 404 or 410 status.
Canonical tags
A canonical tag is a snippet of HTML code that lives in the <head> of a page and is used to define the primary version of pages that are similar or duplicates. Canonical tags help prevent issues caused by duplicate or near-duplicate content appearing on multiple URLs.
The Brainlabs homepage, for example, declares a canonical tag in the <head> of the page.
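Using a placeholder domain rather than the actual Brainlabs markup, a self-referencing canonical tag looks something like this:

```html
<head>
  <!-- Tells search engines this URL is the primary version of the page -->
  <link rel="canonical" href="https://www.example.com/" />
</head>
```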
If you canonicalise one page to another, you are saying that you do not want that page appearing in the search results and you would prefer another version of that page to appear instead.
Unlike noindex tags, which are directives, canonical tags are hints that Google can choose to ignore. Google can still crawl these pages, see the canonical tags, and then decide for itself whether or not the page should appear in the search results.
When to use canonical tags – Canonical tags should be used when there are several duplicate or similar pages ranking. You will want to canonicalise the non-master versions to one primary version of the page, to indicate to Google that the master version is the only one you would like in the search results. This will also consolidate the ranking signals from each of these URLs onto the one master page.
A prime example of when to use canonical tags is pages with URL parameters. These pages can have exactly the same content but different URLs because of those parameters. Canonical tags help ensure that the correct version of the page ranks, rather than any of the parameterised variants.
There are a number of ways to remove or control what content appears in the search results. The key is ensuring that you are choosing the best option for your particular situation, not attempting to do them all at once!