Google Scholar SEO: Indexation & Ranking

In this post, we will be looking at Google Scholar, how to optimise for it and how this Google platform can contradict traditional SEO practices.

Previously, my knowledge of Google Scholar was limited to my university experience, so when I was handed the task of improving indexation and ranking for a client in Google Scholar, I didn’t know where to start. I took to Google and my fellow Distillers to enlighten me. I bombarded John Mueller at Brighton SEO in the hope of some directional guidance. There wasn’t a nice guide out there on how to improve presence and rankings in Google Scholar. Really, it was down to me to figure this one out.

And so, I embarked on my exploration of the world of academic search engine optimisation (also known as ASEO).

What is Google Scholar?

Google Scholar is Google’s academic search platform, which makes published papers discoverable and facilitates academic research and learning. It surfaces academic research from a range of sources, for example books, journal articles and reports from publishers, universities and professional societies. It indexes scholarly articles from across the web, bringing them all into one convenient place with related works, additional citations and author information.

For academic sites, publishers and researchers, Google Scholar is key to traffic performance. This platform requires traditional SEO balanced with Google Scholar-specific optimisations.

Current visibility on Google Scholar

Unlike standard SEO, there aren’t any magical tools out there that provide ranking or indexation data for Google Scholar.

The “site:” operator can provide a ballpark figure for indexation. It is important to note that using this method is not 100% accurate but if you want to compare your relative visibility to competitors, it can provide some insight.

For example, below we have two academic sites in Google Scholar. Using the “site:” operator, we can see oup.com has approximately 1,770,000 results indexed, whereas plos.org has 252,000 results indexed. This is a relative comparison and not fully representative of indexation in Google Scholar.

oup.com

Primary Technical SEO Checks

Before getting started with Google Scholar optimisations, ensure that your site is healthy from a technical SEO perspective. I won’t delve into all the technical checks in this blog post; you can use our technical SEO checklist and read more about specific elements from our blog network.

Some fundamental SEO hygiene issues that are key to Google Scholar’s indexation:

  • Canonical tags
    • These should be used to point old versions of an article and duplicate content to the preferred version.
    • See this post from Moz for more information.
  • Metadata
    • Metadata should be optimised for a given article or portal page.
    • Ensure your site does not have duplicated or missing metadata. Metadata is a clear signal to Google as to what a page is about.
    • Yoast has a nice post which takes you through key metadata elements.
  • H1
    • Similar to metadata, ensure H1s are unique and specific to a page. There should only ever be one H1 on a page.
  • Site speed
    • Google Scholar has stated that “an overly slow response from your website” can negatively impact indexation. You can test a site’s speed using Google’s PageSpeed Insights tool.
    • This guide from Moz explains what page speed is and provides some recommendations for SEO.
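
The hygiene checks above can be roughed out in a few lines of Python. This is only a minimal sketch using the standard library: the `HygieneParser` class and `audit_page` function are hypothetical names made up for illustration, and it only looks at a single page's HTML (it does not detect metadata duplicated across pages or measure speed).

```python
from html.parser import HTMLParser


class HygieneParser(HTMLParser):
    """Collects the on-page elements from the checklist: canonical, title, description, H1."""

    def __init__(self):
        super().__init__()
        self.canonicals = []
        self.h1_count = 0
        self.meta_description = None
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonicals.append(attrs.get("href"))
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content")
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit_page(html):
    """Return a list of human-readable issues found in one page's HTML."""
    parser = HygieneParser()
    parser.feed(html)
    issues = []
    if len(parser.canonicals) != 1:
        issues.append("expected exactly one canonical tag")
    if not parser.title.strip():
        issues.append("missing <title>")
    if not parser.meta_description:
        issues.append("missing meta description")
    if parser.h1_count != 1:
        issues.append(f"expected one H1, found {parser.h1_count}")
    return issues
```

Running `audit_page` over the HTML of each article page gives a quick list of pages to fix before moving on to the Google Scholar-specific work.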

It is crucial for a site to be in a healthy state for standard Google search before it can tackle Google Scholar – this is worth the time investment.

How to optimise for Google Scholar?

Content

Google Scholar will only consider indexing content that is scholarly in nature. This could include:

  • Articles
  • Research papers
  • Conference papers
  • Technical reports
  • Abstracts
  • Dissertations

Either the full text of the article or the abstract has to be accessible to users and robots from the URL displayed in Google Search results. Access to the article/abstract needs to be free of charge and must not require a login, interstitial ads or further click-throughs.

Google Scholar can index both PDFs and HTML pages. In the case of PDFs, the abstract (or the entire article) should be accessible on the HTML version of that PDF.

Quick SEO Reminders

  • Place each article (along with its associated abstract) in its own, unique HTML page or PDF file.

Crawlability

As with standard Google Search, it is essential that Google Scholar can crawl your site, find article pages and decipher important content.

To ensure article pages can be reached, it is necessary to have a browse interface. The architecture of the site should ensure each page can be reached from the homepage using internal links. This can be done by:

  • Having author-specific pages listing all articles written by a given author
  • Ordering articles by dates
  • Using tags to group articles by topics or specific keywords

Find out how to conduct an information architecture audit.

Quick SEO Reminders

  • Ensure crawlers have access to the site and article pages. Checking the robots.txt file is a good starting point.
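
As a starting point for that robots.txt check, here is a small sketch using Python's standard `urllib.robotparser`. The `blocked_urls` helper, the example rules and the example.org URLs are all made up for illustration, and using "Googlebot" as the user agent is an assumption, so check which crawler you actually need to allow.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    # Parse the rules from a string so the check also works offline.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules and URLs, purely for illustration.
robots_txt = """\
User-agent: *
Disallow: /pdf/
"""
urls = [
    "https://example.org/articles/123",
    "https://example.org/pdf/123.pdf",
]
print(blocked_urls(robots_txt, urls))
```

In this made-up example the PDF directory is disallowed, which is exactly the kind of rule that would stop article PDFs from being crawled.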

Indexation: Scholarly Meta Tags

It is key that Google Scholar can identify bibliographic data for indexation. Citations are an influential ranking factor (which will be discussed later in this article) and therefore, Google Scholar needs to also understand the references made between articles. This is done using academic meta tags.

What are Scholarly Meta Tags?

Unlike standard SEO, scholarly articles require a number of academic-specific HTML <meta> tags in the source code of a page, which enable Google Scholar to extract bibliographic data. These are equivalent to the meta tags that are used in standard SEO (e.g. <meta name="description" content="…">).

These meta tags can be presented in various meta-tag schemes, and each format essentially does the same thing. What is important is that key information has been marked up by these tags for Google to identify. Read more about these tags in Google’s guide.

Google Scholar accepts the following formats:

  • Highwire Press (citation_title)
  • Eprints (eprints.title)
  • BE Press (bepress_citation_title)
  • PRISM (prism.title)
  • Dublin Core (DC.title)

These tags should, at the very least, be used for:

  • The title tag (citation_title)
  • The author tag (citation_author)
  • Publication date (citation_publication_date)
  • An associated file such as a PDF file (citation_pdf_url)
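
To make the four tags above concrete, here is a minimal sketch that prints them in the Highwire Press format for one article. The `highwire_meta_tags` function and all of the example values are hypothetical. Emitting one citation_author tag per author and writing the date as YYYY/MM/DD are my assumptions about the expected format, so check them against Google’s guide.

```python
from html import escape


def highwire_meta_tags(title, authors, publication_date, pdf_url=None):
    """Build Highwire Press style <meta> tags for one article."""
    pairs = [("citation_title", title)]
    # Assumption: each author gets a separate citation_author tag.
    pairs += [("citation_author", author) for author in authors]
    pairs.append(("citation_publication_date", publication_date))
    if pdf_url:
        pairs.append(("citation_pdf_url", pdf_url))
    # Escape values so quotes and ampersands cannot break the attribute.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in pairs
    )


# Hypothetical article details, purely for illustration.
print(highwire_meta_tags(
    title="Cats & Dogs: A Comparative Study",
    authors=["Doe, Jane", "Roe, Richard"],
    publication_date="2019/05/01",
    pdf_url="https://example.org/articles/123.pdf",
))
```

The output belongs in the <head> of the article's HTML page.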

Throughout my research, I came across multiple academic sites using more than one meta tag format. For example, below we can see that Nature.com are using Highwire Press, PRISM and Dublin Core tags within the source code.

Nature.com

At present, there doesn’t seem to be a one-size-fits-all solution on this matter. If your competitors are doing it (and they have greater visibility than you do), then this technique may help ensure that Google Scholar has access to all key information.

These meta tags are essential for indexation in Google Scholar. It is also important to link to associated PDF files using citation_pdf_url or similar. Without it, Google Scholar may incorrectly index PDFs as metadata cannot be pulled from the HTML version of the page.

Quick Tips

  • Ensure references are listed using numbers (1., 2., 3., and so on) in PDFs and are formatted in an <ol> list in the HTML source code.

Citations

Google Scholar has stated that citations are an influential factor for indexation and ranking.

“Google Scholar aims to rank documents the way researchers do, weighing the full text of each document … as well as how often and how recently it has been cited in other scholarly literature”

Google Scholar

We can use the “site:” operator again here to understand citation performance. It seems that Google Scholar, when using the “site:” operator, ranks results by citations.

For example, oup.com has approximately 1,770,000 results, with citations reaching 25,963 for the top-ranking article, whereas elifesciences.org has approximately 6,560 results, with citations reaching 1,583. It seems that there is some correlation between the number of results and citations.

Oup.com

Elifesciences.org

Visibility, ranking improvements and citations work hand in hand: as visibility and ranking increase, so should citations.

There are a few things you can do to push your citation count:

  • Encourage your writers to cross-reference articles that your platform has already published
  • Consider sharing data on data-sharing websites or contributing to Wikipedia, citing your articles where relevant. Read more about this here.

SEO Reminders

  • Ensure titles and metadata have been optimised for a given article. This will help the article’s visibility and therefore, encourage citations.
  • Add schema markup for scholarly articles in the HTML source code of an article. This won’t affect how the result is displayed in Google Scholar but will mark up citations, author and related article details in Google’s SERPs.
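
As an illustration of the schema markup reminder, the sketch below builds a JSON-LD snippet using the schema.org ScholarlyArticle type. The `scholarly_article_jsonld` function, the choice of properties and the example values are my own assumptions rather than a required set, so treat it as a starting point.

```python
import json


def scholarly_article_jsonld(headline, authors, date_published, citations=()):
    """Return a JSON-LD <script> snippet describing one scholarly article."""
    data = {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "headline": headline,
        "author": [{"@type": "Person", "name": name} for name in authors],
        "datePublished": date_published,
    }
    if citations:
        data["citation"] = list(citations)
    # Escape "</" so a value can never close the <script> element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical article details, purely for illustration.
print(scholarly_article_jsonld(
    headline="Cats and Dogs: A Comparative Study",
    authors=["Jane Doe"],
    date_published="2019-05-01",
    citations=["Roe, R. (2017). An Earlier Study of Cats."],
))
```

The resulting snippet can be placed in the HTML of the article page alongside the scholarly meta tags.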

Contradictions with Traditional Google Search

Sitemaps: PDFs

While trying to increase the number of articles indexed in Google Scholar, I spoke with a Google Scholar representative, who suggested submitting versionless PDF URLs in an XML sitemap. Standard SEO practice, however, dictates that similar/duplicate content should be associated back to the original content, usually with a canonical tag.

This would suggest that the PDF duplicates of the HTML should have a canonical pointing to the HTML page (for PDFs, this is set with a rel="canonical" Link HTTP header rather than a tag in the page itself). However, Google said the PDFs should be in a sitemap, whereas best practice dictates that a sitemap should only contain URLs that return a 200 status code and are not canonicalised to another URL.

At this point, weigh up your priorities. If the priority is to increase visibility then submitting the PDF URLs in a supplementary sitemap may be more effective. If you are taking this route, it is best to submit a supplementary sitemap to limit the risk of affecting your entire site.
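
If you do go down the supplementary sitemap route, a file like this can be generated with Python's standard library. This is a minimal sketch: the `build_pdf_sitemap` function and the example.org URLs are hypothetical, and it only writes the basic <loc> entries from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_pdf_sitemap(pdf_urls):
    """Return a sitemap XML document listing the given PDF URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for pdf_url in pdf_urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = pdf_url
    # ElementTree escapes special characters such as "&" in the URLs.
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical versionless PDF URLs, purely for illustration.
pdf_urls = [
    "https://example.org/articles/123.pdf",
    "https://example.org/articles/456.pdf",
]
print(build_pdf_sitemap(pdf_urls))
```

Saving the output as a separate file (for example, a sitemap dedicated to PDFs) keeps these URLs apart from the main sitemap, in line with limiting the risk to the rest of the site.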

Limitations

Reflecting Changes

Google Scholar has stated that “once you update your website, it can take anywhere from a few days to 6-9 months for these changes to be reflected in Google Scholar Search results”.

It is important to keep this in mind when optimising for Google Scholar: changes take time to take effect, and Google Scholar seems to take longer than standard Google Search to reflect them.

Monitoring performance will help you understand the effect of any changes made to your site. Changes will be reflected in Google Analytics (GA). You will find this data under Acquisition – All Traffic – Source/Medium. This is based on the assumption that your GA is correctly set up. See this Google Analytics audit checklist to ensure your Google Analytics is in tip-top condition.

Trial and Error

The lack of resources and strategies behind Google Scholar optimisation does mean that there will be a level of trial and error. Each site behaves differently, so any key quirks of your site should be considered when creating a strategy. Of course, the time needed to test and experiment is lengthened by Google Scholar’s delay in reflecting changes.

Key Learnings

  • Before attempting to optimise for Google Scholar, ensure fundamental SEO changes have been made
  • This will be a long process, so don’t expect to see immediate changes or success
  • Include scholarly specific meta tags in the source code – various academic sites use more than one format
  • Add scholarly article schema markup to articles
  • To boost indexation, submit versionless PDF URLs in a supplementary sitemap
  • Record when changes were made and monitor their effect on indexation and rankings

Do you have any insights into the running of Google Scholar? Ask us questions on LinkedIn.