4 Practical SEO impacts of Google’s machine learning advances
Google has long been a leader in artificial intelligence and machine learning. Many of their innovations have caused a stir in the SEO industry, but in my opinion the vast majority have been less impactful to us as SEO practitioners than the amount of coverage they received might suggest.
For example, while their progress in developing language models rightfully garnered a huge amount of attention, my view is that these improvements don’t particularly change how we structure SEO activities, decide strategically where to focus our attention, or write content. Better language models have changed the search results in some cases – for example by handling synonyms, stemming and plurals better than before – but these changes have generally meant better results for users, and have only harmed an individual site’s performance when it was ranking in areas where a different result would actually be better.
There will be counterexamples to this, but they aren’t particularly actionable. They all consist of specific cases that look something like a page with a very specific turn of phrase being beaten out by a less specific page that the model erroneously thinks is more relevant. There isn’t a lot you can do about that, because rewriting your page to be more general is unlikely to be effective, makes your page worse, and damages other long tail queries that it might otherwise have continued to rank for. Overall, as webmasters, we have little choice but to trust that the language modelling will work out well on average across our portfolio of pages and search queries, and not just on average across the population of all sites on the web!
(That might not be true, of course, but there’s little we can do about it – one of the many reasons why I wish there was more genuine competition in the search engine space).
Having said all that, I think there are some areas of machine learning that either have changed or soon might change how we actually do SEO – either in tipping specific tactical decisions in a different direction, or by opening up new fronts of strategic opportunity. These are the areas I want to dive into today.
For each area, I’m going to look at the innovation through three lenses:
What it means now – the impacts we are already seeing in the search results and hence in tactics today
What it might mean in the near future – technology that is already good enough, but perhaps not rolled out yet
What to watch out for – what future innovations could mean for marketers
Up until now, we have mostly thought about Google as consuming our content and displaying it to users with our own summarisation intact in the form of meta information, structured data, URLs and page titles. In fact, they have been using their own technology to rewrite titles and meta descriptions under certain circumstances for some time. Recently, though, they have increased the rate of this kind of change, and also moved further towards active summarisation and content creation in place of simply selecting the most relevant parts of the page wholesale.
I don’t believe we have yet reached the tipping point where Google’s technology is consistently better at writing titles than humans, but SearchPilot experiments have shown that we have likely reached that tipping point for meta descriptions in many circumstances (see, for example, this experiment that reached statistical significance at 95% confidence, and this experiment that was suggestive of the same conclusion).
I believe the primary explanation for this is the dynamic nature of meta description rewriting. A page can only have a single actual meta description that has to serve as the organic “advert” in the search results for every query that page ranks for. In contrast, when rewriting the description, Google is free to pick different elements of the page to summarise for different queries – for example pulling out opening hours if the query indicates an interest in those or the address or phone number if the query is more about those. Since keywords in the search query are typically bolded in the search results as well, this adds an additional advantage to the dynamic approach that Google takes.
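To make the dynamic advantage concrete, here is a toy sketch: given several candidate passages from a page, a scorer that picks the passage with the most query-term overlap will serve different “descriptions” for different queries, where a static meta description can only ever show one. This is purely illustrative – the scoring function and example passages are invented, and Google’s actual snippet selection is far more sophisticated.

```python
# Illustrative sketch only: pick the passage from a page that best matches a
# query, loosely mimicking how a dynamic snippet can beat one static description.
import re

def tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def pick_snippet(passages, query):
    """Return the passage sharing the most terms with the query."""
    return max(passages, key=lambda p: len(tokens(query) & tokens(p)))

passages = [
    "Opening hours: Mon to Sat 9am to 6pm, closed Sunday.",
    "Visit us at 12 High Street, Bristol. Phone number: 0117 000 0000.",
    "We stock a wide range of garden furniture and tools.",
]

print(pick_snippet(passages, "garden centre opening hours"))   # the hours passage
print(pick_snippet(passages, "garden centre phone number"))    # the contact passage
```

The same page “wins” both queries with a relevant snippet, which a single hand-written description could not do.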
What this means for SEO is that it is probably already worth experimenting with removing meta descriptions and seeing whether Google (with its dynamic advantages) is better than your single best static meta description at winning clicks in the search results.
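One common way to run such an experiment is to deterministically split a set of similar pages into a control group that keeps its meta description and a variant group that drops it, then compare organic clicks between the groups. The bucketing scheme below is a generic illustration (not SearchPilot’s actual methodology); real platforms stratify pages and analyse the results far more carefully.

```python
# Generic sketch of assigning pages to control/variant buckets for an SEO split test.
# The hashing scheme is illustrative only.
import hashlib

def bucket(url: str) -> str:
    """Deterministically assign a URL to 'control' (keep the meta description)
    or 'variant' (remove it and let Google generate the snippet)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return "variant" if int(digest, 16) % 2 == 0 else "control"

for i in range(4):
    url = f"https://example.com/product/{i}"
    print(url, "->", bucket(url))
```

Hashing the URL (rather than assigning randomly at request time) matters: each page must stay in the same group for the whole test so that search engines see a consistent version.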
It could well be that the changes to the ways Google rewrites titles become significantly more impactful than meta descriptions ever were. At SearchPilot, we have found that title tag tests are some of our biggest winners (and biggest losers!). I predict that in the short to medium term, the majority of titles will either be left intact or will be so heavily influenced by the actual page title that HTML title will remain the strongest influence on the display snippet in the search results (I wrote a bit more about this here in the context of whether it will remain possible to continue testing titles for SEO).
If, however, Google decides that their technology is better than webmasters at writing accurate and compelling page titles, the percentage of titles rewritten without inspiration from the current titles could grow. In those circumstances, SEO recommendations would change to focus more on title tags as ranking factors and less on their clickthrough rate impact.
The scary future that publishers should be aware of is the possibility that Google gets good at ingesting information from multiple sources and outputting coherent and compelling content summarising information from a variety of different places. There are copyright issues here, for sure, but having seen the aggressive rollout of featured snippets, one-box answers, and vertical search features in general, I wouldn’t bet against Google’s lawyers finding a way to do this.
If they do, we enter a world where the featured snippet is uncredited and doesn’t come with a link (currently, our tests have shown that winning the featured snippet is beneficial). This would have a dramatic impact on content strategies (and many publishers’ business models!). For many non-publishing businesses, it would likely push strategies towards making more queries branded (think about the difference between searches for [ipad] vs [tablet]), so that they would be the only place with authoritative results for those queries, and would benefit from the visibility even if featured snippets weren’t attributed.
From the earliest days of my SEO experience, it was a common refrain that “Google can’t see anything in an image”. It was common in early 2000s web design to use images for page elements that should really have been text (like menu buttons) because the styling and layout techniques available at the time were underpowered, confusing and complicated. We used to talk about replacing text in images with real HTML text. We knew that Google couldn’t understand anything in an image.
Over the years, with the launch of image search, Google got better at understanding images – but almost entirely through context rather than through anything fundamental to the image file itself. Only the filename was directly relevant, and everything else was about the surrounding text, meta data, and context of the pages where it was embedded and linked-to.
We know, however, that Google has the technology to evaluate image contents on their own terms as they offer the ability to search by object or place name even for unlabelled, context-less photos in the Google Photos app.
As far as I know, we don’t have evidence of Google evaluating image contents at web scale, but we have seen tests showing that page elements which aren’t user-visible (like alt attributes) do not have a measurable impact on organic search performance. This seems believable to me, as the better Google gets at understanding context, the less they will want to use or rely on anything that isn’t user-visible due to the ease with which it can be manipulated without risking user experience.
Even though I’m not aware of anyone having detected Google using the image recognition technologies they have deployed in Google Photos in web search, it would not surprise me if they were quietly doing so in certain circumstances. There is some evidence of optical character recognition (OCR) – i.e. reading text from images – in situations where they have some reason to think there would be value to doing so. Most likely, this will be where they have some heuristics indicating that the content is valuable and / or not available anywhere else.
The sci-fi future is full web-scale image recognition on all indexed images. Consider not only improvements in image recognition, but also huge increases in processing power and efficiency. If Google reached a point where they were routinely labelling images across the web, you can easily imagine them adding quality heuristics to determine “good” images of various things, and in principle delivering much higher quality image search results that didn’t rely on context or text. If this was successful, we could see much more integration of images into regular search results, because it would be possible to operate much further down the long-tail. Web scale image search is currently only really effective in the head of demand, but if they could scale it, we could see dramatic changes to search results pages for many queries, with all the tactical and strategic impacts that would imply.
I remember going to a Microsoft event in 2008 or 2009 where they demonstrated some incredible R&D technology including automatic video highlights (I think they were doing sports highlights based on audio analysis of crowd noise). The mocked-up user interfaces they showed included deep links to the interesting parts of videos in search results.
Today, Google does link to key moments within videos in search results, but this relies on uploader-provided meta information and transcripts.
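The uploader-provided route is structured data: Google documents a Clip markup within VideoObject that lets publishers label segments and their timestamps by hand. A minimal example (all URLs and values are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "VideoObject",
  "name": "How to change a bike tyre",
  "description": "Step-by-step tyre change walkthrough.",
  "uploadDate": "2021-01-01",
  "thumbnailUrl": "https://example.com/thumb.jpg",
  "contentUrl": "https://example.com/video.mp4",
  "hasPart": [
    {
      "@type": "Clip",
      "name": "Removing the wheel",
      "startOffset": 30,
      "endOffset": 95,
      "url": "https://example.com/video?t=30"
    }
  ]
}
```

Note that every key moment here is declared by the publisher – nothing is derived from the video content itself.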
But might we reach the point where it’s possible to do “video recognition” in the same way as “image recognition” works at the moment – enabling the searching within unstructured and unlabelled video data?
As I understand it there is really nothing like this happening at present – all in-video searching is either searching meta data, or at best searching possibly-automatically-generated transcripts matched up with timestamps.
Certain specific domains are seeing a rapid expansion of video databases with associated structured data. In professional basketball, for example, coaches have access to databases that enable them to find all video clips of a particular player executing a particular move, with a specific result (e.g. turnover). I think it’s quite likely that this kind of domain-specific video searching, relying heavily on structured meta data, will become more common.
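A hypothetical sketch of what such metadata-driven clip retrieval looks like – the schema (player/move/result fields) and the data are invented for illustration, not taken from any real product:

```python
# Hypothetical sketch: filtering a clip database by structured metadata.
# Field names and records are invented for illustration.

clips = [
    {"player": "Smith", "move": "pick and roll", "result": "turnover", "t": "12:03"},
    {"player": "Smith", "move": "pick and roll", "result": "score",    "t": "15:41"},
    {"player": "Jones", "move": "post up",       "result": "turnover", "t": "03:22"},
]

def find_clips(db, **criteria):
    """Return clips whose metadata matches every given criterion."""
    return [c for c in db if all(c.get(k) == v for k, v in criteria.items())]

print(find_clips(clips, player="Smith", result="turnover"))  # the 12:03 clip
```

The key point is that the search never touches the video frames – someone (or some upstream system) has to have tagged every clip first, which is exactly what web-scale video search lacks.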
In the almost-sci-fi future, you can imagine this kind of capability coming to video search just as it has to Google Photos for personal image search, and as I predict might come to web-scale image search in the not-too-distant future. The raw computational difficulty suggests to me that this will not be coming to web-scale video search any time soon.
Despite its own efforts to improve machine-writing capabilities – most notably in translation – Google has maintained that “text generated through automated processes” is an example of “automatically generated content” that they see as “intended to manipulate search rankings and not help users” and against which they “may take actions”.
I have written before about my guesses for the future of Google’s position on automated content but the summary is:
- I think they will eventually permit high quality automated content within their guidelines
- I’m not holding my breath for them to make that change to the guidelines
What it means now
I have seen tests of content generated using tools like GPT-3 and am aware of cases where they outperformed the status quo. I believe a range of companies are already using this kind of technology in production.
The biggest risks with generated content don’t come from Google. They come from brand safety and PR. At the moment, the state of the art tools don’t have enough built-in protections against biases present in training data, or against nonsensical, embarrassing or damaging output. There is a lot of academic and commercial research taking place on these problems, however, and if we see progress here, I would expect to see many more companies looking to experiment with the technologies.
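In practice, teams experimenting with generated content typically wrap the model in simple guardrails before anything is published. A minimal sketch of that idea follows – the specific checks, thresholds, and blocklist are illustrative examples, not a complete brand-safety system:

```python
# Illustrative guardrails for generated text before publication.
# Thresholds and the blocklist are examples only.

BLOCKLIST = {"guarantee", "cure", "risk-free"}  # e.g. compliance-sensitive claims

def passes_guardrails(text: str, min_words: int = 40) -> bool:
    words = text.lower().split()
    if len(words) < min_words:
        return False                                        # too thin to publish
    if any(w.strip(".,!?") in BLOCKLIST for w in words):
        return False                                        # contains a blocked claim
    if len(set(words)) / len(words) < 0.3:
        return False                                        # heavily repetitive output
    return True

draft = "This product is risk-free and will cure everything."
print(passes_guardrails(draft))  # False: too short, and contains blocked terms
```

Checks like these catch only the crudest failures; the harder problems – subtle bias, confidently-stated nonsense – still need human review, which is exactly why brand safety remains the bigger risk than Google.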
I expect to see an explosion in the use of generated content as the technologies for writing increasingly-high-quality text improve and become more widely-accessible. It’s very likely the growth rate is already very high. Key things to watch out for include:
- Any official statements from Google on the subject
- High profile websites openly acknowledging their use of automation in this way
- Case studies showing user acceptance of automated content