Voice search marketing tactics for SEO

I was recently preparing a presentation and came across one I gave to a small meetup in London in 2013. While there were only 100 or so people in the live audience that day, it has now probably been seen by a hundred thousand people – between SlideShare, a video of a webinar version, and the blog post I wrote about it at the time. When I stumbled back across it, I found it interesting to look back on because it made a bunch of predictions about the next 10 years and now, in 2018, we are halfway through those 10 years.

I was struck by how time has flown and I thought it would be interesting to do a midway-point review of what I was thinking in 2013. I also thought I could use some of the information it gives us about the pace of technology change and user behaviour change to attempt to understand current trends better – particularly around voice interfaces and voice search.

Preview of the punchline: voice isn’t as disruptive as many seem to think

I’m going to run through my predictions and how I think they’re coming along, but I also wanted to give you a preview of where my argument is going. Ultimately, while I think that voice recognition technology has become incredibly good at recognising words and sentences, there are a variety of things that will prevent it from cannibalising the rest of search in the short term. This is true of voice interaction generally in my opinion, but is especially true in search, where I believe voice is mainly incremental (and isn’t even responsible for anything like all the incremental query growth).

The bulk of the general argument is made very well by Ben Evans in his article voice and the uncanny valley of AI (though these response and rebuttal articles are worth a read too).

I particularly loved the simple way of describing availability and appropriateness as two big issues for voice that I came across in this Intercom article: what voice UI is good for (and what it isn’t). It is credited to Bill Buxton, who talks about what he calls “placeonas”:

[Image: real-world scenarios of hands, eyes, ears, and voice being free]

I’m not sure about the “placeona” language (a placeona being an adaptation of a persona that focuses on location changing your preferences or behaviour). For reasons that will not surprise regular readers, I distilled it into a couple of 2x2s:

How do we want to consume information?

[Image: 2x2 of consuming information, when screen or spoken word is best]

How do we want to enter information?

[Image: 2x2 of entering information, when hands or voice are available]

In my view, the constraints that voice isn’t always a convenient input and speech isn’t always a great output place a natural ceiling on the usefulness of voice search – even beyond the issues Evans identified – and they are heightened for what I’m calling real searches. My view is that the majority (if not the vast majority) of what are currently being called “voice searches” in the stats aren’t much like what search marketers think of as searches.

When Sundar Pichai said in 2016 that 20% of mobile searches in the Google app and on Android were voice searches, my bet is that 75%+ of those were incremental and not “real” searches. They were things you couldn’t do via “search” before and that are naturally done by voice – such as “OK Google, set a timer for 20 minutes”. The interesting thing about these “searches”, and the reason I’m classifying them differently, is that they are utterly uncommercial. Not only are you never going to “rank” for them, there is literally no intent to discover any kind of information or learn anything at all. They’re only really called searches because you’re doing them with / through Google.

The pace of change: revisiting some old predictions

Before I finish making those arguments, let’s look back at the presentation I opened with. I started by putting my 10-year predictions into context by looking back 10 years (to 2003 – this was 2013, remember).

The 10 years before 2013

I reminded myself and the audience that in 2003 we were on the cusp of:

Scoring my 10-year predictions from 2013 halfway through

Now, I put this initial presentation together for a relatively small meet-up, so I didn’t turn them into completely quantitative and falsifiable projections – though if anyone thinks I’m substantially wrong, I’m still up for hashing out more quantifiable versions of them for the next five years. In that context, here are my main predictions for 2023. We’re now halfway there. How do you think I’m doing?

I said that in 2023 we will:

  • Still be doing email on our phone
  • Still be using keyboards
  • Still be reading text

I’m feeling pretty good about those three. Despite the growth of new input technologies, the growth of video, and the convenience of hardware like AirPods making it easier and easier to listen to bits of audio in more places, it doesn’t seem likely to me that any of these are going anywhere.

  • Pay for more [digital] things

I mean. This was kinda cheating. Hard to imagine it going the other way. But the growth of everything from Netflix to the New York Times has continued apace.

I’m not 100% sure what the end-game looks like for media subscriptions. I feel that there has to be some bundling on the horizon somewhere, as I would definitely pay something for a subscription to my second, third, and fourth preference news sources, but there is no good way to do this right now, when the choice is a primary subscription or nothing.

  • Dumb pipes continue acting dumb

I think that the whole net neutrality issue (interesting take) is pretty good evidence of the continuing ambitions (and, so far, failures) of the “pipes” of the internet to be much more. Having said that, I didn’t get into anything nearly granular enough to count as a falsifiable prediction.

  • Last mile no longer the issue – getting fibre to the exchange is the challenge

I think this is probably the biggest miss. Although there are some core network issues, home and mobile connection speeds have generally continued to improve, and where they haven’t, the problem actually does still lie in the last mile. I suspect that as we move through the next five years to 2023, we will see a continuing divide with speeds continuing to increase (and not being a blocker to advanced new services like 4K streaming) in urban / wealthy / dense enough areas, while rural and poorer areas will continue to lag. In the UK, the smaller size and higher density means that we are already seeing 4G mobile technology cover some areas that don’t have great wired broadband. This trend will no doubt continue, but the huge size and scale of the US means that there will continue to be some unique challenges there.

  • Watch practically no scheduled TV apart from some news, sports and actual live events

This was a bold one. I may have forgotten my own lesson about how fast (read: slowly) consumer habits change. In the accompanying blog post, I wrote:

“I am much less excited by an internet-connected fridge (a supposed benefit of the internet since the late ‘90s) than I am by instant-on, wireless display streaming (see for example AppleTV AirPlay mirroring) making it as easy to stick something on the TV via the web as via terrestrial / cable TV channels.”

This prediction was part of a broader hypothesis I developed and refined in 2013-2014 around the future of TV advertising. The key prediction of that was that $14-25bn /yr of TV ad spend will move out of TV in the US in the next 5 years. We’re about to see what the 2018 upfronts look like, but we’ve already seen a ~$6bn drop. It’ll be interesting to see what 2019 holds and then come back to this in 2023.

  • Have converged capabilities between mobiles and laptops – what I called “everything, everywhere”

That last one was possibly the most granular of the predictions – I envisaged specific enhancements to our mobile devices:

  • Faster than 2013 laptops
  • Easier to purchase on than laptops by being more personal
    • This is certainly an area where we have seen huge innovation with more to come

And specific ways that laptop-like devices would become more like 2013 mobiles, with:

  • Touch screens
  • App stores
  • The ability to turn on instantly

The majority of my predictions were directional and not that controversial, but the point I was seeking to make with the first few was that technology and usage generally change a little more slowly than we anticipate. I think this is particularly true in voice, especially when it comes to search, and spectacularly true when it comes to commercially-interesting searches (including true informational searches).

What does all this tell us about voice “search”?

At a high level, the same arguments I made in 2013 about the suitability of the different kinds of input and output apply here, and put some kind of cap or ceiling on the ultimate percentage of queries that will eventually shift to voice. Along with that, the experience of which things changed and which stayed the same in 2003-2013 and again in 2013-2018 reminds us that certain kinds of behaviour always change more slowly than we might imagine they will.

All of that combines to remind us that, even under the bullish predictions for voice search growth, most of that growth will be incremental, so little of it will come at the expense of existing search marketing channels.

So how many voice searches might there be? And how many are actually real searches (rather than voice controls)? Of those, how many are in any way competitive or commercial? And of those, how many give a significantly different result to the closest-equivalent text search, and hence need any kind of different marketing approach?

A bit of Fermi estimation

Google talked about 20% of mobile searches being voice in 2016. Let’s assume that’s up 50% since then. There is then a further fraction on top of that which will be voice searches on other devices (smart speakers, watches?, laptops).

To make it concrete, let’s assume we have a trillion desktop searches / year and a trillion mobile non-voice searches / year to put very rough numbers against the argument. Then I believe that the new searches will mainly not cannibalise these (and to the extent they do, there will be natural growth in the underlying search volume). So then, taking the conservative assumption of no other growth, we get to something like the following annual search volume:

  • 1 trillion desktop
  • 1 trillion mobile non-voice
  • 300 million mobile voice
  • 300 million voice non-mobile
  • Still to come: 400 million (the rest of the “next trillion”): unfulfilled search demand – queries you can’t do yet. Image searches. New devices. New kinds of searches. Some fraction of these will be voice too.

So – 600 million voice “searches”.

After reading a range of sources, and building some estimation models, I think that total voice “search” volume breaks down roughly as:

  1. 50% (300 million / year): control actions
    a. [set a timer]
    b. [remind me]
    c. [play <song>]
    d. [add <product> to shopping list]
  2. 20% (120 million / year): informational repeated queries with no new discovery (i.e. you want it to do the same thing it did yesterday)
    a. [today’s weather]
    b. [traffic on my commute]
  3. 5-10% (30-60 million / year): personal searches of your own library / curated list
    a. [listen to <podcast>]
    b. [news headlines] (from previously set up list of sources)
  4. 20-25% (120-150 million / year): “real” searches – breaking down as
    a. 1-2% unanswerable
    b. 10% text snippets
    c. 5% other answers (local business name, list of facts, etc)
    d. 5% (where screen present) regular search results equivalent to similar typed search
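
The arithmetic behind this breakdown can be reproduced in a few lines. What follows is only a minimal sketch in Python, taking the 600 million total and the percentage ranges in the list as given. The variable and function names are illustrative assumptions, and this is not the estimation model mentioned above.

```python
# Total annual voice "searches" from the estimate above, in millions.
TOTAL_VOICE_MILLIONS = 600

# (low %, high %) share of total voice volume for each category in the list.
CATEGORY_SHARES = {
    "control actions": (50, 50),
    "informational repeated queries": (20, 20),
    "personal library searches": (5, 10),
    "real searches": (20, 25),
}


def volume_range(total, share_pct):
    """Return the (low, high) volume implied by a (low %, high %) share."""
    low_pct, high_pct = share_pct
    return total * low_pct // 100, total * high_pct // 100


breakdown = {
    name: volume_range(TOTAL_VOICE_MILLIONS, share)
    for name, share in CATEGORY_SHARES.items()
}

for name, (low, high) in breakdown.items():
    label = str(low) if low == high else f"{low}-{high}"
    print(f"{name}: {label} million / year")

# The low and high ends of the ranges should bracket 100% of voice volume.
low_total = sum(low for low, _ in CATEGORY_SHARES.values())
high_total = sum(high for _, high in CATEGORY_SHARES.values())
print(f"shares sum to between {low_total}% and {high_total}%")
```

The low ends of the ranges sum to 95% and the high ends to 105%, so the four categories are consistent with accounting for all voice volume.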

Unfortunately, there is little “keyword” data for voice to validate this estimation. We simply don’t know how often people perform which different kinds of queries and controls. Most of the research (example 1, example 2) has focused on questions such as “which of the following activities do you use voice search / control for?” or “what tasks do you perform on your smart speaker?” (neither of which captures frequency). While there is some clever estimation you can do with regular keyword research tools, there is little in the way of benchmarking.

The closest I have found is comScore research that talks about “top use cases”:

[Image: comScore chart showing use of smart speakers in the US]

If we interpret this as capturing frequency (which isn’t clear from the presentation) we can categorise it the same way I did above:

[Image: comScore data recategorised]

And then sum it to get ratios that fall roughly within my ranges:

  • Control: 51%
  • Informational: 23%
  • Personal: 6%
  • Search: 19%
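
To make “roughly within my ranges” concrete, here is a small sketch that measures how far each recategorised comScore ratio sits from the corresponding range in my estimate. It assumes the four comScore groups map one-to-one onto my four categories, and the names used are illustrative only.

```python
# Ranges from my estimate, as (low %, high %) of total voice volume.
ESTIMATED_RANGES = {
    "Control": (50, 50),
    "Informational": (20, 20),
    "Personal": (5, 10),
    "Search": (20, 25),
}

# Ratios from summing the recategorised comScore use cases (percent).
COMSCORE_RATIOS = {
    "Control": 51,
    "Informational": 23,
    "Personal": 6,
    "Search": 19,
}


def gap_from_range(value, bounds):
    """Return 0 if value is inside bounds, else the signed distance in points."""
    low, high = bounds
    if value < low:
        return value - low
    if value > high:
        return value - high
    return 0


gaps = {
    name: gap_from_range(ratio, ESTIMATED_RANGES[name])
    for name, ratio in COMSCORE_RATIOS.items()
}

for name, gap in gaps.items():
    print(f"{name}: {COMSCORE_RATIOS[name]}% ({gap:+d} points from my range)")

print(f"comScore ratios sum to {sum(COMSCORE_RATIOS.values())}%")
```

On these figures no category is more than three percentage points outside my range, and the four ratios sum to 99%.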

It’s those 4. c & d that provide a marketing opportunity equivalent to most typed searches (and of course, just like on desktop, many of those are uncompetitive for various reasons – because they are branded, navigational, or have only one obvious “right” answer). But even with those included, we’re looking at global search volume of the order of 12-15 million queries / year.

Still interesting, you might think. But then strip out the queries it’s impossible to compete for, and look at the remaining set: what % of those return either the top organic result, a regular search result page (where a screen is present), or a version of the same featured snippet that appears for a typed search? 80%? 90%? I’m betting that the true voice search opportunity that needs a different activity, tactic or strategy to compete for, defined as:

  • Discovery searches you haven’t performed before (i.e. not [weather] and similar)
  • That return good results
  • That are not essentially the same as the result for the typed query

is less than 1 million searches / year globally at this point.

What market share of ~100k searches / month across all industries do you think your organisation might be able to capture? How much effort is it worth putting into that?
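
To put that question in numbers, here is a hedged sketch of what different market shares of ~100k searches / month would mean in practice. The share values are purely hypothetical placeholders, not estimates of what any organisation could achieve, and the names are illustrative.

```python
# Upper bound from the text: under 1 million searches / year globally.
ANNUAL_SEARCHES = 1_000_000
# Exact monthly equivalent, which the text rounds up to ~100k / month.
exact_monthly = ANNUAL_SEARCHES // 12
# Rounded monthly figure used in the text.
MONTHLY_SEARCHES = 100_000

# Hypothetical market shares in basis points (1 bp = 0.01%); assumptions only.
HYPOTHETICAL_SHARES_BP = [10, 100, 500]


def searches_captured(monthly_total, share_bp):
    """Return the monthly searches captured at a share given in basis points."""
    return monthly_total * share_bp // 10_000


captured = {
    bp: searches_captured(MONTHLY_SEARCHES, bp)
    for bp in HYPOTHETICAL_SHARES_BP
}

print(f"1 million / year is about {exact_monthly:,} searches / month")
for bp, searches in captured.items():
    print(f"{bp / 100:.1f}% share: {searches:,} searches / month")
```

Even a hypothetical 5% share of the whole opportunity, across all industries, would come to 5,000 searches a month.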

Where I know I’m wrong

My analysis above is quite general and averaged across all industries. There are a few places where there might be specific actions that make sense to make the most of the improvement in and growth of voice control. For example:

  • News / media – might find that there is an opportunity in a growing demand for news summaries and headlines delivered as a result of a voice interaction rather than as e.g. morning TV news (see for example, this stat that the NYTimes news podcast The Daily has more listeners than they ever had print subscribers)
  • Data providers – if you offer proprietary (and defensible) data that has high value for answering certain kinds of queries (e.g. sports league statistics), there could be API integration opportunities with attached commercial opportunities
  • Customer success / retention / happiness for consumer companies – there are a bunch of areas where skills / integrations can make sense as a way to keep your customers or users engaged with you / your service / your app. These might perform like searches that no-one else has access to once your users are using your skill. An example of this is grocery shopping.

At the same time, I would be tempted to argue that most of that is not truly search in any particularly meaningful sense.

Of course, it’s completely possible that I’m just wrong on the scale of the opportunity – Andrew Ng of Baidu (formerly of Google Brain) believes that 50% of all (not just mobile) searches will be voice by 2020 (or at least he did in 2016!). I haven’t seen an updated stat from him and while I am inclined to think that’s too high, you might disagree and I would understand if you thought Ng’s credentials and access to deeper data were stronger than mine here! (Note you’ll also see this prediction bandied around a lot attributed to comScore but as far as I can tell, they just repeated Ng’s assertion).

Disagree? Want to argue with me?

Please do – I’d love to hear other opinions on Twitter, where I’m @willcritchlow.