Convercon 2019 notes
I attended Convercon 2019 in September after missing last year's event. I'm so delighted I did as it was an opportunity to hear from some seriously smart speakers on the topic of conversation interfaces. And as I'M slightly crazy, I've decided to take this to heart and dictate this whole blog post. Yes, you read that right, I'm speaking to a computer dictating this whole Post. So whenever you see something weird like the Capital P in the word post, that's because I gave up trying to fix is using dictation. You see, the new dictation tool in the latest version of Apple is surprisingly good, even with my Irish accent. Although it's not without pain points and I'm taking notes of how many times I'm having to touch the keyboard. Three so far (including typing the name of the event).😀
Anyway, what did I learn from the event? We are still in the early days of this new era and has one of the speakers PUT it, there is yet to be the Johnny I VE of voice interfaces. You also see from this dictation–we're not there yet with voice dictation either! It's taken me five minutes to type this out. However it is getting there and I can see an end game. I also suspect there is better voice dictation tools than this one, but I'm still impressed that I can navigate to text and fix most errors.
Up to 5 errors now that unavoidably needed fixing (not including the ones I’ve left in).
An aside (you can speak most terms like “go back three words” and the computer understands what that means without typing on-screen). Getting smarter....
One thing I will notice, it's hard to keep train of thought when dictating as you have to leave Time for the computer to work out what you're saying. And no I have no idea why the T in time is capitalised (nor how hard it was to type that previous sentence).
Back to the event
Firstly, thanks to Paul Sweeney for making me aware the event was occurring, and congrats to Webio for pushing the event - exciting times to see this occurring in Ireland.
What I love about going to events such as this is I know almost nothing in-depth on the topic so I get to hear everything with a clear perspective. Only some biases (based on experience using voice assistants and various chatbots), no real deep thinking on the specific topic either. So, with brain set to full soak mode, it’s possible to just listen, take notes and absorb.
Personally, I got the most out of the ‘global trends in voice adoption’ talk by Bret Kinsella (Twitter) who kicked off the event, Katie McMahon’s (Twitter) overview talk, and the final two speakers who went deep on ‘Conversational Design’ with Grace Hughes, and ‘Good Conversations and Intimate Relationships’ with Dirk Songur (Twitter). (Again, I had to use the keyboard for this paragraph.) As someone relatively new to the space, Bret'S initial talk was ideal as A 'State of the nation’ with a rough overview of the whole industry, showing the rapid adoption rate of the various smart speakers and voice assistants, as well as the numerous companies in the space. Personally I've just finished a role in the privacy sector so it was interesting to see the references to this topic also: now was that there has been wide adoption, as usual people are only starting to notice now. It will be interesting to see what occurs here: we've all been carrying microphones in our pocket for at least 10 to 20 years so what are the implications now that we have started processing that data to do something? I was also very intrigued to see the domain-specific activities, companies working on speech recognition for specific topics such as physics. As Brett mentioned, and I can attest to here, “the number one thing consumers say is that they would use assistants more if they understood then better”. You would not believe how many times I've had to reach to the keyboard to write all this!
Back to Typing
O.k., o.k., back to typing - I can’t handle the pace of entry with dictating, but it was a fun experiment and I’m very, very impressed at some of the smarts on show while typing, sorry, speaking, at the beginning. I also suspect those who are reading will prefer the readability of what appears here. One additional piece to add on why there were some issues above: I’m based remotely on a very slow internet connection. And as Apple’s new tool for macOS Catalina, “Voice Control”, seems to depend on the cloud, some of the issues above, in particular the length of time it took to type, were due to various pieces of text not being actioned. I’d speak a sentence and nothing would happen. Or speak a specific phrase, and nothing either. It does show one interesting thing too: some of this has to be run locally on your device, as there is no way you can depend on a network connection. Google even appears to have recognized this with its approach to Recorder, the recent, highly regarded tool on its latest Pixel phones, which runs locally and by all accounts is amazing.
So we’re getting there steadily.
Katie McMahon brought up a great perspective, which is something many people in industry experienced with the Internet. There is now a large proportion of people in the workforce who grew up as Internet natives, having never known a time beforehand. We now have a generation being born who have had access to voice assistants since 2012. (Extrapolating from that, if we imagine a child getting a smartphone at the age of 12 around 2010, as they went mainstream, that would mean they are now aged 21 and in college - they’ll be entering the workforce in around 2022 and I wonder if companies are ready for them. In the same manner, will companies be ready for voice-assistant-native young adults by the time they enter the workforce?)
“There’s yet to be a Jony Ive of interfaces”. I loved this quote, and my experience trying to speak this post shows we’re getting there but still have quite a way to go. As commented by another speaker, we’re in the Geocities era of chatbots (people of a certain age will understand what that means :) but, as also confirmed by Cormac O’Neill, CEO of Webio, voice interfaces are inevitable and will be normal within 5 years. I’m sure the timeframe of five years is up for debate (it could be 5 months, 5 years or 15), but the inevitability and normality can’t be argued.
The BBC showed the interesting challenges of the new voice paradigm. A very valuable piece was around the voice platforms mediating between the BBC (or any company) and the customers. In essence, control is taken out of the hands of the BBC and hence they’re looking to build their own voice assistant ‘that will be for public services’.
TiVo showed their ‘new life’ where they are building the search and recommendation platforms for various companies, including Vodafone and SkyQ (disclosure: I personally worked on the introduction of the Vodafone TV platform and wasn’t aware of this fact). For TiVo, it’s all about personalization as they’ve shown it reduces churn.
To finish off, the closing talks were as much about the ‘relationship’ and the meaning we want from our future assistants and technology. The second-to-last talk was great, delving into Conversational Design and the nuances of the topic (people having different responses to male or female voices, the mechanics of how humans take in information, empathy, ethics and more). This piece I find truly fascinating as it’s something new and particular to voice assistants which we haven’t required for keyboard, mouse and touch. As the speaker, Grace Hughes, pointed out, for a long time it’s only been technology people in this space and we now need conversational designers, linguists and more.
Lastly, and with a complete curveball (to my eyes!), Dirk Songur gave a very interesting closing talk on ‘good conversations and intimate relationships’. Not the sort of talk I was expecting, but ideal after the previous talk referencing the need for individuals outside of technology to get involved. Already, you can hand an Amazon/Google smart speaker to a non-techie and they’ll work out some of the basics on their own. As we continue to move past the early phases and expand their capabilities with more nuance and subtlety, Dirk dropped the great question of whether your voice assistant should be able to call your best friend when it hears you crying, or if it notices you haven’t been in touch with anyone recently. This is a whole new realm that takes us past just neat technology. Again, it goes back to ethics and privacy, as other speakers also remarked, and Dirk summed it up by saying that these should be baked into the baseline capabilities of these new assistants. (I’d agree).
Conclusion (dictated)
I must be mad, as I'm dictating these final paragraphs. My voice assistant is here, quietly listening, and putting this on screen for me. Amazing times! But does it have any more capabilities? And as you will have just seen if you watch the video is it is not so smart at times. However, this is a specific domain, just dictating text and it will get better. At the moment these feel like machines still however I do imagine that time where they can express something that resembles emotion or empathy. However as you will notice, we are not there yet.
Conclusion (typed)
The general consensus I got from the day was that it is still early days in this industry, with everything from low-tech solutions (e.g. using voice to support MOT testers, chatbots for various companies) through to the cutting edge being considered (“I hear you crying, should I call a friend or family member?”), both from a technological as well as an ethical and philosophical perspective. Already, many of us will have seen the tantalizing vision of the future and, as with the period of 2006-era ‘smartphones’, there are infrequent glimmers of what is possible and where we are going. We’re in the early days of the new technology rollout and already you can see that the Amazons and Googles believe this is a platform war, and whoever owns it first wins, in much the same way as what occurred with the transition to the smartphone. However, there’s also a whole other aspect to this that I hadn’t truly considered prior to the event, and it’s the domain-specific systems that will quietly filter out. Like databases, I expect to see these in some form, invisibly, in the background of many companies, all serving different functions. A mechanic may love an assistant just to take notes on servicing or testing jobs, whereas an office worker may need one to assist in data entry, and a researcher may use another type to support them in research. And alongside all of this, these consumers will have their own personal assistants to help with a whole range of activities. You can already see some examples where it’s ‘assistant-like’ when your phone asks if you want to create an event from an email or a flight ticket.
How big is the potential?
I don’t know, but I’m excited to find out.
