Google Duplex: Exciting or Worrisome? Part I


In the last few years, considerable progress has been made in getting computers to understand and emulate natural language, thanks in large part to “deep neural networks” such as WaveNet. In the old days, it was you who had to adjust to the computer. Now, it’s the other way around. With an invention called Duplex, unveiled at Google I/O 2018, Google has reached a milestone in the interaction between humans and artificial intelligence: few systems, if any, have come as close to simulating natural human speech.

According to Google, Duplex is capable of “conducting natural conversations to carry out ‘real world’ tasks over the phone.” Based on the technology’s public performance, this claim would appear to have been borne out. The highlight of Google’s demo consisted of Duplex scheduling appointments over the phone in speech that felt so natural that the human interlocutors at the other end of the line didn’t realize they were talking to a computer.

Admittedly, at this early stage in Duplex’s artificial life, we’ve only seen it operate in conversations confined to narrow, task-oriented topics. Google has stated that by setting these limits, it can study in depth the communication that occurs within a restricted context, and train its A.I. accordingly. Needless to say, plans are afoot for bigger and better things. For now, though, let’s take a closer look at how Duplex functions.

Duplex and the puzzle of natural language


Daily conversation, which is something we take for granted, has long been enshrouded in mystery and complexity – at least insofar as tech experts bent on emulating it artificially are concerned. Natural verbal communication goes well beyond the words we string together in sentences and the primary line of thought they’re intended to convey. Pauses, changes in tonality, mistakes, self-corrections, implied meanings which prove ascertainable largely if not exclusively through context, interference, interjections, and speed of delivery are just some of the variables that can turn a simple utterance into an intricate puzzle. Yet with a few clarifications here and there, the average human being can make sense of such an utterance.


Duplex relies on a type of neural network called a Recurrent Neural Network (RNN) to follow the conversation, and it produces its spoken replies with text-to-speech technology that draws on WaveNet. According to Google, part of that speech is generated concatenatively, that is, by stitching together short fragments of recorded speech. Just as a human would, when parts of an utterance are omitted, the RNN works out the gist of the conversation by piecing together the fragments it does have with other useful data, such as what was said earlier in the call. As a result, Duplex can carry on remarkably realistic conversations, even if only on limited topics for the time being.
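
To make the idea of a recurrent network a little more concrete, here is a minimal sketch in Python of the basic mechanism: a hidden state that is updated word by word, so that whatever came earlier in the conversation colors how the latest word is interpreted. This is a textbook-style recurrent step with random, untrained weights, not Google's actual model; the function names and sizes are our own illustrative assumptions.

```python
import math
import random

def make_weights(input_size, hidden_size, seed=0):
    """Random, untrained weights (an assumption for illustration only)."""
    rng = random.Random(seed)
    w_xh = [[rng.uniform(-1, 1) for _ in range(input_size)]
            for _ in range(hidden_size)]
    w_hh = [[rng.uniform(-1, 1) for _ in range(hidden_size)]
            for _ in range(hidden_size)]
    bias = [0.0] * hidden_size
    return w_xh, w_hh, bias

def rnn_step(x, h, w_xh, w_hh, bias):
    """One recurrent step: new state = tanh(W_xh * x + W_hh * h + bias)."""
    new_h = []
    for i in range(len(h)):
        total = bias[i]
        total += sum(w_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(w_hh[i][k] * h[k] for k in range(len(h)))
        new_h.append(math.tanh(total))
    return new_h

def run_rnn(sequence, w_xh, w_hh, bias):
    """Feed a sequence of input vectors through the network, one at a time."""
    h = [0.0] * len(bias)  # the "memory" starts out empty
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, bias)
    return h
```

If two different conversations end with the same word, `run_rnn` still returns two different final states, because the state carries a trace of everything that came before. That carried-over context is what lets a recurrent network fill in what a speaker leaves unsaid.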


Before going any further, let’s split open a neural network and peer inside. A neural network is essentially a large mathematical structure made up of artificial neurons, known in the industry as nodes. Nodes are sorted into layers, and each one processes its input before passing the result on to the nodes of the next layer. During training, the network repeatedly adjusts the strength of these connections in order to minimize its error and fine-tune its answers.
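
For readers who like to see the moving parts, the following is a minimal sketch in Python of the two ideas just described: nodes arranged in layers that pass their output forward, and a training loop that nudges a node's weights to shrink its error. It is a toy under stated assumptions (sigmoid nodes, squared error, plain gradient descent, function names of our own choosing) and bears no relation to the scale or design of Duplex itself.

```python
import math

def sigmoid(z):
    """Squash any number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Each row of `weights` belongs to one node in the layer."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def network_forward(inputs, layers):
    """Push the input through every layer in turn."""
    activation = inputs
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation

def train_single_node(samples, epochs=2000, lr=0.5):
    """Gradient descent on one two-input sigmoid node (squared error)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            # Slope of 0.5 * (y - target)^2 with respect to the node's input
            grad = (y - target) * y * (1.0 - y)
            w[0] -= lr * grad * x[0]
            w[1] -= lr * grad * x[1]
            b -= lr * grad
    return w, b
```

Training the single node on a simple rule such as "answer 1 if either input is 1" shows the error-minimizing loop at work: the weights start at zero, the node begins by guessing 0.5 for everything, and repeated small corrections pull its answers toward the right values.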


The actual training of Duplex consisted of Google exposing it to a corpus of anonymized phone conversations. The process starts with the audio of a conversation being fed to Duplex, which runs it through its Automatic Speech Recognition (ASR) system. ASR converts the speech into text, which can then be passed to the neural network in a form it can work with.
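
What does text "in a form the network can work with" look like? Neural networks operate on numbers, so a transcript is typically mapped to numeric IDs before it goes in. The sketch below, in Python, shows one simple and common way of doing this with a word-level vocabulary; it is an illustrative assumption on our part, since Google has not published the exact encoding Duplex uses, and the function names are hypothetical.

```python
def build_vocab(transcripts):
    """Assign each distinct word an integer ID; 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for text in transcripts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def encode(text, vocab):
    """Turn a transcript into the list of IDs a network could consume."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]
```

For example, after building a vocabulary from the transcripts "a table for four" and "for two please", the phrase "table for six" is encoded with the known IDs for "table" and "for", followed by the unknown-word ID for "six".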

More to come about Duplex

Duplex is intended to be an upgrade to Google Assistant. Once it has been perfected, it is expected to feature on Android phones worldwide. At that stage, you will be able to use it to make calls and schedule appointments, as well as to gather information such as opening hours and schedules. In part two of our article, we’ll look at some of the practical uses of Duplex, as well as the responses and controversy this new tech milestone has sparked.

