Hearables, Voice, and AI at the Edge of Ambient Intelligence

“As the IoT landscape populates with smarter devices and the data they generate, ambient computing is the fabric that knits them together. It is the intelligent synthesis and analysis of many disparate elements, generating insights and taking action based on those insights.”
-Technology in the fabric of everyday life, Innovation Journal, Issue 2: Spring 2016, www8.hp.com

Can you feel it? That almost-imperceptible motion, like a ride on an airport people-mover so slow and steady it doesn’t feel like you’re moving at all and the changing scenery is a perfectly normal course of events. That is, until the hallway widens to an atrium, the ceiling vaults and fills with stories-high escalators, the vending machines become branded restaurants and there are so many new corridors to follow you need to check your ticket for the gate number. That’s what progress feels like sometimes.

You could say that ambient intelligence is the new paradigm, the future. Our tangible experience with computing technology changes a little every day, so it’s equally true that we’re already en route. Since I can’t say when we’ll get there, I’ll talk about three concepts that are relevant today, and how they’re evolving.

The first is the computer we interact with, which might be a wearable, a desktop, or a mobile device. The second is the mode of communication – how we interact with the computer-mind in our presence. The third is what that computing power can do for us, which depends on the device itself: its software, its sensing capacity, and the intelligence it draws from the network. These dependencies are neither immediate nor transparent to us as users, except insofar as they determine what the device can do, but they constitute the essence of our experience.

Opinion is growing that in-ear computing devices are taking us one large step closer to the ambient experience. The consumer market has recently introduced a range of new hearables – earbuds and earphones that function as smart devices, with voice commands, voice-to-text, and even bone-conduction feedback. Most come equipped with their own processor, wireless connectivity, local storage, and sensors. Some earbuds, like Here One and the Dash, use voice-mediated information for services that range from mobile communication, coaching, and language translation to device control. Many use digital virtual assistants, and all virtual assistants use some aspect of voice UI.

Voice UI feels natural to the user – there’s no need to translate thought into another medium, and staying hands-free allows for multitasking. Natural language recognition and processing promise that one day we could interact with a computer as easily as with another human being.

We’ve had virtual assistants with voice-based commands in our smartphones since 2011. By 2016, they were in our cars and homes too. What’s the difference with hearables?

Ask Bragi Dash CEO Nikolaj Hviid, and he’ll point out that a hearable is contextually aware in a way that a smartphone isn’t. Hearables can be equipped to collect biodata like heart rate, or even brainwaves, from the wearer. They also use gestures, voice, and audio feedback, as opposed to the visual interface of a smartphone or smartwatch. This keeps the user aware of her surroundings too.

“Wearable computing will take over everything…the ears are in the right place for wearables.” -Bragi Dash CEO Nikolaj Hviid
Source: https://www.wareable.com/meet-the-boss/bragi-dash-hearables

Hviid maintains that a device that is contextually-aware gives better assistance: it knows where you are, what you’re doing, and how you’re feeling. In Hviid’s view, smartphones will become relics.

 

Voice UI and Digital Virtual Assistants

The virtual assistants most popular among smartphone owners are Apple Siri, Microsoft Cortana, and Google Now. They’re used mainly for simple tasks, like retrieving information, relaying commands to other smart devices, and controlling the device itself. According to Gartner, virtual assistants will soon take over apps the way they now control smartphones, and phone apps aren’t the only automation in their domain. Alexa, a voice-activated virtual assistant from Amazon that lives in the Amazon Echo home hub, “talks” to other smart home devices, like the Philips Hue or the Harmony Pro universal remote, through integrations.

The Echo/Echo Dot from Amazon is a speaker with the virtual assistant Alexa and the foundation of a smart home platform. Echo is tailored to your needs with downloadable “Skills,” and it has an open API and many compatible devices and services.
Amazon’s Alexa Voice Service allows developers to add an intelligent voice UI to a connected product. The AVS API is freely available, and the downloadable Alexa Skills, such as the ability to talk to a Roomba, are contributed by an active developer community.
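At its core, a skill is a handler for the JSON requests the Alexa service sends when a user speaks. As a rough sketch – the StartRoomba intent name and its replies are hypothetical, not the actual Roomba integration – a minimal handler looks like this:

```python
# Minimal sketch of an Alexa custom-skill handler. The service sends a
# JSON request describing what the user said; the skill returns a JSON
# response containing the speech to play back.

def speak(text):
    """Wrap plain text in the Alexa response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }

def handle_request(event):
    """Dispatch an incoming request to a simple intent handler."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        return speak("Welcome. What would you like me to do?")
    if request["type"] == "IntentRequest":
        intent = request["intent"]["name"]
        if intent == "StartRoomba":  # hypothetical example intent
            return speak("Okay, starting the vacuum.")
        return speak("Sorry, I don't know that one.")
    return speak("Goodbye.")
```

The request/response envelope shapes follow the Alexa Skills Kit JSON format; everything inside the dispatch is an invented example.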

 

Google Home and Alexa both work with Harmony Pro, a universal remote that controls over 270,000 entertainment and smart home devices: TVs, cable and satellite receivers, Blu-ray players, AV receivers, media streamers, and non-entertainment smart home products from Nest, Philips Hue, and Honeywell.
Source: https://www.logitech.com/en-us/product/harmony-pro?crid=60

 

In addition, Alexa pioneered digital assistants’ engagement with services – requesting a ride from Uber, ordering a pizza from Domino’s or a latte from Starbucks – and it’s already mobile.

Natural Language Processing from Amazon
Alexa’s speech recognition and natural language processing will be available to developers under the name Amazon Lex. It’s a web service that will connect to Facebook Messenger, Slack, and Twilio.
Image source: https://aws.amazon.com/lex/
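What a service like Lex returns is essentially an intent plus extracted parameters (“slots”). The toy parser below mimics only the shape of that result using keyword rules – real Lex uses trained speech and language models, and the intent names and patterns here are invented for illustration:

```python
# Toy intent/slot extraction, sketching the *output shape* of an NLU
# service: (intent, slots). All intents and rules are hypothetical.
import re

INTENTS = {
    "OrderPizza": re.compile(r"\border\b.*\bpizza\b"),
    "BookRide":   re.compile(r"\b(ride|car|taxi)\b"),
}

def parse(utterance):
    """Return (intent, slots) for an utterance, or (None, {})."""
    text = utterance.lower()
    for intent, pattern in INTENTS.items():
        if pattern.search(text):
            slots = {}
            size = re.search(r"\b(small|medium|large)\b", text)
            if size:
                slots["size"] = size.group(1)
            return intent, slots
    return None, {}
```

A bot framework would hand the resulting intent and slots to fulfillment code; the hard part Lex sells is the trained recognition, not this dispatch step.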

The stage is set for the competition among digital voice assistants. Apple leads in wearables, thanks to the best-selling AirPods and Apple Watch. Google has the most intelligent software. No surprise – they have more data than anyone else. But Amazon has the most friends: Amazon Fire Stick, Apple TV, Google Chromecast, GE’s WiFi Connect appliances, electronics from LG and Lenovo, a host of startups like Bragi and Blocks that are taking Alexa mobile, not to mention services like Pizza Hut and Starbucks that put the “ambient” in ambient computing.

The key is voice UI, and the intelligence needed to interact effectively with a user through spoken signals. It takes considerable resources to develop AI for voice recognition and natural language processing. For this reason, it’s considered unlikely that a fourth company will enter the competition.

Voice UI for Makers
The Google Assistant SDK, launched on April 27, 2017, allows developers to run the Google Assistant on their own hardware prototypes. The intention is that any smart hardware can provide Google Assistant functions. Actions on Google work like Alexa Skills and have an open API, so developers can create voice commands and actions for their devices.
Source: https://arstechnica.com/gadgets/2017/04/the-google-assistant-opens-up-to-third-party-hardware-launches-sdk/
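The pattern behind such actions, whatever the vendor, is a registry that maps recognized commands to device handlers; the assistant transcribes speech and dispatches the transcript. The class and phrases below are illustrative – this is not the actual Google Assistant SDK API:

```python
# Sketch of the voice-action pattern: register phrase -> handler,
# then dispatch transcribed speech. Names are hypothetical.

class VoiceActions:
    def __init__(self):
        self._handlers = {}

    def register(self, phrase, handler):
        """Map a spoken phrase to a callable on the device."""
        self._handlers[phrase.lower()] = handler

    def dispatch(self, transcript):
        """Run the handler for a transcript, or apologize."""
        handler = self._handlers.get(transcript.lower().strip())
        if handler is None:
            return "Sorry, I can't do that yet."
        return handler()

actions = VoiceActions()
actions.register("turn on the lights", lambda: "Lights on.")
actions.register("start the coffee maker", lambda: "Brewing.")
```

Real SDKs add the hard parts this sketch omits: streaming audio to the cloud, fuzzy phrase matching, and slot parameters.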

Voice capture and recognition are positioned to become a key enabler of the ambient computing experience of the future. Interpretation (What does it mean in context?) and application (What should I do with this information?) are the competitive focus now. Both depend on contextual awareness.

 

What is Ambient Intelligence?

It is the collective of smart devices that live around us, so well-integrated and finely tuned that we no longer think about connecting, syncing, or controlling. The concept of device disappears, along with the notion of a UI – anything can be a button, screen or microphone. We’re not there yet, but that’s the vision.

According to Eli Zelkha, the early pioneer who coined the term, it’s a state of technology characterized by ubiquity, invisibility, and a distributed architecture. Ambient intelligence resides everywhere, sensing, predicting, and responding to our needs.

Most voice interfaces today are primarily interfaces with services: they take in audio commands, parse them, and execute. This requires natural language processing, which Alexa does particularly well. But machine learning and neural networks offer something more: pattern recognition and behavior learned from data rather than fixed by rules. Virtual assistants need this kind of AI to learn about their humans, and to perform the predicting and responding that characterize ambient intelligence.

 

Machine learning

AI applications draw on a large database of information. This is the knowledge base, or “brain,” of the application – what provides the answers to the questions decoded by the language recognition and processing algorithm. The content of the knowledge base, and how well the information is structured, determine how intelligent the application is.

This is the difference between Google Assistant and Alexa or Siri – Assistant simply has a bigger brain. Google has a massive knowledge graph in the form of its content databases, and its search algorithms, more sophisticated than any other, form the structural relations within the content.
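A common way to model such a knowledge base is as a graph of (subject, relation, object) triples, where the relations carry the structure that makes answers retrievable. A minimal sketch, with made-up facts standing in for a real knowledge graph:

```python
# Tiny triple-store sketch of a knowledge base. The facts are
# illustrative; real knowledge graphs hold billions of triples.

TRIPLES = [
    ("Paris", "capital_of", "France"),
    ("France", "located_in", "Europe"),
    ("Eiffel Tower", "located_in", "Paris"),
]

def query(subject, relation):
    """Return all objects linked to `subject` by `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]
```

A question like “What is Paris the capital of?” decodes to `query("Paris", "capital_of")`; how much such a lookup can answer depends entirely on how rich and well-structured the triples are – which is the point made above.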

 

Google Assistant is conceptually aware – it has a different kind of intelligence than Alexa or Siri, and it’s better equipped for conversational AI. It retrieves information from other apps to give you personalized service. For example, it can “remember” your preferences, or retrieve your location before making a suggestion. Assistant can learn, which means that as its history with a user grows, it becomes more intelligent and serves that user better.

Alexa has no ability to learn; its behavior doesn’t change with experience. Alexa can order me a pizza from Domino’s if I ask it to. What if it could do so without my asking? What would that take?

The ability to learn from experience, so that it knows I like pizza, I often order pizza on movie nights, and tonight is movie night. Access to my connected apps, so it knows from an earlier chat that I’ve invited four friends over this evening. Enough common sense to know that my friends’ pizza preferences are relevant to the task, and the ability to remember that one of them is a vegetarian. Contextual awareness about my location, so it knows I’m leaving the office and headed for my car, and I’ll be passing the Domino’s in my neighborhood 40 minutes from now. Knowing all this, it orders two large pizzas for takeout – one MeatZZa pizza and one vegetarian.
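The scenario above amounts to fusing context signals – habits, calendar, guest preferences, location – into a decision. A deliberately simplified sketch, with all data and rules hypothetical (a real assistant would learn these patterns rather than hard-code them):

```python
# Toy context-fusion rule for the movie-night scenario. Every field
# and rule here is an invented stand-in for learned behavior.

def suggest_order(context):
    """Return a pizza order if the learned pattern matches tonight."""
    if not (context["is_movie_night"] and context["usually_orders_pizza"]):
        return None  # no habit to act on tonight
    pizzas = ["MeatZZa"]
    if any(guest["vegetarian"] for guest in context["guests"]):
        pizzas.append("vegetarian")  # remembered dietary preference
    return {"pizzas": pizzas, "pickup": context["store_on_route"]}
```

The hard part is not this final rule but everything feeding it: learning the habit, reading the calendar, and predicting the route in real time.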

A smart device with voice UI is equipped with enough intelligence to gather data and respond to natural language. The device then relies on a defined instruction set (its program) to act. That’s the intelligent performance we see now. However, to take in data – from its sensors, from other devices and their history – and make it actionable is a large-scale analytical task.

For a computer to take action on its own, that is to generate a new command based on the data it gathers, requires far more relational data input and processing power than does relying on a program. The possibilities for what might be relevant to any given context are unbounded, and must be selected and compared in real time. We need AI to manage the process.

This involves cloud computing, big data analytics, a large web of interconnected devices, and advanced machine learning strategies. It also moves the human-computer interface away from any one device and into the environment.

 

The IoV (Internet of Voice)

The notion of an ambient interface is already on the horizon. Viv is described as a next-generation AI assistant: in addition to speech recognition and conversational AI, Viv uses self-learning algorithms to write its own programs. Think of it as a machine learning algorithm that creates machine learning algorithms. For this reason, it’s expected to scale better than anything yet seen.

Viv isn’t a device function like Siri, Assistant, or Alexa – it’s a platform that stays connected to a user like an online account. It was conceived to be independent of any device and powered by any service, like a personal interface in an ambient computing world.

Viv was purchased in October 2016 by Samsung, who have been quiet about how they’ll offer the technology, or when.

Just wait for Viv in a hearable.
