By Jeremy Holleman
With the explosion of audio content over the last 15 years, headphones have evolved from passive wired accessories into intelligent wireless devices that play a large role in a user’s experience of a mobile device. True Wireless Stereo (TWS) represents the latest step in this evolution.
Syntiant’s NDP10x series of ultra-low-power audio recognition chips provides the neural inference capability to give users a convenient voice interface while preserving valuable battery life.
TWS earbuds work without a wired connection to an audio source, to each other or to a power source. In a TWS earbud pair, the two devices operate in a master/slave arrangement. The master device connects to a phone or other audio source over Bluetooth, using the A2DP (Advanced Audio Distribution Profile) Bluetooth profile. The master device relays the audio signal to the slave, still using A2DP, and can also transmit control commands (play, pause, etc.) back to the phone using Bluetooth’s AVRCP (Audio/Video Remote Control Profile). The small size and complete lack of wired connections impose severe constraints on components in terms of physical size, functionality and power consumption.
The user interface presents one of the biggest challenges to a good TWS earbud design and an enjoyable customer experience. The physical space for buttons is limited, but a tactile interface remains most common, sometimes without explicit buttons. For example, Apple’s AirPods are controlled via sequences of taps on the device body, essentially using the entire device as a button. Even if buttons can be fitted onto the device, the user can’t see them and is stuck with sequences of taps or presses -- nostalgic for Morse code enthusiasts, but cumbersome for others.
Voice control offers the possibility of a far richer interface than taps or button-presses. However, without adequate recognition algorithms and hardware, a speech interface can bring more frustration than convenience. People wear earphones in all kinds of noisy environments, and false alarms are annoying and disruptive. False negatives render the device unusable; after a couple of failed attempts at “Hey Siri, answer the call,” the caller has already hung up.
A subtle but significant point of nomenclature merits clarification. Wakeword or hotword detection is the recognition of a word or phrase that wakes a device from an idle state, such as “Hey Siri,” “OK Google” or “Alexa.” A command word or command phrase communicates an action to be executed by the device, such as “volume up” or “pause.” Keyword spotting is variously used to refer to either wakeword or command word detection.
Typically, wakeword detection is more difficult and more sensitive to false positives than command word detection: because it is always running, it may be falsely triggered by a large open set of similar sounds, and false positives often incur a cost in battery life or cloud computing charges to service a spurious request. Command-word recognition, in contrast, can assume that the input is part of a limited vocabulary and that the device has already been put in a listening state. Command-word and wakeword detection are both examples of small-vocabulary speech recognition, distinct from and far easier than large-vocabulary automatic speech recognition (LVASR). Here we will primarily focus on wakeword spotting because of its impact on the battery life and user experience of TWS earbuds.
Deep learning models have demonstrated the best performance in wakeword spotting (WWS) in recent years, but they are computationally expensive. Given the power required to run adequate detectors on standard hardware (see this previous post for data), the temptation is strong to run an undersized network on a lightweight microcontroller (MCU). The risk here is a “worst-of-both-worlds” compromise in which battery life is sacrificed and the speech interface still performs poorly. To determine how much power we can afford to spend on a speech interface, we need to look more closely at the power budget of a typical device.
A tiny form factor is the defining characteristic in TWS earbuds and, to paraphrase Stan Lee, “with little size comes little power.” When choosing components for a TWS design, the power consumption and size of every item is critical. Battery capacity in these devices is generally in the 50-200 mWh range.
To gain a sense of where that energy goes and how much we have to work with for new features, we can use Apple’s AirPods as a working example. Using information from Apple, iFixit’s teardown, and a review, we can estimate that the AirPods use about 4 mW in idle, 19 mW while listening to music and 31 mW during a Bluetooth call.
Currently available digital microphones consume about 500 uW. Estimating 3 mW for an MCU-based wakeword detector, it’s easy to see that battery life in idle falls from 24 hours to about 13 hours. If we assume a light user (0.5 hours of calls, 1 hour of music) or a heavy user (1 hour of calls, 2 hours of music), we see a 42-46 percent drop in battery life. In contrast, an efficient wakeword detector, consuming 140 uW or less, reduces battery life by less than 15 percent across scenarios. Given the attention on audio interfaces from startups and established technology companies alike, it is a good bet that sub-100 uW digital microphones will be available in the near future, bringing the battery life of a wakeword-enabled device to about 94 percent of baseline. The table below summarizes the effect of use case and wakeword spotting (WWS) inference power on battery life.
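The idle-case figures above follow from a simple energy-budget calculation. As a sketch, they can be reproduced by assuming a roughly 96 mWh battery (the capacity implied by a 4 mW idle draw lasting 24 hours); the capacity and the purely additive power model are assumptions for illustration, not published specifications:

```python
# Simplified idle battery-life model for a TWS earbud.
# Assumption: ~96 mWh battery, consistent with 4 mW idle lasting ~24 h.
BATTERY_MWH = 96.0
IDLE_MW = 4.0  # baseline idle power, no always-on listening

def idle_hours(mic_mw=0.0, wws_mw=0.0):
    """Idle battery life with an always-on mic and wakeword detector."""
    return BATTERY_MWH / (IDLE_MW + mic_mw + wws_mw)

baseline = idle_hours()                          # 24.0 h
mcu      = idle_hours(mic_mw=0.5, wws_mw=3.0)    # ~12.8 h ("about 13 hours")
ndp      = idle_hours(mic_mw=0.5, wws_mw=0.14)   # ~20.7 h (<15% drop)
future   = idle_hours(mic_mw=0.1, wws_mw=0.14)   # ~22.6 h (~94% of baseline)

print(f"MCU detector:      {mcu:.1f} h ({1 - mcu/baseline:.0%} drop)")
print(f"Efficient detector: {ndp:.1f} h ({1 - ndp/baseline:.0%} drop)")
print(f"With sub-100 uW mic: {future:.1f} h ({future/baseline:.0%} of baseline)")
```

The mixed-use (call plus music) scenarios follow the same accounting, with the per-mode powers weighted by hours of use.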
The figures shown here likely underestimate the impact of an MCU-based solution, as most solutions in the 3 mW range exhibit mediocre accuracy, requiring low detection thresholds for usable responsiveness, which in turn produce high false alarm rates; each false alarm must be transmitted over Bluetooth and double-checked on a larger device or cloud server. False alarm clean-up is not trivial, especially if a cloud service is being activated. The built-in latency of cloud access leaves little time for layers of double-checking before an excessive delay hurts the user experience. And excessive requests to the remote service can result in expensive cloud computing charges.
Syntiant’s NDP10x wakeword solution runs at under 140 uW and provides high-accuracy wakeword detection in realistic environments. It is AVS-qualified for close-talk applications (the relevant standard for in-ear devices) with fewer than three false alarms per day, so essentially no power is wasted transmitting audio for a second check. The 2.5 mm² WLBGA package supports the ultra-dense designs required for earbuds and other tiny devices.
Power costs can often hide in the details of system integration. For example, a 20 MHz clock distributed over a 5 pF trace at 1.8 V adds at least 320 uW to the power total. The NDP10x is built for easy integration and system efficiency. The included clock multiplier allows it to run from a 32 kHz system clock, so that same 5 pF trace would add less than 1 uW. The audio interface decodes PDM signals directly from a microphone without the need for a separate audio codec.
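The trace-power figures above come from the standard dynamic-power relation P = C·V²·f for a node that charges and discharges its load capacitance once per cycle. A quick check of the numbers in the text (the formula is standard; the 32.768 kHz value for the "32 kHz" crystal frequency is an assumption):

```python
def clock_trace_power_uw(c_farads, v_volts, f_hz):
    """Dynamic power P = C * V^2 * f dissipated driving a clock trace,
    returned in microwatts. Real traces may draw somewhat more."""
    return c_farads * v_volts ** 2 * f_hz * 1e6

# 5 pF trace at a 1.8 V swing:
print(clock_trace_power_uw(5e-12, 1.8, 20e6))   # ~324 uW at 20 MHz
print(clock_trace_power_uw(5e-12, 1.8, 32768))  # ~0.53 uW at 32.768 kHz
```

The roughly 600x ratio between the two clock rates is exactly the power savings the clock multiplier buys on that trace.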
Lastly, integration effort is critical in a rapidly evolving market like earbuds. The NDP comes with all of the required software support in an interface library requiring less than 2 kB of code space. The interface is simple and standard, comprising an interrupt line and a SPI bus. The complete detection process is executed on the NDP without intervention from the host processor. No machine learning or acoustics expertise is required because the expertise is embedded within the models, which are easily loaded onto the device just like firmware.
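The host-side pattern described above (wait for an interrupt, then read the result over SPI) can be sketched as follows. The register address, status encoding, and SpiBus stand-in are hypothetical placeholders for illustration, not Syntiant's actual driver API:

```python
# Hypothetical host-side integration sketch: the detector runs
# autonomously, and the host only services an interrupt by reading a
# result register over SPI. All addresses and encodings are invented.
MATCH_STATUS_REG = 0x20  # hypothetical result register

class SpiBus:
    """Stand-in for a platform SPI driver (e.g. spidev on Linux)."""
    def __init__(self, regs):
        self.regs = regs  # simulated register file
    def read_reg(self, addr):
        return self.regs.get(addr, 0)

def on_detector_interrupt(spi):
    """Interrupt handler: ask the detector which phrase (if any) matched."""
    status = spi.read_reg(MATCH_STATUS_REG)
    if status:  # nonzero = index of the matched wakeword/command
        return f"wakeword #{status} detected"
    return None

# Simulate the interrupt firing after the detector flags a match.
bus = SpiBus({MATCH_STATUS_REG: 1})
print(on_detector_interrupt(bus))  # wakeword #1 detected
```

The point of the pattern is that the host runs no audio DSP or neural-network code at all; it only reacts to a pin and a short SPI transaction.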
Syntiant’s NDP10x series offers highly accurate wakeword detection in a tiny package with near-zero power consumption. Production-ready software support makes integration painless. For reliable voice control and the best customer experience, the NDP10x offers a power-performance combination unmatched by any other solution.
Jeremy Holleman, Ph.D., is the chief technology officer of Syntiant Corp. He is an expert on ultra-low power integrated circuits and directs the Integrated Silicon Systems Laboratory at the University of North Carolina, Charlotte, where he is an associate professor.