EMASS achieves AI function fusion with voice recognition and real-time keyword detection

Nanoveu, a technology company specialising in advanced semiconductor, visualisation and materials sciences, announced that EMASS has achieved a breakthrough demonstration of real-time keyword detection and voice recognition, fused as two AI functions running entirely on its ECS-DoT edge AI co-processor.

Both capabilities run at sub-milliwatt average power and around 4 ms latency directly on ECS-DoT, bringing accurate, low-latency voice intelligence to computing devices such as tablets and PCs while the host processor remains asleep until a command or an authorised user is detected.

Voice is becoming a primary control surface for computing devices. Manufacturers increasingly want these experiences to run on-device for responsiveness, privacy and reliability rather than depending on the cloud, and the voice AI agents market is projected to reach USD 47.5 billion by 2034.

Delivering always-on voice on a compact, power-constrained device has traditionally been limited by power and thermal budgets. Keeping a microphone pipeline and a capable processor continuously active to listen for commands can increase power consumption and thermal load. Conversely, duty cycling those processors to save power can introduce latency and increase the risk of missed activations.

Delivering always-on, instantly responsive voice processing has traditionally involved a three-way trade-off between power, latency and privacy:

Power: keeping microphones and a capable processor continuously active delivers responsiveness, but drains the battery and adds thermal load.
Latency: duty-cycled and wake-on-demand schemes preserve power, but introduce delay and depend on repeatedly waking higher-power processors.
Privacy: an always-listening microphone pipeline that processes audio off the sensor creates an always-on privacy surface.

Sub-Milliwatt AI Function Fusion: Recognition and Keyword Detection All in One Chip
EMASS has demonstrated two complementary voice capabilities running entirely on the ECS-DoT edge AI co-processor. The first is keyword detection: recognising spoken commands and wake words in real time. The second is voice recognition: confirming whether the person speaking is the enrolled, authorised user.

Together, they let a device respond to voice instantly and personalise or gate that response to the right person, all on-chip. This is the first time ECS-DoT has been demonstrated running two AI functions simultaneously. Rather than choosing between recognition and keyword detection, the chip performs both at once, on-device, with audio never leaving the device.
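The gating described above can be illustrated with a minimal Python sketch. It is not EMASS's implementation: the `FrameResult` fields, the `should_wake_host` function and both threshold values are hypothetical names and numbers chosen only to show how a keyword score and a speaker score might be combined before a host processor is woken.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values would be tuned on held-out audio.
KEYWORD_THRESHOLD = 0.80
SPEAKER_THRESHOLD = 0.70


@dataclass
class FrameResult:
    """Outputs of the two models for one audio window (illustrative names)."""
    keyword: str
    keyword_confidence: float  # 0.0 to 1.0
    speaker_score: float       # similarity to the enrolled user, 0.0 to 1.0


def should_wake_host(result: FrameResult, require_owner: bool = True) -> bool:
    """Return True only when a keyword is heard and, if required, the speaker matches."""
    keyword_ok = result.keyword_confidence >= KEYWORD_THRESHOLD
    speaker_ok = result.speaker_score >= SPEAKER_THRESHOLD
    return keyword_ok and (speaker_ok or not require_owner)


# Enrolled user says the wake word: the host is woken.
print(should_wake_host(FrameResult("wake", 0.95, 0.90)))   # True
# Someone else says the wake word: the host stays asleep.
print(should_wake_host(FrameResult("wake", 0.95, 0.30)))   # False
```

Setting `require_owner=False` in this sketch would let any speaker trigger a command, which corresponds to using keyword detection alone.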

Low power: each application ran always-on at under half a milliwatt of average power (400 to 500 µW), enabling continuous operation without materially affecting battery life.

Real-time response: keyword detection and voice recognition were demonstrated at the co-processor, without first waking a higher-power processor.

Privacy: audio does not leave the device, materially reducing the privacy surface of an always-on voice interface.
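To put the low-power figure in context, the short Python calculation below converts an average power draw into energy per day. Two inputs are assumptions that do not come from the announcement: the 50 Wh battery capacity is a hypothetical value for a portable computer, and adding the two per-application figures together is a simplification for running both functions at once.

```python
HOURS_PER_DAY = 24


def daily_energy_mwh(avg_power_uw: float) -> float:
    """Energy used in one day, in milliwatt-hours, at a constant average power in microwatts."""
    return avg_power_uw / 1000 * HOURS_PER_DAY


# Upper end of the reported per-application range (500 µW).
per_function_mwh = daily_energy_mwh(500)

# Assumption: both functions together draw the sum of the two figures.
both_functions_mwh = 2 * per_function_mwh

# Hypothetical battery capacity, not taken from the announcement.
BATTERY_WH = 50.0
share_of_battery = both_functions_mwh / (BATTERY_WH * 1000)

print(per_function_mwh)            # 12.0 mWh per day
print(both_functions_mwh)          # 24.0 mWh per day
print(f"{share_of_battery:.3%}")   # 0.048% of the assumed battery per day
```

Under these assumptions, a full day of always-on listening uses well under a tenth of one percent of the battery.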

Both capabilities were implemented on the ECS-DoT evaluation board using a standard PCM digital MEMS microphone, with no additional specialised hardware required. The keyword-detection model is a compact 8-bit network that detects the most likely spoken keyword. It was tested over 1,500 times to measure how reliably it identifies the correct spoken keyword.

92% Top-1 and 97% Top-3 Accuracy
When the model makes a prediction, it produces a ranked list of likely words rather than a single answer. Two industry-standard measures describe how often it gets things right. The first, known as top-1 accuracy, counts a result as correct only when the model’s single best guess exactly matches the spoken word. This is the strictest measure and the one the machine learning community treats as the benchmark. On this measure the model reached 92%.
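Top-1 accuracy is the k = 1 case of the more general top-k accuracy, in which a prediction counts as correct when the true word is among the k highest-scoring candidates. The Python sketch below shows how such a figure is computed from a ranked list of scores. The scores and labels are made-up illustrative data, not EMASS test results, and `top_k_accuracy` is a hypothetical helper name.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Illustrative scores for four utterances over five keyword classes.
scores = [
    [0.70, 0.12, 0.08, 0.06, 0.04],  # true class 0: best guess is correct
    [0.10, 0.30, 0.50, 0.06, 0.04],  # true class 1: correct word ranked second
    [0.05, 0.15, 0.20, 0.50, 0.10],  # true class 0: correct word ranked last
    [0.02, 0.03, 0.05, 0.10, 0.80],  # true class 4: best guess is correct
]
labels = [0, 1, 0, 4]

print(top_k_accuracy(scores, labels, k=1))  # 0.5
print(top_k_accuracy(scores, labels, k=3))  # 0.75
```

Because every top-1 hit is also a top-3 hit, the top-3 figure can never be lower than the top-1 figure for the same test set.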

The second, top-3 accuracy, counts a result as correct when the right word appears among the model’s three best guesses, and on this measure the model reached 97%.

The voice-recognition model produces a per-user voice signature matched against an enrolled template to distinguish the authorised user from others. The sub-milliwatt result was achieved through aggressive model compression and tuning tailored to the ECS-DoT architecture, combined with the chip’s on-chip energy management.
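Matching a voice signature against an enrolled template can be sketched in Python as follows. The announcement does not say how ECS-DoT compares signatures, so everything here is an assumption for illustration: signatures are treated as plain numeric vectors, the template is the average of a few enrolment signatures, the comparison uses cosine similarity, and the 0.8 threshold and all function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def enroll(signatures):
    """Build a template by averaging enrolment signatures (an assumed scheme)."""
    count = len(signatures)
    return [sum(column) / count for column in zip(*signatures)]


def is_authorised(signature, template, threshold=0.8):
    """Accept the speaker when the signature is close enough to the template."""
    return cosine_similarity(signature, template) >= threshold


# Toy three-dimensional signatures; real ones would be much longer.
template = enroll([[1.0, 0.0, 1.0], [0.8, 0.2, 1.0]])

print(is_authorised([0.9, 0.1, 0.9], template))  # True: close to the template
print(is_authorised([0.0, 1.0, 0.1], template))  # False: points elsewhere
```

Raising the threshold in this sketch rejects more impostors at the cost of rejecting the enrolled user more often, which is the usual trade-off when tuning such a check.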

These are two AI models running together on a single low-power co-processor, which is itself a demonstration of ECS-DoT’s multi-model design. The same engine that runs these voice models is built to run image and other sensor models, so the capability demonstrated here forms the foundation for broader on-device, multimodal intelligence rather than a single-function voice block.