Quickly Voice Your Apps with NVIDIA NIM Microservices for Speech and Translation

NVIDIA NIM, part of NVIDIA AI Enterprise, provides containers to self-host GPU-accelerated inferencing microservices for pretrained and customized AI models across clouds, data centers, and workstations. NIM microservices for speech and translation are now available.

The new speech and translation microservices leverage NVIDIA Riva and provide automatic speech recognition (ASR), neural machine translation (NMT), and text-to-speech (TTS) services.

Integrating multilingual voice capabilities into your applications with NVIDIA speech and translation NIM microservices offers beyond just advanced ASR, NMT, and TTS for enhanced global user experience and accessibility. Whether you’re building customer service bots, interactive voice assistants, or multilingual content platforms, these NIM microservices are optimized for high-performance AI inference at scale, and provide accuracy and flexibility to voice-enable your applications with minimal development effort.

In this post, you’ll learn how to perform basic inference tasks—such as transcribing speech, translating text, and generating synthetic voices—directly through your browser by using interactive speech and translation model interfaces in the NVIDIA API catalog. You’ll also learn how to run these flexible microservices on your infrastructure, access them through APIs, and seamlessly integrate them into your applications.

Video 1. Watch a demo on deploying speech NIM microservices and connecting them to a simple retrieval-augmented generation pipeline for voice-to-voice interaction

GPU	NIM_MANIFEST_PROFILE
Generic	9136dd64-4777-11ef-9f27-37cfd56fa6ee
NVIDIA H100	7f0287aa-35d0-11ef-9bba-57fc54315ba3
NVIDIA A100	32397eba-43f4-11ef-b63c-1b565d7d9a02
NVIDIA L40	40d7e326-43f4-11ef-87a2-239b5c506ca7

GPU	NIM_MANIFEST_PROFILE
Generic	3c8ee3ee-477f-11ef-aa12-1b4e6406fad5
NVIDIA H100	bbce2a3a-4337-11ef-84fe-e7f5af9cc9af
NVIDIA A100	5ae1da8e-43f3-11ef-9505-e752a24fdc67
NVIDIA L40	713858f8-43f3-11ef-86ee-4f6374fce1aa

Quickly Voice Your Apps with NVIDIA NIM Microservices for Speech and Translation

Quick inference with speech and translation NIM

Running speech and translation NIM microservices with NVIDIA Riva Python clients

Transcribing audio in streaming mode

Translating text from English to German

Generating synthetic speech

Running speech NIM locally with Docker

Integrating speech NIM microservices with a RAG pipeline

Set up the environment

Launch the ASR NIM

Launch the TTS NIM

Connect the speech NIM microservices to the RAG pipeline

Test the speech NIM and RAG integration

Get started adding multilingual speech AI to your apps

Ransomware Attacks Growing More Dangerous

AuthenticID Launches Velocity Checks to Combat Identity Fraud with Advanced Biometrics

Embracing diversity: GamesBeat’s Diversity in Gaming Lunch is just around the corner

The Powerhouse GPU Revolutionizing Deep Learning

New Spear-Phishing Campaign Deploys ‘More_eggs’ Backdoor

Related articles

The Endorfy Fortis 5 Dual Fan CPU Cooler Review: Towering Value

Bookshop Day: our 2024 recommendations

This What We Do in the Shadows Portrait Is Dead and Out of This World

Advancing the Accuracy-Efficiency Frontier with Llama-3.1-Nemotron-51B