Optimizing Microsoft Bing Visual Search with NVIDIA Accelerated Libraries

Microsoft Bing Visual Search enables people around the world to find content using photographs as queries. The heart of this capability is Microsoft’s TuringMM visual embedding model that maps images and text into a shared high-dimensional space. Operating on billions of images across the web, performance is critical.

This post details efforts to optimize the TuringMM pipeline using NVIDIA TensorRT and NVIDIA acceleration libraries like CV-CUDA and nvImageCodec. These efforts resulted in a 5.13x speedup and significant TCO reduction. We share how we worked with the Microsoft Bing team to tackle optimization of their core embeddings pipelines that power internet-scale visual search.

According to Andrew Stewart, PhD and senior data and applied scientist with Microsoft Bing Multimedia, “The Bing Visual Search team achieved a remarkable 5.13x end-to-end throughput improvement for an offline indexing pipeline running on billions of images using NVIDIA acceleration technology including TensorRT, CV-CUDA, and nvImageCodec, resulting in significant energy and cost savings. For online systems, like Visual Intent from Bing, such improvements mean quicker results for users, or the ability to incorporate additional functionality within the expected latency budget.”

Implementation	Throughput	Speedup
Baseline using OpenCV + ONNXRuntime-CUDA	88 QPS	–
Pipeline using OpenCV + ONNXRuntime-TensorRT	356 QPS	4.05
Pipeline using nvImageCodec + CV-CUDA + ONNXRuntime-TensorRT	452 QPS	5.14

		Average image size per batch
Library	Process	Small (~400×400)	Medium (~800×800)	Large (~1600×1600)
OpenCV Single-threaded CPU	Image decode	28.4 ms	162.3 ms	406.2 ms
	Preprocessing	6.3 ms	14.4 ms	24.0 ms
	Total	34.7 ms	176.7 ms	430.2 ms
nvImageCodec + CV-CUDA GPU-accelerated	Image decode	5.9 ms	29.4 ms	62.7 ms
	Preprocessing	3.2 ms	5.2 ms	6.3 ms
	Total	9.1 ms	34.6 ms	69.0 ms
GPU acceleration speedup		3.8x	5.1x	6.2x

Optimizing Microsoft Bing Visual Search with NVIDIA Accelerated Libraries

Multimodal AI and visual search

Microsoft Bing Visual Search

Model and pipeline prior to optimizations

Optimizing the visual embeddings model pipeline

Results

Conclusion

Realize the Full Potential of Generative AI by Overcoming Inferencing Roadblocks

Perplexity lets you search your internal enterprise files and the web

A Transformative Open-Source Language Model for Versatile Program Synthesis

Beyond the Buzzwords, Building a Resilient Future

ChatGPT’s Canvas now shows tracked changes

Related articles

Amazon Backs Nuclear with $500 Million for Small Modular Reactors

218 Layers With Superior Scaling

DapuStor and Memblaze Target Global Expansion with State-of-the-Art Enterprise SSDs

Borderlands Will Always Shoot to the Beat of Its Own Gun