UC Berkeley researchers have developed Language Embedded Radiance Fields (LERF), a technique that combines the language-image model CLIP with Neural Radiance Fields (NeRFs) to locate objects in 3D scenes in real time. NeRFs are a graphics technology that reconstructs a high-quality 3D representation of a real-world scene from ordinary 2D photos. By integrating CLIP vectors into NeRFs, LERF can extract 3D relevance maps, which can be searched using natural language.
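The idea of a relevance map searched with natural language can be illustrated with a toy sketch. It is not LERF's actual implementation: the real system learns a field of CLIP embeddings and renders it volumetrically, whereas the sketch below assumes a handful of 3D points with hand-made three-dimensional vectors standing in for CLIP embeddings, and scores them against a query vector by plain cosine similarity. The names `cosine`, `relevance_map`, and `best_match` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def relevance_map(point_embeddings, query_embedding):
    """Score every 3D point by how well its embedding matches the query."""
    return {
        point: cosine(embedding, query_embedding)
        for point, embedding in point_embeddings.items()
    }

def best_match(point_embeddings, query_embedding):
    """Return the 3D point whose embedding is most similar to the query."""
    scores = relevance_map(point_embeddings, query_embedding)
    return max(scores, key=scores.get)

# Toy scene (assumption): three 3D points, each with a made-up embedding.
# In a real system these vectors would come from a CLIP-style model.
scene = {
    (0.0, 1.0, 0.5): [0.9, 0.1, 0.0],
    (2.0, 0.0, 1.0): [0.1, 0.9, 0.1],
    (1.0, 1.0, 0.0): [0.0, 0.2, 0.9],
}

# Made-up embedding of a text query; a real one would come from
# the text encoder of the same model.
query = [1.0, 0.0, 0.0]

scores = relevance_map(scene, query)
location = best_match(scene, query)
```

Here `scores` plays the role of the relevance map and `location` is the point that best matches the query. Because the text and the scene share one embedding space, any phrase can be used as a query without training a detector for it.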
LERF enables accurate recognition of 3D objects without additional training. For instance, a user can search for a specific book in the virtual environment of a bookstore, and LERF can identify and tag the book with pixel accuracy on the first try. The technology doesn’t require region proposals, masks, or fine-tuning, which keeps the pipeline simple.
Google is working on integrating NeRFs of real places like restaurants or stores into Google Maps. LERF technology can enable virtual searches in these scanned real-world locations, providing users with a fast and accurate search experience.
However, LERFs are static, which means that for a real-time search for the nearest supermarket, a multimodal search using normal 2D webcam images would be more appropriate. Nevertheless, for a guided virtual reality (VR) tour of a real store, the combination of LERF and NeRF would be sufficient.
LERF can bridge the gap between large language models and digital worlds that are very close to reality. One experiment involved using ChatGPT to generate a list of tasks for cleaning up a kitchen where coffee had been spilled. LERF could map all the actions suggested by ChatGPT to areas and objects relevant to the steps in a kitchen NeRF using its 3D relevance map. The system can distinguish between different types of donuts, such as chocolate and blueberry, and even identify the brand.
The research team sees potential applications in robotics, including training robots visually in simulation, better understanding the capabilities of vision-language models, and interacting with and within 3D worlds.
LERF is integrated with Nerfstudio, a widely used research codebase. A demonstration video shows a user typing queries and seeing LERF results in real time. The prospect of interacting with NeRFs creatively through natural language should excite all NeRFans.