Injecting the 3D World into Large Language Models

1 UCLA   2 SJTU   3 SCUT   4 UIUC   5 MIT   6 MIT-IBM Watson AI Lab   7 UMass Amherst


In this work, we propose injecting the 3D world into large language models and introduce a new family of 3D-LLMs. Specifically, 3D-LLMs take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, and navigation.
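To make the input format concrete, here is a minimal sketch of what "3D point clouds and their features" could look like as data, paired with one of the tasks listed above. It is an illustration under stated assumptions, not the 3D-LLM implementation: the `PointCloudInput` container, the `build_request` helper, the task identifiers, and the request layout are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Task names follow the list in the description above; the identifier
# spellings are hypothetical and chosen only for this sketch.
TASKS = (
    "captioning",
    "dense_captioning",
    "3d_question_answering",
    "task_decomposition",
    "3d_grounding",
    "3d_assisted_dialog",
    "navigation",
)


@dataclass
class PointCloudInput:
    """Hypothetical container: one xyz coordinate and one feature vector per point."""

    points: List[Tuple[float, float, float]]
    features: List[List[float]]

    def __post_init__(self) -> None:
        # Every point must carry exactly one feature vector.
        if len(self.points) != len(self.features):
            raise ValueError("each point needs exactly one feature vector")
        # All feature vectors must have the same length.
        if len({len(f) for f in self.features}) > 1:
            raise ValueError("all feature vectors must share one dimensionality")

    @property
    def feature_dim(self) -> int:
        return len(self.features[0]) if self.features else 0


def build_request(
    scene: PointCloudInput, task: str, prompt: str
) -> Dict[str, Union[str, int]]:
    """Bundle a scene summary, a task name, and a text prompt (illustrative only)."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    return {
        "task": task,
        "prompt": prompt,
        "num_points": len(scene.points),
        "feature_dim": scene.feature_dim,
    }


# Toy scene: two points, each with a 4-dimensional feature vector.
scene = PointCloudInput(
    points=[(0.0, 0.0, 0.0), (1.0, 0.5, 0.2)],
    features=[[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]],
)
request = build_request(scene, "3d_question_answering", "Where is the bed?")
```

The point of the sketch is only that geometry and per-point features travel together and that the same scene can be paired with different task prompts. How the features are extracted and fed to the language model is not covered here.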


How to Inject the 3D World into Large Language Models?


What can 3D-LLM do?



Question Answering


A 3D model of a bed with a wooden frame and a mattress.
Black and white table with stairs.
A 3D model of a small, old, and ruined castle with a doorway and stairs.

And more...



If you use this work or find it helpful, please consider citing:

 author = {Hong, Yining and Zhen, Haoyu and Chen, Peihao and Zheng, Shuhong and Du, Yilun and Chen, Zhenfang and Gan, Chuang},
 title = {3D-LLM: Injecting the 3D World into Large Language Models},
 journal = {arXiv},
 year = {2023},

Thanks to Justin Kerr for the website template.