1 UCLA      
2 SJTU      
3 SCUT      
4 UIUC      
5 MIT      
6 MIT-IBM Watson AI Lab      
7 UMass Amherst
     
In this work, we propose injecting the 3D world into large language models and introduce a new family of 3D-LLMs. Specifically, 3D-LLMs take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, and navigation.
A 3D model of a bed with a wooden frame and a mattress. | Black and white table with stairs. | A 3D model of a small, old, and ruined castle with a doorway and stairs.
If you use this work or find it helpful, please consider citing:
@article{3dllm,
  author  = {Hong, Yining and Zhen, Haoyu and Chen, Peihao and Zheng, Shuhong and Du, Yilun and Chen, Zhenfang and Gan, Chuang},
  title   = {3D-LLM: Injecting the 3D World into Large Language Models},
  journal = {arXiv},
  year    = {2023},
}
Thanks to Justin Kerr for the website template.