3D-LLM

Injecting the 3D World into Large Language Models



1 UCLA       2 SJTU       3 SCUT       4 UIUC       5 MIT       6 MIT-IBM Watson AI Lab       7 UMass Amherst
     



Overview

In this work, we propose to inject the 3D world into large language models and introduce a new family of 3D-LLMs. Specifically, 3D-LLMs take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and more.
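To make the input interface above concrete, here is a minimal, hypothetical sketch of how a 3D scene (a point cloud with per-point features) could be paired with a task prompt. This is illustrative only: the `Point`, `pool_features`, and `build_prompt` names are assumptions, not the authors' released code, and a real 3D-LLM would inject the 3D features as continuous tokens rather than formatting them as text.

```python
# Hypothetical sketch of a 3D-LLM input interface (not the authors' released code).
# A scene is a point cloud: each point carries xyz coordinates plus a
# language-aligned feature vector that the LLM can condition on.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Point:
    xyz: Tuple[float, float, float]  # point position
    feature: List[float]             # language-aligned feature vector


def pool_features(points: List[Point]) -> List[float]:
    """Mean-pool per-point features into one scene-level vector — a toy
    stand-in for the dense 3D features a real model would attend over."""
    dim = len(points[0].feature)
    n = len(points)
    return [sum(p.feature[i] for p in points) / n for i in range(dim)]


def build_prompt(points: List[Point], task: str, question: str = "") -> str:
    """Format a 3D-conditioned prompt for a given task (caption or QA)."""
    scene_vec = pool_features(points)
    header = f"<3D scene: {len(points)} points, feature dim {len(scene_vec)}>"
    if task == "caption":
        return f"{header}\nDescribe the scene."
    if task == "qa":
        return f"{header}\nQuestion: {question}"
    raise ValueError(f"unsupported task: {task}")


scene = [Point((0.0, 0.0, 0.0), [0.2, 0.4]),
         Point((1.0, 0.0, 0.0), [0.6, 0.8])]
print(build_prompt(scene, "qa", "How many chairs are in the room?"))
```

The same scene representation can then be reused across tasks (captioning, QA, grounding) by varying only the prompt, which mirrors the multi-task design described above.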



How to Inject the 3D World into Large Language Models?

[Figure: 3D-LLM pipeline]

What can 3D-LLM do?

Navigation



Grounding



Question Answering



Captioning

- A 3D model of a bed with a wooden frame and a mattress.
- A black and white table with stairs.
- A 3D model of a small, old, and ruined castle with a doorway and stairs.


And more...

[Figure: additional 3D-LLM tasks]

Citation

If you use this work or find it helpful, please consider citing:

@article{3dllm,
 author = {Hong, Yining and Zhen, Haoyu and Chen, Peihao and Zheng, Shuhong and Du, Yilun and Chen, Zhenfang and Gan, Chuang},
 title = {3D-LLM: Injecting the 3D World into Large Language Models},
 journal = {arXiv},
 year = {2023},
} 


Thanks to Justin Kerr for the website template.