Interactive Articulated Objects
Explore our generated articulated objects below. Use the controls to interact with different joints and observe the realistic articulation behavior.
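For readers who want to script the same behavior offline, the sketch below shows one way to drive a joint programmatically. It assumes the generated objects can be exported as URDF files and uses PyBullet; the file path and joint index are hypothetical placeholders rather than assets shipped with this page.

import time

import pybullet as p

# Hypothetical export path; replace with an actual URDF produced by the model.
URDF_PATH = "outputs/cabinet_0/object.urdf"

client = p.connect(p.GUI)                 # open the PyBullet viewer
body = p.loadURDF(URDF_PATH, useFixedBase=True)

# List the joints so we know which index to articulate.
for j in range(p.getNumJoints(body)):
    info = p.getJointInfo(body, j)
    print(j, info[1].decode(), "limits:", info[8], info[9])

# Sweep the first joint between its lower and upper limits.
lower, upper = p.getJointInfo(body, 0)[8:10]
for step in range(240):
    t = step / 239.0
    p.resetJointState(body, 0, lower + t * (upper - lower))
    p.stepSimulation()
    time.sleep(1.0 / 60.0)

p.disconnect()

The same loop extends to multi-joint objects by sweeping each non-fixed joint in turn.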
This project investigates how to combine language and vision modalities to improve performance on multimodal tasks. By leveraging pre-trained language models and vision encoders, we develop systems that better understand and reason about visual content through natural language.
Our approach, LAM (Language-Augmented Model), integrates linguistic understanding with visual perception in a single framework, yielding more robust and generalizable vision-language models that can handle complex reasoning tasks.
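As a rough illustration of what integrating linguistic understanding with visual perception can look like in code, the sketch below fuses tokens from a pre-trained vision encoder and a pre-trained language model with a single cross-attention block. It is an illustrative toy, not the actual LAM architecture: the dimensions, the fusion design, and the pooling are all assumptions.

import torch
import torch.nn as nn

class LanguageAugmentedFusion(nn.Module):
    """Schematic text-to-image cross-attention fusion (illustrative only)."""

    def __init__(self, vis_dim=768, txt_dim=768, dim=512, heads=8):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, dim)   # project vision-encoder tokens
        self.txt_proj = nn.Linear(txt_dim, dim)   # project language-model tokens
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, dim)           # a task head would go here

    def forward(self, image_tokens, text_tokens):
        # image_tokens: (B, N_img, vis_dim) from a pre-trained vision encoder
        # text_tokens:  (B, N_txt, txt_dim) from a pre-trained language model
        v = self.vis_proj(image_tokens)
        t = self.txt_proj(text_tokens)
        # Text queries attend over visual tokens, grounding language in the image.
        fused, _ = self.cross_attn(query=t, key=v, value=v)
        fused = self.norm(fused + t)              # residual connection
        return self.head(fused.mean(dim=1))       # pooled multimodal embedding

# Random features stand in for real encoder outputs (assumed shapes).
model = LanguageAugmentedFusion()
img = torch.randn(2, 196, 768)   # e.g. ViT patch tokens
txt = torch.randn(2, 32, 768)    # e.g. LM token embeddings
print(model(img, txt).shape)     # torch.Size([2, 512])

In this toy setup the text tokens query the visual tokens, so the pooled embedding is conditioned on the language input; a head for visual question answering, captioning, or retrieval would sit on top of it.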
We demonstrate that LAM achieves state-of-the-art performance on various multimodal benchmarks, showing significant improvements in tasks such as visual question answering, image captioning, and cross-modal retrieval.
Our method consists of three key components, summarized in the method overview diagram below.
[Method overview diagram]
We evaluate our method on multiple benchmark datasets and demonstrate significant improvements over existing approaches across a range of vision-language tasks. Interactive examples of articulated objects generated by our model are provided in the Interactive Articulated Objects section.
@inproceedings{gao2025lam,
  author    = {Gao, Yipeng and others},
  title     = {LAM: Language-Augmented Model},
  booktitle = {Conference},
  year      = {2025},
}