We introduce Constrained Human-AI Cooperation (CHAIC), an inclusive embodied social intelligence challenge designed to test social perception and cooperation in embodied agents. In CHAIC, the goal is for an embodied agent equipped with egocentric observations to assist a human who may be operating under physical constraints—e.g., unable to reach high places or confined to a wheelchair—in performing common household or outdoor tasks as efficiently as possible. To achieve this, a successful helper must: (1) infer the human's intents and constraints by following the human and observing their behaviors (social perception), and (2) make a cooperative plan tailored to the human user to solve the task as quickly as possible, working together as a team (cooperative planning).
To benchmark this challenge, we create four new agents with real physical constraints and eight long-horizon tasks featuring both indoor and outdoor scenes with various constraints, emergency events, and potential risks. We benchmark planning- and learning-based baselines on the challenge and introduce a new method that leverages Large Language Models and behavior modeling. Empirical evaluations demonstrate the effectiveness of our benchmark in enabling systematic assessment of key aspects of machine social intelligence.
The Constrained Human-AI Cooperation (CHAIC) challenge studies how embodied agents perform social perception of human users with diverse physical constraints. We design and implement four new agents with real physical constraints and eight tasks featuring both indoor and outdoor scenes, including emergency events.
Each task involves a constrained agent, which mimics a human user with capability constraints and tries to find and transport target objects to a specific goal location, and a helper agent, which tries to infer the constrained agent's goal and capability constraints by actively perceiving its behavior.
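To make this setup concrete, below is a minimal sketch of how an episode could be specified and what the helper actually sees. All class and field names here are illustrative assumptions, not identifiers from the CHAIC codebase.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EpisodeSpec:
    """Hypothetical episode specification (names are illustrative)."""
    constraint_type: str       # e.g. "child", "wheelchair", "bicycle", "frail"
    target_objects: List[str]  # objects the constrained agent must transport
    goal_location: str         # where the target objects should end up
    scene: str                 # indoor or outdoor scene identifier


@dataclass
class HelperObservation:
    """What the helper receives at each step: only egocentric information."""
    rgb: object                    # egocentric RGB frame
    partner_visible: bool          # whether the constrained agent is in view
    partner_last_action: str = ""  # last observed partner action, if any


spec = EpisodeSpec(
    constraint_type="wheelchair",
    target_objects=["apple", "bread"],
    goal_location="dining_table",
    scene="indoor_kitchen",
)
# The helper is never handed `spec.constraint_type` or the target list directly;
# it must infer the goal and constraint from a stream of HelperObservation frames.
print(spec)
```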
The four new kinds of agents are the child agent, the wheelchair agent, the bicycle agent, and the frail agent, each with specific constraints; their example video clips are shown here:
The child agent has limited height and may fail to reach high locations. It may also break fragile objects.
The wheelchair agent cannot pass through obstacles, so the helper agent needs to remove them for the wheelchair agent to get through. It may also fail to reach objects that are too low or too high.
The bicycle agent is slow to move and act. Sometimes its child may run away, and the helper needs to catch it.
The frail agent may fail to pick up heavy objects: the heavier the object, the more likely the pick-up action fails. The helper agent can pick up objects together with the frail agent.
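As a minimal sketch of how such constraints could gate actions, the snippet below models the child agent's reach limit and the frail agent's weight-dependent pick-up failure. The numeric thresholds and the linear failure curve are illustrative assumptions, not the benchmark's actual parameterization.

```python
import random
from dataclasses import dataclass


@dataclass
class PickTarget:
    name: str
    height: float   # meters above the floor
    weight: float   # kilograms
    fragile: bool = False


def child_can_reach(target: PickTarget, max_reach: float = 1.2) -> bool:
    """Child agent: fails to reach objects placed above its limited height."""
    return target.height <= max_reach


def frail_pick_succeeds(target: PickTarget, rng: random.Random) -> bool:
    """Frail agent: the heavier the object, the more likely the pick-up fails.
    The linear failure probability here is an illustrative assumption."""
    fail_prob = min(1.0, target.weight / 10.0)
    return rng.random() >= fail_prob


if __name__ == "__main__":
    rng = random.Random(0)
    vase = PickTarget("vase", height=1.6, weight=2.0, fragile=True)
    print("child can reach vase:", child_can_reach(vase))
    print("frail pick-up succeeds:", frail_pick_succeeds(vase, rng))
```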
The eight tasks cover both indoor and outdoor scenes. They are named No constraint, Low target, Obstacle, High target, High goal location, High container, Shopping, and Moving house. Here is a brief description of each:
We test six types of helpers: Random Helper, Rule-based Hierarchical Plan Helper (RHP), LLM+BM Helper, VLM Helper, RL Helper, and SmartHelp Helper. The LLM+BM Helper achieves the best performance in our benchmark. Figure 2 gives an overview of the LLM+BM Helper, which is equipped with dedicated modules for perception, memory, behavior modeling, decision, and execution. (1) The perception module detects objects from raw RGB images; (2) the memory module builds a semantic map of the environment and records behaviors; (3) the behavior modeling module recognizes the partner's action and localizes the object corresponding to that action; (4) the decision module plans the next steps using foundation models; and (5) the execution module generates low-level actions.
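The skeleton below sketches how these five modules could be composed into a single perception-to-action step. Class and method names are placeholders for illustration, not the released CHAIC implementation.

```python
class LLMBMHelper:
    """Sketch of the modular pipeline described above (names are hypothetical)."""

    def __init__(self, perception, memory, behavior_model, decision, execution):
        self.perception = perception          # detects objects in raw RGB frames
        self.memory = memory                  # semantic map + partner behavior log
        self.behavior_model = behavior_model  # recognizes partner actions/objects
        self.decision = decision              # foundation-model-based planner
        self.execution = execution            # turns high-level plans into low-level actions

    def step(self, rgb_frame):
        # (1) Perceive objects from the egocentric frame.
        detections = self.perception.detect(rgb_frame)
        # (2) Update the semantic map with the new detections.
        self.memory.update(detections)
        # (3) Infer what the partner is doing and which object is involved.
        partner_action = self.behavior_model.infer(detections, self.memory)
        self.memory.record_behavior(partner_action)
        # (4) Decide the next high-level plan given memory and partner behavior.
        plan = self.decision.plan(self.memory, partner_action)
        # (5) Translate the plan into an executable low-level action.
        return self.execution.to_low_level(plan)
```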
The following video is a demonstration of the LLM+BM Helper's mechanism: