Semantic Fetch Robot
A ROS 2 mobile manipulation system that accepts a natural-language object request, navigates an indoor environment, locates the target using open-vocabulary vision, and physically retrieves it using a mounted robotic arm.
TurtleBot + OpenMANIPULATOR-X · Gazebo Harmonic · Differential Drive · LiDAR + OAK-D RGB-D · Active Development
On this page
- Project Statement
- The Problem We Are Solving
- Project Components
- Technical Specifications
- Current Status
Project Statement
The Semantic Fetch Robot is a ROS 2 mobile manipulator operating in a mapped indoor warehouse environment (TurtleBot default Gazebo Harmonic depot world). Given a text command such as “fetch the red bottle”, it autonomously navigates to the relevant region, locates the requested object using open-vocabulary visual detection, grasps it with the mounted OpenMANIPULATOR-X arm, and returns to the operator to deliver the item.
Success state: The robot correctly identifies, grasps, and delivers the requested object in ≥ 75% of trials within a pre-mapped simulation environment, without collisions.
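The success criterion can be checked mechanically once trials are logged. The following is a minimal sketch, assuming each trial is recorded with four boolean outcomes and that a collision fails the trial; the `TrialResult` type and its field names are hypothetical and not taken from the project code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialResult:
    """Outcome of one fetch trial (hypothetical record layout)."""
    identified: bool  # the requested object was correctly detected
    grasped: bool     # the arm picked the object up
    delivered: bool   # the object reached the operator
    collided: bool    # any collision occurred during the trial

    @property
    def success(self) -> bool:
        # Assumption: a trial counts only if every stage succeeded
        # and no collision occurred.
        return (self.identified and self.grasped
                and self.delivered and not self.collided)


def success_rate(trials: list[TrialResult]) -> float:
    """Fraction of successful trials; 0.0 for an empty log."""
    if not trials:
        return 0.0
    return sum(t.success for t in trials) / len(trials)


def meets_target(trials: list[TrialResult], target: float = 0.75) -> bool:
    """True when the success rate reaches the 75% target."""
    return success_rate(trials) >= target
```

Under this reading, three clean deliveries and one delivery with a collision give a rate of exactly 0.75, which just meets the target.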
The Problem We Are Solving
Most service robots either move well or manipulate well, but few can do both while also understanding flexible, natural-language requests. Semantic Fetch aims to bridge this gap by enabling a mobile robot to understand what object is being requested, infer where it is likely to be, navigate to it, and bring it back safely.
To achieve this, we combine navigation based on SLAM (Simultaneous Localization and Mapping) with semantically grounded object detection and arm motion planning in a single ROS 2 pipeline. The project focuses on building realistic simulations that serve as a proof of concept before deployment to real hardware.
Project Components
Semantic Mapping
A SLAM-built occupancy grid enriched with object detections. Every detected item is registered in a queryable map with its 3D location and semantic label.
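One way to picture the queryable map is as a small registry of labeled 3D points. This is only a sketch under stated assumptions: the `SemanticMap` and `MapObject` names, the map-frame coordinates in metres, and the 0.3 m merge radius for duplicate detections are all illustrative choices, not the project's actual data structures.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MapObject:
    label: str                            # semantic label, e.g. "red bottle"
    position: tuple[float, float, float]  # assumed: metres in the map frame
    confidence: float                     # detector score in [0, 1]


@dataclass
class SemanticMap:
    merge_radius: float = 0.3  # assumed threshold for merging re-detections
    objects: list[MapObject] = field(default_factory=list)

    def register(self, label, position, confidence):
        """Add a detection, merging it with a nearby entry of the same label."""
        for i, obj in enumerate(self.objects):
            same_place = math.dist(obj.position, position) <= self.merge_radius
            if obj.label == label and same_place:
                # Keep whichever observation the detector trusted more.
                if confidence > obj.confidence:
                    self.objects[i] = MapObject(label, position, confidence)
                return
        self.objects.append(MapObject(label, position, confidence))

    def query(self, label):
        """Return entries with this label, most confident first."""
        matches = [o for o in self.objects if o.label == label]
        return sorted(matches, key=lambda o: o.confidence, reverse=True)
```

A caller would register each detection as it arrives and later call `query("red bottle")` to obtain candidate goal locations for navigation.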
Open-Vocabulary Detection
YOLO-World or CLIP-based detection on the OAK-D camera stream, which allows the robot to find objects described in free text without a fixed class list.
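The idea behind matching free text to image regions can be illustrated without any model at all. The sketch below does not call YOLO-World or CLIP; it assumes some encoder has already produced one embedding for the text request and one per candidate image region, and it simply picks the region whose embedding is closest by cosine similarity. The function names and the 0.5 threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(text_embedding, region_embeddings, threshold=0.5):
    """Return (index, score) of the best-matching region, or None.

    Assumption: embeddings come from a shared text/image encoder,
    so a higher cosine similarity means a better semantic match.
    """
    best = None
    for i, emb in enumerate(region_embeddings):
        score = cosine_similarity(text_embedding, emb)
        if score >= threshold and (best is None or score > best[1]):
            best = (i, score)
    return best
```

Because the request is compared in embedding space rather than against a fixed label set, a new phrase needs no retraining, only a new text embedding.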
Technical Specifications
| Parameter | Value |
|---|---|
| Robot Platform | TurtleBot Standard + OpenMANIPULATOR-X |
| Kinematic Model | Differential Drive (base) + Serial 4-DOF (arm) |
| Primary Sensors | OAK-D Spatial AI Stereo Camera, RPLIDAR A1 2D LiDAR, IMU |
| Simulation Engine | Gazebo Harmonic (gz-harmonic) |
| Simulation World | TurtleBot default depot world |
| ROS Version | ROS 2 Jazzy Jalisco |
| OS | Ubuntu 24.04 LTS |
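For reference, the differential-drive model listed above relates the two wheel speeds to the base's forward and angular velocity. The sketch below uses the standard kinematic equations with a simple Euler integration step; the function names are illustrative, and the wheel radius and separation must be supplied by the caller, since the actual robot dimensions are not given here.

```python
import math


def body_twist(omega_left, omega_right, wheel_radius, wheel_separation):
    """Convert wheel angular speeds (rad/s) to (linear m/s, angular rad/s)."""
    v_left = omega_left * wheel_radius
    v_right = omega_right * wheel_radius
    v = (v_right + v_left) / 2.0               # forward speed of the base
    w = (v_right - v_left) / wheel_separation  # counter-clockwise positive
    return v, w


def integrate_pose(x, y, theta, v, w, dt):
    """Advance a planar pose by one Euler step of duration dt seconds."""
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += w * dt
    return x, y, theta
```

Equal wheel speeds give straight-line motion, and equal but opposite speeds rotate the base in place.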
Current Status
- Milestone 1 (proposal and architecture) has been completed; details are documented in Milestone 1.
- Milestone 2 (implementation, integration, and evaluation in simulation) has been completed; details are documented in Milestone 2.
- Object spawning and the full pick-and-place pipeline are planned next.