MIT-Princeton at the Amazon Robotics Challenge

Humans possess a remarkable ability to grasp and recognize unfamiliar objects in the dynamic environments of everyday life. Inspired by this, the main goal of our research is to design robust and practical state-of-the-art solutions for robotic pick-and-place, a technology central to many applications: from picking packages in a logistics center to bin-picking in a manufacturing plant; from unloading groceries at home to clearing debris after a disaster.

To demonstrate the capabilities of our robot designs and algorithms, we put them to the test at the worldwide Amazon Robotics Challenge, competing against state-of-the-art solutions from world-class researchers and engineers in industry and academia (Mitsubishi, Panasonic, CMU, Duke, and more).

Here you will find links to our robotic pick-and-place solutions for the 2016 and 2017 editions of the Amazon Robotics Challenge. This research came out of a wonderful collaboration between the MIT MCube Lab (robot manipulation) and the Princeton Vision and Robotics Group (robot perception).


Robotic Pick-and-Place of Novel Objects in Clutter
with Multi-Affordance Grasping and Cross-Domain Image Matching

★ 1st Place Winning Solution (Stow Task, 2017) ★

Under Review (arXiv 2018)

We present a robotic pick-and-place system that can grasp and recognize both known and novel objects in cluttered environments. The key new feature of the system is that it handles a wide range of object categories without needing any task-specific training data for novel objects. To achieve this, it learns to predict category-agnostic affordances for four grasping behaviors and recognizes picked objects by matching observed images to product images. Since product images are readily available for a wide range of objects (e.g., from the web), the system works out-of-the-box.
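The two steps described above (choose a grasp from predicted affordances, then recognize the picked object by matching it against product images) can be illustrated with a minimal sketch. It is not the system's implementation. It assumes that each grasping behavior yields a 2D grid of scores and that observed and product images have already been mapped to feature vectors by some learned embedding. The behavior names, the feature vectors, and the helpers `best_grasp`, `cosine_distance`, and `recognize` are all hypothetical.

```python
import math


def best_grasp(affordance_maps):
    """Return (behavior, row, col, score) for the highest-scoring grasp.

    affordance_maps: dict mapping a behavior name (hypothetical) to a
    2D list of predicted affordance scores.
    """
    best = None
    for behavior, grid in affordance_maps.items():
        for r, row in enumerate(grid):
            for c, score in enumerate(row):
                if best is None or score > best[3]:
                    best = (behavior, r, c, score)
    return best


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def recognize(observed_feature, product_features):
    """Return the product whose feature vector is closest to the observation.

    product_features: dict mapping a product name to its feature vector
    (assumed to be precomputed from product images).
    """
    return min(
        product_features,
        key=lambda name: cosine_distance(observed_feature, product_features[name]),
    )


# Toy data: four placeholder behaviors on a 2x2 grid of pixels.
maps = {
    "behavior_a": [[0.1, 0.4], [0.2, 0.3]],
    "behavior_b": [[0.2, 0.9], [0.1, 0.0]],
    "behavior_c": [[0.5, 0.5], [0.5, 0.5]],
    "behavior_d": [[0.0, 0.0], [0.0, 0.7]],
}
print(best_grasp(maps))  # ('behavior_b', 0, 1, 0.9)

products = {"mug": [1.0, 0.0, 0.0], "sponge": [0.0, 1.0, 0.0]}
print(recognize([0.9, 0.1, 0.0], products))  # mug
```

Because recognition in this sketch is a nearest-neighbor lookup, adding a novel object only requires adding its product-image feature to the dictionary, which mirrors the "no task-specific training data" property described above.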


Multi-view Self-supervised Deep Learning for 6D Pose Estimation
in the Amazon Picking Challenge

★ 3rd Place Winning Solution (2016) ★

IEEE International Conference on Robotics and Automation (ICRA) 2017

We present a robot vision approach that recognizes objects and their 6D poses under a wide variety of scenarios. Our approach semantically segments and labels multiple RGB-D views of a scene with a fully convolutional neural network, and then fits pre-scanned 3D object models to the resulting segmentations to get 6D poses for all objects in the scene. We also propose a self-supervised method to generate a large labeled dataset for training segmentation deep neural networks that could be scaled up to more object categories easily without tedious manual annotations.
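As a rough illustration of the multi-view step, the sketch below merges labeled pixels from several RGB-D views into one world-frame point list per object and computes a centroid that could seed a model-fitting routine. It is a simplified sketch under stated assumptions, not the paper's pipeline: it assumes a pinhole camera with known intrinsics and known camera-to-world poses, takes per-pixel labels as given rather than predicting them with a network, and leaves out the model fitting itself. The data layout and the helpers `backproject`, `to_world`, `fuse_views`, and `centroid` are hypothetical.

```python
from collections import defaultdict


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with depth into camera coordinates."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def to_world(point, rotation, translation):
    """Apply a camera-to-world rigid transform: rotation (3x3 nested lists), then translation."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def fuse_views(views, intrinsics):
    """Merge labeled pixels from all views into per-object world-frame point lists.

    views: list of dicts with keys "pixels" (list of (u, v, depth, label)),
    "rotation", and "translation" (camera-to-world pose).
    intrinsics: (fx, fy, cx, cy), assumed shared by all views.
    """
    fx, fy, cx, cy = intrinsics
    clouds = defaultdict(list)
    for view in views:
        for u, v, depth, label in view["pixels"]:
            if depth <= 0:  # skip missing depth readings
                continue
            p_cam = backproject(u, v, depth, fx, fy, cx, cy)
            clouds[label].append(to_world(p_cam, view["rotation"], view["translation"]))
    return dict(clouds)


def centroid(points):
    """Mean of a non-empty list of 3D points; a coarse translation estimate."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


# Toy data: two views with identity rotation, the second shifted 1 unit along x.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
views = [
    {"pixels": [(50, 50, 1.0, "box"), (10, 10, 0.0, "box")],
     "rotation": identity, "translation": (0, 0, 0)},
    {"pixels": [(150, 50, 1.0, "box")],
     "rotation": identity, "translation": (1, 0, 0)},
]
clouds = fuse_views(views, intrinsics=(100, 100, 50, 50))
print(clouds["box"])            # [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(centroid(clouds["box"]))  # (1.0, 0.0, 1.0)
```

Fusing points from several viewpoints before fitting gives the fitting step a more complete view of each object than any single frame would, which is the motivation for the multi-view design described above.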