MIT-Princeton at the Amazon Robotics Challenge

Humans possess a remarkable ability to grasp and recognize objects in the dynamic environments of everyday life. Inspired by this, the main goal of our research is to design robust and practical state-of-the-art solutions for robotic pick-and-place, a technology central to many applications: from picking packages in a logistics center to bin-picking in a manufacturing plant; from unloading groceries at home to clearing debris after a disaster.

In order to demonstrate the capabilities of our robot designs and algorithms, we put them to the test at the worldwide Amazon Robotics Challenge, competing against state-of-the-art solutions from world-class research and engineering teams in industry and academia (Mitsubishi, Panasonic, CMU, Duke, and more).

Here you will find links to our robotic pick-and-place solutions for the 2016 and 2017 editions of the Amazon Robotics Challenge. This research grew out of a wonderful collaboration between the MIT MCube Lab (robot manipulation) and the Princeton Vision and Robotics Group (robot perception).


Robotic Pick-and-Place of Novel Objects in Clutter
with Multi-Affordance Grasping and Cross-Domain Image Matching

★ 1st Place Winning Solution (Stow Task, 2017) ★

IEEE International Conference on Robotics and Automation (ICRA) 2018

We present a robotic pick-and-place system that can grasp and recognize both known and novel objects in cluttered environments. The key new feature of the system is that it handles a wide range of object categories without needing any task-specific training data for novel objects. To achieve this, it learns to infer object-agnostic pixel-wise affordances for four grasping behaviors and recognizes picked objects by matching observed images to product images. Since product images are readily available for a wide range of objects (e.g., from the web), the system works out-of-the-box.
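The two ideas above can be sketched in a few lines: pick the pixel with the highest predicted affordance across the grasping behaviors, then recognize the picked object by matching its observed-image descriptor against product-image descriptors. This is an illustrative sketch only; the function names, the two-behavior dictionary, and the cosine-similarity matching here are simplifications of the actual system, which learns affordances for four behaviors with deep networks.

```python
import numpy as np

def best_grasp(affordance_maps):
    """Select the grasping behavior and pixel with the highest predicted
    affordance.  affordance_maps: dict mapping behavior name -> (H, W)
    score array.  Returns (behavior, (row, col), score)."""
    best = None
    for behavior, scores in affordance_maps.items():
        r, c = np.unravel_index(np.argmax(scores), scores.shape)
        if best is None or scores[r, c] > best[2]:
            best = (behavior, (int(r), int(c)), float(scores[r, c]))
    return best

def recognize(observed_feat, product_feats):
    """Cross-domain image matching, reduced to nearest neighbor by cosine
    similarity: match an observed-image descriptor (D,) to the closest
    product-image descriptor in a dict of name -> (D,) vectors."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(product_feats, key=lambda k: cos(observed_feat, product_feats[k]))
```

Because matching happens in a shared descriptor space rather than against class labels, a novel object only needs a product image, not retraining.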


Multi-view Self-supervised Deep Learning for 6D Pose Estimation
in the Amazon Picking Challenge

★ 3rd Place Winning Solution (2016) ★

IEEE International Conference on Robotics and Automation (ICRA) 2017

We present a robot vision approach that recognizes objects and their 6D poses under a wide variety of scenarios. Our approach semantically segments and labels multiple RGB-D views of a scene with a fully convolutional neural network, and then fits pre-scanned 3D object models to the resulting segmentations to get 6D poses for all objects in the scene. We also propose a self-supervised method to generate a large labeled dataset for training segmentation deep neural networks, which can be easily scaled to more object categories without tedious manual annotation.
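The model-fitting step can be illustrated with a minimal point-to-point ICP sketch: align a pre-scanned model point cloud to the points selected by the segmentation mask, alternating nearest-neighbor matching with a closed-form (Kabsch/SVD) rigid update. This is a generic textbook ICP under the assumption of a rough centroid initialization, not the paper's actual implementation; the function name and brute-force matching are illustrative.

```python
import numpy as np

def fit_pose(model_pts, scene_pts, iters=20):
    """Fit a rigid pose (R, t) aligning a pre-scanned model point cloud
    (N, 3) to a segmented scene point cloud (M, 3), so that
    scene ~ model @ R.T + t."""
    R = np.eye(3)
    t = scene_pts.mean(axis=0) - model_pts.mean(axis=0)  # centroid init
    for _ in range(iters):
        moved = model_pts @ R.T + t
        # Brute-force closest scene point for every moved model point.
        d2 = ((moved[:, None, :] - scene_pts[None, :, :]) ** 2).sum(-1)
        corr = scene_pts[d2.argmin(axis=1)]
        # Best rigid transform between the matched sets (Kabsch/SVD).
        mu_m, mu_c = moved.mean(axis=0), corr.mean(axis=0)
        U, _, Vt = np.linalg.svd((moved - mu_m).T @ (corr - mu_c))
        dR = Vt.T @ U.T
        if np.linalg.det(dR) < 0:      # guard against reflections
            Vt[-1] *= -1
            dR = Vt.T @ U.T
        R, t = dR @ R, dR @ (t - mu_m) + mu_c  # compose incremental update
    return R, t
```

Accurate segmentation matters here: outlier points outside the object mask would pull the nearest-neighbor correspondences, and hence the fitted pose, away from the true alignment.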