Abstract

Amidst the wide popularity of imitation learning algorithms in robotics, their properties regarding hyperparameter sensitivity, ease of training, data efficiency, and performance have not been well studied in high-precision, industry-inspired environments. In this work, we demonstrate the limitations and benefits of prominent imitation learning approaches and analyze their capabilities regarding these properties. We evaluate each algorithm on a complex bimanual manipulation task with an over-constrained dynamics system and multiple contacts between the manipulated object and the environment. While we find that imitation learning is well suited to solving such complex tasks, not all algorithms are equal in terms of handling environmental and hyperparameter perturbations, training requirements, performance, and ease of use. We investigate the influence of these key characteristics through a carefully designed experimental procedure and learning environment.

Overview

Our experiments indicate that recent imitation learning methods, such as Diffusion and ACT, outperform traditional techniques (with respect to various metrics) on our high-precision adapter insertion task without the need for extensive hyperparameter tuning. The policy architectures proposed by ACT and Diffusion are effective when combined with action and observation chunking, which helps to cope with the noisy (potentially non-Markovian) environment; a minimal sketch of chunking with temporal ensembling is shown below. Interestingly, the online methods, DAgger and GAIL, exhibit high performance under noise perturbations without performing action and observation chunking. This result points to the important role these methods still play and demonstrates their ability to recover from unseen states by exploring during training. However, the need for extensive environment interaction (GAIL) and a pre-defined oracle (DAgger) poses real-world challenges that future work must address to make these methods more practical. Finally, we present a carefully designed environment that can serve as (i) a future benchmark for robot learning methods and (ii) a reference for practitioners in industry.
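
In this ACT-style sketch, the policy predicts a short horizon of H future actions at every step, and the actions proposed for the current timestep by overlapping chunks are averaged with exponentially decaying weights. The horizon, decay rate, and policy/environment interfaces are illustrative placeholders, not our exact implementation.

    # Sketch of action chunking with temporal ensembling (ACT-style).
    # H, the weight decay, and the policy/env interfaces are assumptions.
    import numpy as np

    H = 8  # chunk size: number of future actions predicted per query

    def rollout(policy, env, max_steps=400):
        obs = env.reset()
        # chunk_buffer[t] collects every action ever predicted for timestep t
        chunk_buffer = [[] for _ in range(max_steps + H)]
        for t in range(max_steps):
            chunk = policy(obs)                # shape (H, action_dim)
            for k in range(H):                 # file each action under its target step
                chunk_buffer[t + k].append(chunk[k])
            preds = np.stack(chunk_buffer[t])  # all predictions made for step t
            w = np.exp(-0.01 * np.arange(len(preds)))  # older chunks weighted higher
            action = (w[:, None] * preds).sum(axis=0) / w.sum()
            obs, reward, done, info = env.step(action)
            if done:
                break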


A high-level overview of the key metrics and results from our study.

Learned Policies

Every method is trained on a subset of the expert demonstrations (50, 100, or 200), with 10 random seeds, in three different environments (Zero Noise, Low Noise, or High Noise). The learned policies at different stages of training are presented below. Row 1 corresponds to 10% of training completion; row 2 to 50%; and row 3 to 100%. Column 1 corresponds to the Zero Noise environment; column 2 to the Low Noise environment; and column 3 to the High Noise environment, where noise is applied to both the observations and the actions.
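
For reference, the full sweep can be expressed as a grid over these factors (540 runs in total). The method list and the `train` stub below are placeholders for illustration, not our actual training script.

    # Hypothetical sketch of the evaluation grid described above.
    from itertools import product

    def train(method, num_demos, noise, seed):
        """Placeholder for the actual training entry point."""
        ...

    methods = ["BC", "IBC", "DAgger", "GAIL", "ACT", "Diffusion"]
    demo_counts = [50, 100, 200]
    noise_levels = ["zero", "low", "high"]  # applied to observations and actions
    seeds = range(10)

    # 6 methods x 3 demo counts x 3 noise levels x 10 seeds = 540 training runs
    for method, n_demos, noise, seed in product(methods, demo_counts, noise_levels, seeds):
        train(method=method, num_demos=n_demos, noise=noise, seed=seed)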

It can be seen that vanilla Behavioral Cloning (BC), Implicit Behavioral Cloning (IBC), and DAgger struggle in terms of training stability, with relatively inconsistent performance over time. However, in the default environment (Zero Noise with 200 expert demonstrations), their performance is often satisfactory when choosing the best policy out of 10 evenly-spaced checkpoints. On the other hand, ACT and Diffusion exhibit stable behavior throughout the training process and, comparatively, obtain good policy parameters quickly. Notably, the GAIL policies exhibit strong performance toward the end of training, in all environments, once the discriminator and the generator (policy) reach a stable state. However, the amount of training time required for this to happen is significantly higher than for the other methods.


Related Material

Our codebase inherits features from IRL Control, which supports:

  • Low-level Controllers including Operational Space Control and Admittance Control.
  • Demo Collection with PS Move controllers and a 3Dconnexion SpaceMouse for teleoperation.
  • Configuration Files for tuning PID Gains, min/max velocities, and adding kinematic descriptions of the robot devices.

As our work focuses on learning in the operational/task space, we extend IRL Control to implicitly control the robot's torso through the nullspace. Thus, the action space can be characterized using only the change in position/rotation of the two end effectors, as illustrated in the sketch below. Some of the key features available are described in the following sections.
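
The layout below shows one plausible shape of such a task-space action vector; the function name, ordering, and rotation parameterization are our own assumptions for illustration, not the codebase's actual interface.

    # Illustrative layout of a task-space action for two end effectors.
    # Names and the axis-angle rotation convention are assumptions.
    import numpy as np

    def make_action(d_pos_left, d_rot_left, d_pos_right, d_rot_right):
        """Concatenate per-arm end-effector deltas into one 12-D action."""
        return np.concatenate([
            d_pos_left,   # (3,) change in left end-effector position
            d_rot_left,   # (3,) change in left end-effector orientation (e.g., axis-angle)
            d_pos_right,  # (3,) change in right end-effector position
            d_rot_right,  # (3,) change in right end-effector orientation
        ])

    # The torso never appears in the action: the controller resolves it in the nullspace.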

Data Collection and Teleoperation

The user teleoperates the robot to pick up objects in the scene. The PS Move controllers allow for opening/closing the gripper and selecting a target, which the robot moves to when the trigger is pressed. This setup can be used for collecting expert demonstrations, provided the task is not too difficult to perform via teleoperation.
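
A demo-collection loop in this style might look like the sketch below; `read_ps_move` and the environment interface are hypothetical stand-ins, not the IRL Control API.

    # Hedged sketch of teleoperated demonstration recording.
    # read_ps_move() and the env interface are illustrative assumptions.
    def collect_demo(env, read_ps_move):
        obs = env.reset()
        trajectory = []
        done = False
        while not done:
            ctrl = read_ps_move()                   # poll the controller state
            action = {
                "target_pose": ctrl.pose,           # pose to move to while trigger is held
                "move": ctrl.trigger_pressed,
                "gripper_closed": ctrl.button_held, # open/close the gripper
            }
            trajectory.append((obs, action))        # store an (observation, action) pair
            obs, reward, done, info = env.step(action)
        return trajectory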

Force/Torque Sensing

This example makes use of the robot's force/torque sensor to demonstrate how the admittance controller reacts to the external environment. The left arm is commanded to a target position such that, upon hitting the wall, the arm gently bounces backward instead of pushing against the wall (with a larger force) to reach the desired position.
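
The behavior follows the standard second-order admittance relation M*ddx + D*dx + K*x = f_ext, which the sketch below integrates to produce a compliant offset added to the nominal target. The gains and integration scheme are illustrative, not the values used by the controller.

    # Minimal admittance-control sketch (standard second-order formulation).
    # Gains, dt, and the 3-DoF translational scope are assumptions.
    import numpy as np

    class Admittance:
        def __init__(self, M=2.0, D=20.0, K=50.0, dt=0.002):
            self.M, self.D, self.K, self.dt = M, D, K, dt
            self.dx = np.zeros(3)  # velocity of the compliant offset
            self.x = np.zeros(3)   # position offset added to the nominal target

        def step(self, f_ext):
            # Solve M*ddx + D*dx + K*x = f_ext for ddx, then integrate.
            ddx = (f_ext - self.D * self.dx - self.K * self.x) / self.M
            self.dx += ddx * self.dt
            self.x += self.dx * self.dt
            return self.x  # offset that lets the arm yield to contact forces

    # target_compliant = target_nominal + admittance.step(force_sensor_reading)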

Single Arm Insertion Tasks

The robot performs adapter insertion, with each component spawned at a randomly generated angle and position. The robot must precisely align and insert the components within a small tolerance while executing a sequence of actions defined by a YAML file. This file defines instructions that can be executed on either arm, as well as how much force the gripper should apply at each stage of the task, the velocity constraints, and the objects involved in each action (containing information on the proper location and orientation for picking/inserting). An illustrative snippet of such a sequence is shown below.
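
The snippet below suggests one plausible shape for such a file; the keys and values are our own assumptions for illustration, not the exact schema shipped with the codebase.

    # Hypothetical action-sequence config in the spirit described above.
    import yaml

    sequence_yaml = """
    sequence:
      - arm: right
        action: pick
        object: male_adapter     # object entry carries the grasp pose information
        gripper_force: 10.0      # force the gripper applies at this stage
        max_velocity: 0.15       # velocity constraint for this motion
      - arm: right
        action: insert
        object: female_adapter
        gripper_force: 15.0
        max_velocity: 0.05       # slower for the high-precision insertion
    """

    for step in yaml.safe_load(sequence_yaml)["sequence"]:
        print(step["arm"], step["action"], step["object"])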

PID Gain Tuning

This example evaluates the PID gains to ensure that the torso (which is explicitly controlled in this scenario) and the passive arm remain stable under high torques from the moving arm. The right arm moves vigorously back and forth in order to apply a high torque to the rest of the robot's body. The torso and left arm remain stable under these forces, suggesting that the chosen gains are adequate for tasks requiring high-speed arm movements.
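
For completeness, the sketch below is the textbook PID law that such gains parameterize; the gain values and interface are placeholders, not the ones tuned for the robot.

    # Textbook PID control law, included to make "gain tuning" concrete.
    # Gain values and dt are placeholders, not the robot's settings.
    class PID:
        def __init__(self, kp=100.0, ki=1.0, kd=10.0, dt=0.002):
            self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
            self.integral = 0.0
            self.prev_err = 0.0

        def update(self, target, measured):
            err = target - measured
            self.integral += err * self.dt           # accumulated error term
            deriv = (err - self.prev_err) / self.dt  # error rate term
            self.prev_err = err
            return self.kp * err + self.ki * self.integral + self.kd * deriv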

BibTeX

@article{drolet2024,
  author  = {Drolet, Michael and Stepputtis, Simon and Kailas, Siva and Jain, Ajinkya and Peters, Jan and Schaal, Stefan and Ben Amor, Heni},
  title   = {A Comparison of Imitation Learning Algorithms for Bimanual Manipulation},
  journal = {IEEE Robotics and Automation Letters (RA-L)},
  year    = {2024},
}