
Mobile Robot Reinforcement Learning with Isaac Lab

Overview

Isaac Lab is a framework developed by NVIDIA for robot learning that integrates with Isaac Sim for photorealistic simulation. The core idea is to develop "environments" using the Isaac Lab API. Each environment defines a task for a robot to learn, such as a quadruped learning to walk forward. A separate training script then defines a neural network architecture and a training function, and the environment is passed in as a parameter when that script is run.
For this project, three Isaac Lab environments were developed to train a quadcopter and a rover to navigate to a goal, given an input (x, y, z) waypoint.
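As a rough sketch of how this fits together (the task ID and module paths below are hypothetical, but the pattern of exposing environments through Gymnasium registration follows Isaac Lab's convention):

```python
import gymnasium as gym

# Hypothetical task ID and entry points; Isaac Lab environments are
# registered with Gymnasium so that training scripts can build them by name.
gym.register(
    id="Isaac-Quadcopter-Hover-v0",
    entry_point="my_project.envs:QuadcopterHoverEnv",
    disable_env_checker=True,
    kwargs={"env_cfg_entry_point": "my_project.envs:QuadcopterHoverEnvCfg"},
)

# A training script receives the task name, instantiates the environment,
# and hands it to whichever RL library (e.g. skrl or rsl_rl) drives learning.
env = gym.make("Isaac-Quadcopter-Hover-v0")
```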

Environment Creation Methodology

Figure: workflow for creating and training Isaac Lab environments (environment design workflow graphic, from the official Isaac Lab documentation).

To engineer each Isaac Lab environment, the most vital functionalities implemented were:

- Scene setup: spawning the robot, terrain, and goal assets, and cloning them across parallel simulation instances
- Action handling: mapping raw policy outputs to actuator commands
- Observation computation: assembling the sensor readings the policy consumes at each step
- Reward shaping: scoring each step's progress toward the goal waypoint
- Termination and reset logic: ending episodes on success, failure, or timeout, and re-randomizing initial states
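As a minimal sketch of where those pieces live, assuming Isaac Lab's direct-workflow `DirectRLEnv` hooks (the class below is a placeholder skeleton, not this project's actual code):

```python
import torch
from isaaclab.envs import DirectRLEnv  # older releases: omni.isaac.lab.envs


class WaypointNavEnv(DirectRLEnv):
    """Skeleton of a direct-workflow environment; hook names follow DirectRLEnv."""

    def _setup_scene(self):
        # Spawn the robot, terrain, and goal markers; clone per-environment copies.
        ...

    def _pre_physics_step(self, actions: torch.Tensor):
        # Cache the policy's raw action tensor before the physics step.
        self._actions = actions.clone()

    def _apply_action(self):
        # Convert cached actions into actuator commands
        # (e.g. rotor thrusts or wheel velocities).
        ...

    def _get_observations(self) -> dict:
        # Assemble the per-environment observation tensor the policy consumes.
        ...

    def _get_rewards(self) -> torch.Tensor:
        # Dense shaping: reward progress toward the goal waypoint.
        ...

    def _get_dones(self) -> tuple[torch.Tensor, torch.Tensor]:
        # (terminated, truncated): goal reached or crashed vs. episode timeout.
        ...

    def _reset_idx(self, env_ids):
        # Re-randomize initial pose and goal waypoint for the envs being reset.
        super()._reset_idx(env_ids)
```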

Environment Demonstrations

Iris Quadcopter Fly-To-And-Hover

Demonstration of multiple quadcopters trained to fly to and hover at a goal waypoint.

Explanation: For decision-making, the quadcopter observes its body-frame linear velocity, its body-frame angular velocity, and the distance between itself and the goal. The control policy is a simple feed-forward MLP.
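A minimal sketch of such a policy, assuming the goal term is a 3-D relative position (making a 9-dimensional observation) and one thrust command per rotor; hidden sizes and activations are illustrative:

```python
import torch
import torch.nn as nn


class HoverPolicy(nn.Module):
    # Assumed observation layout: 3 (body-frame linear velocity)
    # + 3 (body-frame angular velocity) + 3 (vector to goal) = 9.
    def __init__(self, obs_dim: int = 9, act_dim: int = 4, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, act_dim),  # e.g. one thrust command per rotor
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)
```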

Jackal Rover Grid-World Navigation

Demonstration of a Jackal rover trained to navigate to a goal landmark.

Explanation: The rover starts in a pose from which it cannot see the goal marker, so it explores the space until its camera detects the goal. The control policy is a simple feed-forward CNN-and-MLP combination that uses 3-channel RGB images and GPS readings for decision-making.
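A hedged sketch of that architecture (layer sizes, the 3-dimensional GPS reading, and the 2-dimensional velocity-command output are assumptions for illustration):

```python
import torch
import torch.nn as nn


class RoverPolicy(nn.Module):
    """A CNN encodes the RGB image; an MLP head fuses image features with GPS.
    All dimensions here are illustrative, not the project's actual values."""

    def __init__(self, gps_dim: int = 3, act_dim: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),  # -> 32 * 4 * 4 = 512 features
        )
        self.head = nn.Sequential(
            nn.Linear(512 + gps_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim),  # e.g. linear and angular velocity commands
        )

    def forward(self, rgb: torch.Tensor, gps: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(rgb)  # (N, 512) image features
        return self.head(torch.cat([feats, gps], dim=-1))
```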

Jackal Rover Variable-Height Terrain Navigation

Demonstration of a Jackal rover trained to navigate up an incline towards a goal.

Explanation: Just as in the grid world, the rover explores its space until it can see the goal waypoint. The neural network architecture and sensor configuration remain unchanged, but this control policy can navigate to (x, y, z) goals rather than just (x, y) goals.
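One way such a policy can be rewarded for full 3-D progress is a dense shaping term over the (x, y, z) offset, so elevation gained on the incline counts toward the goal; the kernel and scale below are illustrative, not the project's actual reward:

```python
import torch


def goal_progress_reward(robot_pos_w: torch.Tensor,
                         goal_pos_w: torch.Tensor,
                         scale: float = 0.5) -> torch.Tensor:
    # Reward grows as the full 3-D distance to the goal shrinks.
    dist = torch.linalg.norm(goal_pos_w - robot_pos_w, dim=-1)
    return torch.exp(-scale * dist)
```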