RLlib Integration

The RLlib integration connects the Ray/RLlib library with CARLA, making it easy to use the CARLA environment for training and inference. Ray is an open source framework that provides a simple, universal API for building distributed applications. It is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

The integration lets you set up CARLA as a Ray environment and use that environment for training and inference. It is ready to use both locally and in the cloud using AWS.

This guide outlines the requirements for running the RLlib integration locally and on AWS, the structure of the integration repository, an overview of how to use the library, and an example of how to set up a Ray experiment using CARLA as an environment.


Before you begin

  • Download the RLlib integration from GitHub or clone the repository directly:
    git clone https://github.com/carla-simulator/rllib-integration.git
  • Requirements vary depending on whether you are running locally or on AWS:

Requirements for running locally
  • Install a package version of CARLA and import the additional assets. The recommended version is CARLA 0.9.11 as the integration was designed and tested with this version. Other versions may be compatible but have not been fully tested, so use these at your own discretion.
  • Navigate into the root folder of the RLlib integration repository and install the Python requirements:
            pip3 install -r requirements.txt
  • Set an environment variable to locate the CARLA package by running the command below or add CARLA_ROOT=path/to/carla to your .bashrc file:
            export CARLA_ROOT=path/to/carla
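
Before launching an experiment, it can help to confirm that the variable is visible to Python. The following is a minimal sketch, not part of the integration: the helper name resolve_carla_root is hypothetical, and it only checks that CARLA_ROOT is set and points to an existing directory.

```python
import os
from pathlib import Path


def resolve_carla_root(env=None):
    """Return the directory named by CARLA_ROOT, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the check
    easy to try out without touching the real environment.
    """
    env = os.environ if env is None else env
    value = env.get("CARLA_ROOT")
    if not value:
        raise RuntimeError(
            "CARLA_ROOT is not set; run: export CARLA_ROOT=path/to/carla"
        )
    root = Path(value).expanduser()
    if not root.is_dir():
        raise RuntimeError(f"CARLA_ROOT points to a missing directory: {root}")
    return root
```

Calling resolve_carla_root() at the top of a script turns a missing or mistyped path into an immediate, readable error.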
Requirements for running on AWS Cloud
  • The requirements for running on AWS are taken care of automatically in an install script found in the RLlib integration repository. Find more details in the section "Running on AWS".

RLlib repository structure

The repository is divided into three directories:

  • rllib_integration contains all the infrastructure related to CARLA and how to set up the CARLA server, clients and actors. This provides the basic structure that all training and testing experiments must follow.
  • aws has the files needed to run in an AWS instance. aws_helper.py provides several functionalities that ease the management of EC2 instances, including instance creation and sending and receiving data.
  • dqn_example and the dqn_* files in the root directory provide an easy-to-understand example of how to set up a Ray experiment using CARLA as its environment.

Creating your own experiment

This section provides a general overview of how to create your own experiment. For a more specific example, see the next section "DQN example".

You will need to create at least four files:

  • The experiment class
  • The environment configuration
  • The training and inference scripts (one file each)

1. The experiment class

To use the CARLA environment you need to define a training experiment. Ray requires environments to return specific pieces of information. You can see details on the CARLA environment in rllib-integration/rllib_integration/carla_env.py.

The information required by Ray depends on your specific experiment, so all experiments should inherit from BaseExperiment. This class contains all the functions that need to be overridden for your own experiment. These are the functions related to the actions, observations and rewards of the training.
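
To illustrate the idea, here is a minimal sketch of the shape such a class might take. BaseExperimentSketch is a hypothetical stand-in defined here so that the example runs on its own; the real BaseExperiment in rllib_integration may use different method names and signatures, so check the repository before copying anything. The action table, sensor keys and reward weights are likewise made up for illustration.

```python
class BaseExperimentSketch:
    """Hypothetical stand-in for the integration's BaseExperiment."""

    def __init__(self, config):
        self.config = config

    def compute_action(self, action):
        raise NotImplementedError

    def get_observation(self, sensor_data):
        raise NotImplementedError

    def compute_reward(self, observation):
        raise NotImplementedError


class LaneKeepingExperiment(BaseExperimentSketch):
    """Toy experiment: overrides the action, observation and reward hooks."""

    # Discrete action index -> (throttle, steer); values are illustrative.
    ACTIONS = {0: (0.0, 0.0), 1: (0.5, 0.0), 2: (0.5, -0.3), 3: (0.5, 0.3)}

    def compute_action(self, action):
        # Translate the policy's discrete choice into a control command.
        throttle, steer = self.ACTIONS[int(action)]
        return {"throttle": throttle, "steer": steer}

    def get_observation(self, sensor_data):
        # Reduce raw sensor data to the features the policy sees.
        return [sensor_data["speed"], sensor_data["lane_offset"]]

    def compute_reward(self, observation):
        # Reward speed, penalize distance from the lane center.
        speed, lane_offset = observation
        return (speed * self.config["speed_weight"]
                - abs(lane_offset) * self.config["offset_weight"])
```

The point of the pattern is that the environment stays generic while each experiment supplies only the three pieces that differ: how actions are applied, what is observed, and how reward is computed.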

2. The environment configuration

The experiment should be configured through a .yaml file. Any settings passed through the configuration file will override the default settings. The locations of the different default settings are explained below.

The configuration file has three main uses:

  1. Sets up most of the CARLA server and client settings, such as timeout or map quality. See the default values here.
  2. Sets up variables specific to your experiment as well as specifying town conditions and the spawning of the ego vehicle and its sensors. The default settings are found here and provide an example of how to set up sensors.
  3. Configures settings specific to Ray's training. These settings are related to the specific trainer used. If you are using a built-in model, you can apply settings for it here.
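
The override behaviour described above can be pictured as a recursive merge of the user's settings over the defaults. The sketch below is an assumption about how such a merge could work, not the integration's actual code; the function name merge_config and every key shown are hypothetical.

```python
import copy


def merge_config(defaults, overrides):
    """Return a new dict in which values from `overrides` replace `defaults`.

    Nested dicts are merged key by key, so overriding one setting in a
    group leaves the other settings in that group at their defaults.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults and user settings, for illustration only.
DEFAULTS = {
    "carla": {"timeout": 10.0, "quality_level": "Low"},
    "experiment": {"town": "Town01"},
}
USER_SETTINGS = {"carla": {"timeout": 30.0}}

config = merge_config(DEFAULTS, USER_SETTINGS)
```

Here only the timeout changes; the remaining defaults are carried over untouched, and DEFAULTS itself is not modified.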

3. The training and inference scripts

The last step is to create your own training and inference scripts. This part is completely up to you and is dependent on the Ray API. If you want to create your own specific model, check out Ray's custom model documentation.


DQN example

This section builds upon the previous section to show a specific example of how to work with the RLlib integration using the BirdView pseudosensor and Ray's DQNTrainer.

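The DQN example consists of the dqn_example directory and the dqn_* files in the root of the repository, including the configuration file dqn_example/dqn_config.yaml and the training file dqn_train.py.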

To run the example locally:

  1. Install PyTorch:

    pip3 install -r dqn_example/dqn_requirements.txt
    
  2. Run the training file:

    python3 dqn_train.py dqn_example/dqn_config.yaml --name dqn
    

Note

The default configuration uses 1 GPU and 12 CPUs, so if your local machine doesn't have that capacity, lower the numbers in the configuration file.

If you experience out of memory problems, consider reducing the buffer_size parameter.
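
As a rough illustration of lowering the resource numbers, the sketch below caps the requested workers and GPUs at what the machine offers. It is an assumption-laden example: the keys num_workers and num_gpus are assumed to be the relevant entries in your configuration, the helper fit_to_machine is hypothetical, and reserving one CPU for the driver process is a design choice made here, not a rule from the integration.

```python
import os


def fit_to_machine(config, available_cpus=None, available_gpus=0):
    """Return a copy of `config` with resource requests capped.

    One CPU is kept free for the driver process, so at most
    `available_cpus - 1` workers are requested.
    """
    if available_cpus is None:
        available_cpus = os.cpu_count() or 1
    fitted = dict(config)
    fitted["num_workers"] = max(0, min(config["num_workers"], available_cpus - 1))
    fitted["num_gpus"] = min(config["num_gpus"], available_gpus)
    return fitted


# Example: a request for 11 workers and 1 GPU on a 4-core machine without a GPU.
local_config = fit_to_machine(
    {"num_workers": 11, "num_gpus": 1}, available_cpus=4, available_gpus=0
)
```

The same values can of course be edited by hand in the configuration file; the function only makes the arithmetic explicit.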


Running on AWS

This section explains how to use the RLlib integration to automatically run training and inference on AWS EC2 instances. To handle the scaling of instances we use the Ray autoscaler API.

Configure AWS

You will need to configure your boto3 environment correctly. See the boto3 documentation for more information.

Create the training AMI

Use the provided aws_helper.py script to automatically create the image needed for training by running the command below, passing in the name of the base image and the installation script install.sh found in rllib-integration/aws/install:

    python3 aws_helper.py create-image --name <AMI-name> --installation-scripts <installation-scripts> --instance-type <instance-type> --volume-size <volume-size>

Configure the cluster

Once the image is created, there will be an output with image information. To use the Ray autoscaler, update the <ImageId> and <SecurityGroupIds> settings in your autoscaler configuration file with the information from the output.
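
If you prefer not to edit the file by hand, the placeholders can be filled in with a plain text substitution. This is a minimal sketch under stated assumptions: the two-line template is a made-up fragment rather than the real autoscaler file, the IDs are invented, and fill_placeholders is a hypothetical helper.

```python
def fill_placeholders(template, image_id, security_group_ids):
    """Replace the <ImageId> and <SecurityGroupIds> markers in `template`."""
    groups = ", ".join(security_group_ids)
    return (template
            .replace("<ImageId>", image_id)
            .replace("<SecurityGroupIds>", groups))


# Made-up fragment and IDs, for illustration only.
TEMPLATE = "ImageId: <ImageId>\nSecurityGroupIds: [<SecurityGroupIds>]\n"

filled = fill_placeholders(
    TEMPLATE, "ami-0123456789abcdef0", ["sg-0123456789abcdef0"]
)
```

Text substitution keeps comments and formatting in the file intact, which a parse-and-dump round trip through a YAML library would not necessarily do.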

Run the training

With the image created, you can use Ray's API to run the training on the cluster:

  1. Initialize the cluster:

    ray up <autoscaler_configuration_file>
    
  2. (Optional) If the local code has been modified after the cluster initialization, run this command to update it:

    ray rsync-up <autoscaler_configuration_file> <path_to_local_folder> <path_to_remote_folder>
    
  3. Run the training:

    ray submit <autoscaler_configuration_file> <training_file>
    
  4. (Optional) Monitor the cluster status:

    ray attach <autoscaler_configuration_file>
    watch -n 1 ray status
    
  5. Shut down the cluster:

    ray down <autoscaler_configuration_file>
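
The non-optional steps, plus the optional sync, can be collected into a small dry-run helper that only builds and prints the commands. This is a sketch, not a supported tool: the function name training_commands and the file names in the example are hypothetical, and nothing is executed.

```python
import shlex


def training_commands(cluster_config, training_file, sync=None):
    """Build the ray CLI calls for one training run, in order.

    `sync` is an optional list of (local_path, remote_path) pairs that
    are pushed with `ray rsync-up` before the training is submitted.
    """
    commands = [["ray", "up", cluster_config]]
    for local_path, remote_path in (sync or []):
        commands.append(["ray", "rsync-up", cluster_config, local_path, remote_path])
    commands.append(["ray", "submit", cluster_config, training_file])
    commands.append(["ray", "down", cluster_config])
    return commands


# Dry run: print the commands instead of executing them.
for command in training_commands(
    "cluster.yaml", "train.py", sync=[("my_experiment", ".")]
):
    print(shlex.join(command))
```

Each entry is an argument list, so it could be handed to subprocess.run once you are satisfied with the printed sequence.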
    

Running the DQN example on AWS

To run the DQN example on AWS:

  1. Create the image by passing the dqn_example/dqn_autoscaler.yaml configuration to the following command:

    python3 aws_helper.py create-image --name <AMI-name> --installation-scripts install/install.sh --instance-type <instance-type> --volume-size <volume-size>
    
  2. Update the <ImageId> and <SecurityGroupIds> settings in dqn_autoscaler.yaml with the information provided by the previous command.

  3. Initialize the cluster:

    ray up dqn_example/dqn_autoscaler.yaml
    
  4. (Optional) Update remote files with local changes:

    ray rsync-up dqn_example/dqn_autoscaler.yaml dqn_example .
    ray rsync-up dqn_example/dqn_autoscaler.yaml rllib_integration .
    
  5. Run the training:

    ray submit dqn_example/dqn_autoscaler.yaml dqn_train.py -- dqn_example/dqn_config.yaml --auto
    
  6. (Optional) Monitor the cluster status:

    ray attach dqn_example/dqn_autoscaler.yaml
    watch -n 1 ray status
    
  7. Shut down the cluster:

    ray down dqn_example/dqn_autoscaler.yaml
    

This guide has outlined how to install and run the RLlib integration on AWS and on a local machine. If you have any questions or run into any issues working through the guide, feel free to post in the forum or raise an issue on GitHub.