Simplified Code Examples
The simplified code examples are intended for users who are new to Deep Reinforcement Learning. Thanks to the flexible configuration, even newcomers can train a wide variety of algorithms in different environments without changing any code. This allows quick testing and experimentation, with the option to easily adjust important settings when necessary.
The following examples explain how these settings can be adjusted.
Train Agents
Run Default Code Example
To run the code example, execute:
python code_examples/simplified_code_examples/run.py
This runs the default code example: PPO [1] on the OpenAI Gym [2] environment CartPole-v0.
Train a Different Agent
There are two ways to train an agent other than the default one. Either change the default agent parameter in the overall config file cfg/conf.yaml to another agent, e.g. Soft Actor-Critic [3] (SAC), or override the default configuration with an additional command-line argument that selects the agent. For example, to train sac on the default CartPole-v0 environment:
python code_examples/simplified_code_examples/run.py agent=sac
For the list of agents you can train, see the section Available Algorithms.
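If you prefer to start runs from a Python script instead of typing the command, you can assemble the same command line and hand it to subprocess. This is a minimal sketch: the helper names build_command and launch are hypothetical and not part of PyTorchRL, and the script path is assumed to be relative to the repository root, as in the commands above.

```python
import subprocess
import sys

# Entry point of the simplified examples, relative to the repository root.
RUN_SCRIPT = "code_examples/simplified_code_examples/run.py"


def build_command(overrides):
    """Hypothetical helper: turn {"agent": "sac"} into the CLI call shown above."""
    # Use the current interpreter so the run shares this script's environment.
    cmd = [sys.executable, RUN_SCRIPT]
    # Each override becomes one "key=value" argument, in insertion order.
    cmd.extend(f"{key}={value}" for key, value in overrides.items())
    return cmd


def launch(overrides):
    """Run the example with the given overrides; raises CalledProcessError on failure."""
    return subprocess.run(build_command(overrides), check=True)


if __name__ == "__main__":
    # Equivalent to: python code_examples/simplified_code_examples/run.py agent=sac
    print(" ".join(build_command({"agent": "sac"})))
```

Calling launch({"agent": "sac"}) from the repository root would then start the same training run as the terminal command.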
Train on a Different Environment
The training environment can be changed in the same two ways as the agent: either in the default conf.yaml file or via a command-line argument. For example, to train sac on the PyBullet [4] environments:
python code_examples/simplified_code_examples/run.py agent=sac environment=pybullet
Here the default task is AntBulletEnv-v0. To change it, add the corresponding environment ID to the command. For example, to train on HalfCheetahBulletEnv-v0:
python code_examples/simplified_code_examples/run.py agent=sac environment=pybullet environment.task=HalfCheetahBulletEnv-v0
For the environments on which PyTorchRL agents can be trained, see the section Available Environments in the documentation.
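The two kinds of arguments in the last command presumably behave differently: environment=pybullet selects a whole option file from the cfg/environment directory, while environment.task=HalfCheetahBulletEnv-v0 overwrites a single field inside the selected config. The sketch below only illustrates that distinction under this assumption; split_overrides and the CONFIG_GROUPS set (taken from the directory names in the config tree) are hypothetical, not part of PyTorchRL.

```python
# Config groups, assumed from the directory names under cfg/ (see the config tree).
CONFIG_GROUPS = {"agent", "environment", "scheme"}


def split_overrides(args):
    """Hypothetical helper: separate group selections from field overrides.

    "environment=pybullet"        -> group selection (an option file in cfg/environment)
    "environment.task=<env ID>"   -> field override inside the selected config
    """
    groups, fields = {}, {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {arg!r}")
        if key in CONFIG_GROUPS:
            groups[key] = value
        else:
            fields[key] = value
    return groups, fields


if __name__ == "__main__":
    groups, fields = split_overrides([
        "agent=sac",
        "environment=pybullet",
        "environment.task=HalfCheetahBulletEnv-v0",
    ])
    print(groups)  # {'agent': 'sac', 'environment': 'pybullet'}
    print(fields)  # {'environment.task': 'HalfCheetahBulletEnv-v0'}
```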
Train On Your Custom Environment
Will be updated soon!
Advanced Training Config Changes
This section covers how to adapt, beyond the agent and the training environment, the training scheme and agent details such as the architecture or the storage.
Change Agent Details
To change the default parameters of the selected agent, look at that agent's config file to see which hyperparameters exist and what their default values are. For PPO, check:
code_examples/simplified_code_examples/cfg/agent/ppo.yaml
For example, to change the learning rate of PPO:
python code_examples/simplified_code_examples/run.py agent=ppo agent.ppo_config.lr=1.0e-2
In the same way, you can change any other hyperparameter of PPO or of the other agents in PyTorchRL.
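Conceptually, a dotted override such as agent.ppo_config.lr=1.0e-2 walks down the nested config and replaces one value. The sketch below mimics this on a plain Python dictionary; the nested layout and the placeholder default are invented for illustration (check ppo.yaml for the real keys and defaults), and parse_value is a simplified stand-in for the actual override parser. Unknown keys are rejected so that a typo fails loudly instead of being silently ignored.

```python
def parse_value(text):
    """Simplified value parsing: bool, then int, then float, else keep the string."""
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_override(cfg, override):
    """Set one 'a.b.c=value' override on a nested dict, in place."""
    path, _, raw = override.partition("=")
    *parents, leaf = path.split(".")
    node = cfg
    for key in parents:
        if key not in node:
            raise KeyError(f"unknown config key {key!r} in {path!r}")
        node = node[key]
    if leaf not in node:
        raise KeyError(f"unknown config key {leaf!r} in {path!r}")
    node[leaf] = parse_value(raw)
    return cfg


if __name__ == "__main__":
    # Hypothetical excerpt of an agent config with a placeholder default.
    cfg = {"agent": {"ppo_config": {"lr": 1.0e-4}}}
    apply_override(cfg, "agent.ppo_config.lr=1.0e-2")
    print(cfg["agent"]["ppo_config"]["lr"])  # 0.01
```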
Change Agent Actor Architecture
Like the agent hyperparameters, the architecture of the actors can be changed as well, for example by adding layers to the policy network of PPO or by switching to a recurrent policy. All parameters that can be changed are found in:
code_examples/simplified_code_examples/cfg/agent/actor
This directory contains one YAML file for off-policy algorithms such as DDPG, TD3 and SAC, and one for on-policy algorithms such as PPO. For example, to make the PPO policy a recurrent neural network:
python code_examples/simplified_code_examples/run.py agent=ppo agent.actor.recurrent_nets=True
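To compare, say, feedforward and recurrent PPO policies at several learning rates, you can generate one command per combination of settings. The sketch below only builds the command strings; sweep_commands is a hypothetical helper and the learning-rate values are arbitrary examples. If run.py is a Hydra application, as the config layout suggests, Hydra's own multirun mode (-m with comma-separated values) might be a simpler alternative, but that is an assumption not confirmed here.

```python
import itertools
import shlex

RUN_SCRIPT = "code_examples/simplified_code_examples/run.py"


def sweep_commands(base, grid):
    """Hypothetical helper: one command line per combination of grid values."""
    keys = list(grid)
    commands = []
    # itertools.product enumerates every combination, varying the last key fastest.
    for values in itertools.product(*(grid[k] for k in keys)):
        args = base + [f"{k}={v}" for k, v in zip(keys, values)]
        commands.append(shlex.join(["python", RUN_SCRIPT, *args]))
    return commands


if __name__ == "__main__":
    grid = {
        "agent.actor.recurrent_nets": [False, True],
        "agent.ppo_config.lr": ["1.0e-4", "1.0e-3"],  # example values only
    }
    for cmd in sweep_commands(["agent=ppo"], grid):
        print(cmd)
```

Each printed line can be pasted into a terminal or passed to a job scheduler.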
Change Agent Storage
Currently, changes to the storage type must be made directly in the config files (see cfg/agent/storage). This will be changed in the future!
Change Training Scheme
This section will show how to change the training scheme in order to scale up your experiments. Will be updated soon!
Config
This section shows the overall config structure, for cases where you do not want to adapt your training parameters via command-line arguments but would rather set new default parameters in the config files.
Overall Config Structure
cfg
│   README.md
│   conf.yaml
│
└───agent
│   │   ppo.yaml
│   │   ddpg.yaml
│   │   td3.yaml
│   │   sac.yaml
│   │   mpo.yaml
│   │
│   └───actor
│   │       off_policy.yaml
│   │       on_policy.yaml
│   │
│   └───storage
│           gae_buffer.yaml
│           replay_buffer.yaml
│           her_buffer.yaml
│
└───scheme
│       a3c.yaml
│       apex.yaml
│       ddppo.yaml
│       default.yaml
│       impala.yaml
│       r2d2.yaml
│       rapid.yaml
│
└───environment
        atari.yaml
        causalworld.yaml
        crafter.yaml
        gym.yaml
        mujoco.yaml
        pybullet.yaml
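Because each option appears to be a YAML file inside a group directory, the valid values for agent=, environment= or scheme= can be read off the file system. A minimal sketch, assuming the cfg layout shown above; the function name list_options is hypothetical.

```python
from pathlib import Path


def list_options(cfg_dir, group):
    """Return the sorted option names of a config group, e.g. ['ddpg', 'mpo', ...].

    Each option is assumed to be a file <cfg_dir>/<group>/<name>.yaml.
    Sub-directories such as agent/actor are skipped, since they are not
    options of the parent group. A missing group directory yields [].
    """
    group_dir = Path(cfg_dir) / group
    return sorted(p.stem for p in group_dir.glob("*.yaml") if p.is_file())


if __name__ == "__main__":
    cfg = Path("code_examples/simplified_code_examples/cfg")
    for group in ("agent", "environment", "scheme"):
        if (cfg / group).is_dir():
            print(group, list_options(cfg, group))
```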
Available Algorithms
This section lists all algorithms that can be used with the simplified code examples.
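Off-Policy Algorithms
Deep Deterministic Policy Gradient [5] (DDPG), selected in the config as ddpg
Twin Delayed Deep Deterministic Policy Gradient [6] (TD3), selected in the config as td3
Soft Actor-Critic [3] (SAC), selected in the config as sac
Maximum a Posteriori Policy Optimisation [7] (MPO), selected in the config as mpo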
On-Policy Algorithms
Proximal Policy Optimisation [1] (PPO), selected in the config as ppo
- [1] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
- [2] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. CoRR, 2016. URL: http://arxiv.org/abs/1606.01540, arXiv:1606.01540.
- [3] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. CoRR, 2018. URL: http://arxiv.org/abs/1801.01290, arXiv:1801.01290.
- [4] Erwin Coumans and Yunfei Bai. PyBullet, a Python module for physics simulation for games, robotics and machine learning. http://pybullet.org, 2016–2021.
- [5] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. 2015. URL: https://arxiv.org/abs/1509.02971, doi:10.48550/ARXIV.1509.02971.
- [6] Scott Fujimoto, Herke van Hoof, and David Meger. Addressing function approximation error in actor-critic methods. 2018. URL: https://arxiv.org/abs/1802.09477, doi:10.48550/ARXIV.1802.09477.
- [7] Abbas Abdolmaleki, Jost Tobias Springenberg, Yuval Tassa, Remi Munos, Nicolas Heess, and Martin Riedmiller. Maximum a posteriori policy optimisation. 2018. URL: https://arxiv.org/abs/1806.06920, doi:10.48550/ARXIV.1806.06920.