pytorchrl
stable


Agent Components

  • Actors
    • Off-Policy Actor
    • On-Policy Actor
    • Model-Based Actor
    • Actor Components
      • Action Probability Distributions
      • Feature Extractors
      • Memory Networks
      • Noise
      • Reward Functions
      • World Models
  • Algorithms
    • Off-policy
      • Double Deep Q-Learning (DDQN)
      • Deep Deterministic Policy Gradient (DDPG)
      • Twin Delayed Deep Deterministic Policy Gradient (TD3)
      • Soft Actor Critic (SAC)
      • Maximum a Posteriori Policy Optimization (MPO)
    • On-policy
      • Advantage Actor Critic (A2C)
      • Proximal Policy Optimization (PPO)
      • Proximal Policy Optimization (PPO) with Random Network Distillation (RND)
    • Model-Based
      • Model Predictive Control (MPC) Random Shooting (RS)
      • Model Predictive Control (MPC) Cross-Entropy Method (CEM)
      • Model Predictive Control (MPC) Planning with Deep Dynamics Models (PDDM)
  • Available Environments
  • Environment Vectors
    • VecEnv
  • Storage
    • Off-policy
      • Replay Buffer
      • N-Step Replay Buffer
      • Prioritized Experience Replay Buffer
      • Emphasizing Recent Experience Replay Buffer (ERE)
      • Hindsight Experience Replay Buffer (HER)
    • On-policy
      • Vanilla On-Policy Buffer
      • Generalized Advantage Estimator (GAE) Buffer
      • V-trace Buffer
      • Proximal Policy Optimization with Demonstrations Buffer (PPOD)
    • Model-Based
      • Model-Based Replay Buffer

© Copyright 2020, pytorchrl. Revision ec0c319c.