Reward Functions

Gym CartPole

gym_reward_functions.cartpole(action: torch.Tensor, next_state: torch.Tensor) → torch.Tensor

Reward function based on: https://arxiv.org/pdf/1907.02057.pdf

reward = cos(θ_t) - 0.01x²
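
The CartPole formula above can be illustrated for a single sample in plain Python. This is only a sketch, not the library's implementation: the name `cartpole_reward_sketch` is hypothetical, the real function takes batched `torch.Tensor` inputs, and the index layout `[x, x_dot, theta, theta_dot]` is assumed from the standard Gym CartPole observation.

```python
import math


def cartpole_reward_sketch(next_state):
    """Scalar illustration of reward = cos(theta) - 0.01 * x**2.

    Assumes the Gym CartPole layout [x, x_dot, theta, theta_dot];
    this layout is an assumption, not stated in the docs above.
    """
    x, _x_dot, theta, _theta_dot = next_state
    return math.cos(theta) - 0.01 * x ** 2


# Pole upright with the cart at the origin: the cosine term is at its maximum.
upright = cartpole_reward_sketch([0.0, 0.0, 0.0, 0.0])

# Pole horizontal with the cart one unit off-centre: only the position penalty remains.
tilted = cartpole_reward_sketch([1.0, 0.0, math.pi / 2, 0.0])
```

The cosine term rewards keeping the pole upright, while the quadratic term penalises the cart for drifting away from the centre of the track.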

Gym Pendulum

gym_reward_functions.pendulum(action: torch.Tensor, next_state: torch.Tensor) → torch.Tensor

MuJoCo Inverted Pendulum

gym_reward_functions.inverted_pendulum_mujoco(action: torch.Tensor, next_state: torch.Tensor) → torch.Tensor

Env info: https://github.com/openai/gym/blob/master/gym/envs/mujoco/inverted_pendulum.py

Reward function based on: https://arxiv.org/pdf/1907.02057.pdf

reward = - theta², where theta = state[1]
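
As a rough illustration of the formula above, the reward for a single state can be written in plain Python. The function name `inverted_pendulum_reward_sketch` is hypothetical, and the actual library function operates on batched `torch.Tensor` inputs rather than Python lists.

```python
def inverted_pendulum_reward_sketch(next_state):
    """Scalar illustration of reward = -theta**2, with theta = state[1]."""
    theta = next_state[1]
    return -(theta ** 2)


# A perfectly vertical pole (theta = 0) gets the maximum reward of zero.
balanced = inverted_pendulum_reward_sketch([0.0, 0.0, 0.0, 0.0])

# Any deviation, in either direction, is penalised quadratically.
leaning = inverted_pendulum_reward_sketch([0.0, 0.1, 0.0, 0.0])
```

Because the reward depends only on the squared angle, it is never positive and is symmetric in the direction of the lean.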

MuJoCo HalfCheetah

gym_reward_functions.halfcheetah_mujoco(action: torch.Tensor, next_state: torch.Tensor) → torch.Tensor

The first 8 values in the state are position data. The other 9 are velocities, with the positional velocities (x, y, z) first and the angular velocities after them, so index 8 is the x velocity.

PyBullet HalfCheetah

pybullet_reward_functions.halfcheetah_bullet(action: torch.Tensor, next_state: torch.Tensor) → torch.Tensor

In HalfCheetahBulletEnv-v0, the velocity is at index 3 of the state: https://github.com/bulletphysics/bullet3/blob/478da7469a34074aa051e8720734287ca371fd3e/examples/pybullet/gym/pybullet_envs/robot_locomotors.py#L64