pytorchrl.agent.actors package
Subpackages
- pytorchrl.agent.actors.distributions package
- pytorchrl.agent.actors.feature_extractors package
- Submodules
- pytorchrl.agent.actors.feature_extractors.cnn module
- pytorchrl.agent.actors.feature_extractors.dictnet module
- pytorchrl.agent.actors.feature_extractors.ensemble_layer module
- pytorchrl.agent.actors.feature_extractors.fixup_cnn module
- pytorchrl.agent.actors.feature_extractors.mlp module
- pytorchrl.agent.actors.feature_extractors.utils module
- Module contents
- pytorchrl.agent.actors.memory_networks package
- pytorchrl.agent.actors.noise package
- pytorchrl.agent.actors.reward_functions package
- pytorchrl.agent.actors.world_models package
Submodules
pytorchrl.agent.actors.base module
- class pytorchrl.agent.actors.base.Actor(device, input_space, action_space, checkpoint=None, *args)[source]
Bases: torch.nn.modules.module.Module, abc.ABC
Base class for all Actors.
- Parameters
device (torch.device) – CPU or specific GPU where class computations will take place.
input_space (gym.Space) – Environment observation space.
action_space (gym.Space) – Environment action space.
checkpoint (str) – Path to a previously trained Actor checkpoint to be loaded.
- abstract actor_initial_states(obs, *args)[source]
Returns all policy inputs required to predict the initial action.
- Parameters
obs (torch.tensor) – Initial environment observation.
- Returns
obs (torch.tensor) – Initial environment observation.
rhs (torch.tensor) – Initial recurrent hidden state.
done (torch.tensor) – Initial done tensor, indicating the environment is not done.
- abstract classmethod create_factory(device, input_space, action_space, checkpoint=None, *args)[source]
Returns a function that creates actor critic instances.
- Parameters
device (torch.device) – CPU or specific GPU where class computations will take place.
input_space (gym.Space) – Environment observation space.
action_space (gym.Space) – Environment action space.
checkpoint (str) – Path to a previously trained Actor checkpoint to be loaded.
- Returns
create_actor_instance – creates a new Actor class instance.
- Return type
func
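The factory pattern described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: ToyActor is a hypothetical stand-in for an Actor subclass, the spaces are plain strings, and the returned function is assumed to receive the device at call time (the subclass create_factory signatures in this package omit device, which suggests this, but the source does not confirm it).

```python
class ToyActor:
    """Hypothetical stand-in for an Actor subclass (not part of pytorchrl)."""

    def __init__(self, device, input_space, action_space, checkpoint=None):
        self.device = device
        self.input_space = input_space
        self.action_space = action_space
        self.checkpoint = checkpoint

    @classmethod
    def create_factory(cls, input_space, action_space, checkpoint=None):
        """Capture the configuration now, build the instance later."""

        def create_actor_instance(device):
            # Assumption: the device is supplied by whoever calls the factory.
            return cls(device, input_space, action_space, checkpoint)

        return create_actor_instance


# The same factory can build independent instances on different devices.
factory = ToyActor.create_factory("obs_space", "act_space")
actor_a = factory("cpu")
actor_b = factory("cuda:0")
```

Deferring construction this way lets each worker build its own copy of the actor on whichever device it owns.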
- abstract property is_recurrent
Returns True if the actor network is recurrent.
Size of policy recurrent hidden state
- training: bool
pytorchrl.agent.actors.model_based_planner_actor module
- class pytorchrl.agent.actors.model_based_planner_actor.ModelBasedPlannerActor(device, horizon, n_planner, input_space, action_space, algorithm_name, checkpoint=None, world_model_class=None, world_model_kwargs={})[source]
Bases: pytorchrl.agent.actors.base.Actor
Actor Planner class for MB agents.
- actor_initial_states(obs)[source]
Returns all actor inputs required to predict initial action.
- Parameters
obs (torch.tensor) – Initial environment observation.
- Returns
obs (torch.tensor) – Initial environment observation.
rhs (dict) – Initial recurrent hidden state (will contain zeroes).
done (torch.tensor) – Initial done tensor, indicating the environment is not done.
- classmethod create_factory(input_space, action_space, algorithm_name, horizon, n_planner, restart_model=None, world_model_class=None, world_model_kwargs={})[source]
Returns a function that creates actor critic instances.
- Parameters
horizon (int) – The horizon of online planning.
n_planner (int) – Number of parallel planned trajectories.
input_space (gym.Space) – Environment observation space.
action_space (gym.Space) – Environment action space.
algorithm_name (str) – Name of the RL algorithm used for learning.
restart_model (str) – Path to a previously trained Actor checkpoint to be loaded.
world_model_class (class) – PyTorch nn.Module to approximate world dynamics.
world_model_kwargs – Keyword arguments for the world model class.
- Returns
create_actor_instance – creates a new ModelBasedPlannerActor class instance.
- Return type
func
- create_world_dynamics_model(world_model_class, world_model_kwargs)[source]
Create a world model instance and define it as a class attribute under the name world_model.
- Parameters
world_model_class (class) – WorldModel class
world_model_kwargs (dict) – WorldModel class arguments
- get_prediction(obs, act, rhs, done, deterministic=False)[source]
Predict and return the next states and rewards, given the current observation and action.
- Parameters
obs (torch.tensor) – Current environment observation.
act (torch.tensor) – Action to take given obs.
rhs (dict) – Current recurrent hidden states.
done (torch.tensor) – Current done tensor, indicating if episode has finished.
deterministic (bool) – Whether to randomly sample action from predicted distribution or take the mode.
- Returns
next_states (torch.Tensor) – Next states.
rewards (torch.Tensor) – Reward prediction.
- property is_recurrent
Returns True if the actor network is recurrent.
Size of policy recurrent hidden state
- training: bool
pytorchrl.agent.actors.off_policy_actor module
- class pytorchrl.agent.actors.off_policy_actor.OffPolicyActor(device, input_space, action_space, algorithm_name, noise=None, checkpoint=None, sequence_overlap=0.5, recurrent_net=None, recurrent_net_kwargs={}, obs_feature_extractor=None, obs_feature_extractor_kwargs={}, act_feature_extractor=None, act_feature_extractor_kwargs={}, common_feature_extractor=None, common_feature_extractor_kwargs={}, num_critics=2)[source]
Bases: pytorchrl.agent.actors.base.Actor
Actor critic class for Off-Policy algorithms.
It contains a policy network (actor) to predict next actions and one or two Q networks.
- Parameters
device (torch.device) – CPU or specific GPU where class computations will take place.
input_space (gym.Space) – Environment observation space.
action_space (gym.Space) – Environment action space.
algorithm_name (str) – Name of the RL algorithm used for learning.
checkpoint (str) – Path to a previously trained Actor checkpoint to be loaded.
noise (str) – Type of exploration noise that will be added to the deterministic actions.
obs_feature_extractor (nn.Module) – PyTorch nn.Module to extract features from observation in all networks.
obs_feature_extractor_kwargs (dict) – Keyword arguments for the obs extractor network.
act_feature_extractor (nn.Module) – PyTorch nn.Module to extract features from actions in all networks.
act_feature_extractor_kwargs (dict) – Keyword arguments for the act extractor network.
common_feature_extractor (nn.Module) – PyTorch nn.Module to extract joint features from the concatenation of action and observation features.
common_feature_extractor_kwargs (dict) – Keyword arguments for the common extractor network.
recurrent_net (bool) – Whether to use RNNs as feature extractors.
sequence_overlap (float) – From 0.0 to 1.0, how much consecutive rollout sequences will overlap.
recurrent_net_kwargs (dict) – Keyword arguments for the memory network.
num_critics (int) – Number of Q networks to be instantiated.
- actor_initial_states(obs)[source]
Returns all actor inputs required to predict initial action.
- Parameters
obs (torch.tensor) – Initial environment observation.
- Returns
obs (torch.tensor) – Initial environment observation.
rhs (dict) – Initial recurrent hidden state (will contain zeroes).
done (torch.tensor) – Initial done tensor, indicating the environment is not done.
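The three return values can be pictured with a small sketch. It uses plain Python lists instead of torch tensors to stay dependency-free; the function name, the hidden size, and the dictionary keys are hypothetical and are not taken from pytorchrl.

```python
def toy_initial_states(obs, hidden_size=4, net_names=("policy_net", "q1", "q2")):
    """Return (obs, rhs, done) for a batch of initial observations.

    obs is a list with one observation (a list of floats) per environment.
    net_names and hidden_size are hypothetical; the real keys may differ.
    """
    num_envs = len(obs)
    # One zero-filled hidden state per environment, for every network.
    rhs = {
        name: [[0.0] * hidden_size for _ in range(num_envs)]
        for name in net_names
    }
    # 0.0 means "not done": no environment has finished at reset time.
    done = [[0.0] for _ in range(num_envs)]
    return obs, rhs, done


obs, rhs, done = toy_initial_states([[0.1, 0.2], [0.3, 0.4]])
```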
- burn_in_recurrent_states(data_batch)[source]
Applies a recurrent burn-in phase to data_batch as described in (https://openreview.net/pdf?id=r1lyTjAqYX). The initial B steps are used to compute on-policy recurrent hidden states. data_batch is then updated, discarding the first B steps in all tensors.
- Parameters
data_batch (dict) – data batch containing all required tensors to compute Algorithm loss.
- Returns
data_batch – Updated data batch after burn-in phase.
- Return type
dict
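The burn-in idea can be sketched in a few lines. This is a simplified illustration, not the library's code: it uses time-major Python lists instead of tensors, and toy_burn_in, step_fn, and the "rhs" output key are hypothetical names chosen for the example.

```python
def toy_burn_in(data_batch, burn_in_steps, step_fn, initial_rhs):
    """Replay the first B steps to refresh the hidden state, then drop them.

    data_batch maps names to time-major lists. step_fn(obs, rhs) -> rhs is a
    hypothetical one-step recurrent update using the current parameters.
    """
    rhs = initial_rhs
    for obs in data_batch["obs"][:burn_in_steps]:
        rhs = step_fn(obs, rhs)  # hidden state recomputed on-policy
    # Keep only the steps after the burn-in prefix, for every entry.
    trimmed = {key: values[burn_in_steps:] for key, values in data_batch.items()}
    trimmed["rhs"] = rhs
    return trimmed


batch = {"obs": [1.0, 2.0, 3.0, 4.0, 5.0], "rew": [0.0, 0.0, 1.0, 0.0, 1.0]}
# Toy recurrent update: exponential moving average of the observations.
out = toy_burn_in(
    batch,
    burn_in_steps=2,
    step_fn=lambda o, h: 0.5 * h + 0.5 * o,
    initial_rhs=0.0,
)
```

The loss is then computed on the remaining steps only, starting from a hidden state that reflects the current network weights rather than the stale one stored at collection time.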
- create_critic(name)[source]
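Create a critic Q network and define it as a class attribute under the name name. This actor defines Q networks as:
q = [obs_feature_extractor, act_feature_extractor] + common_feature_extractor + memory_net + q_prediction_layer
where obs_feature_extractor and act_feature_extractor run in parallel and their outputs are concatenated before common_feature_extractor.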
- Parameters
name (str) – Critic network name.
- classmethod create_factory(input_space, action_space, algorithm_name, noise=None, restart_model=None, sequence_overlap=0.5, recurrent_net_kwargs={}, recurrent_net=None, obs_feature_extractor=None, obs_feature_extractor_kwargs={}, act_feature_extractor=None, act_feature_extractor_kwargs={}, common_feature_extractor=<class 'pytorchrl.agent.actors.feature_extractors.mlp.MLP'>, common_feature_extractor_kwargs={}, num_critics=2)[source]
Returns a function that creates actor critic instances.
- Parameters
input_space (gym.Space) – Environment observation space.
action_space (gym.Space) – Environment action space.
algorithm_name (str) – Name of the RL algorithm used for learning.
noise (str) – Type of exploration noise that will be added to the deterministic actions.
obs_feature_extractor (nn.Module) – PyTorch nn.Module to extract features from observation in all networks.
obs_feature_extractor_kwargs (dict) – Keyword arguments for the obs extractor network.
act_feature_extractor (nn.Module) – PyTorch nn.Module to extract features from actions in all networks.
act_feature_extractor_kwargs (dict) – Keyword arguments for the act extractor network.
common_feature_extractor (nn.Module) – PyTorch nn.Module to extract joint features from the concatenation of action and observation features.
common_feature_extractor_kwargs (dict) – Keyword arguments for the common extractor network.
recurrent_net (bool) – Whether to use RNNs as feature extractors.
sequence_overlap (float) – From 0.0 to 1.0, how much consecutive rollout sequences will overlap.
recurrent_net_kwargs (dict) – Keyword arguments for the memory network.
num_critics (int) – Number of Q networks to be instantiated.
restart_model (str) – Path to a previously trained Actor checkpoint to be loaded.
- Returns
create_actor_instance – creates a new OffPolicyActor class instance.
- Return type
func
- create_policy(name)[source]
Create a policy network and define it as class attribute under the name name. This actor defines policy network as:
policy = obs_feature_extractor + common_feature_extractor + memory_net + action distribution
- Parameters
name (str) – Policy network name.
- evaluate_actions(obs, rhs, done, action)[source]
Evaluate the log likelihood of action given obs and the current policy network. Also returns the entropy of the predicted action distribution.
- Parameters
obs (torch.tensor) – Environment observation.
rhs (dict) – Recurrent hidden states.
done (torch.tensor) – Done tensor, indicating if episode has finished.
action (torch.tensor) – Evaluated action.
- Returns
logp_action (torch.tensor) – Log probability of action according to the action distribution predicted with current version of the policy_net.
entropy_dist (torch.tensor) – Entropy of the action distribution predicted with current version of the policy_net.
dist (torch.Distribution) – Predicted probability distribution over next action.
- get_action(obs, rhs, done, deterministic=False)[source]
Predict and return next action, along with other information.
- Parameters
obs (torch.tensor) – Current environment observation.
rhs (dict) – Current recurrent hidden states.
done (torch.tensor) – Current done tensor, indicating if episode has finished.
deterministic (bool) – Whether to randomly sample action from predicted distribution or take the mode.
- Returns
action (torch.tensor) – Next action sampled.
clipped_action (torch.tensor) – Next action sampled, but clipped to be within the env action space.
logp_action (torch.tensor) – Log probability of action within the predicted action distribution.
rhs (dict) – Updated recurrent hidden states.
entropy_dist (torch.tensor) – Entropy of the predicted action distribution.
dist (torch.Distribution) – Predicted probability distribution over next action.
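The difference between action and clipped_action can be illustrated with a short sketch. It assumes a Box-like action space with per-dimension low and high bounds and uses plain lists; how pytorchrl performs the clipping internally is not shown in this reference, so clip_action is a hypothetical helper.

```python
def clip_action(action, low, high):
    """Clamp each action component to its [low, high] bound."""
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, low, high)]


# The raw sample may fall outside the bounds; the clipped copy never does.
action = [1.7, -0.2, -3.0]
clipped_action = clip_action(action, low=[-1.0, -1.0, -1.0], high=[1.0, 1.0, 1.0])
```

The unclipped action is typically the one stored for learning, since its log probability matches the predicted distribution, while the clipped one is what the environment receives.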
- get_q_scores(obs, rhs, done, actions=None)[source]
Return Q scores of the given observations and actions.
- Parameters
obs (torch.tensor) – Environment observation.
rhs (dict) – Current recurrent hidden states.
done (torch.tensor) – Current done tensor, indicating if episode has finished.
actions (torch.tensor) – Evaluated actions.
- Returns
output – Dict containing value prediction from each critic under keys “q1”, “q2”, etc as well as the recurrent hidden states under the key “rhs”.
- Return type
dict
- property is_recurrent
Returns True if the actor network is recurrent.
Size of policy recurrent hidden state
- training: bool
pytorchrl.agent.actors.on_policy_actor module
- class pytorchrl.agent.actors.on_policy_actor.OnPolicyActor(device, input_space, action_space, algorithm_name, checkpoint=None, recurrent_net=None, recurrent_net_kwargs={}, feature_extractor_network=None, feature_extractor_kwargs={}, shared_policy_value_network=True)[source]
Bases: pytorchrl.agent.actors.base.Actor
Actor critic class for On-Policy algorithms.
It contains a policy network to predict next actions and a critic value network to predict the value score of a given obs.
- Parameters
device (torch.device) – CPU or specific GPU where class computations will take place.
input_space (gym.Space) – Environment observation space.
action_space (gym.Space) – Environment action space.
algorithm_name (str) – Name of the RL algorithm used for learning.
checkpoint (str) – Path to a previously trained Actor checkpoint to be loaded.
recurrent_net (bool) – Whether to use RNNs on top of the feature extractors.
recurrent_net_kwargs – Keyword arguments for the memory network.
feature_extractor_network (nn.Module) – PyTorch nn.Module used as the features extraction block in all networks.
feature_extractor_kwargs (dict) – Keyword arguments for the feature extractor network.
shared_policy_value_network (bool) – Whether or not to share weights between policy and value networks.
- actor_initial_states(obs)[source]
Returns all actor inputs required to predict initial action.
- Parameters
obs (torch.tensor) – Initial environment observation.
- Returns
obs (torch.tensor) – Initial environment observation.
rhs (dict) – Initial recurrent hidden states.
done (torch.tensor) – Initial done tensor, indicating the environment is not done.
- create_critic(name)[source]
Create a critic value network and define it as a class attribute under the name name. This actor defines value networks as:
value = obs_feature_extractor + memory_net + v_prediction_layer
and defines the shared policy-value network as:
value = obs_feature_extractor + memory_net + [action_distribution, v_prediction_layer]
where action_distribution and v_prediction_layer are two parallel output heads on top of the shared feature extractor and memory network.
- Parameters
name (str) – Critic network name.
- classmethod create_factory(input_space, action_space, algorithm_name, restart_model=None, recurrent_net=None, recurrent_net_kwargs={}, feature_extractor_kwargs={}, feature_extractor_network=None, shared_policy_value_network=True)[source]
Returns a function that creates actor critic instances.
- Parameters
input_space (gym.Space) – Environment observation space.
action_space (gym.Space) – Environment action space.
algorithm_name (str) – Name of the RL algorithm used for learning.
restart_model (str) – Path to a previously trained Actor checkpoint to be loaded.
feature_extractor_network (nn.Module) – PyTorch nn.Module used as the features extraction block in all networks.
feature_extractor_kwargs (dict) – Keyword arguments for the feature extractor network.
recurrent_net (nn.Module) – PyTorch nn.Module to use after the feature extractors.
recurrent_net_kwargs – Keyword arguments for the memory network.
shared_policy_value_network (bool) – Whether or not to share weights between policy and value networks.
- Returns
create_actor_critic_instance – creates a new OnPolicyActor class instance.
- Return type
func
- create_policy(name)[source]
Create a policy network and define it as class attribute under the name name. This actor defines policy network as:
policy = obs_feature_extractor + memory_net + action_distribution
- Parameters
name (str) – Policy network name.
- evaluate_actions(obs, rhs, done, action)[source]
Evaluate the log likelihood of action given obs and the current policy network. Also returns the entropy of the predicted action distribution.
- Parameters
obs (torch.tensor) – Environment observation.
rhs (dict) – Recurrent hidden states.
done (torch.tensor) – Done tensor, indicating if episode has finished.
action (torch.tensor) – Evaluated action.
- Returns
logp_action (torch.tensor) – Log probability of action according to the action distribution predicted with current version of the policy_net.
entropy_dist (torch.tensor) – Entropy of the action distribution predicted with current version of the policy_net.
dist (torch.Distribution) – Predicted probability distribution over next action.
- get_action(obs, rhs, done, deterministic=False)[source]
Predict and return next action, along with other information.
- Parameters
obs (torch.tensor) – Current environment observation.
rhs (dict) – Current recurrent hidden states.
done (torch.tensor) – Current done tensor, indicating if episode has finished.
deterministic (bool) – Whether to randomly sample action from predicted distribution or take the mode.
- Returns
action (torch.tensor) – Next action sampled.
clipped_action (torch.tensor) – Next action sampled, but clipped to be within the env action space.
logp_action (torch.tensor) – Log probability of action within the predicted action distribution.
rhs (dict) – Updated recurrent hidden states.
entropy_dist (torch.tensor) – Entropy of the predicted action distribution.
dist (torch.Distribution) – Predicted probability distribution over next action.
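The effect of the deterministic flag can be sketched for a categorical distribution. This is a dependency-free illustration using the standard library rather than torch distributions; pick_action is a hypothetical helper, not part of pytorchrl.

```python
import random


def pick_action(probs, deterministic=False, rng=random):
    """Return an action index from a categorical distribution.

    deterministic=True returns the mode (most probable index);
    otherwise the index is sampled according to probs.
    """
    indices = range(len(probs))
    if deterministic:
        return max(indices, key=probs.__getitem__)
    return rng.choices(indices, weights=probs, k=1)[0]


probs = [0.1, 0.7, 0.2]
mode_action = pick_action(probs, deterministic=True)
sampled_action = pick_action(probs, rng=random.Random(0))
```

Sampling is what drives exploration during training; taking the mode is the usual choice when evaluating a trained policy.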
- get_value(obs, rhs, done)[source]
Return all value scores of given observation.
- Parameters
obs (torch.tensor) – Environment observation.
rhs (dict) – Recurrent hidden states.
done (torch.tensor) – Done tensor, indicating if episode has finished.
- Returns
output – Dict containing value prediction from each critic under keys “value_net1”, “value_net2”, etc as well as the recurrent hidden states under the key “rhs”.
- Return type
dict
- get_value_specific_net(obs, rhs, done, value_net_name)[source]
Return value score for a single value network.
- Parameters
obs (torch.tensor) – Environment observation.
rhs (dict) – Recurrent hidden states.
done (torch.tensor) – Done tensor, indicating if episode has finished.
value_net_name (str) – Name of the value network to evaluate.
- Returns
value (torch.tensor) – Predicted value score.
rhs (dict) – Updated recurrent hidden states.
- property is_recurrent
Returns True if the actor network is recurrent.
Size of policy recurrent hidden state
- training: bool
pytorchrl.agent.actors.utils module
- class pytorchrl.agent.actors.utils.Scale(space)[source]
Bases: torch.nn.modules.module.Module
Maps inputs from [space.low, space.high] range to [-1, 1] range.
- Parameters
space (gym.Space) – Space to map from.
- low
Lower bound for unscaled Space.
- Type
torch.tensor
- high
Upper bound for unscaled Space.
- Type
torch.tensor
- forward(x)[source]
Maps x from [space.low, space.high] to [-1, 1].
- Parameters
x (torch.tensor) – Input to be scaled
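One affine map consistent with this description is y = 2 * (x - low) / (high - low) - 1. The sketch below applies it elementwise to plain lists; the exact expression used inside the library is not shown in this reference, so treat the formula as an assumption.

```python
def scale(x, low, high):
    """Map each x[i] from [low[i], high[i]] to [-1, 1] (assumed affine form)."""
    return [
        2.0 * (xi - lo) / (hi - lo) - 1.0
        for xi, lo, hi in zip(x, low, high)
    ]


# The lower bound maps to -1, the midpoint to 0, the upper bound to 1.
scaled = scale([0.0, 5.0, 10.0], low=[0.0, 0.0, 0.0], high=[10.0, 10.0, 10.0])
```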
- training: bool
- class pytorchrl.agent.actors.utils.Unscale(space)[source]
Bases: torch.nn.modules.module.Module
Maps inputs from [-1, 1] range to [space.low, space.high] range.
- Parameters
space (gym.Space) – Space to map to.
- low
Lower bound for unscaled Space.
- Type
torch.tensor
- high
Upper bound for unscaled Space.
- Type
torch.tensor
- forward(x)[source]
Maps x from [-1, 1] to [space.low, space.high].
- Parameters
x (torch.tensor) – Input to be unscaled
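One affine map consistent with this description is x = low + (y + 1) / 2 * (high - low). The sketch below applies it elementwise to plain lists; the exact expression used inside the library is not shown in this reference, so treat the formula as an assumption.

```python
def unscale(y, low, high):
    """Map each y[i] from [-1, 1] to [low[i], high[i]] (assumed affine form)."""
    return [
        lo + 0.5 * (yi + 1.0) * (hi - lo)
        for yi, lo, hi in zip(y, low, high)
    ]


# -1 maps to the lower bound, 0 to the midpoint, 1 to the upper bound.
unscaled = unscale([-1.0, 0.0, 1.0], low=[0.0, -2.0, 3.0], high=[10.0, 2.0, 7.0])
```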
- training: bool
- pytorchrl.agent.actors.utils.init(module, weight_init, bias_init, gain=1)[source]
- Parameters
module (nn.Module) – nn.Module to initialize.
weight_init (func) – Function to initialize module weights.
bias_init (func) – Function to initialize module biases.
- Returns
module – Initialized module
- Return type
nn.Module
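The reference gives no description of how init uses its arguments. The sketch below shows one common pattern in PyTorch reinforcement learning code, under the assumption that weight_init receives the weight data plus gain and bias_init receives the bias data. It is duck-typed and exercised with a hypothetical stand-in object so that it runs without torch.

```python
from types import SimpleNamespace


def init(module, weight_init, bias_init, gain=1):
    """Apply both initializers in place and return the module (assumed behavior)."""
    weight_init(module.weight.data, gain=gain)
    bias_init(module.bias.data)
    return module


def constant_weights(data, gain=1):
    """Toy weight initializer: fill with 0.5 * gain."""
    for i in range(len(data)):
        data[i] = 0.5 * gain


def zero_bias(data):
    """Toy bias initializer: fill with zeros."""
    for i in range(len(data)):
        data[i] = 0.0


# Hypothetical stand-in for an nn.Module with weight and bias parameters.
layer = SimpleNamespace(
    weight=SimpleNamespace(data=[None] * 3),
    bias=SimpleNamespace(data=[None] * 2),
)
layer = init(layer, constant_weights, zero_bias, gain=2)
```

With real modules, the initializers would typically be functions from torch.nn.init, for example nn.init.orthogonal_ for the weights and a small wrapper around nn.init.constant_ for the biases.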