Action probability distributions
Categorical
- class pytorchrl.agent.actors.distributions.categorical.Categorical(num_inputs, num_outputs)[source]
Bases: torch.nn.modules.module.Module
Categorical probability distribution.
- Parameters
num_inputs (int) – Size of input feature maps.
num_outputs (int) – Number of options in output space.
- evaluate_pred(x, pred)[source]
Return the log probability of pred under the distribution generated from x (observation features), together with the entropy of that distribution.
- Parameters
x (torch.tensor) – Observation feature map obtained from a policy_net.
pred (torch.tensor) – Prediction to evaluate.
- Returns
logp (torch.tensor) – Log probability of pred according to the predicted distribution.
entropy_dist (torch.tensor) – Entropy of the predicted distribution.
dist (torch.Distribution) – Action probability distribution.
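The computation described above can be sketched with plain torch.distributions. This is a minimal illustration rather than the class's actual implementation: the nn.Linear layer standing in for the logits head, the tensor sizes, and the variable names are assumptions made for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

num_inputs, num_outputs = 8, 4                 # hypothetical sizes
linear = nn.Linear(num_inputs, num_outputs)    # assumed stand-in for the layer mapping features to logits

x = torch.randn(5, num_inputs)                 # batch of 5 feature vectors
pred = torch.randint(0, num_outputs, (5,))     # discrete actions to evaluate

# Build the distribution from the features, then score the given actions.
dist = torch.distributions.Categorical(logits=linear(x))
logp = dist.log_prob(pred)                     # shape (5,)
entropy_dist = dist.entropy()                  # shape (5,)
```

The entropy of a categorical distribution over 4 options is at most log(4), which is a quick sanity check on entropy_dist.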
- forward(x, deterministic=False)[source]
Predict distribution parameters from x (observation features) and return the prediction (raw and clipped), its log probability, and the entropy of the distribution.
- Parameters
x (torch.tensor) – Feature maps extracted from environment observations.
deterministic (bool) – If True, take the mode of the predicted distribution; if False, sample from it.
- Returns
pred (torch.tensor) – Predicted value.
clipped_pred (torch.tensor) – Predicted value (clipped to be within [-1, 1] range).
logp (torch.tensor) – Log probability of pred according to the predicted distribution.
entropy_dist (torch.tensor) – Entropy of the predicted distribution.
dist (torch.Distribution) – Action probability distribution.
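The effect of the deterministic flag can be illustrated as follows. This is a hedged sketch using torch.distributions.Categorical directly; the select helper and the example logits are hypothetical and not part of the library.

```python
import torch

torch.manual_seed(0)

# Two example rows of logits over three options (made-up values).
logits = torch.tensor([[0.1, 2.0, -1.0],
                       [1.5, 0.0, 0.3]])
dist = torch.distributions.Categorical(logits=logits)

def select(dist, deterministic=False):
    """Return the mode if deterministic, otherwise a random sample."""
    if deterministic:
        return dist.probs.argmax(dim=-1)
    return dist.sample()

mode_pred = select(dist, deterministic=True)   # tensor([1, 0])
sampled_pred = select(dist)                    # varies with the random seed
logp = dist.log_prob(sampled_pred)             # log probability of the sampled actions
```

Taking the mode is typically used at evaluation time, while sampling keeps exploration during training.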
- training: bool
Gaussian
- class pytorchrl.agent.actors.distributions.gaussian.DiagGaussian(num_inputs, num_outputs, predict_log_std=False)[source]
Bases: torch.nn.modules.module.Module
Isotropic Gaussian probability distribution.
- Parameters
num_inputs (int) – Size of input feature maps.
num_outputs (int) – Number of dims in output space.
predict_log_std (bool) – Whether to use an nn.Linear layer to predict the output standard deviation.
- evaluate_pred(x, pred)[source]
Return the log probability of pred under the distribution generated from x (observation features), together with the entropy of that distribution.
- Parameters
x (torch.tensor) – Observation feature map obtained from a policy_net.
pred (torch.tensor) – Prediction to evaluate.
- Returns
logp (torch.tensor) – Log probability of pred according to the predicted distribution.
entropy_dist (torch.tensor) – Entropy of the predicted distribution.
dist (torch.Distribution) – Action probability distribution.
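For a Gaussian with independent dimensions, the quantities above can be sketched with torch.distributions.Normal. The example assumes a state-independent log standard deviation (one common choice when the standard deviation is not predicted from x) and assumes per-dimension terms are summed over the action dimensions; neither detail is confirmed by this reference, and mean_layer and log_std are hypothetical names.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

num_inputs, num_outputs = 8, 3                      # hypothetical sizes
mean_layer = nn.Linear(num_inputs, num_outputs)     # assumed layer predicting the mean
log_std = nn.Parameter(torch.zeros(num_outputs))    # assumed state-independent log std (std = 1)

x = torch.randn(5, num_inputs)                      # batch of 5 feature vectors
pred = torch.randn(5, num_outputs)                  # continuous actions to evaluate

dist = torch.distributions.Normal(mean_layer(x), log_std.exp())
logp = dist.log_prob(pred).sum(dim=-1)              # joint log probability, shape (5,)
entropy_dist = dist.entropy().sum(dim=-1)           # joint entropy, shape (5,)

# Closed form for std = 1: each dimension contributes 0.5 * (1 + log(2 * pi)).
expected_entropy = num_outputs * 0.5 * (1 + math.log(2 * math.pi))
```

Because the dimensions are independent, the joint log probability and entropy are the sums of the per-dimension values.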
- forward(x, deterministic=False)[source]
Predict distribution parameters from x (observation features) and return the predicted value (raw and clipped), its log probability, and the entropy of the distribution.
- Parameters
x (torch.tensor) – Feature maps extracted from environment observations.
deterministic (bool) – If True, take the mode of the predicted distribution; if False, sample from it.
- Returns
pred (torch.tensor) – Predicted value.
clipped_pred (torch.tensor) – Predicted value (clipped to be within [-1, 1] range).
logp (torch.tensor) – Log probability of pred according to the predicted distribution.
entropy_dist (torch.tensor) – Entropy of the predicted distribution.
dist (torch.Distribution) – Action probability distribution.
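The relationship between pred and clipped_pred can be sketched as below. This is an illustration under assumptions, not the library's code: the predict helper, the example mean and standard deviation, and the choice to compute the log probability on the unclipped value are all hypothetical.

```python
import torch

torch.manual_seed(0)

# Made-up distribution parameters for a batch of 2 two-dimensional actions.
mean = torch.tensor([[0.2, -1.7],
                     [3.0, 0.0]])
std = torch.full_like(mean, 0.5)
dist = torch.distributions.Normal(mean, std)

def predict(dist, deterministic=False):
    """Pick the mode (the mean of a Gaussian) or a sample, then clip to [-1, 1]."""
    pred = dist.mean if deterministic else dist.sample()
    clipped_pred = torch.clamp(pred, -1.0, 1.0)
    logp = dist.log_prob(pred).sum(dim=-1, keepdim=True)  # scored on the unclipped value (assumption)
    return pred, clipped_pred, logp

pred, clipped_pred, logp = predict(dist, deterministic=True)
# pred equals mean; clipped_pred is [[0.2, -1.0], [1.0, 0.0]]
```

Returning both values lets the environment receive a bounded action while the learner keeps the value whose log probability was actually computed.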
- training: bool
Squashed Gaussian
- class pytorchrl.agent.actors.distributions.squashed_gaussian.SquashedGaussian(num_inputs, num_outputs, predict_log_std=True)[source]
Bases: torch.nn.modules.module.Module
Squashed Gaussian probability distribution.
- Parameters
num_inputs (int) – Size of input feature maps.
num_outputs (int) – Number of dims in output space.
predict_log_std (bool) – Whether to use an nn.Linear layer to predict the output standard deviation.
- evaluate_pred(x, pred)[source]
Return the log probability of pred under the distribution generated from x (observation features), together with the entropy of that distribution.
- Parameters
x (torch.tensor) – Observation feature map obtained from a policy_net.
pred (torch.tensor) – Prediction to evaluate.
- Returns
logp (torch.tensor) – Log probability of pred according to the predicted distribution.
entropy_dist (torch.tensor) – Entropy of the predicted distribution.
dist (torch.Distribution) – Action probability distribution.
- forward(x, deterministic=False)[source]
Predict distribution parameters from x (observation features) and return the predicted value (raw and clipped), its log probability, and the entropy of the distribution.
- Parameters
x (torch.tensor) – Feature maps extracted from environment observations.
deterministic (bool) – If True, take the mode of the predicted distribution; if False, sample from it.
- Returns
pred (torch.tensor) – Predicted value.
clipped_pred (torch.tensor) – Predicted value (clipped to be within [-1, 1] range).
logp (torch.tensor) – Log probability of pred according to the predicted distribution.
entropy_dist (torch.tensor) – Entropy of the predicted distribution.
dist (torch.Distribution) – Action probability distribution.
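A squashed Gaussian is commonly built by passing a Gaussian sample through tanh and correcting the log probability for the change of variables. The sketch below assumes tanh squashing and made-up distribution parameters; this reference does not state which squashing function or correction the class actually uses.

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up parameters for a batch of 4 two-dimensional actions.
mean = torch.zeros(4, 2)
log_std = torch.full((4, 2), -0.5)
dist = torch.distributions.Normal(mean, log_std.exp())

u = dist.rsample()          # reparameterized sample from the unsquashed Gaussian
pred = torch.tanh(u)        # squashed into (-1, 1)

# Change of variables: log|d tanh(u)/du| = log(1 - tanh(u)^2),
# written in the numerically stable form 2 * (log 2 - u - softplus(-2u)).
log_det = 2.0 * (math.log(2.0) - u - F.softplus(-2.0 * u))
logp = (dist.log_prob(u) - log_det).sum(dim=-1, keepdim=True)
```

The softplus form avoids evaluating log(1 - tanh(u)^2) directly, which loses precision when tanh(u) is close to 1 or -1.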
- training: bool