Action probability distributions

Categorical

class pytorchrl.agent.actors.distributions.categorical.Categorical(num_inputs, num_outputs)

Bases: torch.nn.modules.module.Module

Categorical probability distribution.

Parameters
  • num_inputs (int) – Size of input feature maps.

  • num_outputs (int) – Number of categories (discrete actions) in the output space.

evaluate_pred(x, pred)

Return the log probability of pred under the distribution generated from x (observation features), the entropy of that distribution, and the distribution itself.

Parameters
  • x (torch.tensor) – Observation feature map obtained from a policy_net.

  • pred (torch.tensor) – Prediction to evaluate.

Returns

  • logp (torch.tensor) – Log probability of pred according to the predicted distribution.

  • entropy_dist (torch.tensor) – Entropy of the predicted distribution.

  • dist (torch.Distribution) – Action probability distribution.
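The two numeric outputs of evaluate_pred can be checked by hand. The following is a framework-free sketch, assuming the distribution is parameterized by unnormalized logits (the page does not say how x is mapped to the distribution); `log_softmax` and `categorical_evaluate` are hypothetical helpers, not part of pytorchrl, and they work on plain lists rather than torch tensors.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of unnormalized logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]

def categorical_evaluate(logits, action):
    """Return (log p(action), entropy) for the categorical distribution defined by logits."""
    log_probs = log_softmax(logits)
    logp = log_probs[action]
    # Entropy H = -sum_k p_k * log p_k
    entropy = -sum(math.exp(lp) * lp for lp in log_probs)
    return logp, entropy

# Uniform logits over 4 actions: log p = -log 4 and H = log 4
logp, entropy = categorical_evaluate([0.0, 0.0, 0.0, 0.0], action=2)
print(round(logp, 4), round(entropy, 4))  # -1.3863 1.3863
```

Subtracting the maximum logit before exponentiating keeps the sketch stable even when one logit dominates.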

forward(x, deterministic=False)

Predict distribution parameters from x (observation features) and return the prediction (sampled, or the mode if deterministic is True) in raw and clipped form, its log probability, the distribution entropy, and the distribution itself.

Parameters
  • x (torch.tensor) – Feature maps extracted from environment observations.

  • deterministic (bool) – If True, take the mode of the predicted distribution; otherwise, sample from it randomly.

Returns

  • pred (torch.tensor) – Predicted action.

  • clipped_pred (torch.tensor) – Predicted action clipped to the [-1, 1] range.

  • logp (torch.tensor) – Log probability of pred according to the predicted distribution.

  • entropy_dist (torch.tensor) – Entropy of the predicted distribution.

  • dist (torch.Distribution) – Action probability distribution.
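The effect of the deterministic flag can be sketched without PyTorch. This assumes, as above, a logit parameterization; `categorical_forward` is a hypothetical helper that returns only (pred, logp, entropy) and omits clipped_pred and the distribution object.

```python
import math
import random

def categorical_forward(logits, deterministic=False, rng=random):
    """Pick an action from a categorical over logits; return (pred, logp, entropy)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_z for z in logits]
    probs = [math.exp(lp) for lp in log_probs]
    if deterministic:
        # Mode: index of the most probable action (first one on ties)
        pred = max(range(len(probs)), key=probs.__getitem__)
    else:
        # Random draw with probability proportional to probs
        pred = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    entropy = -sum(p * lp for p, lp in zip(probs, log_probs))
    return pred, log_probs[pred], entropy

logits = [0.1, 2.0, -1.0]
print(categorical_forward(logits, deterministic=True)[0])  # 1
sampled, logp, _ = categorical_forward(logits, rng=random.Random(0))
```

Passing a seeded `random.Random` makes sampling reproducible; the deterministic path ignores the generator entirely.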

Gaussian

class pytorchrl.agent.actors.distributions.gaussian.DiagGaussian(num_inputs, num_outputs, predict_log_std=False)

Bases: torch.nn.modules.module.Module

Diagonal Gaussian probability distribution (independent standard deviation per output dimension).

Parameters
  • num_inputs (int) – Size of input feature maps.

  • num_outputs (int) – Number of dimensions in the output space.

  • predict_log_std (bool) – Whether to use an nn.Linear layer to predict the log standard deviation of the output distribution.

evaluate_pred(x, pred)

Return the log probability of pred under the distribution generated from x (observation features), the entropy of that distribution, and the distribution itself.

Parameters
  • x (torch.tensor) – Observation feature map obtained from a policy_net.

  • pred (torch.tensor) – Prediction to evaluate.

Returns

  • logp (torch.tensor) – Log probability of pred according to the predicted distribution.

  • entropy_dist (torch.tensor) – Entropy of the predicted distribution.

  • dist (torch.Distribution) – Action probability distribution.
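For a diagonal Gaussian, the log probability of a multi-dimensional prediction is the sum of the per-dimension Normal log densities, and the entropy is likewise a sum over dimensions. The sketch below assumes the distribution is described by a per-dimension mean and log standard deviation; `diag_gaussian_evaluate` is a hypothetical plain-Python helper, not the class's implementation.

```python
import math

def diag_gaussian_evaluate(mean, log_std, action):
    """Return (log p(action), entropy) for a diagonal Gaussian, summing over dimensions."""
    logp = 0.0
    entropy = 0.0
    for mu, ls, a in zip(mean, log_std, action):
        std = math.exp(ls)
        # log N(a; mu, std^2) = -0.5 * ((a - mu) / std)^2 - log std - 0.5 * log(2 * pi)
        logp += -0.5 * ((a - mu) / std) ** 2 - ls - 0.5 * math.log(2 * math.pi)
        # Per-dimension entropy: 0.5 + 0.5 * log(2 * pi) + log std
        entropy += 0.5 + 0.5 * math.log(2 * math.pi) + ls
    return logp, entropy

# Standard 2-D Gaussian evaluated at its mean
logp, ent = diag_gaussian_evaluate(mean=[0.0, 0.0], log_std=[0.0, 0.0], action=[0.0, 0.0])
```

Working in log standard deviation keeps the standard deviation positive without any explicit constraint.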

forward(x, deterministic=False)

Predict distribution parameters from x (observation features) and return the prediction (sampled, or the mode if deterministic is True) in raw and clipped form, its log probability, the distribution entropy, and the distribution itself.

Parameters
  • x (torch.tensor) – Feature maps extracted from environment observations.

  • deterministic (bool) – If True, take the mode of the predicted distribution; otherwise, sample from it randomly.

Returns

  • pred (torch.tensor) – Predicted action.

  • clipped_pred (torch.tensor) – Predicted action clipped to the [-1, 1] range.

  • logp (torch.tensor) – Log probability of pred according to the predicted distribution.

  • entropy_dist (torch.tensor) – Entropy of the predicted distribution.

  • dist (torch.Distribution) – Action probability distribution.
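The relationship between pred, clipped_pred and logp can be sketched as follows, assuming a per-dimension mean and log standard deviation. Following the return descriptions, logp refers to the unclipped pred. `diag_gaussian_forward` is a hypothetical plain-Python helper that omits the entropy and distribution outputs.

```python
import math
import random

def diag_gaussian_forward(mean, log_std, deterministic=False, rng=random):
    """Return (pred, clipped_pred, logp) for a diagonal Gaussian policy step."""
    if deterministic:
        pred = list(mean)  # the mode of a Gaussian is its mean
    else:
        pred = [rng.gauss(mu, math.exp(ls)) for mu, ls in zip(mean, log_std)]
    # Clip each dimension to [-1, 1]; logp below still refers to the unclipped pred
    clipped_pred = [min(max(a, -1.0), 1.0) for a in pred]
    logp = sum(
        -0.5 * ((a - mu) / math.exp(ls)) ** 2 - ls - 0.5 * math.log(2 * math.pi)
        for a, mu, ls in zip(pred, mean, log_std)
    )
    return pred, clipped_pred, logp

pred, clipped, logp = diag_gaussian_forward([0.5, 3.0], [0.0, 0.0], deterministic=True)
print(clipped)  # [0.5, 1.0]
```

Clipping only the copy sent to the environment keeps logp consistent with the sample that was actually drawn.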

Squashed Gaussian

class pytorchrl.agent.actors.distributions.squashed_gaussian.SquashedGaussian(num_inputs, num_outputs, predict_log_std=True)

Bases: torch.nn.modules.module.Module

Squashed Gaussian probability distribution.

Parameters
  • num_inputs (int) – Size of input feature maps.

  • num_outputs (int) – Number of dimensions in the output space.

  • predict_log_std (bool) – Whether to use an nn.Linear layer to predict the log standard deviation of the output distribution.

evaluate_pred(x, pred)

Return the log probability of pred under the distribution generated from x (observation features), the entropy of that distribution, and the distribution itself.

Parameters
  • x (torch.tensor) – Observation feature map obtained from a policy_net.

  • pred (torch.tensor) – Prediction to evaluate.

Returns

  • logp (torch.tensor) – Log probability of pred according to the predicted distribution.

  • entropy_dist (torch.tensor) – Entropy of the predicted distribution.

  • dist (torch.Distribution) – Action probability distribution.
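Evaluating a given prediction under a squashed Gaussian requires undoing the squash. The sketch below assumes the common tanh squash, a = tanh(u) with u Gaussian, which the page does not confirm. It inverts with atanh and applies the change-of-variables correction. `squashed_gaussian_log_prob` and `EPS` are hypothetical. The entropy of a tanh-squashed Gaussian has no simple closed form, so it is left out here; how the class computes entropy_dist is not stated on this page.

```python
import math

EPS = 1e-6  # hypothetical clamp margin that keeps atanh finite at |a| = 1

def squashed_gaussian_log_prob(mean, log_std, action):
    """Log prob of a = tanh(u), where u ~ N(mean, exp(log_std)^2) independently per dim."""
    logp = 0.0
    for mu, ls, a in zip(mean, log_std, action):
        a = min(max(a, -1.0 + EPS), 1.0 - EPS)
        u = math.atanh(a)  # recover the pre-squash Gaussian sample
        logp += -0.5 * ((u - mu) / math.exp(ls)) ** 2 - ls - 0.5 * math.log(2 * math.pi)
        # Change of variables: subtract log |d tanh(u)/du| = log(1 - a^2)
        logp -= math.log(1.0 - a ** 2)
    return logp

print(round(squashed_gaussian_log_prob([0.0], [0.0], [0.0]), 4))  # -0.9189
```

Predictions at exactly plus or minus 1 have zero density under the squashed distribution, which is why the sketch clamps them slightly inside the interval.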

forward(x, deterministic=False)

Predict distribution parameters from x (observation features) and return the prediction (sampled, or the mode if deterministic is True) in raw and clipped form, its log probability, the distribution entropy, and the distribution itself.

Parameters
  • x (torch.tensor) – Feature maps extracted from environment observations.

  • deterministic (bool) – If True, take the mode of the predicted distribution; otherwise, sample from it randomly.

Returns

  • pred (torch.tensor) – Predicted action.

  • clipped_pred (torch.tensor) – Predicted action clipped to the [-1, 1] range.

  • logp (torch.tensor) – Log probability of pred according to the predicted distribution.

  • entropy_dist (torch.tensor) – Entropy of the predicted distribution.

  • dist (torch.Distribution) – Action probability distribution.
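A forward step for a squashed Gaussian can be sketched under the same tanh assumption: draw u (or take u = mean when deterministic), squash it, and correct the log probability. Here the correction log(1 - tanh(u)^2) is rewritten as 2 * (log 2 - u - softplus(-2u)), which stays finite for large |u|. In this sketch pred already lies in (-1, 1), so clipping it to [-1, 1] would leave it unchanged. `softplus` and `squashed_gaussian_forward` are hypothetical helpers that omit the entropy and distribution outputs.

```python
import math
import random

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def squashed_gaussian_forward(mean, log_std, deterministic=False, rng=random):
    """Sample u ~ N(mean, std^2) (or take u = mean); return (tanh(u), log p(tanh(u)))."""
    pred, logp = [], 0.0
    for mu, ls in zip(mean, log_std):
        std = math.exp(ls)
        u = mu if deterministic else rng.gauss(mu, std)
        logp += -0.5 * ((u - mu) / std) ** 2 - ls - 0.5 * math.log(2 * math.pi)
        # log(1 - tanh(u)^2) in a form that stays finite for large |u|
        logp -= 2.0 * (math.log(2.0) - u - softplus(-2.0 * u))
        pred.append(math.tanh(u))
    return pred, logp

pred, logp = squashed_gaussian_forward([0.0, 5.0], [0.0, 0.0], deterministic=True)
```

Computing the correction from u rather than from the squashed value avoids the loss of precision in 1 - a^2 when a is close to plus or minus 1.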
