pytorchrl.agent.algorithms package
Subpackages
Submodules
pytorchrl.agent.algorithms.base module
- class pytorchrl.agent.algorithms.base.Algorithm[source]
Bases: abc.ABC
Base class for all algorithms.
- abstract acting_step(obs, rhs, done, deterministic=False, *args)[source]
Algorithm acting function.
- Parameters
obs (torch.tensor) – Current world observation
rhs (torch.tensor) – RNN recurrent hidden state (if policy is not an RNN, rhs will contain zeroes).
done (torch.tensor) – 1.0 if current obs is the last one in the episode, else 0.0.
deterministic (bool) – If True, take the mode of the predicted distribution; otherwise randomly sample the action from it.
- Returns
action (torch.tensor) – Predicted next action.
clipped_action (torch.tensor) – Predicted next action (clipped to be within action space).
rhs (torch.tensor) – Policy recurrent hidden state (if policy is not an RNN, rhs will contain zeroes).
other (dict) – Additional PPO predictions, value score and action log probability, which are not used in other algorithms.
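The effect of the deterministic flag and the difference between action and clipped_action can be illustrated with a minimal sketch. It assumes a diagonal Gaussian policy and a box-shaped action space; the function name select_action_sketch and the parameters mean, std, low and high are hypothetical and are not part of the pytorchrl API.

```python
import torch
from torch.distributions import Normal


def select_action_sketch(mean, std, low, high, deterministic=False):
    """Hypothetical helper: pick an action from a diagonal Gaussian policy."""
    dist = Normal(mean, std)
    # For a Gaussian the mode coincides with the mean.
    action = dist.mean if deterministic else dist.sample()
    # Clip a copy so the action stays inside the (assumed) box action space.
    clipped_action = torch.clamp(action, low, high)
    # Extra predictions returned alongside the action (log probability only here).
    other = {"logp": dist.log_prob(action).sum(-1)}
    return action, clipped_action, other


mean = torch.tensor([[0.5, -2.0]])
std = torch.ones_like(mean)
action, clipped_action, other = select_action_sketch(
    mean, std, low=-1.0, high=1.0, deterministic=True
)
```

With deterministic=True the unclipped action equals the distribution mean, while clipped_action is limited to the assumed bounds.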
- abstract apply_gradients(gradients=None, *args)[source]
Take an optimization step, first setting the new gradients if any are provided.
- Parameters
gradients (list of tensors) – List of actor_critic gradients.
- abstract compute_gradients(batch, grads_to_cpu=True, *args)[source]
Compute the loss and its gradients without taking an optimization step; the gradients are returned instead.
- Parameters
batch (dict) – Data batch containing all tensors required to compute the PPO loss.
grads_to_cpu (bool) – Whether to move the gradient tensors to the CPU, which is required if they will be sent to another node.
- Returns
grads (list of tensors) – List of actor_critic gradients.
info (dict) – Dict containing current PPO iteration information.
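The split between compute_gradients and apply_gradients can be sketched as follows. This is only an illustration under stated assumptions: a small linear layer stands in for the actor_critic, a squared-error loss stands in for the algorithm loss, and the names compute_gradients_sketch, apply_gradients_sketch, model and optimizer are hypothetical rather than taken from pytorchrl.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)  # hypothetical stand-in for the actor_critic
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def compute_gradients_sketch(batch, grads_to_cpu=True):
    """Compute the loss and return gradients without an optimizer step."""
    optimizer.zero_grad()
    # Placeholder loss; a real algorithm would compute its own loss here.
    loss = ((model(batch["obs"]) - batch["target"]) ** 2).mean()
    loss.backward()
    grads = []
    for p in model.parameters():
        g = p.grad.detach().clone()
        grads.append(g.cpu() if grads_to_cpu else g)
    info = {"loss": loss.item()}
    return grads, info


def apply_gradients_sketch(gradients=None):
    """Optionally overwrite the local gradients, then take one optimizer step."""
    if gradients is not None:
        for p, g in zip(model.parameters(), gradients):
            p.grad = g.to(p.device)
    optimizer.step()


batch = {"obs": torch.randn(8, 4), "target": torch.zeros(8, 2)}
before = [p.detach().clone() for p in model.parameters()]
grads, info = compute_gradients_sketch(batch)
apply_gradients_sketch(grads)
```

Keeping the two steps separate is what allows gradients computed on one node to be applied on another, which is presumably why grads_to_cpu exists.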
- abstract property gamma
Returns discount factor gamma.
- abstract property mini_batch_size
Returns the size of each mini batch.
- abstract property num_epochs
Returns the number of times the whole buffer is re-used before data collection proceeds.
- abstract property num_mini_batch
Returns the number of mini batches per epoch.
- abstract property num_test_episodes
Returns the number of episodes to complete when testing.
- abstract set_weights(actor_weights, *args)[source]
Update the actor with the given weights.
- Parameters
actor_weights (dict of tensors) – Dict containing actor_critic weights to be set.
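One plausible way to implement such a weight update, assuming the actor is a torch.nn.Module and actor_weights is a state dict, is to call load_state_dict. The helper name set_weights_sketch and the two linear layers below are hypothetical.

```python
import torch
import torch.nn as nn


def set_weights_sketch(actor, actor_weights):
    """Hypothetical helper: copy a dict of tensors into the actor's parameters."""
    actor.load_state_dict(actor_weights)


source = nn.Linear(3, 2)  # e.g. the actor held by an update worker
target = nn.Linear(3, 2)  # e.g. the actor held by a collection worker

# Move the weights to CPU first, as would be needed before sending them to another node.
weights = {k: v.cpu() for k, v in source.state_dict().items()}
set_weights_sketch(target, weights)
```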
- abstract property start_steps
Returns the number of steps to collect with initial random policy.
- abstract property test_every
Number of network updates between test evaluations.
- abstract update_algorithm_parameter(parameter_name, new_parameter_value, *args)[source]
If parameter_name is an attribute of the algorithm, set it to new_parameter_value.
- Parameters
parameter_name (str) – Attribute name
new_parameter_value (int or float) – New value for parameter_name.
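The behaviour described above maps naturally onto hasattr and setattr. The class AlgoSketch and its attributes are hypothetical and only illustrate the idea; concrete algorithms may handle some parameters differently (for example, propagating a learning rate to an optimizer).

```python
class AlgoSketch:
    """Hypothetical algorithm holding two tunable attributes."""

    def __init__(self):
        self.lr = 1e-3
        self.clip_param = 0.2

    def update_algorithm_parameter(self, parameter_name, new_parameter_value, *args):
        # Only change attributes that already exist; unknown names are ignored.
        if hasattr(self, parameter_name):
            setattr(self, parameter_name, new_parameter_value)


algo = AlgoSketch()
algo.update_algorithm_parameter("lr", 5e-4)
algo.update_algorithm_parameter("not_a_parameter", 1.0)
```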
- abstract property update_every
Returns the number of data samples collected between network update stages.
pytorchrl.agent.algorithms.utils module
- pytorchrl.agent.algorithms.utils.gaussian_kl(mu1, mu2, cov1, cov2)[source]
Decoupled KL divergence between two multivariate Gaussian distributions:
C_μ = KL(f(x|μi,Σi) || f(x|μ,Σi))
C_Σ = KL(f(x|μi,Σi) || f(x|μi,Σ))
Adapted from https://github.com/daisatojp/mpo/blob/master/mpo/mpo.py
- Parameters
mu1 (torch.tensor) – Mean distribution 1 - (B, n).
mu2 (torch.tensor) – Mean distribution 2 - (B, n).
cov1 (torch.tensor) – Covariance matrix distribution 1 - (B, n, n).
cov2 (torch.tensor) – Covariance matrix distribution 2 - (B, n, n).
- Returns
kl_mu (scalar) – Mean term of the KL.
kl_sigma (scalar) – Covariance term of the KL.
Reference: https://stanford.edu/~jduchi/projects/general_notes.pdf, page 13.
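The two terms follow from the closed-form KL between Gaussians: C_μ = ½ (μ − μi)ᵀ Σi⁻¹ (μ − μi) and C_Σ = ½ (tr(Σ⁻¹Σi) − n + log(det Σ / det Σi)). The sketch below is a minimal illustration of these formulas, not the library's implementation. It assumes that cov1 and cov2 are full, positive-definite covariance matrices (the referenced MPO code works with Cholesky factors instead) and that both terms are averaged over the batch; the function name decoupled_gaussian_kl_sketch is hypothetical.

```python
import torch


def decoupled_gaussian_kl_sketch(mu1, mu2, cov1, cov2):
    """Batch-averaged mean and covariance terms of the decoupled Gaussian KL."""
    n = cov1.size(-1)
    diff = (mu2 - mu1).unsqueeze(-1)  # (B, n, 1)

    # Mean term: 0.5 * diff^T cov1^{-1} diff, with the covariance fixed to cov1.
    solved = torch.linalg.solve(cov1, diff)  # (B, n, 1)
    inner_mu = (diff.transpose(-2, -1) @ solved).reshape(-1)  # (B,)

    # Covariance term: 0.5 * (log det(cov2)/det(cov1) - n + tr(cov2^{-1} cov1)).
    trace = torch.diagonal(
        torch.linalg.solve(cov2, cov1), dim1=-2, dim2=-1
    ).sum(-1)  # (B,)
    inner_sigma = torch.logdet(cov2) - torch.logdet(cov1) - n + trace  # (B,)

    return 0.5 * inner_mu.mean(), 0.5 * inner_sigma.mean()


B, n = 4, 3
eye = torch.eye(n).repeat(B, 1, 1)
kl_mu, kl_sigma = decoupled_gaussian_kl_sketch(
    torch.zeros(B, n), torch.ones(B, n), eye, 2.0 * eye
)
```

Using torch.linalg.solve avoids forming explicit matrix inverses; the determinants are taken in log space through torch.logdet, which assumes the covariance matrices have positive determinants.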