pytorchrl.agent.algorithms package

Subpackages

Submodules

pytorchrl.agent.algorithms.base module

class pytorchrl.agent.algorithms.base.Algorithm[source]

Bases: abc.ABC

Base class for all algorithms

abstract acting_step(obs, rhs, done, deterministic=False, *args)[source]

Algorithm acting function.

Parameters

obs (torch.tensor) – Current world observation
rhs (torch.tensor) – Recurrent hidden state of the policy (if the policy is not an RNN, rhs contains zeros).
done (torch.tensor) – 1.0 if current obs is the last one in the episode, else 0.0.
deterministic (bool) – If True, take the mode of the predicted distribution; if False, sample an action from it.

Returns

action (torch.tensor) – Predicted next action.
clipped_action (torch.tensor) – Predicted next action (clipped to be within action space).
rhs (torch.tensor) – Policy recurrent hidden state (if the policy is not an RNN, rhs contains zeros).
other (dict) – Additional predictions used by PPO (value score and action log probability); other algorithms do not use them.
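
The distinction between action and clipped_action, and the effect of deterministic, can be illustrated with a minimal sketch. It assumes a 1-D Gaussian policy and uses plain floats in place of torch tensors; the function name act and its parameters mean, std, low and high are hypothetical and not part of pytorchrl.

```python
import random


def act(mean, std, low, high, deterministic=False, rng=random):
    """Toy stand-in for an acting step over a 1-D Gaussian policy.

    All names are hypothetical; plain floats replace torch tensors.
    """
    # Deterministic: take the mode of the distribution (the mean for a Gaussian).
    # Stochastic: draw a sample from the distribution.
    action = mean if deterministic else rng.gauss(mean, std)
    # Clip a copy so the raw action stays available next to the clipped one.
    clipped_action = min(max(action, low), high)
    return action, clipped_action


action, clipped_action = act(mean=1.7, std=0.5, low=-1.0, high=1.0, deterministic=True)
```

Here action keeps the unclipped prediction (1.7) while clipped_action is bounded to the action space (1.0).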

abstract apply_gradients(gradients=None, *args)[source]

Take an optimization step, first setting new gradients if they are provided.

Parameters: gradients (list of tensors) – List of actor_critic gradients.

abstract compute_gradients(batch, grads_to_cpu=True, *args)[source]

Compute the loss and its gradients without taking an optimization step; return the gradients instead.

Parameters

batch (dict) – Data batch containing all tensors required to compute the PPO loss.
grads_to_cpu (bool) – Whether to move the gradient tensors to the CPU, which is needed if they will be sent to another node.

Returns

grads (list of tensors) – List of actor_critic gradients.
info (dict) – Dict containing current PPO iteration information.
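
The split between compute_gradients and apply_gradients lets one process compute gradients and another take the optimization step. The following is a minimal sketch of that pattern, not the pytorchrl implementation: ToyAlgorithm is a hypothetical class that fits y = w * x with plain floats instead of tensors, and the learning rate lr is an illustrative assumption.

```python
class ToyAlgorithm:
    """Hypothetical stand-in: fits y = w * x by gradient descent on floats."""

    def __init__(self, lr=0.1):
        self.w = 0.0
        self.lr = lr
        self._grads = [0.0]

    def compute_gradients(self, batch):
        # Mean squared error and its gradient with respect to w; no update here.
        n = len(batch["x"])
        errors = [self.w * x - y for x, y in zip(batch["x"], batch["y"])]
        loss = sum(e * e for e in errors) / n
        self._grads = [sum(2.0 * e * x for e, x in zip(errors, batch["x"])) / n]
        return list(self._grads), {"loss": loss}

    def apply_gradients(self, gradients=None):
        # Overwrite local gradients if external ones are provided, then step.
        if gradients is not None:
            self._grads = list(gradients)
        self.w -= self.lr * self._grads[0]


batch = {"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0]}
worker, learner = ToyAlgorithm(), ToyAlgorithm()
for _ in range(50):
    worker.w = learner.w                           # sync weights, as set_weights would
    grads, info = worker.compute_gradients(batch)  # gradients only, no step
    learner.apply_gradients(grads)                 # the step happens on the learner
```

In this sketch the worker never updates its own weights; it only reports gradients and an info dict, which mirrors the return values documented above.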

abstract classmethod create_factory()[source]: Returns a function to create new Algo instances
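
As a rough illustration of the factory pattern, the sketch below binds hyperparameters once and returns a function that builds instances later. ToyFactoryAlgo and the arguments lr, gamma, device and actor are hypothetical; the actual arguments of create_factory in concrete pytorchrl algorithms are not documented in this section.

```python
class ToyFactoryAlgo:
    """Hypothetical algorithm; lr, gamma, device and actor are illustrative."""

    def __init__(self, device, actor, lr, gamma):
        self.device, self.actor, self.lr, self.gamma = device, actor, lr, gamma

    @classmethod
    def create_factory(cls, lr=1e-3, gamma=0.99):
        # Hyperparameters are bound now; per-worker arguments are supplied later.
        def create_algo_instance(device, actor):
            return cls(device, actor, lr, gamma)

        return create_algo_instance


factory = ToyFactoryAlgo.create_factory(lr=3e-4)
algo_a = factory("cpu", actor="actor_a")
algo_b = factory("cpu", actor="actor_b")
```

Each call to the returned function yields an independent instance that shares the same hyperparameters.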

abstract property gamma: Returns discount factor gamma.

abstract property mini_batch_size: Returns the size of each mini batch.

abstract property num_epochs: Returns the number of times the whole buffer is re-used before data collection proceeds.

abstract property num_mini_batch: Returns the number of mini batches per epoch.

abstract property num_test_episodes: Returns the number of episodes to complete when testing.

abstract set_weights(actor_weights, *args)[source]

Update the actor with the given weights.

Parameters: actor_weights (dict of tensors) – Dict containing actor_critic weights to be set.

abstract property start_steps: Returns the number of steps to collect with initial random policy.

abstract property test_every: Number of network updates between test evaluations.

abstract update_algorithm_parameter(parameter_name, new_parameter_value, *args)[source]

If parameter_name is an attribute of the algorithm, change its value to new_parameter_value.

Parameters

parameter_name (str) – Attribute name
new_parameter_value (int or float) – New value for parameter_name.
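
One plausible way to implement this behaviour is with hasattr and setattr. The sketch below is an assumption about the general approach, not the code of any pytorchrl algorithm; ToyTunableAlgo and its attributes lr and num_epochs are hypothetical.

```python
class ToyTunableAlgo:
    """Hypothetical algorithm exposing two tunable attributes."""

    def __init__(self):
        self.lr = 1e-3
        self.num_epochs = 4

    def update_algorithm_parameter(self, parameter_name, new_parameter_value):
        # Only existing attributes are changed; unknown names are ignored.
        if hasattr(self, parameter_name):
            setattr(self, parameter_name, new_parameter_value)


algo = ToyTunableAlgo()
algo.update_algorithm_parameter("lr", 5e-4)
algo.update_algorithm_parameter("not_a_parameter", 1)
```

After these calls lr holds the new value, and no attribute named not_a_parameter has been created.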

abstract property update_every: Returns the number of data samples collected between network update stages.

pytorchrl.agent.algorithms.utils module

pytorchrl.agent.algorithms.utils.bt(m)[source]

pytorchrl.agent.algorithms.utils.btr(m)[source]

pytorchrl.agent.algorithms.utils.gaussian_kl(mu1, mu2, cov1, cov2)[source]

Decoupled KL divergence between two multivariate Gaussian distributions:

C_μ = KL(f(x|μ_i, Σ_i) || f(x|μ, Σ_i))
C_Σ = KL(f(x|μ_i, Σ_i) || f(x|μ_i, Σ))

Adapted from https://github.com/daisatojp/mpo/blob/master/mpo/mpo.py

Parameters

mu1 (torch.tensor) – Mean distribution 1 - (B, n).
mu2 (torch.tensor) – Mean distribution 2 - (B, n).
cov1 (torch.tensor) – Covariance matrix distribution 1 - (B, n, n).
cov2 (torch.tensor) – Covariance matrix distribution 2 - (B, n, n).

Returns

kl_mu (scalar) – Mean term of the KL.
kl_sigma (scalar) – Covariance term of the KL.
Reference: https://stanford.edu/~jduchi/projects/general_notes.pdf, page 13.
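
For reference, the two terms follow from the standard closed form of the KL divergence between n-dimensional Gaussians. The block below writes them out under the assumption that (mu1, cov1) play the role of (μ_i, Σ_i) and (mu2, cov2) the role of (μ, Σ); whether the function averages these quantities over the batch dimension B to produce the returned scalars is not stated here and is also an assumption.

```latex
\begin{aligned}
C_\mu &= \mathrm{KL}\big(f(x \mid \mu_i, \Sigma_i) \,\|\, f(x \mid \mu, \Sigma_i)\big)
       = \tfrac{1}{2}\, (\mu - \mu_i)^\top \Sigma_i^{-1} (\mu - \mu_i), \\
C_\Sigma &= \mathrm{KL}\big(f(x \mid \mu_i, \Sigma_i) \,\|\, f(x \mid \mu_i, \Sigma)\big)
       = \tfrac{1}{2} \left[ \operatorname{tr}\!\big(\Sigma^{-1} \Sigma_i\big) - n
         + \ln \frac{\det \Sigma}{\det \Sigma_i} \right].
\end{aligned}
```

In C_μ the covariances coincide, so only the Mahalanobis term in the means remains; in C_Σ the means coincide, so only the trace and log-determinant terms remain.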

pytorchrl.agent.algorithms.utils.get_gradients(*nets, grads_to_cpu=False)[source]: Gets gradients for all parameters in nets.

pytorchrl.agent.algorithms.utils.set_gradients(*nets, gradients, device)[source]: Sets gradients as the gradient values for all parameters in nets.
