pytorchrl.agent.algorithms package

Subpackages

Submodules

pytorchrl.agent.algorithms.base module

class pytorchrl.agent.algorithms.base.Algorithm

Bases: abc.ABC

Base class for all algorithms.

abstract acting_step(obs, rhs, done, deterministic=False, *args)

Algorithm acting function.

Parameters
  • obs (torch.tensor) – Current world observation

  • rhs (torch.tensor) – Policy recurrent hidden state (if the policy is not an RNN, rhs will contain zeroes).

  • done (torch.tensor) – 1.0 if current obs is the last one in the episode, else 0.0.

  • deterministic (bool) – If True, take the mode of the predicted distribution; otherwise, sample the action randomly from it.

Returns

  • action (torch.tensor) – Predicted next action.

  • clipped_action (torch.tensor) – Predicted next action (clipped to be within action space).

  • rhs (torch.tensor) – Policy recurrent hidden state (if the policy is not an RNN, rhs will contain zeroes).

  • other (dict) – Additional PPO predictions (value score and action log probability), which are not used by other algorithms.
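The sketch below illustrates the deterministic flag and the action / clipped_action pair for a simple non-recurrent policy. It assumes a diagonal Gaussian policy over a box action space with bounds low and high; sketch_acting_step, policy_net, log_std and the key name in other are hypothetical, and a real PPO acting step would also return a value prediction, which is omitted here.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal


def sketch_acting_step(policy_net, log_std, obs, rhs, done, low, high, deterministic=False):
    """Hypothetical acting step for a non-recurrent diagonal Gaussian policy."""
    mean = policy_net(obs)
    dist = Normal(mean, log_std.exp())
    # deterministic=True takes the mode (the mean, for a Gaussian); otherwise sample
    action = mean if deterministic else dist.sample()
    # Clip to the action space bounds before the action is sent to the environment
    clipped_action = torch.clamp(action, low, high)
    # A non-recurrent policy returns rhs unchanged; done is not needed without an RNN
    other = {"logp_action": dist.log_prob(action).sum(-1, keepdim=True)}
    return action, clipped_action, rhs, other


# Usage: a batch of 3 observations with 4 features and a 2-dimensional action space
net = nn.Linear(4, 2)
obs, rhs, done = torch.randn(3, 4), torch.zeros(3, 1), torch.zeros(3, 1)
action, clipped_action, rhs, other = sketch_acting_step(
    net, torch.zeros(2), obs, rhs, done, low=-1.0, high=1.0, deterministic=True)
```

Returning both tensors keeps the unclipped action available for computing log probabilities, while only the clipped one has to satisfy the environment's bounds.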

abstract apply_gradients(gradients=None, *args)

Take an optimization step, first setting the new gradients if they are provided.

Parameters

gradients (list of tensors) – List of actor_critic gradients.

abstract compute_gradients(batch, grads_to_cpu=True, *args)

Compute the loss and its gradients without taking an optimization step; return the gradients instead.

Parameters
  • batch (dict) – Data batch containing all tensors required to compute the PPO loss.

  • grads_to_cpu (bool) – Whether to move the gradient tensors to CPU, which is required if they will be sent to another node.

Returns

  • grads (list of tensors) – List of actor_critic gradients.

  • info (dict) – Dict containing current PPO iteration information.
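Splitting compute_gradients from apply_gradients lets gradients be computed in one place and applied in another. A minimal sketch of that pattern for a single network follows; GradSplitAlgo, its actor_critic attribute, the batch keys obs / target and the mean-squared-error loss are all hypothetical stand-ins for an algorithm's real network and loss, and it assumes every parameter receives a gradient.

```python
import torch
import torch.nn as nn


class GradSplitAlgo:
    """Hypothetical algorithm illustrating the compute/apply gradient split."""

    def __init__(self, actor_critic, lr=1e-2):
        self.actor_critic = actor_critic
        self.optimizer = torch.optim.SGD(actor_critic.parameters(), lr=lr)

    def compute_gradients(self, batch, grads_to_cpu=True):
        # Placeholder loss; a real algorithm would compute e.g. the PPO loss here
        loss = nn.functional.mse_loss(self.actor_critic(batch["obs"]), batch["target"])
        self.optimizer.zero_grad()
        loss.backward()
        # Copy the gradients out; no optimizer step is taken here
        grads = [p.grad.detach().clone() for p in self.actor_critic.parameters()]
        if grads_to_cpu:
            # Tensors sent to another node should live in CPU memory
            grads = [g.cpu() for g in grads]
        return grads, {"loss": loss.item()}

    def apply_gradients(self, gradients=None):
        if gradients is not None:
            # Overwrite the local gradients with the provided ones
            for p, g in zip(self.actor_critic.parameters(), gradients):
                p.grad = g.to(p.device)
        self.optimizer.step()


algo = GradSplitAlgo(nn.Linear(3, 1))
grads, info = algo.compute_gradients({"obs": torch.randn(8, 3), "target": torch.randn(8, 1)})
algo.apply_gradients(grads)  # e.g. on a learner that received grads from a worker
```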

abstract classmethod create_factory()

Returns a function that creates new Algo instances.
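A factory fixes an algorithm's configuration once and defers building instances until the returned function is called, for example in a separate process. A sketch of the pattern with a hypothetical FactoryAlgo: the abstract signature above takes no arguments, so the lr and gamma parameters here are illustrative assumptions, and concrete pytorchrl factories may accept and pass on different arguments.

```python
class FactoryAlgo:
    """Hypothetical algorithm used only to illustrate the factory pattern."""

    def __init__(self, lr, gamma):
        self.lr = lr
        self._gamma = gamma

    @property
    def gamma(self):
        return self._gamma

    @classmethod
    def create_factory(cls, lr=1e-3, gamma=0.99):
        # Capture the configuration now; build the instance only when called
        def create_algo_instance():
            return cls(lr=lr, gamma=gamma)
        return create_algo_instance


factory = FactoryAlgo.create_factory(lr=3e-4, gamma=0.95)
algo_a, algo_b = factory(), factory()  # two independent instances, same configuration
```

Because the factory is a plain function, it can be handed around instead of a constructed object, and each call yields a fresh instance.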

abstract property gamma

Returns discount factor gamma.

abstract property mini_batch_size

Returns the number of samples in each mini batch.

abstract property num_epochs

Returns the number of times the whole buffer is re-used before data collection proceeds.

abstract property num_mini_batch

Returns the number of mini batches per epoch.
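For an on-policy algorithm such as PPO, num_epochs, num_mini_batch and mini_batch_size together determine how many gradient updates each batch of collected data produces. A sketch, assuming the buffer of buffer_size samples is reshuffled every epoch and split evenly into num_mini_batch mini batches (so mini_batch_size = buffer_size // num_mini_batch); the generator mini_batch_indices is hypothetical.

```python
import random


def mini_batch_indices(buffer_size, num_epochs, num_mini_batch, seed=0):
    """Hypothetical sampler: yields one list of buffer indices per gradient update."""
    rng = random.Random(seed)
    mini_batch_size = buffer_size // num_mini_batch
    for _ in range(num_epochs):
        # Each epoch re-uses the whole buffer in a new random order
        order = list(range(buffer_size))
        rng.shuffle(order)
        for k in range(num_mini_batch):
            # Leftover samples are dropped if buffer_size is not divisible
            yield order[k * mini_batch_size:(k + 1) * mini_batch_size]


# 2048 samples, 4 epochs, 32 mini batches -> 64 samples per mini batch, 128 updates
batches = list(mini_batch_indices(buffer_size=2048, num_epochs=4, num_mini_batch=32))
print(len(batches), len(batches[0]))  # 128 64
```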

abstract property num_test_episodes

Returns the number of episodes to complete when testing.

abstract set_weights(actor_weights, *args)

Update the actor with the given weights.

Parameters

actor_weights (dict of tensors) – Dict containing actor_critic weights to be set.
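One plausible way to implement set_weights is to load a state dict produced by another copy of the same network, for instance on a separate worker. In this sketch, WeightsAlgo and its actor attribute are hypothetical, and actor_weights is assumed to be exactly what the other copy's state_dict() returned.

```python
import torch.nn as nn


class WeightsAlgo:
    """Hypothetical holder of an actor network."""

    def __init__(self, actor):
        self.actor = actor

    def set_weights(self, actor_weights):
        # Copy the given tensors into the actor's parameters and buffers in place
        self.actor.load_state_dict(actor_weights)


learner_actor = nn.Linear(4, 2)
worker = WeightsAlgo(nn.Linear(4, 2))
worker.set_weights(learner_actor.state_dict())  # worker now mirrors the learner
```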

abstract property start_steps

Returns the number of steps to collect with initial random policy.

abstract property test_every

Returns the number of network updates between test evaluations.

abstract update_algorithm_parameter(parameter_name, new_parameter_value, *args)

If parameter_name is an attribute of the algorithm, change its value to new_parameter_value.

Parameters
  • parameter_name (str) – Attribute name

  • new_parameter_value (int or float) – New value for parameter_name.
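A minimal sketch of the behaviour described above, using hasattr / setattr. ParamAlgo and its lr and clip_param attributes are hypothetical, and ignoring unknown names (rather than raising an error) is an assumption of this sketch.

```python
class ParamAlgo:
    """Hypothetical algorithm with two tunable hyperparameters."""

    def __init__(self):
        self.lr = 1e-3
        self.clip_param = 0.2

    def update_algorithm_parameter(self, parameter_name, new_parameter_value):
        # Change the value only if the attribute already exists;
        # unknown names are ignored in this sketch
        if hasattr(self, parameter_name):
            setattr(self, parameter_name, new_parameter_value)


algo = ParamAlgo()
algo.update_algorithm_parameter("clip_param", 0.1)     # algo.clip_param is now 0.1
algo.update_algorithm_parameter("entropy_coef", 0.01)  # ignored: no such attribute
```

If a value is also cached elsewhere, such as a learning rate copied into a torch optimizer's param_groups, changing the attribute alone does not affect the optimizer; a real implementation would need to update it there as well.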

abstract property update_every

Returns the number of data samples collected between network update stages.
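start_steps, update_every and test_every together describe a training schedule. The sketch below shows one plausible reading: random actions for the first start_steps samples, a network update stage after every update_every collected samples, and a test evaluation after every test_every network updates. The function training_schedule is hypothetical, it counts single samples rather than batches from parallel environments, and the real pytorchrl training loop may differ (for example, in whether updates happen during the initial random phase).

```python
def training_schedule(total_steps, start_steps, update_every, test_every):
    """Hypothetical schedule: returns a list of (step, event) pairs."""
    events = []
    num_updates = 0
    for step in range(1, total_steps + 1):
        # Random policy for the first start_steps samples, learned policy afterwards
        events.append((step, "random_action" if step <= start_steps else "policy_action"))
        if step % update_every == 0:
            num_updates += 1
            events.append((step, "update"))
            # Evaluate the policy every test_every network updates
            if num_updates % test_every == 0:
                events.append((step, "test"))
    return events


# 10 samples: random actions for 3 of them, updates every 2 samples, a test every 2 updates
schedule = training_schedule(total_steps=10, start_steps=3, update_every=2, test_every=2)
```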

pytorchrl.agent.algorithms.utils module

pytorchrl.agent.algorithms.utils.bt(m)

pytorchrl.agent.algorithms.utils.btr(m)

pytorchrl.agent.algorithms.utils.gaussian_kl(mu1, mu2, cov1, cov2)

Decoupled KL divergence between two multivariate Gaussian distributions:

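$$C_\mu = \mathrm{KL}\big(f(x \mid \mu_i, \Sigma_i) \,\|\, f(x \mid \mu, \Sigma_i)\big)$$

$$C_\Sigma = \mathrm{KL}\big(f(x \mid \mu_i, \Sigma_i) \,\|\, f(x \mid \mu_i, \Sigma)\big)$$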

Adapted from https://github.com/daisatojp/mpo/blob/master/mpo/mpo.py

Parameters
  • mu1 (torch.tensor) – Mean of distribution 1, shape (B, n).

  • mu2 (torch.tensor) – Mean of distribution 2, shape (B, n).

  • cov1 (torch.tensor) – Covariance matrix of distribution 1, shape (B, n, n).

  • cov2 (torch.tensor) – Covariance matrix of distribution 2, shape (B, n, n).

Returns

  • kl_mu (scalar) – Mean term of the KL.

  • kl_sigma (scalar) – Covariance term of the KL.

Reference: https://stanford.edu/~jduchi/projects/general_notes.pdf, page 13.
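Both terms have closed forms obtained from the standard Gaussian KL divergence: the mean term reduces to $\tfrac{1}{2}(\mu - \mu_i)^\top \Sigma_i^{-1} (\mu - \mu_i)$ and the covariance term to $\tfrac{1}{2}\big[\mathrm{tr}(\Sigma^{-1}\Sigma_i) - n + \ln(\det\Sigma / \det\Sigma_i)\big]$. The sketch below computes them, assuming $(\mu_i, \Sigma_i)$ corresponds to (mu1, cov1) and $(\mu, \Sigma)$ to (mu2, cov2), that cov1 and cov2 are full covariance matrices as documented (the referenced MPO code builds them from Cholesky factors instead), and that the scalar outputs are batch averages. The bodies given for bt (batch transpose) and btr (batch trace) are assumptions about the undocumented helpers, and gaussian_kl_sketch is not the pytorchrl source.

```python
import torch


def bt(m):
    # Assumed meaning: transpose the last two dimensions of a batch of matrices
    return m.transpose(-2, -1)


def btr(m):
    # Assumed meaning: trace of each matrix in a batch, (B, n, n) -> (B,)
    return m.diagonal(dim1=-2, dim2=-1).sum(-1)


def gaussian_kl_sketch(mu1, mu2, cov1, cov2):
    """Decoupled KL terms, averaged over the batch (sketch)."""
    n = cov1.size(-1)
    diff = (mu2 - mu1).unsqueeze(-1)                    # (B, n, 1)
    cov1_inv = torch.linalg.inv(cov1)                    # (B, n, n)
    cov2_inv = torch.linalg.inv(cov2)                    # (B, n, n)
    # C_mu = KL(N(mu1, cov1) || N(mu2, cov1)): only the Mahalanobis term survives
    inner_mu = (bt(diff) @ cov1_inv @ diff).reshape(-1)  # (B,)
    # C_sigma = KL(N(mu1, cov1) || N(mu1, cov2)): trace, dimension and log-det terms
    inner_sigma = (btr(cov2_inv @ cov1) - n
                   + torch.logdet(cov2) - torch.logdet(cov1))  # (B,)
    kl_mu = 0.5 * inner_mu.mean()
    kl_sigma = 0.5 * inner_sigma.mean()
    return kl_mu, kl_sigma
```

As a sanity check, kl_mu should match the batch mean of torch.distributions.kl_divergence between MultivariateNormal(mu1, cov1) and MultivariateNormal(mu2, cov1), and kl_sigma the same with MultivariateNormal(mu1, cov2) as the second distribution.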

pytorchrl.agent.algorithms.utils.get_gradients(*nets, grads_to_cpu=False)

Gets gradients for all parameters in nets.

pytorchrl.agent.algorithms.utils.set_gradients(*nets, gradients, device)

Sets the given gradients as the gradient values for all parameters in nets.
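A sketch of how these two helpers could work together, assuming gradients are exchanged as one flat list ordered first by net and then by parameter, with None kept for parameters that received no gradient. The _sketch suffix marks both functions as hypothetical stand-ins rather than the pytorchrl implementations; the keyword-only gradients and device arguments follow the signatures above.

```python
import torch
import torch.nn as nn


def get_gradients_sketch(*nets, grads_to_cpu=False):
    """Collect a copy of every parameter's .grad, in net order, then parameter order."""
    grads = []
    for net in nets:
        for p in net.parameters():
            # Parameters that received no gradient are kept as None placeholders
            g = None if p.grad is None else p.grad.detach().clone()
            if g is not None and grads_to_cpu:
                g = g.cpu()
            grads.append(g)
    return grads


def set_gradients_sketch(*nets, gradients, device):
    """Write gradients back in the same order used by get_gradients_sketch."""
    params = [p for net in nets for p in net.parameters()]
    if len(params) != len(gradients):
        raise ValueError("number of gradients does not match number of parameters")
    for p, g in zip(params, gradients):
        p.grad = None if g is None else g.to(device)


# Usage: copy the gradients of an actor and a critic onto a second pair of networks
actor, critic = nn.Linear(3, 2), nn.Linear(3, 1)
x = torch.randn(4, 3)
(actor(x).sum() + critic(x).sum()).backward()
grads = get_gradients_sketch(actor, critic, grads_to_cpu=True)
actor2, critic2 = nn.Linear(3, 2), nn.Linear(3, 1)
set_gradients_sketch(actor2, critic2, gradients=grads, device="cpu")
```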

Module contents