pytorchrl.agent.algorithms.policy_loss_addons package

Submodules

pytorchrl.agent.algorithms.policy_loss_addons.base module

class pytorchrl.agent.algorithms.policy_loss_addons.base.PolicyLossAddOn

Bases: abc.ABC

Base class for all add-ons to the policy loss.

abstract compute_loss_term(batch, dist_entropy=None)

Calculates the add-on loss term.

abstract setup(actor, device)

Initializes the class.
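
As a rough illustration of the interface above, the sketch below defines a stand-in abstract class with the same two abstract methods and a hypothetical subclass, ConstantPenalty. The stand-in is not the library class, and ConstantPenalty and its value argument are invented purely for illustration.

```python
from abc import ABC, abstractmethod


class PolicyLossAddOn(ABC):
    """Stand-in mirroring the documented interface (not the library class)."""

    @abstractmethod
    def setup(self, actor, device):
        """Initialize the add-on."""

    @abstractmethod
    def compute_loss_term(self, batch, dist_entropy=None):
        """Return the add-on loss term."""


class ConstantPenalty(PolicyLossAddOn):
    """Hypothetical add-on that always returns a fixed loss term."""

    def __init__(self, value=0.1):
        self.value = value
        self.actor = None
        self.device = None

    def setup(self, actor, device):
        # Keep references so compute_loss_term could use them later.
        self.actor = actor
        self.device = device

    def compute_loss_term(self, batch, dist_entropy=None):
        # A real add-on would derive this value from the batch.
        return self.value


addon = ConstantPenalty(0.5)
addon.setup(actor=None, device="cpu")
```

Because both methods are abstract, a subclass that omits either one cannot be instantiated, so an incomplete add-on fails when it is constructed rather than when the loss is first computed.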

pytorchrl.agent.algorithms.policy_loss_addons.kv_similarity module

class pytorchrl.agent.algorithms.policy_loss_addons.kv_similarity.AttractionKL(behavior_factories, behavior_weights, loss_term_weight=1.0, eps=1e-08)

Bases: pytorchrl.agent.algorithms.policy_loss_addons.base.PolicyLossAddOn

compute_loss_term(data, actor_dist, info)

Calculate and add the KL attraction loss term.

  1. Calculate the KL between the actor policy and all behaviors.

  2. Compute the biased KL similarities and select the minimum value.

  3. Multiply the result by the loss_term_weight.

  4. Change the sign of the loss term so that the KL to the behaviors is minimized.

Parameters
  • actor_dist (torch.distributions.Distribution) – Actor action distribution for the observations in data[prl.OBS].

  • data (dict) – Data batch containing all tensors required to compute the loss term.

  • info (dict) – Dictionary to store log information.

Returns

  • attraction_kl_loss_term (torch.Tensor) – KL attraction loss term.

  • info (dict) – Updated info dict.
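
The first three steps of the attraction term can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name attraction_kl_sketch is hypothetical, the KL direction (actor relative to behavior) is an assumption, and "biased" is assumed to mean multiplying each KL by its behavior weight. Step 4, the sign, is left out because it depends on how the enclosing algorithm combines the term with the policy loss. The eps argument and the info dict are not modeled.

```python
import torch
from torch.distributions import Categorical, kl_divergence


def attraction_kl_sketch(actor_dist, behavior_dists, behavior_weights,
                         loss_term_weight=1.0):
    """Steps 1-3 of the documented recipe, under the stated assumptions."""
    # Step 1: KL between the actor policy and every behavior, averaged over
    # the batch. The KL direction (actor || behavior) is an assumption.
    kls = torch.stack(
        [kl_divergence(actor_dist, b).mean() for b in behavior_dists]
    )
    # Step 2: "bias" each KL by its behavior weight (assumed to be a simple
    # product), then keep the smallest value, i.e. the closest behavior.
    biased_kls = kls * behavior_weights
    closest = biased_kls.min()
    # Step 3: scale by the overall loss term weight.
    return loss_term_weight * closest


actor = Categorical(probs=torch.tensor([[0.5, 0.5]]))
behaviors = [
    Categorical(probs=torch.tensor([[0.5, 0.5]])),  # identical to the actor
    Categorical(probs=torch.tensor([[0.9, 0.1]])),  # clearly different
]
weights = torch.tensor([1.0, 1.0])
attraction_term = attraction_kl_sketch(actor, behaviors, weights)
```

Selecting the minimum means only the closest behavior contributes to the term, so in this toy case the result is zero because one behavior matches the actor exactly.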

setup(actor, device)

Set up the add-on module by casting the behavior weights to torch tensors and initializing the agent behaviors.

class pytorchrl.agent.algorithms.policy_loss_addons.kv_similarity.RepulsionKL(behavior_factories, behavior_weights, loss_term_weight=1.0, eps=1e-08)

Bases: pytorchrl.agent.algorithms.policy_loss_addons.base.PolicyLossAddOn

compute_loss_term(data, actor_dist, info)

Calculate and add the KL repulsion loss term.

  1. Calculate the KL between the actor policy and all behaviors.

  2. Compute the weighted sum of the KL similarities.

  3. Multiply the result by the loss_term_weight.

  4. Keep the sign of the loss term so that the KL to the behaviors is maximized.

Parameters
  • actor_dist (torch.distributions.Distribution) – Actor action distribution for the observations in data[prl.OBS].

  • data (dict) – Data batch containing all tensors required to compute the loss term.

  • info (dict) – Dictionary to store log information.

Returns

  • attraction_kl_loss_term (torch.Tensor) – KL repulsion loss term.

  • info (dict) – Updated info dict.
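
The first three steps of the repulsion term can be sketched in the same spirit. Again, this is only an illustration under assumptions: repulsion_kl_sketch is a hypothetical name, the KL direction (actor relative to behavior) is assumed, and the weighted sum is taken to be a plain dot product of the per-behavior KL values with behavior_weights. The sign handling of step 4, the eps argument, and the info dict are not modeled.

```python
import torch
from torch.distributions import Categorical, kl_divergence


def repulsion_kl_sketch(actor_dist, behavior_dists, behavior_weights,
                        loss_term_weight=1.0):
    """Steps 1-3 of the documented recipe, under the stated assumptions."""
    # Step 1: KL between the actor policy and every behavior, averaged over
    # the batch. The KL direction (actor || behavior) is an assumption.
    kls = torch.stack(
        [kl_divergence(actor_dist, b).mean() for b in behavior_dists]
    )
    # Step 2: weighted sum over behaviors, so every behavior contributes.
    weighted_sum = (kls * behavior_weights).sum()
    # Step 3: scale by the overall loss term weight.
    return loss_term_weight * weighted_sum


actor_policy = Categorical(probs=torch.tensor([[0.5, 0.5]]))
known_behaviors = [
    Categorical(probs=torch.tensor([[0.5, 0.5]])),  # identical to the actor
    Categorical(probs=torch.tensor([[0.9, 0.1]])),  # clearly different
]
behavior_weights = torch.tensor([0.25, 0.75])
repulsion_term = repulsion_kl_sketch(
    actor_policy, known_behaviors, behavior_weights, loss_term_weight=2.0
)
```

Unlike the minimum used for attraction, the weighted sum lets every behavior influence the term in proportion to its weight.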

setup(actor, device)

Set up the add-on module by casting the behavior weights to torch tensors and initializing the agent behaviors.

Module contents