pytorchrl.agent.algorithms.policy_loss_addons package
Submodules
pytorchrl.agent.algorithms.policy_loss_addons.base module
pytorchrl.agent.algorithms.policy_loss_addons.kv_similarity module
- class pytorchrl.agent.algorithms.policy_loss_addons.kv_similarity.AttractionKL(behavior_factories, behavior_weights, loss_term_weight=1.0, eps=1e-08)
Bases: pytorchrl.agent.algorithms.policy_loss_addons.base.PolicyLossAddOn
- compute_loss_term(data, actor_dist, info)
- Calculate and add the KL attraction loss term.
Calculate the KL divergence between the actor policy and each behavior.
Compute the biased KL similarities and select the minimum value.
Multiply the result by loss_term_weight.
Change the sign of the loss term so that the KL divergence between the actor policy and the behaviors is minimized.
- Parameters
actor_dist (torch.distributions.Distribution) – Actor action distribution for the observations in data[prl.OBS].
data (dict) – Data batch containing all tensors required to compute the loss term.
info (dict) – Dictionary in which to store log information.
- Returns
attraction_kl_loss_term (torch.Tensor) – KL loss term.
info (dict) – Updated info dict.
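The steps of AttractionKL.compute_loss_term can be illustrated with a framework-free sketch for categorical action distributions given as probability lists. This is not the library's implementation. The KL direction KL(actor || behavior), averaging over the batch, the use of eps inside the logarithms, and reading "biased" as "scaled by behavior_weights" are all assumptions. The helper names (categorical_kl, mean_kl, attraction_kl_term) are hypothetical. The final negation follows the docstring; it only minimizes the KL if the caller treats the add-on term as an objective to maximize (for example, by subtracting it from the loss), which is also an assumption.

```python
import math
from typing import List, Sequence

Probs = Sequence[float]  # one categorical action distribution (probabilities sum to 1)


def categorical_kl(p: Probs, q: Probs, eps: float = 1e-8) -> float:
    """KL(p || q) for two categorical distributions.

    eps keeps the logarithms finite when a probability is zero
    (assumed role of the documented eps argument).
    """
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps)) for pi, qi in zip(p, q))


def mean_kl(actor: List[Probs], behavior: List[Probs], eps: float = 1e-8) -> float:
    """Average KL(actor || behavior) over a batch of observations."""
    return sum(categorical_kl(a, b, eps) for a, b in zip(actor, behavior)) / len(actor)


def attraction_kl_term(
    actor: List[Probs],
    behaviors: List[List[Probs]],
    behavior_weights: Sequence[float],
    loss_term_weight: float = 1.0,
    eps: float = 1e-8,
) -> float:
    """Hypothetical stand-in for AttractionKL.compute_loss_term."""
    # KL between the actor policy and every behavior (batch-averaged).
    kls = [mean_kl(actor, behavior, eps) for behavior in behaviors]
    # Assumption: "biased" means each KL is scaled by its behavior weight.
    biased = [w * kl for w, kl in zip(behavior_weights, kls)]
    # Keep only the minimum, i.e. the closest behavior under this weighting.
    closest = min(biased)
    # Scale by loss_term_weight and change the sign, as the docstring describes.
    return -loss_term_weight * closest
```

Because only the minimum is kept, a behavior that matches the actor exactly makes the term zero regardless of the other behaviors: the attraction acts toward the closest behavior only.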
- class pytorchrl.agent.algorithms.policy_loss_addons.kv_similarity.RepulsionKL(behavior_factories, behavior_weights, loss_term_weight=1.0, eps=1e-08)
Bases: pytorchrl.agent.algorithms.policy_loss_addons.base.PolicyLossAddOn
- compute_loss_term(data, actor_dist, info)
- Calculate and add the KL repulsion loss term.
Calculate the KL divergence between the actor policy and each behavior.
Compute the weighted sum of the KL similarities.
Multiply the result by loss_term_weight.
Keep the sign of the loss term so that the KL divergence between the actor policy and the behaviors is maximized.
- Parameters
actor_dist (torch.distributions.Distribution) – Actor action distribution for the observations in data[prl.OBS].
data (dict) – Data batch containing all tensors required to compute the loss term.
info (dict) – Dictionary in which to store log information.
- Returns
repulsion_kl_loss_term (torch.Tensor) – KL loss term.
info (dict) – Updated info dict.
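A matching framework-free sketch of RepulsionKL.compute_loss_term follows, under the same assumptions: categorical distributions given as probability lists, KL(actor || behavior) averaged over the batch, eps used inside the logarithms, and hypothetical helper names (categorical_kl, mean_kl, repulsion_kl_term). It is not the library's implementation. Here every behavior contributes, weighted by behavior_weights, and the sign is kept; whether that maximizes the KL again depends on the assumed convention that the caller treats the add-on term as an objective to maximize.

```python
import math
from typing import List, Sequence

Probs = Sequence[float]  # one categorical action distribution (probabilities sum to 1)


def categorical_kl(p: Probs, q: Probs, eps: float = 1e-8) -> float:
    """KL(p || q); eps keeps the logarithms finite (assumed role of eps)."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps)) for pi, qi in zip(p, q))


def mean_kl(actor: List[Probs], behavior: List[Probs], eps: float = 1e-8) -> float:
    """Average KL(actor || behavior) over a batch of observations."""
    return sum(categorical_kl(a, b, eps) for a, b in zip(actor, behavior)) / len(actor)


def repulsion_kl_term(
    actor: List[Probs],
    behaviors: List[List[Probs]],
    behavior_weights: Sequence[float],
    loss_term_weight: float = 1.0,
    eps: float = 1e-8,
) -> float:
    """Hypothetical stand-in for RepulsionKL.compute_loss_term."""
    # KL between the actor policy and every behavior (batch-averaged).
    kls = [mean_kl(actor, behavior, eps) for behavior in behaviors]
    # Weighted sum over all behaviors, not just the closest one.
    weighted_sum = sum(w * kl for w, kl in zip(behavior_weights, kls))
    # Scale by loss_term_weight and keep the sign, as the docstring describes.
    return loss_term_weight * weighted_sum
```

Unlike the attraction term, which only reacts to the nearest behavior, the weighted sum pushes the actor away from all behaviors at once, with behavior_weights controlling how strongly each one repels.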