Model-Based

Model-Based Replay Buffer

class pytorchrl.agent.storages.model_based.mb_buffer.MBReplayBuffer(size, device, actor, algorithm, envs)

Bases: pytorchrl.agent.storages.base.Storage

Storage class for model-based algorithms.

Implements all the functions needed to store and process data in model-based RL algorithms.

size

Storage capacity along time axis.

Type

int

device

CPU or specific GPU where data tensors will be placed and class computations will take place. Should be the same device where the actor model is located.

Type

torch.device

actor

Actor class instance.

Type

Actor

algo

Algorithm class instance.

Type

Algorithm

after_gradients(batch, info)

Steps required after updating the actor policy model.

Parameters
  • batch (dict) – Data batch used to compute the gradients.

  • info (dict) – Additional relevant info from gradient computation.

Returns

info – info dict updated with relevant info from Storage.

Return type

dict

before_gradients()

Steps required before updating actor policy model.
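The two hooks bracket a gradient update: before_gradients runs first, and after_gradients receives the batch and the info dict and returns the updated info. The following is a minimal sketch of that call order. ToyStorage and update_step are hypothetical names, and the exact order in which pytorchrl's learner invokes the hooks is an assumption here, not taken from the library.

```python
class ToyStorage:
    """Hypothetical storage exposing the two hooks documented above."""

    def __init__(self):
        self.num_updates = 0

    def before_gradients(self):
        # Nothing to prepare in this toy example.
        pass

    def after_gradients(self, batch, info):
        # Add storage-side bookkeeping to the info dict and hand it back.
        self.num_updates += 1
        info["storage_updates"] = self.num_updates
        info["batch_size"] = len(batch["Reward"])
        return info


def update_step(storage, batch):
    """Hypothetical learner step: the hooks wrap the (omitted) gradient computation."""
    storage.before_gradients()
    info = {"loss": 0.0}  # placeholder for the real gradient computation
    return storage.after_gradients(batch, info)


print(update_step(ToyStorage(), {"Reward": [0.0, 1.0, 0.5]}))
# {'loss': 0.0, 'storage_updates': 1, 'batch_size': 3}
```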

classmethod create_factory(size)

Returns a function that creates MBReplayBuffer instances.

Parameters

size (int) – Storage capacity along time axis.

Returns

create_buffer_instance – creates a new MBReplayBuffer class instance.

Return type

func
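create_factory fixes the capacity up front and defers the remaining constructor arguments. Below is a minimal sketch of that closure pattern. ToyBuffer is a hypothetical stand-in, and the assumption that the returned function takes (device, actor, algorithm, envs) is inferred from the constructor signature above rather than confirmed by pytorchrl.

```python
class ToyBuffer:
    """Hypothetical stand-in with the same constructor arguments as MBReplayBuffer."""

    def __init__(self, size, device, actor, algorithm, envs):
        self.max_size = size
        self.device = device
        self.actor = actor
        self.algo = algorithm
        self.envs = envs

    @classmethod
    def create_factory(cls, size):
        # Capture the capacity now; the other arguments are supplied later,
        # once the device, actor, algorithm and environments exist.
        def create_buffer_instance(device, actor, algorithm, envs):
            return cls(size, device, actor, algorithm, envs)

        return create_buffer_instance


factory = ToyBuffer.create_factory(size=10_000)
buffer = factory("cpu", actor=None, algorithm=None, envs=None)
print(buffer.max_size)  # 10000
```

Deferring construction this way lets the capacity come from a config while the device, actor and algorithm are created later in the training setup.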

generate_batches(num_mini_batch, mini_batch_size=256, num_epochs=1)

Returns a batch iterator to update the dynamics model.

Parameters
  • num_mini_batch (int) – Number of mini batches per epoch (not used in model-based training).

  • mini_batch_size (int) – Number of samples contained in each mini batch.

  • num_epochs (int) – Number of epochs (not used in model-based training).

Yields

batch (dict) – Generated data batches.
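Since num_mini_batch and num_epochs are ignored, only mini_batch_size shapes the batches. The sketch below assumes the stored data is a dict of NumPy arrays with time on the leading axis and makes one shuffled pass over the filled steps. The function and its rng argument are hypothetical, and pytorchrl may sample differently (for example, with replacement).

```python
import numpy as np


def generate_batches(data, size, mini_batch_size=256, rng=None):
    """Yield shuffled mini batches covering the first `size` stored steps once."""
    rng = np.random.default_rng() if rng is None else rng
    perm = rng.permutation(size)  # random order of the stored time indices
    for start in range(0, size, mini_batch_size):
        idx = perm[start:start + mini_batch_size]  # last batch may be smaller
        yield {k: v[idx] for k, v in data.items()}


data = {
    "Observation": np.arange(10, dtype=np.float32).reshape(10, 1),
    "Reward": np.ones((10, 1), dtype=np.float32),
}
batches = list(generate_batches(data, size=10, mini_batch_size=4, rng=np.random.default_rng(0)))
print([b["Observation"].shape for b in batches])  # [(4, 1), (4, 1), (2, 1)]
```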

get_all_buffer_data(data_to_cpu=False)

Returns all currently stored data. If data_to_cpu is True, nothing extra needs to be done, since the data tensors are already in CPU memory.

Parameters

data_to_cpu (bool) – Whether or not to move data tensors to CPU memory.

Returns

data – data currently stored in the buffer.

Return type

dict
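The buffer is preallocated to its full capacity, so "all stored data" means only the filled steps. A minimal sketch, assuming the storage is a dict of NumPy arrays (already in host memory, which is why data_to_cpu needs no action) and a hypothetical `size` counter of filled steps. This sketch returns views, not copies.

```python
import numpy as np


def get_all_buffer_data(data, size, data_to_cpu=False):
    """Return the filled part (first `size` steps) of every stored array."""
    # NumPy arrays already live in host memory, so data_to_cpu needs no action.
    # Basic slicing returns views that share memory with the buffer.
    return {k: v[:size] for k, v in data.items()}


capacity_data = {"Reward": np.zeros((100, 1), dtype=np.float32)}
stored = get_all_buffer_data(capacity_data, size=7, data_to_cpu=True)
print(stored["Reward"].shape)  # (7, 1)
```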

get_data_slice(start_pos, end_pos)

Makes a copy of all tensors in the buffer between steps start_pos and end_pos.

Parameters
  • start_pos (int) – initial slice position.

  • end_pos (int) – final slice position.

Returns

data – data slice copied from the buffer.

Return type

dict
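The key point of get_data_slice is that it returns a copy, so later writes into the buffer do not leak into the slice. A minimal sketch with a hypothetical dict-of-NumPy-arrays storage (an assumption about the internal layout):

```python
import numpy as np


def get_data_slice(data, start_pos, end_pos):
    # .copy() detaches each slice from the buffer's memory, so later writes
    # into the buffer do not change the returned data.
    return {k: v[start_pos:end_pos].copy() for k, v in data.items()}


data = {"Reward": np.arange(6, dtype=np.float32)}
chunk = get_data_slice(data, 2, 5)
data["Reward"][2] = -1.0  # overwrite the buffer after slicing
print(chunk["Reward"])  # [2. 3. 4.]
```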

init_tensors(sample)

Lazy initialization of data tensors from a sample.

Parameters

sample (dict) – Data sample (containing all tensors of an environment transition)
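Lazy initialization lets the buffer infer shapes and dtypes from the first transition instead of requiring them at construction time. A minimal sketch, assuming each sample value has shape (num_envs, dim) and the storage is a dict of NumPy arrays with the time axis first; the `size` argument is a hypothetical name for the capacity.

```python
import numpy as np


def init_tensors(sample, size):
    """Allocate one zero-filled array per key, with `size` steps on the leading (time) axis."""
    data = {}
    for key, value in sample.items():
        value = np.asarray(value)
        data[key] = np.zeros((size,) + value.shape, dtype=value.dtype)
    return data


sample = {
    "Observation": np.zeros((2, 3), dtype=np.float32),  # 2 envs, 3-dim observations
    "Reward": np.zeros((2, 1), dtype=np.float32),
}
data = init_tensors(sample, size=100)
print(data["Observation"].shape)  # (100, 2, 3)
```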

insert_data_slice(new_data)

Appends new_data to currently stored data.

Parameters

new_data (dict) – Dictionary of env transition samples to be added to self.data.

insert_single_tensor_slice(tensor_storage, tensor_key, tensor_values)

Appends tensor_values to the buffer dict, using tensor_key as the key.

Parameters
  • tensor_storage

  • tensor_key (str) – key to use to store the tensor.

  • tensor_values (np.ndarray) – tensor values.

Returns

l – length (time axis) of the tensor added to the buffer.

Return type

int
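insert_data_slice can be built on insert_single_tensor_slice: each key's values are written in turn, and the returned length l advances the write position. A minimal sketch, assuming a fixed-capacity circular layout that overwrites the oldest steps once full and slices no longer than the capacity. ToySliceBuffer, max_size, step and size are hypothetical names, not pytorchrl's internals.

```python
import numpy as np


class ToySliceBuffer:
    """Hypothetical fixed-capacity buffer that appends whole slices of steps."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.step = 0  # next write position
        self.size = 0  # number of valid steps
        self.data = {}

    def insert_single_tensor_slice(self, tensor_storage, tensor_key, tensor_values):
        values = np.asarray(tensor_values)
        if tensor_key not in tensor_storage:  # allocate on first use
            shape = (self.max_size,) + values.shape[1:]
            tensor_storage[tensor_key] = np.zeros(shape, dtype=values.dtype)
        l = values.shape[0]
        idx = (self.step + np.arange(l)) % self.max_size  # wrap around when full
        tensor_storage[tensor_key][idx] = values
        return l

    def insert_data_slice(self, new_data):
        l = 0
        for key, values in new_data.items():
            l = self.insert_single_tensor_slice(self.data, key, values)
        self.step = (self.step + l) % self.max_size
        self.size = min(self.size + l, self.max_size)


buf = ToySliceBuffer(max_size=5)
buf.insert_data_slice({"Reward": np.arange(3.0)})
buf.insert_data_slice({"Reward": np.arange(3.0, 6.0)})
print(buf.data["Reward"], buf.step, buf.size)  # [5. 1. 2. 3. 4.] 1 5
```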

insert_transition(sample)

Store new transition sample.

Parameters

sample (dict) – Data sample (containing all tensors of an environment transition)
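Storing a single transition amounts to writing each tensor at the current step and advancing the step. A minimal sketch, assuming lazy allocation on the first sample and a circular layout that overwrites the oldest transition once the buffer is full. ToyTransitionBuffer and its attributes are hypothetical.

```python
import numpy as np


class ToyTransitionBuffer:
    """Hypothetical circular buffer storing one transition per call."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.step = 0
        self.size = 0
        self.data = None

    def insert_transition(self, sample):
        if self.data is None:  # lazy initialization from the first sample
            self.data = {
                k: np.zeros((self.max_size,) + np.shape(v), dtype=np.asarray(v).dtype)
                for k, v in sample.items()
            }
        for k, v in sample.items():
            self.data[k][self.step] = v
        self.step = (self.step + 1) % self.max_size  # overwrite oldest once full
        self.size = min(self.size + 1, self.max_size)


buf = ToyTransitionBuffer(max_size=3)
for t in range(4):
    buf.insert_transition({"Reward": np.array([float(t)])})
print(buf.data["Reward"].ravel(), buf.step, buf.size)  # [3. 1. 2.] 1 3
```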

reset()

Sets the buffer size and step to zero. If self.actor uses RNNs, adds the overlap slice of the last sequence before the reset at the beginning of the storage.
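With recurrent actors, keeping the tail of the previous data gives the next sequence some context. A minimal sketch of that reset, assuming a dict of NumPy arrays and a hypothetical overlap length in steps; the `recurrent` flag stands in for "self.actor uses RNNs", and how pytorchrl chooses the overlap length is not shown here.

```python
import numpy as np


class ToyResettableBuffer:
    """Hypothetical buffer illustrating reset() with an optional RNN overlap."""

    def __init__(self, data, size, recurrent=False, overlap=0):
        self.data = data
        self.size = size
        self.step = size
        self.recurrent = recurrent  # stands in for "self.actor uses RNNs"
        self.overlap = overlap      # hypothetical overlap length, in steps

    def reset(self):
        n = min(self.overlap, self.size) if self.recurrent else 0
        # Copy the last n steps before clearing, to avoid aliasing on write-back.
        tail = {k: v[self.size - n:self.size].copy() for k, v in self.data.items()}
        self.size, self.step = 0, 0
        if n > 0:
            # Put the overlap slice at the beginning of the storage.
            for k, v in tail.items():
                self.data[k][:n] = v
            self.size = self.step = n


buf = ToyResettableBuffer({"Observation": np.arange(6.0)}, size=6, recurrent=True, overlap=2)
buf.reset()
print(buf.data["Observation"][:2], buf.size, buf.step)  # [4. 5.] 2 2
```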

storage_tensors = ('Observation', 'RecurrentHiddenStates', 'Done', 'Action', 'Reward', 'NextObservation', 'NextRecurrentHiddenStates', 'NextDone')

update_storage_parameter(parameter_name, new_parameter_value)

If parameter_name is an attribute of the algorithm, changes its value to new_parameter_value.

Parameters
  • parameter_name (str) – Attribute name

  • new_parameter_value (int or float) – New value for parameter_name.
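The rule above maps directly onto Python's built-in hasattr and setattr. A minimal sketch with hypothetical ToyAlgorithm and ToyStorage classes (the `lr` attribute is illustrative only); names the algorithm does not already define are silently ignored.

```python
class ToyAlgorithm:
    def __init__(self):
        self.lr = 1e-3  # hypothetical tunable attribute


class ToyStorage:
    def __init__(self, algorithm):
        self.algo = algorithm

    def update_storage_parameter(self, parameter_name, new_parameter_value):
        # Only overwrite attributes the algorithm already defines.
        if hasattr(self.algo, parameter_name):
            setattr(self.algo, parameter_name, new_parameter_value)


storage = ToyStorage(ToyAlgorithm())
storage.update_storage_parameter("lr", 5e-4)
storage.update_storage_parameter("not_an_attribute", 1)  # ignored
print(storage.algo.lr)  # 0.0005
print(hasattr(storage.algo, "not_an_attribute"))  # False
```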