Model-Based

Model-Based Replay Buffer

class pytorchrl.agent.storages.model_based.mb_buffer.MBReplayBuffer(size, device, actor, algorithm, envs)[source]

Bases: pytorchrl.agent.storages.base.Storage

Storage class for Model Based algorithms.

Implements all necessary functions to handle data storage and processing in model-based RL algorithms.

size

Storage capacity along time axis.

Type: int

device

CPU or specific GPU where data tensors will be placed and class computations will take place. Should be the same device where the actor model is located.

Type: torch.device

actor

Actor class instance.

Type: Actor

algo

Algorithm class instance.

Type: Algorithm

after_gradients(batch, info)[source]

Steps required after updating actor policy model.

Parameters

batch (dict) – Data batch used to compute the gradients.
info (dict) – Additional relevant info from gradient computation.

Returns: info – info dict updated with relevant info from Storage.
Return type: dict

before_gradients()[source]: Steps required before updating actor policy model.

classmethod create_factory(size)[source]

Returns a function that creates MBReplayBuffer instances.

Parameters: size (int) – Storage capacity along time axis.
Returns: create_buffer_instance – creates a new MBReplayBuffer class instance.
Return type: func
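
The factory pattern described above can be sketched as follows. This is a minimal, hypothetical illustration: ToyBuffer is a stand-in class, not the library's MBReplayBuffer, and the argument list of the returned function (device, actor, algorithm, envs) is an assumption taken from the constructor signature at the top of this page.

```python
class ToyBuffer:
    """Hypothetical stand-in for a storage class (not MBReplayBuffer itself)."""

    def __init__(self, size, device, actor, algorithm, envs):
        self.size = size
        self.device = device
        self.actor = actor
        self.algorithm = algorithm
        self.envs = envs

    @classmethod
    def create_factory(cls, size):
        """Return a function that builds buffers with a fixed capacity."""

        def create_buffer_instance(device, actor, algorithm, envs):
            # `size` is captured by the closure; the rest is supplied later.
            return cls(size, device, actor, algorithm, envs)

        return create_buffer_instance


# The capacity is fixed when the factory is created ...
factory = ToyBuffer.create_factory(size=1000)
# ... and the remaining arguments are passed when the buffer is built.
buffer = factory("cpu", actor=None, algorithm=None, envs=None)
```

Deferring construction this way lets the capacity be chosen up front while the device, actor and algorithm are provided only once they exist.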

generate_batches(num_mini_batch, mini_batch_size=256, num_epochs=1)[source]

Returns a batch iterator to update dynamics model.

Parameters

num_mini_batch (int) – Number of mini batches per epoch (not used in model-based training).
mini_batch_size (int) – Number of samples contained in each mini batch.
num_epochs (int) – Number of epochs (not used in model-based training).

Yields

batch (dict) – Generated data batches.
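
As an illustration of a batch iterator over dictionary-shaped storage, here is a minimal sketch. The helper name iterate_mini_batches, the seed argument and the sampling scheme (one shuffled pass over the data) are assumptions made for this example, not the library's documented behaviour. NumPy arrays are used for brevity.

```python
import numpy as np


def iterate_mini_batches(data, mini_batch_size=256, seed=0):
    """Yield dicts holding random, non-overlapping mini batches of `data`.

    Hypothetical helper: `data` maps tensor names to arrays that share the
    same length along axis 0 (the time axis).
    """
    length = len(next(iter(data.values())))
    rng = np.random.default_rng(seed)
    permutation = rng.permutation(length)
    for start in range(0, length, mini_batch_size):
        idx = permutation[start:start + mini_batch_size]
        # Index every array with the same indices so rows stay aligned.
        yield {key: value[idx] for key, value in data.items()}


data = {
    "Observation": np.arange(10, dtype=np.float32).reshape(10, 1),
    "Reward": np.arange(10, dtype=np.float32),
}
batches = list(iterate_mini_batches(data, mini_batch_size=4))
```

With 10 stored steps and a mini batch size of 4, the last batch is smaller than the others because the data length is not a multiple of the batch size.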

get_all_buffer_data(data_to_cpu=False)[source]

Returns all currently stored data. If data_to_cpu is True, no extra work is needed because the data tensors are already in CPU memory.

Parameters: data_to_cpu (bool) – Whether or not to move data tensors to CPU memory.
Returns: data – data currently stored in the buffer.
Return type: dict

get_data_slice(start_pos, end_pos)[source]

Makes a copy of all tensors in the buffer between steps start_pos and end_pos.

Parameters

start_pos (int) – initial slice position.
end_pos (int) – final slice position.

Returns

data – data slice copied from the buffer.

Return type

dict
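
A slice copy of this kind can be sketched as below. The helper copy_data_slice is hypothetical, NumPy arrays stand in for the stored tensors, and treating end_pos as exclusive (as in ordinary Python slicing) is an assumption; the source does not say whether the final position is included.

```python
import numpy as np


def copy_data_slice(data, start_pos, end_pos):
    """Copy every array in `data` between steps start_pos and end_pos.

    Assumption: end_pos is exclusive, as in ordinary Python slicing.
    """
    return {key: np.copy(value[start_pos:end_pos]) for key, value in data.items()}


data = {"Reward": np.array([0.0, 1.0, 2.0, 3.0, 4.0])}
data_slice = copy_data_slice(data, 1, 3)
# Writing to the copy does not touch the original buffer contents.
data_slice["Reward"][0] = -1.0
```

Copying matters because a plain slice of an array is a view: without the copy, later writes to the buffer would silently change the returned data.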

init_tensors(sample)[source]

Lazy initialization of data tensors from a sample.

Parameters: sample (dict) – Data sample (containing all tensors of an environment transition)
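
Lazy initialization means that storage is allocated only when the first sample arrives, so shapes and dtypes can be read from the sample itself. A minimal sketch, assuming each tensor is stored as an array of shape (size, *sample_shape); the helper init_arrays is hypothetical and uses NumPy for brevity:

```python
import numpy as np


def init_arrays(sample, size):
    """Allocate one zero-filled array per key, shaped (size, *sample_shape)."""
    data = {}
    for key, value in sample.items():
        value = np.asarray(value)
        # Shape and dtype come from the sample, capacity from `size`.
        data[key] = np.zeros((size, *value.shape), dtype=value.dtype)
    return data


sample = {
    "Observation": np.ones((4,), dtype=np.float32),
    "Reward": np.float32(0.5),
}
data = init_arrays(sample, size=100)
```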

insert_data_slice(new_data)[source]

Appends new_data to currently stored data.

Parameters: new_data (dict) – Dictionary of env transition samples to be added to self.data.

insert_single_tensor_slice(tensor_storage, tensor_key, tensor_values)[source]

Appends tensor_values to the buffer dict, using tensor_key as the key.

Parameters

tensor_storage –
tensor_key (str) – key to use to store the tensor.
tensor_values (np.ndarray) – tensor values.

Returns

l – length (time axis) of the tensor added to the buffer.

Return type

int
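
The append-and-return-length behaviour described above might look like the following sketch. The helper insert_array_slice is hypothetical, and concatenating along axis 0 is an assumption about how the time axis is laid out.

```python
import numpy as np


def insert_array_slice(storage, key, values):
    """Append `values` to storage[key] along axis 0 and return their length."""
    values = np.asarray(values)
    if key in storage:
        storage[key] = np.concatenate([storage[key], values], axis=0)
    else:
        # First insertion for this key: store a copy of the new values.
        storage[key] = values.copy()
    return values.shape[0]


storage = {}
first_len = insert_array_slice(storage, "Reward", np.array([1.0, 2.0]))
second_len = insert_array_slice(storage, "Reward", np.array([3.0]))
```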

insert_transition(sample)[source]

Store new transition sample.

Parameters: sample (dict) – Data sample (containing all tensors of an environment transition)
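
Storing transitions one at a time in a fixed-capacity buffer is often done with a circular write position. The sketch below shows that general technique; ToyTransitionStore is hypothetical, and overwriting the oldest entry once the buffer is full is an assumption, not something the source states about MBReplayBuffer.

```python
import numpy as np


class ToyTransitionStore:
    """Hypothetical fixed-capacity store that overwrites the oldest data."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.data = None  # allocated lazily on the first insert
        self.step = 0     # next write position
        self.size = 0     # number of valid transitions stored

    def insert_transition(self, sample):
        if self.data is None:
            self.data = {}
            for key, value in sample.items():
                value = np.asarray(value)
                self.data[key] = np.zeros(
                    (self.max_size, *value.shape), dtype=value.dtype)
        for key, value in sample.items():
            self.data[key][self.step] = value
        # Wrap around so that new data replaces the oldest entries.
        self.step = (self.step + 1) % self.max_size
        self.size = min(self.size + 1, self.max_size)


store = ToyTransitionStore(max_size=3)
for t in range(5):
    store.insert_transition({"Reward": np.float32(t)})
```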

reset()[source]: Set class size and step to zero. If self.actor uses RNNs, add overlap slice of last sequence before reset at the beginning of the storage.

storage_tensors = ('Observation', 'RecurrentHiddenStates', 'Done', 'Action', 'Reward', 'NextObservation', 'NextRecurrentHiddenStates', 'NextDone')

update_storage_parameter(parameter_name, new_parameter_value)[source]

If parameter_name is an attribute of the algorithm, change its value to new_parameter_value.

Parameters

parameter_name (str) – Attribute name.
new_parameter_value (int or float) – New value for parameter_name.
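
A guarded attribute update of this kind can be sketched with hasattr and setattr. The helper update_parameter and the ToyConfig class are hypothetical; which object the library actually inspects is not shown here, so the target is passed in explicitly.

```python
def update_parameter(target, parameter_name, new_parameter_value):
    """Set target.<parameter_name> only if that attribute already exists."""
    if hasattr(target, parameter_name):
        setattr(target, parameter_name, new_parameter_value)
        return True
    return False


class ToyConfig:
    """Hypothetical object holding one tunable attribute."""

    def __init__(self):
        self.mini_batch_size = 256


config = ToyConfig()
changed = update_parameter(config, "mini_batch_size", 128)
# Unknown names are ignored instead of creating new attributes.
ignored = update_parameter(config, "unknown_name", 1)
```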