pytorchrl.agent.storages.model_based package
Submodules
pytorchrl.agent.storages.model_based.mb_buffer module
- class pytorchrl.agent.storages.model_based.mb_buffer.MBReplayBuffer(size, device, actor, algorithm, envs)[source]
Bases:
pytorchrl.agent.storages.base.Storage
Storage class for Model Based algorithms.
Implements all necessary functions to handle data storage and processing in model-based RL algorithms.
- size
Storage capacity along time axis.
- Type
int
- device
CPU or specific GPU where data tensors will be placed and class computations will take place. Should be the same device where the actor model is located.
- Type
torch.device
- after_gradients(batch, info)[source]
Steps required after updating the actor policy model.
- Parameters
batch (dict) – Data batch used to compute the gradients.
info (dict) – Additional relevant info from gradient computation.
- Returns
info – info dict updated with relevant info from Storage.
- Return type
dict
- classmethod create_factory(size)[source]
Returns a function that creates MBReplayBuffer instances.
- Parameters
size (int) – Storage capacity along time axis.
- Returns
create_buffer_instance – creates a new MBReplayBuffer class instance.
- Return type
func
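The factory pattern described above can be sketched as follows. This is a minimal illustration, not the pytorchrl implementation: ToyBuffer is a hypothetical stand-in class, and the argument list of the returned create_buffer_instance function is an assumption based on the MBReplayBuffer constructor signature.

```python
class ToyBuffer:
    """Hypothetical stand-in for a storage class (not the pytorchrl code)."""

    def __init__(self, size, device, actor, algorithm, envs):
        self.size = size
        self.device = device
        self.actor = actor
        self.algorithm = algorithm
        self.envs = envs

    @classmethod
    def create_factory(cls, size):
        """Fix `size` now; the remaining arguments are supplied later."""

        def create_buffer_instance(device, actor, algorithm, envs):
            # Assumed signature: everything except `size` is passed at call time.
            return cls(size, device, actor, algorithm, envs)

        return create_buffer_instance


factory = ToyBuffer.create_factory(size=1000)
buffer = factory(device="cpu", actor=None, algorithm=None, envs=None)
print(buffer.size)  # 1000
```

Deferring construction this way lets the storage capacity be configured up front while the device, actor, algorithm and environments are provided by whichever component eventually builds the buffer.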
- generate_batches(num_mini_batch, mini_batch_size=256, num_epochs=1)[source]
Returns a batch iterator to update dynamics model.
- Parameters
num_mini_batch (int) – Number of mini batches per epoch. (Not used in model-based training.)
mini_batch_size (int) – Number of samples contained in each mini batch.
num_epochs (int) – Number of epochs. (Not used in model-based training.)
- Yields
batch (dict) – Generated data batches.
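A batch iterator of this kind might look like the sketch below. It is a toy version written with NumPy, assuming the stored data is a dict of arrays that share the same length along the first axis; the shuffling step and the seed argument are assumptions, since the documentation does not state how samples are ordered.

```python
import numpy as np


def generate_batches(data, mini_batch_size=256, seed=0):
    """Yield mini batches from a dict of equally long arrays (toy sketch)."""
    rng = np.random.default_rng(seed)
    num_samples = len(next(iter(data.values())))
    permutation = rng.permutation(num_samples)  # assumed: random sample order
    for start in range(0, num_samples, mini_batch_size):
        idx = permutation[start:start + mini_batch_size]
        # Index every tensor with the same positions so rows stay aligned.
        yield {key: value[idx] for key, value in data.items()}


data = {
    "Observation": np.arange(20, dtype=np.float32).reshape(10, 2),
    "Reward": np.arange(10, dtype=np.float32),
}
batches = list(generate_batches(data, mini_batch_size=4))
print([len(b["Reward"]) for b in batches])  # [4, 4, 2]
```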
- get_all_buffer_data(data_to_cpu=False)[source]
Return all currently stored data. If data_to_cpu is True, nothing extra needs to be done, since data tensors are already in CPU memory.
- Parameters
data_to_cpu (bool) – Whether or not to move data tensors to cpu memory.
- Returns
data – data currently stored in the buffer.
- Return type
dict
- get_data_slice(start_pos, end_pos)[source]
Makes a copy of all tensors in the buffer between steps start_pos and end_pos.
- Parameters
start_pos (int) – initial slice position.
end_pos (int) – final slice position.
- Returns
data – data slice copied from the buffer.
- Return type
dict
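Copying a slice of every stored tensor can be sketched as below. The example uses NumPy arrays in a plain dict and assumes Python's half-open convention (start_pos included, end_pos excluded); the documentation does not say whether end_pos is inclusive in the actual method.

```python
import numpy as np


def get_data_slice(data, start_pos, end_pos):
    """Return copies of data[key][start_pos:end_pos] for every key (toy sketch)."""
    return {key: np.copy(value[start_pos:end_pos]) for key, value in data.items()}


stored = {"Reward": np.arange(6, dtype=np.float32)}
data_slice = get_data_slice(stored, 2, 5)
data_slice["Reward"][0] = -1.0  # modifying the copy leaves the buffer untouched
print(stored["Reward"][2])  # 2.0
```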
- init_tensors(sample)[source]
Lazy initialization of data tensors from a sample.
- Parameters
sample (dict) – Data sample (containing all tensors of an environment transition)
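Lazy initialization means the buffer does not need to know tensor shapes in advance: it allocates storage the first time a sample arrives. The sketch below illustrates the idea with NumPy. LazyStorage is a hypothetical class, and the circular overwrite of old entries is an assumption rather than documented behaviour.

```python
import numpy as np


class LazyStorage:
    """Hypothetical buffer that allocates its tensors on first insert."""

    def __init__(self, size):
        self.max_size = size
        self.data = None  # allocated lazily from the first sample
        self.step = 0

    def init_tensors(self, sample):
        # One array per key, with a leading time axis of length max_size.
        self.data = {
            key: np.zeros((self.max_size,) + np.shape(value),
                          dtype=np.asarray(value).dtype)
            for key, value in sample.items()
        }

    def insert_transition(self, sample):
        if self.data is None:
            self.init_tensors(sample)
        for key, value in sample.items():
            self.data[key][self.step] = value
        self.step = (self.step + 1) % self.max_size  # assumed: wrap around


storage = LazyStorage(size=4)
storage.insert_transition({
    "Observation": np.ones(3, dtype=np.float32),
    "Reward": np.float32(1.0),
})
print(storage.data["Observation"].shape)  # (4, 3)
```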
- insert_data_slice(new_data)[source]
Appends new_data to currently stored data.
- Parameters
new_data (dict) – Dictionary of env transition samples to be added to self.data.
- insert_single_tensor_slice(tensor_storage, tensor_key, tensor_values)[source]
Appends tensor_values to the buffer dict, using tensor_key as key.
- Parameters
tensor_storage –
tensor_key (str) – key to use to store the tensor.
tensor_values (np.ndarray) – tensor values.
- Returns
l – length (along the time axis) of the tensor added to the buffer.
- Return type
int
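The append-and-report-length behaviour can be sketched as follows. This is an illustrative version only: it assumes tensor_storage is a plain dict of NumPy arrays and that new values are concatenated along the first (time) axis, neither of which is confirmed by the documentation.

```python
import numpy as np


def insert_single_tensor_slice(tensor_storage, tensor_key, tensor_values):
    """Append tensor_values under tensor_key and return the added length."""
    if tensor_key in tensor_storage:
        # Assumed: existing data grows along the time axis (axis 0).
        tensor_storage[tensor_key] = np.concatenate(
            [tensor_storage[tensor_key], tensor_values], axis=0)
    else:
        tensor_storage[tensor_key] = np.array(tensor_values, copy=True)
    return tensor_values.shape[0]


toy_storage = {}
first_len = insert_single_tensor_slice(toy_storage, "Reward", np.zeros((5, 1)))
second_len = insert_single_tensor_slice(toy_storage, "Reward", np.ones((3, 1)))
print(first_len, second_len, toy_storage["Reward"].shape)  # 5 3 (8, 1)
```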
- insert_transition(sample)[source]
Store new transition sample.
- Parameters
sample (dict) – Data sample (containing all tensors of an environment transition)
- reset()[source]
Set class size and step to zero. If self.actor uses RNNs, add overlap slice of last sequence before reset at the beginning of the storage.
- storage_tensors = ('Observation', 'RecurrentHiddenStates', 'Done', 'Action', 'Reward', 'NextObservation', 'NextRecurrentHiddenStates', 'NextDone')
- pytorchrl.agent.storages.model_based.mb_buffer.dim0_reshape(tensor, size)[source]
Reshapes tensor so that indices are laid out like this:
00, 01, 02, 03, 04, 05, 06, 07, 08, 09, size + 1, …, self.max_size
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, size + 1, …, self.max_size
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, size + 1, …, self.max_size