pytorchrl.scheme.collection package
Submodules
pytorchrl.scheme.collection.c_worker module
- class pytorchrl.scheme.collection.c_worker.CWorker(index_worker, index_parent, algo_factory, actor_factory, storage_factory, fraction_samples=1.0, compress_data_to_send=False, train_envs_factory=<function CWorker.<lambda>>, test_envs_factory=<function CWorker.<lambda>>, initial_weights=None, device=None)[source]
Bases:
pytorchrl.scheme.base.worker.Worker
Worker class handling data collection.
This class wraps an actor instance, a storage class instance, and train and test vector environments. It collects data samples, sends them, and evaluates network versions.
- Parameters
index_worker (int) – Worker index.
index_parent (int) – Index of the gradient worker in charge of this data collection worker.
algo_factory (func) – A function that creates an algorithm class.
actor_factory (func) – A function that creates a policy.
storage_factory (func) – A function that creates a rollouts storage.
fraction_samples (float) – Minimum fraction of samples required to stop if collection is synchronously coordinated and most workers have finished their collection task.
compress_data_to_send (bool) – Whether or not to compress data before sending it to grad worker.
train_envs_factory (func) – A function to create train environments.
test_envs_factory (func) – A function to create test environments.
initial_weights (ray object ID) – Initial model weights.
device (str) – “cpu” or specific GPU “cuda:number” to use for computation.
- index_worker
Index assigned to this worker.
- Type
int
- fraction_samples
Minimum fraction of samples required to stop if collection is synchronously coordinated and most workers have finished their collection task.
- Type
float
- device
CPU or specific GPU to use for computation.
- Type
torch.device
- compress_data_to_send
Whether or not to compress data before sending it to grad worker.
- Type
bool
- algo
An algorithm class instance.
- Type
Algo
- iter
Number of times samples have been collected and sent.
- Type
int
- actor_version
Number of times the current actor version has been updated.
- Type
int
- update_every
Number of data samples to collect between network update stages.
- Type
int
- obs
Latest train environment observation.
- Type
torch.tensor
- rhs
Latest policy recurrent hidden state.
- Type
torch.tensor
- done
Latest train environment done flag.
- Type
torch.tensor
- collect_data(listen_to=[], data_to_cpu=True)[source]
Perform a data collection operation, returning rollouts and other relevant information about the process.
- Parameters
listen_to (list) – List of keywords to listen for in order to trigger early stopping during collection.
- Returns
data (dict) – Collected train data samples.
info (dict) – Additional relevant information about the collection operation.
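This page does not say how compress_data_to_send compresses the collected data. As a minimal sketch of the general idea only, the example below serializes a dictionary of samples with `pickle` and compresses it with `zlib`. The helper names `compress` and `decompress`, the choice of `pickle` plus `zlib`, and the toy `rollouts` dictionary are all assumptions made for illustration, not the library's implementation.

```python
import pickle
import zlib


def compress(data):
    """Serialize a Python object and compress the resulting bytes."""
    return zlib.compress(pickle.dumps(data))


def decompress(blob):
    """Invert compress(): decompress the bytes and rebuild the object."""
    return pickle.loads(zlib.decompress(blob))


# Toy stand-in for a dict of collected samples (hypothetical contents).
rollouts = {"obs": [0.0] * 1000, "rew": [1.0] * 1000}

blob = compress(rollouts)    # what a collection worker might send
restored = decompress(blob)  # what a gradient worker might rebuild
```

Compression trades CPU time on both ends for a smaller payload, so it tends to pay off mainly when workers sit on different machines.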
- collect_train_data(num_steps=None, listen_to=[])[source]
Collect train data from interactions with the environments.
- Parameters
num_steps (int) – Target number of train environment steps to take.
listen_to (list) – List of keywords to listen for in order to trigger early stopping during collection.
- Returns
col_time (float) – Time, in seconds, spent in this operation.
train_perf (float) – Average accumulated reward over recent train episodes.
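One possible reading of how num_steps, listen_to and fraction_samples interact is sketched here. It is not CWorker's code: `collect_steps` and `stop_requested` are hypothetical names. The rule that a stop request is honoured only after `fraction_samples * num_steps` steps have been taken is an assumption drawn from the parameter descriptions.

```python
import math


def collect_steps(num_steps, fraction_samples, stop_requested):
    """Hypothetical collection loop (not pytorchrl code).

    Takes up to `num_steps` steps, but exits early once a stop request
    is seen AND at least `fraction_samples * num_steps` steps are done.
    """
    min_steps = math.ceil(fraction_samples * num_steps)
    steps_taken = 0
    for _ in range(num_steps):
        steps_taken += 1  # stand-in for one environment step
        if steps_taken >= min_steps and stop_requested(steps_taken):
            break
    return steps_taken


def stop_after_10(step):
    """Simulated stop signal that is raised from step 10 onward."""
    return step >= 10


full = collect_steps(100, 1.0, stop_after_10)   # must take every step
early = collect_steps(100, 0.5, stop_after_10)  # may stop at the halfway mark
```

With `fraction_samples=1.0` the stop signal has no effect. Lower values let a slow worker return a partial batch once the others are done.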
- evaluate()[source]
Test current actor version in self.envs_test.
- Returns
mean_test_perf – Average accumulated reward over all tested episodes.
- Return type
float
- replace_agent_component(component_name, new_component_factory)[source]
If component_name is an attribute of c_worker, replaces it with the component created by new_component_factory.
- Parameters
component_name (str) – Name of the worker component to replace.
new_component_factory (func) – Function to create an instance of the new component.
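The "replace only if the attribute exists" behaviour can be illustrated with a small stand-in class. `DummyWorker` is hypothetical. The zero-argument factory and the boolean return value are assumptions added for the sketch, since the real factory signature is not stated here.

```python
class DummyWorker:
    """Hypothetical stand-in for a worker holding named components."""

    def __init__(self):
        self.storage = "old-storage"

    def replace_agent_component(self, component_name, new_component_factory):
        # Only replace attributes that already exist on the worker.
        if hasattr(self, component_name):
            setattr(self, component_name, new_component_factory())
            return True  # assumption: report whether a swap happened
        return False


worker = DummyWorker()
swapped = worker.replace_agent_component("storage", lambda: "new-storage")
ignored = worker.replace_agent_component("missing", lambda: "unused")
```

Guarding with `hasattr` means a misspelled component name leaves the worker unchanged instead of silently creating a new attribute.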
- set_weights(actor_weights)[source]
Update the worker actor version with provided weights.
- Parameters
actor_weights (dict of tensors) – Dict containing actor weights to be set.
pytorchrl.scheme.collection.c_worker_set module
- class pytorchrl.scheme.collection.c_worker_set.CWorkerSet(num_workers, index_parent, algo_factory, actor_factory, storage_factory, local_device=None, initial_weights=None, fraction_samples=1.0, total_parent_workers=0, compress_data_to_send=False, train_envs_factory=<function CWorkerSet.<lambda>>, test_envs_factory=<function CWorkerSet.<lambda>>, worker_remote_config={'memory': 5368709120, 'num_cpus': 1, 'num_gpus': 0.2, 'object_store_memory': 2147483648})[source]
Bases:
pytorchrl.scheme.base.worker_set.WorkerSet
Class to better handle the operations of ensembles of CWorkers.
- Parameters
num_workers (int) – Number of remote workers in the worker set.
index_parent (int) – Worker index of parent gradient worker.
total_parent_workers (int) – Total number of gradient workers in the training scheme.
algo_factory (func) – A function that creates an algorithm class.
actor_factory (func) – A function that creates a policy.
storage_factory (func) – A function that creates a rollouts storage.
train_envs_factory (func) – A function to create train environments.
local_device (str) – “cpu” or specific GPU “cuda:number” to use for computation.
initial_weights (ray object ID) – Initial model weights.
fraction_samples (float) – Minimum fraction of samples required to stop if collection is synchronously coordinated and most workers have finished their collection task.
compress_data_to_send (bool) – Whether or not to compress data before sending it to grad worker.
test_envs_factory (func) – A function to create test environments.
worker_remote_config (dict) – Ray resource specs for the remote workers.
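The memory values in the default worker_remote_config are byte counts: 5368709120 is 5 GiB and 2147483648 is 2 GiB. The sketch below rebuilds the same dictionary with readable units. `GIB`, `default_config` and `custom_config` are illustrative names, and the override is a hypothetical example rather than a recommended setting.

```python
GIB = 1024 ** 3  # bytes in one gibibyte

# Same values as the default worker_remote_config in the signature above.
default_config = {
    "num_cpus": 1,
    "num_gpus": 0.2,
    "memory": 5 * GIB,
    "object_store_memory": 2 * GIB,
}

# Hypothetical override: CPU-only workers with a smaller memory budget.
custom_config = {**default_config, "num_gpus": 0, "memory": 2 * GIB}
```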
- worker_class
Worker class to be instantiated to create Ray remote actors.
- Type
python class
- remote_config
Ray resource specs for the remote workers.
- Type
dict
- worker_params
Keyword arguments of the worker_class.
- Type
dict
- num_workers
Number of remote workers in the worker set.
- Type
int
- classmethod create_factory(num_workers, algo_factory, actor_factory, storage_factory, test_envs_factory, train_envs_factory, total_parent_workers=0, col_fraction_samples=1.0, compress_data_to_send=False, col_worker_resources={'memory': 5368709120, 'num_cpus': 1, 'num_gpus': 0.2, 'object_store_memory': 2147483648})[source]
Returns a function to create new CWorkerSet instances.
- Parameters
num_workers (int) – Number of remote workers in the worker set.
algo_factory (func) – A function that creates an algorithm class.
actor_factory (func) – A function that creates a policy.
storage_factory (func) – A function that creates a rollouts storage.
train_envs_factory (func) – A function to create train environments.
col_fraction_samples (float) – Minimum fraction of samples required to stop if collection is synchronously coordinated and most workers have finished their collection task.
test_envs_factory (func) – A function to create test environments.
total_parent_workers (int) – Total number of gradient workers in the training scheme.
col_worker_resources (dict) – Ray resource specs for the remote workers.
compress_data_to_send (bool) – Whether or not to compress data before sending it to grad worker.
- Returns
collection_worker_set_factory – Function that creates a new CWorkerSet class instance.
- Return type
func
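The pattern of a classmethod that returns a function for building instances can be sketched with `functools.partial`. `FakeWorkerSet` is a hypothetical stand-in with a reduced argument list. Which arguments the real returned function expects is not stated on this page, so the `index_parent` keyword used here is an assumption.

```python
from functools import partial


class FakeWorkerSet:
    """Hypothetical stand-in; not the real CWorkerSet."""

    def __init__(self, num_workers, index_parent, fraction_samples=1.0):
        self.num_workers = num_workers
        self.index_parent = index_parent
        self.fraction_samples = fraction_samples

    @classmethod
    def create_factory(cls, num_workers, col_fraction_samples=1.0):
        # Bind the shared settings now; per-parent arguments come later.
        return partial(cls, num_workers, fraction_samples=col_fraction_samples)


factory = FakeWorkerSet.create_factory(num_workers=4, col_fraction_samples=0.8)
set_a = factory(index_parent=0)  # worker set for parent 0
set_b = factory(index_parent=1)  # independent worker set for parent 1
```

Deferring construction this way lets each gradient worker build its own worker set from one shared configuration.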