pytorchrl.scheme.collection package

Submodules

pytorchrl.scheme.collection.c_worker module

class pytorchrl.scheme.collection.c_worker.CWorker(index_worker, index_parent, algo_factory, actor_factory, storage_factory, fraction_samples=1.0, compress_data_to_send=False, train_envs_factory=<function CWorker.<lambda>>, test_envs_factory=<function CWorker.<lambda>>, initial_weights=None, device=None)[source]

Bases: pytorchrl.scheme.base.worker.Worker

Worker class handling data collection.

This class wraps an actor instance, a storage class instance, and train and test vector environments. It collects data samples, sends them, and evaluates network versions.

Parameters

index_worker (int) – Worker index.
index_parent (int) – Index of the gradient worker in charge of this data collection worker.
algo_factory (func) – A function that creates an algorithm class.
actor_factory (func) – A function that creates a policy.
storage_factory (func) – A function that creates a rollout storage.
fraction_samples (float) – Minimum fraction of samples required to stop if collection is synchronously coordinated and most workers have finished their collection task.
compress_data_to_send (bool) – Whether or not to compress data before sending it to grad worker.
train_envs_factory (func) – A function to create train environments.
test_envs_factory (func) – A function to create test environments.
initial_weights (ray object ID) – Initial model weights.
device (str) – “cpu” or a specific GPU “cuda:number” to use for computation.
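
The factory parameters above let each worker build its own components instead of receiving shared instances. The following is a minimal sketch of that pattern only. It uses stand-in classes rather than pytorchrl's, and the zero-argument factory signatures and every name in the code are assumptions made for illustration.

```python
# Hypothetical stand-ins; none of these names come from pytorchrl.
class StandInActor:
    def __init__(self, hidden_size):
        self.hidden_size = hidden_size


class StandInStorage:
    def __init__(self, size):
        self.size = size
        self.data = []


class StandInCollector:
    """Builds its components from factories instead of receiving instances."""

    def __init__(self, index_worker, actor_factory, storage_factory):
        self.index_worker = index_worker
        # Each worker calls the factories itself, so no two workers share state.
        self.actor = actor_factory()
        self.storage = storage_factory()


def actor_factory():
    # Assumed signature: the real factories may take arguments.
    return StandInActor(hidden_size=64)


def storage_factory():
    return StandInStorage(size=128)


workers = [StandInCollector(i, actor_factory, storage_factory) for i in range(2)]
```

Passing callables rather than objects matters most when workers live in separate processes, since only the recipe for building a component has to be shipped to each one.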

index_worker

Index assigned to this worker.

Type: int

fraction_samples

Minimum fraction of samples required to stop if collection is synchronously coordinated and most workers have finished their collection task.

Type: float
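
One possible reading of the fraction_samples rule is sketched below. The function name, its arguments, and the exact comparison are assumptions for illustration and are not taken from the library source.

```python
def can_stop_early(collected, target, fraction_samples, most_workers_finished):
    """Return True when a worker may stop collecting.

    Assumed rule: a worker always stops once it reaches `target` samples.
    Before that, it may stop only if most other workers have finished and
    it has gathered at least `fraction_samples * target` samples.
    """
    if collected >= target:
        return True
    return most_workers_finished and collected >= fraction_samples * target


# With fraction_samples=1.0 (the default) early stopping never applies.
never_early = can_stop_early(60, 100, 1.0, most_workers_finished=True)
# With fraction_samples=0.5 a slow worker can stop at 60 of 100 samples.
early = can_stop_early(60, 100, 0.5, most_workers_finished=True)
```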

device

CPU or specific GPU to use for computation.

Type: torch.device

compress_data_to_send

Whether or not to compress data before sending it to grad worker.

Type: bool
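
The documentation does not say how data is compressed. As a rough illustration of the trade-off behind this flag, the sketch below serializes a rollout with pickle and optionally compresses it with zlib. Both choices, and the pack and unpack helper names, are assumptions and may differ from what the library does.

```python
import pickle
import zlib


def pack(data, compress):
    """Serialize `data`, optionally compressing the resulting bytes."""
    payload = pickle.dumps(data)
    return zlib.compress(payload) if compress else payload


def unpack(payload, compressed):
    """Invert `pack`; `compressed` must match the flag used when packing."""
    raw = zlib.decompress(payload) if compressed else payload
    return pickle.loads(raw)


# A highly repetitive stand-in rollout, so it compresses well.
rollout = {"rew": [0.0] * 1000, "done": [False] * 1000}
packed = pack(rollout, compress=True)
unpacked = unpack(packed, compressed=True)
```

Compression costs CPU time on both ends, so it tends to pay off only when the transfer between collection and gradient workers is the bottleneck.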

actor

An actor class instance.

Type: Actor

algo

An algorithm class instance.

Type: Algo

envs_train

A VecEnv class instance with the train environments.

Type: VecEnv

envs_test

A VecEnv class instance with the test environments.

Type: VecEnv

storage

A Storage class instance.

Type: Storage

iter

Number of times samples have been collected and sent.

Type: int

actor_version

Number of times the current actor version has been updated.

Type: int

update_every

Number of data samples to collect between network update stages.

Type: int

obs

Latest train environment observation.

Type: torch.Tensor

rhs

Latest policy recurrent hidden state.

Type: torch.Tensor

done

Latest train environment done flag.

Type: torch.Tensor

collect_data(listen_to=[], data_to_cpu=True)[source]

Perform a data collection operation, returning rollouts and other relevant information about the process.

Parameters

listen_to (list) – List of keywords to listen for in order to trigger early stopping during collection.

Returns

data (dict) – Collected train data samples.
info (dict) – Additional relevant information about the collection operation.

collect_train_data(num_steps=None, listen_to=[])[source]

Collect train data from interactions with the environments.

Parameters

num_steps (int) – Target number of train environment steps to take.
listen_to (list) – List of keywords to listen for in order to trigger early stopping during collection.

Returns

col_time (float) – Time, in seconds, spent in this operation.
train_perf (float) – Average accumulated reward over recent train episodes.

evaluate()[source]

Test current actor version in self.envs_test.

Returns: mean_test_perf – Average accumulated reward over all tested episodes.
Return type: float
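
The returned metric can be illustrated with plain Python. This sketch assumes that "accumulated reward" means the undiscounted sum of rewards within an episode. The helper name and the sample data are hypothetical.

```python
def mean_episode_return(episode_rewards):
    """Average the summed reward of each episode.

    `episode_rewards` is a non-empty list of per-episode reward lists.
    """
    returns = [sum(rewards) for rewards in episode_rewards]
    return sum(returns) / len(returns)


# Three test episodes with accumulated rewards 3.0, 2.0 and 4.0.
episodes = [[1.0, 1.0, 1.0], [0.0, 2.0], [4.0]]
mean_test_perf = mean_episode_return(episodes)
```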

replace_agent_component(component_name, new_component_factory)[source]

If component_name is an attribute of c_worker, replaces it with the component created by new_component_factory.

Parameters

component_name (str) – Worker component name
new_component_factory (func) – Function to create an instance of the new component.
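
The conditional replacement described above can be sketched with hasattr and setattr. The stand-in class, the zero-argument factory, and the choice to silently ignore unknown names are assumptions, not the library's confirmed behavior.

```python
class StandInCWorker:
    """Hypothetical worker holding a single replaceable component."""

    def __init__(self):
        self.storage = ["old", "samples"]

    def replace_agent_component(self, component_name, new_component_factory):
        # Replace only attributes that already exist on the worker.
        if hasattr(self, component_name):
            setattr(self, component_name, new_component_factory())


worker = StandInCWorker()
worker.replace_agent_component("storage", list)      # existing attribute: replaced
worker.replace_agent_component("unknown_name", list)  # no such attribute: ignored
```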

set_weights(actor_weights)[source]

Update the worker actor version with provided weights.

Parameters: actor_weights (dict of tensors) – Dict containing actor weights to be set.

stop()[source]

Stop all processes.

update_algorithm_parameter(parameter_name, new_parameter_value)[source]

If parameter_name is an attribute of self.algo, change its value to new_parameter_value.

Parameters

parameter_name (str) – Algorithm attribute name
new_parameter_value (float) – Algorithm new parameter value.

update_storage_parameter(parameter_name, new_parameter_value)[source]

If parameter_name is an attribute of self.storage, change its value to new_parameter_value.

Parameters

parameter_name (str) – Storage attribute name
new_parameter_value (float) – Storage new parameter value.
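
update_algorithm_parameter and update_storage_parameter share the same guarded-assignment idea, applied to self.algo and self.storage respectively. A minimal sketch follows. The helper function, the stand-in class, and the boolean return value are assumptions added for illustration.

```python
class StandInAlgo:
    """Hypothetical algorithm object with one tunable attribute."""

    def __init__(self):
        self.lr = 1e-3


def update_parameter(target, parameter_name, new_parameter_value):
    """Set `parameter_name` on `target` only if it already exists.

    Returns True when the value was changed. This return value is an
    addition for the sketch and is not documented for the library methods.
    """
    if hasattr(target, parameter_name):
        setattr(target, parameter_name, new_parameter_value)
        return True
    return False


algo = StandInAlgo()
changed = update_parameter(algo, "lr", 5e-4)
ignored = update_parameter(algo, "not_a_parameter", 1.0)
```

Guarding on the attribute's existence means a misspelled parameter name cannot silently create a new, unused attribute.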

pytorchrl.scheme.collection.c_worker_set module

class pytorchrl.scheme.collection.c_worker_set.CWorkerSet(num_workers, index_parent, algo_factory, actor_factory, storage_factory, local_device=None, initial_weights=None, fraction_samples=1.0, total_parent_workers=0, compress_data_to_send=False, train_envs_factory=<function CWorkerSet.<lambda>>, test_envs_factory=<function CWorkerSet.<lambda>>, worker_remote_config={'memory': 5368709120, 'num_cpus': 1, 'num_gpus': 0.2, 'object_store_memory': 2147483648})[source]

Bases: pytorchrl.scheme.base.worker_set.WorkerSet

Class to better handle the operations of ensembles of CWorkers.

Parameters

num_workers (int) – Number of remote workers in the worker set.
index_parent (int) – Worker index of parent gradient worker.
total_parent_workers (int) – Total number of gradient workers in the training scheme.
algo_factory (func) – A function that creates an algorithm class.
actor_factory (func) – A function that creates a policy.
storage_factory (func) – A function that creates a rollout storage.
train_envs_factory (func) – A function to create train environments.
local_device (str) – “cpu” or a specific GPU “cuda:number” to use for computation.
initial_weights (ray object ID) – Initial model weights.
fraction_samples (float) – Minimum fraction of samples required to stop if collection is synchronously coordinated and most workers have finished their collection task.
compress_data_to_send (bool) – Whether or not to compress data before sending it to grad worker.
test_envs_factory (func) – A function to create test environments.
worker_remote_config (dict) – Ray resource specs for the remote workers.
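
The default worker_remote_config in the signature expresses memory limits as raw byte counts. The snippet below rebuilds the same dictionary with the sizes spelled out. The GIB constant is a name introduced here for readability.

```python
GIB = 1024 ** 3  # bytes in one gibibyte

# Same values as the default shown in the CWorkerSet signature.
worker_remote_config = {
    "num_cpus": 1,
    "num_gpus": 0.2,                 # fractional GPU share per worker
    "memory": 5 * GIB,               # 5368709120 bytes
    "object_store_memory": 2 * GIB,  # 2147483648 bytes
}
```

With num_gpus set to 0.2, Ray's resource accounting would allow up to five such workers to be scheduled on a single GPU, provided the other resources are available.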

worker_class

Worker class to be instantiated to create Ray remote actors.

Type: python class

remote_config

Ray resource specs for the remote workers.

Type: dict

worker_params

Keyword arguments of the worker_class.

Type: dict

num_workers

Number of remote workers in the worker set.

Type: int

classmethod create_factory(num_workers, algo_factory, actor_factory, storage_factory, test_envs_factory, train_envs_factory, total_parent_workers=0, col_fraction_samples=1.0, compress_data_to_send=False, col_worker_resources={'memory': 5368709120, 'num_cpus': 1, 'num_gpus': 0.2, 'object_store_memory': 2147483648})[source]

Returns a function to create new CWorkerSet instances.

Parameters

num_workers (int) – Number of remote workers in the worker set.
algo_factory (func) – A function that creates an algorithm class.
actor_factory (func) – A function that creates a policy.
storage_factory (func) – A function that creates a rollout storage.
train_envs_factory (func) – A function to create train environments.
col_fraction_samples (float) – Minimum fraction of samples required to stop if collection is synchronously coordinated and most workers have finished their collection task.
test_envs_factory (func) – A function to create test environments.
total_parent_workers (int) – Total number of gradient workers in the training scheme.
col_worker_resources (dict) – Ray resource specs for the remote workers.
compress_data_to_send (bool) – Whether or not to compress data before sending it to grad worker.

Returns

collection_worker_set_factory – creates a new CWorkerSet class instance.

Return type

func
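
A factory-returning classmethod like create_factory is commonly written as a closure that captures the shared settings and leaves per-instance arguments for later. The sketch below shows that pattern with a stand-in class. The assumption that the returned function takes index_parent is inferred from the signatures above and is not confirmed by the source.

```python
class StandInWorkerSet:
    """Hypothetical worker set; not the pytorchrl CWorkerSet."""

    def __init__(self, num_workers, index_parent, compress_data_to_send=False):
        self.num_workers = num_workers
        self.index_parent = index_parent
        self.compress_data_to_send = compress_data_to_send

    @classmethod
    def create_factory(cls, num_workers, compress_data_to_send=False):
        """Return a function that creates new instances of this class."""

        def collection_worker_set_factory(index_parent):
            # Shared settings are captured; only index_parent varies per call.
            return cls(num_workers, index_parent, compress_data_to_send)

        return collection_worker_set_factory


factory = StandInWorkerSet.create_factory(num_workers=4)
set_a = factory(index_parent=0)
set_b = factory(index_parent=1)
```

Each gradient worker can then call the same factory with its own index to obtain an independent, identically configured set of collection workers.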
