pytorchrl.scheme.collection package

Submodules

pytorchrl.scheme.collection.c_worker module

class pytorchrl.scheme.collection.c_worker.CWorker(index_worker, index_parent, algo_factory, actor_factory, storage_factory, fraction_samples=1.0, compress_data_to_send=False, train_envs_factory=<function CWorker.<lambda>>, test_envs_factory=<function CWorker.<lambda>>, initial_weights=None, device=None)[source]

Bases: pytorchrl.scheme.base.worker.Worker

Worker class handling data collection.

This class wraps an actor instance, a storage class instance, and train and test vector environments. It collects data samples, sends them, and evaluates network versions.

Parameters
  • index_worker (int) – Worker index.

  • index_parent (int) – Index of the gradient worker in charge of this data collection worker.

  • algo_factory (func) – A function that creates an algorithm class.

  • actor_factory (func) – A function that creates a policy.

  • storage_factory (func) – A function that creates a rollouts storage.

  • fraction_samples (float) – Minimum fraction of samples required to stop if collection is synchronously coordinated and most workers have finished their collection task.

  • compress_data_to_send (bool) – Whether or not to compress data before sending it to grad worker.

  • train_envs_factory (func) – A function to create train environments.

  • test_envs_factory (func) – A function to create test environments.

  • initial_weights (ray object ID) – Initial model weights.

  • device (str) – “cpu” or a specific GPU “cuda:number” to use for computation.
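
The factory arguments above are callables that build a component on demand rather than ready-made instances. A minimal sketch of that pattern follows. `DummyActor`, `make_factory` and the `factory(device)` call signature are hypothetical stand-ins chosen for illustration, not pytorchrl's actual API.

```python
from functools import partial


class DummyActor:
    """Hypothetical stand-in for a policy; not a pytorchrl class."""

    def __init__(self, device, hidden_size=64):
        self.device = device
        self.hidden_size = hidden_size


def make_factory(cls, **fixed_kwargs):
    """Return a callable that builds `cls` later, with some arguments pre-bound."""
    return partial(cls, **fixed_kwargs)


# Configuration is bound now; the instance is only created when the factory
# is called (the `factory(device)` signature is an assumption).
actor_factory = make_factory(DummyActor, hidden_size=128)
actor = actor_factory("cpu")
```

Passing factories instead of instances lets each remote worker construct its own copy of the component in its own process.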

index_worker

Index assigned to this worker.

Type

int

fraction_samples

Minimum fraction of samples required to stop if collection is synchronously coordinated and most workers have finished their collection task.

Type

float
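
One plausible reading of this stopping rule, written as a standalone sketch. The function `should_stop` and its arguments are assumptions made for illustration; the library's internal check may differ.

```python
def should_stop(collected, target, fraction_samples, others_finished):
    """Hypothetical stopping rule for one collection worker.

    Stop when the full target is reached, or when the other workers have
    finished and at least `fraction_samples` of the target was collected.
    """
    if collected >= target:
        return True
    return others_finished and collected >= fraction_samples * target
```

With `fraction_samples=1.0` (the default shown in the signature) the second condition never triggers before the first, so under this reading every worker collects its full target.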

device

CPU or specific GPU to use for computation.

Type

torch.device

compress_data_to_send

Whether or not to compress data before sending it to grad worker.

Type

bool
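
To illustrate the trade-off behind this flag, here is a small sketch that serializes a rollout dictionary and optionally compresses it. The use of `pickle` and `zlib`, and the helper names `pack` and `unpack`, are assumptions for illustration; the source does not say which serialization or compression scheme the worker uses.

```python
import pickle
import zlib


def pack(data, compress):
    """Serialize `data`, optionally compressing the resulting bytes."""
    payload = pickle.dumps(data)
    return zlib.compress(payload) if compress else payload


def unpack(blob, compressed):
    """Invert `pack`; the receiver must know whether compression was used."""
    payload = zlib.decompress(blob) if compressed else blob
    return pickle.loads(payload)


# Toy rollout with highly repetitive values, so it compresses well.
rollout = {"obs": [0.0] * 1000, "rew": [1.0] * 1000}
blob = pack(rollout, compress=True)
restored = unpack(blob, compressed=True)
```

Compression costs CPU time on both ends, so it tends to pay off mainly when network transfer between workers is the bottleneck.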

actor

An actor class instance.

Type

Actor

algo

An algorithm class instance.

Type

Algo

envs_train

A VecEnv class instance with the train environments.

Type

VecEnv

envs_test

A VecEnv class instance with the test environments.

Type

VecEnv

storage

A Storage class instance.

Type

Storage

iter

Number of times samples have been collected and sent.

Type

int

actor_version

Number of times the current actor version has been updated.

Type

int

update_every

Number of data samples to collect between network update stages.

Type

int

obs

Latest train environment observation.

Type

torch.tensor

rhs

Latest policy recurrent hidden state.

Type

torch.tensor

done

Latest train environment done flag.

Type

torch.tensor

collect_data(listen_to=[], data_to_cpu=True)[source]

Perform a data collection operation, returning rollouts and other relevant information about the process.

Parameters

listen_to (list) – List of keywords to listen to trigger early stopping during collection.

Returns

  • data (dict) – Collected train data samples.

  • info (dict) – Additional relevant information about the collection operation.

collect_train_data(num_steps=None, listen_to=[])[source]

Collect train data from interactions with the environments.

Parameters
  • num_steps (int) – Target number of train environment steps to take.

  • listen_to (list) – List of keywords to listen to trigger early stopping during collection.

Returns

  • col_time (float) – Time, in seconds, spent in this operation.

  • train_perf (float) – Average accumulated reward over recent train episodes.

evaluate()[source]

Test current actor version in self.envs_test.

Returns

mean_test_perf – Average accumulated reward over all tested episodes.

Return type

float
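
As a sketch of the returned quantity, the average accumulated reward can be computed from per-episode reward sequences as below. The helper `mean_episode_return` and its input layout are hypothetical and only illustrate the arithmetic.

```python
def mean_episode_return(episode_rewards):
    """Average, over episodes, of the sum of rewards within each episode."""
    returns = [sum(rewards) for rewards in episode_rewards]
    return sum(returns) / len(returns)


# Three toy episodes with accumulated rewards 3, 2 and 5.
mean_test_perf = mean_episode_return([[1, 1, 1], [0, 2], [5]])
```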

replace_agent_component(component_name, new_component_factory)[source]

If component_name is an attribute of c_worker, replaces it with the component created by new_component_factory.

Parameters
  • component_name (str) – Worker component name.

  • new_component_factory (func) – Function to create an instance of the new component.
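
The "replace only if the attribute exists" behaviour described above can be sketched with `hasattr` and `setattr`. `SketchWorker` is a hypothetical stand-in, and calling the factory with no arguments is an assumption; the real factory may expect arguments such as a device.

```python
class SketchWorker:
    """Hypothetical stand-in for a collection worker."""

    def __init__(self):
        self.storage = "old-storage"

    def replace_component(self, component_name, new_component_factory):
        """Swap an existing attribute for a freshly built component."""
        if hasattr(self, component_name):
            setattr(self, component_name, new_component_factory())
            return True
        # Unknown names are ignored instead of creating new attributes.
        return False


worker = SketchWorker()
replaced = worker.replace_component("storage", lambda: "new-storage")
ignored = worker.replace_component("not_a_component", lambda: "x")
```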

set_weights(actor_weights)[source]

Update the worker actor version with provided weights.

Parameters

actor_weights (dict of tensors) – Dict containing actor weights to be set.

stop()[source]

Stop all processes.

update_algorithm_parameter(parameter_name, new_parameter_value)[source]

If parameter_name is an attribute of self.algo, set its value to new_parameter_value.

Parameters
  • parameter_name (str) – Algorithm attribute name.

  • new_parameter_value (float) – Algorithm new parameter value.

update_storage_parameter(parameter_name, new_parameter_value)[source]

If parameter_name is an attribute of self.storage, set its value to new_parameter_value.

Parameters
  • parameter_name (str) – Storage attribute name.

  • new_parameter_value (float) – Storage new parameter value.
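
Both update_algorithm_parameter and update_storage_parameter describe the same guarded update: change the value only when the attribute already exists. A minimal sketch of that logic, using a hypothetical helper `update_parameter` and a `SimpleNamespace` in place of a real algorithm object:

```python
from types import SimpleNamespace


def update_parameter(target, parameter_name, new_parameter_value):
    """Set `parameter_name` on `target` only if it is already an attribute."""
    if hasattr(target, parameter_name):
        setattr(target, parameter_name, new_parameter_value)


# Stand-in for self.algo with a single tunable attribute.
algo = SimpleNamespace(lr=1e-3)
update_parameter(algo, "lr", 5e-4)         # existing attribute: updated
update_parameter(algo, "not_there", 1.0)   # unknown attribute: ignored
```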

pytorchrl.scheme.collection.c_worker_set module

class pytorchrl.scheme.collection.c_worker_set.CWorkerSet(num_workers, index_parent, algo_factory, actor_factory, storage_factory, local_device=None, initial_weights=None, fraction_samples=1.0, total_parent_workers=0, compress_data_to_send=False, train_envs_factory=<function CWorkerSet.<lambda>>, test_envs_factory=<function CWorkerSet.<lambda>>, worker_remote_config={'memory': 5368709120, 'num_cpus': 1, 'num_gpus': 0.2, 'object_store_memory': 2147483648})[source]

Bases: pytorchrl.scheme.base.worker_set.WorkerSet

Class to better handle the operations of ensembles of CWorkers.

Parameters
  • num_workers (int) – Number of remote workers in the worker set.

  • index_parent (int) – Worker index of parent gradient worker.

  • total_parent_workers (int) – Total number of gradient workers in the training scheme.

  • algo_factory (func) – A function that creates an algorithm class.

  • actor_factory (func) – A function that creates a policy.

  • storage_factory (func) – A function that creates a rollouts storage.

  • train_envs_factory (func) – A function to create train environments.

  • local_device (str) – “cpu” or a specific GPU “cuda:number” to use for computation.

  • initial_weights (ray object ID) – Initial model weights.

  • fraction_samples (float) – Minimum fraction of samples required to stop if collection is synchronously coordinated and most workers have finished their collection task.

  • compress_data_to_send (bool) – Whether or not to compress data before sending it to grad worker.

  • test_envs_factory (func) – A function to create test environments.

  • worker_remote_config (dict) – Ray resource specs for the remote workers.
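
The byte counts in the default worker_remote_config are easier to read as multiples of 1 GiB. The sketch below rebuilds the default dictionary from the signature above; the `GIB` constant is introduced here only for readability.

```python
GIB = 1024 ** 3  # bytes in one gibibyte

# Same values as the default shown in the CWorkerSet signature.
worker_remote_config = {
    "num_cpus": 1,
    "num_gpus": 0.2,
    "memory": 5 * GIB,               # 5368709120 bytes
    "object_store_memory": 2 * GIB,  # 2147483648 bytes
}
```

A fractional `num_gpus` such as 0.2 is a Ray scheduling request rather than a hard limit, so several workers may be placed on the same GPU.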

worker_class

Worker class to be instantiated to create Ray remote actors.

Type

python class

remote_config

Ray resource specs for the remote workers.

Type

dict

worker_params

Keyword arguments of the worker_class.

Type

dict

num_workers

Number of remote workers in the worker set.

Type

int

classmethod create_factory(num_workers, algo_factory, actor_factory, storage_factory, test_envs_factory, train_envs_factory, total_parent_workers=0, col_fraction_samples=1.0, compress_data_to_send=False, col_worker_resources={'memory': 5368709120, 'num_cpus': 1, 'num_gpus': 0.2, 'object_store_memory': 2147483648})[source]

Returns a function to create new CWorkerSet instances.

Parameters
  • num_workers (int) – Number of remote workers in the worker set.

  • algo_factory (func) – A function that creates an algorithm class.

  • actor_factory (func) – A function that creates a policy.

  • storage_factory (func) – A function that creates a rollouts storage.

  • train_envs_factory (func) – A function to create train environments.

  • col_fraction_samples (float) – Minimum fraction of samples required to stop if collection is synchronously coordinated and most workers have finished their collection task.

  • test_envs_factory (func) – A function to create test environments.

  • total_parent_workers (int) – Total number of gradient workers in the training scheme.

  • col_worker_resources (dict) – Ray resource specs for the remote workers.

  • compress_data_to_send (bool) – Whether or not to compress data before sending it to grad worker.

Returns

collection_worker_set_factory – Function that creates a new CWorkerSet class instance.

Return type

func
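
Comparing the two signatures, create_factory omits index_parent, local_device and initial_weights, which suggests those are supplied when the returned function is called. A minimal sketch of such a closure-based factory follows. `SketchWorkerSet` is a hypothetical stand-in, and the arguments accepted by the returned function are an assumption.

```python
class SketchWorkerSet:
    """Hypothetical stand-in for a worker set; not the pytorchrl class."""

    def __init__(self, num_workers, index_parent, local_device=None):
        self.num_workers = num_workers
        self.index_parent = index_parent
        self.local_device = local_device

    @classmethod
    def create_factory(cls, num_workers):
        """Bind shared settings now; per-instance arguments come later."""

        def worker_set_factory(index_parent, local_device=None):
            return cls(num_workers, index_parent, local_device)

        return worker_set_factory


factory = SketchWorkerSet.create_factory(num_workers=4)
first = factory(index_parent=0)
second = factory(index_parent=1, local_device="cpu")
```

Each call to the returned function yields an independent instance that shares the settings bound in create_factory.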

Module contents