pytorchrl.envs.atari package
Submodules
pytorchrl.envs.atari.atari_env_factory module
- pytorchrl.envs.atari.atari_env_factory.atari_test_env_factory(env_id, index_col_worker, index_grad_worker, index_env=0, seed=0, frame_stack=1, reward_delay=1, episodic_life=False, clip_rewards=False, max_episode_steps=4500, sticky_actions=False)[source]
Create test Atari environment.
- Parameters
env_id (str) – Environment name.
index_col_worker (int) – Index of the collection worker running this environment.
index_grad_worker (int) – Index of the gradient worker running the collection worker running this environment.
index_env (int) – Index of this environment within the vector of environments.
seed (int) – Environment random seed.
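frame_stack (int) – Number of most recent frames stacked to form each observation.
reward_delay (int) – Return the accumulated reward only every reward_delay steps, to simulate a sparse-reward environment.
episodic_life (bool) – Whether or not to simulate end of episode when losing a life.
clip_rewards (bool) – Whether or not to clip rewards between -1 and 1.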
max_episode_steps (int) – Maximum number of steps per episode.
sticky_actions (bool) – Randomly repeat last action with probability 0.25.
- Returns
env – Test environment.
- Return type
gym.Env
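The effect of reward_delay can be illustrated without pytorchrl. The following is a minimal sketch, assuming rewards are summed and released on every reward_delay-th step and reported as zero in between; the helper name delay_rewards is hypothetical and not part of the package, and the actual wrapper may treat the end of an episode differently.

```python
def delay_rewards(rewards, reward_delay=1):
    """Return the per-step rewards an agent would see if the accumulated
    reward were only released every `reward_delay` steps.

    Hypothetical helper for illustration; not the pytorchrl implementation.
    """
    delayed = []
    accumulated = 0.0
    for step, reward in enumerate(rewards, start=1):
        accumulated += reward
        if step % reward_delay == 0:
            # Release everything collected since the last release.
            delayed.append(accumulated)
            accumulated = 0.0
        else:
            # Hide the reward for now.
            delayed.append(0.0)
    return delayed


dense = [1.0, 0.0, 2.0, 1.0, 0.0, 3.0]
sparse = delay_rewards(dense, reward_delay=3)
print(sparse)
```

With reward_delay=1 the sequence is returned unchanged, which matches the default value in the factory signature.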
- pytorchrl.envs.atari.atari_env_factory.atari_train_env_factory(env_id, index_col_worker, index_grad_worker, index_env=0, seed=0, frame_stack=1, reward_delay=1, episodic_life=True, clip_rewards=False, max_episode_steps=4500, sticky_actions=False, embeddings_shape=(11, 8), embeddings_num_values=8, use_domain_knowledge=False, domain_knowledge_embedding='default', double_state=False)[source]
Create train Atari environment.
- Parameters
env_id (str) – Environment name.
index_col_worker (int) – Index of the collection worker running this environment.
index_grad_worker (int) – Index of the gradient worker running the collection worker running this environment.
index_env (int) – Index of this environment within the vector of environments.
seed (int) – Environment random seed.
frame_stack (int) – Number of most recent frames stacked to form each observation.
reward_delay (int) – Return the accumulated reward only every reward_delay steps, to simulate a sparse-reward environment.
episodic_life (bool) – Whether or not to simulate end of episode when losing a life.
clip_rewards (bool) – Whether or not to clip rewards between -1 and 1.
max_episode_steps (int) – Maximum number of steps per episode.
sticky_actions (bool) – Randomly repeat last action with probability 0.25.
embeddings_shape (tuple) – Shape of atari embeddings (if embedding wrappers are used).
embeddings_num_values (int) – Number of values for atari embeddings (if embedding wrappers are used).
use_domain_knowledge (bool) – Whether or not to create embeddings using domain knowledge.
domain_knowledge_embedding (str) – Type of domain knowledge embedding.
double_state (bool) – Whether or not to concatenate last 2 different embeddings.
- Returns
env – Train environment.
- Return type
gym.Env
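The frame_stack parameter describes observations built from the most recent frames. A minimal sketch of that idea, assuming the buffer is filled with copies of the first frame on reset; the class name FrameStacker is hypothetical, and strings stand in for image frames.

```python
from collections import deque


class FrameStacker:
    """Keeps the last `frame_stack` frames (illustrative, not the pytorchrl API)."""

    def __init__(self, frame_stack=1):
        self.frames = deque(maxlen=frame_stack)

    def reset(self, first_frame):
        # Fill the whole buffer with the first frame so the stacked
        # observation has full depth from the very first step.
        for _ in range(self.frames.maxlen):
            self.frames.append(first_frame)
        return list(self.frames)

    def step(self, new_frame):
        # Appending to a full deque silently drops the oldest frame.
        self.frames.append(new_frame)
        return list(self.frames)


stacker = FrameStacker(frame_stack=3)
obs0 = stacker.reset("f0")
obs1 = stacker.step("f1")
obs2 = stacker.step("f2")
obs3 = stacker.step("f3")
print(obs0, obs3)
```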
pytorchrl.envs.atari.utils module
pytorchrl.envs.atari.wrappers module
Wrappers from https://github.com/openai/baselines/blob/master/baselines/common/atari_wrappers.py
- class pytorchrl.envs.atari.wrappers.ClipRewardEnv(env)[source]
Bases:
gym.core.Wrapper
- reset(**kwargs)[source]
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns
the initial observation.
- Return type
observation (object)
- step(action)[source]
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
tuple
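ClipRewardEnv carries no description of its own here. In the OpenAI baselines file linked above, the wrapper of the same name bins each reward to -1, 0 or +1 by its sign; whether this package does the same, or clips to the interval [-1, 1] as the clip_rewards parameter text suggests, is not stated. The sketch below shows both variants as plain functions. The names sign_clip and interval_clip are hypothetical and not part of pytorchrl.

```python
def sign_clip(reward):
    """Bin a reward to -1, 0 or +1 according to its sign."""
    return (reward > 0) - (reward < 0)


def interval_clip(reward, low=-1.0, high=1.0):
    """Clamp a reward to the closed interval [low, high]."""
    return max(low, min(high, reward))


for r in (250.0, 0.0, -0.5):
    print(r, sign_clip(r), interval_clip(r))
```

The two variants differ only for rewards strictly between -1 and 1: sign binning maps -0.5 to -1, while interval clipping leaves it unchanged.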
- class pytorchrl.envs.atari.wrappers.EpisodicLifeEnv(env)[source]
Bases:
gym.core.Wrapper
- reset(**kwargs)[source]
Reset only when lives are exhausted. This way all states are still reachable even though lives are episodic, and the learner need not know about any of this behind-the-scenes.
- step(action)[source]
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
tuple
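The reset behaviour described for EpisodicLifeEnv can be sketched with a toy environment and no gym dependency. This is a minimal sketch, assuming the underlying environment exposes a lives counter and that action 0 is a no-op; ToyLivesEnv and EpisodicLifeSketch are hypothetical names, not the package's classes.

```python
class ToyLivesEnv:
    """Stand-in game: one life is lost every `steps_per_life` steps."""

    def __init__(self, lives=3, steps_per_life=2):
        self.start_lives = lives
        self.steps_per_life = steps_per_life
        self.reset()

    def reset(self):
        self.lives = self.start_lives
        self.t = 0
        return self.t  # observation is simply the step counter

    def step(self, action):
        self.t += 1
        if self.t % self.steps_per_life == 0:
            self.lives -= 1
        done = self.lives == 0  # the real game over
        return self.t, 0.0, done, {}


class EpisodicLifeSketch:
    """Reports done on every lost life, but only truly resets on game over."""

    def __init__(self, env):
        self.env = env
        self.lives = 0
        self.was_real_done = True

    def reset(self):
        if self.was_real_done:
            obs = self.env.reset()
        else:
            # Continue the same game with a no-op instead of restarting it.
            obs, _, _, _ = self.env.step(0)
        self.lives = self.env.lives
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.was_real_done = done
        if 0 < self.env.lives < self.lives:
            done = True  # a life was lost: end the episode for the learner only
        self.lives = self.env.lives
        return obs, reward, done, info


env = EpisodicLifeSketch(ToyLivesEnv(lives=2, steps_per_life=2))
env.reset()
env.step(0)
print(env.step(0)[2], env.was_real_done)  # learner sees done, game continues
```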
- class pytorchrl.envs.atari.wrappers.FireResetEnv(env)[source]
Bases:
gym.core.Wrapper
- reset(**kwargs)[source]
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns
the initial observation.
- Return type
observation (object)
- step(ac)[source]
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
ac (object) – an action provided by the agent
- Returns
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
tuple
- class pytorchrl.envs.atari.wrappers.MaxAndSkipEnv(env, skip=4)[source]
Bases:
gym.core.Wrapper
- reset(**kwargs)[source]
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns
the initial observation.
- Return type
observation (object)
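MaxAndSkipEnv has no description here. In the OpenAI baselines file linked above, the wrapper of the same name repeats each action for skip frames, sums the rewards, and returns the pixel-wise maximum of the last two frames. The following is a minimal sketch of that idea, assuming this package behaves similarly; max_and_skip and step_fn are hypothetical names, and short lists of integers stand in for image frames.

```python
def max_and_skip(step_fn, action, skip=4):
    """Repeat `action` for up to `skip` frames.

    `step_fn(action)` must return (frame, reward, done).
    Returns (max over the last two frames, summed reward, done).
    """
    total_reward = 0.0
    last_two = []
    done = False
    for _ in range(skip):
        frame, reward, done = step_fn(action)
        last_two = (last_two + [frame])[-2:]  # keep only the two newest frames
        total_reward += reward
        if done:
            break
    # Element-wise maximum across the retained frames.
    max_frame = [max(pixels) for pixels in zip(*last_two)]
    return max_frame, total_reward, done


frames = iter([
    ([0, 0, 9], 1.0, False),
    ([1, 0, 0], 0.0, False),
    ([0, 5, 0], 2.0, False),
    ([3, 0, 1], 0.5, False),
])
obs, reward, done = max_and_skip(lambda action: next(frames), action=0, skip=4)
print(obs, reward, done)
```

Taking the maximum over two consecutive frames is a common way to counter sprite flicker in Atari games, where some objects are drawn only on alternate frames.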
- class pytorchrl.envs.atari.wrappers.MontezumaEmbeddingsEnv(env, embeddings_shape=(11, 8), embeddings_num_values=8, use_domain_knowledge=False, domain_knowledge_embedding='default', double_state=False)[source]
Bases:
gym.core.Wrapper
- reset()[source]
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns
the initial observation.
- Return type
observation (object)
- step(action)[source]
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
tuple
- class pytorchrl.envs.atari.wrappers.MontezumaVisitedRoomEnv(env, room_address)[source]
Bases:
gym.core.Wrapper
- reset()[source]
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns
the initial observation.
- Return type
observation (object)
- step(action)[source]
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
tuple
- class pytorchrl.envs.atari.wrappers.NoopResetEnv(env, noop_max=30)[source]
Bases:
gym.core.Wrapper
- step(ac)[source]
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
ac (object) – an action provided by the agent
- Returns
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
tuple
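NoopResetEnv is listed without a description. In the OpenAI baselines file linked above, the wrapper of the same name samples initial states by taking a random number of no-op steps (at most noop_max) on reset, with action 0 assumed to be the no-op. The following is a minimal sketch of that behaviour without gym; CountingEnv and noop_reset are hypothetical names, and it is an assumption that this package follows the same scheme.

```python
import random


class CountingEnv:
    """Stand-in environment whose observation is the number of steps since reset."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 0.0, False, {}


def noop_reset(env, noop_max=30, noop_action=0, rng=random):
    """Reset `env`, then advance it by a random number of no-op steps."""
    obs = env.reset()
    noops = rng.randint(1, noop_max)  # inclusive on both ends
    for _ in range(noops):
        obs, _, done, _ = env.step(noop_action)
        if done:
            obs = env.reset()
    return obs


rng = random.Random(0)
starts = [noop_reset(CountingEnv(), noop_max=30, rng=rng) for _ in range(100)]
print(min(starts), max(starts))
```

Because the toy observation equals the number of steps taken, the varied values in starts show that episodes begin from different states.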
- class pytorchrl.envs.atari.wrappers.PitfallEmbeddingsEnv(env, embeddings_shape=(11, 8), embeddings_num_values=8, use_domain_knowledge=False, double_state=False)[source]
Bases:
gym.core.Wrapper
- reset()[source]
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns
the initial observation.
- Return type
observation (object)
- step(action)[source]
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
tuple
- class pytorchrl.envs.atari.wrappers.ScaleRewardEnv(env, scaling=0.001)[source]
Bases:
gym.core.Wrapper
- reset(**kwargs)[source]
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns
the initial observation.
- Return type
observation (object)
- step(action)[source]
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
tuple
- class pytorchrl.envs.atari.wrappers.ScaledFloatFrame(env)[source]
Bases:
gym.core.ObservationWrapper
- class pytorchrl.envs.atari.wrappers.StickyActionEnv(env, p=0.25)[source]
Bases:
gym.core.Wrapper
- reset()[source]
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns
the initial observation.
- Return type
observation (object)
- step(action)[source]
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
tuple
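The factory documentation describes sticky actions as randomly repeating the last action with probability 0.25, which matches the default p of StickyActionEnv. The following is a minimal sketch of that rule in isolation; StickyActionSketch and its filter method are hypothetical names, and treating action 0 as the initial "last action" is an assumption.

```python
import random


class StickyActionSketch:
    """With probability p, replaces the requested action by the previous one."""

    def __init__(self, p=0.25, seed=None):
        self.p = p
        self.rng = random.Random(seed)
        self.last_action = 0

    def reset(self):
        self.last_action = 0  # assumption: action 0 is the no-op

    def filter(self, action):
        """Return the action that would actually be executed."""
        if self.rng.random() < self.p:
            action = self.last_action  # ignore the request, repeat the old action
        self.last_action = action
        return action


sticky = StickyActionSketch(p=0.25, seed=0)
sticky.reset()
requested = list(range(1, 10001))  # every request differs from the previous action
executed = [sticky.filter(a) for a in requested]
repeat_fraction = sum(e != r for e, r in zip(executed, requested)) / len(requested)
print(round(repeat_fraction, 2))
```

Over many steps the fraction of overridden requests approaches p, which is what makes the environment stochastic from the agent's point of view.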
- class pytorchrl.envs.atari.wrappers.TimeLimit(env, max_episode_steps=None)[source]
Bases:
gym.core.Wrapper
- reset(**kwargs)[source]
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns
the initial observation.
- Return type
observation (object)
- step(ac)[source]
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
ac (object) – an action provided by the agent
- Returns
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
tuple
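TimeLimit takes a max_episode_steps argument, which the factories document as the maximum number of steps per episode. The following is a minimal sketch of such a limit without gym, assuming the wrapper counts steps since the last reset and forces done once the limit is reached; EndlessEnv, TimeLimitSketch and the info key used below are illustrative, not taken from the package.

```python
class EndlessEnv:
    """Stand-in environment that never terminates on its own."""

    def reset(self):
        return 0

    def step(self, action):
        return 0, 1.0, False, {}


class TimeLimitSketch:
    """Forces done=True after `max_episode_steps` steps."""

    def __init__(self, env, max_episode_steps=None):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed_steps = 0

    def reset(self):
        self.elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed_steps += 1
        limit = self.max_episode_steps
        if limit is not None and self.elapsed_steps >= limit:
            done = True
            info["TimeLimit.truncated"] = True  # illustrative key name
        return obs, reward, done, info


env = TimeLimitSketch(EndlessEnv(), max_episode_steps=5)
env.reset()
steps, done = 0, False
while not done:
    _, _, done, info = env.step(0)
    steps += 1
print(steps, info)
```

Flagging the truncation in info lets a learner distinguish an episode cut short by the limit from one that ended inside the game.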
- class pytorchrl.envs.atari.wrappers.WarpFrame(env, width=84, height=84, grayscale=True, dict_space_key=None)[source]
Bases:
gym.core.ObservationWrapper