pytorchrl.envs.atari package

Submodules

pytorchrl.envs.atari.atari_env_factory module

pytorchrl.envs.atari.atari_env_factory.atari_test_env_factory(env_id, index_col_worker, index_grad_worker, index_env=0, seed=0, frame_stack=1, reward_delay=1, episodic_life=False, clip_rewards=False, max_episode_steps=4500, sticky_actions=False)

Create test Atari environment.

Parameters
  • env_id (str) – Environment name.

  • index_col_worker (int) – Index of the collection worker running this environment.

  • index_grad_worker (int) – Index of the gradient worker to which the collection worker running this environment belongs.

  • index_env (int) – Index of this environment within the vector of environments.

  • seed (int) – Environment random seed.

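  • frame_stack (int) – Number of most recent frames stacked to form each observation.

  • reward_delay (int) – Only return the accumulated reward every reward_delay steps, to simulate a sparse-reward environment.

  • episodic_life (bool) – Whether or not to simulate the end of an episode when a life is lost.

  • clip_rewards (bool) – Whether or not to clip rewards between -1 and 1.

  • max_episode_steps (int) – Maximum number of steps per episode.

  • sticky_actions (bool) – Whether or not to randomly repeat the previous action with probability 0.25.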

Returns

env – Test environment.

Return type

gym.Env
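The effect of reward_delay can be pictured with a minimal sketch. It assumes the (observation, reward, done, info) step API used by the wrappers in this package, and additionally assumes that any pending reward is released when the episode ends; CountingEnv and DelayedRewardSketch are hypothetical names, not part of pytorchrl.

```python
class CountingEnv:
    """Hypothetical toy env: reward 1.0 on every step, episode ends after `length` steps."""

    def __init__(self, length=5):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.length, {}


class DelayedRewardSketch:
    """Hypothetical wrapper: pay out the accumulated reward only every `delay` steps."""

    def __init__(self, env, delay=1):
        self.env = env
        self.delay = delay
        self.steps = 0
        self.accumulated = 0.0

    def reset(self):
        self.steps = 0
        self.accumulated = 0.0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.steps += 1
        self.accumulated += reward
        # Release the pending reward every `delay` steps (and, by assumption, at episode end)
        if self.steps % self.delay == 0 or done:
            reward, self.accumulated = self.accumulated, 0.0
        else:
            reward = 0.0
        return obs, reward, done, info


env = DelayedRewardSketch(CountingEnv(length=5), delay=2)
env.reset()
rewards = [env.step(0)[1] for _ in range(5)]
print(rewards)  # [0.0, 2.0, 0.0, 2.0, 1.0]
```

With a delay of 1 every reward is returned immediately, so the sketch leaves rewards unchanged at the factory default.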

pytorchrl.envs.atari.atari_env_factory.atari_train_env_factory(env_id, index_col_worker, index_grad_worker, index_env=0, seed=0, frame_stack=1, reward_delay=1, episodic_life=True, clip_rewards=False, max_episode_steps=4500, sticky_actions=False, embeddings_shape=(11, 8), embeddings_num_values=8, use_domain_knowledge=False, domain_knowledge_embedding='default', double_state=False)

Create train Atari environment.

Parameters
  • env_id (str) – Environment name.

  • index_col_worker (int) – Index of the collection worker running this environment.

  • index_grad_worker (int) – Index of the gradient worker to which the collection worker running this environment belongs.

  • index_env (int) – Index of this environment within the vector of environments.

  • seed (int) – Environment random seed.

  • frame_stack (int) – Number of most recent frames stacked to form each observation.

  • reward_delay (int) – Only return the accumulated reward every reward_delay steps, to simulate a sparse-reward environment.

  • episodic_life (bool) – Whether or not to simulate the end of an episode when a life is lost.

  • clip_rewards (bool) – Whether or not to clip rewards between -1 and 1.

  • max_episode_steps (int) – Maximum number of steps per episode.

  • sticky_actions (bool) – Whether or not to randomly repeat the previous action with probability 0.25.

  • embeddings_shape (tuple) – Shape of the Atari embeddings (if embedding wrappers are used).

  • embeddings_num_values (int) – Number of values for the Atari embeddings (if embedding wrappers are used).

  • use_domain_knowledge (bool) – Whether or not to create embeddings using domain knowledge.

  • domain_knowledge_embedding (str) – Type of domain knowledge embedding.

  • double_state (bool) – Whether or not to concatenate the last two distinct embeddings.

Returns

env – Train environment.

Return type

gym.Env

pytorchrl.envs.atari.utils module

pytorchrl.envs.atari.utils.imdownscale(state, target_shape=(11, 8), max_pix_value=8)
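imdownscale has no docstring. Judging only from its signature, it appears to reduce a frame to a coarse target_shape grid with max_pix_value intensity levels, similar to the cell representation used in Go-Explore. The sketch below is a pure-Python illustration of that idea under those assumptions, treating target_shape as (width, height) as in OpenCV's dsize convention; imdownscale_sketch is a hypothetical name and the real implementation may differ, for example in its resizing method.

```python
def imdownscale_sketch(state, target_shape=(11, 8), max_pix_value=8):
    """Hypothetical sketch of a cell-style downscaling.

    state: grayscale frame as a list of rows with pixel values in [0, 255].
    target_shape: (width, height) of the output (an assumption). The frame
    must have at least `height` rows and `width` columns.
    Returns `height` rows of `width` integers in [0, max_pix_value - 1].
    """
    width, height = target_shape
    rows, cols = len(state), len(state[0])
    out = []
    for i in range(height):
        # Input rows covered by output row i
        r0, r1 = i * rows // height, (i + 1) * rows // height
        out_row = []
        for j in range(width):
            # Input columns covered by output column j
            c0, c1 = j * cols // width, (j + 1) * cols // width
            block = [state[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            mean = sum(block) / len(block)
            # Quantize the block average into max_pix_value intensity levels
            out_row.append(min(int(mean * max_pix_value / 256), max_pix_value - 1))
        out.append(out_row)
    return out


# A dark 210x160 frame (native Atari resolution) with a bright band in the top 30 rows
frame = [[255 if r < 30 else 0 for _ in range(160)] for r in range(210)]
cells = imdownscale_sketch(frame)
print(len(cells), len(cells[0]))  # 8 11
```

Coarse, quantized frames like these map many visually similar states to the same small grid, which makes them usable as keys for counting or embedding states.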

pytorchrl.envs.atari.wrappers module

Wrappers adapted from https://github.com/openai/baselines/blob/master/baselines/common/atari_wrappers.py

class pytorchrl.envs.atari.wrappers.ClipRewardEnv(env)

Bases: gym.core.Wrapper

reset(**kwargs)

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns

observation – the initial observation.

Return type

object

step(action)

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

action (object) – an action provided by the agent

Returns
  • observation (object) – agent’s observation of the current environment.

  • reward (float) – amount of reward returned after the previous action.

  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results.

  • info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning).

Return type

tuple
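ClipRewardEnv has no class docstring. Assuming it follows the OpenAI Baselines wrapper these wrappers are based on, which bins each reward to {-1, 0, +1} by its sign, its step can be sketched as follows; ToyRewardEnv and SignClipRewardSketch are hypothetical names.

```python
class ToyRewardEnv:
    """Hypothetical env that emits a fixed sequence of raw game rewards."""

    def __init__(self, rewards):
        self.rewards = list(rewards)
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        return self.t

    def step(self, action):
        reward = self.rewards[self.t]
        self.t += 1
        return self.t, reward, self.t >= len(self.rewards), {}


class SignClipRewardSketch:
    """Hypothetical stand-in for ClipRewardEnv: bins every reward to -1, 0 or +1."""

    def __init__(self, env):
        self.env = env

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Keep only the sign of the reward, so all games share one reward scale
        clipped = float((reward > 0) - (reward < 0))
        return obs, clipped, done, info


env = SignClipRewardSketch(ToyRewardEnv([100, 0, -25, 0.5]))
env.reset()
clipped_rewards = [env.step(0)[1] for _ in range(4)]
print(clipped_rewards)  # [1.0, 0.0, -1.0, 1.0]
```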

class pytorchrl.envs.atari.wrappers.EpisodicLifeEnv(env)

Bases: gym.core.Wrapper

reset(**kwargs)

Reset only when lives are exhausted. This way all states are still reachable even though lives are episodic, and the learner need not know about any of this behind-the-scenes.

step(action)

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

action (object) – an action provided by the agent

Returns
  • observation (object) – agent’s observation of the current environment.

  • reward (float) – amount of reward returned after the previous action.

  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results.

  • info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning).

Return type

tuple
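The life-based episode boundary described for reset() can be sketched as below. The real wrapper presumably reads the lives counter from the Atari emulator; here a hypothetical ToyLivesEnv exposes a plain lives attribute, action 0 is assumed to be a no-op, and EpisodicLifeSketch is a hypothetical name modelled on the OpenAI Baselines wrapper.

```python
class ToyLivesEnv:
    """Hypothetical game with 2 lives; one life is lost every 3 steps."""

    def __init__(self):
        self.lives = 2
        self.t = 0

    def reset(self, **kwargs):
        self.lives, self.t = 2, 0
        return self.t

    def step(self, action):
        self.t += 1
        if self.t % 3 == 0:
            self.lives -= 1
        # The real game is over only when no lives are left
        return self.t, 0.0, self.lives == 0, {}


class EpisodicLifeSketch:
    """Hypothetical stand-in for EpisodicLifeEnv, modelled on OpenAI Baselines."""

    def __init__(self, env):
        self.env = env
        self.lives = 0
        self.was_real_done = True

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.was_real_done = done
        lives = self.env.lives
        if 0 < lives < self.lives:
            # A life was lost but the game goes on: end the learner's episode here
            done = True
        self.lives = lives
        return obs, reward, done, info

    def reset(self, **kwargs):
        if self.was_real_done:
            obs = self.env.reset(**kwargs)
        else:
            # Keep playing the same game: take one no-op step (assumed action 0)
            obs, _, _, _ = self.env.step(0)
        self.lives = self.env.lives
        return obs


env = EpisodicLifeSketch(ToyLivesEnv())
env.reset()
dones = [env.step(0)[2] for _ in range(3)]  # [False, False, True]: first life lost
obs_after_reset = env.reset()  # 4: the same game continued, no real reset
dones_2 = [env.step(0)[2] for _ in range(2)]  # [False, True]: real game over
```

Losing a life ends the learner's episode, but the following reset() continues the same game; only a real game over triggers a full reset of the underlying environment.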

class pytorchrl.envs.atari.wrappers.FireResetEnv(env)

Bases: gym.core.Wrapper

reset(**kwargs)

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns

observation – the initial observation.

Return type

object

step(ac)

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

ac (object) – an action provided by the agent

Returns
  • observation (object) – agent’s observation of the current environment.

  • reward (float) – amount of reward returned after the previous action.

  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results.

  • info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning).

Return type

tuple
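FireResetEnv has no class docstring. In OpenAI Baselines, the equivalent wrapper presses FIRE on reset for games that stay frozen until the player fires, then takes action 2. A sketch under the assumption that this wrapper does the same, with action 1 assumed to be FIRE; RecordingEnv and FireResetSketch are hypothetical names.

```python
class RecordingEnv:
    """Hypothetical env that records the actions it receives after each reset."""

    def __init__(self):
        self.actions = []

    def reset(self, **kwargs):
        self.actions = []
        return "start"

    def step(self, action):
        self.actions.append(action)
        return f"after-{action}", 0.0, False, {}


class FireResetSketch:
    """Hypothetical stand-in for FireResetEnv, following OpenAI Baselines."""

    FIRE = 1  # assumption: index of the FIRE action in the game's action set

    def __init__(self, env):
        self.env = env

    def reset(self, **kwargs):
        self.env.reset(**kwargs)
        # Press FIRE, then take action 2, mirroring Baselines; restart if either step ends the game
        for action in (self.FIRE, 2):
            obs, _, done, _ = self.env.step(action)
            if done:
                obs = self.env.reset(**kwargs)
        return obs

    def step(self, ac):
        return self.env.step(ac)


env = FireResetSketch(RecordingEnv())
obs = env.reset()
print(obs, env.env.actions)  # after-2 [1, 2]
```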

class pytorchrl.envs.atari.wrappers.LazyFrames(frames)

Bases: object

count()
frame(i)
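LazyFrames documents neither its purpose nor count() and frame(i). In OpenAI Baselines, the same class lets consecutive stacked observations share their common frames, which are stored once, and only concatenates them on demand; count() returns the number of stacked frames and frame(i) the i-th one. A pure-Python sketch under the assumption that this version behaves the same (LazyFramesSketch is a hypothetical name):

```python
from collections import deque


class LazyFramesSketch:
    """Hypothetical stand-in for LazyFrames: holds references to stacked frames
    and only builds the full observation when it is first requested."""

    def __init__(self, frames):
        # Copy the list of references, not the pixel data itself
        self._frames = list(frames)
        self._out = None

    def force(self):
        # Build the stacked observation once; flattening frame by frame stands
        # in for a NumPy concatenation along the channel axis
        if self._out is None:
            self._out = [pixel for frame in self._frames for pixel in frame]
        return self._out

    def count(self):
        # Number of stacked frames
        return len(self._frames)

    def frame(self, i):
        # The i-th stacked frame
        return self._frames[i]


# Stack the last 3 frames of a hypothetical 4-pixel game
stack = deque(maxlen=3)
observations = []
for t in range(4):
    stack.append([t] * 4)
    if len(stack) == 3:
        observations.append(LazyFramesSketch(stack))

first, second = observations
print(first.frame(1) is second.frame(0))  # True: the shared frame is stored once
print(second.count(), second.force())  # 3 [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
```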

class pytorchrl.envs.atari.wrappers.MaxAndSkipEnv(env, skip=4)

Bases: gym.core.Wrapper

reset(**kwargs)

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns

observation – the initial observation.

Return type

object

step(action)

Repeat action, sum reward, and max over last observations.
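The repeat, sum and max behaviour can be sketched as follows, assuming, as in OpenAI Baselines, that the max is taken pixel-wise over the last two raw frames and that the loop stops early if the episode ends. Frames are plain lists here, and FlickerEnv and MaxAndSkipSketch are hypothetical names.

```python
class FlickerEnv:
    """Hypothetical 2-pixel game whose two sprites are drawn on alternate frames."""

    def __init__(self):
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        return [0, 0]

    def step(self, action):
        self.t += 1
        frame = [9, 0] if self.t % 2 else [0, 5]
        return frame, 1.0, False, {}


class MaxAndSkipSketch:
    """Hypothetical stand-in for MaxAndSkipEnv, modelled on OpenAI Baselines."""

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        total_reward, last_two = 0.0, []
        for _ in range(self.skip):
            # Repeat the same action and accumulate its reward
            obs, reward, done, info = self.env.step(action)
            last_two = (last_two + [obs])[-2:]
            total_reward += reward
            if done:
                break
        # Pixel-wise max over the last two frames, so flickering sprites stay visible
        max_frame = [max(pixels) for pixels in zip(*last_two)]
        return max_frame, total_reward, done, info


env = MaxAndSkipSketch(FlickerEnv(), skip=4)
env.reset()
print(env.step(0)[:2])  # ([9, 5], 4.0)
```

Without the max, an agent acting every fourth frame would only ever see one of the two alternating sprites.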

class pytorchrl.envs.atari.wrappers.MontezumaEmbeddingsEnv(env, embeddings_shape=(11, 8), embeddings_num_values=8, use_domain_knowledge=False, domain_knowledge_embedding='default', double_state=False)

Bases: gym.core.Wrapper

reset()

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns

observation – the initial observation.

Return type

object

step(action)

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

action (object) – an action provided by the agent

Returns
  • observation (object) – agent’s observation of the current environment.

  • reward (float) – amount of reward returned after the previous action.

  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results.

  • info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning).

Return type

tuple

class pytorchrl.envs.atari.wrappers.MontezumaVisitedRoomEnv(env, room_address)

Bases: gym.core.Wrapper

reset()

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns

observation – the initial observation.

Return type

object

step(action)

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

action (object) – an action provided by the agent

Returns
  • observation (object) – agent’s observation of the current environment.

  • reward (float) – amount of reward returned after the previous action.

  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results.

  • info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning).

Return type

tuple

class pytorchrl.envs.atari.wrappers.NoopResetEnv(env, noop_max=30)

Bases: gym.core.Wrapper

reset(**kwargs)

Take the no-op action for a random number of steps in [1, noop_max].

step(ac)

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

ac (object) – an action provided by the agent

Returns
  • observation (object) – agent’s observation of the current environment.

  • reward (float) – amount of reward returned after the previous action.

  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results.

  • info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning).

Return type

tuple
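A sketch of the no-op reset, assuming the no-op is action 0 and that the environment is reset again if an episode ends during the no-ops (both as in OpenAI Baselines); StepCounterEnv and NoopResetSketch are hypothetical names.

```python
import random


class StepCounterEnv:
    """Hypothetical env whose observation is the number of steps since reset."""

    def __init__(self):
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        return self.t

    def step(self, ac):
        self.t += 1
        return self.t, 0.0, False, {}


class NoopResetSketch:
    """Hypothetical stand-in for NoopResetEnv."""

    NOOP = 0  # assumption: action 0 is the no-op, as in ALE action sets

    def __init__(self, env, noop_max=30, rng=None):
        self.env = env
        self.noop_max = noop_max
        self.rng = rng or random.Random()

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        # Sample the number of no-ops uniformly from [1, noop_max] (both inclusive)
        for _ in range(self.rng.randint(1, self.noop_max)):
            obs, _, done, _ = self.env.step(self.NOOP)
            if done:
                obs = self.env.reset(**kwargs)
        return obs

    def step(self, ac):
        return self.env.step(ac)


env = NoopResetSketch(StepCounterEnv(), noop_max=30, rng=random.Random(0))
starts = [env.reset() for _ in range(1000)]
print(min(starts) >= 1, max(starts) <= 30)  # True True
```

Randomising the number of initial no-ops varies the start state, so a deterministic emulator does not begin every episode in exactly the same state.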

class pytorchrl.envs.atari.wrappers.PitfallEmbeddingsEnv(env, embeddings_shape=(11, 8), embeddings_num_values=8, use_domain_knowledge=False, double_state=False)

Bases: gym.core.Wrapper

reset()

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns

observation – the initial observation.

Return type

object

step(action)

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

action (object) – an action provided by the agent

Returns
  • observation (object) – agent’s observation of the current environment.

  • reward (float) – amount of reward returned after the previous action.

  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results.

  • info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning).

Return type

tuple

class pytorchrl.envs.atari.wrappers.ScaleRewardEnv(env, scaling=0.001)

Bases: gym.core.Wrapper

reset(**kwargs)

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns

observation – the initial observation.

Return type

object

step(action)

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

action (object) – an action provided by the agent

Returns
  • observation (object) – agent’s observation of the current environment.

  • reward (float) – amount of reward returned after the previous action.

  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results.

  • info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning).

Return type

tuple
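ScaleRewardEnv has no docstring; assuming scaling is a plain multiplicative factor applied to each reward, it can be sketched as below. ConstantRewardEnv and ScaleRewardSketch are hypothetical names.

```python
import math


class ConstantRewardEnv:
    """Hypothetical env that always returns the same raw reward."""

    def __init__(self, reward):
        self.reward = reward

    def reset(self, **kwargs):
        return 0

    def step(self, action):
        return 0, self.reward, False, {}


class ScaleRewardSketch:
    """Hypothetical stand-in for ScaleRewardEnv: multiplies every reward by `scaling`."""

    def __init__(self, env, scaling=0.001):
        self.env = env
        self.scaling = scaling

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Shrink the raw game reward by a constant factor
        return obs, reward * self.scaling, done, info


env = ScaleRewardSketch(ConstantRewardEnv(2500))
env.reset()
scaled = env.step(0)[1]
print(math.isclose(scaled, 2.5))  # True
```

Unlike sign clipping, scaling preserves the relative size of rewards.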

class pytorchrl.envs.atari.wrappers.ScaledFloatFrame(env)

Bases: gym.core.ObservationWrapper

observation(observation)
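ScaledFloatFrame.observation is undocumented. Assuming it mirrors OpenAI Baselines, which divides the 8-bit pixel values by 255, it maps each observation into [0, 1]. A pure-Python sketch on nested lists (scaled_float_frame is a hypothetical name):

```python
def scaled_float_frame(observation):
    """Hypothetical sketch of ScaledFloatFrame.observation: map 8-bit pixel
    values in [0, 255] to floats in [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in observation]


print(scaled_float_frame([[0, 51, 255]]))  # [[0.0, 0.2, 1.0]]
```

Note that 32-bit float frames take four times the memory of 8-bit frames.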

class pytorchrl.envs.atari.wrappers.StickyActionEnv(env, p=0.25)

Bases: gym.core.Wrapper

reset()

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns

observation – the initial observation.

Return type

object

step(action)

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

action (object) – an action provided by the agent

Returns
  • observation (object) – agent’s observation of the current environment.

  • reward (float) – amount of reward returned after the previous action.

  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results.

  • info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning).

Return type

tuple
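The sticky-action mechanism can be sketched as below; the default p=0.25 matches the probability given for sticky_actions in the factory functions. Using action 0 as the previous action right after reset is an assumption, and ActionLogEnv, StickyActionSketch and run are hypothetical names.

```python
import random


class ActionLogEnv:
    """Hypothetical env that logs the actions actually executed."""

    def __init__(self):
        self.executed = []

    def reset(self):
        self.executed = []
        return 0

    def step(self, action):
        self.executed.append(action)
        return len(self.executed), 0.0, False, {}


class StickyActionSketch:
    """Hypothetical stand-in for StickyActionEnv."""

    def __init__(self, env, p=0.25, rng=None):
        self.env = env
        self.p = p
        self.rng = rng or random.Random()
        self.last_action = 0

    def reset(self):
        self.last_action = 0  # assumption: the "previous" action after reset is 0
        return self.env.reset()

    def step(self, action):
        # With probability p, ignore the agent and repeat the previous action
        if self.rng.random() < self.p:
            action = self.last_action
        self.last_action = action
        return self.env.step(action)


def run(p, actions=(3, 1, 2), seed=0):
    """Return the actions the wrapped env actually executed."""
    env = StickyActionSketch(ActionLogEnv(), p=p, rng=random.Random(seed))
    env.reset()
    for a in actions:
        env.step(a)
    return env.env.executed


print(run(0.0))  # [3, 1, 2]
print(run(1.0))  # [0, 0, 0]
```

Sticky actions inject randomness into otherwise deterministic Atari games, so an agent cannot simply memorise a fixed action sequence.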

class pytorchrl.envs.atari.wrappers.TimeLimit(env, max_episode_steps=None)

Bases: gym.core.Wrapper

reset(**kwargs)

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns

observation – the initial observation.

Return type

object

step(ac)

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

ac (object) – an action provided by the agent

Returns
  • observation (object) – agent’s observation of the current environment.

  • reward (float) – amount of reward returned after the previous action.

  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results.

  • info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning).

Return type

tuple
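A sketch of the step limit, assuming the behaviour of the OpenAI Baselines TimeLimit wrapper, which marks truncated episodes with info['TimeLimit.truncated'] = True. Treating max_episode_steps=None as "no limit" is an additional assumption, and EndlessEnv and TimeLimitSketch are hypothetical names.

```python
class EndlessEnv:
    """Hypothetical env that never ends an episode on its own."""

    def reset(self, **kwargs):
        return 0

    def step(self, ac):
        return 0, 1.0, False, {}


class TimeLimitSketch:
    """Hypothetical stand-in for TimeLimit, modelled on OpenAI Baselines."""

    def __init__(self, env, max_episode_steps=None):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed_steps = 0

    def reset(self, **kwargs):
        self.elapsed_steps = 0
        return self.env.reset(**kwargs)

    def step(self, ac):
        obs, reward, done, info = self.env.step(ac)
        self.elapsed_steps += 1
        limit = self.max_episode_steps
        if limit is not None and self.elapsed_steps >= limit:
            # Force the end of the episode and flag it as a truncation
            done = True
            info["TimeLimit.truncated"] = True
        return obs, reward, done, info


env = TimeLimitSketch(EndlessEnv(), max_episode_steps=3)
env.reset()
results = [env.step(0) for _ in range(3)]
print([r[2] for r in results], results[-1][3])
# [False, False, True] {'TimeLimit.truncated': True}
```

Flagging the truncation lets a learner distinguish a time-out from a true terminal state, for example to keep bootstrapping from the last observation.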

class pytorchrl.envs.atari.wrappers.WarpFrame(env, width=84, height=84, grayscale=True, dict_space_key=None)

Bases: gym.core.ObservationWrapper

observation(obs)

pytorchrl.envs.atari.wrappers.make_atari(env_id, max_episode_steps=None, sticky_actions=False)

pytorchrl.envs.atari.wrappers.wrap_deepmind(env, episode_life=True, clip_rewards=True, frame_stack=1, scale=False)

Configure the environment for DeepMind-style Atari.

Module contents