pytorchrl.envs.obstacle_tower package

Submodules

pytorchrl.envs.obstacle_tower.obstacle_tower_env_factory module

pytorchrl.envs.obstacle_tower.utils module

pytorchrl.envs.obstacle_tower.utils.box_is_placed(state)

Returns True if, in the given state, the box is placed on the platform. This may be useful for reward scaling.

pytorchrl.envs.obstacle_tower.utils.box_location(state)

Returns a tuple: a boolean indicating whether an unplaced box is visible, and the center of that box.

pytorchrl.envs.obstacle_tower.utils.place_location(state)

Returns a tuple: a boolean indicating whether the placing platform is visible, and the center of that platform. Lighting issues may affect the result.

pytorchrl.envs.obstacle_tower.wrappers module

class pytorchrl.envs.obstacle_tower.wrappers.BasicObstacleEnv(env, min_floor, max_floor, seed_list=[])

Bases: gym.core.Wrapper

reset(**kwargs)

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns

observation (object) – the initial observation.

step(action)

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

action (object) – an action provided by the agent

Returns

observation (object) – the agent’s observation of the current environment

reward (float) – amount of reward returned after the previous action

done (bool) – whether the episode has ended, in which case further step() calls will return undefined results

info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)

Return type

tuple (observation, reward, done, info)
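The reset()/step() contract described above can be illustrated with a minimal rollout loop. This is only a sketch: StubEnv and run_episode are hypothetical names that stand in for a wrapped Obstacle Tower environment, so that the example runs without gym or the game binary.

```python
class StubEnv:
    """Hypothetical stand-in that follows the (observation, reward, done, info) contract."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self._t = 0

    def reset(self, **kwargs):
        # Start a new episode and return the initial observation.
        self._t = 0
        return 0

    def step(self, action):
        # Advance one timestep; the episode ends after episode_length steps.
        self._t += 1
        done = self._t >= self.episode_length
        reward = 1.0 if action == 1 else 0.0
        return self._t, reward, done, {"t": self._t}


def run_episode(env, policy):
    """Roll out one episode. The caller, not the environment, calls reset()."""
    obs = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total_reward += reward
        steps += 1
    return total_reward, steps


env = StubEnv(episode_length=5)
total, steps = run_episode(env, policy=lambda obs: 1)
print(total, steps)
```

Once done is True, the loop stops calling step() and the next call to run_episode starts with a fresh reset(), which matches the warning that further step() calls return undefined results.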

class pytorchrl.envs.obstacle_tower.wrappers.BasicObstacleEnvTest(env, min_floor, max_floor, seed_list=[1001, 1002, 1003, 1004, 1005])

Bases: gym.core.Wrapper

reset(**kwargs)

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns

observation (object) – the initial observation.

step(action)

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

action (object) – an action provided by the agent

Returns

observation (object) – the agent’s observation of the current environment

reward (float) – amount of reward returned after the previous action

done (bool) – whether the episode has ended, in which case further step() calls will return undefined results

info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)

Return type

tuple (observation, reward, done, info)
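This page does not describe how seed_list is used. One plausible reading, shown here purely as an assumption, is that a test wrapper selects a seed from a fixed list on each reset() so that evaluation episodes are reproducible. SeededStubEnv and SeedCyclingSketch are hypothetical names and are not part of pytorchrl.

```python
import itertools


class SeededStubEnv:
    """Hypothetical environment that remembers the last seed it was given."""

    def __init__(self):
        self.current_seed = None

    def seed(self, seed):
        self.current_seed = seed

    def reset(self, **kwargs):
        # The seed stands in for the level that would be generated from it.
        return self.current_seed


class SeedCyclingSketch:
    """Assumed behaviour: cycle through a fixed seed list, one seed per reset()."""

    def __init__(self, env, seed_list=(1001, 1002, 1003, 1004, 1005)):
        # A tuple default avoids sharing one mutable list between instances.
        self.env = env
        self._seeds = itertools.cycle(seed_list)

    def reset(self, **kwargs):
        self.env.seed(next(self._seeds))
        return self.env.reset(**kwargs)


wrapped = SeedCyclingSketch(SeededStubEnv())
seen = [wrapped.reset() for _ in range(6)]
print(seen)
```

After the fifth reset the cycle wraps around, so the sixth episode reuses the first seed.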

class pytorchrl.envs.obstacle_tower.wrappers.ReducedActionEnv(env, num_actions=8)

Bases: gym.core.Wrapper
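ReducedActionEnv has no description on this page. The general idea of an action-reduction wrapper can be sketched as a lookup table from a small discrete index to a full action vector. Everything below is an assumption made for illustration: the table contents, the four-component vector layout, and the names EchoEnv and ReducedActionSketch are hypothetical and may not match the actual wrapper.

```python
# Hypothetical table: reduced action index -> full action vector.
# The four-component layout is an assumption, not taken from the source.
REDUCED_ACTIONS = [
    (0, 0, 0, 0),  # no-op
    (1, 0, 0, 0),  # first movement option
    (2, 0, 0, 0),  # second movement option
    (0, 1, 0, 0),  # first camera option
    (0, 2, 0, 0),  # second camera option
    (1, 0, 1, 0),  # movement combined with jump
    (0, 0, 0, 1),  # first sideways option
    (0, 0, 0, 2),  # second sideways option
]


class EchoEnv:
    """Hypothetical environment that returns the received action as its observation."""

    def step(self, action):
        return action, 0.0, False, {}


class ReducedActionSketch:
    """Translate a small discrete action index into a full action vector."""

    def __init__(self, env, num_actions=8):
        if not 1 <= num_actions <= len(REDUCED_ACTIONS):
            raise ValueError("num_actions must be between 1 and %d" % len(REDUCED_ACTIONS))
        self.env = env
        self.actions = REDUCED_ACTIONS[:num_actions]

    def step(self, action):
        # Look up the full action and forward it to the wrapped environment.
        return self.env.step(self.actions[action])


wrapped = ReducedActionSketch(EchoEnv(), num_actions=8)
obs, reward, done, info = wrapped.step(5)
print(obs)
```

A smaller action set of this kind shrinks the policy's output layer, at the cost of making some action combinations unreachable.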

class pytorchrl.envs.obstacle_tower.wrappers.RewardShapeObstacleEnv(env, killed_reward=2)

Bases: gym.core.Wrapper

reset(**kwargs)

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns

observation (object) – the initial observation.

step(action)

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

action (object) – an action provided by the agent

Returns

observation (object) – the agent’s observation of the current environment

reward (float) – amount of reward returned after the previous action

done (bool) – whether the episode has ended, in which case further step() calls will return undefined results

info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)

Return type

tuple (observation, reward, done, info)
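How RewardShapeObstacleEnv modifies rewards, and what killed_reward means, is not stated on this page. As a generic illustration of reward shaping in a wrapper's step(), the sketch below adds a one-off bonus the first time a box is reported as placed. The info key "box_placed", the bonus parameter, and the classes ScriptedEnv and RewardShapingSketch are all hypothetical.

```python
class ScriptedEnv:
    """Hypothetical environment that replays a script of (reward, box_placed) pairs."""

    def __init__(self, script):
        self.script = script
        self._i = 0

    def reset(self, **kwargs):
        self._i = 0
        return 0

    def step(self, action):
        reward, placed = self.script[self._i]
        self._i += 1
        done = self._i >= len(self.script)
        return self._i, reward, done, {"box_placed": placed}


class RewardShapingSketch:
    """Add a bonus on the step where the box first becomes placed (assumed rule)."""

    def __init__(self, env, bonus=0.5):
        self.env = env
        self.bonus = bonus
        self._was_placed = False

    def reset(self, **kwargs):
        # Clear per-episode state so the bonus can be earned again.
        self._was_placed = False
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        placed = bool(info.get("box_placed", False))
        if placed and not self._was_placed:
            reward += self.bonus
        self._was_placed = placed
        return obs, reward, done, info


shaped = RewardShapingSketch(
    ScriptedEnv([(0.0, False), (0.0, True), (1.0, True)]), bonus=0.5
)
shaped.reset()
rewards, done = [], False
while not done:
    _, r, done, _ = shaped.step(0)
    rewards.append(r)
print(rewards)
```

Paying the bonus only on the transition from unplaced to placed keeps the agent from collecting it repeatedly while the box stays on the platform.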

Module contents