# policy-autoencoder

This is a simple policy auto-encoder model that learns approximations of the state-transition and policy functions. These functions are deterministic versions of the corresponding MDP probability distributions. The state-transition function can be used in model-based reinforcement learning for planning complex behaviors. The policy function provides the elementary action that leads from the current state to the next desired state: an agent just "imagines" the next state it wants to be in and applies this function to get there.

The test environment is a grid world consisting of a dot agent with 9 actions: up, up-right, right, down-right, down, down-left, left, up-left, and stop. The training data set consists of randomly generated initial states, actions, and next states, e.g. a sample for a 4x4 grid world (a data-generation sketch follows the example):

| Initial state | Action     | Next state |
|:-------------:|:----------:|:----------:|
| (image)       | Move right | (image)    |
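The repository's exact data format is not shown here; the following is a minimal sketch of how such training triples could be generated, assuming one-hot grid encodings. `sample_transition` and `one_hot_grid` are hypothetical helpers, not names from this repo:

```python
import numpy as np

# The 9 elementary actions as (row, col) offsets, in the order listed above.
ACTIONS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0),
           (1, -1), (0, -1), (-1, -1), (0, 0)]

def one_hot_grid(row, col, size):
    """Encode the agent's position as a one-hot size x size grid."""
    grid = np.zeros((size, size), dtype=np.float32)
    grid[row, col] = 1.0
    return grid

def sample_transition(size=4, rng=np.random.default_rng()):
    """Return one (initial state, action index, next state) training triple."""
    row, col = rng.integers(size), rng.integers(size)
    action = rng.integers(len(ACTIONS))
    dr, dc = ACTIONS[action]
    # Clip so the agent stays inside the grid when moving off an edge.
    nrow = int(np.clip(row + dr, 0, size - 1))
    ncol = int(np.clip(col + dc, 0, size - 1))
    return one_hot_grid(row, col, size), action, one_hot_grid(nrow, ncol, size)
```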

The model consists of two modules (a sketch follows the list):

- Encoder that accepts an initial state and an action and outputs the next state;
- Decoder that takes an initial state and the next state and outputs an action.
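A minimal PyTorch sketch of the two modules; the hidden size, the use of plain MLPs, and the flattened one-hot encodings are assumptions for illustration, not necessarily the repository's actual architecture:

```python
import torch
import torch.nn as nn

N_ACTIONS, GRID = 9, 4 * 4  # 9 elementary actions, flattened 4x4 grid

class Encoder(nn.Module):
    """(initial state, action) -> predicted next state."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(GRID + N_ACTIONS, hidden), nn.ReLU(),
            nn.Linear(hidden, GRID),
        )

    def forward(self, state, action_onehot):
        return self.net(torch.cat([state, action_onehot], dim=-1))

class Decoder(nn.Module):
    """(initial state, next state) -> action logits."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * GRID, hidden), nn.ReLU(),
            nn.Linear(hidden, N_ACTIONS),
        )

    def forward(self, state, next_state):
        return self.net(torch.cat([state, next_state], dim=-1))
```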

The trained model can be decoupled into two standalone functions (usage sketched below):

- an Encoder module used to predict the next state, i.e. the state-transition function;
- a Decoder module used to provide elementary actions that reach a desired state, i.e. the policy function.
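Continuing the sketch above, decoupled use might look like this (untrained weights, purely illustrative; the cell indices and the one-hot action encoding are assumptions):

```python
import torch

encoder, decoder = Encoder(), Decoder()

state = torch.zeros(1, GRID)
state[0, 0] = 1.0                      # agent at cell (0, 0)
goal = torch.zeros(1, GRID)
goal[0, 1] = 1.0                       # desired next cell (0, 1)

action = torch.zeros(1, N_ACTIONS)
action[0, 2] = 1.0                     # one-hot for "right"

next_state = encoder(state, action)    # state-transition function
action_logits = decoder(state, goal)   # policy function
chosen = action_logits.argmax(dim=-1)  # index of the elementary action
```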