Refactor of problem using a map to represent the state space.
dylan-asmar committed Dec 11, 2023
1 parent 91e71f7 commit 7d338bb
Showing 28 changed files with 1,095 additions and 561 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,4 +1,6 @@
Manifest.toml
.DS_Store
docs/build/
.vscode
.vscode
policy.out
model.pomdpx
12 changes: 8 additions & 4 deletions Project.toml
@@ -1,16 +1,20 @@
name = "TagPOMDPProblem"
uuid = "8a653263-a1cc-4cf9-849f-f530f6ffc800"
version = "0.1.1"
version = "0.2.0"

[deps]
Graphs = "86223c79-3864-5bf0-83f7-82e725a168b6"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MetaGraphs = "626554b9-1ddb-594c-aa3c-2596fe9399a5"
POMDPTools = "7588e00f-9cae-40de-98dc-e0c70c48cdd7"
POMDPs = "a93abf59-7444-517b-a68a-c42f96afdd7d"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"

[compat]
julia = "1.6"
Graphs = "1.9"
LinearAlgebra = "1.6"
MetaGraphs = "0.7"
POMDPTools = "0.1"
POMDPs = "0.9"
Plots = "1.23"
POMDPTools = "0.1"
julia = "1.6"
81 changes: 61 additions & 20 deletions README.md
@@ -3,16 +3,12 @@
[![Build Status](https://github.com/dylan-asmar/TagPOMDPProblem.jl/actions/workflows/CI.yml/badge.svg)](https://github.com/dylan-asmar/TagPOMDPProblem.jl/actions/workflows/CI.yml)
[![codecov](https://codecov.io/gh/dylan-asmar/TagPOMDPProblem.jl/branch/main/graph/badge.svg?token=UNYWMYUBDL)](https://codecov.io/gh/dylan-asmar/TagPOMDPProblem.jl)
[![](https://img.shields.io/badge/docs-stable-blue.svg)](https://dylan-asmar.github.io/TagPOMDPProblem.jl/stable)
[![](https://img.shields.io/badge/docs-dev-blue.svg)](https://dylan-asmar.github.io/TagPOMDPProblem.jl/dev)


The Tag [1] problem with the [POMDPs.jl](https://github.com/JuliaPOMDP/POMDPs.jl) interface.

[1] Pineau, Joelle et al. “Point-based value iteration: An anytime algorithm for POMDPs.” in *IJCAI* 2003 ([link](https://www.ijcai.org/Proceedings/03/Papers/147.pdf))



![Tag Demo](./gifs/tag_SARSOP.gif)
![Tag Demo](./gifs/default.gif)

## Installation
Use `]` to get to the package manager to add the package.
@@ -21,16 +17,14 @@ julia> ]
pkg> add TagPOMDPProblem
```


## Problem description
The goal of the agent is to tag the opponent by performing the tag action while in the same square as the opponent.
The goal of the agent is to tag the opponent by performing the tag action while in the same square as the opponent.

- **States**: position of the robot and target and whether the target has been tagged or not

- **Actions**: The agent can move in the four cardinal directions or perform the tag action. When performing the `tag` action, the robot does not move. The target moves during `tag` if the robot and target are not at the same location.

- **Transition model**: The movement of the agent is deterministic based on its selected action. The opponent moves stochastically according to a fixed policy away from the agent. The opponent moves away from the agent `move_away_probability` of the time and stays in the same cell otherwise. The implementation of the opponent’s movement policy varies slightly from the original paper allowing more movement away from the agent, thus making the scenario slightly more challenging. This implementation redistributes the probabilities of actions that result in hitting a wall to other actions that result in moving away. See the [transitions.jl](https://github.com/dylan-asmar/TagPOMDPProblem.jl/blob/b0100ddb39b27990a70668187d6f1de8acb50f1e/src/transition.jl#L11) for details. The transition function from the original implementation can be used by passing `orig_transition_fcn = true`.

- **Transition model**: The agent’s movement is deterministic given its selected action. The opponent moves stochastically according to a fixed policy: it moves away from the agent with probability `move_away_probability` and stays in the same cell otherwise. The opponent’s movement policy differs slightly from the original paper, allowing more movement away from the agent and making the scenario slightly more challenging. Specifically, this implementation redistributes the probability of actions that would hit a wall to other actions that move away. See `transitions.jl` for details. The transition function from the original implementation can be used by passing `transition_option=:orig`.

- **Observation model**: The agent’s position is fully observable but the opponent’s position is unobserved unless both actors are in the same cell. The number of observations is one more than the number of grid squares (e.g. 30 observations for the default problem).
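
As a rough check of the observation count stated above, the sketch below counts the open cells in a map string and adds one. It assumes that `o` marks an open cell and `x` a wall, as the map strings in this README suggest. The helper `count_observations` and the 29-cell layout are illustrative assumptions, not part of the package API.

```julia
# Hypothetical helper, not part of TagPOMDPProblem: count the observations
# implied by a map string, assuming 'o' is an open cell and 'x' is a wall.
function count_observations(map_str::AbstractString)
    open_cells = count(==('o'), map_str)
    return open_cells + 1  # one more than the number of grid squares
end

# Assumed layout for a 29-cell Tag map (illustrative only).
classic_map = """
xxxxxoooxx
xxxxxoooxx
xxxxxoooxx
oooooooooo
oooooooooo
"""

println(count_observations(classic_map))  # prints 30
```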

@@ -46,31 +40,78 @@ using SARSOP # load a POMDP Solver
using POMDPGifs # to make gifs

pomdp = TagPOMDP()

solver = SARSOPSolver(; timeout=150)
policy = solve(solver, pomdp)

sim = GifSimulator(filename="test.gif", max_steps=50)
sim = GifSimulator(;
    filename="default.gif",
    max_steps=50
)
simulate(sim, pomdp, policy)
```

![Tag Example](./gifs/test.gif)
![Tag Example](./gifs/default.gif)


### Larger Grid
### Larger Map
```julia
using POMDPs
using TagPOMDPProblem
using SARSOP # load a POMDP Solver
using POMDPGifs # to make gifs
using SARSOP
using POMDPGifs

map_str = """
xxooooooxxxxxxx
xxooooooxxxxxxx
xxooooooxxxxxxx
xxooooooxxxxxxx
xxooooooxxxxxxx
ooooooooooooooo
ooooooooooooooo
ooooooooooooooo
ooooooooooooooo
"""
pomdp = TagPOMDP(;map_str=map_str)
solver = SARSOPSolver(; timeout=600)
policy = solve(solver, pomdp)

sim = GifSimulator(;
    filename="larger.gif",
    max_steps=50
)
simulate(sim, pomdp, policy)
```

grid = TagGrid(;bottom_grid=(12, 4), top_grid=(6, 5), top_grid_x_attach_pt=3)
pomdp = TagPOMDP(;tag_grid=grid)
![Tag Larger Map Example](./gifs/larger.gif)

### Map with Obstacles
```julia
using POMDPs
using TagPOMDPProblem
using SARSOP
using POMDPGifs
using Random # needed for Random.MersenneTwister below

map_str = """
xxxxxxxxxx
xoooooooox
xoxoxxxxox
xoxoxxxxox
xoxooooxox
xoxoxxoxox
xoxoxxoxox
xoxoxxoxox
xoooooooox
xxxxxxxxxx
"""
pomdp = TagPOMDP(;map_str=map_str)
solver = SARSOPSolver(; timeout=600)
policy = solve(solver, pomdp)

sim = GifSimulator(filename="test_larger.gif", max_steps=50)
sim = GifSimulator(;
    filename="boundary.gif",
    max_steps=50,
    rng=Random.MersenneTwister(1)
)
simulate(sim, pomdp, policy)
```

![Tag Larger Grid Example](./gifs/test_larger.gif)
![Obstacle Map Example](./gifs/boundary.gif)
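
To make the map format in the examples above more concrete, here is a minimal sketch that reads a map string into a set of open cells and looks up a cell's open neighbors. It is not the package's parser: the helpers `open_cells_from_map` and `open_neighbors`, the `(column, row)` indexing with row 1 at the top, and the reading of `o` as open and `x` as blocked are all assumptions made for illustration.

```julia
# Hypothetical helper, not the package's parser: read a map string into the
# set of open cells, assuming 'o' is open and 'x' is blocked. Cells are
# (column, row) pairs with row 1 at the top line of the string.
function open_cells_from_map(map_str::AbstractString)
    cells = Set{Tuple{Int,Int}}()
    for (row, line) in enumerate(split(strip(map_str), '\n'))
        for (col, ch) in enumerate(line)
            ch == 'o' && push!(cells, (col, row))
        end
    end
    return cells
end

# Open neighbors of a cell in the four cardinal directions.
function open_neighbors(cell, cells)
    steps = ((0, 1), (1, 0), (0, -1), (-1, 0))
    return [(cell[1] + dx, cell[2] + dy) for (dx, dy) in steps
            if (cell[1] + dx, cell[2] + dy) in cells]
end

small_map = """
xxxxx
xooox
xoxox
xooox
xxxxx
"""
cells = open_cells_from_map(small_map)
println(length(cells))                  # 8 open cells
println(open_neighbors((2, 2), cells))  # neighbors below and to the right
```

A graph representation such as the one the package builds would add one vertex per open cell and one edge per pair of open neighbors; the set-based version here is only meant to show how the string encodes the layout.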
118 changes: 114 additions & 4 deletions docs/src/index.md
@@ -2,12 +2,122 @@

Tag POMDP problem using [POMDPs.jl](https://github.com/JuliaPOMDP/POMDPs.jl). Original problem was presented in Pineau, Joelle et al. “Point-based value iteration: An anytime algorithm for POMDPs.” IJCAI (2003) ([online here](https://www.ijcai.org/Proceedings/03/Papers/147.pdf)).

The goal of the agent is to tag the opponent by performing the tag action while in the same square as the opponent. The agent can move in the four cardinal directions or perform the tag action. The movement of the agent is deterministic based on its selected action. A reward of `step_penalty` is imposed for each motion action and the tag action results in a `tag_reward` for a successful tag and `tag_penalty` otherwise. The agent’s position is fully observable but the opponent’s position is unobserved unless both actors are in the same cell. The opponent moves stochastically according to a fixed policy away from the agent. The opponent moves away from the agent `move_away_probability` of the time and stays in the same cell otherwise. The implementation of the opponent’s movement policy varies slightly from the original paper allowing more movement away from the agent, thus making the scenario slightly more challenging. This implementation redistributes the probabilities of actions that result in hitting a wall to other actions that result in moving away. The original transition function is available by passing `orig_transition_fcn = true` during creation of the problem.
The goal of the agent is to tag the opponent by performing the tag action while in the same square as the opponent. The agent can move in the four cardinal directions or perform the tag action, and its movement is deterministic given the selected action. Each motion action incurs a reward of `step_penalty`; the tag action yields `tag_reward` for a successful tag and `tag_penalty` otherwise. The agent’s position is fully observable, but the opponent’s position is unobserved unless both actors are in the same cell. The opponent moves stochastically according to a fixed policy: it moves away from the agent with probability `move_away_probability` and stays in the same cell otherwise. The opponent’s movement policy differs slightly from the original paper, allowing more movement away from the agent and making the scenario slightly more challenging. Specifically, this implementation redistributes the probability of actions that would hit a wall to other actions that move away. The original transition function is available by passing `transition_option=:orig` when creating the problem.
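
The redistribution idea described above can be illustrated with a simplified sketch. This is not the package's transition code: it assumes cells are `(x, y)` tuples, treats a move as "away" when it strictly increases the Manhattan distance to the agent, splits the away probability evenly over the valid away moves, and uses 0.8 as an assumed default for `move_away_probability`. The function name `opponent_distribution` is hypothetical.

```julia
# Simplified sketch, not the package's transition code. Assumptions: cells are
# (x, y) tuples, "away" means a strictly larger Manhattan distance from the
# agent, and the away probability is split evenly over the valid away moves.
manhattan(a, b) = abs(a[1] - b[1]) + abs(a[2] - b[2])

function opponent_distribution(agent, opponent, open_cells; move_away_probability=0.8)
    directions = ((0, 1), (1, 0), (0, -1), (-1, 0))
    # Candidate cells: one step in a cardinal direction.
    candidates = [(opponent[1] + dx, opponent[2] + dy) for (dx, dy) in directions]
    # Keep only moves that stay on the map and increase the distance to the agent.
    away = filter(c -> c in open_cells &&
                       manhattan(c, agent) > manhattan(opponent, agent), candidates)

    dist = Dict{Tuple{Int,Int},Float64}()
    if isempty(away)
        dist[opponent] = 1.0  # no valid away move: the opponent stays put
    else
        dist[opponent] = 1.0 - move_away_probability
        for c in away
            # Moves that would hit a wall are dropped, so their share is
            # spread over the remaining away moves.
            dist[c] = move_away_probability / length(away)
        end
    end
    return dist
end

# 3x1 corridor: agent on the left, opponent in the middle.
corridor = Set([(1, 1), (2, 1), (3, 1)])
println(opponent_distribution((1, 1), (2, 1), corridor))
```

In the corridor example only the move to `(3, 1)` is valid, so it receives the full away probability, while the remainder stays on the opponent's current cell.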

## Manual Outline

```@contents
```

## Installation
Use `]` to get to the package manager to add the package.
```julia
julia> ]
pkg> add TagPOMDPProblem
```

## Examples

### Default Problem
```julia
using POMDPs
using TagPOMDPProblem
using SARSOP # load a POMDP Solver
using POMDPGifs # to make gifs

pomdp = TagPOMDP()
solver = SARSOPSolver(; timeout=150)
policy = solve(solver, pomdp)
sim = GifSimulator(;
    filename="default.gif",
    max_steps=50
)
simulate(sim, pomdp, policy)
```

![Tag Example](../../gifs/default.gif)


### Larger Map
```julia
using POMDPs
using TagPOMDPProblem
using SARSOP
using POMDPGifs

map_str = """
xxooooooxxxxxxx
xxooooooxxxxxxx
xxooooooxxxxxxx
xxooooooxxxxxxx
xxooooooxxxxxxx
ooooooooooooooo
ooooooooooooooo
ooooooooooooooo
ooooooooooooooo
"""
pomdp = TagPOMDP(;map_str=map_str)
solver = SARSOPSolver(; timeout=600)
policy = solve(solver, pomdp)

sim = GifSimulator(;
    filename="larger.gif",
    max_steps=50
)
simulate(sim, pomdp, policy)
```

![Tag Larger Map Example](../../gifs/larger.gif)

### Map with Obstacles
```julia
using POMDPs
using TagPOMDPProblem
using SARSOP
using POMDPGifs
using Random # needed for Random.MersenneTwister below

map_str = """
xxxxxxxxxx
xoooooooox
xoxoxxxxox
xoxoxxxxox
xoxooooxox
xoxoxxoxox
xoxoxxoxox
xoxoxxoxox
xoooooooox
xxxxxxxxxx
"""
pomdp = TagPOMDP(;map_str=map_str)
solver = SARSOPSolver(; timeout=600)
policy = solve(solver, pomdp)

sim = GifSimulator(;
    filename="boundary.gif",
    max_steps=50,
    rng=Random.MersenneTwister(1)
)
simulate(sim, pomdp, policy)
```

![Obstacle Map Example](../../gifs/boundary.gif)


# Exported Functions
```@docs
TagPOMDP
TagPOMDP()
TagGrid()
list_actions()
TagState
TagPOMDP
TagGrid
POMDPTools.render(TagPOMDP, Any)
```

# Internal Functions
```@docs
create_metagraph_from_map()
map_str_from_metagraph()
state_from_index()
modified_transition()
orig_transition()
move_direction()
```
Binary file added gifs/boundary.gif
Binary file added gifs/default.gif
Binary file added gifs/larger.gif
Binary file removed gifs/tag_SARSOP.gif
Binary file not shown.
Binary file removed gifs/test.gif
Binary file not shown.
Binary file removed gifs/test_larger.gif
Binary file not shown.
40 changes: 40 additions & 0 deletions scripts/check_vs_original.jl
@@ -0,0 +1,40 @@
"""
This script compares the performance of this implementation of the Tag POMDP with that of the
original implementation, using SARSOP. The original implementation is available at:
https://bigbird.comp.nus.edu.sg/pmwiki/farm/appl/index.php?n=Main.Repository
"""

using Pkg
Pkg.add("SARSOP")
Pkg.add("StatsBase")
Pkg.add("ProgressMeter")

using POMDPs
using POMDPTools
using TagPOMDPProblem
using SARSOP
using StatsBase
using ProgressMeter

sarsop_timeout = 5
num_sims = 5000

pomdp = TagPOMDP(; transition_option=:orig)
solver = SARSOPSolver(; timeout=sarsop_timeout)
policy = solve(solver, pomdp)

sim = RolloutSimulator(; max_steps=50)

rewards = []
@showprogress dt=1 desc="Running simulations..." for ii in 1:num_sims
    r = simulate(sim, pomdp, policy)
    push!(rewards, r)
end

# Print out the mean and 95% confidence interval
println("Original SARSOP performance: $(-6.13) +/- $(0.12)")
println("Reward (w/ 95% CI): $(mean(rewards)) +/- $(1.96 * std(rewards) / sqrt(length(rewards)))")

Pkg.rm("SARSOP")
Pkg.rm("StatsBase")
Pkg.rm("ProgressMeter")
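
The interval printed by the script is a normal-approximation 95% confidence interval for the mean reward. A minimal sketch of the same calculation, using only the `Statistics` standard library, is given below. The helper `mean_with_ci` is hypothetical and the sample rewards are made up for illustration; they are not simulation results.

```julia
using Statistics

# Hypothetical helper: sample mean and half-width of a normal-approximation
# confidence interval (z = 1.96 corresponds to roughly 95% coverage).
function mean_with_ci(samples::AbstractVector{<:Real}; z=1.96)
    n = length(samples)
    half_width = z * std(samples) / sqrt(n)
    return mean(samples), half_width
end

sample_rewards = [-6.0, -7.5, -5.0, -6.5, -6.0]  # made-up values
m, hw = mean_with_ci(sample_rewards)
println("Reward (w/ 95% CI): $m +/- $hw")
```

The half-width shrinks with the square root of the number of simulations, which is why the script runs several thousand rollouts before comparing against the reference value.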