Refactor of problem using a map to represent the state space.

JuliaPOMDP · Dec 11, 2023 · commit 7d338bb · 1 parent 91e71f7

Showing 28 changed files with 1,095 additions and 561 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,4 +1,6 @@
 Manifest.toml
 .DS_Store
 docs/build/
-.vscode
+.vscode
+policy.out
+model.pomdpx
diff --git a/Project.toml b/Project.toml
@@ -1,16 +1,20 @@
 name = "TagPOMDPProblem"
 uuid = "8a653263-a1cc-4cf9-849f-f530f6ffc800"
-version = "0.1.1"
+version = "0.2.0"
 
 [deps]
+Graphs = "86223c79-3864-5bf0-83f7-82e725a168b6"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
+MetaGraphs = "626554b9-1ddb-594c-aa3c-2596fe9399a5"
 POMDPTools = "7588e00f-9cae-40de-98dc-e0c70c48cdd7"
 POMDPs = "a93abf59-7444-517b-a68a-c42f96afdd7d"
 Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
-SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
 
 [compat]
-julia = "1.6"
+Graphs = "1.9"
+LinearAlgebra = "1.6"
+MetaGraphs = "0.7"
+POMDPTools = "0.1"
 POMDPs = "0.9"
 Plots = "1.23"
-POMDPTools = "0.1"
+julia = "1.6"
diff --git a/README.md b/README.md
@@ -3,16 +3,12 @@
 [![Build Status](https://github.com/dylan-asmar/TagPOMDPProblem.jl/actions/workflows/CI.yml/badge.svg)](https://github.com/dylan-asmar/TagPOMDPProblem.jl/actions/workflows/CI.yml)
 [![codecov](https://codecov.io/gh/dylan-asmar/TagPOMDPProblem.jl/branch/main/graph/badge.svg?token=UNYWMYUBDL)](https://codecov.io/gh/dylan-asmar/TagPOMDPProblem.jl)
 [![](https://img.shields.io/badge/docs-stable-blue.svg)](https://dylan-asmar.github.io/TagPOMDPProblem.jl/stable)
-[![](https://img.shields.io/badge/docs-dev-blue.svg)](https://dylan-asmar.github.io/TagPOMDPProblem.jl/dev)
-
 
 The Tag [1] problem with the [POMDPs.jl](https://github.com/JuliaPOMDP/POMDPs.jl) interface. 
 
 [1] Pineau, Joelle et al. “Point-based value iteration: An anytime algorithm for POMDPs.” in *IJCAI* 2003 ([link](https://www.ijcai.org/Proceedings/03/Papers/147.pdf))
 
-
-
-![Tag Demo](./gifs/tag_SARSOP.gif)
+![Tag Demo](./gifs/default.gif)
 
 ## Installation
 Use `]` to get to the package manager to add the package. 
@@ -21,16 +17,14 @@ julia> ]
 pkg> add TagPOMDPProblem
 ```
 
-
 ## Problem description
-The goal of the agent is to tag the opponent by performing the tag action while in the same square as the opponent. 
+The goal of the agent is to tag the opponent by performing the tag action while in the same square as the opponent.
 
 - **States**: position of the robot and target and whether the target has been tagged or not
 
 - **Actions**:  The agent can move in the four cardinal directions or perform the tag action. When performing the `tag` action, the robot does not move. The target moves during `tag` if the robot and target are not at the same location.  
 
-- **Transition model**: The movement of the agent is deterministic based on its selected action. The opponent moves stochastically according to a fixed policy away from the agent. The opponent moves away from the agent `move_away_probability` of the time and stays in the same cell otherwise. The implementation of the opponent’s movement policy varies slightly from the original paper allowing more movement away from the agent, thus making the scenario slightly more challenging. This implementation redistributes the probabilities of actions that result in hitting a wall to other actions that result in moving away. See the [transitions.jl](https://github.com/dylan-asmar/TagPOMDPProblem.jl/blob/b0100ddb39b27990a70668187d6f1de8acb50f1e/src/transition.jl#L11) for details. The transition function from the original implementation can be used by passing `orig_transition_fcn = true`.
-
+- **Transition model**: The agent’s movement is deterministic given its selected action. The opponent moves stochastically according to a fixed policy that takes it away from the agent: it moves away from the agent with probability `move_away_probability` and stays in the same cell otherwise. This implementation of the opponent’s movement policy differs slightly from the original paper: the probabilities of moves that would hit a wall are redistributed to the other moves that take the opponent away from the agent. This allows more movement away from the agent and makes the scenario slightly more challenging. See `src/transition.jl` for details. The transition function from the original implementation can be used by passing `transition_option=:orig`.
 
 - **Observation model**: The agent’s position is fully observable but the opponent’s position is unobserved unless both actors are in the same cell. The number of observations is one more than the number of grid squares (e.g. 30 observations for the default problem).
 
@@ -46,31 +40,78 @@ using SARSOP # load a  POMDP Solver
 using POMDPGifs # to make gifs
 
 pomdp = TagPOMDP()
-
 solver = SARSOPSolver(; timeout=150)
 policy = solve(solver, pomdp)
-
-sim = GifSimulator(filename="test.gif", max_steps=50)
+sim = GifSimulator(;
+    filename="default.gif",
+    max_steps=50
+)
 simulate(sim, pomdp, policy)
 ```
 
-![Tag Example](./gifs/test.gif)
+![Tag Example](./gifs/default.gif)
 
 
-### Larger Grid
+### Larger Map
 ```julia
 using POMDPs
 using TagPOMDPProblem
-using SARSOP # load a  POMDP Solver
-using POMDPGifs # to make gifs
+using SARSOP 
+using POMDPGifs
+
+map_str = """
+xxooooooxxxxxxx
+xxooooooxxxxxxx
+xxooooooxxxxxxx
+xxooooooxxxxxxx
+xxooooooxxxxxxx
+ooooooooooooooo
+ooooooooooooooo
+ooooooooooooooo
+ooooooooooooooo
+"""
+pomdp = TagPOMDP(;map_str=map_str)
+solver = SARSOPSolver(; timeout=600)
+policy = solve(solver, pomdp)
+
+sim = GifSimulator(;
+    filename="larger.gif",
+    max_steps=50
+)
+simulate(sim, pomdp, policy)
+```
 
-grid = TagGrid(;bottom_grid=(12, 4), top_grid=(6, 5), top_grid_x_attach_pt=3)
-pomdp = TagPOMDP(;tag_grid=grid)
+![Tag Larger Map Example](./gifs/larger.gif)
+
+### Map with Obstacles
+```julia
+using POMDPs
+using TagPOMDPProblem
+using SARSOP 
+using POMDPGifs
+using Random # for MersenneTwister
+map_str = """
+xxxxxxxxxx
+xoooooooox
+xoxoxxxxox
+xoxoxxxxox
+xoxooooxox
+xoxoxxoxox
+xoxoxxoxox
+xoxoxxoxox
+xoooooooox
+xxxxxxxxxx
+"""
+pomdp = TagPOMDP(;map_str=map_str)
 solver = SARSOPSolver(; timeout=600)
 policy = solve(solver, pomdp)
 
-sim = GifSimulator(filename="test_larger.gif", max_steps=50)
+sim = GifSimulator(;
+    filename="boundary.gif",
+    max_steps=50,
+    rng=Random.MersenneTwister(1)
+)
 simulate(sim, pomdp, policy)
 ```
 
-![Tag Larger Grid Example](./gifs/test_larger.gif)
+![Obstacle Map Example](./gifs/boundary.gif)
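The map strings in the README examples appear to use `o` for an open cell and `x` for a blocked cell; that reading is inferred from the examples, not taken from the package source. Under that assumption, the hypothetical helpers below (`open_cells` and `num_observations` are not part of the TagPOMDPProblem API) list the open cells of a map and apply the README's rule that the number of observations is one more than the number of grid squares, taken here to mean open cells.

```julia
# Hypothetical helper, not part of the TagPOMDPProblem API. Assumes 'o' marks an
# open cell and 'x' a blocked cell (inferred from the example maps).
function open_cells(map_str::AbstractString)
    rows = filter(!isempty, strip.(split(strip(map_str), '\n')))
    cells = Tuple{Int,Int}[]                  # (row, col) of every open cell
    for (r, line) in enumerate(rows)
        for (c, ch) in enumerate(line)
            if ch == 'o'
                push!(cells, (r, c))
            elseif ch != 'x'
                error("unexpected character '$ch' at row $r, column $c")
            end
        end
    end
    return cells
end

# Assumed rule from the README: one observation per open cell, plus one more.
num_observations(map_str::AbstractString) = length(open_cells(map_str)) + 1

larger_map = """
xxooooooxxxxxxx
xxooooooxxxxxxx
xxooooooxxxxxxx
xxooooooxxxxxxx
xxooooooxxxxxxx
ooooooooooooooo
ooooooooooooooo
ooooooooooooooo
ooooooooooooooo
"""

obstacle_map = """
xxxxxxxxxx
xoooooooox
xoxoxxxxox
xoxoxxxxox
xoxooooxox
xoxoxxoxox
xoxoxxoxox
xoxoxxoxox
xoooooooox
xxxxxxxxxx
"""

for m in (larger_map, obstacle_map)
    println(length(open_cells(m)), " open cells, ", num_observations(m), " observations")
end
```

For the larger map this gives 90 open cells (5 rows of 6 plus 4 rows of 15), so 91 observations under the stated assumption; the obstacle map has 40 open cells.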
diff --git a/docs/src/index.md b/docs/src/index.md
@@ -2,12 +2,122 @@
 
 Tag POMDP problem using [POMDPs.jl](https://github.com/JuliaPOMDP/POMDPs.jl). Original problem was presented in Pineau, Joelle et al. “Point-based value iteration: An anytime algorithm for POMDPs.” IJCAI (2003) ([online here](https://www.ijcai.org/Proceedings/03/Papers/147.pdf)).
 
-The goal of the agent is to tag the opponent by performing the tag action while in the same square as the opponent. The agent can move in the four cardinal directions or perform the tag action. The movement of the agent is deterministic based on its selected action. A reward of `step_penalty` is imposed for each motion action and the tag action results in a `tag_reward` for a successful tag and `tag_penalty` otherwise. The agent’s position is fully observable but the opponent’s position is unobserved unless both actors are in the same cell. The opponent moves stochastically according to a fixed policy away from the agent. The opponent moves away from the agent `move_away_probability` of the time and stays in the same cell otherwise. The implementation of the opponent’s movement policy varies slightly from the original paper allowing more movement away from the agent, thus making the scenario slightly more challenging. This implementation redistributes the probabilities of actions that result in hitting a wall to other actions that result in moving away. The original transition function is available by passing `orig_transition_fcn = true` during creation of the problem.
+The goal of the agent is to tag the opponent by performing the tag action while in the same square as the opponent. The agent can move in the four cardinal directions or perform the tag action. The agent’s movement is deterministic given its selected action. Each motion action incurs a reward of `step_penalty`; the tag action yields `tag_reward` for a successful tag and `tag_penalty` otherwise. The agent’s position is fully observable, but the opponent’s position is unobserved unless both actors are in the same cell. The opponent moves stochastically according to a fixed policy that takes it away from the agent: it moves away with probability `move_away_probability` and stays in the same cell otherwise. This implementation of the opponent’s movement policy differs slightly from the original paper: the probabilities of moves that would hit a wall are redistributed to the other moves that take the opponent away from the agent. This allows more movement away from the agent and makes the scenario slightly more challenging. The original transition function is available by passing `transition_option=:orig` when creating the problem.
 
+## Manual Outline
+
+```@contents
+```
+
+## Installation
+Press `]` to enter the package manager, then add the package:
+```julia
+julia> ]
+pkg> add TagPOMDPProblem
+```
+
+## Examples
+
+### Default Problem
+```julia
+using POMDPs
+using TagPOMDPProblem
+using SARSOP # load a POMDP solver
+using POMDPGifs # to make gifs
+
+pomdp = TagPOMDP()
+solver = SARSOPSolver(; timeout=150)
+policy = solve(solver, pomdp)
+sim = GifSimulator(;
+    filename="default.gif",
+    max_steps=50
+)
+simulate(sim, pomdp, policy)
+```
+
+![Tag Example](../../gifs/default.gif)
+
+
+### Larger Map
+```julia
+using POMDPs
+using TagPOMDPProblem
+using SARSOP 
+using POMDPGifs
+
+map_str = """
+xxooooooxxxxxxx
+xxooooooxxxxxxx
+xxooooooxxxxxxx
+xxooooooxxxxxxx
+xxooooooxxxxxxx
+ooooooooooooooo
+ooooooooooooooo
+ooooooooooooooo
+ooooooooooooooo
+"""
+pomdp = TagPOMDP(;map_str=map_str)
+solver = SARSOPSolver(; timeout=600)
+policy = solve(solver, pomdp)
+
+sim = GifSimulator(;
+    filename="larger.gif",
+    max_steps=50
+)
+simulate(sim, pomdp, policy)
+```
+
+![Tag Larger Map Example](../../gifs/larger.gif)
+
+### Map with Obstacles
+```julia
+using POMDPs
+using TagPOMDPProblem
+using SARSOP 
+using POMDPGifs
+using Random # for MersenneTwister
+map_str = """
+xxxxxxxxxx
+xoooooooox
+xoxoxxxxox
+xoxoxxxxox
+xoxooooxox
+xoxoxxoxox
+xoxoxxoxox
+xoxoxxoxox
+xoooooooox
+xxxxxxxxxx
+"""
+pomdp = TagPOMDP(;map_str=map_str)
+solver = SARSOPSolver(; timeout=600)
+policy = solve(solver, pomdp)
+
+sim = GifSimulator(;
+    filename="boundary.gif",
+    max_steps=50,
+    rng=Random.MersenneTwister(1)
+)
+simulate(sim, pomdp, policy)
+```
+
+![Obstacle Map Example](../../gifs/boundary.gif)
+
+
+# Exported Functions
 ```@docs
+TagPOMDP
 TagPOMDP()
-TagGrid()
+list_actions()
 TagState
-TagPOMDP
-TagGrid
+POMDPTools.render(TagPOMDP, Any)
+```
+
+# Internal Functions
+```@docs
+create_metagraph_from_map()
+map_str_from_metagraph()
+state_from_index()
+modified_transition()
+orig_transition()
+move_direction()
 ```
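The documentation describes the opponent's move-away policy with wall redistribution only in prose. The following is a simplified sketch of that idea, not the package's `modified_transition`: it assumes an "away" move is a cardinal move that strictly increases the Manhattan distance to the robot, and that a blocked away move's probability is shared equally among the remaining valid away moves. The function name `target_next_dist`, the `(row, col)` cell representation, and the default of 0.8 are illustrative assumptions.

```julia
# Cardinal moves as (Δrow, Δcol): north, south, west, east.
const CARDINAL_MOVES = ((-1, 0), (1, 0), (0, -1), (0, 1))

manhattan(a, b) = abs(a[1] - b[1]) + abs(a[2] - b[2])

# Hypothetical sketch, not the package's implementation: returns a Dict mapping each
# possible next opponent cell to its probability.
function target_next_dist(open_set::AbstractSet{Tuple{Int,Int}}, robot::Tuple{Int,Int},
                          target::Tuple{Int,Int}; move_away_probability::Float64=0.8)
    d0 = manhattan(target, robot)
    candidates = [(target[1] + dr, target[2] + dc) for (dr, dc) in CARDINAL_MOVES]
    # Keep only away moves that land on an open cell. Dropping the blocked ones and
    # splitting the mass over the survivors redistributes their probability.
    valid = [c for c in candidates if manhattan(c, robot) > d0 && c in open_set]
    dist = Dict{Tuple{Int,Int},Float64}()
    if isempty(valid)
        dist[target] = 1.0  # no open cell farther away: the opponent stays put
        return dist
    end
    dist[target] = 1.0 - move_away_probability
    for c in valid
        dist[c] = move_away_probability / length(valid)
    end
    return dist
end

corridor = Set((1, c) for c in 1:5)  # a single-row corridor of five open cells
println(target_next_dist(corridor, (1, 1), (1, 3)))
```

With the robot at `(1, 1)` and the opponent at `(1, 3)` in the corridor, the north and south moves are blocked, so all of the move-away probability goes to `(1, 4)`.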
diff --git a/gifs/boundary.gif b/gifs/boundary.gif
diff --git a/gifs/default.gif b/gifs/default.gif
diff --git a/gifs/larger.gif b/gifs/larger.gif
diff --git a/gifs/tag_SARSOP.gif b/gifs/tag_SARSOP.gif
diff --git a/gifs/test.gif b/gifs/test.gif
diff --git a/gifs/test_larger.gif b/gifs/test_larger.gif
diff --git a/scripts/check_vs_original.jl b/scripts/check_vs_original.jl
@@ -0,0 +1,40 @@
+"""
+This script compares SARSOP's performance on this implementation of the Tag POMDP (with the original
+transition function) against its performance on the original implementation, available at:
+https://bigbird.comp.nus.edu.sg/pmwiki/farm/appl/index.php?n=Main.Repository
+"""
+
+using Pkg
+Pkg.add("SARSOP")
+Pkg.add("StatsBase")
+Pkg.add("ProgressMeter")
+
+using POMDPs
+using POMDPTools
+using TagPOMDPProblem
+using SARSOP
+using StatsBase
+using ProgressMeter
+
+sarsop_timeout = 5
+num_sims = 5000
+
+pomdp = TagPOMDP(; transition_option=:orig)
+solver = SARSOPSolver(; timeout=sarsop_timeout)
+policy = solve(solver, pomdp)
+
+sim = RolloutSimulator(; max_steps=50)
+
+rewards = Float64[]
+@showprogress dt=1 desc="Running simulations..." for ii in 1:num_sims
+    r = simulate(sim, pomdp, policy)
+    push!(rewards, r)
+end
+
+# Print out the mean and 95% confidence interval
+println("Original SARSOP performance: -6.13 +/- 0.12")
+println("Reward (w/ 95% CI): $(mean(rewards)) +/- $(1.96 * std(rewards) / sqrt(length(rewards)))")
+
+Pkg.rm("SARSOP")
+Pkg.rm("StatsBase")
+Pkg.rm("ProgressMeter")
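The script reports a normal-approximation 95% confidence interval, `mean ± 1.96 * std / sqrt(n)`. A minimal sketch of the same computation as a reusable function, plus a rough overlap check against the reference value the script prints, might look like the following. The helper names `mean_ci95` and `intervals_overlap` are hypothetical, and the sample rewards are made-up placeholders, not results. Overlapping intervals are only an informal consistency check, not a formal hypothesis test.

```julia
using Statistics  # mean, std (sample standard deviation)

# Hypothetical helper: sample mean and half-width of a normal-approximation 95% CI,
# matching the `1.96 * std(rewards) / sqrt(length(rewards))` expression in the script.
function mean_ci95(x::AbstractVector{<:Real})
    n = length(x)
    n > 1 || throw(ArgumentError("need at least two samples"))
    return mean(x), 1.96 * std(x) / sqrt(n)
end

# Intervals m1 ± h1 and m2 ± h2 overlap iff their centers are at most h1 + h2 apart.
intervals_overlap(m1, h1, m2, h2) = abs(m1 - m2) <= h1 + h2

# Made-up placeholder rewards, for illustration only.
rewards = [-5.9, -6.4, -6.1, -6.0]
m, h = mean_ci95(rewards)
println("Reward (w/ 95% CI): $m +/- $h")
println("Overlaps -6.13 +/- 0.12: ", intervals_overlap(m, h, -6.13, 0.12))
```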