Incorrect scheduler for MDP expected rewards in the infinity states #683

Open · TheGreatfpmK opened this issue Mar 6, 2025 · 1 comment

TheGreatfpmK (Contributor) commented:
We noticed that the scheduler extracted for infinity states in MDPs when computing expected rewards is sometimes incorrect. I'm attaching a simple MDP in the PRISM language: inf-reward-bad-scheduler-mdp.zip. Running model checking with the property Rmax=? [F "goal"] correctly determines the value to be infinity; however, if you extract the scheduler and apply it to the MDP, you obtain a DTMC in which the same property no longer evaluates to infinity. The reason is that the scheduler chooses action a3 instead of a4, which would ensure that state s=2 is a trap state.
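The attached model is not reproduced here; a hypothetical MDP of the same shape (state values, action names, and rewards are illustrative guesses rather than the contents of the zip) that exhibits the same behaviour could look like this:

```
mdp

module main
    s : [0..3] init 0; // s=3 is the goal state

    [a1] s=0 -> 1 : (s'=1);
    [a2] s=1 -> 0.5 : (s'=3) + 0.5 : (s'=2);
    [a3] s=2 -> 0.5 : (s'=2) + 0.5 : (s'=1);
    [a4] s=2 -> 1 : (s'=2); // self-loop: taking a4 forever makes s=2 a trap state
endmodule

label "goal" = s=3;

rewards
    true : 1;
endrewards
```

Every state here can reach s=2 and then loop via a4 forever, so Rmax=? [F "goal"] is infinite from all states. At s=2, however, all successors of a3 are themselves infinity states, so a scheduler that picks a3 there induces a DTMC that reaches "goal" with probability 1 and yields a finite expected reward.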

From our investigation, the culprit is the following function:

```cpp
void computeSchedulerRewInf(storm::storage::BitVector const& rewInfStates, storm::storage::SparseMatrix<T> const& transitionMatrix,
                            storm::storage::SparseMatrix<T> const& backwardTransitions, storm::storage::Scheduler<SchedulerValueType>& scheduler) {
    // Get the states from which we can never exit the rewInfStates, i.e. the states satisfying Pmax=1 [ G "rewInfStates"]
    // Also set a corresponding choice for all those states
    storm::storage::BitVector trapStates(rewInfStates.size(), false);
    auto const& nondeterministicChoiceIndices = transitionMatrix.getRowGroupIndices();
    for (auto state : rewInfStates) {
        STORM_LOG_ASSERT(nondeterministicChoiceIndices[state + 1] - nondeterministicChoiceIndices[state] > 0,
                         "Expected at least one action enabled in state " << state);
        for (uint_fast64_t choice = nondeterministicChoiceIndices[state]; choice < nondeterministicChoiceIndices[state + 1]; ++choice) {
            auto const& row = transitionMatrix.getRow(choice);
            if (std::all_of(row.begin(), row.end(), [&rewInfStates](auto const& entry) { return rewInfStates.get(entry.getColumn()); })) {
                trapStates.set(state, true);
                for (uint_fast64_t memState = 0; memState < scheduler.getNumberOfMemoryStates(); ++memState) {
                    scheduler.setChoice(choice - nondeterministicChoiceIndices[state], state, memState);
                }
                break;
            }
        }
    }
    // All remaining rewInfStates must reach a trapState with positive probability
    computeSchedulerProbGreater0E(transitionMatrix, backwardTransitions, rewInfStates, trapStates, scheduler);
}
```

The comment in the function correctly describes how this should be computed, but the code does not implement it. To my understanding, the trap states should only be those infinity states that are part of some end component in which every state is an infinity state. The std::all_of check above (line 610 in the original source file) only tests whether all successors of the selected choice are infinity states; it does not check whether those successors are all part of such an end component, so a single pass over the states is not sufficient.
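One way to implement what the comment specifies is a greatest fixed point: repeatedly discard states that have no choice whose successors all remain inside the surviving set. The survivors are exactly the states satisfying Pmax=1 [ G "rewInfStates" ], i.e. the states from which a scheduler can keep the play inside the infinity states forever. Below is a minimal, self-contained sketch of that idea using plain STL containers; the Mdp representation and computeTrapStates are illustrative stand-ins, not Storm API:

```cpp
#include <cstddef>
#include <vector>

// One MDP state: each action is the list of its successor states
// (transition probabilities are irrelevant for this qualitative check).
using Action = std::vector<std::size_t>;
using Mdp = std::vector<std::vector<Action>>;  // mdp[state][action] -> successors

// Greatest fixed point: repeatedly remove states that have no action keeping
// all successors inside the surviving candidate set. Returns, per state, the
// index of an action that stays inside (the trap-state choice), or -1 for
// states that drop out.
std::vector<int> computeTrapStates(Mdp const& mdp, std::vector<bool> candidate) {
    std::vector<int> chosenAction(mdp.size(), -1);
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t s = 0; s < mdp.size(); ++s) {
            if (!candidate[s]) {
                continue;
            }
            chosenAction[s] = -1;
            for (std::size_t a = 0; a < mdp[s].size(); ++a) {
                bool staysInside = true;
                for (std::size_t succ : mdp[s][a]) {
                    if (!candidate[succ]) {
                        staysInside = false;
                        break;
                    }
                }
                if (staysInside) {
                    chosenAction[s] = static_cast<int>(a);
                    break;
                }
            }
            if (chosenAction[s] == -1) {
                candidate[s] = false;  // s cannot stay inside: remove it ...
                changed = true;        // ... and re-examine the other states
            }
        }
    }
    return chosenAction;
}
```

The remaining rewInfStates would then, as in the existing code, receive choices that reach the surviving set with positive probability (computeSchedulerProbGreater0E). Equivalently, the trap states and their choices could be read off a maximal end component decomposition of the sub-MDP restricted to rewInfStates.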

volkm (Contributor) commented Mar 6, 2025:

Related to #298
