Incorrect scheduler for MDP expected rewards in the infinity states #683

Open · TheGreatfpmK opened this issue Mar 6, 2025 · 1 comment

TheGreatfpmK (Contributor) commented:
We noticed that the scheduler extracted for infinity states in MDPs when computing expected rewards is sometimes incorrect. I'm attaching a simple MDP in the PRISM language: inf-reward-bad-scheduler-mdp.zip. Running model checking with the property Rmax=? [F "goal"] correctly determines the value to be infinity; however, if you extract the scheduler and apply it to the MDP, you obtain a DTMC in which the same property no longer evaluates to infinity. The reason is that the scheduler chooses action a3 instead of a4, which would ensure that state s=2 is a trap state.
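The attached model is not reproduced here; a hypothetical MDP of the same shape (state values, action names, and rewards are illustrative guesses rather than the contents of the zip) that exhibits the same behaviour could look like this:

```
mdp

module main
    s : [0..3] init 0; // s=3 is the goal state

    [a1] s=0 -> 1 : (s'=1);
    [a2] s=1 -> 0.5 : (s'=3) + 0.5 : (s'=2);
    [a3] s=2 -> 0.5 : (s'=2) + 0.5 : (s'=1);
    [a4] s=2 -> 1 : (s'=2); // self-loop: taking a4 forever makes s=2 a trap state
endmodule

label "goal" = s=3;

rewards
    true : 1;
endrewards
```

Every state here can reach s=2 and then loop via a4 forever, so Rmax=? [F "goal"] is infinite from all states. At s=2, however, all successors of a3 are themselves infinity states, so a scheduler that picks a3 there induces a DTMC that reaches "goal" with probability 1 and yields a finite expected reward.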

From our investigation, the culprit is the following function:

```cpp
void computeSchedulerRewInf(storm::storage::BitVector const& rewInfStates, storm::storage::SparseMatrix<T> const& transitionMatrix,
                            storm::storage::SparseMatrix<T> const& backwardTransitions, storm::storage::Scheduler<SchedulerValueType>& scheduler) {
    // Get the states from which we can never exit the rewInfStates, i.e. the states satisfying Pmax=1 [ G "rewInfStates"]
    // Also set a corresponding choice for all those states
    storm::storage::BitVector trapStates(rewInfStates.size(), false);
    auto const& nondeterministicChoiceIndices = transitionMatrix.getRowGroupIndices();
    for (auto state : rewInfStates) {
        STORM_LOG_ASSERT(nondeterministicChoiceIndices[state + 1] - nondeterministicChoiceIndices[state] > 0,
                         "Expected at least one action enabled in state " << state);
        for (uint_fast64_t choice = nondeterministicChoiceIndices[state]; choice < nondeterministicChoiceIndices[state + 1]; ++choice) {
            auto const& row = transitionMatrix.getRow(choice);
            if (std::all_of(row.begin(), row.end(), [&rewInfStates](auto const& entry) { return rewInfStates.get(entry.getColumn()); })) {
                trapStates.set(state, true);
                for (uint_fast64_t memState = 0; memState < scheduler.getNumberOfMemoryStates(); ++memState) {
                    scheduler.setChoice(choice - nondeterministicChoiceIndices[state], state, memState);
                }
                break;
            }
        }
    }
    // All remaining rewInfStates must reach a trapState with positive probability
    computeSchedulerProbGreater0E(transitionMatrix, backwardTransitions, rewInfStates, trapStates, scheduler);
}
```

The comment in the function correctly describes how this should be computed, but the code does not implement it. To my understanding, the trap states should only be those infinity states that are part of some end component in which every state is an infinity state. The std::all_of check above (line 610 in the original source file) only tests whether all successors of the selected choice are infinity states; it does not check whether those successors are all part of such an end component, so a single pass over the states is not sufficient.
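One way to implement what the comment specifies is a greatest fixed point: repeatedly discard states that have no choice whose successors all remain inside the surviving set. The survivors are exactly the states satisfying Pmax=1 [ G "rewInfStates" ], i.e. the states from which a scheduler can keep the play inside the infinity states forever. Below is a minimal, self-contained sketch of that idea using plain STL containers; the Mdp representation and computeTrapStates are illustrative stand-ins, not Storm API:

```cpp
#include <cstddef>
#include <vector>

// One MDP state: each action is the list of its successor states
// (transition probabilities are irrelevant for this qualitative check).
using Action = std::vector<std::size_t>;
using Mdp = std::vector<std::vector<Action>>;  // mdp[state][action] -> successors

// Greatest fixed point: repeatedly remove states that have no action keeping
// all successors inside the surviving candidate set. Returns, per state, the
// index of an action that stays inside (the trap-state choice), or -1 for
// states that drop out.
std::vector<int> computeTrapStates(Mdp const& mdp, std::vector<bool> candidate) {
    std::vector<int> chosenAction(mdp.size(), -1);
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t s = 0; s < mdp.size(); ++s) {
            if (!candidate[s]) {
                continue;
            }
            chosenAction[s] = -1;
            for (std::size_t a = 0; a < mdp[s].size(); ++a) {
                bool staysInside = true;
                for (std::size_t succ : mdp[s][a]) {
                    if (!candidate[succ]) {
                        staysInside = false;
                        break;
                    }
                }
                if (staysInside) {
                    chosenAction[s] = static_cast<int>(a);
                    break;
                }
            }
            if (chosenAction[s] == -1) {
                candidate[s] = false;  // s cannot stay inside: remove it ...
                changed = true;        // ... and re-examine the other states
            }
        }
    }
    return chosenAction;
}
```

The remaining rewInfStates would then, as in the existing code, receive choices that reach the surviving set with positive probability (computeSchedulerProbGreater0E). Equivalently, the trap states and their choices could be read off a maximal end component decomposition of the sub-MDP restricted to rewInfStates.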

volkm (Contributor) commented Mar 6, 2025:

Related to #298
