It's clear that the highest value is at index 4. Let's simulate the output of the explorer:
If the explorer decides to return the index of the highest authorized value (the greedy choice), it returns 3, which is the position of the selected value within the subset of authorized indices, not its index in the full set of values:
julia> findmax(values, mask)[2]
3
This is exactly the expected behavior of the RLCore function:
Base.findmax(A::AbstractVector, mask::AbstractVector{Bool}) = findmax(i -> A[i], view(keys(A), mask))
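The index semantics can be checked with a minimal sketch. The data below is hypothetical (the actual `values` and `mask` are not shown here), and `masked_findmax` is a local stand-in with the same body as the quoted method, so that `Base.findmax` is not overloaded in the example. It assumes Julia 1.7 or later, where `findmax(f, domain)` is available.

```julia
# Hypothetical data; named `vals` to avoid shadowing `Base.values`.
vals = [0.1, 0.9, 0.3, 1.2, 0.5]           # highest authorized value at index 4
mask = [true, false, true, true, false]    # authorized indices: 1, 3, 4

# Local stand-in with the same body as the quoted RLCore method.
masked_findmax(A, mask) = findmax(i -> A[i], view(keys(A), mask))

best, idx = masked_findmax(vals, mask)

# `idx` is a position inside the masked view of indices [1, 3, 4],
# not a position inside `vals`.
println((best, idx))

# Mapping that position back through the authorized indices
# gives the index in the full vector.
println(findall(mask)[idx])
```

With this data the greedy position is 3, while the corresponding index in the full vector is 4, which matches the discrepancy described here.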
If the explorer instead decides to return a random index, it returns the index of the selected value in the original set of values:
julia> rand(rng, findall(mask))
4
The meaning of the output is thus inconsistent. I am still discovering the package, so please let me know if I made a mistake. If this behavior turns out to be a bug, I can propose a simple fix.
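For illustration only, one possible shape of such a fix is sketched below. `explore_index` is a hypothetical helper, not a function of the package, and the data is made up. The idea is simply that both branches return an index into the full value vector. It assumes Julia 1.7 or later for `argmax(f, domain)`.

```julia
using Random

# Hypothetical helper (not part of the package): both branches return
# an index into the full vector `vals`.
function explore_index(rng::AbstractRNG, vals::AbstractVector,
                       mask::AbstractVector{Bool}, epsilon::Real)
    legal = findall(mask)                  # authorized indices in the full vector
    if rand(rng) < epsilon
        return rand(rng, legal)            # random branch: already a full-vector index
    else
        # argmax(f, domain) returns the element of `legal` that maximizes f,
        # so the result is directly an index into `vals`.
        return argmax(i -> vals[i], legal)
    end
end

# Hypothetical data: highest authorized value at index 4.
vals = [0.1, 0.9, 0.3, 1.2, 0.5]
mask = [true, false, true, true, false]
rng = MersenneTwister(0)

greedy = explore_index(rng, vals, mask, 0.0)   # epsilon = 0: always greedy
picked = explore_index(rng, vals, mask, 1.0)   # epsilon = 1: always random
println((greedy, picked))
```

In this sketch the greedy call returns 4 and the random call returns one of the authorized indices, so a caller can interpret both results the same way.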
While working on my package, I noticed that the EpsilonGreedyExplorer behaves strangely with respect to its output.
Here is the related function:
It seems that, depending on whether the explorer returns the greedy choice (left side) or a random choice (right side), the output is, respectively, an index into the subset of authorized values or an index into the full set of values.
Let me explain the problem with a small example: