Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistency of the EpsilonGreedyExplorer selection function #520

Closed
3rdCore opened this issue Oct 6, 2021 · 1 comment · Fixed by #521
Closed

Inconsistency of the EpsilonGreedyExplorer selection function #520

3rdCore opened this issue Oct 6, 2021 · 1 comment · Fixed by #521
Labels
bug Something isn't working

Comments

@3rdCore
Copy link
Contributor

3rdCore commented Oct 6, 2021

While working on my package, I noticed that the the EpsilonGreedyExplorer had a strange behaviour with its output.

Here is the related function :

function (s::EpsilonGreedyExplorer{<:Any,false})(values, mask)
    ϵ = get_ϵ(s)
    s.is_training && (s.step += 1)
    rand(s.rng) >= ϵ ? findmax(values, mask)[2] : rand(s.rng, findall(mask))
end

I seems that depending if the explorer with return the greedy choice (left side) or a random choice (right side), the output will be respectively :

  • for the greedy choice : the index of the selected value in the subset of the authorized values.
  • for the random choice : the index of the selected value in the original set of values.

Let me explain the problem with a little example :

values = Float32[-0.48240864, 0.07573502, -0.19618785, 0.25742468]
mask = Bool[1, 1, 0, 1]
rng =  MersenneTwister()

Its clear that the highest value is of index 4. Lets simulate the output of the explorer :

  1. if the explorer decide to return the index of the highest authorized value (greedy choice), it will return the related index 3 of the selected value in the subset of the authorized index, and not in the total set of values :
julia> findmax(values, mask)[2]
3

this is exactly the expected behavior of the RLCore function :
Base.findmax(A::AbstractVector, mask::AbstractVector{Bool}) = findmax(i -> A[i], view(keys(A), mask))

  1. if the explorer decides to return a random index, it will return the index of the selected value in the original set of values :
julia> rand(rng, findall(mask))
4

The output signification is thus inconsistent. I am still discovering the package, so please let me know if I made a mistake. If this behavior turns out to be a bug, I can propose a simple fix for that.

@findmyway findmyway added the bug Something isn't working label Oct 7, 2021
@findmyway
Copy link
Member

Thanks for reporting.

That's indeed an unexpected bug.
It seems to be introduced in https://github.com/JuliaLang/julia/pull/41076/files and JuliaLang/Compat.jl#748

The following lines may need to be changed:

function find_all_max(x)
v = maximum(x)
v, findall(==(v), x)
end
function find_all_max(x, mask::AbstractVector{Bool})
v = maximum(view(x, mask))
v, [k for (m, k) in zip(mask, keys(x)) if m && x[k] == v]
end
# !!! watch https://github.com/JuliaLang/julia/pull/35316#issuecomment-622629895
# Base.findmax(f, domain) = mapfoldl(x -> (f(x), x), _rf_findmax, domain)
# _rf_findmax((fm, m), (fx, x)) = isless(fm, fx) ? (fx, x) : (fm, m)
# !!! type piracy
Base.findmax(A::AbstractVector, mask::AbstractVector{Bool}) =
findmax(i -> A[i], view(keys(A), mask))

A PR is welcomed! 🤗

@findmyway findmyway linked a pull request Oct 7, 2021 that will close this issue
@3rdCore 3rdCore closed this as completed Oct 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants