[RLlib] Use observations (input_dict) for exploration #26437
Labels
- @author-action-required: The PR author is responsible for the next step. Remove tag to send back to the reviewer.
- enhancement: Request for new feature and/or capability.
- feature-request
- P2: Important issue, but not time-critical.
- rllib: RLlib related issues.
Description
I want to do exploration based on the observations coming from the environment. Currently, the network output is passed to the exploration component when selecting the greedy action, but the network input (the observations) is not. For the random action, I want to give some actions a higher probability of being chosen, because I have a prior that some actions are useless in certain states.
Use case
The availability of the observations in the exploration method of an exploration class enables ...
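For example, a minimal sketch (independent of the RLlib API) of how observations could bias random-action sampling with a state prior; `action_prior_from_obs` and its rule are hypothetical, purely for illustration:

```python
import torch


def action_prior_from_obs(obs: torch.Tensor, num_actions: int) -> torch.Tensor:
    """Hypothetical user-defined prior: per-action weights for each observation.

    Actions believed to be useless in the given state get a small weight so
    they are rarely picked during random exploration (weights need not sum
    to 1; torch.multinomial normalizes them).
    """
    batch_size = obs.shape[0]
    weights = torch.ones(batch_size, num_actions)
    # Example rule (assumption): if the first observation feature is negative,
    # down-weight action 0.
    weights[obs[:, 0] < 0.0, 0] = 0.01
    return weights


# Biased random-action sampling for a batch of observations:
obs = torch.randn(4, 8)  # batch of 4 observations, 8 features each
weights = action_prior_from_obs(obs, num_actions=3)
random_actions = torch.multinomial(weights, num_samples=1).squeeze(1)
```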
Needed Code Changes (torch)
The variable `input_dict`, which is given as input to the method `_compute_action_helper()` of the class `TorchPolicy` (torch_policy.py), must be passed on in that method to `self.exploration.get_exploration_action()`. The method `get_exploration_action()` of the `Exploration` class (exploration.py) must be modified so that it also accepts `input_dict: Dict[str, TensorType]` as input.
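A minimal sketch of this change, assuming the current keyword-only signature of `get_exploration_action()`; the call-site variable names and the `None` default for the new argument are assumptions, not the verbatim RLlib source:

```python
from typing import Dict, Optional, Union

from ray.rllib.models.action_dist import ActionDistribution
from ray.rllib.utils.typing import TensorType


class Exploration:
    # Only the signature is sketched here; the rest of the base class
    # would stay unchanged.
    def get_exploration_action(
        self,
        *,
        action_distribution: ActionDistribution,
        timestep: Union[int, TensorType],
        explore: bool = True,
        # Proposed new argument; the None default is an assumption, chosen so
        # callers that do not pass observations keep working.
        input_dict: Optional[Dict[str, TensorType]] = None,
    ):
        raise NotImplementedError


# In TorchPolicy._compute_action_helper() (torch_policy.py), the existing
# call would then also forward the observations, roughly:
#
#     actions, logp = self.exploration.get_exploration_action(
#         action_distribution=action_dist,
#         timestep=timestep,
#         explore=explore,
#         input_dict=input_dict,  # proposed new argument
#     )
```

Defaulting the new argument to `None` is one way to keep the base-class signature usable by callers that do not (yet) pass observations.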
Custom exploration classes that inherit from the class `Exploration` (exploration.py) will also need to be modified. For example, for the class `EpsilonGreedy` (epsilon_greedy.py), the import `from ray.rllib.utils.typing import Dict` needs to be added at the top of the script, and the method `get_exploration_action()` of `EpsilonGreedy` needs to be modified so that it also accepts `input_dict: Dict[str, TensorType]` as input.
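A corresponding sketch for `EpsilonGreedy`, written as a hypothetical subclass so it is self-contained; with the proposed change, `EpsilonGreedy` itself would gain the same extra parameter. The `"obs"` key is an assumption about how the observations are stored in `input_dict`:

```python
from typing import Dict, Optional, Union

from ray.rllib.models.action_dist import ActionDistribution
from ray.rllib.utils.exploration.epsilon_greedy import EpsilonGreedy
from ray.rllib.utils.typing import TensorType


class ObservationAwareEpsilonGreedy(EpsilonGreedy):
    """Hypothetical subclass illustrating the extended signature."""

    def get_exploration_action(
        self,
        *,
        action_distribution: ActionDistribution,
        timestep: Union[int, TensorType],
        explore: bool = True,
        input_dict: Optional[Dict[str, TensorType]] = None,  # proposed new argument
    ):
        if input_dict is not None:
            # The observations are now available here, e.g. to bias the random
            # branch of epsilon-greedy with a state prior (see the use case above).
            obs = input_dict["obs"]  # SampleBatch-style key (assumption)
        # Fall back to the unmodified epsilon-greedy behavior.
        return super().get_exploration_action(
            action_distribution=action_distribution,
            timestep=timestep,
            explore=explore,
        )
```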