
[RLlib] Use observations (input_dict) for exploration #26437

Open · TedLentsch opened this issue Jul 11, 2022 · 3 comments

@TedLentsch
Description

I want to do exploration based on the observations that come from the environment. Currently, the network output is passed to the exploration component to pick the greedy action, but the network input (the observations) is not passed along. For the random action, I want to give some actions a higher probability of being chosen, because I have a prior that some actions are useless in certain states.

Use case

Making the observations available in the explore method of an exploration class makes it possible to:

  1. Use priors about the usefulness of certain actions in a given state to explore more effectively, which could help train the network (see the sketch after this list).
  2. Check constraints so that only random actions that are allowed in the given state are chosen. This information can be encoded in an action mask (e.g., EpsilonGreedy from the file epsilon_greedy.py only chooses actions with a q-value larger than FLOAT_MIN), but that is not feasible when checking the constraints for a single action takes a relatively long time.
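
A minimal sketch of use case 1, assuming Torch and a hypothetical helper action_prior_from_obs() that encodes the state-dependent prior (both the helper and its trivial body are illustrative, not part of RLlib):

```python
import torch
from torch.distributions import Categorical


def action_prior_from_obs(obs: torch.Tensor, num_actions: int) -> torch.Tensor:
    """Hypothetical prior: returns [batch, num_actions] action weights.

    Here simply uniform; a real prior would down-weight actions that are
    known to be useless in the given states."""
    return torch.ones(obs.shape[0], num_actions)


def sample_biased_random_actions(input_dict: dict, num_actions: int) -> torch.Tensor:
    """Sample random actions weighted by a state-dependent prior instead of
    uniformly. This is only possible if the exploration can see input_dict."""
    weights = action_prior_from_obs(input_dict["obs"], num_actions)
    probs = weights / weights.sum(dim=-1, keepdim=True)
    return Categorical(probs=probs).sample()


# Example: a batch of 4 observations with 8 features, 3 discrete actions.
print(sample_biased_random_actions({"obs": torch.randn(4, 8)}, num_actions=3))
```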

Needed Code Changes (torch)

The variable input_dict, which is already passed to the method _compute_action_helper() of the class TorchPolicy (torch_policy.py), must be forwarded from that method to self.exploration.get_exploration_action(). The method get_exploration_action() of the Exploration class (exploration.py) must be extended so that it also accepts input_dict: Dict[str, TensorType].
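
A sketch of the proposed plumbing (argument names approximated from the Ray 1.13 sources; treat the exact signature as an assumption, not the final API):

```python
from typing import Dict, Optional, Tuple, Union

from ray.rllib.models.action_dist import ActionDistribution
from ray.rllib.utils.typing import TensorType


class Exploration:
    # Proposed signature: the current one from exploration.py, extended
    # with the new input_dict argument that carries the observations.
    def get_exploration_action(
        self,
        *,
        action_distribution: ActionDistribution,
        timestep: Union[int, TensorType],
        explore: bool = True,
        input_dict: Optional[Dict[str, TensorType]] = None,  # NEW
    ) -> Tuple[TensorType, TensorType]:
        raise NotImplementedError


# Correspondingly, TorchPolicy._compute_action_helper() (torch_policy.py)
# would forward the dict it already receives:
#
#     actions, logp = self.exploration.get_exploration_action(
#         action_distribution=action_dist,
#         timestep=timestep,
#         explore=explore,
#         input_dict=input_dict,  # NEW
#     )
```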

Custom exploration classes that inherit from Exploration (exploration.py) will also need to be modified. For example, for the class EpsilonGreedy (epsilon_greedy.py), the import from ray.rllib.utils.typing import Dict needs to be added at the top of the file, and its get_exploration_action() method needs to accept input_dict: Dict[str, TensorType] as well.
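
Such a subclass could then, for instance, use the observations to mask out disallowed actions before the usual epsilon-greedy logic runs. A sketch under the assumption that the extended signature above is in place; MaskedEpsilonGreedy and compute_action_mask are hypothetical names, and the trivial mask body is a placeholder:

```python
import torch

from ray.rllib.utils.exploration.epsilon_greedy import EpsilonGreedy
from ray.rllib.utils.typing import Dict, TensorType  # the import named above


def compute_action_mask(obs: torch.Tensor, num_actions: int) -> torch.Tensor:
    """Hypothetical: derive a [batch, num_actions] boolean mask from obs.

    Here everything is allowed; a real mask would encode the state
    constraints described in use case 2."""
    return torch.ones(obs.shape[0], num_actions, dtype=torch.bool)


class MaskedEpsilonGreedy(EpsilonGreedy):
    def get_exploration_action(
        self,
        *,
        action_distribution,
        timestep,
        explore: bool = True,
        input_dict: Dict[str, TensorType] = None,  # NEW
    ):
        if explore and input_dict is not None:
            q_values = action_distribution.inputs
            mask = compute_action_mask(input_dict["obs"], q_values.shape[-1])
            # Push disallowed actions' q-values to -inf so that neither the
            # greedy branch nor the FLOAT_MIN-filtered random branch of
            # EpsilonGreedy can ever pick them.
            action_distribution.inputs = torch.where(
                mask, q_values, torch.full_like(q_values, float("-inf"))
            )
        return super().get_exploration_action(
            action_distribution=action_distribution,
            timestep=timestep,
            explore=explore,
            input_dict=input_dict,  # assumes the extended base signature
        )
```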

@TedLentsch added the enhancement label Jul 11, 2022
@kouroshHakha (Contributor)

I think your proposal makes sense. We are thinking about improving the exploration API anyway, and I have added your feature request to our design notes. Given that this is more of a long-term effort, are you willing to submit a PR that does this for both Torch and TF?

@kouroshHakha added the P2, rllib, feature-request, and @author-action-required labels Aug 2, 2022
@Stefan-1313 commented Aug 10, 2022

In the past I worked with @TeddeVriesLentsch. I will try to look at this, though it may take some time. @TeddeVriesLentsch, feel free to look at it as well if you wish; in that case I will leave it to you.

For Torch we have this working; I have pushed it to my personal fork of the Ray repo on the release/1.13.0 branch.
I have practically no experience with TensorFlow but will try to make it work there as well (this is why it will take some time: I have to find the time to figure out how to test with TensorFlow).

It will also be my first pull request ever, so @kouroshHakha, if you have any tips (or Ray guidelines), let me know!

@Stefan-1313

@kouroshHakha, by the way, I submitted a pull request for this for both PyTorch and TensorFlow. It covers not only EpsilonGreedy and torch_policy but also the other exploration methods and policies.

Unfortunately, I messed up the sign-off process a little, so I hope someone knows how to fix it. I'm no Git wizard, but if any action is required from my side, please let me know.
