
[RLlib] Use observations (input_dict) for exploration #26437

Open · TedLentsch opened this issue Jul 11, 2022 · 3 comments

@TedLentsch
Description

I want to do exploration based on the observations that come from the environment. Currently, the network output is passed to the exploration component to pick the greedy action, but the network input (the observations) is not passed along. For the random action, I want to give some actions a higher probability of being chosen, because I have a prior that some actions are useless in certain states.

Use case

Making the observations available in the explore method of an exploration class makes it possible to:

  1. Use priors about the usefulness of certain actions in a given state to explore more effectively, which could help train the network (see the sketch after this list).
  2. Check constraints so that only random actions that are allowed in the given state are chosen. This information can be encoded in an action mask (e.g., EpsilonGreedy from the file epsilon_greedy.py only chooses actions with a q-value larger than FLOAT_MIN), but that is not feasible when checking the constraints for a single action takes a relatively long time.
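
A minimal sketch of use case 1, assuming Torch and a hypothetical helper action_prior_from_obs() that encodes the state-dependent prior (both the helper and its trivial body are illustrative, not part of RLlib):

```python
import torch
from torch.distributions import Categorical


def action_prior_from_obs(obs: torch.Tensor, num_actions: int) -> torch.Tensor:
    """Hypothetical prior: returns [batch, num_actions] action weights.

    Here simply uniform; a real prior would down-weight actions that are
    known to be useless in the given states."""
    return torch.ones(obs.shape[0], num_actions)


def sample_biased_random_actions(input_dict: dict, num_actions: int) -> torch.Tensor:
    """Sample random actions weighted by a state-dependent prior instead of
    uniformly. This is only possible if the exploration can see input_dict."""
    weights = action_prior_from_obs(input_dict["obs"], num_actions)
    probs = weights / weights.sum(dim=-1, keepdim=True)
    return Categorical(probs=probs).sample()


# Example: a batch of 4 observations with 8 features, 3 discrete actions.
print(sample_biased_random_actions({"obs": torch.randn(4, 8)}, num_actions=3))
```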

Needed Code Changes (torch)

The variable input_dict, which is already passed to the method _compute_action_helper() of the class TorchPolicy (torch_policy.py), must be forwarded from that method to self.exploration.get_exploration_action(). The method get_exploration_action() of the Exploration class (exploration.py) must be extended so that it also accepts input_dict: Dict[str, TensorType].
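
A sketch of the proposed plumbing (argument names approximated from the Ray 1.13 sources; treat the exact signature as an assumption, not the final API):

```python
from typing import Dict, Optional, Tuple, Union

from ray.rllib.models.action_dist import ActionDistribution
from ray.rllib.utils.typing import TensorType


class Exploration:
    # Proposed signature: the current one from exploration.py, extended
    # with the new input_dict argument that carries the observations.
    def get_exploration_action(
        self,
        *,
        action_distribution: ActionDistribution,
        timestep: Union[int, TensorType],
        explore: bool = True,
        input_dict: Optional[Dict[str, TensorType]] = None,  # NEW
    ) -> Tuple[TensorType, TensorType]:
        raise NotImplementedError


# Correspondingly, TorchPolicy._compute_action_helper() (torch_policy.py)
# would forward the dict it already receives:
#
#     actions, logp = self.exploration.get_exploration_action(
#         action_distribution=action_dist,
#         timestep=timestep,
#         explore=explore,
#         input_dict=input_dict,  # NEW
#     )
```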

Custom exploration classes that inherit from Exploration (exploration.py) will also need to be modified. For example, for the class EpsilonGreedy (epsilon_greedy.py), the import from ray.rllib.utils.typing import Dict needs to be added at the top of the file, and its get_exploration_action() method needs to accept input_dict: Dict[str, TensorType] as well.
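
Such a subclass could then, for instance, use the observations to mask out disallowed actions before the usual epsilon-greedy logic runs. A sketch under the assumption that the extended signature above is in place; MaskedEpsilonGreedy and compute_action_mask are hypothetical names, and the trivial mask body is a placeholder:

```python
import torch

from ray.rllib.utils.exploration.epsilon_greedy import EpsilonGreedy
from ray.rllib.utils.typing import Dict, TensorType  # the import named above


def compute_action_mask(obs: torch.Tensor, num_actions: int) -> torch.Tensor:
    """Hypothetical: derive a [batch, num_actions] boolean mask from obs.

    Here everything is allowed; a real mask would encode the state
    constraints described in use case 2."""
    return torch.ones(obs.shape[0], num_actions, dtype=torch.bool)


class MaskedEpsilonGreedy(EpsilonGreedy):
    def get_exploration_action(
        self,
        *,
        action_distribution,
        timestep,
        explore: bool = True,
        input_dict: Dict[str, TensorType] = None,  # NEW
    ):
        if explore and input_dict is not None:
            q_values = action_distribution.inputs
            mask = compute_action_mask(input_dict["obs"], q_values.shape[-1])
            # Push disallowed actions' q-values to -inf so that neither the
            # greedy branch nor the FLOAT_MIN-filtered random branch of
            # EpsilonGreedy can ever pick them.
            action_distribution.inputs = torch.where(
                mask, q_values, torch.full_like(q_values, float("-inf"))
            )
        return super().get_exploration_action(
            action_distribution=action_distribution,
            timestep=timestep,
            explore=explore,
            input_dict=input_dict,  # assumes the extended base signature
        )
```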

@TedLentsch added the enhancement label Jul 11, 2022
@kouroshHakha (Contributor)

I think your proposal makes sense. We are thinking about improving the exploration API anyway, and I have added your feature request to our design notes. Given that this is more of a long-term effort, are you willing to submit a PR that does this for both Torch and TF?

@kouroshHakha added the P2, rllib, feature-request, and @author-action-required labels Aug 2, 2022
@Stefan-1313 commented Aug 10, 2022

In the past I worked with @TeddeVriesLentsch. I will try to look at this, though it may take some time. @TeddeVriesLentsch, feel free to look at it as well if you wish; in that case I will leave it to you.

For Torch we have this working; I have pushed it to my personal fork of the Ray repo on the release/1.13.0 branch.
I have practically no experience with TensorFlow but will try to make it work there as well (this is why it will take some time: I have to find the time to figure out how to test with TensorFlow).

It will also be my first pull request ever, so @kouroshHakha, if you have any tips (or Ray guidelines), let me know!

@Stefan-1313

@kouroshHakha, by the way, I submitted a pull request for this for both PyTorch and TensorFlow. It covers not only EpsilonGreedy and torch_policy but also the other exploration methods and policies.

Unfortunately, I messed up the sign-off process a little, so I hope someone knows how to fix it. I'm no Git wizard, but if any action is required from my side, please let me know.
