[Feature] Categorical encoding for action space #593

artkorenev · 2022-10-20T23:00:01Z

Description

Added alternative to one-hot encoding for action spaces with categorical features.

Motivation and Context

Implementing feature #538.

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

New feature (non-breaking change which adds core functionality)

Checklist

I have read the CONTRIBUTION guide (required)
My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.

Checklist from #538:

Create a DiscreteTensorSpec similar to MultOneHotDiscreteTensorSpec + tests
Change QValueHook to make it possible to use a "categorical" space (or similar) instead of "one_hot"
Do the same with DistributionalQValueHook
Adapt the DQNLoss to these changes
Adapt the DistributionalDQNLoss to these changes
Change the gym specs reader here to make it possible to choose one spec type or the other. One way to go about this would be to have a global variable that controls which categorical encoding must be used.

The current implementation seems to couple different parts of TorchRL quite tightly, as there are quite a few places where the action space is assumed to be one-hot. A more generic action-space object that encapsulates the manipulation of action values and is shared across losses, actors, hooks, etc. (perhaps as an nn.Module) might be a worthwhile investment in the future.

The mode is turned on with the command export CATEGORICAL_ACTION_ENCODING=True and is off by default. An alternative would be to provide it through the config file; however, this again would require a more generic way to configure a module that is then passed to all the modules that need it.

codecov · 2022-10-21T00:03:49Z

Codecov Report

Merging #593 (813381a) into main (e1fbf86) will increase coverage by 0.14%.
The diff coverage is 98.68%.

@@            Coverage Diff             @@
##             main     #593      +/-   ##
==========================================
+ Coverage   87.24%   87.39%   +0.14%     
==========================================
  Files         122      124       +2     
  Lines       22532    22784     +252     
==========================================
+ Hits        19658    19911     +253     
+ Misses       2874     2873       -1

Flag	Coverage Δ
linux-cpu	`85.79% <98.68%> (+0.16%)`	⬆️
linux-gpu	`87.17% <98.68%> (+0.14%)`	⬆️
linux-outdeps-gpu	`76.19% <94.07%> (+0.23%)`	⬆️
linux-stable-cpu	`85.77% <98.68%> (+0.16%)`	⬆️
linux-stable-gpu	`87.17% <98.68%> (+0.14%)`	⬆️
macos-cpu	`85.56% <98.35%> (+0.16%)`	⬆️

Impacted Files	Coverage Δ
test/test_cost.py	`96.29% <93.93%> (-0.11%)`	⬇️
torchrl/objectives/dqn.py	`93.07% <96.29%> (+0.34%)`	⬆️
test/test_utils.py	`97.43% <97.43%> (ø)`
test/mocking_classes.py	`97.88% <100.00%> (+0.02%)`	⬆️
test/test_actors.py	`100.00% <100.00%> (ø)`
test/test_env.py	`98.87% <100.00%> (+0.03%)`	⬆️
test/test_helpers.py	`90.25% <100.00%> (+0.02%)`	⬆️
test/test_modules.py	`99.37% <100.00%> (+0.04%)`	⬆️
test/test_tensor_spec.py	`99.53% <100.00%> (+0.02%)`	⬆️
test/test_trainer.py	`97.88% <100.00%> (+0.01%)`	⬆️
... and 8 more

vmoens

We'd like to add the new classes to the docs (this is not done automatically atm).
Just add them in docs/source/data.

vmoens · 2022-10-21T09:59:53Z

torchrl/_utils.py

+        key (str): name of the environment variable.
+    """
+    val = os.environ.get(key, False)
+    if val in ("0", "False", False):


If we consider False a valid option, why not plain 0?

I changed the default to "False". Environment variables can only hold strings.
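
Because environment variables only hold strings, the flag has to be compared against textual values. A minimal sketch of such a parser follows; the helper name `parse_binary_env_var` and the exact set of accepted strings are assumptions made for illustration and may differ from the code in this PR.

```python
import os


def parse_binary_env_var(key: str) -> bool:
    """Read a boolean flag from the environment (hypothetical helper).

    Environment variables are always strings, so the textual forms of
    "false" and "true" are compared explicitly.
    """
    val = os.environ.get(key, "False")
    if val in ("0", "False"):
        return False
    if val in ("1", "True"):
        return True
    # Anything else is rejected instead of being silently treated as True.
    raise ValueError(
        f"Environment variable {key} must be one of 0, 1, True or False, got {val!r}."
    )
```

Rejecting unknown values is a design choice of this sketch: it avoids a typo such as "Flase" silently enabling the mode.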

vmoens · 2022-10-21T10:07:25Z

torchrl/data/tensor_specs.py

+    def __init__(
+        self,
+        n: int,
+        shape: Optional[torch.Size] = torch.Size((1,)),


IIRC we once had a linting issue with this (that seems to have disappeared now).
This is why we usually default to None and, if shape is None, replace it with Size([1]).

Noted, changed to that
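
The None-default pattern suggested above can be sketched as follows. `DiscreteSpecSketch` is a hypothetical stand-in used only to show the signature; it is not the TorchRL class.

```python
from typing import Optional

import torch


class DiscreteSpecSketch:
    """Hypothetical stand-in for a categorical spec; not the TorchRL class."""

    def __init__(self, n: int, shape: Optional[torch.Size] = None):
        # Resolve the default inside the body instead of calling
        # torch.Size(...) in the signature, which some linters flag.
        if shape is None:
            shape = torch.Size((1,))
        self.n = n
        self.shape = shape
```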

torchrl/modules/tensordict_module/actors.py

vmoens · 2022-10-21T10:21:41Z

torchrl/objectives/dqn.py

+
+        if _CATEGORICAL_ACTION_ENCODING:
+            batch_size = action.size(0)
+            pred_val_index = pred_val[range(batch_size), action.squeeze(-1)]


I guess we'd like to be able to work with batch sizes that are more than unidimensional.

We can use gather (I think)

    x = torch.randn(3, 4, 5)
    idx = torch.randint(5, (3, 4))
    x.gather(-1, idx.unsqueeze(-1)).squeeze(-1)

This will index x along the last dimension.

This is not covered by the tests (it should be done in test_cost.py, I believe).

Reworked indexing here, thx

I also added tests, so this should now be covered.
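
To make the indexing discussion concrete, here is a small sketch of gather-based selection with more than one batch dimension. The shapes, and the assumption that categorical actions carry a trailing singleton dimension, are chosen for illustration; the result is checked against the one-hot formulation that the diff replaces.

```python
import torch

torch.manual_seed(0)

n_actions = 5
# Two batch dimensions (3, 4) followed by one value per action.
pred_val = torch.randn(3, 4, n_actions)
# Categorical actions with a trailing singleton dimension: shape (3, 4, 1).
action = torch.randint(n_actions, (3, 4, 1))

# gather needs an index with as many dimensions as the input,
# which the trailing singleton dimension already provides.
chosen = pred_val.gather(-1, action).squeeze(-1)  # shape (3, 4)

# Reference: the one-hot formulation, (pred_val * action).sum(-1).
one_hot = torch.nn.functional.one_hot(action.squeeze(-1), n_actions)
reference = (pred_val * one_hot).sum(-1)

assert chosen.shape == (3, 4)
assert torch.allclose(chosen, reference)
```

Unlike `pred_val[range(batch_size), ...]`, this form does not need to know how many batch dimensions precede the action dimension.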

vmoens · 2022-10-21T10:26:11Z

torchrl/objectives/dqn.py

        pred_val = td_copy.get("action_value")
-        pred_val_index = (pred_val * action).sum(-1)
+
+        if _CATEGORICAL_ACTION_ENCODING:


This is a bit tricky

What if someone explicitly builds a policy that uses categorical actions, but does not set the env variable?

My view of the env variable is more along the lines of "a way to easily switch a whole training script from categorical to one-hot and vice versa".

But the reversed dependency should not hold, and the environment variable should only be checked when building the network.

One option could be to infer the type of action by looking at the value_network: if it has registered the action spec, then we can use it to infer the type of action we will see. Otherwise, we must ask the user to tell us (via an argument in the constructor) what kind of action to expect.

Open to other suggestions obviously :)

> What if someone explicitly builds a policy that uses categorical actions, but does not set the env variable?

This is exactly what I meant when I mentioned that the current solution couples things a bit too much.

Since actions are passed as plain tensors (even though they are wrapped in a TensorDict), what to do with action tensors is either assumed or configured separately for each class.

I can see two options here:

1. Whenever we deal with actions, we either pass action_space in the constructor or derive it from some other entity (e.g. value_network, as you proposed). However, this relies on careful configuration of all modules together (in theory, somebody could explicitly configure a value_network that is incompatible with the environment). Another drawback is that we spread the logic for each case across the code base, which can make the code harder to understand (I think it is fine for one-hot/categorical, but it could blow up with future approaches).

2. We can introduce something like an ActionSpec, an extended version of the existing specs dedicated purely to action processing (deriving actions from value_network, etc.). We would instantiate the ActionSpec once and pass the instance everywhere we need to encapsulate work with actions. However, it is hard for me to estimate what interface ActionSpec would need in order to cover all possible cases (I mean globally, for all RL cases), and this would also require a larger-scale refactoring.

tl;dr: I think the 1st option is the better choice for this situation and somewhat follows current code base practices (e.g. QValueHook).

So, I reworked it so that it can be specified through the config, which eliminates the whole problem of working with environment variables. Now we just specify a binary flag that sets the mode for handling discrete gym environments and is also passed, as the action space, to the value network and all the hooks.
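
The first option (passing the action space explicitly rather than reading a global) can be sketched as a single dispatch point. The helper `select_action_value` is hypothetical; only the "categorical" and "one_hot" strings come from the thread above.

```python
import torch


def select_action_value(
    pred_val: torch.Tensor, action: torch.Tensor, action_space: str
) -> torch.Tensor:
    """Pick the value of the taken action (hypothetical helper)."""
    if action_space == "categorical":
        # action holds integer indices with a trailing singleton dimension.
        return pred_val.gather(-1, action).squeeze(-1)
    if action_space == "one_hot":
        # action is a one-hot mask over the last dimension.
        return (pred_val * action).sum(-1)
    raise ValueError(f"Unknown action_space: {action_space!r}")


pred_val = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
categorical_action = torch.tensor([[2], [0]])
one_hot_action = torch.tensor([[0, 0, 1], [1, 0, 0]])

# Both encodings of the same actions select the same values.
from_categorical = select_action_value(pred_val, categorical_action, "categorical")
from_one_hot = select_action_value(pred_val, one_hot_action, "one_hot")
assert torch.equal(from_categorical, from_one_hot)
```

Keeping the branch in one function means a loss only needs to store the `action_space` string it was constructed with, with no dependency on an environment variable.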

vmoens · 2022-10-21T10:26:27Z

torchrl/objectives/dqn.py

-        log_ps_a = log_ps_a.view(batch_size, atoms)  # log p(s_t, a_t; θonline)
+
+        if _CATEGORICAL_ACTION_ENCODING:
+            log_ps_a = action_log_softmax[range(batch_size), :, action.squeeze(-1)]


same comments as above regarding indexing, usage of env variable and test coverage

Reworked this as well with a few tweaks since we work with atoms here.
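
For the distributional case, the extra atom dimension means the action index has to be repeated across atoms before gathering. A sketch under assumed shapes (batch, atoms, n_actions), which may not match the merged code exactly:

```python
import torch

torch.manual_seed(0)

batch_size, atoms, n_actions = 6, 11, 4
# Log-probabilities over atoms for every action: (batch, atoms, n_actions).
action_log_softmax = torch.randn(batch_size, atoms, n_actions).log_softmax(-2)
action = torch.randint(n_actions, (batch_size, 1))

# Repeat each action index across the atom dimension so that the index has
# the same number of dimensions as the input: (batch, atoms, 1).
index = action.unsqueeze(-2).expand(batch_size, atoms, 1)
log_ps_a = action_log_softmax.gather(-1, index).squeeze(-1)  # (batch, atoms)

# Reference: integer-array indexing over a one-dimensional batch.
reference = action_log_softmax[torch.arange(batch_size), :, action.squeeze(-1)]
assert torch.equal(log_ps_a, reference)
```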

vmoens · 2022-10-21T10:28:44Z

torchrl/trainers/helpers/models.py

+
+    if _CATEGORICAL_ACTION_ENCODING:
+        actor_kwargs.update({"action_space": "categorical"})
+        out_features = env_specs["action_spec"].space.n


perhaps we could have an arg in the function that indicates if categorical has to be used.
If default (e.g. None) then it falls back on _CATEGORICAL_ACTION_ENCODING.

I changed it here so that it is inferred from the action spec that is set during the env setup.

vmoens

Great! Amazing work, it fits in the library very well now. I think we're there.

A couple of things before we merge this

Can you have a look at the few comments I left?
Can you have a look at the coverage report? Some lines are not covered (don't worry about things that are not covered but are in the test directory).

Oh and one more thing: Can you add the new classes to the doc? Have a look in docs/source/reference

vmoens · 2022-10-25T09:47:37Z

test/test_actors.py

+
+
+def test_qvalue_hook_wrong_action_space():
+    with pytest.raises(ValueError):


Let's check that the message matches, to make sure we're not capturing the wrong error.

Added a message check (had to keep it short, since the full message depends on the order of the items in a dict).
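
A short message check of this kind might look like the sketch below. `build_hook` and its error text are hypothetical; only the use of `pytest.raises(..., match=...)` reflects the suggestion above.

```python
import pytest


def build_hook(action_space: str) -> str:
    """Hypothetical constructor that rejects unknown action spaces."""
    supported = {"one_hot": None, "categorical": None}
    if action_space not in supported:
        raise ValueError(
            f"action_space must be one of {list(supported)}, got {action_space}"
        )
    return action_space


def test_wrong_action_space():
    # `match` is applied with re.search, so a short, stable prefix is enough
    # and the test does not depend on how the supported values are listed.
    with pytest.raises(ValueError, match="action_space must be one of"):
        build_hook("wrong_value")
```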

vmoens · 2022-10-25T09:47:43Z

test/test_actors.py

+
+
+def test_distributional_qvalue_hook_wrong_action_space():
+    with pytest.raises(ValueError):


same as above

Fixed as in test_qvalue_hook_wrong_action_space

vmoens · 2022-10-25T09:50:20Z

test/test_actors.py

+)
+
+
+def test_qvalue_hook_wrong_action_space():


could we put all those test_qvalue under a TestQValue class?

I guess that if we add a test_actors.py we should also move some tests there in a future PR

> could we put all those test_qvalue under a TestQValue class?

Fixed, thanks

vmoens · 2022-10-25T09:53:21Z

test/test_utils.py

@@ -0,0 +1,62 @@
+import os


Same note to myself: we should move some tests here (e.g. timeit etc)

artkorenev · 2022-10-25T14:51:15Z

Awesome! Thank you for the review!
It seems like the coverage is now fixed completely.

Artem Korenev added 2 commits October 20, 2022 21:34

Added functionality for categorical action space spec

7303b6f

Added tests

8b0d505

facebook-github-bot added the CLA Signed label Oct 20, 2022

Fix style

1fc515d

vmoens reviewed Oct 21, 2022

vmoens added the enhancement label Oct 21, 2022

Artem Korenev added 8 commits October 24, 2022 17:38

Remove environment variable for categorical action space

282af05

Fix binary os variable and add tests

b0e5991

Fix discrete tensor spec + tests

1eb5bde

Refactor gather indexing

6ea6512

small comment

a9dce75

adding tests for q value hooks

4425417

Fix format

54e256d

expanded tests for dqn losses

bb902f3

vmoens approved these changes Oct 25, 2022

Artem Korenev added 4 commits October 25, 2022 12:27

improve tests, fix coverage, add docs

37071fe

fix format

40809eb

improve coverage for categorical action

c9cc56f

fix format

813381a

vmoens merged commit 61b80f8 into pytorch:main Oct 25, 2022
