Update `RescaleAction` and `RescaleObservation` for `np.inf` bounds #1095

TimSchneider42 · 2024-06-28T08:53:25Z

Description

Fixes #1092

Type of change

Please delete options that are not relevant.

Documentation only change (no code changed)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist:

I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

RedTachyon · 2024-06-28T13:31:20Z

(the tests failing are due to an unrelated issue, I'll rerun them when that's handled)

pseudo-rnd-thoughts · 2024-07-02T09:09:21Z

@TimSchneider42 Could you update the PR with the project main

pseudo-rnd-thoughts · 2024-07-02T09:23:02Z

Looking at the CI, this reminds me that I think we made the change a while ago intentionally from np.inf to an actual number primarily because we had an environment checker tool that complained about np.inf bound as they can't be normalized. You can see this in the CI as we can't use it with RescaleObservation wrapper.

I'm not sure what to do, because the current bound is unreachable to my knowledge (therefore stupid) but np.inf doesn't seem much better to me
@TimSchneider42 What was your motivation for changing it to np.inf other than it being "cleaner" than very big number?

TimSchneider42 · 2024-07-02T11:34:39Z

Hi,

From what I can see, all environments with unbounded observation spaces use np.inf, except for CartPole. So I think for consistency, it should be done this way here, too.

This also means that the RescaleObservation wrapper should not work in most environments, right? To fix this issue, maybe we should ignore unbounded observation components instead of rescaling them.

My specific use case is the following: I have a custom normalization wrapper that scales all bounded values to [-1, 1] and leaves unbounded values as is. The CartPole environment makes it seem as if the value is bounded when, in reality, it is not, which made my wrapper fail silently by scaling the value down to almost zero. Since CartPole is the only environment that seems to do it that way, I now have a specific exception for it.

Best,
Tim

pseudo-rnd-thoughts · 2024-07-02T11:46:32Z

Thanks for the rapid response @TimSchneider42, yes I think that is a very reasonable justification
Could you update the wrapper to reflect this change?

TimSchneider42 · 2024-07-02T13:38:33Z

I just patched RescaleObservation in a way that deals with unbounded observation spaces. In a nutshell, I distinguish between the following cases on a per-component basis:

The component is bounded above and below: behavior unchanged
The component is unbounded above and below: no scaling or offset is applied
The component is bounded above and unbounded below: no scaling is applied but an offset to ensure that the upper bound matches the target upper bound
The component is unbounded above and bounded below: again no scaling and an offset to make the lower bound match the target lower bound

I hope this is what you had in mind.

TimSchneider42 · 2024-07-02T13:46:48Z

@pseudo-rnd-thoughts, I get a warning from the env checker due to the limits being infinite, which makes the tests fail. How do I address that?

TimSchneider42 · 2024-07-02T13:52:23Z

I had a look at test_mujoco_custom_env.py and decided to fix the test by ignoring this specific warning.

pseudo-rnd-thoughts · 2024-07-02T14:06:07Z

I'm doing some testing to understand the PR better

from gymnasium.spaces import Box
from gymnasium.wrappers import RescaleObservation
from gymnasium.wrappers import RescaleAction
import numpy as np

from tests.testing_env import GenericTestEnv

def func(self, action):
    return action, 0, False, False, {}

env = GenericTestEnv(
    observation_space=Box(low=np.array([-10, 0, 3, -np.inf, 0,      -np.inf], dtype=np.float32),
                          high=np.array([10, 1, 5,  np.inf, np.inf,  1,    ], dtype=np.float32)),
    action_space=Box(low=np.array([-10, 0, 3, -np.inf, 0,      -np.inf], dtype=np.float32),
                     high=np.array([10, 1, 5,  np.inf, np.inf,  1,    ], dtype=np.float32)),
    step_func=func
)
print(f'{env.observation_space=}')
env = RescaleObservation(
    env,
    min_obs=np.array([0, 0, -1, -np.inf, -1,      -np.inf], dtype=np.float32),
    max_obs=np.array([1, 1, 1,   np.inf,  np.inf, 2      ], dtype=np.float32)
)
print(f'{env.observation_space=}')

env.reset()
unscaled_obs = np.array([5, 0.3, 4.5, 2, 10, -10], dtype=np.float32)
scaled_obs, *_ = env.step(unscaled_obs)
print(f'{unscaled_obs=}')
print(f'{scaled_obs=}')
assert np.all(scaled_obs == np.array([0.75, 0.3, 0.5, 2, 9, -9], dtype=np.float32))

print(f'{env.action_space=}')
env = RescaleAction(
    env,
    min_action=np.array([0, 0, -1, -np.inf, -1,      -np.inf], dtype=np.float32),
    max_action=np.array([1, 1, 1,   np.inf,  np.inf, 2      ], dtype=np.float32)
)
print(f'{env.action_space=}')
env.reset()

unscaled_action = np.array([5, 0.3, 4.5, 2, 10, -10], dtype=np.float32)
scaled_action, *_ = env.step(unscaled_action)
print(f'{unscaled_action=}')
print(f'{scaled_action=}')
assert np.all(scaled_action == np.array([0.75, 0.3, 0.5, 2, 9, -9], dtype=np.float32))

Does this make sense? We can possibly add this testing if it is correct

pseudo-rnd-thoughts · 2024-07-02T14:16:24Z

@TimSchneider42 I've edited the code above to make it work, would you be able to make similar changes to RescaleAction?
Is there a good way of modifying the unbounded values to have a bounded value that makes sense?

TimSchneider42 · 2024-07-03T12:14:33Z

@pseudo-rnd-thoughts, I can look into RescaleAction. After all, it should be exactly the same.

Is there a good way of modifying the unbounded values to have a bounded value that makes sense?

What do you mean by that?

Also, some doctest checks keep failing. Is there a script I can run to update the docstrings, or do I have to do this by hand?

pseudo-rnd-thoughts · 2024-07-03T12:52:05Z

For the doctests, you'll need to update individually and you need NumPy 2.0 for it
Run pytest --doctest-module gymnasium/ to check it

Ignore me on the unbounded / bounded question

…ama-Foundation#1092.

… spaces

…failed due to the velocity observation limits of CartPole being infinite

TimSchneider42 · 2024-07-03T13:42:38Z

@pseudo-rnd-thoughts, I fixed the docstrings and reorganized the commits. Is it fine to be merged now?

pseudo-rnd-thoughts

Amazing, thanks for the PR and making all the changes

TimSchneider42 force-pushed the main branch from 45dcd5d to 251cf7b Compare July 2, 2024 13:57

TimSchneider42 force-pushed the main branch from 0600811 to b2e8681 Compare July 3, 2024 12:49

TimSchneider42 and others added 5 commits July 3, 2024 15:26

Replaced np.finfo(np.float32).max by np.inf in cartpole.py. Fixes Far…

02f8dd9

…ama-Foundation#1092.

Made RescaleObservation capable of dealing with unbounded observation…

d0d9a87

… spaces

Fixed test_env_checker.test_no_error_warnings test, which previously …

3d3b4b8

…failed due to the velocity observation limits of CartPole being infinite

Made RescaleAction capable of dealing with unbounded observation spaces

9cd1516

Updated docstrings to reflect new bounds of CartPole

d78ff24

TimSchneider42 force-pushed the main branch from f40c710 to d78ff24 Compare July 3, 2024 13:32

pseudo-rnd-thoughts approved these changes Jul 3, 2024

View reviewed changes

pseudo-rnd-thoughts changed the title ~~Replaced np.finfo(np.float32).max by np.inf in cartpole.py.~~ Update RescaleAction and RescaleObservation for np.inf bounds Jul 3, 2024

pseudo-rnd-thoughts merged commit fc55d47 into Farama-Foundation:main Jul 3, 2024
13 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update `RescaleAction` and `RescaleObservation` for `np.inf` bounds #1095

Update `RescaleAction` and `RescaleObservation` for `np.inf` bounds #1095

TimSchneider42 commented Jun 28, 2024

RedTachyon commented Jun 28, 2024

pseudo-rnd-thoughts commented Jul 2, 2024

pseudo-rnd-thoughts commented Jul 2, 2024 •

edited

Loading

TimSchneider42 commented Jul 2, 2024

pseudo-rnd-thoughts commented Jul 2, 2024

TimSchneider42 commented Jul 2, 2024 •

edited

Loading

TimSchneider42 commented Jul 2, 2024

TimSchneider42 commented Jul 2, 2024

pseudo-rnd-thoughts commented Jul 2, 2024 •

edited

Loading

pseudo-rnd-thoughts commented Jul 2, 2024

TimSchneider42 commented Jul 3, 2024

pseudo-rnd-thoughts commented Jul 3, 2024

TimSchneider42 commented Jul 3, 2024

pseudo-rnd-thoughts left a comment

Update RescaleAction and RescaleObservation for np.inf bounds #1095

Update RescaleAction and RescaleObservation for np.inf bounds #1095

Conversation

TimSchneider42 commented Jun 28, 2024

Description

Type of change

Checklist:

RedTachyon commented Jun 28, 2024

pseudo-rnd-thoughts commented Jul 2, 2024

pseudo-rnd-thoughts commented Jul 2, 2024 • edited Loading

TimSchneider42 commented Jul 2, 2024

pseudo-rnd-thoughts commented Jul 2, 2024

TimSchneider42 commented Jul 2, 2024 • edited Loading

TimSchneider42 commented Jul 2, 2024

TimSchneider42 commented Jul 2, 2024

pseudo-rnd-thoughts commented Jul 2, 2024 • edited Loading

pseudo-rnd-thoughts commented Jul 2, 2024

TimSchneider42 commented Jul 3, 2024

pseudo-rnd-thoughts commented Jul 3, 2024

TimSchneider42 commented Jul 3, 2024

pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

Update `RescaleAction` and `RescaleObservation` for `np.inf` bounds #1095

Update `RescaleAction` and `RescaleObservation` for `np.inf` bounds #1095

pseudo-rnd-thoughts commented Jul 2, 2024 •

edited

Loading

TimSchneider42 commented Jul 2, 2024 •

edited

Loading

pseudo-rnd-thoughts commented Jul 2, 2024 •

edited

Loading