Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update RescaleAction and RescaleObservation for np.inf bounds #1095

Merged
merged 5 commits into from
Jul 3, 2024

Conversation

TimSchneider42
Copy link
Contributor

Description

Fixes #1092

Type of change

Please delete options that are not relevant.

  • Documentation only change (no code changed)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@RedTachyon
Copy link
Member

(the tests failing are due to an unrelated issue, I'll rerun them when that's handled)

@pseudo-rnd-thoughts
Copy link
Member

@TimSchneider42 Could you update the PR with the project main

@pseudo-rnd-thoughts
Copy link
Member

pseudo-rnd-thoughts commented Jul 2, 2024

Looking at the CI, this reminds me that I think we made the change a while ago intentionally from np.inf to an actual number primarily because we had an environment checker tool that complained about np.inf bound as they can't be normalized. You can see this in the CI as we can't use it with RescaleObservation wrapper.

I'm not sure what to do, because the current bound is unreachable to my knowledge (therefore stupid) but np.inf doesn't seem much better to me
@TimSchneider42 What was your motivation for changing it to np.inf other than it being "cleaner" than very big number?

@TimSchneider42
Copy link
Contributor Author

Hi,

From what I can see, all environments with unbounded observation spaces use np.inf, except for CartPole. So I think for consistency, it should be done this way here, too.

This also means that the RescaleObservation wrapper should not work in most environments, right? To fix this issue, maybe we should ignore unbounded observation components instead of rescaling them.

My specific use case is the following: I have a custom normalization wrapper that scales all bounded values to [-1, 1] and leaves unbounded values as is. The CartPole environment makes it seem as if the value is bounded when, in reality, it is not, which made my wrapper fail silently by scaling the value down to almost zero. Since CartPole is the only environment that seems to do it that way, I now have a specific exception for it.

Best,
Tim

@pseudo-rnd-thoughts
Copy link
Member

Thanks for the rapid response @TimSchneider42, yes I think that is a very reasonable justification
Could you update the wrapper to reflect this change?

@TimSchneider42
Copy link
Contributor Author

TimSchneider42 commented Jul 2, 2024

I just patched RescaleObservation in a way that deals with unbounded observation spaces. In a nutshell, I distinguish between the following cases on a per-component basis:

  1. The component is bounded above and below: behavior unchanged
  2. The component is unbounded above and below: no scaling or offset is applied
  3. The component is bounded above and unbounded below: no scaling is applied but an offset to ensure that the upper bound matches the target upper bound
  4. The component is unbounded above and bounded below: again no scaling and an offset to make the lower bound match the target lower bound

I hope this is what you had in mind.

@TimSchneider42
Copy link
Contributor Author

@pseudo-rnd-thoughts, I get a warning from the env checker due to the limits being infinite, which makes the tests fail. How do I address that?

@TimSchneider42
Copy link
Contributor Author

I had a look at test_mujoco_custom_env.py and decided to fix the test by ignoring this specific warning.

@pseudo-rnd-thoughts
Copy link
Member

pseudo-rnd-thoughts commented Jul 2, 2024

I'm doing some testing to understand the PR better

from gymnasium.spaces import Box
from gymnasium.wrappers import RescaleObservation
from gymnasium.wrappers import RescaleAction
import numpy as np

from tests.testing_env import GenericTestEnv

def func(self, action):
    return action, 0, False, False, {}

env = GenericTestEnv(
    observation_space=Box(low=np.array([-10, 0, 3, -np.inf, 0,      -np.inf], dtype=np.float32),
                          high=np.array([10, 1, 5,  np.inf, np.inf,  1,    ], dtype=np.float32)),
    action_space=Box(low=np.array([-10, 0, 3, -np.inf, 0,      -np.inf], dtype=np.float32),
                     high=np.array([10, 1, 5,  np.inf, np.inf,  1,    ], dtype=np.float32)),
    step_func=func
)
print(f'{env.observation_space=}')
env = RescaleObservation(
    env,
    min_obs=np.array([0, 0, -1, -np.inf, -1,      -np.inf], dtype=np.float32),
    max_obs=np.array([1, 1, 1,   np.inf,  np.inf, 2      ], dtype=np.float32)
)
print(f'{env.observation_space=}')

env.reset()
unscaled_obs = np.array([5, 0.3, 4.5, 2, 10, -10], dtype=np.float32)
scaled_obs, *_ = env.step(unscaled_obs)
print(f'{unscaled_obs=}')
print(f'{scaled_obs=}')
assert np.all(scaled_obs == np.array([0.75, 0.3, 0.5, 2, 9, -9], dtype=np.float32))

print(f'{env.action_space=}')
env = RescaleAction(
    env,
    min_action=np.array([0, 0, -1, -np.inf, -1,      -np.inf], dtype=np.float32),
    max_action=np.array([1, 1, 1,   np.inf,  np.inf, 2      ], dtype=np.float32)
)
print(f'{env.action_space=}')
env.reset()

unscaled_action = np.array([5, 0.3, 4.5, 2, 10, -10], dtype=np.float32)
scaled_action, *_ = env.step(unscaled_action)
print(f'{unscaled_action=}')
print(f'{scaled_action=}')
assert np.all(scaled_action == np.array([0.75, 0.3, 0.5, 2, 9, -9], dtype=np.float32))

Does this make sense? We can possibly add this testing if it is correct

@pseudo-rnd-thoughts
Copy link
Member

@TimSchneider42 I've edited the code above to make it work, would you be able to make similar changes to RescaleAction?
Is there a good way of modifying the unbounded values to have a bounded value that makes sense?

@TimSchneider42
Copy link
Contributor Author

@pseudo-rnd-thoughts, I can look into RescaleAction. After all, it should be exactly the same.

Is there a good way of modifying the unbounded values to have a bounded value that makes sense?

What do you mean by that?

Also, some doctest checks keep failing. Is there a script I can run to update the docstrings, or do I have to do this by hand?

@pseudo-rnd-thoughts
Copy link
Member

For the doctests, you'll need to update individually and you need NumPy 2.0 for it
Run pytest --doctest-module gymnasium/ to check it

Ignore me on the unbounded / bounded question

@TimSchneider42
Copy link
Contributor Author

@pseudo-rnd-thoughts, I fixed the docstrings and reorganized the commits. Is it fine to be merged now?

Copy link
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing, thanks for the PR and making all the changes

@pseudo-rnd-thoughts pseudo-rnd-thoughts changed the title Replaced np.finfo(np.float32).max by np.inf in cartpole.py. Update RescaleAction and RescaleObservation for np.inf bounds Jul 3, 2024
@pseudo-rnd-thoughts pseudo-rnd-thoughts merged commit fc55d47 into Farama-Foundation:main Jul 3, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Proposal] Change np.finfo(np.float32).max in CartPole to np.inf
3 participants