Remove the value function clipping #208

Closed
Tracked by #206
vwxyzjn opened this issue Jun 20, 2022 · 0 comments

Problem description

Per Andrychowicz et al. (2021) and anecdotal evidence, value function clipping is not useful. Hence we should remove the following code and compute the value loss from the unclipped squared error alone.

cleanrl/cleanrl/ppo.py

Lines 283 to 291 in 94a685d

v_loss_unclipped = (newvalue - b_returns[mb_inds]) ** 2
v_clipped = b_values[mb_inds] + torch.clamp(
    newvalue - b_values[mb_inds],
    -args.clip_coef,
    args.clip_coef,
)
v_loss_clipped = (v_clipped - b_returns[mb_inds]) ** 2
v_loss_max = torch.max(v_loss_unclipped, v_loss_clipped)
v_loss = 0.5 * v_loss_max.mean()
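
For reference, after the removal the snippet above would presumably reduce to `v_loss = 0.5 * ((newvalue - b_returns[mb_inds]) ** 2).mean()`. The sketch below contrasts the two variants on plain Python floats so that it runs without PyTorch. The function names and sample values are hypothetical and are not part of `ppo.py`.

```python
# Dependency-free sketch: plain floats stand in for the torch tensors in ppo.py.
# All function names here are hypothetical, chosen for illustration only.

def clamp(x, lo, hi):
    """Scalar equivalent of torch.clamp(x, lo, hi)."""
    return max(lo, min(x, hi))


def clipped_value_loss(new_values, old_values, returns, clip_coef):
    """Mirrors the snippet above: per-sample max of unclipped and clipped errors."""
    losses = []
    for v_new, v_old, ret in zip(new_values, old_values, returns):
        unclipped = (v_new - ret) ** 2
        # Limit how far the new prediction may move from the old one.
        v_clipped = v_old + clamp(v_new - v_old, -clip_coef, clip_coef)
        clipped = (v_clipped - ret) ** 2
        losses.append(max(unclipped, clipped))
    return 0.5 * sum(losses) / len(losses)


def unclipped_value_loss(new_values, returns):
    """Proposed replacement: half the mean squared error, no clipping."""
    losses = [(v_new - ret) ** 2 for v_new, ret in zip(new_values, returns)]
    return 0.5 * sum(losses) / len(losses)


# Made-up sample values, only to exercise both functions.
new_values = [1.0, 0.5]
old_values = [0.0, 0.5]
returns = [1.0, 0.0]

print(clipped_value_loss(new_values, old_values, returns, clip_coef=0.2))
print(unclipped_value_loss(new_values, returns))
```

Because the clipped variant takes a per-sample maximum that includes the unclipped error, it can never be smaller than the unclipped loss. In the first sample the new prediction already matches the return, yet the clipped term still penalizes it for moving more than `clip_coef` away from the old value.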

We should do this with great care: run benchmark experiments to confirm that removing the clipping results in the same or better performance in the games we test. That is, we should re-run the following and confirm that the performance is ok.

# export WANDB_ENTITY=openrlbenchmark

poetry install
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids CartPole-v1 Acrobot-v1 MountainCar-v0 \
    --command "poetry run python cleanrl/ppo.py --cuda False --track --capture-video" \
    --num-seeds 3 \
    --workers 9

poetry install -E atari
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids PongNoFrameskip-v4 BeamRiderNoFrameskip-v4 BreakoutNoFrameskip-v4 \
    --command "poetry run python cleanrl/ppo_atari.py --track --capture-video" \
    --num-seeds 3 \
    --workers 3

poetry install -E atari
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids PongNoFrameskip-v4 BeamRiderNoFrameskip-v4 BreakoutNoFrameskip-v4 \
    --command "poetry run python cleanrl/ppo_atari_lstm.py --track --capture-video" \
    --num-seeds 3 \
    --workers 3

poetry install -E envpool
xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids Pong-v5 BeamRider-v5 Breakout-v5 \
    --command "poetry run python cleanrl/ppo_atari_envpool.py --track --capture-video" \
    --num-seeds 3 \
    --workers 1

poetry install -E "mujoco pybullet"
python -c "import mujoco_py"
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 \
    --command "poetry run python cleanrl/ppo_continuous_action.py --cuda False --track --capture-video" \
    --num-seeds 3 \
    --workers 9

poetry install -E procgen
xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids starpilot bossfight bigfish \
    --command "poetry run python cleanrl/ppo_procgen.py --track --capture-video" \
    --num-seeds 3 \
    --workers 1

poetry install -E atari
xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids PongNoFrameskip-v4 BeamRiderNoFrameskip-v4 BreakoutNoFrameskip-v4 \
    --command "poetry run torchrun --standalone --nnodes=1 --nproc_per_node=2 cleanrl/ppo_atari_multigpu.py --track --capture-video" \
    --num-seeds 3 \
    --workers 1

poetry install -E "pettingzoo atari"
poetry run AutoROM --accept-license
xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids pong_v3 surround_v2 tennis_v3 \
    --command "poetry run python cleanrl/ppo_pettingzoo_ma_atari.py --track --capture-video" \
    --num-seeds 3 \
    --workers 3
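
One possible way to make "the same or better performance" concrete once the runs above finish is a per-environment comparison of mean episodic return across the three seeds. The sketch below is only an assumption about how such a check could look: the `no_worse` helper, the 5% relative tolerance, and the numbers are all hypothetical placeholders, not benchmark results and not part of `cleanrl_utils`.

```python
from statistics import mean


def no_worse(baseline_returns, candidate_returns, tolerance=0.05):
    """Return True if the candidate's mean return is at least the baseline's
    mean minus a relative `tolerance` (hypothetical acceptance criterion)."""
    base = mean(baseline_returns)
    cand = mean(candidate_returns)
    # abs(base) keeps the margin meaningful for negative returns
    # (e.g. MountainCar-v0 or Acrobot-v1).
    return cand >= base - tolerance * abs(base)


# Placeholder per-seed returns for illustration only, NOT measured results.
baseline = {"CartPole-v1": [490.0, 500.0, 480.0]}
no_clipping = {"CartPole-v1": [495.0, 500.0, 470.0]}

for env_id in baseline:
    print(env_id, no_worse(baseline[env_id], no_clipping[env_id]))
```

With only three seeds per environment, a check like this is a coarse sanity filter; inspecting the learning curves tracked via `--track` would still be needed before concluding the removal is safe.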

vwxyzjn mentioned this issue Jun 20, 2022 (5 tasks)
vwxyzjn closed this as completed Nov 28, 2023