
PBT issue #263

Open
frankie4fingers opened this issue Feb 1, 2023 · 6 comments

@frankie4fingers

I received the following error when running PBT on a single batched env. Without PBT, SF works fine. I tried to fix it myself, but then I got the next batch of errors, so it looks like PBT hasn't been tested for some time. Could someone please check?

  File ".../flow_env.py", line 221, in <module>
    sys.exit(main())
  File "../flow_env.py", line 215, in main
    status = run_rl(cfg)
  File "../anaconda3/lib/python3.9/site-packages/sample_factory/train.py", line 37, in run_rl
    status = runner.init()
  File "../anaconda3/lib/python3.9/site-packages/sample_factory/algo/runners/runner_serial.py", line 16, in init
    status = super().init()
  File "../anaconda3/lib/python3.9/site-packages/sample_factory/algo/runners/runner.py", line 559, in init
    self._observers_call(AlgoObserver.on_init, self)
  File "../anaconda3/lib/python3.9/site-packages/sample_factory/algo/runners/runner.py", line 514, in _observers_call
    getattr(observer, func.__name__)(*args, **kwargs)
  File "../anaconda3/lib/python3.9/site-packages/sample_factory/pbt/population_based_training.py", line 156, in on_init
    self.policy_cfg[policy_id][param_name] = self.cfg[param_name]
TypeError: 'Namespace' object is not subscriptable
@alex-petrenko
Owner

Hi! Can you please share your command line, if possible? A simple PBT setup is part of the automatic tests on GitHub Actions, so it's tested regularly and should theoretically work!

@alex-petrenko
Owner

Okay, I managed to reproduce this.
There's a bug where, if --restart_behavior is not resume, the cfg Namespace object is not converted into an AttrDict.
This function expects an AttrDict.

I'll fix it in the next release! Thank you for reporting!

For a quick patch, add the following code to your parse_cfg function:

    from sample_factory.cfg.arguments import cfg_dict

    cfg = cfg_dict(cfg)

This will make sure that you're working with an AttrDict cfg.
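
Roughly, a patched parse_cfg might look like the sketch below (this assumes the standard parse_sf_args / parse_full_cfg helpers from sample_factory.cfg.arguments; adapt the names to your own setup):

```python
# Sketch only: a user-side parse_cfg with the AttrDict workaround applied.
from sample_factory.cfg.arguments import cfg_dict, parse_full_cfg, parse_sf_args


def parse_cfg(argv=None):
    # standard SF2 argument parsing; register custom env/model args on `parser` if needed
    parser, _partial_cfg = parse_sf_args(argv)
    cfg = parse_full_cfg(parser, argv)
    # workaround: convert the argparse Namespace into a subscriptable AttrDict
    cfg = cfg_dict(cfg)
    return cfg
```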

@frankie4fingers
Author

Thanks! Will check :))

@frankie4fingers
Author

It seems I can now run SF in PBT mode, but I still don't see any changes in learning. Just to make sure:

  1. I run SF with the following config:

     env_configs = dict(
         test_env=dict(
             env_agents=8192,
             batch_size=256,
             batched_sampling=True,
             actor_worker_gpus=[],
             use_rnn=True,
             policy_initialization="torch_default",
             lr_schedule="constant",
             lr_schedule_kl_threshold=0.008,
             gae_lambda=0.95,
             with_vtrace=False,
             value_bootstrap=False,
             serial_mode=True,
             async_rl=False,
             use_env_info_cache=False,
             encoder_mlp_layers=[64, 32],
             nonlinearity="elu",
             expected_max_policy_lag=0,
             exploration_loss_coeff=1e-3,
             ppo_clip_ratio=0.2,
             value_loss_coeff=0.5,
             normalize_returns=False,
             normalize_input=True,
             num_envs_per_worker=1,
             worker_num_splits=1,
             restart_behavior="overwrite",
             reward_scale=1,

             pbt_mix_policies_in_one_env=False,
             train_for_env_steps=300000,
             is_multiagent=False,
             num_workers=1,
             with_pbt=True,
             num_epochs=1,
             num_batches_per_epoch=32,
             gamma=0.99,
             rnn_size=128,
             rnn_type="gru",
             rnn_num_layers=1,
             rollout=32,
             recurrence=16,
             num_policies=10,
             pbt_period_env_steps=200000,
             pbt_start_mutation=200000,
             save_best_after=int(50000),
         ),
     )

  2. My test env simulates 8-24k envs internally, so SF runs with one worker on a single GPU (RTX 2060, Linux, 32 GB RAM).

  3. I'm trying to find the best params, so PBT should help here, but nothing happens: training just continues until train_for_env_steps is reached and then stops.

  4. If I run training with num_policies=2, SF trains for 104k samples and then shows empty updates like: Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 5889.0). Total num frames: 524288. Throughput: 0: 0.0, 1: 0.0. Samples: 524288. Policy #0 lag: (min: 18.0, avg: 19.0, max: 50.0). I think it is waiting for another worker, but I only have one.

  5. I tested several param variations, like num_policies=2, num_policies=1, num_workers=2 with 2 batched envs, but I don't see any changes in the log. I prefer using a batched env in truly sync mode without policy lag, to make sure that lag can't hurt the final policy.

So the question is: can I run internal PBT here with my single batched test env, running the experiment for 5M steps and periodically mutating params to learn another policy (e.g. if I set num_policies=10), so that the best possible params are found in sync mode one by one? Or is that impossible, so I would have to implement it with an external PBT, e.g. from NNI?

@frankie4fingers
Author

@alex-petrenko could you please help with my problem above?

@alex-petrenko
Owner

alex-petrenko commented Feb 7, 2023

I believe you actually have a configuration that I never tried before, which is batched sampling + PBT.

A little bit of background helps to explain this. The current version of PBT was originally written for SF 1.0, which didn't have a batched sampling regime and didn't support vectorized simulators. The idea behind PBT was that we can change the mapping between a specific agent in an environment and the policy that controls it, thus mixing many different agents/policies in multi-agent environments. The original (non-batched) sampling regime is perfect for this. See for example: https://github.com/alex-petrenko/sample-factory/blob/master/sample_factory/algo/utils/agent_policy_mapping.py
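
As a toy illustration of that mapping (not the actual implementation, just the idea):

```python
# Toy illustration only; the real logic is in sample_factory/algo/utils/agent_policy_mapping.py.
import random


def sample_agent_to_policy_map(num_agents: int, num_policies: int) -> dict:
    # each agent slot in a multi-agent env is controlled by some policy from the
    # population, so a single episode can mix several PBT policies
    return {agent_idx: random.randrange(num_policies) for agent_idx in range(num_agents)}


print(sample_agent_to_policy_map(num_agents=8, num_policies=4))
# e.g. {0: 2, 1: 0, 2: 3, 3: 1, 4: 2, 5: 0, 6: 1, 7: 3}
```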

In batched sampling mode, the whole idea is that we process experience in big batches: collect something like 2K or 4K timesteps and pass them directly to the inference worker to calculate actions, and so on. This makes it very difficult to arbitrarily map these 2K or 4K agents to different policies; that kind of re-batching would be very slow and would defeat the whole purpose.

Therefore, batched mode currently only supports PBT at the level of workers, i.e. if you have num_workers = N, then experience from these N rollout workers is sent to N different policies participating in PBT. I haven't tested this regime very much, but it should work.
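
To make that concrete, a rough sketch of the overrides I'd try (treat this as an illustration, not a tested recipe; the key point is keeping num_workers equal to num_policies):

```python
# Rough sketch, not a tested recipe: in batched mode PBT maps whole rollout
# workers to policies, so num_workers should match num_policies.
pbt_overrides = dict(
    batched_sampling=True,
    num_workers=4,                       # one rollout worker per policy in the population
    num_policies=4,
    with_pbt=True,
    pbt_period_env_steps=200_000,
    pbt_start_mutation=200_000,
    pbt_mix_policies_in_one_env=False,   # mixing policies inside one env is for non-batched mode
)
```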

I'm not sure you can fit many workers and policies on one 2060, though... so I'm not sure this is even helpful. In my experience, PBT starts to be helpful from ~4 policies and really shines at 8+.

Let me know if this works for you. Otherwise, maybe taking a look at external tools makes sense in the meantime.

I've also started to work on an "external" PBT implementation that would work with Sample Factory. Instead of being a part of the algorithm and tightly integrated with it, it would basically start N independent experiments and treat them as a population. This would not work with multi-agent envs where you want multiple policies competing in the same environment, but would work for single-agent envs, hyperparameter tuning and so on.
I have a preprint about this that I can privately share with you if you email me!
No estimates for when this will be available though. I have a bunch of things on my roadmap.
