
PBT issue #263

Open
frankie4fingers opened this issue Feb 1, 2023 · 6 comments

@frankie4fingers

I received the following error when running PBT on a single batched env. Without PBT, SF works fine. I tried to fix it myself, but then I got the next batch of errors, so it looks like PBT hasn't been tested for some time. Could someone please check?

  File ".../flow_env.py", line 221, in <module>
    sys.exit(main())
  File "../flow_env.py", line 215, in main
    status = run_rl(cfg)
  File "../anaconda3/lib/python3.9/site-packages/sample_factory/train.py", line 37, in run_rl
    status = runner.init()
  File "../anaconda3/lib/python3.9/site-packages/sample_factory/algo/runners/runner_serial.py", line 16, in init
    status = super().init()
  File "../anaconda3/lib/python3.9/site-packages/sample_factory/algo/runners/runner.py", line 559, in init
    self._observers_call(AlgoObserver.on_init, self)
  File "../anaconda3/lib/python3.9/site-packages/sample_factory/algo/runners/runner.py", line 514, in _observers_call
    getattr(observer, func.__name__)(*args, **kwargs)
  File "../anaconda3/lib/python3.9/site-packages/sample_factory/pbt/population_based_training.py", line 156, in on_init
    self.policy_cfg[policy_id][param_name] = self.cfg[param_name]
TypeError: 'Namespace' object is not subscriptable
@alex-petrenko
Owner

Hi! Can you please share your command line, if possible? A simple PBT setup is part of the automatic tests on GitHub Actions, so it's tested regularly and should theoretically work!

@alex-petrenko
Owner

Okay, I managed to reproduce this.
There's a bug where, if --restart_behavior is not resume, the cfg Namespace object is not converted into an AttrDict.
This function expects an AttrDict.

I'll fix it in the next release! Thank you for reporting!

For a quick patch, add the following code to your parse_cfg function:

    from sample_factory.cfg.arguments import cfg_dict

    cfg = cfg_dict(cfg)

This will make sure that you're working with an AttrDict cfg.
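
Roughly, a patched parse_cfg might look like the sketch below (this assumes the standard parse_sf_args / parse_full_cfg helpers from sample_factory.cfg.arguments; adapt the names to your own setup):

```python
# Sketch only: a user-side parse_cfg with the AttrDict workaround applied.
from sample_factory.cfg.arguments import cfg_dict, parse_full_cfg, parse_sf_args


def parse_cfg(argv=None):
    # standard SF2 argument parsing; register custom env/model args on `parser` if needed
    parser, _partial_cfg = parse_sf_args(argv)
    cfg = parse_full_cfg(parser, argv)
    # workaround: convert the argparse Namespace into a subscriptable AttrDict
    cfg = cfg_dict(cfg)
    return cfg
```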

@frankie4fingers
Author

Thanks! Will check :))

@frankie4fingers
Author

It seems I can now run SF in PBT mode, but I still don't see any changes in learning. Just to make sure:

  1. I run SF with the following config:

     env_configs = dict(
         test_env=dict(
             env_agents=8192,
             batch_size=256,
             batched_sampling=True,
             actor_worker_gpus=[],
             use_rnn=True,
             policy_initialization="torch_default",
             lr_schedule="constant",
             lr_schedule_kl_threshold=0.008,
             gae_lambda=0.95,
             with_vtrace=False,
             value_bootstrap=False,
             serial_mode=True,
             async_rl=False,
             use_env_info_cache=False,
             encoder_mlp_layers=[64, 32],
             nonlinearity="elu",
             expected_max_policy_lag=0,
             exploration_loss_coeff=1e-3,
             ppo_clip_ratio=0.2,
             value_loss_coeff=0.5,
             normalize_returns=False,
             normalize_input=True,
             num_envs_per_worker=1,
             worker_num_splits=1,
             restart_behavior="overwrite",
             reward_scale=1,

             pbt_mix_policies_in_one_env=False,
             train_for_env_steps=300000,
             is_multiagent=False,
             num_workers=1,
             with_pbt=True,
             num_epochs=1,
             num_batches_per_epoch=32,
             gamma=0.99,
             rnn_size=128,
             rnn_type="gru",
             rnn_num_layers=1,
             rollout=32,
             recurrence=16,
             num_policies=10,
             pbt_period_env_steps=200000,
             pbt_start_mutation=200000,
             save_best_after=int(50000),
         ),
     )

  2. My test env simulates 8-24k envs internally, so SF runs with one worker on a single GPU (RTX 2060, Linux, 32 GB RAM).

  3. I'm trying to find the best params, so PBT should help here, but nothing happens: training just continues until train_for_env_steps is reached and then stops.

  4. If I run training with num_policies=2, SF trains for 104k samples and then shows empty updates like: Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 5889.0). Total num frames: 524288. Throughput: 0: 0.0, 1: 0.0. Samples: 524288. Policy #0 lag: (min: 18.0, avg: 19.0, max: 50.0). I think it is waiting for another worker, but I only have one.

  5. I tested several param variations, like num_policies=2, num_policies=1, num_workers=2 with 2 batched envs, but I don't see any changes in the log. I prefer using a batched env in truly sync mode without policy lag, to make sure that lag can't hurt the final policy.

So the question is: can I run internal PBT here with my single batched test env, running the experiment for 5M steps and periodically mutating params to learn another policy (e.g. if I set num_policies=10), so that the best possible params are found in sync mode one by one? Or is that impossible, so I would have to implement it with an external PBT, e.g. from NNI?

@frankie4fingers
Author

@alex-petrenko could you please help with my problem above?

@alex-petrenko
Owner

alex-petrenko commented Feb 7, 2023

I believe you actually have a configuration that I never tried before, which is batched sampling + PBT.

A little bit of background helps to explain this. The current version of PBT was originally written for SF 1.0, which didn't have a batched sampling regime and didn't support vectorized simulators. The idea behind PBT was that we can change the mapping between a specific agent in an environment and the policy that controls it, thus mixing many different agents/policies in multi-agent environments. The original (non-batched) sampling regime is perfect for this. See for example: https://github.com/alex-petrenko/sample-factory/blob/master/sample_factory/algo/utils/agent_policy_mapping.py
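
As a toy illustration of that mapping (not the actual implementation, just the idea):

```python
# Toy illustration only; the real logic is in sample_factory/algo/utils/agent_policy_mapping.py.
import random


def sample_agent_to_policy_map(num_agents: int, num_policies: int) -> dict:
    # each agent slot in a multi-agent env is controlled by some policy from the
    # population, so a single episode can mix several PBT policies
    return {agent_idx: random.randrange(num_policies) for agent_idx in range(num_agents)}


print(sample_agent_to_policy_map(num_agents=8, num_policies=4))
# e.g. {0: 2, 1: 0, 2: 3, 3: 1, 4: 2, 5: 0, 6: 1, 7: 3}
```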

In batched sampling mode, the whole idea is that we process experience in big batches: collect something like 2K or 4K timesteps and pass them directly to the inference worker to calculate actions, and so on. This makes it very difficult to arbitrarily map these 2K or 4K agents to different policies; that kind of re-batching would be very slow and would defeat the whole purpose.

Therefore, batched mode currently only supports PBT at the level of workers, i.e. if you have num_workers = N, then experience from these N rollout workers is sent to N different policies participating in PBT. I haven't tested this regime very much, but it should work.
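
To make that concrete, a rough sketch of the overrides I'd try (treat this as an illustration, not a tested recipe; the key point is keeping num_workers equal to num_policies):

```python
# Rough sketch, not a tested recipe: in batched mode PBT maps whole rollout
# workers to policies, so num_workers should match num_policies.
pbt_overrides = dict(
    batched_sampling=True,
    num_workers=4,                       # one rollout worker per policy in the population
    num_policies=4,
    with_pbt=True,
    pbt_period_env_steps=200_000,
    pbt_start_mutation=200_000,
    pbt_mix_policies_in_one_env=False,   # mixing policies inside one env is for non-batched mode
)
```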

I'm not sure you can fit many workers and policies on one 2060, though... so I'm not sure this is even helpful. In my experience, PBT starts to be helpful from ~4 policies and really shines at 8+.

Let me know if this works for you. Otherwise, maybe taking a look at external tools makes sense in the meantime.

I've also started to work on an "external" PBT implementation that would work with Sample Factory. Instead of being a part of the algorithm and tightly integrated with it, it would basically start N independent experiments and treat them as a population. This would not work with multi-agent envs where you want multiple policies competing in the same environment, but would work for single-agent envs, hyperparameter tuning and so on.
I have a preprint about this that I can privately share with you if you email me!
No estimates for when this will be available though. I have a bunch of things on my roadmap.
