New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Added behavior cloning baselines #531

Merged

StoneT2000 merged 2 commits into main from bc_baselines

Sep 3, 2024

Collaborator

arnavg115 commented Aug 28, 2024

Added the behavior cloning baseline for both state and rgbd observations.


          added behavior cloning baseline

2c0d5c3

StoneT2000 requested changes

View reviewed changes

examples/baselines/bc/bc.py Outdated

+                  if args.video:
+                     os.makedirs(os.path.join(log_path, args.video_path))
+                  gpu = args.sim_backend == "gpu"
+                  envs = make_eval_envs(args.env, args.num_envs, stats, control_mode,gpu)

Member

StoneT2000 Aug 28, 2024

we should have a CLI argument to modify the max episode steps number to be higher from the command line. The code assumes 100 at the moment? You can replace max_traj_len with max_episode_steps as an argument. This should then also be passed to the environment creation tool.

examples/baselines/bc/bc_rgbd.py Outdated

		os.makedirs(os.path.join(log_path, args.video_path))


		envs = make_eval_envs(args.env, args.num_envs, control_mode, gpu)

Member

StoneT2000 Aug 28, 2024

I recommend sticking to using the flatteng RGBD observation wrapper as used in the PPO rgb code instead of making all the extra process obs functions.

examples/baselines/bc/bc_rgbd.py Outdated

+                                          successes.append(fin_info["success"])
+                                  returns.append(rewards[mask])
+                          log_dict["charts/eval_elapsed_steps"] = np.concatenate(el_steps).mean()

Member

StoneT2000 Aug 28, 2024

for logged data, can you try to be as consistent as possible with the diffusion policy baseline?

Namely there should be

eval/success_once
eval/success_at_end
eval/episode_len
eval/return

examples/baselines/bc/bc_rgbd.py Outdated

+                          rewards = np.zeros((args.num_envs,))
+                          obs, _ = envs.reset()
+                          for _ in range(100):

Member

StoneT2000 Aug 28, 2024

100 is hardcoded here. should be max episode steps

examples/baselines/bc/bc_rgbd.py Outdated

+                      load_count=args.load_count,
+                  )
+                  gpu = args.sim_backend == "gpu"

Member

StoneT2000 Aug 28, 2024

rename gpu to gpu_sim. gpu alone is a bit ambigious

examples/baselines/bc/bc_rgbd.py Outdated

+                      group=config["group"],
+                      name=log_name.replace(os.path.sep, "__"),
+                      id=str(uuid.uuid4()),
+                      tags=["BC_RGBD"]

Member

StoneT2000 Aug 28, 2024

for consistency lets name the tag "bc" for both bc rgbd and bc state, and also label group to be "BehaviorCloning". Default project name in the tyro args should be ManiSkill.

moreover why is there a ID generated here? We don't need to use uuid.


          made suggested fixes

308fd6b

StoneT2000 merged commit 510393b into main

StoneT2000 deleted the bc_baselines branch

September 3, 2024 22:30

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet