-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added behavior cloning baselines #531
Conversation
examples/baselines/bc/bc.py
Outdated
if args.video: | ||
os.makedirs(os.path.join(log_path, args.video_path)) | ||
gpu = args.sim_backend == "gpu" | ||
envs = make_eval_envs(args.env, args.num_envs, stats, control_mode,gpu) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should have a CLI argument to modify the max episode steps number to be higher from the command line. The code assumes 100 at the moment? You can replace max_traj_len with max_episode_steps as an argument. This should then also be passed to the environment creation tool.
examples/baselines/bc/bc_rgbd.py
Outdated
os.makedirs(os.path.join(log_path, args.video_path)) | ||
|
||
|
||
envs = make_eval_envs(args.env, args.num_envs, control_mode, gpu) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend sticking to using the flatteng RGBD observation wrapper as used in the PPO rgb code instead of making all the extra process obs functions.
examples/baselines/bc/bc_rgbd.py
Outdated
successes.append(fin_info["success"]) | ||
returns.append(rewards[mask]) | ||
|
||
log_dict["charts/eval_elapsed_steps"] = np.concatenate(el_steps).mean() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for logged data, can you try to be as consistent as possible with the diffusion policy baseline?
Namely there should be
eval/success_once
eval/success_at_end
eval/episode_len
eval/return
examples/baselines/bc/bc_rgbd.py
Outdated
rewards = np.zeros((args.num_envs,)) | ||
obs, _ = envs.reset() | ||
|
||
for _ in range(100): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100 is hardcoded here. should be max episode steps
examples/baselines/bc/bc_rgbd.py
Outdated
load_count=args.load_count, | ||
) | ||
|
||
gpu = args.sim_backend == "gpu" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename gpu to gpu_sim
. gpu
alone is a bit ambigious
examples/baselines/bc/bc_rgbd.py
Outdated
group=config["group"], | ||
name=log_name.replace(os.path.sep, "__"), | ||
id=str(uuid.uuid4()), | ||
tags=["BC_RGBD"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for consistency lets name the tag "bc" for both bc rgbd and bc state, and also label group to be "BehaviorCloning". Default project name in the tyro args should be ManiSkill.
moreover why is there a ID generated here? We don't need to use uuid.
Added the behavior cloning baseline for both state and rgbd observations.