Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added behavior cloning baselines #531

Merged
merged 2 commits into from
Sep 3, 2024
Merged

Added behavior cloning baselines #531

merged 2 commits into from
Sep 3, 2024

Conversation

arnavg115
Copy link
Collaborator

Added the behavior cloning baseline for both state and rgbd observations.

if args.video:
os.makedirs(os.path.join(log_path, args.video_path))
gpu = args.sim_backend == "gpu"
envs = make_eval_envs(args.env, args.num_envs, stats, control_mode,gpu)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should have a CLI argument to modify the max episode steps number to be higher from the command line. The code assumes 100 at the moment? You can replace max_traj_len with max_episode_steps as an argument. This should then also be passed to the environment creation tool.

os.makedirs(os.path.join(log_path, args.video_path))


envs = make_eval_envs(args.env, args.num_envs, control_mode, gpu)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend sticking to using the flatteng RGBD observation wrapper as used in the PPO rgb code instead of making all the extra process obs functions.

successes.append(fin_info["success"])
returns.append(rewards[mask])

log_dict["charts/eval_elapsed_steps"] = np.concatenate(el_steps).mean()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for logged data, can you try to be as consistent as possible with the diffusion policy baseline?

Namely there should be

eval/success_once
eval/success_at_end
eval/episode_len
eval/return

rewards = np.zeros((args.num_envs,))
obs, _ = envs.reset()

for _ in range(100):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100 is hardcoded here. should be max episode steps

load_count=args.load_count,
)

gpu = args.sim_backend == "gpu"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename gpu to gpu_sim. gpu alone is a bit ambigious

group=config["group"],
name=log_name.replace(os.path.sep, "__"),
id=str(uuid.uuid4()),
tags=["BC_RGBD"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistency lets name the tag "bc" for both bc rgbd and bc state, and also label group to be "BehaviorCloning". Default project name in the tyro args should be ManiSkill.

moreover why is there a ID generated here? We don't need to use uuid.

@StoneT2000 StoneT2000 merged commit 510393b into main Sep 3, 2024
@StoneT2000 StoneT2000 deleted the bc_baselines branch September 3, 2024 22:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants