# Frequently Asked Questions

## Are the step-by-step instructions aligned with subgoals?

Yes, each step-by-step instruction has a corresponding subgoal in the training and validation trajectories. If you use this alignment during training, please see the guidelines for leaderboard submissions.
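
A hedged sketch of how this alignment can be read from a trajectory JSON; the field names (`turk_annotations`, `high_descs`, `plan`, `high_pddl`, `high_idx`) reflect my reading of the annotation format and should be verified against the actual files:

```python
# Sketch only: iterate over one annotator's step-by-step instructions and the
# planner subgoal each one is aligned to. Field names are assumptions.
import json

with open('traj_data.json') as f:   # path to any training/validation trajectory
    traj_data = json.load(f)

ann = traj_data['turk_annotations']['anns'][0]          # one of the annotators
for high_idx, instruction in enumerate(ann['high_descs']):
    subgoal = traj_data['plan']['high_pddl'][high_idx]  # aligned subgoal
    print(high_idx, subgoal['discrete_action']['action'], '<-', instruction)
```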

## Getting 100% success rate with ground-truth trajectories

You should be able to achieve >99% success rate on training and validation tasks with the ground-truth actions and masks from the dataset. Occasionally, non-deterministic behavior in THOR can lead to failures, but such cases are extremely rare.

## Can you train an agent without mask prediction?

Mask prediction is an important part of the ALFRED challenge. Unlike non-interactive settings (e.g., vision-language navigation), here the agent must specify exactly what it wants to interact with.
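
As a rough, hypothetical illustration: an interaction action is only unambiguous once it is paired with a pixel mask selecting the target object. The wrapper method name and return signature below are assumptions about ALFRED's THOR environment wrapper, not a documented API:

```python
# Hypothetical sketch: threshold the model's mask logits and pass the binary
# mask together with the interaction action. `env.va_interact` and its return
# values are assumed, not guaranteed by this FAQ.
import numpy as np
import torch

def interact_with_mask(env, action: str, mask_logits: torch.Tensor):
    pred_mask = (mask_logits.sigmoid() > 0.5).cpu().numpy().astype(np.uint8)  # H x W
    success, _, _, err, _ = env.va_interact(action, interact_mask=pred_mask)
    return success, err
```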

## Why does `feat_conv.pt` in the Full Dataset have 10 more frames than the number of images?

The last 10 frames are copies of the features from the last image frame.
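
For example, assuming `feat_conv.pt` stores a `(num_frames, C, H, W)` tensor of per-frame features, you can verify and trim the duplicated tail like this:

```python
import torch

feats = torch.load('feat_conv.pt')
assert torch.equal(feats[-1], feats[-11])  # trailing frames duplicate the last image frame
image_feats = feats[:-10]                  # one feature per image, duplicates dropped
```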

## Can I train with templated goal descriptions?

Yes. Run the training script with `--use_templated_goals`.
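
For example (the script path below is an assumption about the repo layout; follow the training command in the repo's README):

```
python models/train/train_seq2seq.py --use_templated_goals [other training args]
```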

## How do I get panoramic image observations?

You can use `augment_trajectories.py` to replay all the trajectories and augment the visual observations. At each step, use the THOR API to look around and take 6-12 shots of the surroundings, then stitch these shots together into a panoramic frame. You might have to set `'forceAction': True` for smooth MoveAhead/Rotate/Look actions. Note that getting panoramic images at test time incurs the additional cost of looking around with the agent.
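
A minimal sketch of the per-step look-around, assuming `controller` is an already-initialized `ai2thor` Controller at the agent's current pose and that the rotation step is 90 degrees (with a smaller rotation step you would take the 6-12 shots mentioned above):

```python
# Sketch only: rotate in place, grab one RGB frame per rotation, and stitch
# the shots horizontally into a crude panorama. Not the official augmentation
# code; camera intrinsics and seams are ignored.
import numpy as np

def take_panorama(controller, num_shots=4):
    shots = []
    for _ in range(num_shots):                    # 4 x 90 degrees = full turn
        event = controller.step(dict(action='RotateLeft', forceAction=True))
        shots.append(event.frame)                 # H x W x 3 RGB array
    return np.concatenate(shots, axis=1)
```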

## Why does `feat_conv.pt` in the Modeling Quickstart contain fewer frames than in the Full Dataset?

The Full Dataset contains extracted ResNet features for every frame in `['images']`, including filler frames in between low-level actions (used to generate smooth videos), whereas the Modeling Quickstart only contains features for each `low_idx`, i.e., the frames observed after taking a low-level action.
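
A hedged sketch of recovering per-action frames from the Full Dataset features, assuming each entry of `traj_data['images']` records the index of the low-level action it belongs to under `low_idx` (field name assumed):

```python
# Sketch only: keep one feature frame per low-level action (here the first
# image tagged with each low_idx; adjust if the convention differs).
import json
import torch

feats = torch.load('feat_conv.pt')                 # full, per-image features
with open('traj_data.json') as f:
    traj_data = json.load(f)

keep, seen = [], set()
for i, im in enumerate(traj_data['images']):
    if im['low_idx'] not in seen:
        seen.add(im['low_idx'])
        keep.append(i)

per_action_feats = feats[keep]                     # one feature frame per low-level action
```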

## Can I train the model on a smaller dataset for quick debugging?

Yes, run the training script with `--fast_epoch`.