Official implementation of: Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel


wz0919/VLN-SRDF


Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel




Data Release and Reproducibility Note

Please note that the original full dataset used in our final experiments is no longer recoverable: the first author's institutional account and personal storage were unexpectedly and completely deleted. This accidental deletion occurred during internal system maintenance, after the author's departure (which preceded the paper submission) and before the paper was accepted.

As a result, we can only release an intermediate version of the dataset saved during the development phase. Although this version may yield slightly lower performance when training the best model (~85% SR and ~78% SPL on R2R val unseen, roughly a 1% drop) compared to the results reported in the paper, it still significantly outperforms strong baselines such as ScaleVLN (81% SR and 70% SPL). We have also released our final pretrained model.

Available Data:

  1. mantis.hm3d_round0_topk.3_enc.json – Generated instructions via sampling on HM3D in the first round.
  2. mantis.hm3d_round3_greedy_ndtw0.9_ranked_414k_rouge0.85.json – A subset of generated instructions via greedy decoding from the final-round generator (25.7 SPICE).

Unfortunately, the generated instructions on MP3D and the final refined dataset are no longer available.

We sincerely appreciate your understanding and are happy to address any questions regarding reproducibility.

Installation

Please follow ScaleVLN to set up the environment and obtain the training source code.

Model and Data

We release our final pretrained model and available data here. Details:

Model:

  1. model_step_170000.pt – The final pretrained model for downstream finetuning.

Data:

  1. mantis.hm3d_round0_topk.3_enc.json – Generated instructions via sampling on HM3D in the first round.
  2. mantis.hm3d_round3_greedy_ndtw0.9_ranked_414k_rouge0.85.json – A subset of generated instructions via greedy decoding from the final-round generator (25.7 SPICE).
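The file name above suggests the released subset was filtered by navigator nDTW (0.9) and a ROUGE threshold (0.85). As a rough illustration of that kind of fidelity filtering, here is a minimal sketch; the field names (`ndtw`, `instruction`) are assumptions for illustration, not the repo's actual schema:

```python
# Hypothetical sketch of nDTW-threshold filtering, as implied by the file name
# mantis.hm3d_round3_greedy_ndtw0.9_ranked_414k_rouge0.85.json: keep only
# generated instructions whose navigator nDTW score passes a threshold.
# The keys "ndtw" and "instruction" are assumptions, not the released schema.
def filter_by_ndtw(items, threshold=0.9):
    return [item for item in items if item.get("ndtw", 0.0) >= threshold]

sample = [
    {"instruction": "walk past the sofa and stop at the stairs", "ndtw": 0.95},
    {"instruction": "go upstairs", "ndtw": 0.42},
]
print(filter_by_ndtw(sample))  # keeps only the first item
```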

Features:

  1. internvit_6b_fov60_mp3d.hdf5 – InternViT features on MP3D environments.
  2. internvit_6b_fov60_mp3d_panogen.hdf5 – InternViT features on Panogen-augmented MP3D environments.
  3. scans_internvit_6b/ – Contains InternViT features for all HM3D + MP3D environments.
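For reference, HDF5 feature files like these are typically read with h5py. The sketch below is a self-contained demo with a dummy file; the key layout (`<scan>_<viewpoint>`) and the per-view shape (36 panorama views, 3200-dim features, matching the feature dim set during finetuning) are assumptions, so inspect the actual file's keys before relying on them:

```python
import h5py
import numpy as np

# Hypothetical sketch of reading precomputed InternViT features from an HDF5
# file. The "<scan>_<viewpoint>" key layout and the (36, 3200) shape are
# assumptions -- list the file's keys to confirm the real layout.
def load_view_features(h5_path, key):
    with h5py.File(h5_path, "r") as f:
        return f[key][...]

# Self-contained demo: write a dummy file, then read it back.
with h5py.File("demo_features.hdf5", "w") as f:
    f.create_dataset("scan0000_vp0000", data=np.zeros((36, 3200), dtype=np.float16))

feats = load_view_features("demo_features.hdf5", "scan0000_vp0000")
print(feats.shape)  # (36, 3200)
```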

Training

Our training process follows ScaleVLN with minimal modifications.

Pretraining:
Update the pretraining config file to use:

  • InternViT features from features/scans_internvit_6b/
  • Our sampling-generated instructions: data/mantis.hm3d_round0_topk.3_enc.json

Empirically, training for 1–2 epochs is sufficient.

(Note: you may need to convert data/mantis.hm3d_round0_topk.3_enc.json into a JSONL file in which each item follows ScaleVLN's pretraining input format, like R2R/annotations/pretrain_map/R2R_hm3d_aug_envdrop_generated_enc.jsonl in ScaleVLN's processed data.)
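The JSON-to-JSONL conversion itself is mechanical; a minimal sketch (assuming the source file is a JSON array of records, and leaving any per-field remapping to match ScaleVLN's schema up to you):

```python
import json

# Hypothetical sketch: dump each record of a JSON array onto its own line to
# produce a JSONL file. Any field renaming needed to match ScaleVLN's
# pretraining input format is not shown and must be checked against
# R2R_hm3d_aug_envdrop_generated_enc.jsonl.
def json_to_jsonl(src_path, dst_path):
    with open(src_path) as f:
        items = json.load(f)
    with open(dst_path, "w") as f:
        for item in items:
            f.write(json.dumps(item) + "\n")

# Self-contained demo with dummy records.
demo = [{"instruction": "turn left at the kitchen"}, {"instruction": "stop by the door"}]
with open("demo.json", "w") as f:
    json.dump(demo, f)
json_to_jsonl("demo.json", "demo.jsonl")
with open("demo.jsonl") as f:
    lines = f.readlines()
print(len(lines))  # 2
```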

Finetuning:
To finetune on downstream tasks, modify:

  • The augmented environment path to features/internvit_6b_fov60_mp3d_panogen.hdf5
  • args.features to point to features/internvit_6b_fov60_mp3d.hdf5
  • The training script to:
    • Set feature dim to 3200
    • Use our pretrained checkpoint: model/model_step_170000.pt or your own pretrained checkpoint
    • Use our augmented data (Only for R2R finetuning): data/mantis.hm3d_round3_greedy_ndtw0.9_ranked_414k_rouge0.85.json

Citation

If you find our project useful in your research, please cite the following paper:

@article{zun2024srdf,
    author  = {Wang, Zun and Li, Jialu and Hong, Yicong and Li, Songze and Li, Kunchang and Yu, Shoubin and Wang, Yi and Qiao, Yu and Wang, Yali and Bansal, Mohit and Wang, Limin},
    title   = {Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel},
    journal = {arXiv preprint arXiv:2412.08467},
    year    = {2024},
    url     = {https://arxiv.org/abs/2412.08467}
}
