Zun Wang, Jialu Li, Yicong Hong, Songze Li, Kunchang Li, Shoubin Yu, Yi Wang, Yu Qiao, Yali Wang, Mohit Bansal, Limin Wang
Please note that the original full dataset used in our final experiments is no longer recoverable: the first author's institutional account and personal storage were accidentally and completely deleted during internal system maintenance. This deletion occurred after the author's departure (which was prior to paper submission) and before the paper was accepted.
As a result, we can only release an intermediate version of the dataset saved during the development phase. Although this version may yield slightly lower performance when training the best model (~85% SR and ~78% SPL on R2R val unseen, roughly a 1-point drop) compared to the results reported in the paper, it still significantly outperforms strong baselines such as ScaleVLN (81% SR and 70% SPL). We have also released our final pretrained model.
Available Data:
- `mantis.hm3d_round0_topk.3_enc.json` – Generated instructions via sampling on HM3D in the first round.
- `mantis.hm3d_round3_greedy_ndtw0.9_ranked_414k_rouge0.85.json` – A subset of generated instructions via greedy decoding from the final-round generator (25.7 SPICE).
Unfortunately, the generated instructions on MP3D and the final refined dataset are no longer available.
We sincerely appreciate your understanding and are happy to address any questions regarding reproducibility.
Please follow ScaleVLN to set up the environment and obtain the training source code.
We release our final pretrained model and available data here. Details:
Model:
- `model_step_170000.pt` – The final pretrained model for downstream finetuning.
Data:
- `mantis.hm3d_round0_topk.3_enc.json` – Generated instructions via sampling on HM3D in the first round.
- `mantis.hm3d_round3_greedy_ndtw0.9_ranked_414k_rouge0.85.json` – A subset of generated instructions via greedy decoding from the final-round generator (25.7 SPICE).
Features:
- `internvit_6b_fov60_mp3d.hdf5` – InternViT features on MP3D environments.
- `internvit_6b_fov60_mp3d_panogen.hdf5` – InternViT features on PanoGen-augmented MP3D environments.
- `scans_internvit_6b/` – InternViT features for all HM3D + MP3D environments.
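Before wiring these files into training, it can help to sanity-check them. The snippet below is a minimal sketch using h5py; the per-key layout (one dataset per `scan_viewpoint` pair, with one 3200-dim InternViT feature per view) follows ScaleVLN's feature convention and is an assumption here, so verify it against your copy of the data.

```python
# Minimal sanity check of a released feature file (assumed layout: one dataset
# per "scan_viewpoint" key, shaped [num_views, 3200]).
import h5py

with h5py.File("features/internvit_6b_fov60_mp3d.hdf5", "r") as f:
    keys = list(f.keys())
    print(f"{len(keys)} entries; first key: {keys[0]}")
    # Expect a 3200-dim feature per view (see the finetuning notes below).
    print("feature shape:", f[keys[0]][...].shape)
```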
Our training process follows ScaleVLN with minimal modifications.
Pretraining:
Update the pretraining config file to use:
- InternViT features from `features/scans_internvit_6b/`
- Our sampling-generated instructions: `data/mantis.hm3d_round0_topk.3_enc.json`
Empirically, training for 1–2 epochs is sufficient.
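How exactly these paths are set depends on your copy of the ScaleVLN config. As a purely illustrative sketch, a JSON-style config could be patched programmatically as below; the config path and both key names are placeholders rather than ScaleVLN's actual field names, and the instruction file is referenced in its converted JSONL form (see the note that follows).

```python
# Hypothetical config patch: "img_db_dir" and "train_instr_files" are
# placeholder keys -- substitute the fields your pretraining config actually
# uses for visual features and instruction annotations.
import json

cfg_path = "pretrain_src/config/pretrain_config.json"  # placeholder path
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["img_db_dir"] = "features/scans_internvit_6b/"
cfg["train_instr_files"] = ["data/mantis.hm3d_round0_topk.3_enc.jsonl"]

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```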
Note: you may need to convert `data/mantis.hm3d_round0_topk.3_enc.json` into a JSONL file in which each item is aligned with ScaleVLN's pretraining input format (e.g., `R2R/annotations/pretrain_map/R2R_hm3d_aug_envdrop_generated_enc.jsonl` in ScaleVLN's processed data); see the sketch below.
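A minimal conversion sketch follows. Writing one JSON object per line is the only part guaranteed here; the per-item field names are assumptions, so compare a sample item against `R2R_hm3d_aug_envdrop_generated_enc.jsonl` and remap keys as needed.

```python
# Convert the released JSON list into a JSONL file (one sample per line).
import json

with open("data/mantis.hm3d_round0_topk.3_enc.json") as f:
    items = json.load(f)

with open("data/mantis.hm3d_round0_topk.3_enc.jsonl", "w") as f:
    for item in items:
        # If the schemas differ, remap keys here (e.g., instruction text,
        # instruction encoding, scan id, viewpoint path) before dumping.
        f.write(json.dumps(item) + "\n")
```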
Finetuning:
To finetune on downstream tasks, modify:
- The augmented environment path to `features/internvit_6b_fov60_mp3d_panogen.hdf5`
- `args.features` to take in `features/internvit_6b_fov60_mp3d.hdf5`
- The training script to:
  - Set the feature dim to 3200
  - Use our pretrained checkpoint `model/model_step_170000.pt` or your own pretrained checkpoint
  - Use our augmented data (only for R2R finetuning): `data/mantis.hm3d_round3_greedy_ndtw0.9_ranked_414k_rouge0.85.json`
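Before launching finetuning, it may be worth confirming that the checkpoint loads and that its image-embedding parameters match the 3200-dim InternViT features. The sketch below makes assumptions about the checkpoint's wrapping and parameter naming (`model`, `img`/`visual`), so inspect the actual keys on your copy.

```python
# Load the pretrained checkpoint and print shapes of likely visual-embedding
# parameters; the key/name patterns here are guesses, not ScaleVLN's exact names.
import torch

ckpt = torch.load("model/model_step_170000.pt", map_location="cpu")
# Checkpoints are often wrapped in a dict; unwrap a "model" entry if present.
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
for name, value in state_dict.items():
    if hasattr(value, "shape") and ("img" in name or "visual" in name):
        print(name, tuple(value.shape))
```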
If you find our project useful in your research, please cite the following paper:
@article{zun2024srdf,
  author  = {Wang, Zun and Li, Jialu and Hong, Yicong and Li, Songze and Li, Kunchang and Yu, Shoubin and Wang, Yi and Qiao, Yu and Wang, Yali and Bansal, Mohit and Wang, Limin},
  title   = {Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel},
  journal = {arXiv preprint arXiv:2412.08467},
  year    = {2024},
  url     = {https://arxiv.org/abs/2412.08467}
}