Final multimodal PR with our recent developments on MM side #8127

Merged · 630 commits · Jan 20, 2024

Commits
503301b
Hotfix (#7501) (#7568)
github-actions[bot] Oct 11, 2023
98e6ffe
Avoid duplicated checkpoint save (#7555) (#7566)
github-actions[bot] Oct 11, 2023
b6fecc5
Cache FP8 weight and transpose only at the first micro-batch in each …
github-actions[bot] Oct 11, 2023
292d232
Add an option to disable manual GC in validation (#7467) (#7476)
github-actions[bot] Oct 11, 2023
9c48ce1
Remove PUBLICATIONS.md, point to github.io NeMo page instead (#7694) …
github-actions[bot] Oct 11, 2023
762b5ca
Fix multi rank finetune for ASR (#7684) (#7699)
github-actions[bot] Oct 11, 2023
7755c17
Update docs: readme, getting started, ASR intro (#7679)
erastorgueva-nv Oct 11, 2023
5f35a8c
fix onnx (#7703) (#7704)
github-actions[bot] Oct 12, 2023
29910cd
move core install to /workspace (#7706)
aklife97 Oct 12, 2023
aa3a977
Fix typo in audio codec config, encoder target (#7697)
anteju Oct 12, 2023
eab0f54
Replace strategy='dp'/None with 'auto' (#7681) (#7696)
github-actions[bot] Oct 13, 2023
233e62b
[ASR] Multichannel mask estimator with flex number of channels (#7317)
anteju Oct 13, 2023
3cd9fbd
fix ptl_bugs in slu_models.py (#7689) (#7712)
github-actions[bot] Oct 13, 2023
ddf546d
fix code block typo (#7717)
erastorgueva-nv Oct 13, 2023
ff7154d
Update key mapping logic
Victor49152 Oct 16, 2023
f73180d
Merge branch 'main' into internal/main
yaoyu-33 Oct 16, 2023
0087ee3
Few merge fixes
yaoyu-33 Oct 16, 2023
8bdbd47
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 16, 2023
7be8108
Fix diff for non-mm models
yaoyu-33 Oct 16, 2023
aab3c40
Fix diff for non-mm models
yaoyu-33 Oct 16, 2023
38dc290
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 16, 2023
563cadb
Remove deployment and export scripts
yaoyu-33 Oct 16, 2023
9a566be
Improve the unet ckpt loading logic.
Victor49152 Oct 16, 2023
7a0ae36
Improve the unet ckpt loading logic.
Victor49152 Oct 16, 2023
576c652
Add checkpoint_averaging script
yaoyu-33 Oct 17, 2023
d6900f9
Hide multimodal code changes
yaoyu-33 Oct 17, 2023
3b1b802
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 17, 2023
526924d
Merge branch 'main' into multimodal_merge
ericharper Oct 19, 2023
a1f7296
Fix Eric's comments
yaoyu-33 Oct 23, 2023
41632c6
Revert "Hide multimodal code changes"
yaoyu-33 Oct 23, 2023
f40b56e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 23, 2023
c032a6d
Merge branch 'multimodal/merge_mm_code' into internal/main
yaoyu-33 Oct 24, 2023
ec8256b
Fix configs
yaoyu-33 Oct 24, 2023
5dad277
Fix neva model
yaoyu-33 Oct 24, 2023
c1c5981
Fix neva casting
yaoyu-33 Oct 24, 2023
b0c5320
Fix neva LoRA non MCore version
yaoyu-33 Oct 25, 2023
14cf3bd
Merge branch 'main' into multimodal_merge
ericharper Oct 25, 2023
4e178e3
Fix neva LoRA MCore
yaoyu-33 Oct 25, 2023
cacf9a8
[SD] group norm fixes
sjmikler Oct 25, 2023
2da64db
Fix neva cfg merge
yaoyu-33 Oct 26, 2023
fba2548
remove groupnorm dependency
suiyoubi Oct 27, 2023
a2da20d
Merge branch 'main' into multimodal_merge
ericharper Oct 30, 2023
41b1b51
Fix copyright headers
yaoyu-33 Oct 30, 2023
438617e
Merge branch 'aot/apex_gn' into 'internal/main'
Oct 30, 2023
7422dbe
LLaVA 1_5 and LORA update
Oct 30, 2023
de405b9
Merge branch 'yuya/llava_1_5_update' into 'internal/main'
Oct 30, 2023
5965a5f
Fix logs
yaoyu-33 Oct 30, 2023
26ee7dc
Fix neva mcore inference
yaoyu-33 Oct 31, 2023
7356b1c
Fix ema
yaoyu-33 Oct 31, 2023
93e4f99
Fix ema
yaoyu-33 Oct 31, 2023
ca3d8f9
Address Somshubra comments
yaoyu-33 Nov 1, 2023
544e5ea
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 1, 2023
8493a8a
Fix NeVA
yaoyu-33 Nov 1, 2023
ea3d4fc
Remove llama tricks since we are padding the embedding weights direct…
yaoyu-33 Nov 1, 2023
2d5f5ab
Merge branch 'multimodal/merge' into multimodal/merge_mm_code
yaoyu-33 Nov 1, 2023
6f5df3f
Update Dockerfile and mm requirements
meatybobby Nov 1, 2023
65bcec3
Merge branch 'bobchen/nemo_toolkit' into 'internal/main'
Nov 1, 2023
4dff83f
Multimodal unit and jenkins tests
Nov 1, 2023
02cc05d
Merge branch 'mm_tests' into 'internal/main'
Nov 1, 2023
724c956
Add Multimodal Docs
Nov 1, 2023
4951f4f
Merge branch 'mm_docs' into 'internal/main'
Nov 1, 2023
6beaa50
update default conv_template
yaoyu-33 Nov 1, 2023
2f4e334
Merge branch 'internal/main' into multimodal/merge_mm_code
yaoyu-33 Nov 1, 2023
c083f0f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 1, 2023
367723f
Merge branch 'main' into multimodal_merge
ericharper Nov 1, 2023
2840014
Fix neva evaluation
yaoyu-33 Nov 1, 2023
97d9bf9
Update Dockerfile
yaoyu-33 Nov 1, 2023
9173dc2
Merge branch 'internal/main' into multimodal/merge_mm_code
yaoyu-33 Nov 1, 2023
149cdde
Merge branch 'main' into multimodal_merge
ericharper Nov 2, 2023
6b84cef
Fix evaluation loading
yaoyu-33 Nov 2, 2023
ccd6cb5
Fix evaluation API
yaoyu-33 Nov 2, 2023
e0a74da
Merge branch 'internal/main' into multimodal/merge_mm_code
yaoyu-33 Nov 2, 2023
85bd797
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 2, 2023
9b4c9c2
Change quick-gelu to approx-gelu
yaoyu-33 Nov 2, 2023
e2ccc88
hide multimodal
yaoyu-33 Nov 2, 2023
1057139
Merge branch 'multimodal/merge' into multimodal/merge_mm_code
yaoyu-33 Nov 2, 2023
7ed6283
Revert "hide multimodal"
yaoyu-33 Nov 2, 2023
f6ef703
Restructure
yaoyu-33 Nov 2, 2023
9751d10
Restructure again
yaoyu-33 Nov 3, 2023
40105ae
Update neva evaluation code
yaoyu-33 Nov 3, 2023
9ac6102
Update neva evaluation code
yaoyu-33 Nov 3, 2023
d4fe16c
Merge branch 'internal/main_change_structure' into multimodal/merge_m…
yaoyu-33 Nov 3, 2023
488d7e9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 3, 2023
3d5de20
Remove package requirement
meatybobby Nov 3, 2023
2f29c5d
Merge branch 'main' into multimodal/merge_mm_code
yaoyu-33 Nov 3, 2023
0e9c30c
Fix neva model after merging
yaoyu-33 Nov 3, 2023
f68ba2c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 3, 2023
5df0c40
Restructure
yaoyu-33 Nov 6, 2023
b1555a6
Restructure, rename
yaoyu-33 Nov 6, 2023
71141c5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 6, 2023
87b724e
Restructure
yaoyu-33 Nov 6, 2023
e9ba432
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 6, 2023
76df1d8
Merge branch 'main' into multimodal/merge_mm_code
yaoyu-33 Nov 6, 2023
82e7254
Merge branch 'bobchen/nemo_toolkit' into 'internal/main'
Nov 6, 2023
b49f12b
Remove package requirement
meatybobby Nov 3, 2023
d2c200c
hide docs and artifacts
yaoyu-33 Nov 6, 2023
8007765
Merge remote-tracking branch 'github/multimodal/merge_mm_code' into m…
yaoyu-33 Nov 6, 2023
72c683e
Rename Nerf
yaoyu-33 Nov 7, 2023
782316f
Hide Nerf and text to image
yaoyu-33 Nov 7, 2023
d24f74d
Merge branch 'main' into multimodal/merge_mm_code
ericharper Nov 10, 2023
28b083d
Add lora support to NeMo SD and Dreambooth
Victor49152 Nov 13, 2023
29b9b1d
Merge branch 'mingyuanm/lora' into 'internal/main'
Victor49152 Nov 13, 2023
20b2f57
Fix some mapping with inductor
Victor49152 Nov 14, 2023
cac6dc5
Mingyuanm/merge mlperf with main
Victor49152 Nov 14, 2023
a21f196
Merge branch 'mingyuanm/merge_mlperf_with_main' into 'internal/main'
Victor49152 Nov 14, 2023
ad54327
NeVA v1.5 Fixes Merge
Nov 15, 2023
67b0a7d
Merge branch 'yuya/neva_v1_5' into 'internal/main'
Nov 15, 2023
66d42be
Merge branch 'main' into multimodal/merge_mm_code
ericharper Nov 16, 2023
4b24a10
NeVA plain template fix
yaoyu-33 Nov 16, 2023
c8dd7e3
Update examples/multimodal/multimodal_llm/neva/convert_hf_llava_to_ne…
yaoyu-33 Nov 16, 2023
565e617
Update examples/multimodal/multimodal_llm/neva/convert_hf_llava_to_ne…
yaoyu-33 Nov 16, 2023
fda2cd0
Revert clip_grads.py from mlperf merge since it breaks models with TP
Victor49152 Nov 16, 2023
fd1ada8
Fix PR comments, clean comments, move to torch_dtype_from_precision
yaoyu-33 Nov 16, 2023
bccb0ea
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 16, 2023
d596f59
Update to torch_dtype_from_precision
yaoyu-33 Nov 16, 2023
ed9145c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 16, 2023
57aa390
Revert "NeVA plain template fix"
yaoyu-33 Nov 17, 2023
ee4e398
Update gradio server and cli
yaoyu-33 Nov 17, 2023
448d9cb
KQVA adapter config were deleted by mistake
Victor49152 Nov 18, 2023
084ff69
Merge branch 'main' into multimodal/merge_mm_code
ericharper Nov 21, 2023
993f969
Merge branch 'main' into multimodal/merge_mm_code
ericharper Nov 27, 2023
2d3a6b7
Fix PR comments
yaoyu-33 Dec 4, 2023
00ef2b4
Fix copyright and docstrings
yaoyu-33 Dec 4, 2023
0ccf916
Update docstrings
yaoyu-33 Dec 4, 2023
3574590
Optimize imports
yaoyu-33 Dec 4, 2023
90d08a8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 4, 2023
e14f713
Revert "Hide Nerf and text to image"
yaoyu-33 Dec 4, 2023
4d94fef
Add copyright information
yaoyu-33 Dec 4, 2023
50e8871
Optimize imports
yaoyu-33 Dec 4, 2023
2ce8e36
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 4, 2023
96b20b2
Merge branch 'multimodal/merge_mm_code' into multimodal/merge_mm_text…
yaoyu-33 Dec 4, 2023
2418283
Optimize imports
yaoyu-33 Dec 4, 2023
cd56c9e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 4, 2023
09c9904
update dreambooth_lora to support train text encoder with lora
Zhuoyao1012 Dec 4, 2023
9e19b07
Move the prefix stripping to default on all ckpt formats
Victor49152 Dec 5, 2023
94688e1
[SD] Add offline clip code-branch
Dec 5, 2023
f8f9901
Merge branch 'smikler/sd_restore_offline_clip' into 'internal/main'
Victor49152 Dec 5, 2023
a9fb5f9
[SD] pipeline and evaluation improvements
mwawrzos Dec 5, 2023
fe5efd4
Update clip_score calculation script
Victor49152 Dec 6, 2023
b420a03
Fix prefix stripping logic
Victor49152 Dec 8, 2023
a30db1c
Add Dino ViT Support in NeMo
Dec 11, 2023
2110f26
Merge branch 'yuya/add_dino_vit' into 'internal/main'
Dec 11, 2023
9e99e97
Rename inference field
yaoyu-33 Dec 12, 2023
78cac03
Fix setup tool issue
meatybobby Dec 12, 2023
63ad92a
Merge branch 'bobchen/fix_setup' into 'internal/main'
Dec 12, 2023
c9da4b2
Fix megatron partition change
yaoyu-33 Dec 13, 2023
63304d3
Merge branch 'main' into multimodal/merge_mm_text2img_nerf
yaoyu-33 Dec 13, 2023
a0b4861
Update multimodal docs and tests
yaoyu-33 Dec 13, 2023
660f657
Update multimodal jenkins
yaoyu-33 Dec 13, 2023
490cd1b
Merge branch 'main' into multimodal/merge_mm_text2img_nerf
yaoyu-33 Dec 13, 2023
a1ae609
Update unit test
yaoyu-33 Dec 13, 2023
4cf447c
Update docs
yaoyu-33 Dec 13, 2023
94ef0fb
Update docs
yaoyu-33 Dec 13, 2023
ed97b87
Fix peft config
yaoyu-33 Dec 14, 2023
7165ada
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 14, 2023
b677858
Merge branch 'main' into multimodal/merge_mm_text2img_nerf
ericharper Dec 15, 2023
5835c8a
Address comments
yaoyu-33 Dec 15, 2023
5ef5dc2
Merge branch 'multimodal/merge_mm_text2img_nerf' into multimodal/merg…
yaoyu-33 Dec 15, 2023
cf6a075
Bug fix due to restructure
yaoyu-33 Dec 15, 2023
eae4edc
Fix unit test
yaoyu-33 Dec 15, 2023
76d07fc
Bug fix due to restructure
yaoyu-33 Dec 15, 2023
21df321
remove color map detector
Victor49152 Dec 15, 2023
bdf1dc6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 15, 2023
032a819
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 15, 2023
77f3b40
Merge remote-tracking branch 'github/multimodal/merge_mm_text2img_ner…
yaoyu-33 Dec 15, 2023
28102bc
Merge remote-tracking branch 'github/multimodal/merge_mm_docs_tests' …
yaoyu-33 Dec 15, 2023
a26caa2
Dreambooth loading fix
yaoyu-33 Dec 16, 2023
c9a2d6c
Fix Jenkinsfile
yaoyu-33 Dec 18, 2023
01d1862
copyright
yaoyu-33 Dec 18, 2023
d0b30ef
Fix jenkins tests for sd/controlnet/dreambooth
Victor49152 Dec 18, 2023
7ed9491
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 18, 2023
a39bf94
Merge branch 'main' into multimodal/merge_mm_text2img_nerf
ericharper Dec 19, 2023
7165348
Merge branch 'multimodal/merge_mm_text2img_nerf' into multimodal/merg…
ericharper Jan 2, 2024
b389f1f
Merge branch 'internal/main' into multimodal/merge_mm_dev_update
yaoyu-33 Jan 4, 2024
ca5d91e
Multimodal merge bug fixes
yaoyu-33 Jan 4, 2024
1e31563
Update NeMo Multimodal Tutorials
chiachihchen Jan 4, 2024
dd28263
Merge branch 'chiachihchen/mm_doc' into 'internal/main'
Victor49152 Jan 4, 2024
849a696
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 4, 2024
2533943
Alit/neva strategy cleanup
JRD971000 Jan 4, 2024
608dddf
Merge branch 'alit/neva_strategy_cleanup' into 'internal/main'
Jan 4, 2024
c9fd5a1
Merge branch 'internal/main' into multimodal/merge_mm_dev_update
yaoyu-33 Jan 8, 2024
0550036
Merge remote-tracking branch 'github/multimodal/merge_mm_dev_update' …
yaoyu-33 Jan 8, 2024
1d36156
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 8, 2024
3887a76
neva api bug fix
yaoyu-33 Jan 9, 2024
1664e03
api change bug fixes
yaoyu-33 Jan 9, 2024
9f80ad9
update mcore api in Neva
yaoyu-33 Jan 9, 2024
4f125d4
Merge branch 'main' into multimodal/merge_mm_dev_update
yaoyu-33 Jan 9, 2024
084129c
Update Neva MCore api
yaoyu-33 Jan 9, 2024
5724c95
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 9, 2024
71978f1
Revert to do batch=1 during neva inference to avoid oom on large eval
yaoyu-33 Jan 10, 2024
a48e7bc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 10, 2024
5688ea7
Merge branch 'main' into multimodal/merge_mm_docs_tests
yaoyu-33 Jan 11, 2024
8dc6835
Merge branch 'multimodal/merge_mm_docs_tests' into multimodal/merge_m…
yaoyu-33 Jan 11, 2024
f810c42
code scan clean
yaoyu-33 Jan 11, 2024
d4d44a9
Fix Jenkinsfile
yaoyu-33 Jan 11, 2024
866c6dc
Fix copyright
yaoyu-33 Jan 11, 2024
10bd961
Update requirements
yaoyu-33 Jan 11, 2024
4c3c9bc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 11, 2024
7966fd1
Remove git versioning in requirements_test.txt
yaoyu-33 Jan 11, 2024
bd7705c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 11, 2024
716f912
Update requirements
yaoyu-33 Jan 11, 2024
0ff66c2
Fix imports
yaoyu-33 Jan 11, 2024
fd3609e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 11, 2024
ced5795
Fix imports
yaoyu-33 Jan 11, 2024
1e3856f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 11, 2024
37fa699
Fix imports
yaoyu-33 Jan 11, 2024
0a8ea16
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 12, 2024
0a39a55
Add a guard to warn users always to use spatial transformer
Victor49152 Jan 5, 2024
dc1f064
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 12, 2024
d08d844
Fix logging
yaoyu-33 Jan 12, 2024
5a84a5e
Fix jenkins
yaoyu-33 Jan 12, 2024
c22cc9f
Hide multimodal unit test
yaoyu-33 Jan 12, 2024
408d539
Remove flash attn requirement
yaoyu-33 Jan 12, 2024
e37c8c5
Hide vision test
yaoyu-33 Jan 12, 2024
04b234e
Fix requirements for multimodal
yaoyu-33 Jan 12, 2024
9025ffa
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 12, 2024
3232d42
Fix requirements for multimodal
yaoyu-33 Jan 12, 2024
b2adaa6
Merge branch 'multimodal/merge_mm_docs_tests' into multimodal/merge_m…
yaoyu-33 Jan 12, 2024
ba250bc
Fix readme
yaoyu-33 Jan 12, 2024
6761b59
copyright
yaoyu-33 Jan 12, 2024
d608a2f
Merge branch 'multimodal/merge_mm_docs_tests' into multimodal/merge_m…
yaoyu-33 Jan 12, 2024
a1d21bf
code clean
yaoyu-33 Jan 12, 2024
2c978ad
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 12, 2024
a63f38d
Mv Multimodal jenkins earlier
yaoyu-33 Jan 12, 2024
ec05686
Fix controlnet
yaoyu-33 Jan 12, 2024
89623cd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 12, 2024
3140877
Remove some requirements
yaoyu-33 Jan 12, 2024
311cd49
Don't use flash attn in jenkins
yaoyu-33 Jan 12, 2024
65770a1
Turn off flash-attn
yaoyu-33 Jan 13, 2024
6493d83
Merge branch 'multimodal/merge_mm_docs_tests' into multimodal/merge_m…
yaoyu-33 Jan 13, 2024
9fc77b4
Hide MM jenkins tests
yaoyu-33 Jan 13, 2024
f0b5bce
Hide MM jenkins tests
yaoyu-33 Jan 13, 2024
bfaade8
replaced pymeshlab library with trimesh
ahmadki Jan 16, 2024
914c7ed
dropped pymeshlab from multimodal requirements file
ahmadki Jan 16, 2024
be4c0db
Merge branch 'multimodal/merge_mm_docs_tests' into multimodal/merge_m…
yaoyu-33 Jan 16, 2024
a3d0ac3
Update imageio requirements
yaoyu-33 Jan 16, 2024
93e7316
Merge branch 'multimodal/merge_mm_docs_tests' into multimodal/merge_m…
yaoyu-33 Jan 16, 2024
981efe2
path fix in playbook
yaoyu-33 Jan 16, 2024
88f9dda
Merge branch 'main' into multimodal/merge_mm_dev_update
yaoyu-33 Jan 17, 2024
83b2203
Merge branch 'main' into multimodal/merge_mm_dev_update
yaoyu-33 Jan 17, 2024
94cb663
Fix generation code
yaoyu-33 Jan 17, 2024
f8d909f
clean ups
yaoyu-33 Jan 19, 2024
fe77f9e
Merge branch 'main' into multimodal/merge_mm_dev_update
ericharper Jan 19, 2024
d6070a0
Merge branch 'main' into multimodal/merge_mm_dev_update
ericharper Jan 19, 2024
688b85d
Minor fixes for Neva inference
yaoyu-33 Jan 19, 2024
de3c35a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 19, 2024
2e8d324
Minor fixes neva template again
yaoyu-33 Jan 19, 2024
0a7a03a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 19, 2024
8918b33
Doc update
yaoyu-33 Jan 19, 2024
89ed211
Code scan fix
yaoyu-33 Jan 19, 2024
4 changes: 2 additions & 2 deletions docs/source/multimodal/api.rst
@@ -10,7 +10,7 @@ Model Classes
:members: __init__, configure_optimizers


.. autoclass:: nemo.collections.multimodal.models.stable_diffusion.ldm.ddpm.MegatronLatentDiffusion
.. autoclass:: nemo.collections.multimodal.models.text_to_image.stable_diffusion.ldm.ddpm.MegatronLatentDiffusion
:show-inheritance:
:no-members:
:members: __init__, training_step, validation_step, setup, build_train_valid_test_datasets
@@ -49,7 +49,7 @@ Modules
:show-inheritance:
:no-members:

.. autoclass:: nemo.collections.multimodal.models.stable_diffusion.ldm.autoencoder.AutoencoderKL
.. autoclass:: nemo.collections.multimodal.models.text_to_image.stable_diffusion.ldm.autoencoder.AutoencoderKL
:show-inheritance:
:no-members:
:members: __init__, encode, decode
2 changes: 1 addition & 1 deletion docs/source/multimodal/mllm/checkpoint.rst
@@ -108,7 +108,7 @@ Adjust model parallelism with:
--target_tensor_model_parallel_size=??? \
--pipeline_model_parallel_size=??? \
--target_pipeline_model_parallel_size=??? \
--model_class="nemo.collections.multimodal.models.neva.neva_model.MegatronNevaModel" \
--model_class="nemo.collections.multimodal.models.multimodal_llm.neva.neva_model.MegatronNevaModel" \
--precision=32 \
--tokenizer_model_path=/path/to/tokenizer.model \
--tp_conversion_only
2 changes: 1 addition & 1 deletion docs/source/multimodal/text2img/insp2p.rst
@@ -6,7 +6,7 @@ Model Introduction

InstructPix2Pix [InstructPix2Pix]_ :cite:`mm-models-insp2p` offers a unique approach to image editing using human-written instructions. Given an input image and a textual directive, the model adjusts the image according to the provided instructions. NeMo Multimodal presents a training pipeline for this conditional diffusion model, utilizing a dataset generated by harnessing the strengths of two prominent pretrained models: a language model (GPT-3) and a text-to-image model (Stable Diffusion). The InstructPix2Pix model operates swiftly, editing images within seconds, eliminating the need for per-example fine-tuning or inversion. It has demonstrated remarkable results across a wide variety of input images and written instructions.

Built upon the Stable Diffusion framework, NeMo's InstructPix2Pix shares a similar architecture with Stable Diffusion (refer to :doc:`Stable Diffusion <./sd>`). What sets it apart is its unique training dataset and the combined guidance from both image and text prompts. Specifically, InstructPix2pix ::class::``nemo.collections.multimodal.models.instruct_pix2pix.ldm.ddpm_edit.MegatronLatentDiffusionEdit`` is derived directly from Stable Diffusion's ::class::``nemo.collections.multimodal.models.stable_diffusion.ldm.ddpm.MegatronLatentDiffusion``, with alterations to accommodate the dataset and provide support for dual guidance.
Built upon the Stable Diffusion framework, NeMo's InstructPix2Pix shares a similar architecture with Stable Diffusion (refer to :doc:`Stable Diffusion <./sd>`). What sets it apart is its unique training dataset and the combined guidance from both image and text prompts. Specifically, InstructPix2pix ::class::``nemo.collections.multimodal.models.instruct_pix2pix.ldm.ddpm_edit.MegatronLatentDiffusionEdit`` is derived directly from Stable Diffusion's ::class::``nemo.collections.multimodal.models.text_to_image.stable_diffusion.ldm.ddpm.MegatronLatentDiffusion``, with alterations to accommodate the dataset and provide support for dual guidance.
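
A minimal sketch of the relationship described above (the class names come from the paths in this document; the method bodies are illustrative only and are not NeMo's actual implementation):

class MegatronLatentDiffusion:
    """Stands in for the base text-to-image latent diffusion model."""

    def get_conditioning(self, batch):
        # The base model conditions generation on the text prompt only.
        return {"text": batch["caption"]}


class MegatronLatentDiffusionEdit(MegatronLatentDiffusion):
    """Stands in for the InstructPix2Pix variant derived from the base class."""

    def get_conditioning(self, batch):
        # Dual guidance: keep the text conditioning and add the input image.
        cond = super().get_conditioning(batch)
        cond["image"] = batch["input_image"]
        return cond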

Training Dataset
--------------------
2 changes: 1 addition & 1 deletion docs/source/multimodal/text2img/sd.rst
@@ -33,7 +33,7 @@ The VAE configuration is defined under **first_stage_config**.
.. code-block:: yaml

first_stage_config:
_target_: nemo.collections.multimodal.models.stable_diffusion.ldm.autoencoder.AutoencoderKL
_target_: nemo.collections.multimodal.models.text_to_image.stable_diffusion.ldm.autoencoder.AutoencoderKL
from_pretrained: /path/to/vae.bin
embed_dim: 4
monitor: val/rec_loss
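
To illustrate how a Hydra-style _target_ entry like the one above can be resolved, here is a short sketch; it assumes only OmegaConf and a standard dotted import path, and it is not NeMo's own instantiation helper:

from importlib import import_module

from omegaconf import OmegaConf

# Config fragment mirroring the first_stage_config shown above.
cfg = OmegaConf.create(
    {
        "first_stage_config": {
            "_target_": "nemo.collections.multimodal.models.text_to_image.stable_diffusion.ldm.autoencoder.AutoencoderKL",
            "embed_dim": 4,
        }
    }
)

# Resolve the dotted _target_ path to the class object.
module_path, class_name = cfg.first_stage_config["_target_"].rsplit(".", 1)
autoencoder_cls = getattr(import_module(module_path), class_name)
# autoencoder_cls can then be constructed with the remaining keys of first_stage_config.
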
@@ -11,6 +11,7 @@ inference:
compute_logprob: False # a flag used to compute logprob of all the input text, a very special case of running inference, default False
end_strings: ["<extra_id_1>","<extra_id_7>",] # generation will stop when one of these tokens is generated
images_base_path: /pwd/images
insert_image_token: null # `left` or `right` or `null`

trainer:
devices: 8
@@ -24,7 +25,7 @@ tensor_model_parallel_size: 8
pipeline_model_parallel_size: 1
pipeline_model_parallel_split_rank: 0 # used for encoder and decoder model (0 for others)
neva_model_file: /pwd/nemo_experiments/nemo_llava.nemo #neva_22b_tp8_finetuned_v1.nemo neva_8b_tp4_finetuned_v1.nemo
llm_model_file: null
base_model_file: null
checkpoint_dir: null #/pwd/nemo_multimodal/nemo_experiments/nemo_llava_finetune/checkpoints # checkpoint file dir. This is used to load the PTL checkpoint generated during the Kosmos training
checkpoint_name: null #megatron_clip--val_loss=0.41-step=13499-consumed_samples=431904.0.ckpt # PTL checkpoint file name, only used for PTL checkpoint loading
hparams_file: null #/pwd/nemo_multimodal/nemo_experiments/nemo_llava_finetune/version_0/hparams.yaml # model configuration file, only used for PTL checkpoint loading
@@ -209,7 +209,7 @@ model:

optim:
name: fused_adam
lr: 2e-5
lr: 2e-4
weight_decay: 0.
betas:
- 0.9
@@ -18,7 +18,8 @@
python convert_hf_llava_to_neva.py \
--in-file <path_to_hf_checkpoints_folder> \
--out-file <path_to_output_nemo_file> \
--tokenizer-model <path_to_sp_tokenizer_model>
--tokenizer-model <path_to_sp_tokenizer_model> \
--conv-template llama_2 # nvgpt, llama_2, v1 (vicuna)
"""

import os
@@ -49,6 +50,13 @@ def get_args():
"--in-file", type=str, default=None, required=True, help="Path to Huggingface LLaMA checkpoints",
)
parser.add_argument("--out-file", type=str, default=None, required=True, help="Path to output .nemo file.")
parser.add_argument(
"--conv-template",
type=str,
default="llama_2",
required=False,
help="Conversation template: nvgpt, llama_2, v1 (vicuna)",
)
parser.add_argument(
"--tokenizer-model", type=str, default=None, required=False, help="Path to sentencepiece tokenizer model."
)
@@ -121,6 +129,8 @@ def load_config(args, llava_config):
nemo_config.num_query_groups = llava_config['num_key_value_heads']
nemo_config.use_cpu_initialization = True
nemo_config.activation = 'fast-swiglu'
nemo_config.data.conv_template = args.conv_template
nemo_config.mm_cfg.model_type = args.conv_template
if args.tokenizer_model is None:
nemo_config.tokenizer.model = llava_config['tokenizer_model']
else:
41 changes: 41 additions & 0 deletions examples/multimodal/multimodal_llm/neva/eval/gradio_cli.py
@@ -0,0 +1,41 @@
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import base64

import requests

# URL of the Gradio server
url = 'http://localhost:8890/api/predict/'

# Prepare the text data
text_data = '<image>Describe this image please.'

# Prepare the image data
with open("/path/to/images/001.jpg", "rb") as image_file:
encoded_string = base64.b64encode(image_file.read()).decode()

# Data to send
data = {'data': [text_data, encoded_string]}

# Sending a POST request to the Gradio server
response = requests.post(url, json=data)

# Checking if the request was successful
if response.status_code == 200:
# Parsing the response
response_data = response.json()
print("Response from server:", response_data)
else:
print("Failed to get a response from the server, status code:", response.status_code)
108 changes: 108 additions & 0 deletions examples/multimodal/multimodal_llm/neva/eval/gradio_server.py
@@ -0,0 +1,108 @@
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import base64
import io

import gradio as gr
import PIL.Image
from omegaconf import OmegaConf

from nemo.collections.multimodal.parts.utils import create_neva_model_and_processor
from nemo.collections.nlp.modules.common.transformer.text_generation import LengthParam, SamplingParam

CFG_STRING = """
trainer:
devices: 1
num_nodes: 1
accelerator: gpu
logger: False # logger provided by exp_manager
precision: bf16 # 16, 32, or bf16

inference:
greedy: False # Whether or not to use sampling ; use greedy decoding otherwise
top_k: 0 # The number of highest probability vocabulary tokens to keep for top-k-filtering.
top_p: 0.9 # If set to float < 1, only the most probable tokens with probabilities that add up to top_p or higher are kept for generation.
temperature: 0.2 # sampling temperature
add_BOS: False # add the bos token at the begining of the prompt
tokens_to_generate: 256 # The minimum length of the sequence to be generated.
all_probs: False # whether return the log prob for all the tokens in vocab
repetition_penalty: 1.2 # The parameter for repetition penalty. 1.0 means no penalty.
min_tokens_to_generate: 0 # The minimum length of the sequence to be generated.
compute_logprob: False # a flag used to compute logprob of all the input text, a very special case of running inference, default False
end_strings: ["<extra_id_1>","<extra_id_7>",] # generation will stop when one of these tokens is generated
images_base_path: /pwd/images
insert_image_token: null # `left` or `right` or `null`

cluster_type: BCP
tensor_model_parallel_size: 1
pipeline_model_parallel_size: 1
pipeline_model_parallel_split_rank: 0 # used for encoder and decoder model (0 for others)

neva_model_file: /pwd/nemo_experiments/nemo_llava.nemo #neva_22b_tp8_finetuned_v1.nemo neva_8b_tp4_finetuned_v1.nemo
base_model_file: null
checkpoint_dir: null #/pwd/nemo_multimodal/nemo_experiments/nemo_llava_finetune/checkpoints # checkpoint file dir. This is used to load the PTL checkpoint generated during the Kosmos training
checkpoint_name: null #megatron_clip--val_loss=0.41-step=13499-consumed_samples=431904.0.ckpt # PTL checkpoint file name, only used for PTL checkpoint loading
hparams_file: null #/pwd/nemo_multimodal/nemo_experiments/nemo_llava_finetune/version_0/hparams.yaml # model configuration file, only used for PTL checkpoint loading
"""

cfg = OmegaConf.create(CFG_STRING)
cfg.neva_model_file = "/path/to/llava-v1.5-7b.nemo"
model, image_processor = create_neva_model_and_processor(cfg)


def predict(prompt, image_base64=None):
input_data = {"prompt": prompt}
if image_base64 is not None:
image_data = base64.b64decode(image_base64)
# image = PIL.Image.fromarray(image)
image = PIL.Image.open(io.BytesIO(image_data))
input_data["image"] = image_processor(image)

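    # Map the inference section of the config onto the length and sampling
    # parameter dicts that are passed to model.generate() below.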
length_params: LengthParam = {
"max_length": cfg.inference.tokens_to_generate,
"min_length": cfg.inference.min_tokens_to_generate,
}
sampling_params: SamplingParam = {
"use_greedy": cfg.inference.greedy,
"temperature": cfg.inference.temperature,
"top_k": cfg.inference.top_k,
"top_p": cfg.inference.top_p,
"repetition_penalty": cfg.inference.repetition_penalty,
"add_BOS": cfg.inference.add_BOS,
"all_probs": cfg.inference.all_probs,
"compute_logprob": cfg.inference.compute_logprob,
"end_strings": cfg.inference.end_strings,
}

# Generate model responses
responses = model.generate(
input_prompts=[input_data], # Adjust based on your model's requirements
length_params=length_params, # Define these parameters as in your original code
sampling_params=sampling_params, # Define these parameters as in your original code
inference_config=cfg,
)

return responses[0]["clean_response"]


iface = gr.Interface(
fn=predict,
inputs=[gr.Textbox(), gr.Textbox()],
outputs="text",
title="Multimodal Model Inference",
description="Enter a prompt and optionally upload an image for model inference.",
)

if __name__ == "__main__":
iface.launch(server_port=8890, share=False)