Draft
239 commits
39289e3
resolve merge conflicts
ariG23498 May 13, 2025
a23a303
merging benchmark
ariG23498 May 13, 2025
259f146
resolving merge conflicts
ariG23498 May 20, 2025
f5a278b
Merge branch 'main' into kv-cache
ariG23498 May 22, 2025
11a9110
chore: implementing KV Cache
ariG23498 May 22, 2025
20f3a25
adding cache to generate
ariG23498 May 22, 2025
08e512a
position ids for kv-cache (#71)
kashif May 23, 2025
dc61269
Update models/vision_language_model.py
ariG23498 May 23, 2025
38e30be
Update models/vision_language_model.py
ariG23498 May 23, 2025
f40bfa5
remove duplicated benchmark and review suggestions
ariG23498 May 23, 2025
4690229
joao's review
ariG23498 May 23, 2025
d32f6cf
fix self.decoder() call
kashif May 23, 2025
87895e1
add back new line
kashif May 23, 2025
73027c8
fix call to generate
kashif May 23, 2025
d1014a9
Merge branch 'main' into kv-cache
kashif May 23, 2025
5ba4bec
fix script
kashif May 23, 2025
d3abf67
Update models/vision_language_model.py
kashif May 26, 2025
35ec3a4
tests
andimarafioti May 26, 2025
88a73ba
fix language model test
andimarafioti May 26, 2025
b57fca1
kashif's fix to make kv cache true/false produce same results
andimarafioti May 26, 2025
40c30cd
fix is_causal condition
kashif May 27, 2025
8c87101
add back the attention_mask in the generate
kashif May 27, 2025
1e1e95c
attention_mask=attention_mask,
kashif May 27, 2025
24e025d
set greedy=True
kashif May 27, 2025
862bc77
greedy=True
kashif May 27, 2025
a173568
Merge pull request #69 from huggingface/kv-cache
andimarafioti May 27, 2025
a3cce17
improve checkpointing
andimarafioti May 26, 2025
656164e
only set the folder for checkpointing
andimarafioti May 26, 2025
b8981da
make the checkpoint path a bit more robust
andimarafioti May 26, 2025
469afdc
we also need the full path to submit the model to HF
andimarafioti May 27, 2025
6e53742
added special tokens
lusxvr May 27, 2025
0846093
made changes backwards compatible
lusxvr May 27, 2025
70df1dc
adapted collator to handle image replacement tokens
lusxvr May 27, 2025
7a3ec7e
adapted VLM to handle image replacement tokens (MMStar and therefore …
lusxvr May 27, 2025
ad091cc
make kv cache default
ariG23498 May 28, 2025
9d2a556
check typo
ariG23498 May 28, 2025
8be6c6a
Merge pull request #84 from huggingface/kv-default
andimarafioti May 28, 2025
098db57
Merge pull request #79 from huggingface/improve_checkpointing
andimarafioti May 28, 2025
04e0fd4
adapted evaluation
lusxvr May 28, 2025
80d36a6
Merge branch 'main' into embd_combination
lusxvr May 28, 2025
4dd05c8
comparison run to main
lusxvr May 28, 2025
00ea46e
changed token/sec calculation
lusxvr May 28, 2025
41f6996
fixed forward loop
lusxvr May 28, 2025
ee6c6fa
simplified logic and improved generate
lusxvr May 28, 2025
c1d21e4
ablation runs
lusxvr May 29, 2025
f0bddcf
Add lmms-eval integration for enhanced model evaluation
Luodian May 30, 2025
12a6742
Enhance evaluation functionality with distributed support
Luodian May 30, 2025
6b4620a
Enhance evaluation result printing with table formatting
Luodian May 30, 2025
fe6d2a5
Update README.md to reorganize lmms-eval evaluation instructions
Luodian May 30, 2025
198d702
Add lmms-eval integration for comprehensive multi-modal evaluation
Luodian May 30, 2025
b895c70
Update .gitignore and remove .vscode/settings.json
Luodian May 30, 2025
807da1e
fixed grad norm log when using grad accum
lusxvr May 30, 2025
1789ae3
test run
lusxvr May 30, 2025
52f99f0
back to old config
lusxvr May 30, 2025
3813a47
cleaned logging
lusxvr May 30, 2025
5079e4e
tried to fix generate (still not working)
lusxvr Jun 2, 2025
a507702
trained 450M model with new embeddings
lusxvr Jun 2, 2025
527856f
changed tokenizer back to cosmo
lusxvr Jun 3, 2025
6360627
fixed typo
lusxvr Jun 3, 2025
b8b19a5
changed default model in generate
lusxvr Jun 3, 2025
bd6e509
cleaned config
lusxvr Jun 3, 2025
6e92046
more comprehensive run dating and max grad norm
lusxvr Jun 3, 2025
9f9b028
cleaned and incorporated suggestions
lusxvr Jun 3, 2025
614d913
post-processed generate
lusxvr Jun 3, 2025
524ba29
cleaned branch for merge and checked compatibility
lusxvr Jun 3, 2025
29dd1c5
cleaned naming
lusxvr Jun 3, 2025
29706d9
fixed lr scheduler
lusxvr Jun 3, 2025
072441b
updated config and README
lusxvr Jun 4, 2025
731d8c6
Merge pull request #82 from huggingface/embd_combination
lusxvr Jun 4, 2025
9d06a67
changed gitignore
lusxvr Jun 4, 2025
7c3d920
Merge pull request #91 from Luodian/dev/add_lmms_eval
lusxvr Jun 4, 2025
b79cc46
Add comments and type hints to language_model.py
HilariusJeremy Jun 4, 2025
51401d3
handled forward and generate if no image is provided
lusxvr Jun 4, 2025
93aade2
adapted test and wrapper to new embedding handling
lusxvr Jun 4, 2025
2c83d62
modeled generate_until after Qwen
lusxvr Jun 4, 2025
2aa62bf
cleaned likelihood method and changed defaults
lusxvr Jun 4, 2025
ea4dc00
cleaned structure
lusxvr Jun 4, 2025
3c1e3cb
removed requirements.txt
lusxvr Jun 4, 2025
351ed4b
Updated readme
lusxvr Jun 4, 2025
91a3b93
update vram measurement
andimarafioti Jun 6, 2025
e795da0
Merge pull request #104 from huggingface/update-vram
andimarafioti Jun 6, 2025
a4acd1c
fixed training hooks and cleaned
lusxvr Jun 6, 2025
bde06c3
cleaned
lusxvr Jun 6, 2025
ec0d7bb
cleaned readme
lusxvr Jun 6, 2025
4578b63
Merge branch 'main' into add_lmms_eval
lusxvr Jun 6, 2025
3010767
changed flattening to taking only the first image
lusxvr Jun 6, 2025
305a8dd
fixed output path and batching
lusxvr Jun 7, 2025
18b91b2
Starting to work on multiple texts/images
andimarafioti Jun 8, 2025
af20447
fixes
andimarafioti Jun 8, 2025
bb0841a
save on best eval loss
andimarafioti Jun 9, 2025
58fd112
fix casting
andimarafioti Jun 9, 2025
e098c49
Merge pull request #101 from HilariusJeremy/my-feature-branch
lusxvr Jun 9, 2025
b154d11
used proper eval script from lmms-eval
lusxvr Jun 9, 2025
191c585
updated readme
lusxvr Jun 9, 2025
3b73810
cleaned
lusxvr Jun 9, 2025
fe1ad40
cleaned
lusxvr Jun 9, 2025
f00745b
cleaned
lusxvr Jun 9, 2025
1d3392b
improvements. made collators more stable and simpler. Added chat temp…
andimarafioti Jun 10, 2025
a53353e
remove packing since I don't want the PR to explode
andimarafioti Jun 10, 2025
d4faf87
save a few lines
geronimi73 Jun 11, 2025
5b2c99c
Merge pull request #108 from geronimi73/tiny
andimarafioti Jun 11, 2025
db3547b
review comments
andimarafioti Jun 11, 2025
459cf4b
adapted the notebook
andimarafioti Jun 11, 2025
d34c295
Merge pull request #105 from huggingface/chat_template
andimarafioti Jun 11, 2025
709f7e6
Update nanoVLM.ipynb
leopardracer Jun 11, 2025
edca317
Update train.py
leopardracer Jun 11, 2025
6b893f0
Merge pull request #109 from leopardracer/main
lusxvr Jun 11, 2025
c3fa97c
Merge branch 'main' into add_lmms_eval
lusxvr Jun 11, 2025
75622ab
mmstar works
lusxvr Jun 11, 2025
96ecc6c
Update train.py
kilavvy Jun 12, 2025
f9e1bee
Merge pull request #110 from kilavvy/main
andimarafioti Jun 13, 2025
1aea4e4
Update nanoVLM.ipynb
vtjl10 Jun 13, 2025
13e598d
Merge pull request #111 from vtjl10/main
lusxvr Jun 14, 2025
0f7c0fa
Update nanoVLM.ipynb
zeevick10 Jun 15, 2025
598b400
Merge pull request #114 from zeevick10/main
lusxvr Jun 15, 2025
4c4c5cd
Add lm_chat_template in VLMConfig
jiadingfang Jun 15, 2025
ec0d01f
Merge pull request #113 from jiadingfang/main
lusxvr Jun 16, 2025
de8a28c
fixed multi-image handling
lusxvr Jun 11, 2025
ab31e18
added automatic eval during training
lusxvr Jun 12, 2025
78f3b49
Merge pull request #100 from huggingface/add_lmms_eval
andimarafioti Jun 17, 2025
abb441b
Fix tokenization
srai9 Jun 18, 2025
49d7ae3
Merge pull request #116 from srai9/fix-tokenization
andimarafioti Jun 18, 2025
f724472
packing is in a working state, but im in the middle of a big refactor
andimarafioti Jun 13, 2025
1c6c38d
This isn't working anymore and I don't know why
andimarafioti Jun 13, 2025
766e04a
the sneakiest of bugs
andimarafioti Jun 14, 2025
e3043f4
nice state, making collator even nicer with labels
andimarafioti Jun 15, 2025
883f9ae
small refactor to make ConstantLengthDataset keep on packing data whi…
andimarafioti Jun 15, 2025
4ee74bf
fix bug that would make data processing consume too much CPU, making …
andimarafioti Jun 16, 2025
08d816b
shard dataset if DDP and time training better
andimarafioti Jun 16, 2025
71dfda0
move constantlengthdataset to "advanced dataset" to avoid confusing b…
andimarafioti Jun 16, 2025
27f460e
linting
andimarafioti Jun 16, 2025
f4e7bc4
shard data manually and handle outliers across ranks, training works.
andimarafioti Jun 16, 2025
56676aa
nouamane's review
andimarafioti Jun 20, 2025
f1b389e
Update README.md
XuecWu Jun 23, 2025
d308931
added a way to set a max number of images per knapsack, and to set a …
andimarafioti Jun 23, 2025
0a5321e
Merge pull request #123 from XuecWu/main
andimarafioti Jun 23, 2025
0cb4b15
ruff format for luis
andimarafioti Jun 23, 2025
512919d
new config adding some params
andimarafioti Jun 23, 2025
7573e00
luis shuffling magic
andimarafioti Jun 23, 2025
efbebcc
seed the randomness
andimarafioti Jun 23, 2025
5568d18
better knapsacks and better logging
andimarafioti Jun 24, 2025
e258fd5
even longer buffers for the knapsack
andimarafioti Jun 24, 2025
399b8f6
move to step training and improved logging
andimarafioti Jun 24, 2025
3ebf4fa
Merge pull request #115 from huggingface/packing
andimarafioti Jun 24, 2025
1264844
cleaned eval and logging
lusxvr Jun 24, 2025
4913f05
fixed steps
lusxvr Jun 24, 2025
f477e77
removed double lmms indication in logging
lusxvr Jun 24, 2025
d20fa1d
fixed eval_interval
lusxvr Jun 24, 2025
0b96466
cleaned interval calc
lusxvr Jun 24, 2025
1b66b5a
fixed lmms-eval interval
lusxvr Jun 24, 2025
04f2c0e
Merge pull request #125 from huggingface/rmv_manual_eval
lusxvr Jun 25, 2025
b0d6ecf
fix measure_vram, edit parameters of VQADataset and VQACollator
Sucran Jun 21, 2025
290206d
fixed lmms-eval arg passing
lusxvr Jun 25, 2025
552d88f
Merge pull request #128 from huggingface/fix_eval_args
lusxvr Jun 25, 2025
be856b3
Make lmms_eval, accelerate dependency conditional on use_lmms_eval
cdboer Jun 26, 2025
45be422
Merge pull request #129 from cdboer/main
andimarafioti Jun 27, 2025
b016baa
Create LICENSE
andimarafioti Jun 30, 2025
018862f
Merge pull request #127 from Sucran/fix-measure-vram
lusxvr Jun 30, 2025
1e43ffa
Add copyright owner
andimarafioti Jun 30, 2025
1ca9f12
Update LICENSE
andimarafioti Jun 30, 2025
d9cd83a
Update LICENSE
andimarafioti Jun 30, 2025
8c87d89
Merge pull request #133 from huggingface/andimarafioti-patch-1
andimarafioti Jun 30, 2025
1c3b44d
fix notebook
Jun 30, 2025
06ed87e
Remove lm_eos_token_id from config.py
githubshaurya Jul 3, 2025
b438e5b
Merge pull request #138 from githubshaurya/fix__config
andimarafioti Jul 4, 2025
c73dac0
Merge pull request #135 from tonycao/notebook-fix
andimarafioti Jul 4, 2025
9137a47
train only MP, plus don't eval before starting to train, plus avoid b…
andimarafioti Jul 7, 2025
5d21cfc
Merge pull request #142 from huggingface/only_mp_train
andimarafioti Jul 9, 2025
5f5ee6c
pass no_log_wandb to the evaluation
andimarafioti Jul 11, 2025
8374ce1
Merge pull request #147 from huggingface/no_log_wandb
andimarafioti Jul 11, 2025
6d36f5d
load all configs for a dataset
andimarafioti Jul 11, 2025
8c14ccb
change config
andimarafioti Jul 11, 2025
371865f
Merge pull request #149 from huggingface/load_all_configs
andimarafioti Jul 11, 2025
1bdd691
image splitting :)
andimarafioti Jul 11, 2025
b034f6a
fixed bug with multi-image eval
andimarafioti Jul 11, 2025
d9c6f69
Merge pull request #151 from huggingface/image_splitting
andimarafioti Jul 11, 2025
ee4718c
Update VLMConfig to match SmolLM2-360M-Instruct model configuration
timbolotiukh Jul 16, 2025
4307717
1.modify get_image_processor function parameters
Jul 17, 2025
dfc69e1
Merge pull request #154 from timbolotiukh/fix-llm-config-for-360M
andimarafioti Jul 21, 2025
6bc2507
Merge pull request #157 from eecn/main
andimarafioti Jul 21, 2025
82838c6
Add required argument `splitted_image_size` for `get_image_processor`
timbolotiukh Jul 21, 2025
b47d0f8
Merge pull request #162 from timbolotiukh/fix/add-arg-to-image-processor
lusxvr Jul 23, 2025
7088b81
save state
andimarafioti Jul 26, 2025
2d99d8a
fix synchronizer for corner case where there are no images on any of …
andimarafioti Jul 29, 2025
f4907d4
if all samples are too long, we were returning a simple list, which m…
andimarafioti Jul 29, 2025
9074c2d
remove image tokens from text in datasets and don't process images if…
andimarafioti Jul 29, 2025
17db234
multi-node training
andimarafioti Aug 4, 2025
1c9b002
minimize queue get/put times by passing batches
andimarafioti Aug 4, 2025
e3978c9
process samples without images
andimarafioti Aug 4, 2025
ab80978
config multi-node
andimarafioti Aug 4, 2025
e8087c0
load models with split safetensors
andimarafioti Aug 4, 2025
7fd2edd
eval.slurm
andimarafioti Aug 5, 2025
800b182
run evaluation file
andimarafioti Aug 5, 2025
9e1f2a0
slurm script
andimarafioti Aug 12, 2025
d945ef6
improvements to the training and filtering per rating
andimarafioti Aug 12, 2025
9e9dc33
my current config
andimarafioti Aug 12, 2025
772d07c
add resize to max side, global img token, and cache
andimarafioti Aug 15, 2025
fc4df06
adapt global img token and resize max side
andimarafioti Aug 15, 2025
9c07609
ignored eval_results
lusxvr Aug 15, 2025
a1e4997
updated slurm scripts
lusxvr Aug 19, 2025
525760e
fixed eval
lusxvr Aug 19, 2025
54d4584
fixed train
lusxvr Aug 19, 2025
008b032
comprehensive score filtering
lusxvr Aug 20, 2025
d16de85
run config for internal ablations
lusxvr Aug 21, 2025
5e36004
experiment plotting
lusxvr Aug 21, 2025
42eefaf
reset train rating
lusxvr Aug 22, 2025
60f1d5c
new plots
lusxvr Aug 23, 2025
cd830ae
stage 1
lusxvr Aug 23, 2025
7a59c37
new plots
lusxvr Aug 24, 2025
0a8b8ee
stage 2
lusxvr Aug 24, 2025
1ae42f6
stage 2.5
lusxvr Aug 24, 2025
df24690
new experiments and ranking of run metrics
lusxvr Aug 25, 2025
681682d
removed chinese data
lusxvr Aug 25, 2025
58b8075
full run config
lusxvr Aug 25, 2025
791eaaa
benchmark prompts
lusxvr Aug 25, 2025
82ed932
new plots
lusxvr Aug 28, 2025
bc5c2e8
plots with stdrr
lusxvr Aug 28, 2025
384969a
current config
lusxvr Aug 28, 2025
e4d7a93
fixed generate
lusxvr Sep 1, 2025
c16fd94
updated plots
lusxvr Sep 3, 2025
6e04cb8
full stage 1 config
lusxvr Sep 8, 2025
4bfa66c
new plots
lusxvr Sep 8, 2025
45f8c0f
final experiments
lusxvr Sep 9, 2025
e45de9d
updated generate
lusxvr Sep 9, 2025
bb6f5b3
cleaned
lusxvr Sep 9, 2025
ef400cf
added slurm scripts
lusxvr Sep 9, 2025
856a420
train config
lusxvr Sep 9, 2025
9000974
single stage
lusxvr Sep 9, 2025
d33ebb9
updated readme
lusxvr Sep 9, 2025
9c560d2
Merge pull request #170 from huggingface/image_splitting
lusxvr Sep 9, 2025
39503e5
cleaned
lusxvr Sep 10, 2025
9de5e17
updated notebook
lusxvr Sep 10, 2025
3b0226f
Add ONNX and ExecuTorch export support
infil00p Oct 4, 2025
a73e635
Adding CLAUDE.md
infil00p Oct 9, 2025
85f89c5
Add C++ ExecuTorch inference with multi-image support (proof of concept)
infil00p Oct 12, 2025
93f5e3b
Fix tokenization: add grid position tokens to Rust tokenizer
infil00p Oct 12, 2025
f82bd09
Fix image token replacement bug in C++ ExecuTorch inference
infil00p Oct 14, 2025
84b4afb
Add decode loop investigation and debug instrumentation
infil00p Oct 15, 2025
54294e7
Fix KV cache reference invalidation bug in C++ inference
infil00p Oct 15, 2025
28 changes: 27 additions & 1 deletion .gitignore
@@ -14,6 +14,7 @@ wheels/
wandb/
logs/
*.csv
.vscode

.python-version
pyproject.toml
@@ -23,4 +24,29 @@ uv.lock

checkpoints/
notebooks/
*.slurm
results/
eval_results*/
*.arrow
/shards/*
*.png
plots*/
# ExecuTorch exported models
executorch_models*/
onnx_export/onnx_models/

# Build artifacts
cpp-inference/build/
rust-preprocessor/target/

# Test binaries
test_rust_splitting

# Debug/test artifacts
*.npy
*.log
python_test_output.log
cpp_test_output.log
python_preprocessed_*.npy
cpp_combined_embeddings.npy
python_prefill_combined_embeddings.npy
python_decode_input_tokens.npy