Merged
59 commits
3d90a99
accept custom device_mesh
NouamaneTazi Apr 29, 2025
df1eaee
fix device_map
NouamaneTazi Apr 30, 2025
b929886
assert that num_heads % tp_size == 0
NouamaneTazi Apr 30, 2025
1df751b
todo.
NouamaneTazi Apr 30, 2025
5887ffc
ReplicateParallel
NouamaneTazi Apr 30, 2025
924ccee
handle tied weights
NouamaneTazi Apr 30, 2025
cfacec5
handle dtensor in save_pretrained with safe_serialization
NouamaneTazi Apr 30, 2025
9833305
tp test works
NouamaneTazi Apr 30, 2025
7d7b363
doesnt work
NouamaneTazi Apr 30, 2025
11f02a5
fix shard_and_distribute_module's rank should be local_rank
NouamaneTazi May 1, 2025
317c027
tp=4 is correct
NouamaneTazi May 2, 2025
f3b4ae8
dp+tp is broken
NouamaneTazi May 2, 2025
f6a49ee
todo allreduce with dtensors on another dim is annoying
NouamaneTazi May 2, 2025
eaa6592
workaround to sync dp grads when using dtensors
NouamaneTazi May 2, 2025
7c6219b
loading a checkpoint works
NouamaneTazi May 2, 2025
6ceabe0
wandb and compare losses with different tp/dp
NouamaneTazi May 2, 2025
a9a1592
cleaning
NouamaneTazi May 2, 2025
4e323a5
cleaning
NouamaneTazi May 2, 2025
7f327b1
.
NouamaneTazi May 2, 2025
c3e5c5e
.
NouamaneTazi May 2, 2025
810bd51
logs
NouamaneTazi May 3, 2025
8234873
CP2 DP2 no mask works after commenting attn_mask and is_causal from s…
NouamaneTazi May 3, 2025
29c2a9c
DP=2 TP=2 now works even with tied embeddings
NouamaneTazi May 4, 2025
8fa760b
model.parameters() and model.module.parameters() are empty..
NouamaneTazi May 4, 2025
610e6bb
reformat sanity_check_tensor_sync
NouamaneTazi May 4, 2025
75cad51
set atol=1e-4 for CP to pass
NouamaneTazi May 4, 2025
b816a3c
try populate _parameters from named_modules
NouamaneTazi May 4, 2025
688107c
refactors
NouamaneTazi May 5, 2025
cfe688b
is_causal=True and pack sequences, no attn mask, and preshuffle dataset
NouamaneTazi May 5, 2025
8309521
fix packing
NouamaneTazi May 5, 2025
c0f616e
CP=4 doesn't work
NouamaneTazi May 5, 2025
011d981
fix labels and position_ids for CP
NouamaneTazi May 5, 2025
265f90d
DP CP works with transformers 🥳🥳🥳
NouamaneTazi May 5, 2025
afa72e2
refactor
ArthurZucker May 15, 2025
7517679
add example cp
ArthurZucker May 15, 2025
835726d
fixup
ArthurZucker May 15, 2025
0ad2a15
revert sdpa changes
ArthurZucker May 15, 2025
5b11964
example cleared
ArthurZucker May 15, 2025
7855d10
add CP, DP to the mesh init
ArthurZucker May 15, 2025
0b2bd15
nit
ArthurZucker May 15, 2025
c82d39c
clean
NouamaneTazi May 15, 2025
957c351
use `ALL_PARALLEL_STYLES`
ArthurZucker May 15, 2025
6d462e9
Merge branch 'nouamane/nanotron' of github.com:huggingface/transforme…
ArthurZucker May 15, 2025
43c175d
style
ArthurZucker May 15, 2025
378b2e7
FSDP works
NouamaneTazi May 15, 2025
30752c6
log on 1 rank
NouamaneTazi May 15, 2025
9c1e1fc
.
NouamaneTazi May 15, 2025
3f683b6
fix?
ArthurZucker May 15, 2025
d36acce
Merge branch 'nouamane/nanotron' of github.com:huggingface/transforme…
ArthurZucker May 15, 2025
780d74d
FSDP1 also has .parameters() bug
NouamaneTazi May 15, 2025
9e54969
reported gradnorm when using FSDP1 is wrong, but loss is correct so i…
NouamaneTazi May 15, 2025
ba01287
.
NouamaneTazi May 15, 2025
677ce53
style and fixup
ArthurZucker May 20, 2025
81c21de
move stuff around
ArthurZucker May 20, 2025
656277c
Merge branch 'main' of github.com:huggingface/transformers into nouam…
ArthurZucker May 20, 2025
e27ddb8
fix tests
ArthurZucker May 20, 2025
d702d94
style
ArthurZucker May 20, 2025
5083c0b
let's make it a check
ArthurZucker May 20, 2025
67a8182
warning should be an info
ArthurZucker May 20, 2025