Closed
Commits
40 commits
823e516
Cleanup some code
thomasw21 Jul 14, 2022
69ddfe5
WIP
thomasw21 Jul 26, 2022
a5d5f1a
I think the alibi trick doesn't work as well due to the masking strategy
thomasw21 Jul 26, 2022
31a4d0b
Fix device
thomasw21 Jul 26, 2022
8ad9085
Remove all prints
thomasw21 Jul 26, 2022
a8ee8f3
Make style
thomasw21 Jul 26, 2022
b79094e
Remove weird defaults
thomasw21 Jul 26, 2022
2a03d25
Test shouldn't be that flaky, let's test them
thomasw21 Jul 26, 2022
6712c2b
Bad defaults
thomasw21 Jul 26, 2022
fce342d
Checkpoint doesn't handle named arguments
thomasw21 Jul 26, 2022
b6a5607
WIP
thomasw21 Jul 26, 2022
33c430b
Cast then scaling then softmax then cast
thomasw21 Jul 26, 2022
eb52871
Turns out scaling before or after changes the results
thomasw21 Jul 26, 2022
1d5573a
Turns out max-collapse is a real thing
thomasw21 Jul 27, 2022
0feba3a
Make style
thomasw21 Jul 27, 2022
dcb1eeb
Remove unnecessary files + add test for word_embeddings_in_fp32
thomasw21 Jul 27, 2022
286ca9c
Remove unnecessary files
thomasw21 Jul 27, 2022
6126d30
Turns out dtype does nothing
thomasw21 Jul 27, 2022
c0f9cab
We have to use the same pretrained weights for testing multiple preci…
thomasw21 Jul 27, 2022
eb4a6b9
Woops
thomasw21 Jul 27, 2022
83e065d
Turns out that matching fp32 is a lot more complicated
thomasw21 Jul 27, 2022
9e4e4a3
Tune atol/rtol
thomasw21 Jul 27, 2022
19f8e5a
Fix
thomasw21 Jul 27, 2022
0dca0ff
Make style
thomasw21 Jul 27, 2022
f1579ee
PR reviews
thomasw21 Jul 27, 2022
0714d6a
This can only run on gpu
thomasw21 Jul 27, 2022
6377a32
make style
thomasw21 Jul 27, 2022
2347846
Import warnings
thomasw21 Jul 27, 2022
92c93b1
Rename to force_lm_head_in_fp32
thomasw21 Jul 27, 2022
b02d741
Maybe I can first require torch and then run torch.no_grad
thomasw21 Jul 27, 2022
e5216f6
Change the way I run torch.no_grad
thomasw21 Jul 27, 2022
457019c
Make style
thomasw21 Jul 27, 2022
d5265d0
Create test to check that fp32 model is the same as the fp16 one with…
thomasw21 Jul 28, 2022
40eaf20
make style
thomasw21 Jul 28, 2022
1563523
Turns out weight tie is not done in post_init but in from_pretrained
thomasw21 Jul 28, 2022
36beec7
Turns out state_dict returns shared weights multiple times
thomasw21 Jul 28, 2022
426858e
Try to handle both post_init and from_pretrained
thomasw21 Jul 28, 2022
201d571
Make style
thomasw21 Jul 28, 2022
fec68bf
Actually I might need to just override the tie_weights method
thomasw21 Jul 28, 2022
ba58e5b
Remove unused imports
thomasw21 Jul 28, 2022
src/transformers/models/bloom/configuration_bloom.py (6 changes: 5 additions & 1 deletion)
@@ -91,7 +91,9 @@ class BloomConfig(PretrainedConfig):
issue](https://github.com/pytorch/pytorch/issues/76232). A solution to obtain more accurate results is to
enable this feature. Enabling this will hurt the computational time of the inference. It will probably be
resolved in the future once the main model has been fine-tuned with TP_rank=1.

force_lm_head_in_fp32 (`bool`, defaults to `True`):
Casts `lm_head` to fp32 in order to increase the chances that the obtained logits are totally ordered, i.e.
with no two values being equal.
Example:

```python
@@ -130,6 +132,7 @@ def __init__(
attention_dropout=0.0,
pretraining_tp=1, # TP rank used when training with megatron
slow_but_exact=False,
force_lm_head_in_fp32=True,
**kwargs,
):
self.vocab_size = vocab_size
@@ -149,6 +152,7 @@ def __init__(
self.bos_token_id = bos_token_id
self.eos_token_id = eos_token_id
self.slow_but_exact = slow_but_exact
self.force_lm_head_in_fp32 = force_lm_head_in_fp32

super().__init__(bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)

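For reference, below is a minimal sketch of the idea behind the new `force_lm_head_in_fp32` flag: computing the final logits with the LM head cast to fp32 while the rest of the model stays in fp16. This is an illustration under assumptions, not the PR's actual implementation; the tensor shapes and the toy `hidden_size`/`vocab_size` values are made up for the example.

```python
# Illustration only: project fp16 hidden states through an LM head in fp32.
# Shapes and sizes are toy values, not the real BLOOM configuration.
import torch
from torch import nn

hidden_size, vocab_size = 64, 1000  # toy sizes for the sketch
lm_head = nn.Linear(hidden_size, vocab_size, bias=False).half()
hidden_states = torch.randn(2, 8, hidden_size, dtype=torch.float16)

# Cast both activations and weights to fp32 before the final projection so the
# logits are computed in full precision, which reduces the chance of ties
# (equal logit values) that fp16 rounding can introduce.
logits = nn.functional.linear(hidden_states.float(), lm_head.weight.float())
print(logits.dtype)  # torch.float32
```

In the diff above, this behaviour is exposed through the config, e.g. `BloomConfig(force_lm_head_in_fp32=True)`, which defaults to `True`.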