Fixed recursion error when using both wrapped PEFT and DeepSpeed #1400
kplau1128 wants to merge 1 commit into
Conversation
@kplau1128 thanks for the fix.
```python
for name, child in loss.named_children():
    if isinstance(child, torch.nn.Module):
        # Avoid replacing the model again if it's already the desired model
        if not (name == "model" and child is model):
```
Suggested change:

```diff
-        if not (name == "model" and child is model):
+        if name != "model" or child is not model:
```
Can we simplify the condition? (Personal suggestion)
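For what it's worth, the two spellings are equivalent by De Morgan's law; a quick exhaustive check (illustrative only):

```python
# De Morgan's law: not (A and B) == (not A) or (not B),
# where A stands for `name == "model"` and B for `child is model`.
for a in (True, False):
    for b in (True, False):
        assert (not (a and b)) == ((not a) or (not b))
```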
```python
and model != self.model  # Only if the model is wrapped
and hasattr(loss_fn, "model")  # Only if the loss stores the model
and loss_fn.model != model  # Only if the wrapped model is not already stored
and loss_fn.model != self.model  # Assign the original model instead
```
@kplau1128 this does not sound correct to me. Here the goal is to insert the wrapped model (distributed or compiled) into the loss function, not the original model.
I see that without this fix the recursion happens, but the workaround differs from the original intention.
Here it does not do anything, since `loss_fn.model` is `self.model`.
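For context, the replacement logic under discussion looks roughly like this. This is a minimal sketch reconstructed from the diff hunks above, not the actual optimum-habana implementation; the recursive walk and the guard come from the diff, while the top-level `loss.model = model` assignment is an assumption:

```python
import torch

def override_model_in_loss(loss: torch.nn.Module, model: torch.nn.Module) -> torch.nn.Module:
    # Point the stored model reference at the wrapped (distributed/compiled)
    # model so that forward passes go through the wrapper (assumed step).
    if hasattr(loss, "model"):
        loss.model = model
    # Recurse into submodules so nested losses are updated too (from the diff).
    for name, child in loss.named_children():
        if isinstance(child, torch.nn.Module):
            # Avoid replacing the model again if it's already the desired model
            if not (name == "model" and child is model):
                setattr(loss, name, override_model_in_loss(child, model))
    return loss
```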
@yafshar You are right.
The crux of the issue lies in circular references introduced by the combined wrappers. I will rework the workaround so that it only modifies the `override_model_in_loss` function to skip overriding the loss when the wrappers are combined.
kplau1128 force-pushed from 4359ed9 to 92ce0c9
Re-worked the workaround to check for the PEFT-wrapped model and skip overriding the model in the loss. Ran

- Recursion error in `compute_loss`. The crux of the issue lies in circular references introduced by the combined wrappers.
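A minimal sketch of what such a check could look like. The helper name `should_override_model_in_loss` is hypothetical, and detecting PEFT wrapping via `isinstance(..., PeftModel)` is an assumption that may differ from what the commit actually does:

```python
from peft import PeftModel

def should_override_model_in_loss(loss_fn, model, original_model) -> bool:
    # Hypothetical guard: skip the override when the model is PEFT-wrapped
    # *and* wrapped again (e.g. by DeepSpeed), since re-inserting the wrapped
    # model into the loss would create a circular reference and hence
    # infinite recursion in compute_loss.
    peft_wrapped = isinstance(original_model, PeftModel)
    extra_wrapper = model is not original_model  # distributed/compiled wrapper applied
    return not (peft_wrapped and extra_wrapper)
```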
kplau1128 force-pushed from 92ce0c9 to 0f8adbf
Reworked as @nngokhale recommended: just added a check condition.
@kplau1128 this does not sound correct to me. Let me check it further; I will update here.
@kplau1128 I made another PR, #1428, which I think addresses the root cause of this issue. Please take a look and close this PR if you agree with that solution.
Thanks @yafshar. It looks similar to what my previous rework (92ce0c9) tried to do, but your PR is better. I will close this one.
Recursion error in `compute_loss`. The crux of the issue lies in circular references introduced by the combined wrappers in Sentence Transformers.

Optimum-Habana example: `optimum-habana/examples/sentence-transformers-training/sts`

Command line:

```bash
python ../../gaudi_spawn.py --use_deepspeed --world_size 2 training_stsbenchmark.py --peft
```
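For illustration only, a standalone sketch (independent of the optimum-habana and Sentence Transformers code) of how such a circular module reference turns a recursive child walk into infinite recursion:

```python
import torch

class StubLoss(torch.nn.Module):
    def __init__(self, model: torch.nn.Module):
        super().__init__()
        self.model = model  # the loss registers the model as a submodule

model = torch.nn.Linear(2, 2)
loss = StubLoss(model)
model.loss_fn = loss  # wrapping code points the model back at the loss: a cycle

def walk(module: torch.nn.Module):
    # Naive recursive traversal, like a nested model-override would perform.
    for _, child in module.named_children():
        walk(child)  # cycles forever: loss -> model -> loss -> ...

try:
    walk(loss)
except RecursionError:
    print("RecursionError: circular reference between loss and model")
```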