
Conversation

ydshieh (Collaborator) commented Dec 9, 2022

What does this PR do?

⚠️⚠️⚠️ This changes the eps of those LayerNorm layers from the default 1e-5 to 1e-12, and the outputs will differ slightly before/after this PR. ⚠️⚠️⚠️


Similar to #20554, but this time, instead of removing the attribute from the config, we use `config.layer_norm_eps` in some `nn.LayerNorm` layers.
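In code, the fix follows this pattern (a minimal sketch of my own, assuming a config that carries `hidden_size` and `layer_norm_eps`; it is not the exact diff):

```python
import torch.nn as nn

def build_layer_norm(config):
    # Before: config.layer_norm_eps existed but was never read, so eps
    # silently fell back to the nn.LayerNorm default of 1e-5.
    #   return nn.LayerNorm(config.hidden_size)
    # After: the config attribute is actually consumed (1e-12 for these models).
    return nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
```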

"""Construct the overlapping patch embeddings."""

def __init__(self, patch_size, stride, num_channels, hidden_size):
def __init__(self, config, patch_size, stride, num_channels, hidden_size):
ydshieh (Collaborator, Author):

We need this new argument so we can use `eps=config.layer_norm_eps`. As this is an internal class, it should be fine.
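A hedged sketch of what the reviewed change amounts to (the class body and layer names are illustrative, not the exact diff):

```python
import torch.nn as nn

class OverlapPatchEmbeddings(nn.Module):
    """Construct the overlapping patch embeddings."""

    def __init__(self, config, patch_size, stride, num_channels, hidden_size):
        super().__init__()
        self.proj = nn.Conv2d(
            num_channels,
            hidden_size,
            kernel_size=patch_size,
            stride=stride,
            padding=patch_size // 2,
        )
        # `config` is threaded through solely so that eps comes from the
        # config instead of the hard-coded nn.LayerNorm default (1e-5).
        self.layer_norm = nn.LayerNorm(hidden_size, eps=config.layer_norm_eps)
```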

"""Construct the overlapping patch embeddings."""

def __init__(self, patch_size, stride, num_channels, hidden_size):
def __init__(self, config, patch_size, stride, num_channels, hidden_size):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need this new argument so we can use eps=config.layer_norm_eps. As this is an internal class, should be fine.

HuggingFaceDocBuilderDev commented Dec 9, 2022

The documentation is not available anymore as the PR was closed or merged.

ydshieh requested review from NielsRogge and sgugger and removed the request for NielsRogge on December 9, 2022, 13:12
NielsRogge (Contributor) commented Dec 9, 2022

Just to confirm:

> these changes don't have an impact on the integration tests, right? This only results in different outputs when one would train these models from scratch?

It has an impact even on the integration tests: the eps constant changes, which affects the forward pass in both inference and training.
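A quick way to see this (my own minimal check, not from the PR): two LayerNorms with identical, freshly initialized affine parameters that differ only in eps already produce slightly different outputs, which is enough to shift integration-test expected values:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 4)

ln_default = nn.LayerNorm(4)             # eps=1e-5, the pre-PR behavior
ln_config = nn.LayerNorm(4, eps=1e-12)   # the post-PR behavior for these models

with torch.no_grad():
    print((ln_default(x) - ln_config(x)).abs().max())  # small but nonzero
```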

> technically we should check the layer_norm_eps for each paper to match the original code

I agree, but I am not sure, for recent models, whether all these attributes were set according to the papers, or whether people just used the add-new-model-like templates ...

sgugger (Collaborator) commented Dec 9, 2022

This is too breaking, I think. We need to be more careful that newly added models use this attribute consistently, but I don't think we should touch old models like this, as it will change the results of the forward pass.

ydshieh (Collaborator, Author) commented Dec 9, 2022

OK! I will keep this list of models to skip in the WIP PR where we add a test for checking unused config attributes.

ydshieh (Collaborator, Author) commented Dec 9, 2022

Closing as it is too breaking!

ydshieh closed this on Dec 9, 2022
ydshieh deleted the cleanup_config_attrs_3 branch on February 2, 2023, 10:12