
[Feature request]: Improve interfacing between encoders and output heads #265

Open
2 of 6 tasks
laserkelvin opened this issue Jul 30, 2024 · 0 comments
Assignees
Labels
code maintenance (Issue/PR for refactors, code clean up, etc.), enhancement (New feature or request), ux (User experience, quality of life changes)

Comments

@laserkelvin
Collaborator

Feature/behavior summary

Currently, there are two things that make configuring output heads a bit of a wildcard:

  1. It is difficult to match up the shapes between the encoder's embedding dimensionality and the output head's input layer.
  2. Lazy layers are used irregularly for the MLP blocks: this was partly meant to resolve (1), but it has added engineering complexity (lazy output heads are created just in time), which makes the code harder to maintain and can make distributed training difficult to start; see the sketch after this list.
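
For context, the current pain point looks roughly like the sketch below. This is illustrative only: nn.LazyLinear stands in for the lazy MLP blocks, and the shapes are made up. Because the weights only materialize on the first forward pass, the head has to be "primed" with a dummy batch before it can be wrapped for distributed training:

import torch
from torch import nn

# Illustrative stand-in for a lazy output head: the input dimension
# is unknown until data actually flows through the module.
lazy_head = nn.Sequential(nn.LazyLinear(256), nn.SiLU(), nn.LazyLinear(1))

embeddings = torch.randn(8, 128)  # hypothetical encoder output
_ = lazy_head(embeddings)  # weights are only materialized here ("just in time")

# Only after this dummy forward pass does the module have concrete
# parameter shapes, which is what wrappers like DistributedDataParallel need.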

Request attributes

  • Would this be a refactor of existing code?
  • Does this proposal require new package dependencies?
  • Would this change break backwards compatibility?
  • Does this proposal include a new model?
  • Does this proposal include a new dataset?
  • Does this proposal include a new task/workflow?

Related issues

No response

Solution description

I don't really have a perfect solution, but my suggestions are:

  1. Remove the option to use lazy modules. This will break examples and probably some tests, but it means there is only one way to initialize models, which simplifies maintenance and removes ambiguity in setup.
  2. In the abstract models (e.g. AbstractPyGModel), add an abstract property such as encoder_output_dim that makes it easier to create output heads: the property is used to calculate the input dimension for the output head, so the only configuration the output heads need is hidden_dim and possibly output_dim.

For the second item, it could be something as simple as:

from abc import ABC, abstractmethod


class AbstractEncoder(ABC):
    @property
    @abstractmethod
    def encoder_output_dim(self) -> int:
        """Dimensionality of the embeddings produced by this encoder."""
        raise NotImplementedError
In the concrete case, it might return the dimension of the final layer, or, for more complicated cases (e.g. concatenated tensors), it can provide the arithmetic to calculate the expected output dimension. We can then refactor OutputHead to rely on this property for its input dimension.
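
As a rough illustration of how the pieces could fit together: GNNEncoder below is a hypothetical concrete encoder, and the OutputHead shown is only a sketch of how the refactored head might consume the property, not the existing class.

from torch import nn


class GNNEncoder(AbstractEncoder):  # hypothetical concrete encoder
    def __init__(self, hidden_dims: list[int]) -> None:
        self.hidden_dims = hidden_dims

    @property
    def encoder_output_dim(self) -> int:
        # Simple case: the embedding size is the width of the final layer.
        return self.hidden_dims[-1]


class OutputHead(nn.Module):  # sketch of the refactored head
    def __init__(self, encoder: AbstractEncoder, hidden_dim: int, output_dim: int = 1) -> None:
        super().__init__()
        # The input dimension comes from the encoder, so no lazy layers are needed.
        self.mlp = nn.Sequential(
            nn.Linear(encoder.encoder_output_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, output_dim),
        )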

Additional notes

No response
