
Conversation

@jvdp1 (Collaborator) commented Jun 14, 2024

As discussed, here is a draft in which I suggest moving the optimizer from the network level to the layer level.

This is just a draft with an implementation for the dense layer only.

Here are the wall clock times using my dataset (with 2 hidden dense layers):

v0.17.0

  • Forward + backward: 4.79s
  • Update: 4.59s

Current PR

  • Forward + backward: 4.81s
  • Update: 1.40s
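
For illustration, here is a minimal Fortran sketch of what "optimizer at the layer level" could look like. This is not the actual neural-fortran API: the type names (`sgd_t`, `dense_t`), components, and bindings below are all hypothetical. The idea is that each layer owns an optimizer instance and updates its own parameters in place, rather than the network gathering every layer's parameters and gradients for a central optimizer call.

```fortran
module layer_optim_sketch
  implicit none
  private
  public :: sgd_t, dense_t

  ! Hypothetical minimal optimizer type; each layer owns one instance.
  type :: sgd_t
    real :: learning_rate = 0.01
  contains
    procedure :: minimize
  end type sgd_t

  ! Hypothetical dense layer that stores its parameters, its gradients,
  ! and its own optimizer, so updates happen in place, per layer.
  type :: dense_t
    real, allocatable :: weights(:,:), biases(:)
    real, allocatable :: dw(:,:), db(:)   ! gradients from the backward pass
    type(sgd_t) :: optimizer
  contains
    procedure :: update
  end type dense_t

contains

  ! Elemental, so the same routine applies to rank-2 weights and rank-1 biases.
  elemental subroutine minimize(self, param, grad)
    class(sgd_t), intent(in) :: self
    real, intent(inout) :: param
    real, intent(in) :: grad
    param = param - self % learning_rate * grad
  end subroutine minimize

  ! The network's update step reduces to one call per layer; no gathering
  ! of all parameters into a flat array and scattering them back.
  subroutine update(self)
    class(dense_t), intent(inout) :: self
    call self % optimizer % minimize(self % weights, self % dw)
    call self % optimizer % minimize(self % biases, self % db)
    self % dw = 0
    self % db = 0
  end subroutine update

end module layer_optim_sketch

program demo
  use layer_optim_sketch, only: dense_t
  implicit none
  type(dense_t) :: layer
  layer % weights = reshape([1., 2., 3., 4.], [2, 2])
  layer % biases  = [0.1, 0.2]
  layer % dw = layer % weights   ! pretend gradients from a backward pass
  layer % db = layer % biases
  call layer % update()
  print '(4f8.4)', layer % weights
end program demo
```

Making `minimize` elemental is one way to let a single routine cover parameters of any rank; it is only a sketch of the design, not a claim about how this PR implements it.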

@OneAdder (Collaborator) commented Mar 5, 2025

@jvdp1 That's actually a great idea. Apart from the obvious performance gains, it can simplify the code for combined layers. I will arrange everything in a similar fashion in my project here: https://github.com/OneAdder/llm.f
Then we can backport it here along with implementations for all the other layers.
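
To illustrate the combined-layers point, a hedged sketch reusing the hypothetical types from the block above: a layer composed of sub-layers no longer needs to expose a flattened view of its nested parameters to a network-level optimizer; its update simply delegates to the sub-layers, each of which owns its own optimizer.

```fortran
module combined_layer_sketch
  use layer_optim_sketch, only: dense_t
  implicit none
  private
  public :: mlp_block_t

  ! Hypothetical combined layer: two dense sub-layers fused into one unit.
  ! With a network-level optimizer, its parameters would have to be
  ! flattened and re-split around every step; here update just delegates.
  type :: mlp_block_t
    type(dense_t) :: fc1, fc2
  contains
    procedure :: update
  end type mlp_block_t

contains

  subroutine update(self)
    class(mlp_block_t), intent(inout) :: self
    call self % fc1 % update()
    call self % fc2 % update()
  end subroutine update

end module combined_layer_sketch
```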

@jvdp1 (Collaborator, Author) commented Aug 23, 2025

Closed, as the proposed changes were implemented in #222.

@jvdp1 closed this Aug 23, 2025