
Implemented linear synaptic bit-slicing layer #631

Open
wants to merge 11 commits into base: master
Conversation

HeatPhoenix
Contributor

Related issues

Pull request opened to contribute to Issue #287.

Description

I felt that bit-slicing may give a more realistic approximation of how current crossbar arrays are implemented in hardware (to represent 8 or 16 bits, multiple devices are often used), and issue #287, with @maljoras's instructions on how to implement this, gave me enough to try contributing a solution. The main difference from that outline is that the number of slices can be set arbitrarily by the user.

This pull request mainly adds linear_sliced.py, which contains the implementation of the new module. The implementation is fairly naive and likely lacking some details that would stand out to someone more experienced with the codebase. I have tested its functionality by comparing its performance to a regular AnalogLinear using the simple_layer.py example.

I would highly appreciate feedback on the implementation.

Details

linear_sliced.py implements AnalogLinearBitSlicingLayer. The constructor adds the following parameters to AnalogLinear:
- number_slices: the number of slices the weights should be divided over.
- evenly_sliced: when True, all slices have equal significance (factor of 1); when False, slices are generated as MSB to LSB (factors of 2^x).
- significance_factors: allows setting the factors arbitrarily; it must have a length equal to number_slices.

The to_digital method (recombine all weights/factors and assign them) and to_analog (take digital weights and divide/split them over the number of slices) are also implemented, but since Linear would map to AnalogLinear, I'm not sure how necessary or useful this is.
Specifically for to_analog, I did not see an elegant way to let the user set how many slices they'd like to generate, or other parameters. As stated previously, feedback is highly appreciated; I'd like this contribution to be both useful and properly implemented.
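To make the slicing/recombination behaviour concrete, here is a minimal standalone sketch of the idea. The helper names are hypothetical and it uses plain NumPy instead of analog tiles; the real layer stores each slice on its own analog tile, and may distribute the weights differently, but recombination works the same way.

```python
import numpy as np

def significance_factors(number_slices, evenly_sliced):
    """Per-slice factors: all 1s when evenly sliced, else 2^x from LSB to MSB."""
    if evenly_sliced:
        return [1.0] * number_slices
    return [float(2 ** x) for x in range(number_slices)]

def slice_weights(weights, factors):
    """One simple split such that sum_k factors[k] * slice_k == weights."""
    total = sum(factors)
    return [weights / total for _ in factors]

def recombine(slices, factors):
    """What to_digital would do: weighted sum of the slices."""
    return sum(f * s for f, s in zip(factors, slices))
```

Under either factor scheme, recombine(slice_weights(W, f), f) reproduces W exactly.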

@maljoras
Collaborator

maljoras commented Apr 2, 2024

@HeatPhoenix very nice contribution, many thanks.

@maljoras maljoras self-requested a review April 2, 2024 07:22
@maljoras maljoras added the enhancement New feature or request label Apr 2, 2024
return forward_output # type: ignore

@classmethod
def from_digital(
Collaborator

It would be better to add some arguments in this from_digital call to set the slices etc. To use convert_to_digital one has to use a custom mapping anyway. So the best would be to add a convert_to_digital_sliced function that converts all Linear to `AnalogLinearSliced`, which also accepts and passes the slicing parameters during construction.

Contributor Author

@HeatPhoenix, Jun 12, 2024

convert_to_analog_sliced, you mean?

Contributor

Right
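For illustration, a rough sketch of what such a conversion helper could look like. All names here are hypothetical, not aihwkit API, and AnalogLinearBitSlicing is a plain stand-in for the PR's class; the sketch only shows the recursive replacement of Linear layers with the slicing parameters passed through.

```python
from torch import nn

class AnalogLinearBitSlicing(nn.Linear):
    """Stand-in for the PR's sliced layer (the real one lives in linear_sliced.py)."""
    def __init__(self, in_features, out_features, bias=True,
                 number_slices=2, evenly_sliced=True):
        super().__init__(in_features, out_features, bias)
        self.number_slices = number_slices
        self.evenly_sliced = evenly_sliced

def convert_to_analog_sliced(module, number_slices, evenly_sliced=True):
    """Recursively replace every nn.Linear with the sliced analog variant."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            sliced = AnalogLinearBitSlicing(
                child.in_features, child.out_features, child.bias is not None,
                number_slices=number_slices, evenly_sliced=evenly_sliced)
            setattr(module, name, sliced)
        else:
            convert_to_analog_sliced(child, number_slices, evenly_sliced)
    return module
```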

from aihwkit.nn import AnalogLinear


class AnalogLinearBitSlicing(AnalogLayerBase, Linear):
Collaborator

Actually, it would be better to just write a new module which inherits from AnalogContainerBase, or simply from Module, instead of AnalogLayerBase. This is because using AnalogLinear layers as children of an AnalogLayer will cause issues. See for instance the RNN module, which internally also uses AnalogLinear layers, see here and here.

Collaborator

It might also not be necessary to inherit from Linear at all, as this will cause additional issues with the Linear weight etc.

@maljoras
Collaborator

Many thanks @HeatPhoenix for this great contribution. I think it needs some adaptation, see my comments above. It would also be great if you could add some unit tests and an example.

@kaoutar55 kaoutar55 requested a review from charlesmackin July 22, 2024 15:48
@kaoutar55
Collaborator

@charles-mackin can you please review this PR and check how it is aligned with your implementation of bit-slicing?

@HeatPhoenix
Contributor Author

FWIW, I still intend to follow up on the comments by @maljoras; I just haven't had time yet.

@charlesmackin
Collaborator

charlesmackin commented Sep 18, 2024

It seems weight bit slicing can be implemented (within a unit cell) by simply writing a new g_converter class as seen here:

class CustomPairConductanceConverter(BaseConductanceConverter):

The CustomPairConductanceConverter class already almost does this, just in a non-quantized way (interpolated for continuous weights). The f_lst parameter can be used to set the significances to equivalent 1s, 2^N, or arbitrary values. I would suggest creating a new class, modifying the convert_to_conductance method to quantize the weights as opposed to interpolating (as is done currently), and adding a new convert_back_to_weights method to accomplish this. It seems a much simpler way to implement weight bit slicing. It should also be compatible with linear layers, convs, etc., and with activation slicing (e.g. bit-wise and split-PWM modes).

For example, you should be able to simply provide parameters below for +/- weight programming for a 4-device unit cell with 2^N weighting.

f_lst = [1, 2]
g_lst = [[0, 0, 0, 0, 1, 0, 1],  # gp0
         [1, 0, 1, 0, 0, 0, 0],  # gm0
         [0, 0, 0, 0, 0, 1, 1],  # gp1
         [1, 1, 0, 0, 0, 0, 0]]  # gm1
#         -1 -0.67 -0.33 0 0.33 0.67 1  in unitless weights
# replace 0, 1 with g_min, g_max to specify how to program device

Use the modified convert_to_conductance method to convert continuous weight distributions to a quantized one using a scheme similar to the one above.

I think you could even substitute with non g_min, g_max values if you want to enable less trivial programming strategies. For instance, programming at the absolute g_max of a device may have some undesirable characteristics. You could easily substitute 1 = 0.9 * g_max which may program better at a slightly reduced weight range. Same goes with g_min.
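As a rough illustration of the suggested quantization (a standalone sketch, not the actual CustomPairConductanceConverter API), assuming binary devices (g_min = 0, g_max = 1) and per-pair significances f_lst, the achievable unitless weight levels and a nearest-level quantizer could look like:

```python
import numpy as np

def achievable_levels(f_lst):
    """All unitless weight levels reachable when each significance-weighted
    device pair contributes -1, 0, or +1, normalized by sum(f_lst)."""
    total = sum(f_lst)
    levels = {0.0}
    for f in f_lst:
        levels = {lvl + s * f for lvl in levels for s in (-1, 0, 1)}
    return sorted(lvl / total for lvl in levels)

def quantize(weights, f_lst):
    """Snap each continuous weight to the nearest achievable level."""
    levels = np.array(achievable_levels(f_lst))
    idx = np.abs(weights[..., None] - levels).argmin(axis=-1)
    return levels[idx]
```

For f_lst = [1, 2] this yields exactly the seven levels in the scheme above (-1, -2/3, -1/3, 0, 1/3, 2/3, 1).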

@HeatPhoenix
Contributor Author

> It seems weight bit slicing can be implemented (within a unit cell) by simply writing a new g_converter class as seen here: CustomPairConductanceConverter [...]

I'm not deep enough into this anymore to tell: does this imply that what I've implemented is superfluous and should basically be re-implemented in an entirely different way?

@charlesmackin
Collaborator

@HeatPhoenix Yes, I believe it would be more appropriate (and likely simpler) to implement the synaptic bit slicing via a new ConductanceConverter class. It would be great if @maljoras could provide his thoughts as well here.

@HeatPhoenix
Contributor Author

> @HeatPhoenix Yes, I believe it would be more appropriate (and likely simpler) to implement the synaptic bit slicing via a new ConductanceConverter class. It would be great if @maljoras could provide his thoughts as well here.

I see, in that case it might be better to close this pull request and maybe open a new issue with instructions outlining this. I probably won't re-implement this, though.

@maljoras-sony
Contributor

maljoras-sony commented Oct 8, 2024

Hi @charlesmackin, g_converter indeed could handle one form of bit-slicing, and it makes sense in some use cases. However, it would only slice the weights after convergence of the DNN, so there is no gradient pass through the bit-slices of the weights. Also, the weight slices sum currents in analog, and only then is the result subject to digital conversion. The approach by @HeatPhoenix, on the other hand, I think was to use AnalogLinear as the base of the weight slices, which would mean that the gradient can pass through the individual slices during training, and also that the summing of the outputs is done in digital instead. So there are slight differences between the approaches.
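The digital-summation variant can be sketched as follows: each slice is its own trainable linear layer (a plain nn.Linear standing in for AnalogLinear), and the per-slice outputs are scaled by their significance and summed in the digital domain, so autograd reaches every slice during training. Names here are illustrative, not aihwkit API.

```python
import torch
from torch import nn

class BitSlicedLinear(nn.Module):
    """Digital summation over weight slices: y = sum_k f_k * (x @ W_k^T)."""
    def __init__(self, in_features, out_features, factors):
        super().__init__()
        self.factors = factors
        # one sub-layer per slice; gradients flow into each slice's weights
        self.slices = nn.ModuleList(
            nn.Linear(in_features, out_features, bias=False) for _ in factors
        )

    def forward(self, x):
        # per-slice outputs are scaled and summed digitally, after each
        # slice's (analog, in the real layer) matrix-vector product
        return sum(f * layer(x) for f, layer in zip(self.factors, self.slices))
```

In the g_converter approach, by contrast, the slicing happens only at weight-programming time, after training.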

@HeatPhoenix
Contributor Author

> Hi @charlesmackin, g_converter indeed could handle one form of bit-slicing and it makes sense in some use cases. [...]

In that case, should I move forward with the improvements outlined by @maljoras-sony previously, as an alternative to @charlesmackin's implementation?

@charlesmackin
Collaborator

@HeatPhoenix Yes, you should move forward with @maljoras-sony's suggestions in this case.

@PabloCarmona
Collaborator

@HeatPhoenix did you have the chance to take a look at @maljoras-sony's suggestions, as @charlesmackin advised? Let us know if you need any further help to finish this PR. Thanks!

@HeatPhoenix
Contributor Author

> @HeatPhoenix did you have the chance to take a look at @maljoras-sony's suggestions, as @charlesmackin advised? [...]

Still interested in finishing up, just really strapped for time, generally. Will try to get it done near the start of the new year.

@kaoutar55
Collaborator

Thanks @HeatPhoenix please let us know when you have any new updates!

@PabloCarmona
Collaborator

Hello @HeatPhoenix! Do you have any updates on this? Can we help you out in some way? For us this is a good addition to the kit's capabilities and we want to add it. Thanks for your work!

@HeatPhoenix
Contributor Author

> Hello @HeatPhoenix! Did you have any updates on this? [...]

I'll try to finish the things @maljoras commented on by next week; it would also be good for me to know what kind of tests/unit tests are expected of me.

@PabloCarmona
Collaborator

> I'll try to finish the things @maljoras commented on by next week; it would also be good for me to know what kind of tests/unit tests are expected of me.

Hello @HeatPhoenix! In terms of the examples, try to add one that showcases the use of the linear_sliced layer in a simple neural network; you can find inspiration in the various examples in the examples folder at the root of the project.

And for the tests, try to add one or two that make sense for this new AnalogLinearBitSlicing class you created and its different methods. Thanks for this great work!

Labels
enhancement New feature or request

6 participants