I used ActNN in DeepLab v3+ with ResNet50, and it degrades the IoU by 2% #29
Comments
Here is my code (in part):

```python
import actnn

actnn.set_optimization_level("L3")

def main():
    # ... training setup (elided), including a warmup scheduler ending with:
    #     after_scheduler=scheduler)
    ...

if __name__ == '__main__':
    main()
```
ActNN is a lossy algorithm, so it is possible that it does not work with 2 bits for some models. Please try using more warmup iterations, and `actnn.set_optimization_level("L2")`.
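For context, switching levels is a one-line change before training. A minimal sketch, using `actnn.set_optimization_level` and `actnn.QModule` from the ActNN README; the level descriptions in the comments are taken from this thread, and the small stand-in network is illustrative:

```python
import actnn
import torch.nn as nn

# Levels as described in this thread: "L3" compresses activations to
# 2 bits, "L2" is a less aggressive setting, and "L0" disables
# compression entirely (it should match full-precision training).
actnn.set_optimization_level("L2")

# Stand-in network; in the issue this would be DeepLab v3+ / ResNet50.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())
model = actnn.QModule(model)  # replaces supported layers with ActNN modules
```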
I've tried `set_optimization_level` with "L0", "L1", "L2", and "L3", but the performance shows little difference between them, so I'm confused. As for warmup iterations, I used 4 epochs of warmup; is that enough for training?
ActNN L0 does exactly the same thing as full-precision training. Is the 2% accuracy loss within random error?
Could you `print(model)` before the training loop and check whether the model is correctly converted? ActNN replaces `nn.Module`s with its own modules, and I noticed there are additional model converters after `if opts.separable_conv and 'plus' in opts.model:`.
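For anyone reproducing this check, here is a small sketch. The exact class names in the printout depend on the ActNN version, and the loop below is an illustrative helper, not part of ActNN:

```python
import actnn
import torch.nn as nn

actnn.set_optimization_level("L0")

# Stand-in for the real segmentation network.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())
model = actnn.QModule(model)

# Converted layers should no longer print as plain Conv2d/BatchNorm2d/ReLU.
print(model)

# Exact-type check (not isinstance, in case ActNN's layers subclass the
# stock ones): flag anything that was left unconverted.
for name, module in model.named_modules():
    if type(module) in (nn.Conv2d, nn.BatchNorm2d, nn.ReLU):
        print("not converted:", name, type(module).__name__)
```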
I'm afraid not. When I don't use the ActNN module wrapper, I consistently get an IoU about 2% higher than when I use it. Incidentally, my model is a 2-class segmentation model; does that have an impact?
The code you listed is always skipped during my training, and when I print my model, all layers seem to be converted correctly; the printout starts with `DataParallel(`.
That's strange. ActNN L0 and full-precision training should have identical behavior. Could you try to debug by the following: restore the model from a fixed checkpoint, run one training step with ActNN at L0 and one at full precision, and compare the resulting gradients.
If you spot a bug in our implementation, please create a PR for us.
Thanks for your advice last week. I've followed the steps you listed, but it seems we've run into trouble: I restored my model (only an `nn.Conv2d` layer) from a fixed checkpoint, and I found that the gradients with ActNN (level "L0") and with full-precision training are totally different. I've sent an email to your Gmail mailbox with my experimental code and the printed results in detail. You may check it in your free time.
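For readers following along, here is a minimal sketch of this kind of comparison. The shapes, input, and helper below are illustrative, not the author's emailed script; it assumes `actnn.QModule` carries over the wrapped model's weights, and you may need to move the model and input to GPU depending on your ActNN build:

```python
import copy
import torch
import torch.nn as nn
import actnn

torch.manual_seed(0)

# Stand-in for "a model that is only an nn.Conv2d layer restored from a
# fixed checkpoint": fix the weights and the input once.
fp_model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1))
x = torch.randn(2, 3, 16, 16)

# An ActNN copy with identical weights, at level "L0" (no compression).
actnn.set_optimization_level("L0")
q_model = actnn.QModule(copy.deepcopy(fp_model))

def conv_weight_grad(model):
    # One forward/backward pass; return the gradient of the conv weight
    # (the only 4-D parameter in either model).
    model.zero_grad()
    model(x).sum().backward()
    return next(p.grad.clone() for p in model.parameters() if p.dim() == 4)

g_fp = conv_weight_grad(fp_model)
g_q = conv_weight_grad(q_model)

# At L0 these should agree up to floating-point noise; the report above
# is that they come out "totally different".
print("max abs grad diff:", (g_fp - g_q).abs().max().item())
```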