What is the learning rate decay and preprocessing you used in your training? #56

Thanks for providing the source code of this fantastic architecture. I am trying to clarify the learning rate decay mentioned in your opts.lua file: is the learning rate decayed by 1e-1 or by 1e-7 every 100 epochs? From your training, it seems that you didn't set the -d parameter, so would the decay fall back to the default of 1e-7? However, the comment you gave for lrDecayEvery is:

--lrDecayEvery (default 100) Decay learning rate every X epoch by 1e-1

So I'd like to ask whether the decay should be 1e-7 or 1e-1 every 100 epochs. Also, what do you mean by # samples in this line?

-d,--learningRateDecay (default 1e-7) learning rate decay (in # samples)

Also, could I know how you performed your preprocessing for the training/evaluation data?
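To make the two readings concrete, here is a small Python sketch of my own (not code from the repo): as I understand torch/optim's sgd, the -d flag applies a continuous per-step decay, clr = lr / (1 + nevals * learningRateDecay), where nevals counts optimizer steps (what the flag comment calls "# samples"), while lrDecayEvery describes a discrete step decay. The base learning rate and steps-per-epoch below are assumptions for illustration.

```python
base_lr = 5e-4          # assumed initial learning rate
lr_decay = 1e-7         # the -d / --learningRateDecay default
steps_per_epoch = 300   # assumed number of minibatches per epoch

def per_step_lr(step, decay=lr_decay):
    """Continuous per-iteration decay, as torch/optim applies -d."""
    return base_lr / (1.0 + step * decay)

def step_decay_lr(epoch, every=100, factor=0.1):
    """Discrete decay by 1e-1 every lrDecayEvery (= 100) epochs."""
    return base_lr * factor ** (epoch // every)

for epoch in (0, 100, 200, 299):
    print(epoch,
          per_step_lr(epoch * steps_per_epoch),
          step_decay_lr(epoch))
```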
Comments
Thank you for your reply! May I also confirm that, for each dataset, you trained the model for a total of 300 epochs and performed the decay only every 100 epochs?
Yes, that is correct.
Thank you for the confirmation. Could I also know whether you kept dropout and batch norm turned on when evaluating the test data? For many models, I think disabling them at test time is the standard thing to do; however, on my side I see a large difference in performance when I turn batch norm and dropout off. Also, could I confirm that the dataset you were using is equivalent to what is found here: https://github.com/alexgkendall/SegNet-Tutorial/tree/master/CamVid
Thank you once again.
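To make the question concrete, this is what I mean by turning the two off — a minimal PyTorch sketch of my own for illustration (the repo itself is Torch7, where model:training() / model:evaluate() play the same roles):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),   # eval(): switches to accumulated running stats
    nn.PReLU(),
    nn.Dropout2d(p=0.1),  # eval(): dropout is disabled entirely
)

model.train()  # what I call "on": BN uses batch stats, dropout active
model.eval()   # what I call "off": BN running stats, no dropout
```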
I have tested on the test dataset with dropout and batch_norm activated, and the results seem to be better than with either batch_norm or dropout turned off (or both). Did you have to turn off dropout when evaluating the test dataset? I see that in many models it is common to disable dropout at test time.

Further, on the ordering of the classes in CamVid: I noted that the original dataset gives class labels from 0-11, where 11 is the void class. If the dataset you used is the one found in the SegNet tutorial as well, did you have to relabel all the segmentations to 1-12 (in Lua's case), since you have put class 1 as void? Is there a particular reason why void is the first class instead of the last?

CamVid labelling: https://github.com/alexgkendall/SegNet-Tutorial/blob/c922cc4a4fcc7ce279dd998fb2d4a8703f34ebd7/Scripts/test_segmentation_camvid.py#L60
Your labelling: ENet-training/train/data/loadCamVid.lua, line 19 (commit e4b664d)
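To illustrate the relabeling I am asking about, a small NumPy sketch of my own; only "void first" is confirmed by loadCamVid.lua, so the exact order of the remaining classes is an assumption here:

```python
import numpy as np

# Hypothetical label map in SegNet-tutorial indexing: classes 0-10,
# void = 11.
seg = np.random.randint(0, 12, size=(360, 480))

# Shift to 1-based Lua indexing with void moved to the front:
# void (11) -> 1, every other class k -> k + 2 (assumed ordering).
relabelled = np.where(seg == 11, 1, seg + 2)
assert relabelled.min() >= 1 and relabelled.max() <= 12
```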
Could I also confirm with you whether you performed median frequency balancing to obtain the weights for the weighted cross-entropy loss? For reference, I have computed a set of class weights for the CamVid dataset this way. Thank you for your help once again.
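For clarity, this is the median frequency balancing I am referring to (as in Eigen & Fergus) — a minimal NumPy sketch of my own:

```python
import numpy as np

def median_freq_weights(label_maps, num_classes):
    """freq(c) = pixels of class c / total pixels of the images in
    which c appears; weight(c) = median(freq) / freq(c)."""
    class_pixels = np.zeros(num_classes)
    image_pixels = np.zeros(num_classes)
    for lm in label_maps:
        present, counts = np.unique(lm, return_counts=True)
        class_pixels[present] += counts
        image_pixels[present] += lm.size  # pixels of images containing c
    freq = class_pixels / image_pixels    # assumes every class appears
    return np.median(freq) / freq

# e.g. on random 12-class label maps:
labels = [np.random.randint(0, 12, size=(360, 480)) for _ in range(5)]
print(median_freq_weights(labels, num_classes=12))
```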
@codeAC29 thanks for your excellent response! I have been mulling it over, and I tried to create a version that deactivates both batch_norm and spatial dropout during evaluation; however, this gives me a very poor result. As you mentioned, batch_norm and spatial dropout stay on during testing. Is it correct to say these two functions are critical to evaluating images? On the other hand, if batch_norm is critical to the model's performance, would evaluating single images give a very poor result? From my results, there is quite a bit of difference in output when evaluating single images vs a batch of images. Would there be a way to effectively evaluate single images with the network? I am currently only performing feature standardization to alleviate the effects.

Your paper has a great amount of content which I am still learning to appreciate. Would you share how p_class in particular is calculated in the weighting formula w_class = 1.0 / ln(c + p_class)? From your code, is it right to assume that p_class is the number of occurrences of a certain pixel label in all images, divided by the total number of pixels in all images? Is there a particular reason why the class weights should be restricted to between 1 and 50? Using median frequency balancing, I see that the weights do not exceed 10. Also, to verify: the spatial dropout you used is 2D spatial dropout (channel-wise dropping), is that correct?
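To make my assumption about p_class explicit, here is a small NumPy sketch of the weighting as I read it (my own code, not the repo's). Note that with c = 1.02, 1/ln(1.02) ≈ 50 and 1/ln(2.02) ≈ 1.4, which would explain the 1-50 interval:

```python
import numpy as np

def enet_class_weights(label_maps, num_classes, c=1.02):
    """w_class = 1 / ln(c + p_class), where p_class is taken here as
    occurrences of a class divided by the total pixel count of the
    dataset (the assumption being asked about above)."""
    counts = np.zeros(num_classes)
    total = 0
    for lm in label_maps:
        present, n = np.unique(lm, return_counts=True)
        counts[present] += n
        total += lm.size
    p_class = counts / total  # in [0, 1], so weights fall in ~[1.4, 50]
    return 1.0 / np.log(c + p_class)
```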
@codeAC29 Thank you for your excellent response once again. I am currently trying a variant of ENet in TensorFlow, in which case batch_norm can be turned off by setting its is_training argument to False. If batch_norm and dropout are both kept on during testing, how have you handled the test images differently from the validation images? Was there any change to the model when you were performing inference on the test images? If there are no changes, could the test images be mixed into the train-validation images instead, given that there would be no difference between evaluating test images and validation images? That is, of course, assuming there are no changes to the model during testing.

Also, what inspired you not to perform any preprocessing on the images? Is there a conceptual reason behind this? It would be interesting to learn why no preprocessing works well for these datasets.

In your paper, you mentioned that all the ReLUs are replaced with PReLUs. However, in the decoder implementation here: https://github.com/e-lab/ENet-training/blob/master/train/models/decoder.lua the activations appear to be ReLUs rather than PReLUs. Is this intentional?
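For reference, the PReLU the paper substitutes for ReLU learns a negative-slope parameter (per channel in practice); a one-function NumPy sketch of my own:

```python
import numpy as np

def prelu(x, a=0.25):
    """PReLU: x for x > 0, a * x otherwise; a is a learned parameter.
    ReLU is the special case a = 0."""
    return np.where(x > 0, x, a * x)

print(prelu(np.array([-2.0, -0.5, 0.0, 3.0])))
```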
@codeAC29 Can I verify with you that your reported ENet test accuracy is the result of testing on the test and validation datasets combined? It seems to me this would be a natural choice, given that there are no architectural changes for either the test or the validation dataset and the only difference comes from the data. In fact, perhaps the test dataset could be distributed between the training and validation datasets?
@kwotsin No, we performed testing only on the test dataset. Folding test data into training/validation will give you a better result, but then the whole point of having test data will be lost. You should always train and validate your network on the respective data, and only at the end, once you have your trained network, run it on the test data.