
issue with the code #3

Open
lircsszz opened this issue Aug 10, 2022 · 4 comments

Comments

@lircsszz

Hi, I'm running the training process using unet.bash, but a bug was encountered during training.
The unet.bash script is:

```bash
CUDA_VISIBLE_DEVICES=0 python3 main.py --mode train --train_file lists/training.txt \
  --valid_file lists/validation.txt \
  --test_file lists/testing.txt --batch_size 16 --sample_length 1 \
  --total_length 1 --number_of_crops 1 --buffer_size 100 --exp_name uvae-test1-0810 --learning_rate 0.0008 \
  --checkpoint_dir ./data/checkpoints/ --model UNet --datatype outdoor --num_epochs 300 \
  --num_class 10 --block_size 1 --probability 0
```

and the error is:

```
AttributeError: 'NoneType' object has no attribute 'log_scalar'
```

I traced this error to:

```python
self.logger.log_scalar('l2 loss', self.losslmse
```

Could you help me with it? Thanks a lot.
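A common workaround for this kind of `AttributeError`, sketched below (the `Trainer` class and attribute names are hypothetical stand-ins for the repo's trainer, inferred from the traceback), is that `self.logger` was never constructed and is `None`, so any `self.logger.log_scalar(...)` call raises. Guarding the call lets training proceed without scalar logging:

```python
# Hedged sketch: Trainer, logger, and losslmse are assumptions based on the
# traceback, not the repository's actual class definitions.
class Trainer:
    def __init__(self, logger=None):
        self.logger = logger   # stays None when no logger is configured
        self.losslmse = 0.0

    def log_losses(self):
        # guard: only log when a logger object was actually provided
        if self.logger is not None:
            self.logger.log_scalar('l2 loss', self.losslmse)

Trainer(logger=None).log_losses()  # no AttributeError raised
```

This only silences the crash; the underlying question of why the logger (or the loss) is never set up in this code path still needs checking in the trainer script.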

@lircsszz
Author

It seems losslmse is never calculated.

@ValentinaSanguineti
Collaborator

ValentinaSanguineti commented Aug 10, 2022

Hi, I am not sure about the issue.
Have you found in which script in the "trainer" folder the lossmse is not calculated, so that I can check?
That is probably an old bash script; the trainers were modified afterwards, so it may not have the proper arguments.
Try with unetacresnet.bash and let me know if that script runs.
Kind regards,
Valentina

@lircsszz
Author


The unetacresnet.bash script works, but there are still some problems: the loss does not drop, and I still cannot run the UNet model successfully.
That's okay, though. One of the things I worked on before was enhancing the representation between the VAE model's input data and the latent-space features, so that the latent-space features encode more information. Your UVAE work is great and has inspired me a lot. Thanks a lot.

```
2022-08-11 10:00:35.915105: 0811unet - Iteration: [284] Training_Loss: 15.671988 Training_Accuracy: 0.187500
2022-08-11 10:00:37.238345: 0811unet - Iteration: [285] Training_Loss: 15.672263 Training_Accuracy: 0.125000
2022-08-11 10:00:38.678872: 0811unet - Iteration: [286] Training_Loss: 15.671172 Training_Accuracy: 0.187500
2022-08-11 10:00:39.952024: 0811unet - Iteration: [287] Training_Loss: 15.671836 Training_Accuracy: 0.125000
2022-08-11 10:00:41.294306: 0811unet - Iteration: [288] Training_Loss: 15.672267 Training_Accuracy: 0.062500
2022-08-11 10:00:42.730300: 0811unet - Iteration: [289] Training_Loss: 15.671895 Training_Accuracy: 0.000000
2022-08-11 10:00:44.025725: 0811unet - Iteration: [290] Training_Loss: 15.671336 Training_Accuracy: 0.062500
2022-08-11 10:00:45.326138: 0811unet - Iteration: [291] Training_Loss: 15.671785 Training_Accuracy: 0.062500
2022-08-11 10:00:46.718289: 0811unet - Iteration: [292] Training_Loss: 15.671815 Training_Accuracy: 0.125000
2022-08-11 10:00:48.119745: 0811unet - Iteration: [293] Training_Loss: 15.673264 Training_Accuracy: 0.000000
2022-08-11 10:00:49.482745: 0811unet - Iteration: [294] Training_Loss: 15.672828 Training_Accuracy: 0.000000
2022-08-11 10:00:50.744331: 0811unet - Iteration: [295] Training_Loss: 15.672508 Training_Accuracy: 0.000000
2022-08-11 10:00:51.860722: 0811unet - Iteration: [296] Training_Loss: 15.671854 Training_Accuracy: 0.187500
2022-08-11 10:00:53.275444: 0811unet - Iteration: [297] Training_Loss: 15.671474 Training_Accuracy: 0.187500
2022-08-11 10:00:54.596415: 0811unet - Iteration: [298] Training_Loss: 15.671822 Training_Accuracy: 0.187500
2022-08-11 10:00:55.850015: 0811unet - Iteration: [299] Training_Loss: 15.672259 Training_Accuracy: 0.125000
2022-08-11 10:00:57.190248: 0811unet - Iteration: [300] Training_Loss: 15.672073 Training_Accuracy: 0.062500
2022-08-11 10:00:58.397464: 0811unet - Iteration: [301] Training_Loss: 15.670971 Training_Accuracy: 0.125000
2022-08-11 10:00:59.843439: 0811unet - Iteration: [302] Training_Loss: 15.671003 Training_Accuracy: 0.125000
2022-08-11 10:01:01.183775: 0811unet - Iteration: [303] Training_Loss: 15.672009 Training_Accuracy: 0.062500
2022-08-11 10:01:02.568894: 0811unet - Iteration: [304] Training_Loss: 15.672955 Training_Accuracy: 0.000000
2022-08-11 10:01:04.025399: 0811unet - Iteration: [305] Training_Loss: 15.672600 Training_Accuracy: 0.000000
2022-08-11 10:01:05.355327: 0811unet - Iteration: [306] Training_Loss: 15.672375 Training_Accuracy: 0.062500
2022-08-11 10:01:06.746954: 0811unet - Iteration: [307] Training_Loss: 15.671938 Training_Accuracy: 0.062500
2022-08-11 10:01:07.968467: 0811unet - Iteration: [308] Training_Loss: 15.671972 Training_Accuracy: 0.125000
2022-08-11 10:01:09.340287: 0811unet - Iteration: [309] Training_Loss: 15.671703 Training_Accuracy: 0.187500
2022-08-11 10:01:10.656609: 0811unet - Iteration: [310] Training_Loss: 15.671160 Training_Accuracy: 0.187500
2022-08-11 10:01:11.924044: 0811unet - Iteration: [311] Training_Loss: 15.672188 Training_Accuracy: 0.062500
2022-08-11 10:01:13.283802: 0811unet - Iteration: [312] Training_Loss: 15.671762 Training_Accuracy: 0.125000
2022-08-11 10:01:14.583009: 0811unet - Iteration: [313] Training_Loss: 15.671877 Training_Accuracy: 0.187500
2022-08-11 10:01:15.784425: 0811unet - Iteration: [314] Training_Loss: 15.671447 Training_Accuracy: 0.125000
2022-08-11 10:01:17.181165: 0811unet - Iteration: [315] Training_Loss: 15.672251 Training_Accuracy: 0.062500
```

@ValentinaSanguineti
Collaborator

The parameters have been set to work well on my dataset. Try reducing the learning rate to 10^-5 and changing the batch size. You can also add one or two skip connections. If you need the latent-space features to encode more information, increase the latent_loss parameter. You may also need to modify the loss or change the number of layers.
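For reference, a minimal framework-agnostic sketch (NumPy, illustrative shapes, not code from this repository) of what "adding a skip connection" means in a UNet-style model: decoder features are upsampled and concatenated with the matching encoder features along the channel axis, so fine spatial detail bypasses the bottleneck.

```python
import numpy as np

def downsample(x):
    # stand-in for an encoder stage: halve spatial resolution by 2x2 average pooling
    b, h, w, c = x.shape
    return x.reshape(b, h // 2, 2, w // 2, 2, c).mean(axis=(2, 4))

def upsample(x):
    # stand-in for a decoder stage: nearest-neighbour upsampling back to the skip's resolution
    return x.repeat(2, axis=1).repeat(2, axis=2)

def decoder_with_skip(x, skip):
    # the skip connection: concatenate encoder features onto the decoder path
    return np.concatenate([upsample(x), skip], axis=-1)

feats = np.random.rand(1, 8, 8, 4)   # (batch, H, W, channels); shapes are illustrative
enc = downsample(feats)              # (1, 4, 4, 4)
out = decoder_with_skip(enc, feats)  # (1, 8, 8, 8): channels double after the concat
print(out.shape)
```

The flat loss in the log above (stuck around 15.67) is consistent with a learning rate that is too high for this setup, which is why trying 10^-5 via the script's existing `--learning_rate` flag is a reasonable first step.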
