This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

fcnxs example validation accuracy does not change #1815

Closed
sinaj opened this issue Apr 10, 2016 · 9 comments

@sinaj

sinaj commented Apr 10, 2016

Hello,
I am trying to use the example code for image segmentation. I only changed train.lst and val.lst to point to my own images. When I run it, the training accuracy changes, but the validation accuracy does not change at all (it is always 0.730370). What could cause this?

Here is the log:

INFO:root:Start training with cpu(0)
INFO:root:Saved checkpoint to "model_pascal/FCN32s_VGG16-0001.params"
INFO:root:--->Epoch[0] Train-accuracy=0.500000
INFO:root: in eval process...
INFO:root:batch[4] Validation-accuracy=0.730370
INFO:root:Saved checkpoint to "model_pascal/FCN32s_VGG16-0002.params"
INFO:root:--->Epoch[1] Train-accuracy=0.500000
INFO:root: in eval process...
INFO:root:batch[4] Validation-accuracy=0.730370
INFO:root:Saved checkpoint to "model_pascal/FCN32s_VGG16-0003.params"
INFO:root:--->Epoch[2] Train-accuracy=0.750000
INFO:root: in eval process...
INFO:root:batch[4] Validation-accuracy=0.730370
INFO:root:Saved checkpoint to "model_pascal/FCN32s_VGG16-0004.params"
INFO:root:--->Epoch[3] Train-accuracy=1.000000
INFO:root: in eval process...
INFO:root:batch[4] Validation-accuracy=0.730370
INFO:root:Saved checkpoint to "model_pascal/FCN32s_VGG16-0005.params"

@xiaowei-hu

During training, your labels may all be equal to zero, because 0.730370 is the background ratio.
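
One way to test this hypothesis is to measure how much of a label image each class occupies. The following is only a sketch with NumPy, not part of the fcnxs example: `class_fractions` is a hypothetical helper, and the small label map is made-up data. It also shows that a model predicting background everywhere scores an accuracy equal to the background ratio.

```python
import numpy as np

def class_fractions(label, num_classes):
    """Return the fraction of pixels that belong to each class index."""
    label = np.asarray(label)
    counts = np.bincount(label.ravel(), minlength=num_classes)
    return counts / counts.sum()

# Made-up 4x4 label map: 12 background pixels (class 0), 4 pixels of class 1.
label = np.zeros((4, 4), dtype=np.int64)
label[:2, :2] = 1

fractions = class_fractions(label, num_classes=3)

# A prediction that is background everywhere is correct exactly on the
# background pixels, so its accuracy equals the background fraction.
all_background = np.zeros_like(label)
accuracy = (all_background == label).mean()

print(fractions, accuracy)
```

If the fraction for class 0 in your validation labels is close to 0.730370, the network is probably predicting background for every pixel.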

@tornadomeet
Contributor

@sinaj, you may want to check your data. I think something may be wrong there, especially with the labels.

@liangfu
Member

liangfu commented May 8, 2017

@sinaj any update? I am having exactly the same problem, even after checking the labels.

@Hjy20255

@sinaj I have the same error. Did you solve it?

@sinaj
Author

sinaj commented May 23, 2017

@Hjy20255 No, I could not solve it.

@sinaj sinaj closed this as completed May 23, 2017
@liangfu
Member

liangfu commented May 24, 2017

I had a similar problem when porting fcn8s from vgg16 to resnet. Changing use_global_stats from True to False in the symbol_fcnxs.py file solved it for me. See my local repository: https://github.com/liangfu/mx-fcn

@Hjy20255

Sorry, I cannot find use_global_stats in symbol_fcnxs.py, so I added use_global_stats=False there, but the validation accuracy still stays at the same value!
My experiment: I am training on my own dataset with fcn-xs.
For the label images, I converted the true-color images to index images, for example (0,0,0) ---> 0. With the index images it runs successfully, but with the true-color images I get errors.
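
For reference, the color-to-index conversion described above could look roughly like this. This is a sketch only: the palette is hypothetical (use your dataset's real colors), and marking unmatched colors with 255 is an assumption, not something taken from the fcnxs example.

```python
import numpy as np

# Hypothetical palette: row i holds the RGB color of class i.
PALETTE = np.array([[0, 0, 0], [128, 0, 0], [0, 128, 0]], dtype=np.uint8)

def color_to_index(rgb, palette=PALETTE, unknown=255):
    """Map an (H, W, 3) RGB label image to an (H, W) array of class indices."""
    rgb = np.asarray(rgb, dtype=np.uint8)
    # Assumption: pixels whose color is not in the palette get `unknown`.
    index = np.full(rgb.shape[:2], unknown, dtype=np.uint8)
    for class_id, color in enumerate(palette):
        mask = np.all(rgb == color, axis=-1)
        index[mask] = class_id
    return index

# Made-up 2x2 color label: background, class 1, background, unlisted color.
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 1] = [128, 0, 0]
rgb[1, 1] = [1, 2, 3]

index = color_to_index(rgb)
print(index)
```

After converting, it is worth checking that the index image contains more than one distinct value, since an all-zero label map would reproduce the constant accuracy reported in this issue.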

@liangfu
Member

liangfu commented May 24, 2017

Try debugging the output of each layer by adding the following code before the forward pass is executed (in this case, that code is in the solver.py file):

import time
import numpy as np

def stat_helper(name, array):
  """wrapper for executor callback"""
  import ctypes
  from mxnet.ndarray import NDArray
  from mxnet.base import NDArrayHandle
  # The callback receives a raw handle; wrap it as a read-only NDArray.
  array = ctypes.cast(array, NDArrayHandle)
  array = NDArray(array, writable=False).asnumpy()  # asnumpy() blocks until the data is ready
  elapsed_ms = (time.time() - stat_helper.start_time) * 1000
  print(name, array.shape, np.mean(array), np.std(array), '%.1fms' % elapsed_ms)
  # print(name, array.shape, '%.1fms' % elapsed_ms)
  stat_helper.start_time = time.time()

# Initialize the timer, then register the callback on the executor (inside solver.py).
stat_helper.start_time = time.time()
self.executor.set_monitor_callback(stat_helper)

There should be no numerical overflow (displayed as inf, for example) in any layer.
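
Instead of reading the printed statistics by eye, the same check can be automated once the layer outputs are available as NumPy arrays. This is a minimal sketch: `find_non_finite` and the layer names are hypothetical, and the arrays are made-up data rather than real fcnxs outputs.

```python
import numpy as np

def find_non_finite(outputs):
    """Return the names of arrays that contain inf or NaN values."""
    return [name for name, arr in outputs.items()
            if not np.all(np.isfinite(arr))]

# Made-up layer outputs keyed by hypothetical layer names.
outputs = {
    "conv1_output": np.array([0.1, 0.5, 2.0]),
    "score_output": np.array([1.0, np.inf, 3.0]),
    "bigscore_output": np.array([np.nan, 0.0]),
}

bad_layers = find_non_finite(outputs)
print(bad_layers)
```

The first layer that shows up in such a list is usually the best place to start looking.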
