Memory limits due to texture memory #1

Open
GoogleCodeExporter opened this issue Mar 29, 2015 · 12 comments

Comments

@GoogleCodeExporter

NVIDIA cards don't allow textures bigger than 512MB. Because this code uses
texture memory, that limit caps the size of various buffers. For example, if
your layer has too many filters (such that its output size exceeds 512MB),
the code will crash.

TODO: add non-texture-using routines to bypass this. 
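
As a rough illustration of the limit described above, the following Python sketch estimates the footprint of a convolution layer's output buffer and compares it with the 512MB figure. It assumes single-precision (4-byte) activations and a dense filters x height x width x batch layout. The function names and the example dimensions are hypothetical and are not taken from this code base.

```python
# Sketch only: 512MB is the figure quoted in this issue. The names, the
# float32 assumption and the example dimensions are illustrative assumptions.
TEXTURE_LIMIT_BYTES = 512 * 1024 * 1024
BYTES_PER_FLOAT = 4  # assumes single-precision activations


def conv_output_bytes(filters, out_height, out_width, batch_size):
    """Footprint of a dense convolution output buffer, in bytes."""
    return filters * out_height * out_width * batch_size * BYTES_PER_FLOAT


def exceeds_texture_limit(num_bytes):
    """True if a buffer of this size is larger than the assumed limit."""
    return num_bytes > TEXTURE_LIMIT_BYTES


if __name__ == "__main__":
    # Hypothetical layer: 64 filters, 224x224 output map, batch of 128 images.
    size = conv_output_bytes(64, 224, 224, 128)
    print(size // 2**20, "MB; exceeds limit:", exceeds_texture_limit(size))
```

Whether a buffer of exactly 512MB is accepted is not stated in this thread, so the comparison above treats the limit as inclusive purely as an assumption.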

Original issue reported on code.google.com by [email protected] on 25 Jul 2014 at 1:28

@GoogleCodeExporter

Was just about to report this! 

I see checks in place for filter_acts, but not for img_acts or weight_acts.

Original comment by [email protected] on 28 Jul 2014 at 7:54

@GoogleCodeExporter

Yeah. There's also some texture usage in NVMatrix, actually. So it'll take a 
bit more effort to bypass entirely. Hopefully I'll get a chance to fix this 
soon but in the meantime it's usually possible to just split a layer into two 
layers if it's too big. 

Original comment by [email protected] on 28 Jul 2014 at 11:27

@GoogleCodeExporter

> but in the meantime it's usually possible to just split a layer into two
> layers if it's too big.

By making it block-sparse using multiple groups, or by using two separate
layers?

Original comment by [email protected] on 29 Jul 2014 at 6:32

@GoogleCodeExporter

Two separate layers. 

Original comment by [email protected] on 29 Jul 2014 at 6:49

@GoogleCodeExporter

I've replicated the texture kernels, changed tex1Dfetch to regular pointer
addressing, and added logic to use these kernels if the inputs are bigger than
the texture memory limit. I've only done this for the convolution kernels.
I can send you a patch if that's the approach you want to take.

Original comment by [email protected] on 4 Aug 2014 at 3:34

@GoogleCodeExporter

Yeah, that is the approach. The only thing to watch out for is that this
change sometimes alters register usage enough to change the kernel's
occupancy, which can have a significant effect on performance. When that
happens, you have to rework the kernel to bring register usage back down
again.
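
To make the register/occupancy relationship concrete, here is a simplified Python sketch. It assumes the register file is the only limit on how many blocks stay resident, and it uses 65536 registers and 2048 thread slots per multiprocessor as defaults, which are typical of Kepler-class GPUs. Real hardware also applies register allocation granularity, shared-memory limits and a cap on resident blocks, so treat the numbers as an illustration rather than a prediction. The function names and the example kernel are hypothetical.

```python
# Simplified model: only the register file limits residency. Real hardware
# also has allocation granularity, shared-memory and block-count limits.
def blocks_per_sm(regs_per_thread, threads_per_block, regs_per_sm=65536):
    """Thread blocks that fit on one SM if registers are the only limit."""
    return regs_per_sm // (regs_per_thread * threads_per_block)


def occupancy(regs_per_thread, threads_per_block,
              regs_per_sm=65536, max_threads_per_sm=2048):
    """Fraction of the SM's thread slots kept resident (register-limited)."""
    resident = blocks_per_sm(regs_per_thread, threads_per_block, regs_per_sm)
    resident_threads = min(resident * threads_per_block, max_threads_per_sm)
    return resident_threads / max_threads_per_sm


if __name__ == "__main__":
    # Hypothetical kernel launched with 128 threads per block.
    print(occupancy(32, 128))  # 32 registers per thread
    print(occupancy(40, 128))  # 40 registers per thread
```

Under these assumptions, going from 32 to 40 registers per thread drops the register-limited occupancy from 100% to 75%, which is the kind of shift that can appear after swapping texture fetches for pointer loads.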

Original comment by [email protected] on 4 Aug 2014 at 6:17

@GoogleCodeExporter

Okay, cool. I'll look at the register count and spilling with
--ptxas-options=-v, and if there's no change in occupancy, I'll prepare
patches and send them your way.

Original comment by [email protected] on 4 Aug 2014 at 6:20

@GoogleCodeExporter

Thanks, I appreciate it. But don't do too much work because I do have the 
originals somewhere in my source control history. I did start out without using 
texture memory. 

Original comment by [email protected] on 4 Aug 2014 at 6:32

@GoogleCodeExporter

Hi, I defined a net and got this error:
"CUDA error at src/nvmatrix.cu:1471 code=11(cudaErrorInvalidValue) 
"cudaCreateTextureObject(&_texObj, &resDesc, &texDesc, NULL)" "

Your predefined net config runs fine, so I think this is related to my net
config. The first conv layer has a 3x3 filter size, stride = 1, pad = 1, and
64 output channels. Maybe it exceeds the texture memory size? And is it
related to Issue 2?

Original comment by [email protected] on 10 Sep 2014 at 2:24

@GoogleCodeExporter

clarkon, this is very likely because of texture memory limits. In my fork I
rewrote the kernels to not use texture memory if the incoming tensor has a
footprint greater than 512MB, but I haven't had time to port this over.

If you want to use cuda-convnet2 in its current state, you can split your
layer into two parallel layers to avoid this.
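
As a sketch of how one might size such a split, the Python snippet below computes the smallest number of equal parallel layers whose individual output buffers stay under the 512MB figure. It only considers the output buffer, assumes float32 activations, and uses a hypothetical function name and hypothetical layer dimensions. Other buffers in the layer may impose their own limits.

```python
import math

TEXTURE_LIMIT_BYTES = 512 * 1024 * 1024  # figure quoted in this thread
BYTES_PER_FLOAT = 4                      # assumes float32 activations


def min_parallel_splits(filters, out_height, out_width, batch_size):
    """Smallest number of parallel layers keeping each output under the limit."""
    bytes_per_filter = out_height * out_width * batch_size * BYTES_PER_FLOAT
    max_filters = TEXTURE_LIMIT_BYTES // bytes_per_filter
    if max_filters == 0:
        raise ValueError("a single filter's output already exceeds the limit")
    return math.ceil(filters / max_filters)


if __name__ == "__main__":
    # Hypothetical layer: 64 filters, 224x224 output map.
    print(min_parallel_splits(64, 224, 224, 64))   # batch of 64 images
    print(min_parallel_splits(64, 224, 224, 128))  # batch of 128 images
```

With these made-up dimensions, a batch of 64 needs two parallel layers, while a batch of 128 needs four, so the two-way split is a starting point rather than a guarantee.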

Original comment by [email protected] on 12 Sep 2014 at 6:21

@GoogleCodeExporter

Hi, when running one of my convnet architectures, I get the error

CUDA error at src/nvmatrix.cu:1548 code=11(cudaErrorInvalidValue) 
"cudaCreateTextureObject(&_texObj, &resDesc, &texDesc, NULL)"

As suggested above, it looks like the texture is bigger than 512MB, causing my
program to crash.

My current config has multiple 128-, 256- and 512-channel filters, but they
are all 3*3 in size.

1. Is this error caused by the presence of multiple layers (suggesting that
convnet2 cannot be used for deep configurations), or by the presence of even
one big filter, say of size 3*3 with 512 channels?

2. Also, is it possible to find out which conv layer is causing this error?

3. A possible solution mentioned above is to separate the layer into 2
parallel layers. Can you please suggest what the config file would look like
if I have to separate, say, a conv layer with 512 channels and size 3*3?

Any help is appreciated. Thanks in advance! :)


Original comment by [email protected] on 18 Feb 2015 at 8:41

@GoogleCodeExporter

In response to durvasul:

The issue is the size of the buffers for a particular layer and not
necessarily the total memory footprint of your model. To see which layer is
causing the problem, try reducing the number of channels until it doesn't
crash, use the debugger, or insert some print statements in the Python code.

As soumith pointed out, the fprop conv code has checks on the buffer sizes, so
it's most likely img_acts or weight_acts that is trying to use the texture
kernels. You can insert similar checks (around line 2120 of weight_acts.cu,
for example) and just fall back to the non-texture versions of the kernels, as
in line 1251 of filter_acts.cu.
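
A minimal sketch of the "print statements" idea, written as standalone Python rather than as a patch to the project's own scripts: it walks a list of layer shapes and reports those whose output buffer would exceed 512MB. The tuple layout, the function name and the example network are all hypothetical, and float32 activations are assumed. It does not read cuda-convnet2 config files.

```python
TEXTURE_LIMIT_BYTES = 512 * 1024 * 1024  # figure quoted in this thread
BYTES_PER_FLOAT = 4                      # assumes float32 activations


def find_oversized_layers(layers, batch_size):
    """Return (name, bytes) for each layer whose output exceeds the limit.

    `layers` is a list of (name, filters, out_height, out_width) tuples.
    """
    oversized = []
    for name, filters, height, width in layers:
        num_bytes = filters * height * width * batch_size * BYTES_PER_FLOAT
        if num_bytes > TEXTURE_LIMIT_BYTES:
            oversized.append((name, num_bytes))
    return oversized


if __name__ == "__main__":
    # Hypothetical network; the shapes are made up for illustration.
    hypothetical_net = [
        ("conv1", 64, 224, 224),
        ("conv2", 128, 112, 112),
        ("conv3", 256, 56, 56),
    ]
    for name, num_bytes in find_oversized_layers(hypothetical_net, 128):
        print(name, num_bytes // 2**20, "MB")
```

In this made-up example the early, high-resolution layers are the ones flagged, even though the later layer has more filters, which matches the point that a single layer's buffer size matters rather than the depth of the model.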




Original comment by [email protected] on 25 Feb 2015 at 4:51
