Memory usage for multi-label classification problem #2113
Comments
@tqchen We should tune down the default prefetching buffer size.
@piiswrong I should have noted that I have already changed the number of prefetched minibatches from 4 -> 1 on this line: https://github.com/dmlc/mxnet/blob/master/src/io/iter_image_recordio.cc#L325 and recompiled. This did not fix the out-of-memory error.
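For reference, `prefetch_buffer` is also exposed as a Python-side keyword of `ImageRecordIter`, so the prefetch depth can be lowered without recompiling. A minimal sketch, with placeholder paths and shapes:

```python
import mxnet as mx

# Lower the prefetch depth at iterator creation time instead of editing
# the C++ default; the path and shapes here are placeholders.
data_iter = mx.io.ImageRecordIter(
    path_imgrec="train.rec",
    data_shape=(3, 224, 224),
    batch_size=32,
    prefetch_buffer=1,  # default is 4
)
```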
@piiswrong @tqchen It seems the problem occurs when passing the labels in directly. Would it be better to use a data iter for the labels?
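One way to realize the "data iter for labels" suggestion would be to stream the labels from disk with `CSVIter` rather than embedding them in the image list. A sketch under that assumption; the file name and shapes are hypothetical:

```python
import mxnet as mx

# Hypothetical sketch: stream one row of 8600 label values per example
# from a CSV file instead of holding the full label matrix in RAM.
label_iter = mx.io.CSVIter(
    data_csv="labels.csv",
    data_shape=(8600,),
    batch_size=32,
)
```

Combining this with the image iterator batch by batch would still need a small custom iterator wrapper, but it avoids materializing the whole label matrix in memory.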
I'm getting similar issues on an EC2 instance, even with prefetch_buffer=1.
@martinbel, my experience is that the labels are loaded into RAM all at once (I have >100 GB of labels to load). I'm using a g2.8xlarge, which only has 60 GB of RAM.
@EasonD3 You mean your training set has 100 GB of images? I'm not sure what you mean by "labels". I've converted the images to the RecordIO format.
@martinbel, sorry about the confusion. My problem is very similar to the one stated in the original post, where each image is associated with hundreds of thousands of labels/classes (i.e., a multi-label problem). If you feed all the labels in through the list file, they get loaded into RAM at once.
@EasonD3 In my case the label file isn't huge, but I get the same error. I guess something isn't working well with the ImageRecordIter.
#4299 Maybe related; I have the same problem.
Same problem.
This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!
Hi,
I am trying to train with 1M training examples and 8600 labels per example on a g2.2xlarge (15 GB of system memory). The code crashes with an out-of-memory error (std::bad_alloc) when creating the mx.io.ImageRecordIter for the training data. I am initializing mx.io.ImageRecordIter with both the RecordIO and image list files as follows:
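The original snippet was not preserved in this thread; the following is a minimal reconstruction, where only `label_width` reflects the report (8600 labels per example) and the paths, shapes, and batch size are placeholders:

```python
import mxnet as mx

# Reconstruction of the described setup; path_imgrec, path_imglist,
# data_shape, and batch_size are placeholders, not the actual values.
train_iter = mx.io.ImageRecordIter(
    path_imgrec="train.rec",   # RecordIO file
    path_imglist="train.lst",  # image list carrying the per-image labels
    label_width=8600,          # 8600 labels per example, as in the report
    data_shape=(3, 224, 224),
    batch_size=32,
)
```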
Does MXNet store the training labels in memory, or if not, how do the CPU memory requirements scale with number of labels and number of training examples? (My code works ok with 100k training examples.)
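A back-of-envelope estimate, assuming the labels are held in memory as 32-bit floats, suggests why 1M examples fail where 100k succeed:

```python
# Rough size of the full label matrix, assuming float32 storage.
examples = 1_000_000
labels_per_example = 8600
bytes_per_value = 4  # float32

print(examples * labels_per_example * bytes_per_value / 1024**3)
# ~32.0 GB: far above the g2.2xlarge's 15 GB. At 100k examples the
# same matrix is ~3.2 GB, which fits.
```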
Thanks,
G.