Questions about memory consumption of infinitely wide NTK #166
Comments
Thanks for the report, your code is correct! This actually looks like two bugs on our side.

One thing to try is to hide the GPUs from TensorFlow before importing tensorflow_datasets, so that TF doesn't reserve GPU memory for itself:

```python
import tensorflow as tf
tf.config.set_visible_devices([], 'GPU')
import tensorflow_datasets as tfds
```

Another idea is to binary search smaller training set sizes to figure out if we're really hitting the memory limit (e.g. it works for 40K, but not 50K), or if the GPU memory is just not available for some reason (e.g. it doesn't work even for 5K). Also, could you please post the whole error message?
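A minimal sketch of that size-search suggestion (a simple linear probe rather than a strict binary search), assuming the `kernel_fn`, `x_train`, and `x_test` names implied elsewhere in the thread; the exact batching setup is not shown here:

```python
import jax

# Probe increasing training-set sizes to see where the kernel computation
# starts to run out of GPU memory.
for num_samples in [5_000, 10_000, 20_000, 40_000, 50_000]:
    try:
        k = kernel_fn(x_test, x_train[:num_samples], 'ntk')
        jax.block_until_ready(k)   # force the computation to actually finish
        print(f'{num_samples} training samples: OK')
    except Exception as e:         # XLA surfaces out-of-memory as a runtime error
        print(f'{num_samples} training samples: failed ({e})')
        break
```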
Thank you so much for the detailed reply! I have tried your code but still face the same issue. Below is the complete error message for your reference:

```
2022-09-08 13:20:36.044808: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 9.31GiB (rounded to 10000000000) requested by op
Traceback (most recent call last):
...
```
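For context, the size of that failed allocation lines up with a single dense 50,000 x 50,000 float32 matrix (editor's note; the 50k figure comes from the 50k x 50k matrix discussed below):

```python
# Rough check: a 50,000 x 50,000 float32 buffer is 10^10 bytes, which is
# exactly the 9.31 GiB allocation reported by the BFC allocator.
n = 50_000
print(n * n * 4)            # 10_000_000_000 bytes
print(n * n * 4 / 2**30)    # ~9.31 GiB
```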
I have also tried searching for the maximum number of samples before encountering the memory issue, which turned out to be 36000 in my case:

```python
num_samples = 36000
x_train = x_train[:num_samples]
y_train = y_train[:num_samples]
```
Oh, thanks for the error message. I realized what's actually failing is

```python
fx_train_inf, fx_test_inf = predict_fn(fx_train_0=fx_train_0, fx_test_0=fx_test_0, k_test_train=k_test_train)
```

and not the kernel computation. Indeed, 24 GB is not enough to run the Cholesky solver on the 50k x 50k matrix, so you'd need to be doing it on CPU. To make it happen on CPU, I think the easiest way should be to have […]. Alternatively, but hopefully not necessarily, you can pin the input tensors to CPU, to make sure the function called with them as inputs is executed on CPU:

```python
fx_train_0 = jax.device_put(fx_train_0, devices('cpu')[0])
fx_test_0 = jax.device_put(fx_test_0, devices('cpu')[0])
k_test_train = jax.device_put(k_test_train, devices('cpu')[0])
```

and/or

```python
k_train_train = jax.device_put(k_train_train, devices('cpu')[0])
y_train = jax.device_put(y_train, devices('cpu')[0])
```

before defining `predict_fn`.
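A minimal end-to-end sketch of that suggestion (an editor's illustration, assuming `predict_fn` comes from `nt.predict.gradient_descent_mse`, which matches the call signature above; the `diag_reg` value is arbitrary):

```python
import jax
import neural_tangents as nt

cpu = jax.devices('cpu')[0]

# Pin the large kernel blocks and targets to host memory so the Cholesky
# solve inside predict_fn runs on CPU rather than on the 24 GB GPU.
k_train_train = jax.device_put(k_train_train, cpu)
y_train = jax.device_put(y_train, cpu)

predict_fn = nt.predict.gradient_descent_mse(k_train_train, y_train, diag_reg=1e-4)

fx_train_0 = jax.device_put(fx_train_0, cpu)
fx_test_0 = jax.device_put(fx_test_0, cpu)
k_test_train = jax.device_put(k_test_train, cpu)

# t=None (the default) gives the infinite-time, i.e. fully trained, solution.
fx_train_inf, fx_test_inf = predict_fn(
    fx_train_0=fx_train_0, fx_test_0=fx_test_0, k_test_train=k_test_train)
```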
Thank you so much for the detailed follow-up! As you suggested, I have tried to move everything to the CPU before defining the `predict_fn`, but the memory issue persists.
How much RAM do you have? Does it work (on CPU, after your modifications) if you use 36k points? I suspect you'd need at least ~64 GB of RAM, but I only ever tried it on a machine with >128 GB, so I'm not sure what the exact requirement is. To better debug this, you can try to run the piece of code from […].
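A back-of-the-envelope estimate of where a requirement in that range could come from (an editor's note; the exact peak depends on dtype and on how many temporaries the solver keeps alive):

```python
# Rough accounting for the 50k-point solve, assuming float32 kernels.
n_train, n_test, bytes_per_elem = 50_000, 10_000, 4

k_train_train_gb = n_train * n_train * bytes_per_elem / 1e9   # ~10 GB
k_test_train_gb = n_test * n_train * bytes_per_elem / 1e9     # ~2 GB

# The Cholesky factor is another n_train x n_train buffer, and the
# factorization/solve keeps intermediate copies alive, so several ~10 GB
# buffers coexist. If 64-bit precision is enabled
# (jax.config.update('jax_enable_x64', True)), every buffer doubles to ~20 GB,
# which makes a ~64 GB requirement plausible.
print(k_train_train_gb, k_test_train_gb)
```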
I am working on a simple MNIST example. I found that I could not compute the NTK for the entire dataset without running out of memory. Below is the code snippet I used:
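(The original snippet was lost when the page was captured. The block below is an editor's sketch of the kind of setup described, not the author's exact code: the architecture, batching, and the 50k/10k split are assumptions, chosen to match the names `kernel_fn`, `k_train_train`, `k_test_train`, `predict_fn`, `fx_train_0`, and `fx_test_0` used elsewhere in the thread.)

```python
import jax
import neural_tangents as nt
from neural_tangents import stax
import tensorflow_datasets as tfds

# Load MNIST as NumPy arrays (assumed; the report only says "MNIST").
(x_train, y_train), (x_test, y_test) = tfds.as_numpy(
    tfds.load('mnist', split=['train[:50000]', 'test'],
              batch_size=-1, as_supervised=True))
x_train = x_train.reshape(len(x_train), -1) / 255.0
x_test = x_test.reshape(len(x_test), -1) / 255.0
y_train = jax.nn.one_hot(y_train, 10)

# A small fully-connected architecture (assumed).
_, _, kernel_fn = stax.serial(stax.Dense(512), stax.Relu(), stax.Dense(10))

# Batch the kernel computation so each block fits in GPU memory.
kernel_fn = nt.batch(kernel_fn, batch_size=1000)

k_train_train = kernel_fn(x_train, None, 'ntk')
k_test_train = kernel_fn(x_test, x_train, 'ntk')

predict_fn = nt.predict.gradient_descent_mse(k_train_train, y_train)
fx_train_0 = fx_test_0 = 0.0
fx_train_inf, fx_test_inf = predict_fn(
    fx_train_0=fx_train_0, fx_test_0=fx_test_0, k_test_train=k_test_train)
```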
I am running this on two RTX 3090s, each with a 24 GB memory buffer.
Is there something I'm doing wrong, or is it normal for the NTK computation to consume this much memory?
Thank you!