RuntimeError: Resource exhausted: Out of memory while trying to allocate 12582912 bytes. #197

Closed
Ozcifci opened this issue Oct 4, 2021 · 11 comments
Labels: out of memory

Ozcifci commented Oct 4, 2021

Hi all,

I am running into a RuntimeError (Resource exhausted: Out of memory while trying to allocate 12582912 bytes.) while executing the command python3 docker/run_docker.py --fasta_paths=T1050.fasta --max_template_date=2020-05-14 as described in the installation steps.
Other people have reported the same RuntimeError with long amino acid sequences, but in my case I get this error every time, even with short sequences.
I monitored VRAM usage with watch nvidia-smi and it only increased by ~100 MB while docker/run_docker.py was running; total VRAM usage stays at ~800 MB out of 8192 MB.
I mentioned my problem in this issue, but since it is no longer related to that topic, I have opened a new one.

Installed on WSL2. I tested CUDA with several packages, including the NVIDIA sample ./BlackScholes in /usr/local/cuda/samples/4_Finance/BlackScholes, which reports 'Test passed' within a second. (A quick Python-side check is sketched below.)
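As a side note, a minimal sanity check for GPU visibility from Python inside WSL could look like the sketch below. This assumes JAX (the library AlphaFold's model runs on) is installed in the environment; it is only a diagnostic, not part of AlphaFold itself.

```python
# Sketch: check that the XLA backend can see the GPU from inside WSL.
import jax
import jax.numpy as jnp

# A GPU/CUDA device should appear in this list if the driver stack is working.
print(jax.devices())

# Tiny computation to confirm work actually executes on the device.
x = jnp.ones((1000, 1000))
print((x @ x).sum())
```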

Ozcifci commented Oct 7, 2021

Update:

I tried running TensorFlow commands in Python 3 following this tutorial.
I had to install cuDNN 8.2.4.15; after that, every command in the tutorial worked.
With high hopes I ran AlphaFold again, unfortunately without any success.
I am running out of troubleshooting ideas.

Ozcifci commented Oct 15, 2021

Update:

After running
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
tf.debugging.set_log_device_placement(True)  # log which device each op is placed on
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)  # should be placed on the GPU if one is visible
print(c)
TensorFlow was able to allocate 7 of the 8 GB of dedicated GPU memory, so there is no issue accessing the VRAM through WSL with TensorFlow.

I installed AlphaFold without Docker, following the non-Docker setup by @sanjaysrikakulam.
I still got the CUDA_ERROR_OUT_OF_MEMORY messages, but after a few attempts it managed to allocate some memory and continue with the script.

Later on I got another error ending in RuntimeError: HHblits failed.
I was able to solve that by increasing the swap memory from 2 GB to 17 GB. After that I did not receive any serious error messages, but the script seems to have finished without writing any files to '/tmp/alphafold'.

Overall, I have serious memory issues with WSL. It is strange that I had to increase the swap memory for it to work (see the configuration sketch below).
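For anyone else hitting this: the swap size mentioned here is the WSL2 VM setting, which (as far as I know) is configured from the Windows side via a .wslconfig file. A minimal sketch follows; the memory value is only an example, and the swap value is the 17 GB used above.

```
# %UserProfile%\.wslconfig on the Windows side (sketch; apply with `wsl --shutdown` and restart WSL)
[wsl2]
memory=24GB   # RAM exposed to the WSL2 VM (example value)
swap=17GB     # swap size, raised here from the 2 GB I had before
```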

Ozcifci commented Oct 15, 2021

Here is the output of my last run, which finished without any serious error messages:

> /home/skems/.local/lib/python3.8/site-packages/absl/flags/_validators.py:203: UserWarning: Flag --preset has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  warnings.warn(
I1015 15:01:27.881464 140335681165120 templates.py:848] Using precomputed obsolete pdbs /home/skems/alphafold-data/pdb_mmcif/obsolete.dat.
I1015 15:01:28.764931 140335681165120 xla_bridge.py:231] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker:
I1015 15:01:29.343135 140335681165120 xla_bridge.py:231] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
2021-10-15 15:01:29.566577: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 34357903360 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:29.567270: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 30922113024 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:29.567924: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 27829901312 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:29.568598: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 25046910976 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:29.569210: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 22542219264 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:29.569827: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 20287995904 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:29.570412: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 18259195904 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:29.571086: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 16433275904 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:29.571820: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 14789948416 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:29.572844: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 13310953472 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:29.572930: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 11979857920 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:29.572951: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 10781872128 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:29.573000: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 9703685120 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:29.573059: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 8733316096 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:29.573111: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 7859984384 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:30.996236: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 7073985536 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:32.556682: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 6366586880 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:34.013874: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 5729928192 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:35.530262: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 5156935168 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:37.017102: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 4641241600 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:38.416358: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 4177117440 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:39.982618: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 3759405568 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:41.468658: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 3383464960 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:42.988521: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 3045118464 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:44.577626: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 2740606464 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:46.086485: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 2466545664 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:47.529490: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 2219890944 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:49.054840: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 1997901824 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:50.523250: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 1798111744 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:51.975646: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 1618300672 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:53.438460: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 1456470528 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:54.863242: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 1310823424 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:56.665393: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 67108864 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:56.665455: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 67108864
2021-10-15 15:01:56.707023: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 60398080 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:56.707081: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 60398080
2021-10-15 15:01:56.749567: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 54358272 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:56.749640: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 54358272
2021-10-15 15:01:56.791021: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 67108864 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:01:56.791083: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 67108864
2021-10-15 15:02:06.832964: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 67108864 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:02:06.833025: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 67108864
2021-10-15 15:02:06.903117: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 67108864 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:02:06.903173: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 67108864
2021-10-15 15:02:06.903221: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:457] Allocator (xla_gpu_host_bfc) ran out of memory trying to allocate 48.00MiB (rounded to 50331648)requested by op
2021-10-15 15:02:06.903309: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:468] ____________________________________________________________________________________________________
2021-10-15 15:02:07.173282: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 67108864 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:02:07.173350: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 67108864
2021-10-15 15:02:07.213203: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 67108864 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:02:07.213267: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 67108864
2021-10-15 15:02:17.257924: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 67108864 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:02:17.258021: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 67108864
2021-10-15 15:02:17.299096: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 67108864 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:02:17.299159: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 67108864
2021-10-15 15:02:17.299178: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:457] Allocator (xla_gpu_host_bfc) ran out of memory trying to allocate 48.00MiB (rounded to 50331648)requested by op
2021-10-15 15:02:17.299292: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:468] ____________________________________________________________________________________________________
2021-10-15 15:02:17.506568: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 67108864 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:02:17.506631: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 67108864
2021-10-15 15:02:17.547710: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 67108864 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:02:17.547773: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 67108864
2021-10-15 15:02:27.587396: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 67108864 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:02:27.587456: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 67108864
2021-10-15 15:02:27.626304: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 67108864 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:02:27.626370: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 67108864
2021-10-15 15:02:27.626422: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:457] Allocator (xla_gpu_host_bfc) ran out of memory trying to allocate 24.00MiB (rounded to 25165824)requested by op
2021-10-15 15:02:27.626508: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:468] ____________________________________________________________________________________________________
I1015 15:02:28.126807 140335681165120 run_alphafold.py:267] Have 1 models: ['model_1']
I1015 15:02:28.130131 140335681165120 run_alphafold.py:280] Using random seed 3625752914001005753 for the data pipeline
I1015 15:02:28.135952 140335681165120 jackhmmer.py:130] Launching subprocess "/home/skems/miniconda3/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmp11qdkhyw/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /home/skems/alphafold_non_docker/example/query.fasta /home/skems/alphafold-data/uniref90/uniref90.fasta"
I1015 15:02:28.146629 140335681165120 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I1015 15:08:13.523266 140335681165120 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 345.376 seconds
I1015 15:08:13.526773 140335681165120 jackhmmer.py:130] Launching subprocess "/home/skems/miniconda3/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmp9wf82uh0/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /home/skems/alphafold_non_docker/example/query.fasta /home/skems/alphafold-data/mgnify/mgy_clusters.fa"
I1015 15:08:13.536610 140335681165120 utils.py:36] Started Jackhmmer (mgy_clusters.fa) query
I1015 15:15:29.849993 140335681165120 utils.py:40] Finished Jackhmmer (mgy_clusters.fa) query in 436.313 seconds
I1015 15:15:29.853775 140335681165120 hhsearch.py:76] Launching subprocess "/home/skems/miniconda3/envs/alphafold/bin/hhsearch -i /tmp/tmp8pycesob/query.a3m -o /tmp/tmp8pycesob/output.hhr -maxseq 1000000 -d /home/skems/alphafold-data/pdb70/pdb70"
I1015 15:15:29.865261 140335681165120 utils.py:36] Started HHsearch query
I1015 15:15:59.856646 140335681165120 utils.py:40] Finished HHsearch query in 29.991 seconds
I1015 15:15:59.879396 140335681165120 hhblits.py:128] Launching subprocess "/home/skems/miniconda3/envs/alphafold/bin/hhblits -i /home/skems/alphafold_non_docker/example/query.fasta -cpu 4 -oa3m /tmp/tmp5jmgjgq5/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /home/skems/alphafold-data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /home/skems/alphafold-data/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
I1015 15:15:59.890883 140335681165120 utils.py:36] Started HHblits query
I1015 15:26:36.442497 140335681165120 utils.py:40] Finished HHblits query in 636.489 seconds
I1015 15:26:36.631172 140335681165120 templates.py:860] Searching for template for: None
W1015 15:26:36.894060 140335681165120 templates.py:131] Template structure not in release dates dict: 6mrr
I1015 15:26:36.895924 140335681165120 templates.py:710] None: hit 6mrr_A did not pass prefilter: Template is an exact subsequence of query with large coverage. Length ratio: 1.0.
I1015 15:26:36.900985 140335681165120 templates.py:905] Skipped invalid hit 6MRR_A foldit1; De novo protein, Foldit; 1.18A {synthetic construct}, error: None, warning: None
W1015 15:26:36.902260 140335681165120 templates.py:131] Template structure not in release dates dict: 6q64
I1015 15:26:36.905469 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/6q64.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: VKAYANAICDSIEKYNLDGFDIDYQPGYGHSGTLANYQTISPSGNNKMQVFIETLSARLRPAGRMLVM
I1015 15:26:37.519767 140335681165120 templates.py:277] Found an exact template match 6q64_A.
W1015 15:26:37.648050 140335681165120 templates.py:131] Template structure not in release dates dict: 4s3k
I1015 15:26:37.648231 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/4s3k.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: DTLLNNIVKTAKEQNFRDIHFDFEFLRPADKEAYIAFLQKAKKRLQDEQLLMSVA
I1015 15:26:47.697747 140335681165120 templates.py:277] Found an exact template match 4s3k_A.
W1015 15:26:47.724254 140335681165120 templates.py:131] Template structure not in release dates dict: 5jh8
I1015 15:26:47.724439 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/5jh8.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: SQAVANLVKFAQDKRFSGINVDFEAVAQGDRNNFSHFIQVLGRALHAKGLKLIVS
I1015 15:26:47.999323 140335681165120 templates.py:277] Found an exact template match 5jh8_A.
W1015 15:26:48.017316 140335681165120 templates.py:131] Template structure not in release dates dict: 1jnd
I1015 15:26:48.017499 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/1jnd.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: GFIRSAYELVKTYGFDGLDLAYQFPKNKPRKVHGDLGLAWKSIKKLFTGDFIVDPHAALHKEQFTALVRDVKDSLRADGFLLSLT
I1015 15:26:48.164794 140335681165120 templates.py:277] Found an exact template match 1jnd_A.
W1015 15:26:48.181986 140335681165120 templates.py:131] Template structure not in release dates dict: 5y2a
I1015 15:26:48.182138 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/5y2a.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: RKFVVHAVDFLEQYGFDGLDLDWEYPKCWQVECEKGPDSDKQGFADLVKELRKAFNRRGMLLSAA
I1015 15:26:48.517071 140335681165120 templates.py:277] Found an exact template match 5y2a_B.
W1015 15:26:48.534681 140335681165120 templates.py:131] Template structure not in release dates dict: 4wiw
I1015 15:26:48.534842 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/4wiw.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: EKLIGEIVVLLKNTNADGVVIDFETPLDYGDVKDPYDGVRNDLTAFMESLHSELQSMNKLVVM
I1015 15:26:50.084921 140335681165120 templates.py:277] Found an exact template match 4wiw_B.
W1015 15:26:50.112894 140335681165120 templates.py:131] Template structure not in release dates dict: 4wiw
I1015 15:26:50.113142 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/4wiw.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: EKLIGEIVVLLKNTNADGVVIDFETPLDYGDVKDPYDGVRNDLTAFMESLHSELQSMNKLVVM
I1015 15:26:51.232174 140335681165120 templates.py:277] Found an exact template match 4wiw_D.
W1015 15:26:51.256093 140335681165120 templates.py:131] Template structure not in release dates dict: 6jm7
I1015 15:26:51.256328 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/6jm7.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: QNFIQTSLAMILEYNFDGLDVDWEYPNRRDTVHGEDDIEQFSTLLKELREEFDNYGLLLTV
I1015 15:26:51.378043 140335681165120 templates.py:277] Found an exact template match 6jm7_A.
W1015 15:26:51.392621 140335681165120 templates.py:131] Template structure not in release dates dict: 6jmb
I1015 15:26:51.392796 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/6jmb.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: QNFIQTSLAMILEYNFDGLDVDWEYPNRRDTVHGEDDIEQFSTLLKELREEFDNYGLLLTV
I1015 15:26:51.608811 140335681165120 templates.py:277] Found an exact template match 6jmb_A.
W1015 15:26:51.623733 140335681165120 templates.py:131] Template structure not in release dates dict: 4q6t
I1015 15:26:51.623880 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/4q6t.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: GTVKQLVKLAKEGGFAGINLDFEKVEPRNRAAFCAFVKTLGNALHASNKKLIIS
I1015 15:26:51.804447 140335681165120 templates.py:277] Found an exact template match 4q6t_A.
W1015 15:26:51.818300 140335681165120 templates.py:131] Template structure not in release dates dict: 3oa5
I1015 15:26:51.818641 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/3oa5.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: TNFVEGIKDFFQRFPMFSHLDIDWEYPGSIGAGNPNSPDDGANFAILIQQITDAKISNLKGISI
I1015 15:26:52.146717 140335681165120 templates.py:277] Found an exact template match 3oa5_B.
W1015 15:26:52.168421 140335681165120 templates.py:131] Template structure not in release dates dict: 5y2c
I1015 15:26:52.168607 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/5y2c.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: RKFVVHAVDFLEQYGFDGLDLDWLYPKCWQVECEKGPDSDKQGFADLVKELRKAFNRRGMLLSA
I1015 15:26:52.486926 140335681165120 templates.py:277] Found an exact template match 5y2c_A.
W1015 15:26:52.503098 140335681165120 templates.py:131] Template structure not in release dates dict: 5cuk
I1015 15:26:52.503271 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/5cuk.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: RLQVRLPQLGAVEVQVLHGHGQLQIEISASPGSLALLQQARGELLERLQRLHPEQPVQLTF
I1015 15:26:52.538353 140335681165120 templates.py:277] Found an exact template match 5cuk_A.
W1015 15:26:52.543342 140335681165120 templates.py:131] Template structure not in release dates dict: 4a5q
I1015 15:26:52.543471 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/4a5q.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: NFVEGIKDFFQRFPMFSHLDIDWEYPGSIGAGNPNSPDDGANFAILIQQITDAKISNLKGISI
I1015 15:26:53.414012 140335681165120 templates.py:277] Found an exact template match 4a5q_E.
W1015 15:26:53.441402 140335681165120 templates.py:131] Template structure not in release dates dict: 5y2b
I1015 15:26:53.441620 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/5y2b.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: KFVQHAVAFLEKYGFDGLDLDWEYPKCWQVDCSKGPDSDKQGFADLVHELSAVLKPKGLLLSA
I1015 15:26:53.565924 140335681165120 templates.py:277] Found an exact template match 5y2b_A.
W1015 15:26:53.581913 140335681165120 templates.py:131] Template structure not in release dates dict: 4lgx
I1015 15:26:53.582049 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/4lgx.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: AVFIRSVQQVIKQYHLDGIDLDWEYPVNGAWGLVESQPADRANFTLLLAELHKALDKGKLLTI
I1015 15:26:53.724340 140335681165120 templates.py:277] Found an exact template match 4lgx_A.
W1015 15:26:53.740086 140335681165120 templates.py:131] Template structure not in release dates dict: 4w5u
I1015 15:26:53.740243 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/4w5u.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: GFAQSCYNLVHDPRWDGVFDGIDIDWEYPNACGLTCDSSGPDAFRNLMAALRSTFGDELVTAA
I1015 15:26:54.168678 140335681165120 templates.py:277] Found an exact template match 4w5u_A.
W1015 15:26:54.185291 140335681165120 templates.py:131] Template structure not in release dates dict: 6jav
I1015 15:26:54.185494 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/6jav.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: FVQHAVAFLEKYGFDGLDLDWEYPKCWQVDCSKGPDSDKQGFADLVHELSAVLKPKGLLLSA
I1015 15:26:54.313122 140335681165120 templates.py:277] Found an exact template match 6jav_A.
W1015 15:26:54.328136 140335681165120 templates.py:131] Template structure not in release dates dict: 3cz8
I1015 15:26:54.328286 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/3cz8.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: NLVNNIYDLVSTRGYGGVTIDFEQVSAADRDLFTGFLRQLRDRLQAGGYVLTI
I1015 15:26:54.515573 140335681165120 templates.py:277] Found an exact template match 3cz8_A.
W1015 15:26:54.528684 140335681165120 templates.py:131] Template structure not in release dates dict: 3cz8
I1015 15:26:54.528850 140335681165120 templates.py:727] Reading PDB entry from /home/skems/alphafold-data/pdb_mmcif/mmcif_files/3cz8.cif. Query: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE, template: NLVNNIYDLVSTRGYGGVTIDFEQVSAADRDLFTGFLRQLRDRLQAGGYVLTI
I1015 15:26:54.793119 140335681165120 templates.py:277] Found an exact template match 3cz8_B.
I1015 15:26:54.814222 140335681165120 pipeline.py:200] Uniref90 MSA size: 2 sequences.
I1015 15:26:54.814362 140335681165120 pipeline.py:201] BFD MSA size: 1 sequences.
I1015 15:26:54.814434 140335681165120 pipeline.py:202] MGnify MSA size: 2 sequences.
I1015 15:26:54.814512 140335681165120 pipeline.py:203] Final (deduplicated) MSA size: 2 sequences.
I1015 15:26:54.814698 140335681165120 pipeline.py:205] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 20.
I1015 15:26:54.831695 140335681165120 run_alphafold.py:142] Running model model_1
2021-10-15 15:27:01.863537: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 27082496 exceeds 10% of free system memory.
I1015 15:27:01.886157 140335681165120 model.py:131] Running predict with shape(feat) = {'aatype': (4, 68), 'residue_index': (4, 68), 'seq_length': (4,), 'template_aatype': (4, 4, 68), 'template_all_atom_masks': (4, 4, 68, 37), 'template_all_atom_positions': (4, 4, 68, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 68), 'msa_mask': (4, 508, 68), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 68, 3), 'template_pseudo_beta_mask': (4, 4, 68), 'atom14_atom_exists': (4, 68, 14), 'residx_atom14_to_atom37': (4, 68, 14), 'residx_atom37_to_atom14': (4, 68, 37), 'atom37_atom_exists': (4, 68, 37), 'extra_msa': (4, 5120, 68), 'extra_msa_mask': (4, 5120, 68), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 68), 'true_msa': (4, 508, 68), 'extra_has_deletion': (4, 5120, 68), 'extra_deletion_value': (4, 5120, 68), 'msa_feat': (4, 508, 68, 49), 'target_feat': (4, 68, 22)}
2021-10-15 15:28:47.315059: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 67108864 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:28:47.315121: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 67108864
2021-10-15 15:28:47.361320: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 67108864 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:28:47.361367: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 67108864
2021-10-15 15:28:47.410037: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 67108864 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:28:47.410100: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 67108864
2021-10-15 15:28:47.458114: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 67108864 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:28:47.458176: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 67108864
2021-10-15 15:28:47.505443: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 67108864 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:28:47.505510: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 67108864
2021-10-15 15:28:57.509013: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 67108864 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:28:57.509074: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 67108864
2021-10-15 15:28:57.552074: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:794] failed to alloc 67108864 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-10-15 15:28:57.552113: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 67108864
2021-10-15 15:28:57.552127: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:457] Allocator (xla_gpu_host_bfc) ran out of memory trying to allocate 25.83MiB (rounded to 27082496)requested by op
2021-10-15 15:28:57.553166: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:468] ____________________________________________________________________________________________________
I1015 15:34:02.744741 140335681165120 model.py:139] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (68, 68, 64)}, 'experimentally_resolved': {'logits': (68, 37)}, 'masked_msa': {'logits': (508, 68, 23)}, 'predicted_lddt': {'logits': (68, 50)}, 'structure_module': {'final_atom_mask': (68, 37), 'final_atom_positions': (68, 37, 3)}, 'plddt': (68,)}
I1015 15:34:02.744922 140335681165120 run_alphafold.py:152] Total JAX model model_1 predict time (includes compilation time, see --benchmark): 421?
I1015 15:34:06.321281 140335681165120 amber_minimize.py:176] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 67 (GLU) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I1015 15:34:06.689771 140335681165120 amber_minimize.py:404] Minimizing protein, attempt 1 of 100.
I1015 15:34:06.982795 140335681165120 amber_minimize.py:68] Restraining 574 / 1170 particles.
I1015 15:34:11.955974 140335681165120 amber_minimize.py:176] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I1015 15:34:16.077919 140335681165120 amber_minimize.py:490] Iteration completed: Einit 1505.16 Efinal -2093.68 Time 4.63 s num residue violations 0 num residue exclusions 0
I1015 15:34:17.312715 140335681165120 amber_minimize.py:176] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 67 (GLU) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I1015 15:34:17.462030 140335681165120 run_alphafold.py:208] Final timings for query: {'features': 1466.683573961258, 'process_features_model_1': 7.053988933563232, 'predict_and_compile_model_1': 420.85899019241333, 'relax_model_1': 13.598760843276978}

johnnytam100 commented Feb 26, 2022

I ran into the same issue; my solution was to comment out the following two lines in run_docker.py:

'TF_FORCE_UNIFIED_MEMORY': '1',
'XLA_PYTHON_CLIENT_MEM_FRACTION': '4.0',
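Roughly, the relevant section of run_docker.py looks like the sketch below. The surrounding structure and variable names are illustrative and may differ between AlphaFold versions; only the two commented-out entries are the ones meant here.

```python
# Sketch of the environment-variable block in docker/run_docker.py
# (surrounding entries are illustrative; script versions differ).
environment_vars = {
    'NVIDIA_VISIBLE_DEVICES': 'all',
    # Commenting out these two lines disables unified (GPU + host) memory,
    # which is what worked around the WSL CUDA_ERROR_OUT_OF_MEMORY here:
    # 'TF_FORCE_UNIFIED_MEMORY': '1',
    # 'XLA_PYTHON_CLIENT_MEM_FRACTION': '4.0',
}
```

With both lines commented out, the model only uses the GPU's own memory, so predictions that need more than the available VRAM will still fail (as noted later in this thread).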

andycowie (Collaborator) commented:

@Ozcifci The memory issues you are seeing with hhblits are expected; generally we recommend running full_dbs only if you have 32 GB of RAM or more.

You could try running with --db_preset=reduced_dbs, which reduces the required memory; however, you may find that you still do not have enough RAM for the GPU computation. An example invocation is sketched below.
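For example, reusing the command from the original report (exact flags may vary between AlphaFold versions):

```
python3 docker/run_docker.py \
  --fasta_paths=T1050.fasta \
  --max_template_date=2020-05-14 \
  --db_preset=reduced_dbs
```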

Ozcifci commented Feb 28, 2022

@andycowie I have upgraded to 64 GB of RAM and still got the same issue with WSL.
I found that Windows does not support unified memory, and therefore neither does WSL.
In the end, I decided to install a native Linux system, and it worked without any issues.

@johnnytam100 Thanks for the info. However, I no longer have a WSL AlphaFold setup to test it on. I hope it helps others.

johnnytam100 commented:

@Ozcifci The limitation on using unified memory with CUDA on WSL is documented at https://docs.nvidia.com/cuda/wsl-user-guide/index.html, so commenting out the use of unified memory worked for me. However, it may be problematic with larger proteins, which remains to be tested.

Ikajiro commented Mar 14, 2022

@johnnytam100 Hi, I had the same issue as described above. Following your comment, I commented out the two lines in run_docker.py. It worked for small proteins, which require less memory than the GPU has, but for a large complex it returned RuntimeError: Resource exhausted: Out of memory while trying to allocate 26158667192 bytes. As you pointed out, without unified memory the machine cannot accommodate predictions larger than the GPU memory and simply stops.
So I am going to dual-boot Windows 11 and Ubuntu, keeping the latter on a USB flash drive. I believe this is the best solution at the moment.

Augustin-Zidek (Collaborator) commented:

Fixed in v2.3.0 and v2.3.1.

Ozcifci commented Jul 4, 2023

@Augustin-Zidek After you said this issue was fixed in the newest versions, I gave AlphaFold on WSL another chance.
However, I get exactly the same issue: failed to alloc 34357641216 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory.
Does your fix only work for proteins small enough to be handled by the GPU's VRAM?

Pipiiido commented Oct 30, 2023

@Ozcifci On v2.3.0, after I commented out

'TF_FORCE_UNIFIED_MEMORY': '1',
'XLA_PYTHON_CLIENT_MEM_FRACTION': '4.0',

it worked perfectly fine. Even with a big complex, I can see it using not just my 3090's 24 GB of GPU memory but also my 64 GB of RAM under WSL. I never tried older versions, but at least I think that, after commenting those lines out, WSL is able to use both GPU memory and RAM for the run.
