I am currently running AlphaFold 3 (implemented in JAX), and I want to profile the whole project to find the bottleneck in it. Following the official documentation (https://jax.readthedocs.io/en/latest/profiling.html), I wrote code to trace the run, and it has worked fine since then.
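Roughly, it wraps inference in jax.profiler.trace, along the lines of this sketch (run_inference and fold_input are hypothetical stand-ins here, not the real AlphaFold 3 entry points):

import jax

def run_inference(x):
    # Hypothetical stand-in for the actual AlphaFold 3 inference call.
    return x @ x

fold_input = jax.random.normal(jax.random.key(0), (4096, 4096))

# The trace directory is what TensorBoard is pointed at later.
with jax.profiler.trace('/tmp/alphafold-run/jax-trace'):
    result = run_inference(fold_input)
    # Block so asynchronously dispatched device work lands inside the trace.
    jax.block_until_ready(result)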
However, when I used TensorBoard to visualize it, nothing showed up at all, unlike in the official documentation.
The running output is:
bash experiments/run_test.sh
2025-01-14 20:26:36.691284: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1736857596.708528 370661 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1736857596.713608 370661 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
I0114 20:26:41.098771 139841220543104 folding_input.py:1044] Detected /home/user3/alphafold/data/processed/301aa_3DB6.json is an AlphaFold 3 JSON since the top-level is not a list.
Running AlphaFold 3. Please note that standard AlphaFold 3 model parameters are
only available under terms of use provided at
https://github.com/google-deepmind/alphafold3/blob/main/WEIGHTS_TERMS_OF_USE.md.
If you do not agree to these terms and are using AlphaFold 3 derived model
parameters, cancel execution of AlphaFold 3 inference with CTRL-C, and do not
use the model parameters.
Skipping running the data pipeline.
Found local devices: [CudaDevice(id=0), CudaDevice(id=1)]
Building model from scratch...
Processing 1 fold inputs.
Processing fold input 301aa_3DB6
Checking we can load the model parameters...
2025-01-14 20:26:41.192283: W external/xla/xla/service/gpu/nvptx_compiler.cc:930] The NVIDIA driver's CUDA version is 12.2 which is older than the PTX compiler version 12.6.77. Because the driver is older than the PTX compiler version, XLA is disabling parallel compilation, which may slow down compilation. You should update your NVIDIA driver or use the NVIDIA-provided CUDA forward compatibility packages.
Skipping data pipeline...
Output directory: /home/user3/alphafold/workspaces/jyj/output/gpu/baseline/301aa_3db6
Writing model input JSON to /home/user3/alphafold/workspaces/jyj/output/gpu/baseline/301aa_3db6
Predicting 3D structure for 301aa_3DB6 for seed(s) (1,)...
Featurising data for seeds (1,)...
Featurising 301aa_3DB6 with rng_seed 1.
I0114 20:26:50.429166 139841220543104 pipeline.py:160] processing 301aa_3DB6, random_seed=1
Featurising 301aa_3DB6 with rng_seed 1 took 7.66 seconds.
Featurising data for seeds (1,) took 11.94 seconds.
Running model inference for seed 1...
Running model inference for seed 1 took 77.58 seconds.
Extracting output structures (one per sample) for seed 1...
/home/user3/alphafold/workspaces/jyj/alphafold3-3.0.0/src/alphafold3/model/confidences.py:332: RuntimeWarning: invalid value encountered in divide
return np.nanmean(value * mask_with_nan, axis=axis) / np.nanmean(
/home/user3/alphafold/workspaces/jyj/alphafold3-3.0.0/src/alphafold3/model/confidences.py:522: RuntimeWarning: Mean of empty slice
xchain = np.nanmean(
/home/user3/alphafold/workspaces/jyj/alphafold3-3.0.0/src/alphafold3/model/confidences.py:548: RuntimeWarning: Mean of empty slice
return np.nanmean(np.stack([xchain_row_agg, xchain_col_agg], axis=0), axis=0)
Extracting output structures (one per sample) for seed 1 took 0.38 seconds.
Running model inference and extracting output structures for seed 1 took 77.96 seconds.
Running model inference and extracting output structures for seeds (1,) took 77.96 seconds.
Writing outputs for 301aa_3DB6 for seed(s) (1,)...
Done processing fold input 301aa_3DB6.
Done processing 1 fold inputs.
When loading the profile data in TensorBoard, the output is:
tensorboard --logdir /tmp/alphafold-run/jax-trace/
2025-01-14 19:28:54.258701: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-01-14 19:28:54.275882: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1736854134.297137 353978 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1736854134.305643 353978 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-14 19:28:54.323284: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
W0114 19:28:59.056233 139871098802816 server_ingester.py:187] Failed to communicate with data server at localhost:38081: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:59.79.4.198:41595: Endpoint is neither UDS or TCP loopback address."
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2025-01-14T19:28:59.055938619+08:00", grpc_status:14, grpc_message:"failed to connect to all addresses; last error: UNKNOWN: ipv4:59.79.4.198:41595: Endpoint is neither UDS or TCP loopback address."}"
>
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.18.0 at http://localhost:6006/ (Press CTRL+C to quit)
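One thing I notice in that error: 59.79.4.198 is not a loopback address, so maybe a proxy environment variable is redirecting TensorBoard's local gRPC connection to its data server? Just a guess on my part; this is how I checked what is set:

import os

# Print any proxy settings that gRPC might pick up; a non-empty value
# could explain why localhost traffic ends up routed to 59.79.4.198.
for var in ('http_proxy', 'https_proxy', 'grpc_proxy', 'no_proxy'):
    print(var, '=', os.environ.get(var))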
In TensorBoard it turns out like this:
As shown in the output, the inference time is 77 s and the preprocessing time is 7 + 12 s; however, in the figure above, the trace is a lot shorter...
In the op_profile tab, there is no FLOPS information at all:
The kernel stats seem to be fine, but I am not sure.
I really don't know what happened.
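In case it is useful, one thing I am considering is labelling the phases with jax.profiler.TraceAnnotation so they show up as named regions in the trace viewer. A sketch with hypothetical stand-in functions (not the real AlphaFold 3 code):

import jax

# Hypothetical stand-ins for the real featurisation / inference steps.
def featurise(x):
    return x * 2.0

def run_model(feats):
    return feats @ feats.T

fold_input = jax.random.normal(jax.random.key(1), (2048, 2048))

with jax.profiler.trace('/tmp/alphafold-run/jax-trace'):
    with jax.profiler.TraceAnnotation('featurisation'):
        feats = featurise(fold_input)
    with jax.profiler.TraceAnnotation('inference'):
        out = run_model(feats)
        jax.block_until_ready(out)  # keep the device work inside the trace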
For further validation, I also tested the example given in the JAX docs:
import jax

with jax.profiler.trace('/tmp/jax-trace'):
    # Run the operations to be profiled
    key = jax.random.key(0)
    x = jax.random.normal(key, (5000, 5000))
    y = x @ x
    y.block_until_ready()
Its running output is:
python ../test/test-tensorboard.py
2025-01-14 19:32:25.844744: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1736854345.862460 355628 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1736854345.867625 355628 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-14 19:32:27.389759: W external/xla/xla/service/gpu/nvptx_compiler.cc:930] The NVIDIA driver's CUDA version is 12.2 which is older than the PTX compiler version 12.6.77. Because the driver is older than the PTX compiler version, XLA is disabling parallel compilation, which may slow down compilation. You should update your NVIDIA driver or use the NVIDIA-provided CUDA forward compatibility packages.
Am I guessing right that I forgot the y.block_until_ready() part somewhere?
I am a rookie and don't know much about this, so I really don't know whether I am doing this right or not.
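From the asynchronous dispatch page in the JAX docs, I understand that operations return before the device finishes, which would explain a trace that looks shorter than the wall-clock time. A small timing sketch of my own to illustrate:

import time
import jax

key = jax.random.key(0)
x = jax.random.normal(key, (5000, 5000))
x.block_until_ready()  # make sure the input itself is ready first

t0 = time.perf_counter()
y = x @ x              # returns almost immediately: dispatch is asynchronous
t1 = time.perf_counter()
y.block_until_ready()  # wait for the device computation to actually finish
t2 = time.perf_counter()
print(f'dispatch: {t1 - t0:.4f} s, compute: {t2 - t1:.4f} s')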