Fast data loading feedback (`--load_fast=true`; “RustBoard”) #4784
Summary: We’d like to set `--load_fast=auto` as the default for TensorBoard 2.5. To make that less surprising, we now print an informational message when `--load_fast` is set to `auto` and the data server is actually used. We don’t show it with `--load_fast=true`; if you pass that, we assume that you know what you’re doing. The message looks like:

```
$ tensorboard --logdir /tmp/logs --bind_all --load_fast=auto
2021-03-17 11:41:51.151546: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-03-17 11:41:51.151567: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
NOTE: Using experimental fast data loading logic. To disable, pass "--load_fast=false" and report issues on GitHub. More details: #4784
TensorBoard 2.5.0a0 at http://localhost:6007/ (Press CTRL+C to quit)
```

Test Plan: Run with `--load_fast` set to `false`, `auto`, and `true`, and note that the message only appears when set to `auto`. Then uninstall the data server and run with `auto`, and note that the message does not appear.

wchargin-branch: cli-data-server-message
wchargin-source: ff24dc84b7b225b5351295c45d106f136933997a
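A minimal sketch of that test plan as shell commands (the logdir path is a placeholder, and uninstalling the `tensorboard-data-server` pip package is my assumption for what “uninstall the data server” means):

```sh
# Expect the NOTE only in the middle invocation (auto + data server installed).
tensorboard --logdir /tmp/logs --load_fast=false
tensorboard --logdir /tmp/logs --load_fast=auto
tensorboard --logdir /tmp/logs --load_fast=true

# Remove the data server, then confirm the NOTE no longer appears with auto.
pip uninstall -y tensorboard-data-server
tensorboard --logdir /tmp/logs --load_fast=auto
```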
Hello! Very much interested in this, as we currently maintain a custom entrypoint to make TensorBoard work at all with our data sizes. Unfortunately, I can't get this to work anywhere. Using the latest nightly Docker image I get the following error:
Presumably it tries to bind some port that's already in use by another process; unfortunately it doesn't say which one. Also, it doesn't seem to work with …
@tgolsson: Hi; thank you for your feedback! I hadn’t looked into Docker …
Edit: Fixed in #4804; confirmed fix in Docker nightlies.
Yep. As of #4794, if you use …
This is super helpful feedback; thank you.
With tensorboard-plugin-profile (2.4.0) installed, I'm getting errors in the log:
(They disappear with `--load_fast=false`.)
Hi @brychcy—thanks! Yes, this is true. The profile plugin uses …
Added a note to the “Known issues” section; thank you!
@brychcy: I’ve sent the profiler folks a patch: … Their build appears to be pretty broken, so I’m not sure how long it …
@wchargin Not quite feedback, but I'm wondering if there are any thoughts on multi-directory Rustboard (…)?
@tgolsson: Good question! I was thinking of instead supporting a more …
That is, you could add or remove log directories at runtime without …
Opened #4923 to track this, and would be happy to hear your thoughts.
I am getting a lot of warnings about too many open files -- is there a way to reduce or cap the number of open file descriptors?
I don't have that many runs (~2000), so it shouldn't really be an issue. Using `lsof` to count the number of open FDs shows over 12k being used, compared to <500 in "slow" mode.
In my case, the "slow" mode actually loads files faster since it doesn't run into this issue.
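One knob worth checking here is the per-process open-file limit; a minimal sketch with `ulimit` (the 65536 value is an arbitrary example, and whether a higher limit actually silences these warnings is an assumption):

```sh
# Show the current soft limit on open file descriptors.
ulimit -n

# Raise it for the current shell before launching TensorBoard.
ulimit -n 65536
tensorboard --logdir /path/to/logs --load_fast=true

# Count FDs held by a running TensorBoard process (replace <PID>).
lsof -p <PID> | wc -l
```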
Using … It works fine if I set …
Fast data loading may be causing issues with the profiler: tensorflow/profiler#344 (one of several issues mentioning this problem recently). A possible solution for now is to switch it off with `--load_fast=false`.
Update: try the latest Profiler plugin v2.5 (`tensorboard_plugin_profile` 2.5.0).
You're welcome, happy to help!
Anyone else landing here because they're following instructions from this link regarding using TensorBoard in AzureML?
Closing, as the issue has been resolved after I released tensorboard_plugin_profile 2.5.0.
Ah, we would like to keep this issue open to solicit more feedback on the feature. Reopening.
Hi, would you mind sharing a bit more information? I might be able to help, but I would need to know how to reproduce your issue. (I am replying here because I contributed to a similar issue in the past, but of course it is up to the repo owners to make the decision.) Thank you!
We have a fairly exotic setup, but you might be able to reproduce it by creating a GCE VM with a custom service account that has GCS permissions, then running …
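A rough sketch of that reproduction setup (all names are placeholders; the final invocation was elided above, so pointing TensorBoard at a private GCS logdir with fast loading is my assumption):

```sh
# Create a GCE VM that runs as a custom service account with read access to GCS.
gcloud compute instances create tb-repro \
  --service-account=my-sa@my-project.iam.gserviceaccount.com \
  --scopes=storage-ro

# SSH in, install TensorBoard, and point it at a private GCS logdir.
gcloud compute ssh tb-repro
pip install -U tb-nightly
tensorboard --logdir gs://my-private-bucket/logs --load_fast=true
```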
@samos123 I assume you meant GKE... The error should not be there, as I thought I fixed that. Could you please check which server version you are using? I guess something like …
GKE + Workload Identity would use a similar mechanism, and I would expect to have the same issue. We were using 2.8.0. Could you share the code where the authentication happens with …?
@samos123 sorry for the confusion, I didn't know that the GCE abbreviation exists. The PR was #5939; in particular, it gets a GCP access token using the …
On my Ubuntu 20.04.6 LTS NVIDIA A100 DGX, I cannot get fast loading to work:
That is all that I get.
Does not change it.
Important UPDATE after #6578: As of recent TensorBoard releases, on Ubuntu 20.04 Linux machines where the glibc version is 2.31, the rustboard server will fail to launch, trying to find glibc 2.32–2.34. Ubuntu 22.04 will be fine, as it ships with GLIBC 2.35.

Workaround: On Ubuntu 20.04, …

FYI, how to figure out the GLIBC version on the system:

    $ ldd --version | grep GLIB
    ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31
    $ cat /etc/lsb-release | grep DESCRIPTION
    DISTRIB_DESCRIPTION="Ubuntu 20.04.5 LTS"

Verifying that the installed data-server binary indeed requires GLIBC 2.34 symbols:

    $ objdump -T $(python -c "from tensorboard_data_server import server_binary; print(server_binary())") | grep GLIBC
    ...
    0000000000000000  DF *UND*  0000000000000000  GLIBC_2.34 pthread_create
    0000000000000000  DF *UND*  0000000000000000  GLIBC_2.34 __libc_start_main

I'd like to kindly ask the TensorBoard team to lower the GLIBC requirement in future releases. I will open an issue if needed. -> #6578
@wookayin: Thanks for flagging. Yes, please open a new issue!
Hello, I'm very excited for this feature, as TensorBoard's speed has been a big pain point so far. BUT when I try to use it, it tells me it's not supported on macOS:
You say it is supported on macOS though, so what's going on here? I've got MacBookPro17,1; Apple M1 chip; macOS Ventura, version 13.4.1; tb-nightly version 2.15.0a20231013; tf-nightly-macos version 2.16.0.dev20231013.
P.S. I've gotten the same results using non-nightly tensorflow-macos & no TensorFlow at all. Also, I followed your instructions exactly to uninstall tensorboard & tb_nightly before reinstalling tb_nightly.
@profPlum: Hazarding a guess:
That's probably your problem. The …
If interested, you can build it yourself easily. I just tested it on my …
If you want to double-check that it's using the data server, you can …
(I don't currently work on TensorBoard, so consider this not an official …)
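For anyone curious, a sketch of building the data server yourself. This assumes the Rust crate still lives under `tensorboard/data/server` in the TensorBoard repo, that the built binary is named `rustboard`, and that the `TENSORBOARD_DATA_SERVER_BINARY` environment variable is how TensorBoard is pointed at a custom binary (an environment variable is mentioned later in this thread, but its exact name is my assumption; verify all of this against the current source):

```sh
# Build the Rust data server from source (crate location is an assumption).
git clone https://github.com/tensorflow/tensorboard
cd tensorboard/tensorboard/data/server
cargo build --release

# Point TensorBoard at the locally built binary (variable and binary names
# are assumptions), then launch with fast loading enabled.
export TENSORBOARD_DATA_SERVER_BINARY="$PWD/target/release/rustboard"
tensorboard --logdir ~/my_logs --load_fast=true
```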
@wchargin Thanks, I appreciate the help! (& I'll let you know if it works)
@wchargin Hi again, I tried your instructions verbatim and it says roughly the same:
But to clarify: did you want me to launch the original (pip) tensorboard again? That point confused me; it is what I did, but I'm not sure if it's what you meant.
P.S. With a fresh install of tb_nightly==2.15.0a20231019 & cargo version 1.73.0 (9c4383fb5 2023-08-26). Also, I got the same results in a Linux Docker container.
@wchargin Hi, I got an issue when running this: It shows a Google interface with: … Could you please guide me with this? Thanks
@Frn1nd0, your issue is unrelated to fast data loading. Instead, you have run into a recent regression in compatibility with Chrome. The Colab team have been investigating. We expect them to keep us updated at the following issue:
@bmd3k Thanks for the clarification, appreciate that! Hope they can fix this soon.
Update: #6578 is fixed; as of TensorBoard 2.15, the GLIBC minimum requirement is 2.29 (compatible with Ubuntu 20.04).
@profPlum: Oops, sorry, I wrote the environment variable wrong: it …
Yes.
Is this still a bug in recent versions? I can repro with version 2.11.2.
I would like to share my experience about how to solve the problem of …
My OS is …
It runs successfully, but when I run the following Python code, it outputs …:

    import tensorboard_data_server
    res = tensorboard_data_server.server_binary()
    print(res)

It seems that the binary of tensorboard-data-server is not installed properly.

    conda install tensorboard
    conda install chardet

After installation, I run the above Python code again, and it successfully outputs the path to the binary of tensorboard-data-server, like …
Finally, I can run …
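If you are on pip rather than conda, a hedged equivalent is to force-reinstall the `tensorboard-data-server` wheel and re-run the same check (whether this fixes the missing binary in every environment is an assumption):

```sh
pip install --force-reinstall tensorboard-data-server

# Should print a path to the data server binary rather than failing.
python -c "import tensorboard_data_server; print(tensorboard_data_server.server_binary())"
```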
2024-04-16 13:41:07.757664: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
It is not usable with Google gcsfuse. See #6790.
This thread is for tracking feedback about TensorBoard’s experimental `--load_fast=true` mode for fast data loading. Typical speedups range from 100× to 400×.
Who should try this: Anyone who’s found TensorBoard’s data loading to be slower than they’d like.
Who shouldn’t try this: Windows users (for now).
Feedback: Feedback form, or reply on this thread.
Try it out
To try this out, please uninstall all copies of TensorBoard and then install the latest version of `tb-nightly`. Then, invoke TensorBoard with the `--load_fast=true` flag. Use TensorBoard as you usually would. It should work the same way, just faster.
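A minimal sketch of those steps with pip (the original post's exact commands are not reproduced here; the logdir path is a placeholder):

```sh
# Remove any existing TensorBoard installs, then grab the nightly build.
pip uninstall -y tensorboard tb-nightly
pip install -U tb-nightly

# Launch with the fast data loading path enabled.
tensorboard --logdir /path/to/logs --load_fast=true
```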
Feedback
You can respond to this anonymous Google Form, or reply on this
thread, or open a new issue. Let us know: did it work? how much faster
was it? any suggestions or requests?
Known issues
We know about these, but please let us know if they matter for you, so
that we can prioritize working on them:
- Some plugins that load data through their own mechanisms do not yet work in this mode (e.g., the profile plugin).
FAQ
What does “data loading” include?
It includes time spent reading files in your logdir. It does not include
time spent painting charts on the frontend.
What is the `--load_fast` flag?
Pass `--load_fast=true` to tell TensorBoard to use a new data loading mechanism, which is generally hundreds of times faster.
Is `--load_fast=true` right for me?
Currently, this mode is supported on Linux and macOS. If you are
interested in using it on other platforms, ping @wchargin and I’ll show
you how to build it.
Most features of TensorBoard are expected to work with the new data
loading mechanism. All standard TensorBoard dashboards (scalars, images,
etc.) should work, and flags like `--reload_interval` should work, too.
You can use logdirs on local disk or on GCS buckets (public or private).
Do I need to have TensorFlow installed?
No.
What’s happening under the hood?
Instead of crawling your logdir in a mixture of Python and C++ code with
a lot of locking, cross-language marshalling, and slow data manipulation
in Python, we read the data in a dedicated subprocess. This program is
written in Rust and is optimized for concurrent reading and serving.
More design details here.
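One rough way to see the dedicated subprocess for yourself (the grep pattern assumes the server binary ships inside the `tensorboard_data_server` Python package, which may not match every install):

```sh
# With TensorBoard running under --load_fast=true, look for the Rust
# data server child process.
ps aux | grep -i tensorboard_data_server | grep -v grep
```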