
Fast data loading feedback (--load_fast=true; “RustBoard”) #4784

Open
wchargin opened this issue Mar 16, 2021 · 54 comments
Labels
core:rustboard //tensorboard/data/server/... type:support

Comments

@wchargin
Contributor

wchargin commented Mar 16, 2021

This thread is for tracking feedback about TensorBoard’s experimental
mode for fast data loading. Typical speedups range from 100× to 400×.

Who should try this: Anyone who’s found TensorBoard’s data loading
to be slower than they’d like.

Who shouldn’t try this: Windows users (for now).

Feedback: Feedback form, or reply on this thread.

Try it out

To try this out, please uninstall all copies of TensorBoard and then
install the latest version of tb-nightly:

pip uninstall -y tensorboard tb-nightly &&
pip install tb-nightly  # must have at least tb-nightly==2.5.0a20210316

Then, invoke TensorBoard with the --load_fast=true flag:

tensorboard --logdir /path/to/logs --load_fast true

Use TensorBoard as you usually would. It should work the same way, just
faster.

Feedback

You can respond to this anonymous Google Form, or reply on this
thread, or open a new issue. Let us know: did it work? how much faster
was it? any suggestions or requests?

Known issues

We know about these, but please let us know if they matter for you, so
that we can prioritize working on them:

  • Windows is not supported out of the box.
  • Some third-party plugins may need to be updated to work with this
    mode (e.g., the profile plugin).

FAQ

What does “data loading” include?

It includes time spent reading files in your logdir. It does not include
time spent painting charts on the frontend.

What is the --load_fast flag?

Pass --load_fast=true to tell TensorBoard to use a new data loading
mechanism, which is generally hundreds of times faster.

Is --load_fast=true right for me?

Currently, this mode is supported on Linux and macOS. If you are
interested in using it on other platforms, ping @wchargin and I’ll show
you how to build it.

Most features of TensorBoard are expected to work with the new data
loading mechanism. All standard TensorBoard dashboards (scalars, images,
etc.) should work, and flags like --reload_interval should work, too.
You can use logdirs on local disk or on GCS buckets (public or private).

Do I need to have TensorFlow installed?

No.

What’s happening under the hood?

Instead of crawling your logdir in a mixture of Python and C++ code with
a lot of locking, cross-language marshalling, and slow data manipulation
in Python, we read the data in a dedicated subprocess. This program is
written in Rust and is optimized for concurrent reading and serving.
More design details here.

@wchargin wchargin added type:support core:rustboard //tensorboard/data/server/... labels Mar 16, 2021
wchargin added a commit that referenced this issue Mar 17, 2021
Summary:
We’d like to set `--load_fast=auto` as the default for TensorBoard 2.5.
To make that less surprising, we now print an informational message when
`--load_fast` is set to `auto` and the data server is actually used. We
don’t show it with `--load_fast=true`; if you pass that, we assume that
you know what you’re doing. The message looks like:

```
$ tensorboard --logdir /tmp/logs --bind_all --load_fast=auto
2021-03-17 11:41:51.151546: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-03-17 11:41:51.151567: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

NOTE: Using experimental fast data loading logic. To disable, pass
    "--load_fast=false" and report issues on GitHub. More details:
    #4784

TensorBoard 2.5.0a0 at http://localhost:6007/ (Press CTRL+C to quit)
```

Test Plan:
Run with `--load_fast` set to `false`, `auto`, and `true`, and note that
the message only appears when set to `auto`. Then uninstall the data
server and run with `auto`, and note that the message does not appear.

wchargin-branch: cli-data-server-message
wchargin-source: ff24dc84b7b225b5351295c45d106f136933997a
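
As a reading aid, here is a small Python sketch of the flag semantics this commit describes, combined with the --load_fast=auto fallback for unsupported invocations mentioned further down the thread. The names Loader and choose_loader are hypothetical. This is not TensorBoard's actual code, and treating an unusable data server under --load_fast=true as an error is an assumption based on the error reports in this thread.

```python
from enum import Enum


class Loader(Enum):
    MULTIPLEXER = "multiplexer"  # original Python/C++ loading path
    DATA_SERVER = "data_server"  # Rust data server subprocess


def choose_loader(load_fast, server_available, invocation_supported):
    """Hypothetical model of --load_fast=false|auto|true.

    Returns (loader, show_note); show_note says whether to print the
    "NOTE: Using experimental fast data loading logic" message.
    """
    usable = server_available and invocation_supported
    if load_fast == "false":
        return Loader.MULTIPLEXER, False
    if load_fast == "auto":
        # Fall back silently; print the NOTE only when the server is used.
        if usable:
            return Loader.DATA_SERVER, True
        return Loader.MULTIPLEXER, False
    if load_fast == "true":
        if not usable:
            # Assumption: an explicit "true" fails instead of falling back.
            raise RuntimeError("--load_fast=true requested, but the data server cannot be used")
        # No NOTE: passing "true" means you know what you're doing.
        return Loader.DATA_SERVER, False
    raise ValueError(f"unknown --load_fast value: {load_fast!r}")
```

This mirrors the test plan above: the message appears only for auto, and only when the data server is actually used.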
@tgolsson

Hello!

Very much interested in this, as we currently maintain a custom entrypoint to make Tensorboard work at all with our data sizes. Unfortunately, I can't get this to work anywhere. Using the latest nightly docker image I get the following error:

root@15bc33cc211f:/# tensorboard --logdir foobar --load_fast=true
Error: Os { code: 99, kind: AddrNotAvailable, message: "Cannot assign requested address" }
Traceback (most recent call last):
  File "/usr/local/bin/tensorboard", line 8, in <module>
    sys.exit(run_main())
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/main.py", line 46, in run_main
    app.run(tensorboard.main, flags_parser=tensorboard.configure)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/program.py", line 267, in main
    return runner(self.flags) or 0
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/program.py", line 283, in _run_serve_subcommand
    server = self._make_server()
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/program.py", line 433, in _make_server
    (data_provider, deprecated_multiplexer) = self._make_data_provider()
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/program.py", line 425, in _make_data_provider
    ingester.start()
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/data/server_ingester.py", line 150, in start
    % popen.poll()
RuntimeError: Data server exited with 1; check stderr for details

Presumably it tries to bind some port that's already in use by another process; unfortunately it doesn't say which one.

Also, it doesn't seem to work with logdir_spec, only logdir. This isn't a huge pain, but the error message just states that I didn't pass logdir -- it should probably explicitly state that load_fast and logdir_spec are incompatible.

@wchargin
Contributor Author

wchargin commented Mar 19, 2021

@tgolsson: Hi; thank you for your feedback! I hadn’t looked into Docker
at all. We bind to port 0, which asks the OS for an arbitrary free port,
so it looks like it’s not a port issue but an IPv6 host issue. I’ve
filed #4801 and will take a look. I’ve posted therein what I think
should be a workaround, in case you’re interested in that sort of thing.
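
For anyone unfamiliar with the port-0 trick: binding to port 0 lets the OS pick any free port, so an error like Os { code: 99, kind: AddrNotAvailable } (errno EADDRNOTAVAIL) points at the host address rather than the port. Below is a minimal Python sketch of the idea, assuming you want to try IPv6 loopback (::1) first and fall back to IPv4 loopback (127.0.0.1) when IPv6 is unavailable, as can happen in Docker containers. It illustrates the concept only; it is not the Rust fix from #4804, and bind_ephemeral is a hypothetical name.

```python
import errno
import socket


def bind_ephemeral():
    """Hypothetical helper: listen on loopback at an OS-chosen free port.

    Tries IPv6 loopback first, then IPv4 loopback.
    Returns (socket, host, port).
    """
    for family, host in ((socket.AF_INET6, "::1"), (socket.AF_INET, "127.0.0.1")):
        try:
            sock = socket.socket(family, socket.SOCK_STREAM)
        except OSError:
            continue  # address family not supported by this kernel at all
        try:
            sock.bind((host, 0))  # port 0: "give me any free port"
        except OSError as e:
            sock.close()
            # EADDRNOTAVAIL is errno 99 on Linux, as in the report above.
            if e.errno in (errno.EADDRNOTAVAIL, errno.EAFNOSUPPORT):
                continue
            raise
        sock.listen()
        # getsockname() returns a 2-tuple (IPv4) or 4-tuple (IPv6); the port is index 1.
        return sock, host, sock.getsockname()[1]
    raise OSError("no loopback address available")
```

The caller would then pass the chosen port on to whatever needs to connect, which is why a fixed port number never needs to be configured.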

edit: Fixed in #4804; confirmed fix in Docker nightlies.

Also, it doesn't seem to work with logdir_spec, only logdir. This isn't a huge pain, but the error message just states that I didn't pass logdir -- it should probably explicitly state that load_fast and logdir_spec are incompatible.

Yep. As of #4794, if you use --load_fast=auto, we’ll automatically
detect unsupported invocations (including --logdir_spec) and fall back
to the old codepaths. I can also try to make the error more explicit
particularly for --logdir_spec. Filed #4802.

This is super helpful feedback; thank you.

@brychcy

brychcy commented Apr 8, 2021

With tensorboard-plugin-profile (2.4.0) installed, I'm getting errors in the log:

Exception in thread DynamicProfilePluginIsActiveThread:
Traceback (most recent call last):
  File "/Users/till/homebrew2/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/Users/till/homebrew2/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/till/tfnightly-py3.8/lib/python3.8/site-packages/tensorboard_plugin_profile/profile_plugin.py", line 311, in compute_is_active
    self._is_active = any(self.generate_run_to_tools())
  File "/Users/till/tfnightly-py3.8/lib/python3.8/site-packages/tensorboard_plugin_profile/profile_plugin.py", line 693, in generate_run_to_tools
    plugin_assets = self.multiplexer.PluginAssets(PLUGIN_NAME)
AttributeError: 'NoneType' object has no attribute 'PluginAssets'

(They disappear with --load_fast=false)

@wchargin
Contributor Author

wchargin commented Apr 8, 2021

Hi @brychcy—thanks! Yes, this is true. The profile plugin uses
non-standard approaches to load its data and so won’t work out of the
box with --load_fast. I’ll see if we can get it to work, but in the
meantime you’ll have to either pass --load_fast=false (if you want to
use the profile plugin) or uninstall the profile plugin package (if you
don’t care about it and want to silence the errors).

Added a note to the “Known issues” section; thank you!

@wchargin
Contributor Author

wchargin commented Apr 8, 2021

@brychcy: I’ve sent the profiler folks a patch:
tensorflow/profiler#298

Their build appears to be pretty broken, so I’m not sure how long it
will take them to integrate this and push a release.

@tgolsson

@wchargin Not quite feedback, but I'm wondering if there are any thoughts on multi-directory Rustboard (--logdir dir_a,dir_b in the old syntax)? I started doing the work but figured I'd ask in case it was intentionally removed or there's a WIP somewhere I'm not seeing.

@wchargin
Contributor Author

@tgolsson: Good question! I was thinking of instead supporting a more
general mechanism that also resolves requests like #1708. Imagine
something like:

$ tensorboard daemon start
$ tensorboard daemon add dir_a
$ tensorboard --daemon --bind_all
$ tensorboard daemon add dir_b

That is, you could add or remove log directories at runtime without
having to relaunch TensorBoard or discard existing loading progress,
and also in a way that naturally supports remote filesystems and doesn't
require setting up symlink trees.

Opened #4923 to track this, and would be happy to hear your thoughts.

@Raphtor

Raphtor commented May 11, 2021

I am getting a lot of warnings about too many open files -- is there a way to reduce or cap the number of open file descriptors?

2021-05-11T14:31:46Z WARN rustboard_core::run] Failed to open event file EventFileBuf("[RUN NAME]"): Os { code: 24, kind: Other, message: "Too many open files" }

I don't have that many runs (~2000), so it shouldn't really be an issue. Using lsof to count the number of open FDs shows over 12k being used...

>> lsof | awk '{print $1}' | sort | uniq -c | sort -r -n | head
   6210 tokio-run
   6210 Reloader-
   1035 StdinWatc
   1035 server
   1035 Reloader
    184 gmain
    168 gdbus
    134 grpc_glob
     85 bash
     80 snapd

Compared to <500 in "slow" mode.

>> lsof | awk '{print $1}' | sort | uniq -c | sort -r -n | head
    427 tensorboa
    184 gmain
    168 gdbus
     85 bash
     80 snapd
     72 systemd
     71 screen
     52 dconf\x20
     51 dbus-daem
     48 llvmpipe-

In my case, the "slow" mode actually loads files faster since it doesn't run into this issue.
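
A note on the numbers above: the lsof | awk pipeline counts rows per command name across every process on the machine, and some lsof versions print one row per thread (names like tokio-run and Reloader- look like truncated thread names), so the totals may overstate what the data server process itself holds. A minimal Linux-only Python sketch for a per-process count, plus the per-process fd limit that "Too many open files" runs into; the helper names are hypothetical, and you would pass the data server's PID instead of "self".

```python
import os
import resource


def count_open_fds(pid="self"):
    """Count fds held by one process via /proc (Linux only).

    Each entry in /proc/<pid>/fd is one open descriptor, shared by all
    threads of that process, so threads are not double-counted.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_limits():
    """Return the (soft, hard) RLIMIT_NOFILE limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Comparing count_open_fds(<server pid>) against the soft limit from fd_limits() (or ulimit -n in the shell that launches TensorBoard) shows how close the process actually is to the limit.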

@wchargin
Contributor Author

@Raphtor: interesting, thank you! Both the old and new codepaths keep an
open fd for each event file, so I had considered this but expected it
not to be a big problem. Let’s follow up in #4955.

@sjincho

sjincho commented Jun 26, 2021

Using --load_fast under GKE with workload identity causes 401 Unauthorized error in rustboard_core::logdir when accessing GCS buckets.

It works fine if I set --load_fast=false.

@8bitmp3

8bitmp3 commented Jul 22, 2021

Fast data loading may be causing issues with the profiler (tensorflow/profiler#344, one of several recent issues mentioning this problem). A possible workaround for now is to switch it off with %tensorboard --logdir=logs --load_fast=false. cc @Terranlee @jimicy @yisitu

@8bitmp3

8bitmp3 commented Aug 2, 2021

Update: try the latest Profiler plugin v2.5 (pip install tensorboard_plugin_profile, or pin tensorboard_plugin_profile==2.5.0). Then launch TensorBoard (e.g., %tensorboard --logdir=logs, without the --load_fast switch) and select Profiler. Thanks @yisitu 👍

@yisitu

yisitu commented Aug 2, 2021

You're welcome, happy to help!

@jstremme

jstremme commented Sep 3, 2021

Anyone else landing here because they're following instructions from this link regarding using Tensorboard in AzureML?

@yisitu yisitu self-assigned this Sep 3, 2021
@yisitu

yisitu commented Sep 3, 2021

Closing, as the issue has been resolved now that I have released tensorboard_plugin_profile 2.5.0.

@yisitu yisitu closed this as completed Sep 3, 2021
@stephanwlee
Contributor

stephanwlee commented Sep 3, 2021

Ah, we would like to keep this issue open to solicit more feedback on the feature. Reopening.

@Corwinpro
Contributor

Authentication via the default service account is indeed not working when using logdir in 2.8.0; we had to run with --load_fast=false to get it to work. Any plans to support default service account credentials? Also, why was this experimental feature turned on by default?

Hi, would you mind sharing a bit more information? I might be able to help, but I would need to know how to reproduce your issue. (I am replying here because I contributed to a similar issue in the past, but of course it is up to the repo owners to make the decision.) Thank you!

@samos123

We have a fairly exotic setup, but you might be able to reproduce it by creating a GCE VM with a custom service account that has GCS permissions, then running tensorboard --logdir gs://your-bucket --load_fast=True. This will automatically use the credentials from the GCE metadata server and should result in permission errors. Try the same with --load_fast=False and it works with the default service account credentials.

@Corwinpro
Contributor

@samos123 I assume you meant GKE... The error should not be there as I thought I fixed that. Could you please check which server version you are using? I guess something like rustboard --version. There was a release a few weeks ago but that is only applicable for tf>=2.12 IIUC

@samos123

GKE + Workload Identity would use a similar mechanism, and I would expect it to have the same issue. We were using 2.8.0. Could you share the code where the authentication happens with --load_fast=True? Then I would be able to pinpoint whether it would work with our custom setup.

@Corwinpro
Contributor

Corwinpro commented Aug 14, 2023

@samos123 sorry for confusion, I didn't know that the GCE abbreviation exists.

The PR was #5939; in particular, it gets a GCP access token using gcp_auth::AuthenticationManager (gcp_auth is a third-party crate) in tensorboard/data/server/gcs/auth.rs. Overall, I'd try to see if gcp_auth works for your setup.
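
For the GCE and GKE credential questions above: on GCE VMs (and on GKE with Workload Identity, which serves a compatible endpoint), the default service account's access token is exposed by the metadata server. Assuming gcp_auth relies on that server when no other credentials are configured, a quick Python check like the following can tell you whether your environment hands out a token at all. The URL and the Metadata-Flavor: Google header are the standard metadata-server interface; the helper names are hypothetical.

```python
import json
import urllib.request

# Standard GCE metadata-server endpoint for the default service account's token.
METADATA_TOKEN_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)


def metadata_token_request():
    """Build the request; the metadata server rejects it without this header."""
    return urllib.request.Request(
        METADATA_TOKEN_URL, headers={"Metadata-Flavor": "Google"}
    )


def fetch_default_token(timeout=2.0):
    """Return the parsed token JSON (hypothetical helper).

    Raises urllib.error.URLError when no metadata server is reachable.
    """
    with urllib.request.urlopen(metadata_token_request(), timeout=timeout) as resp:
        return json.load(resp)
```

On a GCE VM, fetch_default_token() should return a dict with an access_token field. If it does but --load_fast=true still gets 401 errors, that narrows the problem to the data server's credential handling rather than the VM setup.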

@mueller91

On my Ubuntu 20.04.6 LTS NVIDIA A100 DGX, I cannot get fast loading to work:

Could not start data server: exited with 1; check stderr for details. Try with --load_fast=false and report issues on GitHub. Details: https://github.com/tensorflow/tensorboard/issues/4784

That is all I get.

@Corwinpro
Contributor

Corwinpro commented Aug 20, 2023

@mueller91

@Corwinpro

Does not change it.
GLIBC missing might be responsible? However, I have installed it via apt install glibc-source.

[...]
Successfully installed tensorboard-2.14.0
> tensorboard --logdir=. --bind_all --load_fast=true
TensorFlow installation not found - running with reduced feature set.
[...]anaconda3/envs/tensorboard/lib/python3.10/site-packages/tensorboard_data_server/bin/server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by [...]anaconda3/envs/tensorboard/lib/python3.10/site-packages/tensorboard_data_server/bin/server)
[...]anaconda3/envs/tensorboard/lib/python3.10/site-packages/tensorboard_data_server/bin/server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by [...]anaconda3/envs/tensorboard/lib/python3.10/site-packages/tensorboard_data_server/bin/server)

Could not start data server: exited with 1; check stderr for details.
    Try with --load_fast=false and report issues on GitHub. Details:
    https://github.com/tensorflow/tensorboard/issues/4784

@wookayin

wookayin commented Sep 13, 2023

Important

UPDATE after #6578: As of tensorboard_data_server==0.7.2 for tensorboard 2.15+, GLIBC 2.29 or higher is required.
The pre-built wheel shipped with tensorboard >= 2.12 (tensorboard_data_server == 0.7, 0.7.1), downloaded from PyPI, requires GLIBC version 2.34 or higher.

On Ubuntu 20.04 Linux machines, where the glibc version is 2.31, the rustboard server will fail to launch, looking for glibc 2.32 to 2.34. Ubuntu 22.04 will be fine, as it ships with GLIBC 2.35.

TensorFlow installation not found - running with reduced feature set.
$CONDA_PREFIX/lib/python3.11/site-packages/tensorboard_data_server/bin/server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by $CONDA_PREFIX/lib/python3.11/site-packages/tensorboard_data_server/bin/server)
$CONDA_PREFIX/lib/python3.11/site-packages/tensorboard_data_server/bin/server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by $CONDA_PREFIX/lib/python3.11/site-packages/tensorboard_data_server/bin/server)
$CONDA_PREFIX/lib/python3.11/site-packages/tensorboard_data_server/bin/server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by $CONDA_PREFIX/lib/python3.11/site-packages/tensorboard_data_server/bin/server)
Could not start data server: exited with 1; check stderr for details.

Workaround: On Ubuntu 20.04 or other old systems where GLIBC version is too old, use tensorboard == 2.11 (and tensorboard_data_server == 0.6.1).

FYI, how to figure out the GLIBC version on the system:

$ ldd --version | grep GLIB
ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31
$ cat /etc/lsb-release | grep DESCRIPTION
DISTRIB_DESCRIPTION="Ubuntu 20.04.5 LTS"
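
The ldd check above can also be done from Python, which is handy where no shell is available. A minimal sketch using the standard library's platform.libc_ver(); the thresholds come from the note above (2.29 for tensorboard_data_server 0.7.2, 2.34 for 0.7 and 0.7.1), and the helper names are hypothetical.

```python
import platform


def parse_version(text):
    """Turn a dotted version string like '2.31' into a comparable tuple (2, 31)."""
    return tuple(int(part) for part in text.split("."))


def system_glibc_version():
    """Return the running glibc version as a tuple, or None if not on glibc."""
    libc, version = platform.libc_ver()
    if libc != "glibc" or not version:
        return None
    return parse_version(version)


def meets_glibc_requirement(required="2.29"):
    """True if this system's glibc is at least `required` (hypothetical helper)."""
    current = system_glibc_version()
    return current is not None and current >= parse_version(required)
```

For example, on the Ubuntu 20.04 machine above (glibc 2.31), meets_glibc_requirement("2.29") would be True while meets_glibc_requirement("2.34") would be False.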

Verifying that tensorboard_data_server>=0.7 is built against a GLIBC version that is too new:

$ objdump -T $(python -c "from tensorboard_data_server import server_binary; print(server_binary())")  | grep GLIBC
...
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.34  pthread_create
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.34  __libc_start_main
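
To reduce the objdump output to a single "minimum GLIBC required" value, you can scan for GLIBC_x.y symbol versions and take the largest. A sketch in Python; max_glibc_required is a hypothetical helper that expects the text printed by objdump -T.

```python
import re

# Matches symbol versions such as GLIBC_2.34 or GLIBC_2.2.5 (but not GLIBC_PRIVATE).
GLIBC_SYMBOL_RE = re.compile(r"GLIBC_(\d+)\.(\d+)(?:\.(\d+))?")


def max_glibc_required(objdump_output):
    """Return the highest GLIBC version referenced, as a tuple, or None."""
    versions = [
        tuple(int(g) for g in m.groups() if g is not None)
        for m in GLIBC_SYMBOL_RE.finditer(objdump_output)
    ]
    # Tuples compare element-wise, so (2, 34) > (2, 2, 5).
    return max(versions, default=None)
```

For example, pass it subprocess.run(["objdump", "-T", path], capture_output=True, text=True).stdout, where path is the server binary location printed by the command above, and compare the result with your system's glibc version.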

I'd like to kindly ask the tensorboard team to lower the GLIBC requirement in future releases. I will open an issue if needed. -> #6578

@bmd3k
Contributor

bmd3k commented Sep 13, 2023

@wookayin, thanks for flagging. Yes, please open a new issue!

@profPlum

profPlum commented Oct 13, 2023

@wchargin

Currently, this mode is supported on Linux and macOS.

Hello, I'm very excited for this feature as tensorboard's speed has been a big pain point so far. BUT when I try to use it, it tells me it's not supported on MacOS:

Option --load_fast=true not available: TensorBoard data server not supported on this platform.

You say it is supported on MacOS though, so what's going on here? I've got MacBookPro17,1; Apple M1 chips; MacOS Ventura, version 13.4.1; tb-nightly Version: 2.15.0a20231013; tf-nightly-macos Version: 2.16.0.dev20231013.

P.S. I've gotten same results using non-nightly tensorflow-macos & no tensorflow at all. Also I followed your instructions exactly to uninstall tensorboard & tb_nightly before reinstalling tb_nightly.

@wchargin
Contributor Author

wchargin commented Oct 13, 2023

@profPlum: Hazarding a guess:

Apple M1 chips

That's probably your problem. The tensorboard-data-server package
currently ships macOS wheels for x86-64 but not for arm64.

If interested, you can build it yourself easily. I just tested it on my
laptop from scratch and had it running in three minutes. Here's how:

  1. If you don't already have a recent version of the Rust toolchain,
    install it from https://www.rust-lang.org/.

  2. Clone this repository (TensorBoard) into, say, ~/git/tensorboard.

  3. In the clone, change into the tensorboard/data/server/ directory.

  4. Run cargo build --release. This will build a data server binary
    at target/release/rustboard.

  5. Set the TENSORBOARD_DATA_SERVER_BINARY environment variable to the
    full path to that binary: e.g.,

    export TENSORBOARD_DATA_SERVER_BINARY=~/git/tensorboard/tensorboard/data/server/target/release/rustboard

    (edit: fixed var name)

  6. Change directories out of the TensorBoard repository to avoid Python
    import issues, then launch tensorboard with --load_fast true.
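
The lookup order these steps rely on (the environment variable override first, then the binary bundled with the tensorboard-data-server package) can be sketched as follows. This is an assumption about TensorBoard's behavior inferred from the instructions above, not its actual code, and find_data_server_binary is a hypothetical name. tensorboard_data_server.server_binary() is the same call used later in this thread; it returns None when no binary is bundled for the platform.

```python
import os


def find_data_server_binary():
    """Hypothetical lookup of the data server binary.

    Assumed order: TENSORBOARD_DATA_SERVER_BINARY first, then the
    tensorboard_data_server package. Returns a path or None.
    """
    override = os.environ.get("TENSORBOARD_DATA_SERVER_BINARY")
    if override:
        return override
    try:
        import tensorboard_data_server
    except ImportError:
        return None  # package not installed at all
    # None here means no binary ships for this platform (e.g. macOS arm64).
    return tensorboard_data_server.server_binary()
```

If this returns None on your machine, that matches the "not supported on this platform" message, and building the binary yourself as described is the way forward.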

If you want to double-check that it's using the data server, you can
navigate to http://localhost:6006/data/environment and see whether the
debug.data_provider field lists a GrpcDataProvider (fast) or a
MultiplexerDataProvider (slow). Or, you can set the environment
variable RUST_LOG=debug to see the data server logs.
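
The /data/environment check can also be scripted. A minimal Python sketch, assuming TensorBoard is serving on localhost:6006 and that debug.data_provider is a string containing the provider's class name; the helper names are hypothetical.

```python
import json
import urllib.request


def data_provider_kind(environment):
    """Classify parsed /data/environment JSON as 'fast', 'slow', or 'unknown'.

    Assumes debug.data_provider is a string mentioning the class name.
    """
    provider = str(environment.get("debug", {}).get("data_provider", ""))
    if "GrpcDataProvider" in provider:
        return "fast"  # Rust data server in use
    if "MultiplexerDataProvider" in provider:
        return "slow"  # original loading path
    return "unknown"


def fetch_environment(base_url="http://localhost:6006"):
    """Fetch and parse the environment JSON from a running TensorBoard."""
    with urllib.request.urlopen(base_url + "/data/environment", timeout=5) as resp:
        return json.load(resp)
```

Usage: data_provider_kind(fetch_environment()) while TensorBoard is running.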

(I don't currently work on TensorBoard, so consider this not an official
response but just a community member who at one point knew this part of
the code very well. :-) )

@profPlum

profPlum commented Oct 16, 2023

@wchargin Thanks I appreciate the help! (& I'll let you know if it works)
Do you think it is likely that TB devs will give official support to M1 chips soon?

@profPlum

profPlum commented Oct 19, 2023

@wchargin Hi again, I tried your instructions verbatim and it says roughly the same:

TensorFlow installation not found - running with reduced feature set.
Option --load_fast=true not available: TensorBoard data server not supported on this platform.

But to clarify: did you want me to launch the original (pip) tensorboard again? That point confused me; it is what I did, but I'm not sure if it's what you meant.

P.S. With: fresh install of tb_nightly==2.15.0a20231019 & cargo version: 1.73.0 (9c4383fb5 2023-08-26). Also I got same results on a linux docker container.

@Frn1nd0

Frn1nd0 commented Oct 21, 2023

@wchargin Hi, I got issue when running this:
%load_ext tensorboard
%tensorboard --logdir output

It shows google interface with:
403. That’s an error.
That’s all we know.

Could you please guide me with this? Thanks

@bmd3k
Contributor

bmd3k commented Oct 23, 2023

@Frn1nd0, your issue is unrelated to fast data loading. Instead, you have run into a recent Chrome compatibility regression. The Colab team has been investigating. We expect them to keep us updated at the following issue:

googlecolab/colabtools#3990

@Frn1nd0

Frn1nd0 commented Oct 23, 2023

@bmd3k Thanks for the clarification, appreciate that! Hope they can fix this soon.

@wookayin

Update: #6578 is fixed; as of tensorboard 2.15 GLIBC minimum requirement is 2.29 (compatible with Ubuntu 20.04)

@wchargin
Contributor Author

@profPlum: Oops, sorry, I wrote the environment variable wrong: it
should be TENSORBOARD_DATA_SERVER_BINARY. Maybe try again thus?

did you want me to launch the original (pip) tensorboard again?

Yes.

@davidxia

Using --load_fast under GKE with workload identity causes 401 Unauthorized error in rustboard_core::logdir when accessing GCS buckets.

It works fine if I set --load_fast=false.

Is this still a bug in the recent versions? I can repro with version 2.11.2.

@zerzerzerz

I would like to share how I solved the following problem:

Option --load_fast=true not available: TensorBoard data server not supported on this platform.

My OS is Ubuntu 18.04.5 LTS, Python 3.10.14 and tensorboard 2.12.1.
The problem is caused by a missing tensorboard-data-server binary. At first, I used pip to install it:

pip install tensorboard-data-server

It ran successfully, but when I ran the following Python code, it printed None:

import tensorboard_data_server

res = tensorboard_data_server.server_binary()
print(res)

It seems that the tensorboard-data-server binary was not installed properly.
So I used conda to install it:

conda install tensorboard
conda install chardet

After installation, I ran the above Python code again and it successfully printed the path to the tensorboard-data-server binary, like /home/<username>/miniconda3/envs/py310/lib/python3.10/site-packages/tensorboard_data_server/bin/server
It seems that pip could not install the tensorboard-data-server binary, but conda could.

Finally, I can run tensorboard --logdir=<path/to/logdir> --load_fast=true and it becomes much faster than before.

@valerie-lth

valerie-lth commented Mar 25, 2024

I'm using Chrome on a MacBook, and I get these errors:

The localhost page either shows nothing or empty grids when --load_fast=false.

It shows the plots in the grids when --load_fast=false but the error messages persist.

@rajnish159

2024-04-16 13:41:07.757664: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2024-04-16 13:41:07.757723: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-04-16 13:41:07.801242: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-16 13:41:09.133758: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-04-16 13:41:09.133903: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2024-04-16 13:41:09.133924: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2024-04-16 13:41:11.219441: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2024-04-16 13:41:11.219624: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2024-04-16 13:41:11.219723: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2024-04-16 13:41:11.223136: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2024-04-16 13:41:11.223275: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2024-04-16 13:41:11.223359: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2024-04-16 13:41:11.223383: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.

@bhack

bhack commented Sep 11, 2024

It is not usable with Google gcsfuse. See #6790
