[projector] support tensor stored in binary format #3685

RustingSword · 2020-05-30T06:02:04Z

Motivation for features / changes

Support loading tensors stored in binary format to reduce tensor saving/loading time and file size. The standalone embedding projector already supports this, but the open-source TensorBoard projector does not.

Technical description of changes

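Added a function _read_tensor_binary_file to load tensors stored in binary format. The speedup is about 2-3 orders of magnitude for tensors of moderate size: in the benchmark below, a 10000 x 128 float32 tensor loads in 1.11 ms from the binary file versus 537 ms from the TSV file, roughly 480x faster.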

In [1]: import numpy as np
In [2]: data = np.random.random((10000, 128)).astype(np.float32)
In [3]: data.tofile("vec.bin")
In [4]: np.savetxt("vec.txt", data, delimiter="\t")
In [5]: def read_tensor_tsv_file(fpath):
   ...:     with open(fpath) as f:
   ...:         tensor = []
   ...:         for line in f:
   ...:             line = line.rstrip("\n")
   ...:             if line:
   ...:                 tensor.append(list(map(float, line.split("\t"))))
   ...:     return np.array(tensor, dtype="float32")
   ...:
In [6]: %timeit np.fromfile("vec.bin", dtype="float32").reshape((10000, 128))
1.11 ms ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [7]: %timeit read_tensor_tsv_file("vec.txt")
537 ms ± 1.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
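
For readers who want to try the idea outside TensorBoard, here is a minimal sketch of a binary loader. It is not the code from this PR: the name read_tensor_binary_file_sketch, its signature, and the size check are assumptions made for illustration. Like the np.fromfile call timed above, it assumes the file holds raw float32 values in the machine's native byte order.

```python
import numpy as np


def read_tensor_binary_file_sketch(fpath, shape):
    """Hypothetical helper: load a raw float32 file and reshape it.

    `shape` is the expected tensor shape, e.g. (10000, 128).
    """
    # Read the whole file as a flat float32 array (native byte order).
    tensor = np.fromfile(fpath, dtype="float32")
    expected = int(np.prod(shape))
    # A raw file carries no shape information, so check the element count
    # against the shape supplied by the caller before reshaping.
    if tensor.size != expected:
        raise ValueError(
            "file has %d float32 values, but shape %r needs %d"
            % (tensor.size, tuple(shape), expected)
        )
    return tensor.reshape(shape)
```

With the files from the session above, read_tensor_binary_file_sketch("vec.bin", (10000, 128)) should return the same array as the np.fromfile(...).reshape(...) expression in In [6].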

Screenshots of UI changes

Detailed steps to verify changes work correctly (as executed by you)

> ls binary_embedding
projector_config.pbtxt  vec.bin
> cat binary_embedding/projector_config.pbtxt
embeddings {
  tensor_name: "embedding"
  tensor_path: "vec.bin"
  tensor_shape: 10000
  tensor_shape: 128
}
> tensorboard --logdir binary_embedding
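
The directory listed above can be produced with a short script. This is a sketch under assumptions: the helper name write_binary_embedding and its parameters are made up for illustration, and the config text simply mirrors the projector_config.pbtxt shown in the steps.

```python
import os

import numpy as np


def write_binary_embedding(logdir, data, tensor_name="embedding", fname="vec.bin"):
    """Hypothetical helper: write a raw float32 tensor file plus a matching
    projector_config.pbtxt into `logdir`."""
    os.makedirs(logdir, exist_ok=True)
    data = np.asarray(data, dtype=np.float32)
    # Raw dump of the array contents; no header, native byte order.
    data.tofile(os.path.join(logdir, fname))
    # One tensor_shape line per dimension, as in the config shown above.
    lines = [
        "embeddings {",
        '  tensor_name: "%s"' % tensor_name,
        '  tensor_path: "%s"' % fname,
    ]
    lines += ["  tensor_shape: %d" % dim for dim in data.shape]
    lines.append("}")
    with open(os.path.join(logdir, "projector_config.pbtxt"), "w") as f:
        f.write("\n".join(lines) + "\n")
```

Calling write_binary_embedding("binary_embedding", np.random.random((10000, 128))) and then running tensorboard --logdir binary_embedding reproduces the setup from the steps above.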

Alternate designs / implementations considered
N/A

hfiller

Looks great! Thanks for contributing.

Note: I believe that the dtype float32 assumes the binary file was written in native byte order, so binary files may not be portable between processors with different endianness. Shouldn't be a problem for your use case 👍
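
One possible way around the byte-order concern, sketched here as an assumption and not as something this PR implements, is to fix the byte order explicitly on both the writing and the reading side. NumPy's "<f4" dtype means little-endian float32 on every platform; the two helper names below are hypothetical.

```python
import numpy as np


def write_little_endian(fpath, data):
    """Hypothetical helper: store data as little-endian float32."""
    # astype("<f4") converts (and byte-swaps if needed) before the raw dump.
    np.asarray(data).astype("<f4").tofile(fpath)


def read_little_endian(fpath, shape):
    """Hypothetical helper: read little-endian float32 regardless of host."""
    # An explicit "<f4" avoids relying on the reader's native byte order.
    return np.fromfile(fpath, dtype="<f4").reshape(shape)
```

On little-endian hosts this produces the same bytes as the plain float32 path, so files written either way are interchangeable there.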

[projector] support tensor stored in binary format

02372a4

googlebot added the cla: yes label May 30, 2020

hfiller added the plugin:projector label Jun 1, 2020

hfiller requested review from hfiller and tensorboard-gardener June 1, 2020 20:09

hfiller approved these changes Jun 1, 2020


hfiller merged commit 1ba4f06 into tensorflow:master Jun 1, 2020
