[projector] support tensor stored in binary format #3685

Merged 1 commit on Jun 1, 2020

RustingSword
Contributor

  • Motivation for features / changes

Support loading tensors stored in binary format to reduce tensor save/load time and file size. The standalone embedding projector already supports this, but the open-source TensorBoard projector does not.

  • Technical description of changes

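Added a function _read_tensor_binary_file to load tensors stored in binary format. For tensors of moderate size the speedup is 2-3 orders of magnitude: in the benchmark below, a 10000 x 128 float32 tensor loads in about 1.11 ms from the binary file versus 537 ms from the TSV file, roughly 480x faster.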

In [1]: import numpy as np
In [2]: data = np.random.random((10000, 128)).astype(np.float32)
In [3]: data.tofile("vec.bin")
In [4]: np.savetxt("vec.txt", data, delimiter="\t")
In [5]: def read_tensor_tsv_file(fpath):
   ...:     with open(fpath) as f:
   ...:         tensor = []
   ...:         for line in f:
   ...:             line = line.rstrip("\n")
   ...:             if line:
   ...:                 tensor.append(list(map(float, line.split("\t"))))
   ...:     return np.array(tensor, dtype="float32")
   ...:
In [6]: %timeit np.fromfile("vec.bin", dtype="float32").reshape((10000, 128))
1.11 ms ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [7]: %timeit read_tensor_tsv_file("vec.txt")
537 ms ± 1.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
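
For readers who want to try the binary path outside TensorBoard, here is a minimal sketch of a loader in the spirit of _read_tensor_binary_file. The function name read_tensor_binary_file_sketch, its signature, and the size check are assumptions made for illustration; this is not the code merged in this PR.

```python
import os
import tempfile

import numpy as np


def read_tensor_binary_file_sketch(fpath, shape):
    """Load a raw float32 file and reshape it to the 2-D `shape`.

    Hypothetical helper for illustration only, not the merged implementation.
    """
    if len(shape) != 2:
        raise ValueError("expected a 2-D shape, got {}".format(shape))
    # np.fromfile reads the whole file as a flat array of native-endian float32.
    tensor = np.fromfile(fpath, dtype="float32")
    expected = shape[0] * shape[1]
    if tensor.size != expected:
        raise ValueError(
            "file holds {} values, shape {} needs {}".format(
                tensor.size, shape, expected
            )
        )
    return tensor.reshape(shape)


# Round-trip check on a small random tensor.
data = np.random.random((100, 8)).astype(np.float32)
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "vec.bin")
    data.tofile(path)
    loaded = read_tensor_binary_file_sketch(path, (100, 8))
```

The raw file carries no header, so the shape has to come from elsewhere, which is why the config in the verification steps lists tensor_shape explicitly.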
  • Screenshots of UI changes

  • Detailed steps to verify changes work correctly (as executed by you)

> ls binary_embedding
projector_config.pbtxt  vec.bin
> cat binary_embedding/projector_config.pbtxt
embeddings {
  tensor_name: "embedding"
  tensor_path: "vec.bin"
  tensor_shape: 10000
  tensor_shape: 128
}
> tensorboard --logdir binary_embedding
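
The directory used above can also be produced from Python. The following is a sketch only: the helper name write_binary_embedding and the temporary output location are assumptions, while the file names and config fields mirror the listing above.

```python
import os
import tempfile

import numpy as np


def write_binary_embedding(logdir, data, tensor_name="embedding"):
    """Write vec.bin and projector_config.pbtxt for a 2-D array.

    Hypothetical helper for illustration; field names follow the config above.
    """
    os.makedirs(logdir, exist_ok=True)
    # Store the values as contiguous float32, the dtype used in the benchmark.
    data = np.ascontiguousarray(data, dtype=np.float32)
    data.tofile(os.path.join(logdir, "vec.bin"))
    lines = [
        "embeddings {",
        '  tensor_name: "{}"'.format(tensor_name),
        '  tensor_path: "vec.bin"',
    ]
    # One tensor_shape entry per dimension, as in the config above.
    lines += ["  tensor_shape: {}".format(dim) for dim in data.shape]
    lines.append("}")
    config = "\n".join(lines) + "\n"
    with open(os.path.join(logdir, "projector_config.pbtxt"), "w") as f:
        f.write(config)
    return config


logdir = os.path.join(tempfile.mkdtemp(), "binary_embedding")
config_text = write_binary_embedding(logdir, np.random.random((10000, 128)))
print(config_text)
```

Pointing tensorboard --logdir at the printed directory should then reproduce the setup from the steps above.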
  • Alternate designs / implementations considered
    N/A

Copy link
Contributor

@hfiller hfiller left a comment

Looks great! Thanks for contributing.

Note: I believe that reading with dtype float32 assumes the binary file was written in native byte order, so a file may not be portable between processors with different endianness. Shouldn't be a problem for your use case 👍
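
To illustrate the byte-order point, here is a small NumPy sketch that writes a file with an explicit little-endian dtype and reads it back both ways. It only demonstrates the NumPy behavior; whether the projector could accept an explicit byte order is not covered by this PR, and the file name vec_le.bin is made up for the example.

```python
import os
import tempfile

import numpy as np

data = np.arange(6, dtype=np.float32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "vec_le.bin")
    # "<f4" is little-endian float32; astype byte-swaps only if the host differs.
    data.astype("<f4").tofile(path)
    # Reading with the same explicit dtype does not depend on the host byte order.
    portable = np.fromfile(path, dtype="<f4").reshape(2, 3)
    # Reading with the opposite byte order shows what a mismatch looks like.
    mismatched = np.fromfile(path, dtype=">f4").reshape(2, 3)

print(portable)
print(mismatched)
```

On a little-endian host, a file written this way is identical to one written with plain float32, so it would also match a reader that assumes native byte order.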

@hfiller hfiller merged commit 1ba4f06 into tensorflow:master Jun 1, 2020