Using filesystem from tensorboard #335

Open
damienpontifex opened this issue Jul 9, 2019 · 9 comments

Comments

@damienpontifex
Contributor

This is a question or maybe request for documentation:

How would we use a tensorflow/io filesystem from tensorboard from the command line? Similar to how we can do it with GCS as tensorboard --logdir=gs://bucket/path/to/logs I'd like to be able to do the same with the recent azure blob storage file system.

I believe the GCS file system is built into the main tensorflow repo and so I assume gets packaged up and available when tensorboard is built also.

@ms-lolo

ms-lolo commented Nov 17, 2020

can someone help me understand if this is possible? @damienpontifex, were you able to get this working?

@yongtang
Member

@ms-lolo The Azure Blob Storage file system is now fully built into tensorflow-io, so you should be able to use import tensorflow_io as tfio, and the file system will be ready when you run tensorflow:

import tensorflow as tf
import tensorflow_io as tfio
...
...

# using az://accountname/path/to/logs the same way as gs://bucket/path/to/logs

For tensorboard, in theory it should work similarly as long as import tensorflow_io as tfio is present in your Python program. Please give it a try and let us know if you run into any issues.

A tutorial about azfs is also available at https://www.tensorflow.org/io/tutorials/azure

@ms-lolo

ms-lolo commented Nov 17, 2020

thanks for the response, @yongtang, I'm a little confused by this ticket then: tensorflow/tensorboard#2424

I guess it's not clear to me how this support works when I am trying to run tensorboard as a terminal command and not as part of a notebook. I am specifically trying to run something like tensorboard --logdir az://accountname/path/to/logs but it sounds like the import you are referring to needs to happen in the tensorboard startup process. Is that right?

@yongtang
Member

@ms-lolo Yes, the import tensorflow_io as tfio needs to happen inside the Python script when tensorboard tries to run tensorflow. I am not very familiar with tensorboard, but I would assume the change will not be big (apart from finding the right place to add it).
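One way to get that import into the TensorBoard startup process without patching TensorBoard itself is a small launcher script. The following is a minimal, untested sketch, not a confirmed solution: it assumes that importing tensorflow_io in the same process before TensorBoard starts is enough for az:// paths to resolve, and that tensorboard.main.run_main is the function behind the tensorboard command. The names build_argv and run_tensorboard_with_tfio are hypothetical helpers made up for this illustration.

```python
import sys


def build_argv(logdir, extra_args=()):
    """Build the argv list that TensorBoard's command-line parser will read."""
    return ["tensorboard", "--logdir", logdir, *extra_args]


def run_tensorboard_with_tfio(logdir, extra_args=()):
    """Start TensorBoard in this process after importing tensorflow_io.

    Assumption: the tensorflow_io import registers the az:// file system for
    the whole process, so TensorBoard can read the log directory through it.
    """
    # Imported lazily so build_argv stays usable without these packages.
    import tensorflow_io  # noqa: F401  (imported only for its side effect)
    from tensorboard import main as tb_main

    # run_main parses sys.argv, so set it as if the CLI had been invoked.
    sys.argv = build_argv(logdir, extra_args)
    tb_main.run_main()
```

Saved as a script, a call such as run_tensorboard_with_tfio("az://accountname/path/to/logs") would stand in for the tensorboard --logdir az://accountname/path/to/logs command.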

@ms-lolo

ms-lolo commented Nov 17, 2020

ok last question! the docs you linked mention TF_AZURE_STORAGE_KEY, do you know if tensorflow-io supports using user and system managed identities for the blob operations? This should be possible if tensorflow-io is using the python blob sdk and uses something like the DefaultAzureCredential class: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/identity/azure-identity#authenticating-with-defaultazurecredential

The main requirement for us is to not have access to SAS tokens or other secrets.

@yongtang
Member

@ms-lolo We use the Azure Storage CPP SDK (https://github.com/Azure/azure-storage-cpplite), so in theory it supports the same methods as the Python SDK. Can you give it a try and, in case it is not supported in tensorflow-io yet, report back? We will fix any issues if needed.

@ms-lolo

ms-lolo commented Nov 17, 2020

Just tested and I am seeing 404 errors when trying to run tf.io.gfile.mkdir(pathname). I think this is the default error azure returns when you lack permissions to a location (so it doesn't expose information about the location existing). I didn't set any environment variables and just ran something like this:

# shell: pip install tensorflow-io
import os
import tensorflow as tf
import tensorflow_io as tfio  # imported for its side effect: makes the az:// file system available
pathname = 'az://[account]/[container]/foo'
tf.io.gfile.mkdir(pathname)

The command hangs for a dozen seconds or so before giving me a 404 error.

The docs also mention an azfs:// scheme being registered, but using that gives me an immediate UnimplementedError: File system scheme 'azfs' not implemented. I'm guessing the docs are maybe out of date and az:// is the new scheme.
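When debugging a 404 like this, it can help to confirm which account, container, and blob path a given az:// string refers to. The sketch below is a standard-library illustration only, not part of tensorflow-io: it assumes the az://[account]/[container]/path layout used above, and split_az_path is a hypothetical helper name. It also rejects other schemes such as azfs://, which the comment above reports as not implemented.

```python
from urllib.parse import urlparse


def split_az_path(uri):
    """Split 'az://account/container/blob/path' into (account, container, blob)."""
    parsed = urlparse(uri)
    if parsed.scheme != "az":
        raise ValueError(f"expected an az:// path, got scheme {parsed.scheme!r}")
    # The first path segment is taken to be the container (an assumption
    # based on the path layout used in this thread); the rest is the blob path.
    container, _, blob = parsed.path.lstrip("/").partition("/")
    return parsed.netloc, container, blob


print(split_az_path("az://myaccount/mycontainer/foo"))
# → ('myaccount', 'mycontainer', 'foo')
```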

@ms-lolo

ms-lolo commented Nov 17, 2020

if it helps, this is how I would access the same location using the python blob sdk (just listing the blobs at this path):

from azure.identity import DefaultAzureCredential
from azure.storage.blob import ContainerClient

credential = DefaultAzureCredential()
container_client = ContainerClient(
        account_url=f"https://[account].blob.core.windows.net/",
        container_name="[container]",
        credential=credential)
blobs = container_client.list_blobs(name_starts_with="foo/")

The important part is that this works without specifying any authentication secrets. If this code runs on a machine that has the appropriate access permissions, the code will run and be authenticated automatically.

@janbernloehr
Contributor

  1. Has tensorboard been changed to import tf io already? @yongtang
  2. I think we can add support for MSI to the azure file system bindings. @ms-lolo maybe let’s have a separate issue for that?
