Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ready for review] use gfile to support remote directories #2164

Merged
merged 1 commit into from
Aug 9, 2020

Conversation

f4hy
Copy link
Contributor

@f4hy f4hy commented Jun 12, 2020

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes #2161

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team June 12, 2020 20:47
@Borda Borda added the feature Is an improvement or enhancement label Jun 13, 2020
@f4hy
Copy link
Contributor Author

f4hy commented Jun 13, 2020

Thanks @samj1912 fix the one missing call. I also cleaned up how this was being called and fixed up the tests. The tests were using the tmpdir pytest fixture which gives a py.path.local object where the contract in many places is type annotated str or Optional[str]. So I felt the best choice was to update the tests. Howerver it might be a better solution to just make the contracts some sort of path like object.

@f4hy
Copy link
Contributor Author

f4hy commented Jun 13, 2020

Rebased since the requirements paths changed on master.

@f4hy
Copy link
Contributor Author

f4hy commented Jun 13, 2020

I will leave this a draft PR in case a different solution is suggested for issue #2161

@f4hy f4hy mentioned this pull request Jun 13, 2020
5 tasks
@f4hy
Copy link
Contributor Author

f4hy commented Jun 14, 2020

Hmm. ok the tests here are now failing on just the 'minimal' builds where it passes on the lastest. A better approach here might be to wrap all the IO in something that can detect what is installed. If running on a minimum of dependencies only support local disk writes, and if more/later deps are installed support cloud file paths.

@Borda Borda self-requested a review June 24, 2020 20:10
@@ -501,7 +501,7 @@ def test_benchmark_option(tmpdir):

# fit model
trainer = Trainer(
default_root_dir=tmpdir,
default_root_dir=str(tmpdir),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather so this casting in Trainer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. will make that change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason to do it here is that the trainer and other places correctly type annotate that it should be a str or Optional[str] and it is hte tests which are wrong in passing a path object. @Borda I think if the tests are going to do this we should update the type annotations to accept a union of string and path. what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, then pls update annotation :]

@mergify mergify bot requested a review from a team June 24, 2020 20:11
@codecov
Copy link

codecov bot commented Jun 24, 2020

Codecov Report

Merging #2164 into master will decrease coverage by 0%.
The diff coverage is 86%.

@@          Coverage Diff           @@
##           master   #2164   +/-   ##
======================================
- Coverage      90%     90%   -0%     
======================================
  Files          79      79           
  Lines        7236    7275   +39     
======================================
+ Hits         6530    6541   +11     
- Misses        706     734   +28     

@pep8speaks
Copy link

pep8speaks commented Jun 25, 2020

Hello @f4hy! Thanks for updating this PR.

Line 41:50: E231 missing whitespace after ':'

Comment last updated at 2020-08-08 23:37:43 UTC

@f4hy f4hy force-pushed the tb_use_gfile branch 4 times, most recently from de0fd62 to 42b914a Compare June 25, 2020 02:53
@williamFalcon
Copy link
Contributor

@f4hy is this ready for 0.8.2 today?

@f4hy
Copy link
Contributor Author

f4hy commented Jun 26, 2020

@f4hy is this ready for 0.8.2 today?

Yep! Just removed some extra cruft. If you @williamFalcon think this is a good solution to this then I think its ready to go now. I was worried the tests are not passing on the 'minimal' builds, I tried to fix it but looks like its failing on master for minimal right now.

@f4hy f4hy marked this pull request as ready for review June 26, 2020 16:26
@Borda
Copy link
Member

Borda commented Jun 26, 2020

btw, how does it goes with #2175?

@f4hy
Copy link
Contributor Author

f4hy commented Jun 27, 2020

btw, how does it goes with #2175?

So s3 support is just one cloud hosting provider. My pr should support any of the backends that tensorboard currently supports which is s3/hdfs/gcs/local and maybe others have been added. So if #2175 is merged it wont solve the issue I have of #2161 .

If the other PR does something this one does not we can add it here, but my understanding is that this is a superset of that.

@mergify
Copy link
Contributor

mergify bot commented Jun 27, 2020

This pull request is now in conflict... :(

1 similar comment
@mergify
Copy link
Contributor

mergify bot commented Jun 29, 2020

This pull request is now in conflict... :(

@mergify
Copy link
Contributor

mergify bot commented Jul 22, 2020

This pull request is now in conflict... :(

@f4hy f4hy changed the title [Ready for review] use gfile to support remote directories [wip] use gfile to support remote directories Aug 7, 2020
@mergify
Copy link
Contributor

mergify bot commented Aug 7, 2020

This pull request is now in conflict... :(

@f4hy
Copy link
Contributor Author

f4hy commented Aug 7, 2020

@Borda I have rebased master and updated the tests so now all pass except against windows. I am not quite sure how to test against windows and some tips would be welcome. I believe it is just due to line-ending issues.

On a larger issue though, is this design acceptable? I know others have tried to solve similar problems. My issue is my training happens on a Kubernetes cluster where I only have remote file systems to store data so I want all logs/checkpoints/outputs to be written to remote/cloud storage. I think this is a reasonable way to approach it since tensorboard has this capability and is already a dependency but other approaches can work as well.

@Borda Borda added the Important label Aug 7, 2020
@mergify
Copy link
Contributor

mergify bot commented Aug 7, 2020

This pull request is now in conflict... :(

@awaelchli awaelchli modified the milestones: 0.9.0, 0.9.x Aug 8, 2020
commit 29fb0506cd38a15c359e369cc8bc4435916b0c78
Author: Brendan Fahy <[email protected]>
Date:   Sat Aug 8 19:35:30 2020 +0000

    fix checking for version for docs to build

commit 467fd64
Author: Brendan Fahy <[email protected]>
Date:   Sat Aug 8 18:56:05 2020 +0000

    remove no local test

commit a7cc9f8
Author: Brendan Fahy <[email protected]>
Date:   Sat Aug 8 18:46:44 2020 +0000

    fix

commit 3fdbb72
Author: Brendan Fahy <[email protected]>
Date:   Sat Aug 8 18:23:30 2020 +0000

    revert requirements

commit 9b8686b
Author: Brendan Fahy <[email protected]>
Date:   Sat Aug 8 18:16:42 2020 +0000

    make it a fixture

commit eec7495
Author: Brendan Fahy <[email protected]>
Date:   Sat Aug 8 18:01:32 2020 +0000

    fix up the testing

commit 896d94a
Author: Brendan Fahy <[email protected]>
Date:   Sat Aug 8 17:47:28 2020 +0000

    fix some tests

commit 6d22bde
Merge: 6175d4e 6ebe0d7
Author: Brendan Fahy <[email protected]>
Date:   Sat Aug 8 10:20:47 2020 +0000

    Merge remote-tracking branch 'origin/master' into tb_use_gfile

commit 6175d4e
Author: Brendan Fahy <[email protected]>
Date:   Fri Aug 7 10:16:36 2020 +0000

    Use tensorboard.compat.gfile to support remote writing
@f4hy f4hy changed the title [wip] use gfile to support remote directories [Ready for review] use gfile to support remote directories Aug 8, 2020
@f4hy
Copy link
Contributor Author

f4hy commented Aug 9, 2020

@Borda Rebased again. Now passes all tests. Should be ready for review.

Let me know any ideas on how to improve this. I would like better ways of testing it. The concern I have is ensuring we don't later add some methods to write files that only work locally. So somehow testing this can actually work with remote files would be good, but not sure how to do that without something like docker compose to set that up. Ideas welcome.

@Borda Borda added the priority: 0 High priority task label Aug 9, 2020
@williamFalcon williamFalcon merged commit 6e77181 into Lightning-AI:master Aug 9, 2020
@f4hy f4hy deleted the tb_use_gfile branch August 9, 2020 19:55
@edenlightning edenlightning linked an issue Aug 18, 2020 that may be closed by this pull request
@Borda Borda modified the milestones: 0.9.x, 0.9.0 Aug 20, 2020
@igorgad
Copy link
Contributor

igorgad commented Dec 29, 2020

Hi, @f4hy. Was this already merged? I cannot see it on release 1.1.2. Did it got removed somehow?

@f4hy
Copy link
Contributor Author

f4hy commented Dec 29, 2020

We went with fsspec instead. See the other PRs by me. Remote directories should be supported but with fsspec instead

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Is an improvement or enhancement priority: 0 High priority task
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generalize i/o to other storage systems tensorboard logger should support remote directories
6 participants