This repository was archived by the owner on Jul 1, 2023. It is now read-only.

Conversation

@eaplatanios
Contributor

@eaplatanios eaplatanios commented Apr 21, 2019

Moves all of the TensorFlow module here. All tests pass locally.

Friend PR: swiftlang/swift#24452.

@rxwei
Contributor

rxwei commented Apr 21, 2019

Could you create a PR that contains all the changes? I reverted #70 and #106 in order to maintain a working build in this repo.

@rxwei
Contributor

rxwei commented Apr 21, 2019

Sorry for the inconvenience!

@eaplatanios
Contributor Author

@rxwei No worries. I hope I did the merge correctly :) The only error that remains when I try to compile this now is the ambiguous subscript.

@eaplatanios
Contributor Author

@rxwei I tried removing the subscript by putting it in a #if COMPILING_TENSORFLOW_MODULE/#endif block, but now we get a crash in swift::SILFunctionBuilder::addFunctionAttributes(swift::SILFunction*, swift::DeclAttributes&, swift::SILModule&, swift::SILDeclRef), which I've also seen before with ambiguous functions. I believe this is because all of the extension functions are ambiguous in this case.

@eaplatanios
Contributor Author

Also, shouldn't we add @_semantics("autodiff.nonvarying") to all functions that we know to be non-varying? E.g., .>, .<, .==, etc. It would also be nice to expose this to users using a nicer name. Is that possible?

@rxwei
Contributor

rxwei commented Apr 21, 2019

Also, shouldn't we add @_semantics("autodiff.nonvarying") to all functions that we know to be non-varying? E.g., .>, .<, .==, etc. It would also be nice to expose this to users using a nicer name. Is that possible?

I think it's better not to do that because we want non-differentiable functions to produce an error instead of silently giving a zero gradient at runtime. This will help ensure a predictable programming model. The @_semantics("autodiff.nonvarying") attribute can be considered as equivalent to @noDerivative, and I think they should only apply to properties, not functions. Also, adding an attribute on everything often implies that it should be defaulted, but we've tried zero-gradient-by-default and some initial users didn't like it.

My preference is that we explore adding a good API first before considering changing language defaults or library behavior. I see no problem adding a NoDerivative type today, because eventually we can just add a @propertyDelegate attribute on it when that's available. I'm interested in hearing what you think!
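
A rough illustration of that idea (all names and behavior here are hypothetical sketches, not an actual API that existed at the time) might look like this:

```swift
// Hypothetical sketch only: a wrapper type that marks a stored value as
// non-varying with respect to differentiation. Once @propertyDelegate
// (later renamed "property wrappers") becomes available, the same type
// could be attached as an attribute instead of being used explicitly.
struct NoDerivative<Value> {
    var value: Value
    init(_ value: Value) { self.value = value }
}

struct Model {
    var weight: Float                  // differentiable state
    var stepCount = NoDerivative(0)    // bookkeeping; carries no derivative
}
```

The appeal of starting with a plain type is exactly what's described above: it requires no language change today, and it can be promoted to attribute syntax later without breaking callers.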

@eaplatanios
Contributor Author

I see your point and I agree. I'll look into the GPU error as soon as I can.

In the meantime, it turns out it's quite hard to debug TF errors in S4TF because no stack trace is ever printed. Oftentimes I get a fatal error about mismatching shapes, etc., but with no other information, which makes it very hard to debug. Is there a way to have S4TF print the whole stack trace whenever a fatal error occurs?

@rxwei
Contributor

rxwei commented Apr 22, 2019

Yeah, that is a known pain point. I think that @pschuh has more context on the current status of pretty-printing TensorFlow runtime errors.

@eaplatanios
Contributor Author

Even just a plain stack trace dump would be super helpful at this point, to be honest; right now it's hard to know which part of the user's code is causing the error.

@rxwei
Contributor

rxwei commented Apr 22, 2019

You can step through the code using LLDB.

@eaplatanios
Contributor Author

Yes, that is what I'm doing currently, but I feel that won't be an option for a lot of ML researchers using S4TF.

@Shashi456
Contributor

I'm sorry to divert the thread, @eaplatanios, but can the VJP of a function generally be thought of as returning the dual for automatic differentiation? So if I wanted to write a VJP for a function that squares a number, would I write the VJP so that it returns 2*x? I was just looking over your min/max VJP function definitions, and I'm still trying to wrap my head around writing VJPs.

@rxwei
Contributor

rxwei commented Apr 22, 2019

Yes, that is what I'm doing currently, but I feel that won't be an option for a lot of ML researchers using S4TF.

The team and I really agree, and it's definitely on our TODO list.

@rxwei
Contributor

rxwei commented Apr 22, 2019

I'm sorry to divert the thread, @eaplatanios, but can the VJP of a function generally be thought of as returning the dual for automatic differentiation?

A VJP function represents an efficient derivative function that takes the original arguments and computes both

  • the original result, and
  • a closure (the pullback) that takes a vector and chains it with the function's Jacobian, producing vector-Jacobian products, i.e., the gradient along that vector.

So if I wanted to write a VJP for a function that squares a number, would I write the VJP so that it returns 2*x?

Say you have a square(_:) function:

func square(_ x: Float) -> Float {
    return x * x
}

You can define a derivative as a function marked with the @differentiating attribute:

@differentiating(square)
func squareDerivative(_ x: Float) -> (value: Float, pullback: (Float) -> Float) {
    return (x * x, { v in v * 2 * x })
}

Note here that it's not just 2x, but a linear combination function { v in v * 2 * x }. v is the "vector", which you can interpret as whatever is being chained with the current derivative. For scalar operations, the chaining is multiplication or element-wise multiplication; for matrix operations (e.g. TensorFlow's matmul(_:_:)), it's matrix multiplication.
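
To make the chaining concrete, here is a small usage sketch. It assumes the square/squareDerivative definitions above and a toolchain of that era supporting @differentiating, so it won't compile on stock Swift:

```swift
// Evaluate the VJP by hand.
let (value, pullback) = squareDerivative(3)
// value is 9 (the original result).
// Seeding the pullback with v = 1 yields dy/dx = 2 * 3 = 6.
let dydx = pullback(1)
// Seeding with a different vector scales the result accordingly:
// pullback(10) gives 60, which is what chaining with an upstream
// derivative of 10 looks like.
```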

@Shashi456
Contributor

@rxwei thanks a lot for the explanation.

@eaplatanios
Contributor Author

@rxwei I added a couple of separate PRs: #136 for the tests, #137 for the bindings fetching script, and swiftlang/swift#25091 for replacing InitTensorFlowRuntime with a Swift implementation. I believe that once these are merged, we may be able to merge this one directly, since it mostly consists of moving everything over from apple/swift. I'm not sure how a partial move would work without adding a significant amount of work to properly separate things for each PR. Please let me know what you think.

@rxwei
Contributor

rxwei commented May 28, 2019

Thanks! @bgogul and I will help you merge these PRs this week.

@eaplatanios
Contributor Author

eaplatanios commented May 30, 2019

@rxwei @bgogul All tests pass locally for me and I have started the CI tests. The tests will probably run successfully with the current toolchain, but I tested locally with one built from the corresponding PR, swiftlang/swift#24452.

Given the discussion and issues with #137, I believe we should just go ahead and merge this PR directly, as partial moves cause more issues that may be hard to work around, whereas this PR already passes all tests.

@eaplatanios
Contributor Author

The CI tests seem to fail and I can't tell why. Is there a way to test with the toolchain from swiftlang/swift#24452 directly?

@rxwei
Contributor

rxwei commented May 30, 2019

The CI tests seem to fail and I can't tell why. Is there a way to test with the toolchain from apple/swift#24452 directly?

There's no way to do that until the PRs have been merged. Let me build a toolchain locally and test things (~20 minutes).

@eaplatanios
Contributor Author

Sounds good, thanks! One minor thing I just noticed. There is a test in LayerTests that fails due to a difference in some numbers. I'm not sure which one is correct and why the numbers are different now, but I pushed the changed numbers (lines 203-212 in Tests/TensorFlowTests/LayerTests.swift) so you can take a look.

@rxwei
Contributor

rxwei commented May 30, 2019

Sounds good, thanks! One minor thing I just noticed. There is a test in LayerTests that fails due to a difference in some numbers. I'm not sure which one is correct and why the numbers are different now, but I pushed the changed numbers (lines 203-212 in Tests/TensorFlowTests/LayerTests.swift) so you can take a look.

I think verifying the numeric results against Python TF is the best way to know.

@rxwei
Contributor

rxwei commented May 30, 2019

I'm getting a bunch of unresolved identifier errors when building a toolchain using this PR and the corresponding apple/swift PR.

/usr/local/src/swift-build/tensorflow-swift-apis/Sources/DeepLearning/Optimizer.swift:175:60: error: use of unresolved identifier 'Tensor'
        for kp in alpha.recursivelyAllWritableKeyPaths(to: Tensor<Float>.self) {
                                                           ^~~~~~

@Shashi456
Contributor

@eaplatanios @rxwei Sorry to suddenly jump into the conversation. The RNN layer tests seem to have some sort of variability in their output, and with #130 we did change those values again. Could we just comment them out for now and look into it later?

@eaplatanios
Contributor Author

@rxwei that’s because you’re trying to build it with the old checkout of swift-apis. You’d need to check out this PR in the tensorflow-swift-apis directory.

@eaplatanios
Contributor Author

@Shashi456 I don’t see why there should be any variability. Thanks for that information. I’ll try to look into this tomorrow.

@rxwei
Contributor

rxwei commented May 30, 2019

@rxwei that’s because you’re trying to build it with the old checkout of swift-apis. You’d need to check out this PR in the tensorflow-swift-apis directory.

Oops, just figured that out. I fetched the PR but didn't check it out.

@rxwei
Contributor

rxwei commented May 30, 2019

All tests are passing. I'm about to merge this PR, update the commit hash in the other PR, merge it, and start a new draft PR to test CI once a new toolchain is built.

@rxwei rxwei merged commit 1d484e1 into tensorflow:master May 30, 2019
@Shashi456
Contributor

@rxwei I'm sorry for asking this again, but what exactly does this PR do, and why was it needed in the long run?

rxwei pushed a commit to swiftlang/swift that referenced this pull request May 30, 2019
Moves the entirety of the `TensorFlow` module to tensorflow/swift-apis.

Friend PR: tensorflow/swift-apis#109.
@eaplatanios
Contributor Author

@Shashi456 This PR mainly moves all of the TensorFlow module in the stdlib from apple/swift (i.e., the compiler repository) to this repository. There are also some other minor changes, support for a couple of new ops and VJPs, and a restructuring of the Operators.swift file into a directory.

This move allows us to make changes to the stdlib and test them without having to recompile the Swift compiler (which is expensive), and it also separates the compiler code from the stdlib code more clearly.
