tensorflow-haskell: Unbreak / update to Tensorflow 2.4 #119411

Closed
mikesperber wants to merge 1 commit into NixOS:haskell-updates from mikesperber:haskell-tensorflow-2.4

@mikesperber
Contributor

Motivation for this change

This just updates the Haskell TensorFlow bindings to TensorFlow 2.4, which has been the default version for a few months.

This supersedes #111399, which has stalled. As per the discussion there, I'm submitting a fresh pull request.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@github-actions github-actions bot added the 6.topic: haskell General-purpose, statically typed, purely functional programming language label Apr 14, 2021
@ofborg ofborg bot added 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-darwin: 1 This PR causes 1 package to rebuild on Darwin. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. 10.rebuild-linux: 1 This PR causes 1 package to rebuild on Linux. labels Apr 14, 2021
Member

@cdepillabout cdepillabout left a comment

It looks like tensorflow-core-ops, tensorflow-logging, tensorflow, tensorflow-opgen, and tensorflow-ops are all marked as broken in pkgs/development/haskell-modules/configuration-hackage2nix.yaml.

Could you remove them from the broken packages list (assuming that they all compile)?

@mikesperber
Contributor Author

@cdepillabout Ah, sorry - forgot that bit again. Done.

@cdepillabout
Member

cdepillabout commented Apr 15, 2021

Thanks for fixing this.

When compiling tensorflow-ops, I'm seeing a problem with the tests:

$ nix-build -A haskellPackages.tensorflow-ops
these derivations will be built:
  /nix/store/3aws1msvblwnpwig3whn8r5dh4gfias6-tensorflow-ops-0.2.0.1.drv
building '/nix/store/3aws1msvblwnpwig3whn8r5dh4gfias6-tensorflow-ops-0.2.0.1.drv'...
setupCompilerEnvironmentPhase
Build with /nix/store/2ip3lwzqswai3zz33407h4b2q3har28p-ghc-8.10.4.
unpacking sources
unpacking source archive /nix/store/7yn2qvqpvbm460fkdhzvwzja985pfgmb-source
source root is source/tensorflow-ops
patching sources
compileBuildDriverPhase
setupCompileFlags: -package-db=/build/setup-package.conf.d -j4 +RTS -A64M -RTS -threaded -rtsopts
[1 of 1] Compiling Main             ( Setup.hs, /build/Main.o )
Linking Setup ...
configuring
configureFlags: --verbose --prefix=/nix/store/vx502lr8cqhamy7yp4vwp338lnki7qfg-tensorflow-ops-0.2.0.1 --libdir=$prefix/lib/$compiler --libsubdir=$abi/$libname --docdir=/nix/store/0603zibs9q8vc88d964ngpm8yw20c4gx-tensorflow-ops-0.2.0.1-doc/share/doc/tensorflow-ops-0.2.0.1 --with-gcc=gcc --package-db=/build/package.conf.d --ghc-options=-j4 +RTS -A64M -RTS --disable-split-objs --enable-library-profiling --profiling-detail=exported-functions --disable-profiling --enable-shared --disable-coverage --enable-static --disable-executable-dynamic --enable-tests --disable-benchmarks --enable-library-vanilla --disable-library-for-ghci --ghc-option=-split-sections --extra-lib-dirs=/nix/store/hdpihl2yn8cpdqmc9sysbh3fvwsxchky-ncurses-6.2/lib --extra-lib-dirs=/nix/store/vzqia3jcpy0xdqh4nzmw5qmdv6hx27dp-libffi-3.3/lib --extra-lib-dirs=/nix/store/ak9n4w3nsnvn5gxqyi3dhc342yk9ia06-gmp-6.2.1/lib
Using Parsec parser
Configuring tensorflow-ops-0.3.0.0...

...

[1 of 1] Compiling Main             ( tests/QueueTest.hs, dist/build/QueueTest/QueueTest-tmp/Main.o )
Linking dist/build/QueueTest/QueueTest ...
Preprocessing test suite 'BuildTest' for tensorflow-ops-0.3.0.0..
Building test suite 'BuildTest' for tensorflow-ops-0.3.0.0..
[1 of 1] Compiling Main             ( tests/BuildTest.hs, dist/build/BuildTest/BuildTest-tmp/Main.o )
Linking dist/build/BuildTest/BuildTest ...
running tests
Running 14 test suites...
Test suite OpsTest: RUNNING...
Test suite OpsTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-OpsTest.log
Test suite GradientTest: RUNNING...
Test suite GradientTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-GradientTest.log
Test suite MatrixTest: RUNNING...
Test suite MatrixTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-MatrixTest.log
Test suite VariableTest: RUNNING...
Test suite VariableTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-VariableTest.log
Test suite ArrayOpsTest: RUNNING...
Test suite ArrayOpsTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-ArrayOpsTest.log
Test suite NNTest: RUNNING...
Test suite NNTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-NNTest.log
Test suite RegressionTest: RUNNING...
Test suite RegressionTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-RegressionTest.log
Test suite TypesTest: RUNNING...
Test suite TypesTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-TypesTest.log
Test suite MiscTest: RUNNING...
Test suite MiscTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-MiscTest.log
Test suite EmbeddingOpsTest: RUNNING...
Test suite EmbeddingOpsTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-EmbeddingOpsTest.log
Test suite DataFlowOpsTest: RUNNING...
Test suite DataFlowOpsTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-DataFlowOpsTest.log
Test suite TracingTest: RUNNING...
Test suite TracingTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-TracingTest.log
Test suite QueueTest: RUNNING...
2021-04-15 01:46:48.721476: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-04-15 01:46:48.738886: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 1991865000 Hz
testBasic: [Failed]
ERROR: Malformed TF_STRING tensor; Result: incomplete input
CallStack (from HasCallStack):
  error, called at src/TensorFlow/Types.hs:326:25 in tensorflow-0.3.0.0-6G0yW4KI5NJCJocu4cHCKp:TensorFlow.Types
testPump: [Failed]
ERROR: Malformed TF_STRING tensor; Result: incomplete input
CallStack (from HasCallStack):
  error, called at src/TensorFlow/Types.hs:326:25 in tensorflow-0.3.0.0-6G0yW4KI5NJCJocu4cHCKp:TensorFlow.Types
TensorFlowException TF_CANCELLED "Run call was cancelled"
testAsync: [Failed]
ERROR: Malformed TF_STRING tensor; Result: incomplete input
CallStack (from HasCallStack):
  error, called at src/TensorFlow/Types.hs:326:25 in tensorflow-0.3.0.0-6G0yW4KI5NJCJocu4cHCKp:TensorFlow.Types

         Test Cases  Total      
 Passed  0           0          
 Failed  3           3          
 Total   3           3          
Test suite QueueTest: FAIL
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-QueueTest.log
Test suite BuildTest: RUNNING...
Test suite BuildTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-BuildTest.log
13 of 14 test suites (13 of 14 test cases) passed.
builder for '/nix/store/3aws1msvblwnpwig3whn8r5dh4gfias6-tensorflow-ops-0.2.0.1.drv' failed with exit code 1
error: build of '/nix/store/3aws1msvblwnpwig3whn8r5dh4gfias6-tensorflow-ops-0.2.0.1.drv' failed

Are you not seeing this error?

@mikesperber
Contributor Author

@cdepillabout Yes, I see it. Sorry about that: I now realize I ran the tests in a slightly different environment. I'll investigate, but it might take me a few days.

Thanks for the feedback!

@cdepillabout
Member

@mikesperber No problem, thanks for looking into this :-)

@mikesperber
Contributor Author

Just logging some debugging work:

  • The problem is in decoding bytestrings; the general queue functionality seems to work.
  • Building from source in Stack, using nightly-2021-04-06 (the same version the Nix branch is using), does not exhibit this problem.

@maralorn maralorn closed this May 7, 2021
@maralorn maralorn deleted the branch NixOS:haskell-updates May 7, 2021 21:55
@maralorn maralorn reopened this May 7, 2021
@mikesperber mikesperber requested a review from maralorn as a code owner May 10, 2021 11:14
@sternenseemann sternenseemann deleted the branch NixOS:haskell-updates May 19, 2021 01:53
@ofborg ofborg bot added the 2.status: merge conflict This PR has merge conflicts with the target branch label May 19, 2021
@mikesperber
Contributor Author

I'm still on this one, it's just slow-going.

@sternenseemann
Member

sternenseemann commented May 19, 2021 via email

@mikesperber mikesperber force-pushed the haskell-tensorflow-2.4 branch from ce2f81c to 1de01b4 Compare June 18, 2021 07:55
@mikesperber mikesperber requested a review from expipiplus1 as a code owner June 18, 2021 07:55
@mikesperber
Contributor Author

Just rebased the patch and did a bit more debugging.

The offsets for decoding strings seem to be out of whack. In QueueTest, the correct FFI.TensorData records for the strings look like this:

tensorData: TensorData {tensorDataDimensions = [], tensorDataType = DT_STRING, tensorDataBytes = [0,0,0,0,0,0,0,0,2,72,105]}
tensorData: TensorData {tensorDataDimensions = [], tensorDataType = DT_STRING, tensorDataBytes = [0,0,0,0,0,0,0,0,3,66,97,114]}
tensorData: TensorData {tensorDataDimensions = [], tensorDataType = DT_STRING, tensorDataBytes = [0,0,0,0,0,0,0,0,3,66,97,122]}
tensorData: TensorData {tensorDataDimensions = [], tensorDataType = DT_STRING, tensorDataBytes = [0,0,0,0,0,0,0,0,3,66,97,122]}
tensorData: TensorData {tensorDataDimensions = [], tensorDataType = DT_STRING, tensorDataBytes = [0,0,0,0,0,0,0,0,5,65,115,121,110,99]}
tensorData: TensorData {tensorDataDimensions = [], tensorDataType = DT_STRING, tensorDataBytes = [0,0,0,0,0,0,0,0,5,65,115,121,110,99]}

The broken ones I'm seeing look like this:

tensorData: TensorData {tensorDataDimensions = [], tensorDataType = DT_STRING, tensorDataBytes = [8,72,105,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]}
dataBytes: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
tensorData: TensorData {tensorDataDimensions = [], tensorDataType = DT_STRING, tensorDataBytes = [12,66,97,122,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]}
dataBytes: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
tensorData: TensorData {tensorDataDimensions = [], tensorDataType = DT_STRING, tensorDataBytes = [20,65,115,121,110,99,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]}
dataBytes: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Feels like there's an offset of 8 going on.
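
A rough sketch of one way to read the two dumps above, not a confirmed diagnosis. It assumes the "correct" records are an 8-byte little-endian offset followed by a varint length and the payload, and it guesses that the "broken" records instead put the length, shifted left by two bits with a 2-bit tag, in the first byte, followed directly by the payload. The names `decodeLegacyScalar` and `decodeInlineCell` are hypothetical and are not part of the tensorflow-haskell API.

```haskell
module Main (main) where

import Control.Monad (guard)
import Data.Bits (shiftL, shiftR, testBit, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Decode an unsigned varint (7 payload bits per byte, high bit = "more").
-- Returns the value and the remaining bytes.
decodeVarint :: [Word8] -> Maybe (Int, [Word8])
decodeVarint = go 0 0
  where
    go _ _ [] = Nothing
    go shift acc (b : rest)
      | testBit b 7 = go (shift + 7) acc' rest
      | otherwise   = Just (acc', rest)
      where
        acc' = acc .|. (fromIntegral (b .&. 0x7f) `shiftL` shift)

bytesToString :: [Word8] -> String
bytesToString = map (chr . fromIntegral)

-- | ASSUMED layout of the "correct" dumps for a scalar string:
-- 8-byte little-endian offset, then varint length, then the payload.
decodeLegacyScalar :: [Word8] -> Maybe String
decodeLegacyScalar bytes = do
  let (offsetBytes, payload) = splitAt 8 bytes
  guard (length offsetBytes == 8)
  let offset = foldr (\b acc -> acc * 256 + fromIntegral b) 0 offsetBytes :: Int
  (len, rest) <- decodeVarint (drop offset payload)
  guard (length rest >= len)
  Just (bytesToString (take len rest))

-- | GUESSED layout of the "broken" dumps: first byte = (length << 2) | tag,
-- with tag 0 meaning the payload follows inline.
decodeInlineCell :: [Word8] -> Maybe String
decodeInlineCell [] = Nothing
decodeInlineCell (hdr : rest) = do
  guard (hdr .&. 0x03 == 0)
  let len = fromIntegral (hdr `shiftR` 2)
  guard (length rest >= len)
  Just (bytesToString (take len rest))

main :: IO ()
main = do
  let good, broken :: [Word8]
      good   = [0,0,0,0,0,0,0,0,2,72,105]
      broken = [8,72,105] ++ replicate 21 0
  print (decodeLegacyScalar good)    -- Just "Hi"
  print (decodeLegacyScalar broken)  -- Nothing
  print (decodeInlineCell broken)    -- Just "Hi"
```

Under that guess, the leading bytes 8, 12 and 20 are simply 2, 3 and 5 shifted left by two bits, which matches the lengths of "Hi", "Baz" and "Async". Skipping 8 bytes of a 24-byte record also leaves 16 zero bytes, the same count as in the `dataBytes` lines.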

@ofborg ofborg bot removed the 2.status: merge conflict This PR has merge conflicts with the target branch label Jun 18, 2021
@stale

stale bot commented Jan 3, 2022

I marked this as stale due to inactivity.

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jan 3, 2022
@sheepforce sheepforce mentioned this pull request Jul 28, 2022
@mikesperber
Contributor Author

Superseded by #217812.
