ARROW-1693: [JS] Expand JavaScript implementation, build system, fix integration tests #1294
Conversation
|
Want to go ahead and ignore the vector layout metadata? Per ARROW-1785. I will wait a day or so for further feedback to circulate, then proceed with a removal of this metadata. We'll need to update the Flatbuffers files in JS again as part of this, I will give you write access on my fork so you can push directly to the PR branch as needed |
|
@wesm yep, working on this tonight. do you mind if I add to this PR vs starting a new branch? |
|
no problem, feel free to keep adding here. I will take a little time to review also |
Force-pushed bf48cda to 00e54b5
|
@wesm I believe this branch is ready. We should revalidate datetime and dictionary once we can get Pandas to generate a CSV from the test data. The integration tests are finally passing, so I feel good about this one: The last commit re-enables node_js job in travis so we can verify the above. |
Force-pushed 64e318c to e7ae4c7
|
awesome, thanks @trxcllnt! I'm going to work through the patch to leave any comments that jump out but this is really exciting |
js/gulp/test-task.js
Can this version number here be gotten from the environment / pom file per chance?
I included this in my response below
js/gulp/test-task.js
Simple way to make this file path more easily configurable / less hard-coded?
I included this in my response below
js/npm-release.sh
Aside: what we're going to want to do for ASF release purposes: one script to produce a tarball of the JS project that is sufficient to publish to NPM after. Then a script in the tarball that can publish the project to NPM. So the ASF signed artifact that we upload to SVN will have everything that's needed to publish the project to NPM
@wesm That makes sense. The way I have things set up, we compile and publish multiple modules to npm:
- one large-ish module that you can get via `npm install apache-arrow`
- the rest as smaller/specialized modules under the `@apache-arrow` npm organization, which can be installed via the formula `npm install @apache-arrow/<target-name>`. For example, `npm install @apache-arrow/es5-cjs` installs the slimmed-down ES5/CommonJS target.

The `npm run build` command compiles all the output targets to the (gitignored) targets directory. The `lerna publish --yes --skip-git --cd-version $bump --force-publish=*` command publishes all the targets to npm. So from the sound of it, all we need to do is tar up the targets directory with a shell script that installs and runs `lerna publish`, and we're good to go? If so, I can do that tonight.
Seems reasonable. The other side of this is creating the signed tarball for voting, and making sure the tarball is sufficient for the post-release upload to NPM. We'll need to copy some files from the root directory (like the license and notice files). I can help with this
The build also copies extra files from the js folder into each of the packages, so we just need to change this line to `['../LICENSE.txt', '../NOTICE.txt', 'README.md']`. Do we also need to add the info in `js/LICENSE` to the top-level `notice.txt`?
Yes, what's in js/LICENSE should go in the top level LICENSE.txt at the bottom. Then we can copy that one license into the JS tarball
@wesm done
js/src/reader/arrow.ts
What do you recommend as a resource for getting up to speed on TypeScript? Where is the line between TypeScript and ES6?
@wesm anything type-related (type annotations, interfaces, generics, and type alias declarations like `type Foo = { foo: string }`) is TypeScript; the rest is ES. If you're curious about an individual feature, you can run the build (`npm run build`) and compare the transpiled output (in the targets directory) with the TS source. We transpile to multiple JS versions and module formats, but it's probably easiest to compare against `targets/es2015/esm` or `targets/esnext/esm`.
TS code-gens/polyfills missing features depending on the target environment. For example, ES5 doesn't have generators, so TS compiles generator functions into an iterator state machine, but the ES2015 target leaves them as-is. And ES2015 doesn't have async iterator functions, so TS code-gens the async iterator state machine with Promises, but the ESNext target leaves them as-is.
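To make the split concrete, here's a hypothetical snippet (the `Pair` type and `pairs` function are made up for illustration, not code from this PR): the type alias and annotations are erased by `tsc`, while the generator is either emitted as-is (ES2015+) or compiled down to a state machine (ES5).

```typescript
// Everything type-level below disappears from the compiled output.
type Pair = { lo: number; hi: number };              // TS only: type alias

function* pairs(n: number): IterableIterator<Pair> { // TS only: annotations
  for (let i = 0; i < n; i++) {
    yield { lo: i, hi: i * 2 };                      // ES2015: generator
  }
}

// Consuming the iterator manually works the same on every target;
// only how the generator body is compiled differs.
const his: number[] = [];
const it = pairs(3);
let r = it.next();
while (!r.done) {
  his.push(r.value.hi);
  r = it.next();
}
console.log(his); // → [0, 2, 4]
```

Diffing this against the emitted `targets/es5` and `targets/es2015` output shows exactly which features TS had to polyfill.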
js/test/arrows/json/datetime.json
I'm not sure about checking in all these .json and .arrow files, is there some way we can automate their generation as part of the integration testing? Then they don't have to be modified when we expand the integration test suite
@wesm yes, that would be ideal.
I generated the JSON from the Python integration tests like this:
from integration_test import generate_nested_case
from integration_test import generate_decimal_case
from integration_test import generate_datetime_case
from integration_test import generate_primitive_case
from integration_test import generate_dictionary_case
generate_nested_case().write("../js/test/arrows/json/nested.json")
generate_decimal_case().write("../js/test/arrows/json/decimal.json")
generate_datetime_case().write("../js/test/arrows/json/datetime.json")
generate_dictionary_case().write("../js/test/arrows/json/dictionary.json")
generate_primitive_case([7, 10]).write("../js/test/arrows/json/primitive.json")
generate_primitive_case([0, 0, 0]).write("../js/test/arrows/json/primitive-empty.json")

Then I had to manually edit the "primitive.json" file to remove the "binary_nullable" and "binary_nonnullable" columns, because the C++ command fails if they're present:
$ ../cpp/build/release/json-integration-test \
--integration --mode=JSON_TO_ARROW \
--json=./test/arrows/json/primitive.json \
--arrow=./test/arrows/cpp/file/primitive.arrow
Found schema: bool_nullable: bool
bool_nonnullable: bool not null
int8_nullable: int8
int8_nonnullable: int8 not null
int16_nullable: int16
int16_nonnullable: int16 not null
int32_nullable: int32
int32_nonnullable: int32 not null
int64_nullable: int64
int64_nonnullable: int64 not null
uint8_nullable: uint8
uint8_nonnullable: uint8 not null
uint16_nullable: uint16
uint16_nonnullable: uint16 not null
uint32_nullable: uint32
uint32_nonnullable: uint32 not null
uint64_nullable: uint64
uint64_nonnullable: uint64 not null
float32_nullable: float
float32_nonnullable: float not null
float64_nullable: double
float64_nonnullable: double not null
binary_nullable: binary
binary_nonnullable: binary not null
utf8_nullable: string
utf8_nonnullable: string not null
Error message: Invalid: Encountered non-hex digit

The unit tests rely heavily on snapshot testing to validate the actual values in the vectors. I manually validated the data in the snapshots against the buffers using pyarrow and pandas, but that approach won't scale. Typically the snapshot files get checked into version control, but now that we have 11k snapshots, the snapshot files are around 21mb. I removed them from the repo b/c we don't want huge files. Now the CI server generates the snapshots once up front, then validates the compilation targets against those.
This will catch any cases where compiling the JS to different targets leads to failures (e.g. if the minifiers mangle names they weren't supposed to), but since we're not checking in the snapshot files, the CI server won't be able to tell us if a new PR causes a snapshot test to break. We can know that if we run the tests locally, but we can't rely on running the tests locally for each PR before merging.
I'm a bit torn here. On the one hand, I don't want to check in 21mb worth of tests to source control. On the other hand, I don't want to hand-write the 11k assertions that the snapshot tests represent (and would also presumably be many-MBs worth of tests anyway).
I believe git compresses files across the network? And if space-on-disk is an issue, I could add a post-clone script to automatically compress the snapshot files after checkout (about 3mb gzipped). Jest doesn't work with compressed snapshot files out of the box, but I could add some steps to the test runner to decompress the snapshots before running.
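For readers unfamiliar with the workflow, the snapshot idea can be sketched in a few lines (a toy stand-in, not Jest's actual implementation; the `'row 0'` key and column name are illustrative): serialize a value the first time a test runs, store it, and diff every later run against the stored copy.

```typescript
// Toy snapshot store: in Jest this lives in checked-in .snap files.
const snapshots = new Map<string, string>();

function matchesSnapshot(name: string, value: unknown): boolean {
  const serialized = JSON.stringify(value, null, 2);
  if (!snapshots.has(name)) {
    snapshots.set(name, serialized); // first run: record the snapshot
    return true;
  }
  return snapshots.get(name) === serialized; // later runs: compare
}

const first = matchesSnapshot('row 0', { int32_nullable: 7 });  // records
const same = matchesSnapshot('row 0', { int32_nullable: 7 });   // matches
const drift = matchesSnapshot('row 0', { int32_nullable: 8 });  // regression
console.log(first, same, drift); // → true true false
```

The "validate once" step is a human eyeballing the recorded serializations; after that, any diff fails the test.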
To your point about using the C++/Java writers to convert the JSON to Arrow buffers on the fly, we should 100% do that. This PR is marginally better since we can at least regenerate the arrow files easily enough, but ideally we don't have them at all and we can pipe them to the node process on the fly, or at a minimum, write to files then clean up after. We'll want a mode for local dev that skips this step, as incurring the JVM overhead to convert JSON to Arrow files is painful for debugging.
I left the code in there (commented out) to draw attention to this idea. I was on a plane when I checked this in and didn't have bandwidth to debug the CI scripts, otherwise I would have. We can pass the paths to the C++ and Java executables as CLI and/or env variables no problem.
I wasn't sure if the executables were already available in CI, or whether I would need to build them before running the JS tests. Building the C++ and Java projects in order to run the JS tests doesn't seem ideal, but I'm fine doing it if we have to.
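One possible shape for that on-the-fly conversion, sketched here with an assumed binary path and a temp file standing in until true in-memory piping is possible (nothing below is the actual test harness; `jsonToArrow` is a hypothetical helper):

```typescript
import { spawnSync } from 'child_process';
import { mkdtempSync, readFileSync } from 'fs';
import { tmpdir } from 'os';
import { join } from 'path';

// Run the C++ json-integration-test binary to turn a JSON test file into
// an Arrow file, then read the bytes back for the JS reader. The binary
// path would be supplied by the caller (e.g. a CLI flag or env variable).
function jsonToArrow(jsonToArrowBin: string, jsonPath: string): Buffer {
  const arrowPath = join(mkdtempSync(join(tmpdir(), 'arrow-test-')), 'out.arrow');
  const result = spawnSync(jsonToArrowBin, [
    '--integration',
    '--mode=JSON_TO_ARROW',
    `--json=${jsonPath}`,
    `--arrow=${arrowPath}`,
  ]);
  if (result.status !== 0) {
    throw new Error(`json-to-arrow failed: ${result.stderr}`);
  }
  return readFileSync(arrowPath); // caller cleans up the temp dir afterwards
}
```

A local-dev mode could cache the generated files so debugging doesn't pay the C++/JVM startup cost on every run.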
js/test/integration-tests.ts
I'm a bit confused by these integration tests, maybe I'm missing something. Shouldn't JavaScript be reading the JSON and comparing what it thinks are the contents of the JSON with the binary coming from Java and C++? Here it looks like it is reading the C++ and Java binary files and comparing them with each other rather than its own version of the truth.
@wesm I think for this PR the goal was more to just replace the binary files in the repo with generated files, and adding tests to make sure our reader works the same for both C++ and Java generated files is just an added bonus. It might be a bit of a misnomer to call it an integration test.
I think the next step will be to add a JSON reader/writer and an Arrow IPC writer so we can integrate with the actual integration tests.
Got it. I think here we should comment exactly what's being tested (since these tests will pass if (and only if?) the Java <-> C++ integration tests are passing). If we can run these particular JS tests only in the Travis CI entry where we are already running the integration_test.py then we don't need to be checking in binary/JSON files to git
@wesm yes they should. @TheNeuralBit is right, the goal here was to get the JS version reading C++ and Java arrows identically, and this test is how I validate that it does. Right now it doesn't validate that it reads correctly (that's what the other tests do), just that there's no difference between the two. I think the plan is to update this test when we do have the JSON reader/writer in JS
TheNeuralBit left a comment
It's awesome to see support for so many more types, and consistency when reading both C++ and Java-generated files 🎉
js/src/vector/arrow.ts
Why move the calls to nullableMixin and fieldMixin from here and out to each individual call? Are there some subtle differences in some vectors that I'm missing?
@TheNeuralBit yeah so the ES6 class spec states that the `name` property of a class constructor is immutable (and they're also not allowed to be computed properties, like `let x = 'myClass'; class [x] extends Foo {}`). And anonymous class names default to `"Object"`, reading as `Object { data: Int32Array }` instead of `Int32Vector` when debugging. While this is ugly and hard to scale if we want to add more mixin behaviors, I figure it's a win for anyone using the library in the real world to see descriptive class names.
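A small sketch of the naming behavior (the mixin and class names here are hypothetical, not the actual arrow.ts code): a class expression returned from a mixin factory has no useful `name`, while a named class declaration extending the factory's result keeps its own.

```typescript
type Constructor<T = object> = new (...args: any[]) => T;

// A mixin factory returns an anonymous class expression.
const nullableMixin = <T extends Constructor>(Base: T) =>
  class extends Base {
    // nullability behavior would live here
  };

class Vector {}

// Applying the mixin inside a named class declaration preserves the name:
class Int32Vector extends nullableMixin(Vector) {}

// Capturing the factory's return value directly does not:
const AnonVector = nullableMixin(Vector);

console.log(Int32Vector.name);                  // "Int32Vector"
console.log(AnonVector.name === 'Int32Vector'); // false: anonymous/generated
```

This is why the mixin calls are repeated at each declaration site instead of being factored into one shared helper.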
Ah makes sense, I figured there must be a good reason. This seems like a great application for some kind of preprocessor with macros or a code generator... but I don't know of any that would integrate well with JS build/dev tools
@TheNeuralBit yeah we could use babel-codegen, babel-preval, or babel-macros if we want. I was hoping to avoid babel if possible, but since we're webpacking the es2015+ UMD bundles anyway, it wouldn't be too much of a headache.
@TheNeuralBit but if we do want to do more compilation steps beyond what the TS compiler does, it'd be neat to also run prepack on the flatbuffers generated code
js/gulpfile.js
Should createTestData and cleanTestData be uncommented so we can remove the arrow files from the repo? I'm thinking these are probably commented now so that other contributors will be able to run the tests without building the Java and C++ impls - if that's the case, maybe we should separate out integration tests, which require the other libraries, and unit tests, which can be run stand-alone?
@TheNeuralBit yes, definitely. I put these in to remind us to generate test data on the fly (and how to do it) for the tests, and remove the arrow files from the test directory. Making sure the CI environment had the C++ and Java libs built before the JS tests run was a bit more than I could bite off yesterday on my flight home :)
I guess I'm not quite understanding what snapshot tests accomplish here that normal array comparisons would not. In Java and C++ we have functions that compare the contents of arrays. So when you say hand-writing the snapshot test assertions, what's being tested, and why is that the only way to test that behavior? Is there a concern that a programmatic comparison might not be as strong of an assertion as a UI-based test (what the values from the arrays would actually appear as in the DOM)? Having the possibility of a single PR bloating the git history by whatever the snap files gzip down to doesn't seem like a good idea. Even having large diffs as the result of automatically generated files on commit isn't ideal
|
@wesm the Jest docs on snapshot testing highlight its utility testing React components, but it's really just a form of test code generation. The tests run against every combination of compilation target and test data. Snapshots capture a bit of runtime type info that would otherwise have to be asserted explicitly -- for example, that calling `uint64Vector.get(i)` returns a `Uint32Array` of two elements. They're also helpful for catching regressions (or comparing against pandas) in `Table.toString()`.
It also gives reviewers a chance to see what the tests produce.
|
It sounds like this is useful for catching code regressions from PR to PR, but it clashes with my sensibilities about testing data libraries, which is that we should have explicit unit tests asserting the correct behavior. In the example you gave ("Snapshots capture a bit of runtime type info that would otherwise have to be asserted explicitly..."), it seems like we could just write that assertion by hand. I'm no expert, so there may be things I'm missing -- are some of the test assertions dependent on the flavor of the deployment target?
|
Snapshots are just a different, data-centric way of writing assertions. Give it a lot of test data, validate the test results once, then compare diffs after that. If you can eyeball test results and know whether it works, then the computer codegens all the dreadful bits about comparing types, values, etc. (even when it's late and you might otherwise forget to test an edge case).
No, the assertions should be identical regardless of the compilation target -- they're generated once at the beginning, then all the targets are compared against the same snapshots. I may have mentioned this before, but they've also helped catch minification bugs. Like before, when we did return Long instances, Closure Compiler minified the class name down to something like "zw", so the snapshot tests failed for just the ES5/UMD target. But on the whole I can't argue with your position. All I can say is I'm probably pretty lazy by normal standards, so I try to make my computer do as much of my homework as possible.
|
Maybe another way to phrase it is, disk and network are cheap, my/our time is not. ;-) edit: shit, I misread the snapshot count; we have 113,940 snapshots, not 11,394 |
This "validate the test results once" part is where I'm getting lost. How do you know whether anything is correct if you don't write down what you expect to be true? I can help with rallying the troops to write more tests. I am a bit bandwidth-constrained at the moment with all the 0.8.0 stuff in progress, but I am hopeful that some others can get involved, and this will also help with beating on the API and finding rough edges. cc @leifwalsh @scottdraves
Ah right, in this case the JSON files are the initial source of truth. I compared the snapshots against the Arrow files read via pandas/pyarrow, and it looked correct. After this (assuming stable test data), the snapshots are the source of truth. If we decide to change the test data, then we have to re-validate that the snapshots are what we expect them to be. But I want to stress, I'm not against doing it differently. I'm also bandwidth-constrained, and snapshots get high coverage with minimal effort. It sounds like the JSON reader should provide all the same benefits as snapshot testing. From that perspective, I see snapshots as a stop-gap until the JS JSON reader is done (unless there's a way we can validate columns with the C++ or Java JSON readers from the JS tests?) With that in mind, I agree it's best not to commit the snapshots to the git history if we're just going to remove them once the JSON reader is ready. In the interim, I don't mind validating any new JS PRs against my local snapshots, as the volume of JS PRs isn't that high yet.
|
Yes, definitely, we are in agreement. We should push for a JSON reader ASAP -- following the C++ reader as a guide, I do not think it is that big of a project, to be honest, when you consider all the hardship of dealing with parsing JSON in C++, which is a complete non-issue in JavaScript. |
|
@wesm that sounds good to me. In the meantime can we get this PR merged + finish ASF release scripts, and push a new version to npm? I'm at the point where not having the latest on npm is going to be a problem for projects at work soon, @TheNeuralBit may be feeling this too. |
|
That's good for me. As soon as the release scripts are good to go I can conduct the release vote on the mailing list. We can close the vote in less than the usual 72 hours so long as we get 3 PMC votes. So we'll need a quick "here's how to verify the release candidate" blurb to direct people to when we start the release vote
|
Does anything still need to be done on this branch? If we are still not close to being able to cut a JS release by early next week I will rearrange my priorities to help out |
|
@wesm nope, only thing left to do is the ASF release scripts I think |
|
@wesm I understand you may be busy, so do you mind if I go ahead and merge this? |
|
Sorry I have been dragging my feet because I’m not really on board with checking in data files that can be generated as part of CI. Per Slack conversation it seems there are some roadblocks so I’m available as needed today and tomorrow to get this sorted out |
|
Sorry that I missed the thing. I will figure out what's going on here:
|
|
This doesn't fail for me locally: |
|
I found the problem -- one of the primitive integration test files was being clobbered and not run, which was suppressing a failure that should have been raised a long time ago. In the meantime, there was also a regression from the Java refactor, and we are no longer able to fully read unsigned integer types. I will hack the integration tests for now and open a JIRA about fixing it; here's an example of trying to read a
…gned max for bit width
|
OK, integration tests should pass now, fingers crossed. I will merge once the build is green, thanks @trxcllnt and @TheNeuralBit for the patience, it is appreciated |
Force-pushed 9393b42 to c86dc02
|
@wesm awesome, thanks! After my latest commit, the integration tests all pass now for me locally. Are you fine with this PR as-is, or should I close it and do one from a new branch w/o the test data commits? |
|
The PR tool should squash out those commits, so I don't think it's a problem. I'll let you know if I run into any issues after the build runs |
Force-pushed c86dc02 to c72134a
|
There was a non-deterministic Plasma failure in the C++/Python entry, but the integration test entry also failed: |
|
@wesm yeah, looks like Closure Compiler threw an exception building the ES5 UMD target: https://travis-ci.org/apache/arrow/jobs/304499884#L4649. I'm not certain, but it could be related to the integration tests running with JDK7 instead of 8. I'll switch the job to use the JS version of the Closure Compiler which, while slower, won't be affected by Java externalities. |
|
@wesm actually this might be tough -- the JS version of closure-compiler is a bit outdated and broken, and the Java version hasn't supported Java 7 since October. I don't want to skip running the tests on the ES5 UMD bundle, as that's the lowest-common-denominator for anyone wanting to experiment with Arrow in the browser, and the integration tests validate that public methods and properties don't get minified away. Is it possible to update the integration job to openjdk8 (like the java version)? If not, I can create a sibling |
|
Totally fine to move the integration tests to jdk8 |
…losure-compiler-scripts
Force-pushed 0a84b23 to 5660eb3
|
@wesm alright! now that the integration tests are passing, I added backwards-compatibility for Arrow files written before 0.8, re-enabled the datetime tests, and removed the generated arrow files from the performance tests. Should be good to go pending this last CI build. |
|
Sweet thanks! So after we get the release scripts set up, we can release this, but one problem to be aware of is that the library in its current state cannot read 0.7.1 binary data and possibly not 0.8.0 (in its final form) binary data. Hopefully we can get ARROW-1785 sorted out soon |
|
Ah cool I missed that. I think we are good then, so I suggest we cut a JS release ASAP to make sure we've got the process down and then we can release again after 0.8.0 final goes out. I'm available this week to help out with this |
This PR adds a workaround for reading the metadata layout for C++ dictionary-encoded vectors.
I added tests that validate against the C++/Java integration suite. In order to make the new tests pass, I had to update the generated flatbuffers format and add a few types the JS version didn't have yet (Bool, Date32, and Timestamp). It also uses the new `isDelta` flag on DictionaryBatches to determine whether the DictionaryBatch vector should replace or append to the existing dictionary.

I also added a script for generating test arrow files from the C++ and Java implementations, so we don't break the tests updating the format in the future. I saved the generated Arrow files in with the tests because I didn't see a way to pipe the JSON test data through the C++/Java json-to-arrow commands without writing to a file. If I missed something and we can do it all in-memory, I'd be happy to make that change!
This PR is marked WIP because I added an integration test that validates the JS reader reads C++ and Java files the same way, but unfortunately it doesn't. Debugging, I noticed a number of other differences in the buffer layout metadata between the C++ and Java versions. If we go ahead with @jacques-n's comment in ARROW-1693 and remove/ignore the metadata, this test should pass too.
cc @TheNeuralBit