Skip to content

Conversation

@nevi-me
Copy link
Contributor

@nevi-me nevi-me commented Jan 8, 2021

parquet-format 2.7.0 was released yesterday, and our cargo.toml file specifies 2.6.1, which means cargo will update this on new builds or of you run cargo update

alamb@ip-192-168-0-182:~/Software/arrow2/rust$ cargo update
cargo update
    Updating crates.io index
    Updating parquet-format v2.6.1 -> v2.7.0

Sadly this version of new parquet format causes a build error:

157 Compiling arrow-flight v3.0.0-SNAPSHOT (/__w/arrow/arrow/rust/arrow-flight) 
 158error[E0063]: missing fields `file_offset`, `ordinal`, `total_compressed_size` in initializer of `RowGroup` 
 159 --> parquet/src/file/metadata.rs:262:9 
 160 | 
 161262 | RowGroup { 
 162 | ^^^^^^^^ missing `file_offset`, `ordinal`, `total_compressed_size` 
 163
 164error[E0063]: missing field `bloom_filter_offset` in initializer of `ColumnMetaData` 
 165 --> parquet/src/file/metadata.rs:501:31 
 166 | 
 167501 | let column_metadata = ColumnMetaData { 
 168 | ^^^^^^^^^^^^^^ missing `bloom_filter_offset` 
 169
 170error[E0063]: missing fields `crypto_metadata`, `encrypted_column_metadata` in initializer of `ColumnChunk` 
 171 --> parquet/src/file/metadata.rs:517:9 
 172 | 
 173517 | ColumnChunk { 
 174 | ^^^^^^^^^^^ missing `crypto_metadata`, `encrypted_column_metadata` 
 175

This PR pins the version so that it will not pick up 2.7

@nevi-me nevi-me changed the title ARROW-11167: [Rust] [Parquet] Pin specific parquet-format-rs version ARROW-11187: [Rust] [Parquet] Pin specific parquet-format-rs version Jan 8, 2021
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified that this fixes the build for me locally. I think once the CI completes we should merge it in. Thanks @nevi-me !

@nevi-me
Copy link
Contributor Author

nevi-me commented Jan 8, 2021

I verified that this fixes the build for me locally. I think once the CI completes we should merge it in. Thanks @nevi-me !

We had pinned the version at some point, we must have unpinned it while updating dependencies.

@alamb alamb changed the title ARROW-11187: [Rust] [Parquet] Pin specific parquet-format-rs version ARROW-11184: [Rust] [Parquet] Pin specific parquet-format-rs version Jan 8, 2021
@alamb alamb changed the title ARROW-11184: [Rust] [Parquet] Pin specific parquet-format-rs version ARROW-11187: [Rust] [Parquet] Pin specific parquet-format-rs version Jan 8, 2021
@alamb
Copy link
Contributor

alamb commented Jan 8, 2021

We had pinned the version at some point, we must have unpinned it while updating dependencies.

I think given that 2.6.1 --> 2.7.0 was a breaking change, the default versioning scheme wanted to see it use major version numbers (aka 2.7.0 should have been 3.0.0)

@alamb
Copy link
Contributor

alamb commented Jan 8, 2021

Thanks for fixing this @nevi-me !

@alamb alamb changed the title ARROW-11187: [Rust] [Parquet] Pin specific parquet-format-rs version ARROW-11187: [Rust] [Parquet] Pin specific parquet-format-rs version and fix build Jan 8, 2021
@alamb alamb changed the title ARROW-11187: [Rust] [Parquet] Pin specific parquet-format-rs version and fix build ARROW-11187: [Rust] [Parquet] Fix Build error by Pin specific parquet-format-rs version Jan 8, 2021
@mqy
Copy link
Contributor

mqy commented Jan 8, 2021

Thanks @nevi-me I guess the reason that I can't reproduce this failure may related to crate cache somehow?
Sorry, I did not aware the problem! I had read "2.6.1" as fixed version, that's totally wrong. From https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html, it seems 2.6.* should work as well.

@nevi-me
Copy link
Contributor Author

nevi-me commented Jan 8, 2021

@alamb the version of the crate follows the Parquet format versioning. I was thinking about it today, I'll open a PR in the crate, suggesting how the versioning can work going forward.

We released 2.7.0, and we're going to release 2.8.0 soon, to follow the parquet format versions.

We can use x.y._ to denote the parquet-format specification version, and then use _._.y for bugfixes on the crate where we don't change the format version, like we did with 2.6.0 and 2.6.1. CC @sunchao

@github-actions
Copy link

github-actions bot commented Jan 8, 2021

@alamb
Copy link
Contributor

alamb commented Jan 8, 2021

Thanks @nevi-me I guess the reason that I can't reproduce this failure may related to crate cache somehow?

@mqy you can probably reproduce the problem if you run cargo update which will force cargo to pull all the latest dependencies

@mqy
Copy link
Contributor

mqy commented Jan 8, 2021

Thanks @nevi-me I guess the reason that I can't reproduce this failure may related to crate cache somehow?

@mqy you can probably reproduce the problem if you run cargo update which will force cargo to pull all the latest dependencies

@alamb reproduced, thanks! Here is the output from my local cargo update:

    Updating crates.io index
    Updating ahash v0.6.1 -> v0.6.2
    Updating anyhow v1.0.34 -> v1.0.37
    Updating byteorder v1.3.4 -> v1.4.0
    Updating cc v1.0.65 -> v1.0.66
    Updating const_fn v0.4.3 -> v0.4.5
    Updating futures v0.3.8 -> v0.3.9
    Updating futures-channel v0.3.8 -> v0.3.9
    Updating futures-core v0.3.8 -> v0.3.9
    Updating futures-executor v0.3.8 -> v0.3.9
    Updating futures-io v0.3.8 -> v0.3.9
    Updating futures-macro v0.3.8 -> v0.3.9
    Updating futures-sink v0.3.8 -> v0.3.9
    Updating futures-task v0.3.8 -> v0.3.9
    Updating futures-util v0.3.8 -> v0.3.9
    Removing getrandom v0.1.15
    Removing getrandom v0.2.0
      Adding getrandom v0.1.16
      Adding getrandom v0.2.1
    Updating heck v0.3.1 -> v0.3.2
    Updating http v0.2.1 -> v0.2.2
    Updating indexmap v1.6.0 -> v1.6.1
    Updating itoa v0.4.6 -> v0.4.7
    Updating libc v0.2.80 -> v0.2.82
    Updating net2 v0.2.36 -> v0.2.37
    Updating ordered-float v1.1.0 -> v1.1.1
    Updating parquet-format v2.6.1 -> v2.7.0
    Updating paste v1.0.3 -> v1.0.4
    Updating pin-project v1.0.2 -> v1.0.3
    Updating pin-project-internal v1.0.2 -> v1.0.3
    Updating pin-project-lite v0.2.0 -> v0.2.1
    Updating quote v1.0.7 -> v1.0.8
    Updating rand v0.8.0 -> v0.8.1
    Updating rand_core v0.6.0 -> v0.6.1
    Updating regex v1.4.2 -> v1.4.3
    Updating regex-syntax v0.6.21 -> v0.6.22
    Updating serde v1.0.117 -> v1.0.118
    Updating serde_derive v1.0.117 -> v1.0.118
    Updating serde_json v1.0.59 -> v1.0.61
    Updating smallvec v1.5.0 -> v1.6.0
    Updating socket2 v0.3.17 -> v0.3.19
    Updating syn v1.0.53 -> v1.0.58
    Updating thiserror v1.0.22 -> v1.0.23
    Updating thiserror-impl v1.0.22 -> v1.0.23
    Updating thread_local v1.0.1 -> v1.1.0
    Updating time v0.1.44 -> v0.1.43
    Updating tinytemplate v1.1.0 -> v1.2.0
    Updating tokio v0.2.23 -> v0.2.24
    Updating tower-layer v0.3.0 -> v0.3.1
    Updating wasi v0.10.0+wasi-snapshot-preview1 -> v0.10.1+wasi-snapshot-preview1

BTW, should we check and pin all dependencies in this way to avoid suddenly broking?

@codecov-io
Copy link

Codecov Report

Merging #9138 (0969097) into master (98159f1) will decrease coverage by 0.03%.
The diff coverage is 77.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9138      +/-   ##
==========================================
- Coverage   82.60%   82.57%   -0.04%     
==========================================
  Files         204      204              
  Lines       50496    50879     +383     
==========================================
+ Hits        41713    42011     +298     
- Misses       8783     8868      +85     
Impacted Files Coverage Δ
rust/arrow-pyarrow-integration-testing/src/lib.rs 0.00% <ø> (ø)
rust/benchmarks/src/bin/tpch.rs 7.02% <ø> (ø)
...datafusion/src/physical_plan/string_expressions.rs 87.50% <ø> (ø)
rust/datafusion/src/optimizer/utils.rs 58.18% <45.45%> (-0.54%) ⬇️
rust/datafusion/src/sql/utils.rs 53.92% <47.05%> (-0.68%) ⬇️
rust/datafusion/src/physical_plan/functions.rs 78.51% <63.63%> (-1.49%) ⬇️
rust/arrow/src/compute/kernels/sort.rs 93.56% <68.62%> (+0.14%) ⬆️
rust/datafusion/src/logical_plan/expr.rs 76.92% <77.27%> (+0.02%) ⬆️
rust/datafusion/src/physical_plan/expressions.rs 83.77% <78.53%> (-0.71%) ⬇️
rust/arrow/src/ffi.rs 75.95% <80.00%> (+0.28%) ⬆️
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c4ee536...0969097. Read the comment docs.

@nevi-me
Copy link
Contributor Author

nevi-me commented Jan 8, 2021

BTW, should we check and pin all dependencies in this way to avoid suddenly broking?

Historically, we seldom have issues with minor version of crates being updated. I think the effort to pin everything isn't worth it. It's actually by luck that we caught the parquet-format-rs update before a new major Arrow release.

@github-actions
Copy link

github-actions bot commented Jan 8, 2021

@github-actions
Copy link

github-actions bot commented Jan 8, 2021

Copy link
Member

@sunchao sunchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nevi-me for catching this up! and sorry for breaking the CI (again). For some reason I also thought parquet-format = "2.6.1" is exact version match. The syntax is a bit confusing.


[dependencies]
parquet-format = "2.6.1"
parquet-format = "~2.6.1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we add a comment here for future reference? I think this is not the first time we got this error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sunchao @alamb @mqy given the long CI times, is it okay if we piggy-back on #9133 to add the comment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. SGTM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes for sure

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The update note was added to 8cf807f

@jorgecarleitao
Copy link
Member

Having a library version be set after a format version is funny, as it is entirely possible to backward incompatibly change an API without backward incompatibly change the format, and vice-versa.

IMO the contract that different implementations sign about a format is independent of the contract that the consumers of a library sign with the developers of the library.

Given the decision to version that package as such, IMO we should pin the exact version in Cargo.toml, as there is no guarantee that a minor bump does not break backward compatibility (as this incident shows), and ~ in cargo assumes semver.

@alamb
Copy link
Contributor

alamb commented Jan 8, 2021

IMO we should pin the exact version in Cargo.toml

I agree

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants