Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to arrow 20.0.0 (but no change to object_store), including prost, and tonic #3083

Merged
merged 7 commits into from
Aug 9, 2022

Conversation

avantgardnerio
Copy link
Contributor

@avantgardnerio avantgardnerio commented Aug 8, 2022

Closes #3006.

Closes #3085
Closes #3086
Closes #3087

This is an alternative to #3007 but it retains object_store = "0.3.0".

Rationale for this change

We'll have to do it sooner or later, but mostly I want the new authentication capabilities in the FlightSqlServer code.

What changes are included in this PR?

An attempt at upgrading to the latest arrow.

Are there any user-facing changes?

Users should be able to use arrow 20

Brent Gardner added 4 commits August 8, 2022 09:50
fix decimal (#4)

Fix human error

Patch crates io to fix build (#5)

* fix decimal

* patch crate versions

Patch objectstore

Test in CI

Undo override?

Fix more errors

Fix last error?

Formatting

Clippy

Fixes

Fix refs

Able to get session context, but JDBC driver hung

Upgrade to arrow 20

Upgrade to RC2

Formatting

Fix some imports

Install protoc

Try platform agnostic path

Debug in CI :(

Debug in CI :(

Debug in CI :(

Not worth it, just separate builds

Variables

Fixes

Fix windows?

Fix windows?

Hackily fix windows

Down to 1 failure

Fix protoc

All? tests pass

Formatting
@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions sql SQL Planner labels Aug 8, 2022
@avantgardnerio
Copy link
Contributor Author

@tustvold here's the other version with object_store = "0.3.0"

@tustvold
Copy link
Contributor

tustvold commented Aug 8, 2022

👀 This makes no sense, none of this should have changed 😢

@tustvold
Copy link
Contributor

tustvold commented Aug 8, 2022

Oh, you've made a load of path changes - can you revert these 🙏

let format = JsonFormat::default().with_schema_infer_max_rec(Some(3));

let store_root = std::path::Path::new(env!("CARGO_MANIFEST_DIR"));
let filename = store_root.join("tests/jsons/schema_infer_limit.json");
let filename = filename.to_str().expect("Unable to get path!");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverting these should make things work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thank you! I missed these when reverting things. I'm pushing a new commit to this PR that I think will be releasable.

Sorry I missed these, I appreciate your help 🙌

Copy link
Contributor

@tustvold tustvold Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries at all, for reference the reason this breaks is that paths on Windows not only don't start with / but when canonicalized get converted to extended length path syntax. This is not a valid object store path, and so it is complaining about this.

Paths on Windows is a complete nonsensical mess, as a result LocalFileSystem actually just implements the file URI standard which basically behaves like paths on Linux (or paths on anything that isn't Windows 😆 ). As such if you feed it something that isn't a file URI path, it will complain

@avantgardnerio
Copy link
Contributor Author

@alamb and @andygrove likely this will be a better release candidate than #3007 for the reasons @tustvold explained above.

@codecov-commenter
Copy link

Codecov Report

Merging #3083 (be64e65) into master (85d5363) will increase coverage by 0.00%.
The diff coverage is 96.01%.

@@           Coverage Diff           @@
##           master    #3083   +/-   ##
=======================================
  Coverage   85.93%   85.94%           
=======================================
  Files         289      289           
  Lines       52101    52100    -1     
=======================================
+ Hits        44774    44775    +1     
+ Misses       7327     7325    -2     
Impacted Files Coverage Δ
...usion/core/src/avro_to_arrow/arrow_array_reader.rs 0.00% <0.00%> (ø)
datafusion/core/src/catalog/information_schema.rs 94.42% <0.00%> (ø)
datafusion/core/src/physical_plan/hash_utils.rs 40.43% <0.00%> (ø)
datafusion/core/src/physical_plan/repartition.rs 93.26% <ø> (ø)
...tafusion/core/src/physical_plan/sort_merge_join.rs 90.36% <0.00%> (ø)
datafusion/core/tests/sql/aggregates.rs 99.36% <ø> (ø)
datafusion/core/tests/sql/decimal.rs 100.00% <ø> (ø)
datafusion/core/tests/sql/joins.rs 99.33% <ø> (ø)
datafusion/expr/src/type_coercion.rs 98.87% <ø> (ø)
...on/physical-expr/src/expressions/binary/adapter.rs 83.33% <ø> (ø)
... and 34 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@avantgardnerio
Copy link
Contributor Author

@alamb & @andygrove the only remaining failure looks like an intermittent network issue. I think we're GTG here 🎉

@alamb
Copy link
Contributor

alamb commented Aug 8, 2022

Great -- I think we are on track for an arrow-rs release in a few hours (I need to wait for the official 72 hour voting period to end) -- 🔜

@alamb
Copy link
Contributor

alamb commented Aug 8, 2022

Thanks so much for this @avantgardnerio

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@avantgardnerio do you have time today to update this to use the released arrow 20.0.0? If not I can make a PR (based on this one) to get in.

@avantgardnerio
Copy link
Contributor Author

@avantgardnerio do you have time today to update this to use the released arrow 20.0.0? If not I can make a PR (based on this one) to get in.

I'll do that now.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like epic work @avantgardnerio -- thanks again

@@ -35,6 +35,15 @@ list to help you get started.

This section describes how you can get started at developing DataFusion.

### Windows setup

```shell
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still correct? Does it need any additional explination?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just put that in there based upon the setup of a VM I did yesterday. I do tend to like scripts-as-doc as it eliminates the ambiguity of English. Ideally CI would start with a vanilla Windows machine and run these scripts, but I guess github did that part for us so we just have to doc it some where.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is valuable -- thank you

datafusion-jit = { path = "../jit", version = "10.0.0", optional = true }
datafusion-optimizer = { path = "../optimizer", version = "10.0.0" }
datafusion-physical-expr = { path = "../physical-expr", version = "10.0.0" }
datafusion-row = { path = "../row", version = "10.0.0" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember if the versions are needed to cargo publish these crates to crates.io -- perhaps @andygrove remembers / knows?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- 🤞 for a clean CI run

@avantgardnerio
Copy link
Contributor Author

Looks good to me -- crossed_fingers for a clean CI run

I ran all the checks locally and they passed ... so 🤞

@alamb alamb changed the title Upgrade to arrow 20.0.0 (but no change to object_store) Upgrade to arrow 20.0.0 (but no change to object_store), including prost, and tonic Aug 9, 2022
@alamb
Copy link
Contributor

alamb commented Aug 9, 2022

🚀

@alamb alamb merged commit 3eb55e9 into apache:master Aug 9, 2022
@alamb
Copy link
Contributor

alamb commented Aug 9, 2022

Thanks again everyone for all the work to get this through

@avantgardnerio avantgardnerio deleted the bg_old_objectstore branch August 9, 2022 16:49
@ursabot
Copy link

ursabot commented Aug 9, 2022

Benchmark runs are scheduled for baseline = 1e44417 and contender = 3eb55e9. 3eb55e9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The upcoming release of Arrow (20?) breaks datafusion
5 participants