fix: pin arrow version to 15.0 (#203)
Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
I recall we discussed this in lance-format/lance#5565 (comment). Can we check whether this would still work with Java 21 and Spark 4.0? We are currently only testing Java 17: https://github.com/lance-format/lance-spark/blob/main/.github/workflows/spark.yml#L60
Arrow:
Spark:
I tested Arrow 15 on Spark 3.5 with Java 21 locally and everything worked. This PR doesn't change anything for our Lance Spark 4.0 story (it still uses Arrow 18 on whatever Java is configured); it only downgrades the Arrow version for 3.4 / 3.5. So if those work with newer Java versions, IIUC that covers our concerns, right?
Can we add Spark 4.0 + Java 21 to the CI matrix so we are certain it runs and passes?
Also, could you explain why the unit and Docker tests work fine, but it fails in the EMR environment for Spark 3.5? I think ideally we should keep the higher version. If it's just a problem on a certain platform, we should consider simply providing a guide for how to use a lower Arrow version with that platform.
Great question. I just opened a PR with integration tests on Docker; AFAIK we do not actually have anything in Docker that runs as part of CI (once I clean these up we probably should). Running these tests on EMR is what surfaced the failure.

The issue, as I understand it, is that Spark ships with a version of Arrow that is lower than the 18.x that lance-core now depends on. I attempted to circumvent this using Maven's shade plugin to essentially bundle the Arrow dependency inside the connector jar under a relocated package, but that did not pan out.

So the ways I see to fix this:
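For reference, the shading approach mentioned above would look roughly like the following maven-shade-plugin configuration. This is a sketch, not the actual configuration from the attempt; the `shadedPattern` package name is an assumption.

```xml
<!-- Sketch of the shading approach: relocate Arrow into a private package
     inside the connector jar so it cannot clash with Spark's own Arrow.
     The shadedPattern below is hypothetical. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>org.apache.arrow</pattern>
            <shadedPattern>com.lancedb.shaded.org.apache.arrow</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

In practice, relocating Arrow can be tricky because Arrow Java relies on native code and low-level memory internals, which may be why this route did not work out here.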
I see, thanks for the verification!
If that is the case, should we just exclude Arrow from the lance-core dependencies when it is imported in lance-spark? Would that work, so we don't have to pin the Arrow version and it just uses the one that comes with Spark?
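A minimal sketch of what such an exclusion might look like in the lance-spark pom. The `groupId`/`artifactId` coordinates and the version property are assumptions for illustration, not taken from this thread.

```xml
<!-- Hypothetical sketch: exclude the Arrow jars transitively pulled in by
     lance-core, so the connector falls back to the Arrow that ships with
     Spark. The lance-core coordinates here are assumptions. -->
<dependency>
  <groupId>com.lancedb</groupId>
  <artifactId>lance-core</artifactId>
  <version>${lance.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.arrow</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

The `*` wildcard excludes every artifact under the `org.apache.arrow` group, which Maven has supported since 3.2.1.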
Some references for how Iceberg Spark does it:
... Java sorcery ...
Closing because these changes are included in #205 |
The recent upgrade of the Arrow version (15.0 -> 18.3) in lance-core does not play well with Spark 3.5. This results in failures within the Lance Spark connector for simple operations (e.g. `CREATE TABLE`). In this PR we pin the Arrow version to 15.0 (the version used for roughly the past two years) for all 3.5 and 3.4 profiles. The alternative approaches are to (1) downgrade the lance-core Arrow dependency or (2) maintain separate lance-core branches. This seems like the most reasonable approach.

Closes: #196
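The pin described here might look something like the following in the Spark 3.4 / 3.5 Maven profiles. This is a sketch; the profile id, property name, and exact patch version are assumptions rather than the actual diff.

```xml
<!-- Hypothetical sketch of pinning Arrow per Spark profile.
     The profile id and property name are assumptions. -->
<profile>
  <id>spark-3.5</id>
  <properties>
    <arrow.version>15.0.0</arrow.version>
  </properties>
</profile>
```

Keeping the pin inside the 3.4 / 3.5 profiles leaves the Spark 4.0 profile free to stay on Arrow 18.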