Download Maven from apache.org archives [skip ci] #10225

gerashegalov · 2024-01-19T09:34:54Z

Fixes #10224

Replace broken install using apt by downloading Maven from apache.org.

Signed-off-by: Gera Shegalov <[email protected]>

gerashegalov · 2024-01-19T09:35:30Z

build

jlowe

Small nit that's not mustfix.

jlowe · 2024-01-19T14:49:25Z

jenkins/databricks/build.sh

+    if [[ ! -d $HOME/apache-maven-3.6.3 ]]; then
+        wget https://archive.apache.org/dist/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz -P /tmp
+        tar xf /tmp/apache-maven-3.6.3-bin.tar.gz -C $HOME
+        sudo ln -s $HOME/apache-maven-3.6.3/bin/mvn /usr/local/bin/mvn


Nit: Should the .tar.gz be deleted here?
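One way to act on this nit is sketched below as a standalone function. It is only an illustration, not the code merged in this PR: `install_maven` and its three parameters (version, unpack prefix, symlink directory) are hypothetical names, and the sketch assumes the symlink directory is writable by the caller, whereas the hunk above uses `sudo` for `/usr/local/bin`. The archive is downloaded into a temporary directory that is removed whether or not the install succeeds.

```shell
# Sketch only: install_maven and its parameters are hypothetical,
# not part of jenkins/databricks/build.sh.
install_maven() {
    local version="$1"    # Maven version, e.g. 3.6.3
    local prefix="$2"     # directory the distribution is unpacked into, e.g. $HOME
    local link_dir="$3"   # directory on PATH that receives the mvn symlink (assumed writable)
    local name="apache-maven-${version}"
    local url="https://archive.apache.org/dist/maven/maven-3/${version}/binaries/${name}-bin.tar.gz"

    # Nothing to do when the distribution is already unpacked.
    if [[ -d "${prefix}/${name}" ]]; then
        return 0
    fi

    local tmp_dir
    tmp_dir="$(mktemp -d)" || return 1

    # Download, unpack, and link; stop at the first failing step.
    wget -q "${url}" -P "${tmp_dir}" \
        && tar xf "${tmp_dir}/${name}-bin.tar.gz" -C "${prefix}" \
        && ln -sfn "${prefix}/${name}/bin/mvn" "${link_dir}/mvn"
    local rc=$?

    # Remove the downloaded archive regardless of the outcome.
    rm -rf "${tmp_dir}"
    return "${rc}"
}
```

A call such as `install_maven 3.6.3 "$HOME" "$HOME/bin"` would mirror the version and unpack location used in the hunk. Using a per-run temporary directory instead of `/tmp` directly also avoids picking up a stale or partial archive left by an earlier run.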

gerashegalov · 2024-01-19T18:19:13Z

Looking into 330db build failure on Databricks Azure 11.3 based on the blossom-ci PR check

gerashegalov · 2024-01-19T19:21:28Z

Rerunning with skip-ci because 330db will fail due to #10228 and non-Databricks CI is not affected by this change

gerashegalov · 2024-01-19T19:22:03Z

build

Fixes NVIDIA#10224

Replace broken install using apt by downloading Maven from apache.org.

Signed-off-by: Gera Shegalov <[email protected]>

* Download Maven from apache.org archives (#10225)

  Fixes #10224

  Replace broken install using apt by downloading Maven from apache.org.

  Signed-off-by: Gera Shegalov <[email protected]>

* Fix a hang for Pandas UDFs on DB 13.3 [databricks] (#9833)

  fix #9493
  fix #9844

  The Python runner uses two separate threads to write data to and read data from the Python processes. On DB 13.3, however, it becomes single-threaded, so reading and writing run on the same thread. As a result, the first read always happens before the first write. The original BatchQueue blocks the first read until the first write is done, so it waits forever.

  Changes made:

  - Update the BatchQueue to support asking for a batch instead of waiting until one is inserted into the queue. This eliminates the ordering requirement between reading and writing.
  - Introduce a new class named BatchProducer that works with the new BatchQueue to support peeking at the row count on demand when reading.
  - Apply the new BatchQueue to the relevant plans.
  - Update the Python runners to support writing one batch at a time for the single-threaded model.
  - Found an issue with PythonUDAF and RunningWindowFunctionExec that may be a bug specific to DB 13.3, and added a test (test_window_aggregate_udf_on_cpu) for it.
  - Other small refactors.

  Signed-off-by: Firestarman <[email protected]>

* Fix a potential data corruption for Pandas UDF (#9942)

  This PR moves the BatchQueue into the DataProducer so that it shares the same lock as the output iterator returned by asIterator, and makes moving a batch from the input iterator to the batch queue an atomic operation, which eliminates the race when appending batches to the queue.

* Do some refactor for the Python UDF code to try to reduce duplicate code. (#9902)

  Signed-off-by: Firestarman <[email protected]>

* Fixed 330db Shims to Adopt the PythonRunner Changes [databricks] (#10232)

  This PR removes the old 330db shims in favor of the new Shims, similar to the one in 341db.

  **Tests:** Ran udf_test.py on Databricks 11.3 and all tests passed.

  fixes #10228

  Signed-off-by: raza jafri <[email protected]>

Signed-off-by: Gera Shegalov <[email protected]>
Signed-off-by: Firestarman <[email protected]>
Signed-off-by: raza jafri <[email protected]>
Co-authored-by: Gera Shegalov <[email protected]>
Co-authored-by: Liangcai Li <[email protected]>

Download Maven from apache.org archives

6c01bef

Signed-off-by: Gera Shegalov <[email protected]>

gerashegalov added the bug (Something isn't working) and build (Related to CI / CD or cleanly building) labels Jan 19, 2024

gerashegalov self-assigned this Jan 19, 2024

gerashegalov requested review from GaryShen2008, NvTimLiu, jlowe, revans2 and tgravescs as code owners January 19, 2024 09:34

gerashegalov changed the title ~~Download Maven from apache.org archives~~ Download Maven from apache.org archives [databricks] Jan 19, 2024

jlowe approved these changes Jan 19, 2024

gerashegalov mentioned this pull request Jan 19, 2024

330db build failure "GpuArrowPythonRunner.scala:80: not found: type WriterThread" #10228

Closed

gerashegalov changed the title ~~Download Maven from apache.org archives [databricks]~~ Download Maven from apache.org archives [skip ci] Jan 19, 2024

gerashegalov merged commit b949674 into NVIDIA:branch-24.02 Jan 19, 2024

gerashegalov deleted the mavenDownload branch January 19, 2024 19:45

gerashegalov mentioned this pull request Jan 19, 2024

Improve the Maven distro download workaround #10231

Closed
