Feature: As a data engineer, I want Spark backed by the latest version of Hive so I have the latest fixes and features #116

Open
5 tasks
ewilkins-csi opened this issue May 30, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

Contributor

ewilkins-csi commented May 30, 2024

Description

Upgrade to a newer version of Hive. We are currently on Hive 3.0, and the latest version is 4.0.0, which is a good starting target. If 4.0.0 introduces too many breaking changes, or we decide to give 4.0 time to mature, 3.1.3 is the newest 3.x version. Apache now publishes public DockerHub images that package Hive and would be a great base to start from; however, they all still run Java 8, which is EOL. We could consider aligning our image with the conventions of Apache's public images so that we can switch more easily once they update the Java version.

DOD

  • Hive metastore service Docker image is updated to run Hive 4.0.0 (or 3.1.3)
  • Hive metastore service Helm chart is updated to maintain functionality with any image changes

--- If Helm chart changes are required, the following items must also be completed. ---

  • Migration instructions are written for moving from the v1 Hive service chart to the v2 chart
  • The v1 Helm chart Baton migration is updated to skip updating the image version for the v1 Hive chart
  • The release notes are updated to indicate that the v2 Hive chart is available and that the v1 chart is deprecated and no longer compatible with the new Docker image

Test Strategy/Script

  • Create a project with Data Access and a pipeline that reads and writes data to both Hive and Delta Lake
  • Run the pipeline to create and read test data
  • Query Data Access to ensure the test data can be retrieved
  • Create a v1.6 project with a simple pipeline
  • Follow the v1 -> v2 migration instructions for Hive Metastore service
  • Ensure the project can deploy and the pipeline can run
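
The write-then-read portion of the test strategy could be sketched roughly as follows. This is a minimal illustration, not the project's actual test: the function names, table names, and schema are hypothetical, and it assumes a Spark session that already has Hive support enabled and Delta Lake configured.

```python
from typing import Any, List


def round_trip_statements(table: str, fmt: str) -> List[str]:
    """Build the SQL for a write-then-read smoke test against one table format.

    `table` and the two-column schema are placeholders chosen for illustration.
    """
    if fmt not in ("hive", "delta"):
        raise ValueError(f"unsupported table format: {fmt}")
    return [
        f"DROP TABLE IF EXISTS {table}",
        f"CREATE TABLE {table} (id INT, name STRING) USING {fmt}",
        f"INSERT INTO {table} VALUES (1, 'alpha'), (2, 'beta')",
        f"SELECT COUNT(*) AS n FROM {table}",
    ]


def run_round_trip(spark: Any, table: str, fmt: str) -> int:
    """Run the statements on a Spark session and return the row count read back."""
    *setup, query = round_trip_statements(table, fmt)
    for stmt in setup:
        spark.sql(stmt)
    # The final statement is the read; the first column of the first row is the count.
    return spark.sql(query).collect()[0][0]
```

With a real session (for example one created via `SparkSession.builder.enableHiveSupport().getOrCreate()` with the Delta Lake extension configured), calling `run_round_trip(spark, "smoke_hive", "hive")` and `run_round_trip(spark, "smoke_delta", "delta")` should each return 2 if the metastore is working for both formats.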
ewilkins-csi added the enhancement (New feature or request) label May 30, 2024
peter-mcclonski added a commit to peter-mcclonski/aissemble that referenced this issue Jun 7, 2024
peter-mcclonski added a commit to peter-mcclonski/aissemble that referenced this issue Jun 11, 2024
ewilkins-csi added this to the 1.8.0 milestone Jun 12, 2024
peter-mcclonski added a commit that referenced this issue Jun 13, 2024
…ce-v2

#127 #116 Hive Metastore Service v2 chart and Hive upgrade
csun-cpointe modified the milestones: 1.8.0, 1.9.0 Aug 5, 2024
ewilkins-csi modified the milestones: 1.9.0, 1.10.0 Sep 25, 2024