Feature: As a devops engineer, I want an aissemble-managed helm chart for the Hive metastore service that uses a newer version of Hive, so I have access to the latest security fixes. #127

Closed · peter-mcclonski opened this issue Jun 5, 2024 · 2 comments
Assignee: peter-mcclonski · Label: enhancement (New feature or request) · Milestone: 1.8.0

@peter-mcclonski (Contributor) commented Jun 5, 2024

Description

To improve usability and maintainability, we will migrate to a v2 chart for the Hive metastore service, keeping a usage pattern similar to the one seen in #103. This ticket also encompasses #116, updating the underlying Hive metastore version.

Definition of Done

  • Update hive-metastore-service docker image to use Hive 4.0.0
  • Validate that the current v2 hive-metastore-service helm chart functions as expected
    • If not, make necessary updates to ensure functionality.
    • Refactor chart to live under extensions-helm-spark-infrastructure
  • Update generated values/Chart file in downstream projects using the v2 profile with sensible defaults

Test Strategy/Script

  1. Generate a new project using the following command:
mvn archetype:generate -B -DarchetypeGroupId=com.boozallen.aissemble \
                          -DarchetypeArtifactId=foundation-archetype \
                          -DarchetypeVersion=1.8.0-SNAPSHOT \
                          -DartifactId=test-project \
                          -DgroupId=org.test \
                          -DprojectName='Test' \
                          -DprojectGitUrl=test.org/test-project \
&& cd test-project
  2. Add the following pipeline to test-project-pipeline-models/src/main/resources/pipelines/
{
  "name": "PysparkPersist",
  "package": "com.boozallen",
  "type": {
    "name": "data-flow",
    "implementation": "data-delivery-pyspark"
  },
  "steps": [
    {
      "name": "PersistData",
      "type": "synchronous",
      "persist": {
        "type": "hive"
      }
    }
  ]
}
  3. Add the following record to test-project-pipeline-models/src/main/resources/records/
{
  "name": "CustomRecord",
  "package": "com.boozallen.aiops.mda.pattern.record",
  "description": "Example custom record for Pyspark Data Delivery Patterns",
  "fields": [
    {
      "name": "customField",
      "type": {
        "name": "customType",
        "package": "com.boozallen.aiops.mda.pattern.dictionary"
      }
    }
  ]
}
  4. Add the following dictionary to test-project-pipeline-models/src/main/resources/dictionaries/
{
  "name": "PysparkDataDeliveryDictionary",
  "package": "com.boozallen.aiops.mda.pattern.dictionary",
  "dictionaryTypes": [
    {
      "name": "customType",
      "simpleType": "string"
    }
  ]
}
  5. Execute mvn clean install -Dmaven.build.cache.skipCache=true repeatedly, resolving all presented manual actions until none remain.
  6. Within test-project-deploy/pom.xml, replace aissemble-spark-infrastructure-deploy with aissemble-spark-infrastructure-deploy-v2
  7. Delete the directory test-project-deploy/src/main/resources/apps/spark-infrastructure
  8. Delete all references to hive-metastore-service from your Tiltfile
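If it helps, steps 7 and 8 can be partially scripted. A minimal sketch, assuming the Tiltfile sits at the project root (the Tiltfile cleanup itself remains a manual edit):
# Step 7: remove the v1 spark-infrastructure chart resources from the deploy module
rm -rf test-project-deploy/src/main/resources/apps/spark-infrastructure
# Step 8: list any remaining hive-metastore-service references to remove by hand
grep -n 'hive-metastore-service' Tiltfile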
  9. Within test-project-pipelines/test-project-data-access/src/main/resources/application.properties, set quarkus.datasource.jdbc.url to jdbc:hive2://spark-infrastructure-sts-service:10001/default;transportMode=http;httpPath=cliservice
  10. Within test-project-pipelines/pyspark-persist/src/pyspark_persist/step/persist_data.py, define the implementation for execute_step_impl as follows:
    def execute_step_impl(self) -> None:
        from ..record.custom_record import CustomRecord
        from ..schema.custom_record_schema import CustomRecordSchema

        # Build a two-record DataFrame from the generated record and schema classes
        custom_record = CustomRecord.from_dict({"customField": "foo"})
        record2 = CustomRecord.from_dict({"customField": "bar"})
        df = self.spark.createDataFrame(
            [
                custom_record,
                record2
            ],
            CustomRecordSchema().struct_type
        )

        # Persist the DataFrame to the Hive-backed table configured for this step
        self.save_dataset(df, "my_new_table")
  11. Replace the contents of test-project-pipelines/pyspark-persist/src/pyspark_persist/resources/apps/pyspark-persist-dev-values.yaml with the following:
sparkApp:
    spec:
      image: "test-project-spark-worker-docker:latest"
      sparkConf:
        spark.eventLog.enabled: "false"
        spark.sql.catalogImplementation: "hive"
        spark.eventLog.dir: "s3a://spark-infrastructure/spark-events"
        spark.hadoop.fs.s3a.endpoint: "http://s3-local:4566"
        spark.hadoop.fs.s3a.access.key: "123"
        spark.hadoop.fs.s3a.secret.key: "456"
        spark.hadoop.fs.s3.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
        spark.hive.server2.thrift.port: "10000"
        spark.hive.server2.thrift.http.port: "10001"
        spark.hive.server2.transport.mode: "http"
        spark.hive.metastore.warehouse.dir: "s3a://spark-infrastructure/warehouse"
        spark.hadoop.fs.s3a.path.style.access: "true"
        spark.hive.server2.thrift.http.path: "cliservice"
        spark.hive.metastore.schema.verification: "false"
        spark.hive.metastore.uris: "thrift://hive-metastore-service:9083/default"
      driver:
        cores: 1
        memory: "2048m"
      executor:
        cores: 1
        memory: "2048m"
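Once the stack is up (step 15), a quick sanity check of the metastore wiring above; this assumes the v2 chart keeps the hive-metastore-service service name referenced in spark.hive.metastore.uris:
# Confirm the metastore service referenced by spark.hive.metastore.uris has been created
kubectl get svc hive-metastore-service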
  12. Execute mvn clean install -Dmaven.build.cache.skipCache=true once.
  13. Use kubectl apply -f to apply the following yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: spark-config
data: {}
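For example, after saving the manifest above to a file (the name spark-config.yaml is arbitrary):
# Create the empty spark-config ConfigMap from the manifest above
kubectl apply -f spark-config.yaml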
  14. To avoid an unrelated bug, open your Tiltfile and remove the entry for pipeline-invocation-service.
  15. Execute tilt up
  16. Once all resources are ready, trigger the pyspark-persist pipeline
  17. Use kubectl get pods | grep data-access to get the name of the data access pod.
  18. Use kubectl exec -it <DATA_ACCESS_POD_NAME> -- bash to enter the data access pod
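If preferred, steps 17 and 18 can be combined. A sketch that assumes the pod name appears in the first column of kubectl get pods (the default output):
# Resolve the data-access pod name, then open an interactive shell inside it
DATA_ACCESS_POD=$(kubectl get pods | grep data-access | awk '{print $1}')
kubectl exec -it "$DATA_ACCESS_POD" -- bash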
  19. Execute curl -X POST localhost:8080/graphql -H "Content-Type: application/json" -d '{ "query": "{ CustomRecord(table: \"my_new_table\") { customField } }" }' and ensure that data including two records is returned, e.g.: {"data":{"CustomRecord":[{"customField":null},{"customField":null}]}}
  Note on step 19: if you don't get any values back, run kubectl get svc | grep sts in a separate terminal; it can take a minute or two for the service to be provisioned.
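A rough sketch of that check, run from the host while the query from step 19 is retried inside the data access pod:
# Poll until the Spark Thrift Server (sts) service from step 9 has been provisioned
until kubectl get svc | grep -q sts; do
  echo "Waiting for spark-infrastructure-sts-service..."
  sleep 10
done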

References/Additional Context

@peter-mcclonski peter-mcclonski added the enhancement New feature or request label Jun 5, 2024
@peter-mcclonski peter-mcclonski self-assigned this Jun 5, 2024
peter-mcclonski added a commit to peter-mcclonski/aissemble that referenced this issue Jun 7, 2024
@Cho-William (Contributor) commented:

OTS completed

peter-mcclonski added several commits to peter-mcclonski/aissemble that referenced this issue Jun 11, 2024
@ewilkins-csi ewilkins-csi added this to the 1.8.0 milestone Jun 12, 2024
peter-mcclonski added several commits to peter-mcclonski/aissemble that referenced this issue Jun 12–13, 2024
peter-mcclonski added a commit that referenced this issue Jun 13, 2024: #127 #116 Hive Metastore Service v2 chart and Hive upgrade
peter-mcclonski added further commits that referenced this issue Jun 13, 2024
@csun-cpointe (Contributor) commented:

final test passed!!
