rameeshm
Contributor

@rameeshm rameeshm commented Sep 6, 2025

What changes were proposed in this pull request?

Include Apache Tez as the processing framework for the ranger-hive Docker setup

  • Fixes the issue with the INSERT command in beeline
  • Data processing is much faster with Tez's DAG-based execution
  • Addressed review comments
  • Addressed the issue with starting the hadoop and hive containers individually

How was this patch tested?

Tested in Docker by running beeline against HiveServer2 and executing an INSERT statement, which runs as a Tez DAG.
TezJob
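
A check along the lines described above could be scripted roughly as follows. This is only a sketch: the JDBC URL, the user name, and the table name `tez_smoke_test` are assumptions made for illustration and are not taken from the PR. It relies on beeline's `-u`, `-n`, and `-f` options and does nothing if beeline is not on the PATH.

```shell
#!/usr/bin/env bash
# Sketch of a beeline smoke test for the Tez execution engine.
# JDBC_URL, HIVE_USER, and the table name are illustrative assumptions.
JDBC_URL="${JDBC_URL:-jdbc:hive2://localhost:10000}"
HIVE_USER="${HIVE_USER:-hive}"

# Emits the HiveQL for the smoke test (table name is hypothetical).
smoke_test_sql() {
  cat <<'SQL'
SET hive.execution.engine;
CREATE TABLE IF NOT EXISTS tez_smoke_test (id INT, name STRING);
INSERT INTO tez_smoke_test VALUES (1, 'alice');
SELECT COUNT(*) FROM tez_smoke_test;
SQL
}

# Write the statements to a temporary script file for beeline -f.
SQL_FILE="$(mktemp)"
smoke_test_sql > "${SQL_FILE}"

# Run only where beeline is installed, e.g. inside the Hive container.
if command -v beeline >/dev/null 2>&1; then
  beeline -u "${JDBC_URL}" -n "${HIVE_USER}" -f "${SQL_FILE}"
else
  echo "beeline not found; run this inside the Hive container" >&2
fi
```

The first statement prints the configured execution engine, so the output shows whether the INSERT that follows is expected to run on Tez.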


@Copilot Copilot AI left a comment

Pull Request Overview

This PR integrates Apache Tez as the processing framework for the ranger-hive Docker setup to enable faster data processing through DAG execution and resolve issues with INSERT commands in beeline.

  • Adds Tez binary distribution and configuration files for Hive integration
  • Updates Hadoop YARN configuration to support Tez execution
  • Creates comprehensive Tez configuration across all Hive database variants

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 1 comment.

File                           Description
tez-site.xml                   New Tez configuration template with memory and execution settings
ranger-hive-setup.sh           Adds Tez setup, YARN configuration, and HDFS directory creation
ranger-hadoop-setup.sh         Enhances YARN configuration and installs Tez JARs for NodeManager
hive-site-*.xml                Adds Tez execution engine configuration to all database variants
hive-site-metastore-mysql.xml  New metastore-specific configuration with Tez support
create-users.sh                New script for creating test users (alice, abram)
download-archives.sh           Adds Tez binary download support
docker-compose files           Updates build arguments and environment variables for Tez
Dockerfiles                    Integrates Tez installation and user creation across containers
.env                           Updates Hadoop version compatibility and adds Tez version

Comments suppressed due to low confidence (1)

dev-support/ranger-docker/.env:1

  • The KAFKA_VERSION line appears to be missing after the HIVE_HADOOP_VERSION change. This could break Kafka-related builds that depend on this environment variable.
BUILD_HOST_SRC=true
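
The concern raised in the suppressed comment could be checked mechanically. The following is a minimal sketch, assuming a plain `NAME=value` env file; the helper name `missing_env_vars`, the temporary file, and its placeholder values are hypothetical and not part of the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each listed variable that has no NAME= line
# in the given env file.
missing_env_vars() {
  local env_file="$1"
  shift
  local name
  for name in "$@"; do
    grep -q "^${name}=" "${env_file}" || echo "${name}"
  done
}

# Demo on a throwaway file with placeholder values (not the real .env).
DEMO_ENV="$(mktemp)"
printf '%s\n' 'BUILD_HOST_SRC=true' 'HIVE_HADOOP_VERSION=x.y.z' > "${DEMO_ENV}"

missing_env_vars "${DEMO_ENV}" BUILD_HOST_SRC HIVE_HADOOP_VERSION KAFKA_VERSION
# prints: KAFKA_VERSION
```

Against the repository, the same helper could be pointed at dev-support/ranger-docker/.env with whichever variable names the builds rely on.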

 docker - Review comments addressed, hadoop and hive ssh issue during startup addressed, removed unneeded configs
@rameeshm rameeshm requested a review from mneethiraj September 8, 2025 00:54
@kumaab
Contributor

kumaab commented Sep 8, 2025

Thank you @rameeshm for the patch. I believe this was tested with the Ubuntu base image; please see if it can be tested with the UBI base image as well. This change needs to be made in the .env file: RANGER_BASE_VERSION=20250712-1-ubi-8. Thanks!

… docker - changes to use ranger base image for user creation, fix issue with usage of ranger base image in other containers
@rameeshm
Contributor Author

rameeshm commented Oct 1, 2025

Thank you @rameeshm for the patch. I believe this was tested with the Ubuntu base image; please see if it can be tested with the UBI base image as well. This change needs to be made in the .env file: RANGER_BASE_VERSION=20250712-1-ubi-8. Thanks!

@kumaab the current patch, with the review comments addressed, has been tested with RANGER_BASE_VERSION=20250712-1-ubi-8.

@rameeshm rameeshm requested review from kumaab and mneethiraj October 1, 2025 05:49
… docker - addressed review comment on issue related to base image
@rameeshm rameeshm requested a review from mneethiraj October 8, 2025 21:29
… docker - address review comment on the switch user statement
@rameeshm rameeshm requested a review from kumaab October 8, 2025 22:53
Contributor

@kumaab kumaab left a comment

Looks good; a few minor changes.

ln -s /opt/hadoop-${HADOOP_VERSION} /opt/hadoop && \
rm -f /home/ranger/dist/hadoop-${HADOOP_VERSION}.tar.gz && \
tar xvfz /home/ranger/dist/apache-tez-${TEZ_VERSION}-bin.tar.gz --directory=/opt/ && \
ln -s /opt/apache-tez-${TEZ_VERSION}-bin /opt/tez && \
Contributor

Let's remove apache-tez-<version>-bin.tar.gz after extracting it, as is done for the other tarballs.
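
The requested cleanup could look roughly like the sketch below, which mirrors the extract, symlink, and remove pattern used for the Hadoop tarball. It runs against a temporary directory with a dummy archive so it can be tried without downloading Tez; the `0.0.0-demo` version string and all paths under `WORK_DIR` are stand-ins, not values from the PR.

```shell
#!/usr/bin/env bash
# Stand-ins for the image's real paths and version; all values are illustrative.
TEZ_VERSION="${TEZ_VERSION:-0.0.0-demo}"
WORK_DIR="$(mktemp -d)"
DIST_DIR="${WORK_DIR}/dist"
OPT_DIR="${WORK_DIR}/opt"
SRC_DIR="${WORK_DIR}/src"
mkdir -p "${DIST_DIR}" "${OPT_DIR}" "${SRC_DIR}/apache-tez-${TEZ_VERSION}-bin"

# Build a dummy tarball so the sketch does not need a real Tez download.
echo "placeholder" > "${SRC_DIR}/apache-tez-${TEZ_VERSION}-bin/README"
tar czf "${DIST_DIR}/apache-tez-${TEZ_VERSION}-bin.tar.gz" \
    -C "${SRC_DIR}" "apache-tez-${TEZ_VERSION}-bin"

# Extract, create the version-independent symlink, then delete the archive.
tar xfz "${DIST_DIR}/apache-tez-${TEZ_VERSION}-bin.tar.gz" --directory="${OPT_DIR}" && \
ln -s "${OPT_DIR}/apache-tez-${TEZ_VERSION}-bin" "${OPT_DIR}/tez" && \
rm -f "${DIST_DIR}/apache-tez-${TEZ_VERSION}-bin.tar.gz"
```

In the Dockerfile itself, this would presumably amount to one extra `rm -f` line after the Tez `ln -s` step, within the same RUN chain.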

<name>hive.zookeeper.client.port</name>
<value>2181</value>
</property>

Contributor

This file may be removed, as support for SQL Server is going to be deprecated soon.
