[Question] How to make this hive/presto setup query parquet on AWS S3 #17
Tried to create an external table:
But ended up with an error.
I tried the same keys to look up the files at that path in S3 using the AWS CLI, and they were there.
We have a schema / external table on AWS Athena for this folder as well, and it was working fine.
I have added the hadoop-aws Maven dependency.
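For context, a typical Hive DDL for an external Parquet table on S3 looks like the following sketch (the table name, columns, and bucket path here are hypothetical, not taken from this issue):

```sql
-- Hypothetical external table over Parquet files in S3,
-- using the s3a:// scheme so the hadoop-aws connector is used.
CREATE EXTERNAL TABLE IF NOT EXISTS events (
  id BIGINT,
  payload STRING
)
STORED AS PARQUET
LOCATION 's3a://my-bucket/path/to/parquet/';
```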
Yep, rebuilding the 'bde2020/hive:2.3.2-postgresql-metastore' container with the AWS jar dependencies added, and swapping it in for hive-server and hive-metastore, worked well for me, as @GrigorievNick recommended.
More info in the docs.
@shveikus @GrigorievNick Can you provide the detailed steps to make this work? I'm new to Docker. I tried adding the following to my Dockerfile after creating a pom.xml in the same folder, with no luck.
Hi @Ravipillala
And pom.xml
And build command
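The pom.xml mentioned above did not survive the copy; a minimal sketch might look like the following. The coordinates and the 2.7.4 version are assumptions — the hadoop-aws version should match the Hadoop version bundled with the Hive image, and hadoop-aws pulls in the matching aws-java-sdk transitively:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <!-- groupId/artifactId are placeholders; any values work for a jar-fetching pom -->
  <groupId>example</groupId>
  <artifactId>external-jars</artifactId>
  <version>1.0</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-aws</artifactId>
      <!-- assumption: match this to the Hadoop version in the image -->
      <version>2.7.4</version>
    </dependency>
  </dependencies>
</project>
```

With this pom, `mvn dependency:copy-dependencies -DoutputDirectory=/external-jars/` copies the connector jars into the directory that HIVE_AUX_JARS_PATH points at below.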
I am personally deploying this to k8s, so I will add the YAML; you can see the ENV variables and configs there.

apiVersion: v1
data:
  HIVE_AUX_JARS_PATH: /external-jars/
  HIVE_SITE_CONF_datanucleus_autoCreateSchema: "false"
  HIVE_SITE_CONF_fs_s3a_access_key: enter your key here
  HIVE_SITE_CONF_fs_s3a_secret_key: enter your secret here
  HIVE_SITE_CONF_hive_metastore_uris: thrift://hive-metastore:9083
  HIVE_SITE_CONF_hive_metastore_warehouse_dir: s3a://stage-presto-iceberg-metadata/iceberg-catalog
  HIVE_SITE_CONF_javax_jdo_option_ConnectionDriverName: org.postgresql.Driver
  HIVE_SITE_CONF_javax_jdo_option_ConnectionPassword: hive
  HIVE_SITE_CONF_javax_jdo_option_ConnectionURL: jdbc:postgresql://hive-metastore-postgresql/metastore
  HIVE_SITE_CONF_javax_jdo_option_ConnectionUserName: hive
kind: ConfigMap
metadata:
  name: hive-env
  namespace: bigdata
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hive-metastore
  namespace: bigdata
spec:
  selector:
    matchLabels:
      app: hive-metastore
  replicas: 1
  template:
    metadata:
      labels:
        app: hive-metastore
    spec:
      nodeSelector:
        kubernetes.io/lifecycle: normal
      containers:
        - name: hive-metastore
          image: artifactory.com:5000/mykola.hryhoriev/hive:2.3.2-postgresql-metastore
          args:
            - /opt/hive/bin/hive --service metastore
          imagePullPolicy: Always
          env:
            - name: SERVICE_PRECONDITION
              value: hive-metastore-postgresql:5432
          envFrom:
            - configMapRef:
                name: hive-env
          ports:
            - containerPort: 9083
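For reference, the bde2020 images template each HIVE_SITE_CONF_* environment variable into hive-site.xml at startup, turning the underscores after the prefix into dots. Assuming that entrypoint behavior, the two S3 variables above should end up as properties like this (values elided):

```xml
<!-- sketch of the hive-site.xml properties generated from the ConfigMap above -->
<property>
  <name>fs.s3a.access.key</name>
  <value>...</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>...</value>
</property>
```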
Hi @Ravipillala, you can try this:

# First stage: resolve the AWS jars with Maven.
# (The FROM line for this stage was lost in the copy; maven:3-jdk-8 is an
#  assumption — any image with mvn on the path works.)
FROM maven:3-jdk-8 AS DEPENDENCY_BUILD
ADD pom.xml pom.xml
RUN mvn dependency:copy-dependencies -DoutputDirectory=/external-jars/

# Second stage: the Hive image, with the resolved jars copied in.
FROM bde2020/hive:2.3.2-postgresql-metastore
COPY --from=DEPENDENCY_BUILD /external-jars/ /external-jars/

and pom.xml (this is my own)
Hope this will be useful.
Hi,
Thank you.
I simply tried adding the aws-hadoop jar to the path for each Docker image, and it did not seem to work. Any advice?
Thanks in advance.