
Add support and tests for OpenX JSON SerDe in Hive connector #12213

Closed
anusudarsan wants to merge 2 commits into trinodb:master from anusudarsan:anu/openx-serde

anusudarsan commented May 2, 2022

Description

Is this change a fix, improvement, new feature, refactoring, or other?

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

How would you describe this change to a non-technical end user or system administrator?

Related issues, pull requests, and links

Documentation

( ) No documentation is needed.
(X) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(X) Release notes entries required with the following suggested text:

# Section
* Add support for the OpenX JSON SerDe format in the Hive connector

cla-bot added the cla-signed label May 2, 2022
anusudarsan force-pushed the anu/openx-serde branch 5 times, most recently from 59a167c to 72e6311 Compare May 3, 2022 20:06
pom.xml Outdated

There shouldn't be any references to other repositories than maven central. Otherwise, it can be hard or problematic to build the project in environments that sit behind firewalls.
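
One way to check a POM against this rule is sketched below. It is a minimal illustration and not part of this PR: the function name `non_central_repositories` and the sample POM are hypothetical, and the allow-list assumes Maven Central is reached through `repo.maven.apache.org/maven2` or `repo1.maven.org/maven2`.

```python
import xml.etree.ElementTree as ET

# Assumption: these URL prefixes are the only ones treated as Maven Central.
CENTRAL_PREFIXES = (
    "https://repo.maven.apache.org/maven2",
    "https://repo1.maven.org/maven2",
)

# Namespace used by Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Element paths that can declare an extra repository in a POM.
REPOSITORY_PATHS = (
    ".//m:repositories/m:repository",
    ".//m:pluginRepositories/m:pluginRepository",
)


def non_central_repositories(pom_xml):
    """Return (id, url) pairs for declared repositories that are not Maven Central."""
    root = ET.fromstring(pom_xml.strip())
    found = []
    for path in REPOSITORY_PATHS:
        for repo in root.findall(path, POM_NS):
            repo_id = repo.findtext("m:id", default="", namespaces=POM_NS).strip()
            url = repo.findtext("m:url", default="", namespaces=POM_NS).strip()
            if not url.startswith(CENTRAL_PREFIXES):
                found.append((repo_id, url))
    return found


# Hypothetical POM fragment with one vendor repository and one Central entry.
SAMPLE_POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <repositories>
    <repository>
      <id>example-vendor</id>
      <url>https://repo.example.com/releases/</url>
    </repository>
    <repository>
      <id>central</id>
      <url>https://repo.maven.apache.org/maven2</url>
    </repository>
  </repositories>
</project>
"""

if __name__ == "__main__":
    for repo_id, url in non_central_repositories(SAMPLE_POM):
        print(f"non-central repository: {repo_id} -> {url}")
```

An empty result would mean the POM resolves everything from Maven Central, which is the condition the comment above asks for.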

MiguelWeezardo commented May 4, 2022

I'm not sure if we can get rid of this dependency. The serde has shims for HDP 3 and CDH 7 (default), both of which have their own repos. I'm not sure if it will be possible to generalize it to only dependencies found on Maven Central.


Turns out this isn't too difficult: starburstdata/hive-json-serde#6

anusudarsan force-pushed the anu/openx-serde branch 3 times, most recently from dec5fc8 to 78639d6 Compare May 3, 2022 22:39
anusudarsan marked this pull request as draft May 3, 2022 22:54
anusudarsan force-pushed the anu/openx-serde branch 4 times, most recently from 0869a9f to 8215e39 Compare May 4, 2022 02:26
findepi removed their request for review May 4, 2022 11:45
anusudarsan force-pushed the anu/openx-serde branch 6 times, most recently from ef63ca0 to e11ce91 Compare May 5, 2022 17:01

anusudarsan commented May 10, 2022

The build for this PR fails when we remove the Cloudera Maven repository, and the enforcer plugin has issues with a "provided" dependency.

The build error is due to a bug (a missing feature) in maven-enforcer-plugin version 3.0.0, which is the version we set in airbase. A warning in enforcer-plugin 3.0.0-M3 was upgraded to an error in 3.0.0, and it is not yet configurable through a flag: https://issues.apache.org/jira/browse/MENFORCER-413

... the problem here for maven enforcer plugin is that it is too strict. I think we can just ignore the optional and provided dependencies while checking, or at least provide an option to let users ignore them.
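
The relaxation suggested in that quote can be pictured as a filter applied before the check runs. The sketch below is only an illustration under assumptions: the thread does not say which enforcer rule is involved, so a simple version-conflict check stands in for it, and the `Dependency` type, the `conflicting_versions` function, and the sample coordinates are all hypothetical rather than maven-enforcer-plugin code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    coordinate: str          # "groupId:artifactId"
    version: str
    scope: str = "compile"   # Maven scope, e.g. "compile", "provided", "test"
    optional: bool = False


def conflicting_versions(deps, ignore_provided_and_optional=False):
    """Map each coordinate to its sorted versions when more than one version is requested."""
    versions = defaultdict(set)
    for dep in deps:
        if ignore_provided_and_optional and (dep.scope == "provided" or dep.optional):
            continue  # the relaxation proposed in the quoted comment
        versions[dep.coordinate].add(dep.version)
    return {coord: sorted(found) for coord, found in versions.items() if len(found) > 1}


# Hypothetical dependency set: the same library at two versions, one of them "provided".
SAMPLE_DEPS = [
    Dependency("org.example:lib", "1.0"),
    Dependency("org.example:lib", "2.0", scope="provided"),
    Dependency("org.example:other", "3.1"),
]

if __name__ == "__main__":
    print("strict: ", conflicting_versions(SAMPLE_DEPS))
    print("relaxed:", conflicting_versions(SAMPLE_DEPS, ignore_provided_and_optional=True))
```

With the strict setting the stand-in check reports the `provided` copy as a conflict; with the relaxed setting it is skipped, which is the behavior the quoted comment asks the plugin to make available.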

pom.xml Outdated

I don’t know why we’re adding all these dependencies, but something is wrong. Is this depending on a non-shaded version of Hadoop?

anusudarsan commented May 10, 2022

@electrum yes, it's not using the shaded version of Hadoop. The library uses these versions of Hive and Hadoop: https://github.com/starburstdata/hive-json-serde/blob/develop/pom.xml#L41. @MiguelWeezardo, have we tried using the shaded version of Hadoop for this?


I haven't tried shading hadoop yet.

MiguelWeezardo commented May 13, 2022

starburstdata/hive-json-serde#8 is a first attempt at shading. Unfortunately, it fails to load in Trino.


findepi commented Mar 29, 2023

Is this superseded by @dain's #16073?


colebow commented Mar 30, 2023

👋 @anusudarsan - this PR is inactive and doesn't seem to be under development. If you'd like to continue work on this at any point in the future, feel free to re-open.

colebow closed this Mar 30, 2023
anusudarsan deleted the anu/openx-serde branch April 3, 2023 22:09

6 participants