PARQUET-2: Adding Type Persuasion for Primitive Types #3

Closed
wants to merge 3 commits into from


danielcweeks

Original from the old repo: https://github.com/Parquet/parquet-mr/pull/410
JIRA: https://issues.apache.org/jira/browse/PARQUET-2

These changes allow a primitive type to be requested as a different type from the one stored in the file, using a flag that turns off strict type checking (strict checking is on by default). Values are cast to the requested type where possible, and the cast may lose precision where necessary (e.g. requesting a double as an int).

No performance penalty is imposed when the requested type is the type defined in the file. A flag exists to

A 6x6 test case is provided to test conversion between the primitive types.
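
As a rough illustration of the behaviour described above, here is a minimal sketch of a strict/non-strict conversion for a stored double. The names `Primitive`, `TypePersuasion`, and `persuade` are hypothetical and do not come from parquet-mr; only the idea (strict by default, lossy cast when strict checking is off) is taken from the description.

```java
// Hypothetical sketch, not the parquet-mr implementation.
enum Primitive { INT32, INT64, FLOAT, DOUBLE }

final class TypePersuasion {
    private TypePersuasion() {}

    /**
     * Reads a value stored as a double as the requested primitive type.
     * With strict checking on, any mismatch between stored and requested
     * type is rejected. With strict checking off, the value is cast and
     * may lose precision.
     */
    static Number persuade(double stored, Primitive requested, boolean strict) {
        if (requested == Primitive.DOUBLE) {
            return stored; // requested type matches the stored type: no cast
        }
        if (strict) {
            throw new IllegalArgumentException(
                "stored DOUBLE cannot be read as " + requested + " with strict type checking on");
        }
        switch (requested) {
            case INT32: return (int) stored;   // truncates toward zero
            case INT64: return (long) stored;  // truncates toward zero
            case FLOAT: return (float) stored; // may round
            default: throw new AssertionError(requested);
        }
    }
}
```

With strict checking off, `TypePersuasion.persuade(3.9, Primitive.INT32, false)` yields the integer 3, which is the kind of precision loss the description warns about.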

Daniel Weeks added 2 commits June 19, 2014 10:51
…ct type checking for conflicting schemas, which is strict by default.
julienledem referenced this pull request in julienledem/parquet-mr Jun 19, 2014
Author: Daniel Weeks <[email protected]>

Closes #3 from dcw-netflix/type-persuasion and squashes the following commits:

1c3c0c7 [Daniel Weeks] Fixed test with strict checking off
f3cb495 [Daniel Weeks] Added type persuasion for primitive types with a flag to control strict type checking for conflicting schemas, which is strict by default.
@@ -195,6 +195,13 @@ public boolean equals(Object other) {
* @return the union result of merging toMerge into this
*/
protected abstract Type union(Type toMerge);
Member

We should put the default implementation here:

protected Type union(Type toMerge) {
    return union(toMerge, true);
}

Author

I tried to put the default implementation in the abstract class, but the
maven enforcer plugin wouldn't allow me to do it. I assume removing the
abstract is considered an interface change.
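
For readers following the thread, the suggested pattern can be sketched as below: a concrete one-argument `union` that delegates to an abstract two-argument overload with strict checking on. The class names `SchemaNode` and `PrimitiveNode`, the parameter name `strict`, and the non-strict merge rule (keep this node's type) are assumptions for illustration only; the thread shows just the call `union(toMerge, true)`.

```java
// Hypothetical sketch of the delegation pattern, not the actual Type class.
abstract class SchemaNode {
    private final String name;

    SchemaNode(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    // Default implementation: merge with strict type checking on.
    protected SchemaNode union(SchemaNode toMerge) {
        return union(toMerge, true);
    }

    // Subclasses decide how to merge; 'strict' is an assumed parameter name.
    protected abstract SchemaNode union(SchemaNode toMerge, boolean strict);
}

final class PrimitiveNode extends SchemaNode {
    private final String primitive;

    PrimitiveNode(String name, String primitive) {
        super(name);
        this.primitive = primitive;
    }

    @Override
    protected SchemaNode union(SchemaNode toMerge, boolean strict) {
        if (!(toMerge instanceof PrimitiveNode)) {
            throw new IllegalStateException("cannot merge " + toMerge.name() + " into " + name());
        }
        PrimitiveNode other = (PrimitiveNode) toMerge;
        if (strict && !primitive.equals(other.primitive)) {
            throw new IllegalStateException(
                "conflicting types for " + name() + ": " + primitive + " vs " + other.primitive);
        }
        return this; // assumed rule: on a non-strict conflict, keep this node's type
    }
}
```

Turning an abstract method into a concrete one changes the class's public signature, which may be why a compatibility check in the build objected, as the author suggests.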

On Fri, Jun 20, 2014 at 9:21 PM, Julien Le Dem [email protected]
wrote:

In parquet-column/src/main/java/parquet/schema/Type.java:

@@ -195,6 +195,13 @@ public boolean equals(Object other) {
* @return the union result of merging toMerge into this
*/
protected abstract Type union(Type toMerge);

We should put the default implementation here:

protected Type union(Type toMerge) {
    return union(toMerge, true);
}



@julienledem
Member

Thanks Daniel!
I made a few comments.
Otherwise this LGTM!

@julienledem
Member

Please prefix the PR title with "PARQUET-2: ".

@danielcweeks danielcweeks changed the title Adding Type Persuasion for Primitive Types PARQUET-2: Adding Type Persuasion for Primitive Types Jun 23, 2014
@asfgit asfgit closed this in 9ad5485 Jun 24, 2014
@julienledem
Member

Thank you @dcw-netflix!

asfgit pushed a commit that referenced this pull request Feb 10, 2015
…2 api

Currently, when creating a user defined predicate with the new filter API, no value can be passed in to build a dynamic filter at runtime. This limits the usefulness of user defined predicates, because meaningful predicates cannot be created. We can add a generic Object value that is passed through the API and used internally in the keep function of the user defined predicate, which makes many different types of filters possible.
For example, in Spark SQL, we can pass in a list of filter values for a WHERE ... IN clause and filter the row values against that list.
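
To make the IN-clause example concrete, here is a minimal sketch of a predicate that carries a serializable set of values and consults it in `keep`. `KeepPredicate` and `InSetPredicate` are hypothetical stand-ins written for this illustration; they are not the parquet filter2 classes, and the real user defined predicate API has additional methods not shown here.

```java
import java.io.Serializable;
import java.util.Collection;
import java.util.HashSet;

// Hypothetical stand-in for a user defined predicate; not the parquet filter2 API.
abstract class KeepPredicate<T> implements Serializable {
    // Returns true if a row with this column value should be kept.
    abstract boolean keep(T value);
}

// Carries the filter values with the predicate so they are available at runtime.
final class InSetPredicate<T extends Serializable> extends KeepPredicate<T> {
    private static final long serialVersionUID = 1L;

    // HashSet is Serializable, so the whole predicate can be shipped to workers.
    private final HashSet<T> allowed;

    InSetPredicate(Collection<T> allowed) {
        this.allowed = new HashSet<>(allowed);
    }

    @Override
    boolean keep(T value) {
        return allowed.contains(value);
    }
}
```

A query such as `WHERE id IN (1, 2, 3)` would then correspond to `new InSetPredicate<>(java.util.Arrays.asList(1, 2, 3))`, with `keep` called once per candidate value.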

Author: Yash Datta <[email protected]>
Author: Alex Levenson <[email protected]>
Author: Yash Datta <[email protected]>

Closes #73 from saucam/master and squashes the following commits:

7231a3b [Yash Datta] Merge pull request #3 from isnotinvain/alexlevenson/fix-binary-compat
dcc276b [Alex Levenson] Ignore binary incompatibility in private filter2 class
7bfa5ad [Yash Datta] Merge pull request #2 from isnotinvain/alexlevenson/simplify-udp-state
0187376 [Alex Levenson] Resolve merge conflicts
25aa716 [Alex Levenson] Simplify user defined predicates with state
51952f8 [Yash Datta] PARQUET-116: Fix whitespace
d7b7159 [Yash Datta] PARQUET-116: Make UserDefined abstract, add two subclasses, one accepting udp class, other accepting serializable udp instance
40d394a [Yash Datta] PARQUET-116: Fix whitespace
9a63611 [Yash Datta] PARQUET-116: Fix whitespace
7caa4dc [Yash Datta] PARQUET-116: Add ConfiguredUserDefined that takes a serializable udp directly
0eaabf4 [Yash Datta] PARQUET-116: Move the config object from keep method to a configure method in udp predicate
f51a431 [Yash Datta] PARQUET-116: Adding type safety for the filter object to be passed to user defined predicate
d5a2b9e [Yash Datta] PARQUET-116: Enforce that the filter object to be passed must be Serializable
dfd0478 [Yash Datta] PARQUET-116: Add a test case for passing a filter object to user defined predicate
4ab46ec [Yash Datta] PARQUET-116: Pass a filter object to user defined predicate in filter2 api
rdblue referenced this pull request in rdblue/parquet-mr Mar 9, 2015
costimuraru pushed a commit to costimuraru/parquet-mr that referenced this pull request Apr 29, 2017
shangxinli added a commit to shangxinli/parquet-mr that referenced this pull request Mar 25, 2020