Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PARQUET-1292 Adding constructors to ProtoParquetWriter with writeSpecsCompliant flag #473

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

chawlakunal
Copy link

This PR adds constructors to ProtoParquetWriter with flags for specs compliant writer (based on #411). Now a specs compliant ProtoParquetWriter can be created as

new ProtoParquetWriter(file, protoMessageClass, true);

…ag so that a specs compliant writer can be instantiated
}

/**
* Create a new {@link ProtoParquetWriter}. The default block size is 50 MB.The default

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BenoitHanotte
Copy link

BenoitHanotte commented May 3, 2018

Thanks @chawlakunal for the commit, it looks good to me. I believe there is a slight mistake in a comment (see my review above).
You may want to open a Jira ticket for this commit and add the ticket name in the message ;)

@chawlakunal chawlakunal changed the title Adding constructors to ProtoParquetWriter with writeSpecsCompliant flag PARQUET-1292 Adding constructors to ProtoParquetWriter with writeSpecsCompliant flag May 3, 2018
@chawlakunal
Copy link
Author

chawlakunal commented May 3, 2018

@BenoitHanotte Happy to contribute!!1

I have created a JIRA for this but I am unable to assign it to myself.

@zivanfi
Copy link
Contributor

zivanfi commented May 4, 2018

@chawlakunal, I added you the list of contributors so that you can assign JIRA-s to yourself. I assigned this one to you to make sure it works.

@chawlakunal
Copy link
Author

@BenoitHanotte Can this be approved and merged?

@lukasnalezenec
Copy link
Contributor

Hi,
Thank you for your work.
I looked at it briefly, I will send you detailed opinion after weekend.
I am not sure if we need so many new constructors.
We should publish the Configuration parameter from superclass.

@chawlakunal
Copy link
Author

Hi @lukasnalezenec
Waiting to hear from you.

@costimuraru
Copy link

+1 on finding a way to pass the writeSpecsCompliant flag directly to the ProtoParquetWriter and simplify things.

Right now we have to do something like:

Configuration conf = new Configuration();
ProtoWriteSupport.setWriteSpecsCompliant(conf, true);

ParquetWriter<MyMessage> writer 
   = new ParquetWriter<>(file, new ProtoWriteSupport<MyMessage>(cls), CompressionCodecName.GZIP, 256 * 1024 * 1024, 1 * 1024 * 1024,  1 * 1024 * 1024, true, false, ParquetWriter.DEFAULT_WRITER_VERSION, conf);

@chawlakunal
Copy link
Author

I am still waiting for comments from @lukasnalezenec

@lukasnalezenec
Copy link
Contributor

lukasnalezenec commented May 18, 2018

I think that problem is that there was no ProtoParquetWriter constructor with conf parameter.
I would recommend adding the conf parameter and not writeSpecsCompliant parameter.
AFAIK there is no elegant solution how we can accept writeSpecsCompliant parameter, set its value to configuration and pass it super constructor.
(see getConfigWithWriteSpecsCompliant())

I would take existing constructors and add those two new constructors:

ProtoParquetWriter(Path file, Class<? extends Message> protoMessage, Configuration conf);

ProtoParquetWriter(Path file, Class<? extends Message> protoMessage, ParquetFileWriter.Mode mode, CompressionCodecName compressionCodecName, int blockSize, int pageSize, int dictionaryPageSize, boolean enableDictionary, boolean validating, WriterVersion writerVersion, Configuration conf

We might also extend some other existing constructors with conf parameter. (I am not sure if it is necessary)

After this change we might turn on the parameter like this:

Configuration conf = new Configuration();
ProtoWriteSupport.setWriteSpecsCompliant(conf, true);

ParquetWriter<MyMessage> writer  = new ProtoParquetWriter<>(file, cls, conf);`

If you disagree with me, I'm open to another solution.

@costimuraru
Copy link

@lukasnalezenec From my perspective, it looks great!

@BenoitHanotte
Copy link

Hello @costimuraru @chawlakunal
As we might again be adding a new flag if #410 passes, I think we may want to follow @lukasnalezenec suggestion and try to have constructors that accept the conf object, which would allow to avoid duplicating all the existing constructors for each new flag.

@chawlakunal
Copy link
Author

@BenoitHanotte I'll work on that sometime this week.

@costimuraru
Copy link

@chawlakunal any luck?

@georgehdd
Copy link

@chawlakunal Any news on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants