PARQUET-601: Add support to configure the encoding used by ValueWriters #342
Conversation
@isnotinvain / @rdblue - please take a look..
just to record an offline discussion we just had: I think the goal of this is more along the lines of creating an encoding selection strategy which gets to choose encodings / encoding implementations dynamically at runtime (a rough sketch of what such an interface might look like follows this comment).
What this allows is for us to write our own logic not only for "what type gets what encoding" but also for "how do I change my mind about an encoding based on the data I'm seeing" (aka fallback), in a generic way that isn't tied to just dictionaries.
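A rough sketch of the kind of selection strategy described above; the interface and method names here are invented for illustration (the PR ultimately expresses the idea through ValuesWriterFactory):

```java
import org.apache.parquet.column.ColumnDescriptor;
import org.apache.parquet.column.values.ValuesWriter;

// Hypothetical interface (not part of parquet-mr): a pluggable strategy that picks,
// and can later revise, the encoding used for a column.
public interface EncodingSelectionStrategy {
  // Choose the initial ValuesWriter for a column, based on its type / path.
  ValuesWriter newValuesWriter(ColumnDescriptor column);

  // Decide, based on what has been written so far, whether to switch writers -
  // a generalization of the existing dictionary fallback.
  boolean shouldFallBack(ColumnDescriptor column, ValuesWriter current, long valuesWritten);
}
```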
Is the plan then that you would only be able to choose the encoding types via the Strategies that you mentioned? Or would there also be some sort of lightweight way to specify the encoding type explicitly for certain columns (as suggested by the initial comment)? I ask because this would be super useful to have in Spark, and I can imagine tons of situations where information is known about the data beforehand and people would like to be able to explicitly specify the column encodings. That's not to say that the strategies wouldn't be useful (they would be), or that you couldn't jerry-rig an explicit setting into a strategy, but I think it would be useful to have explicit setting be possible in some first-class way. (If either of you is closer to the Spark implementation, feel free to point out if this is irrelevant, but I know Spark's parquet support depends on parquet-mr, so I suspect whatever you do here will affect me if I want to include support in Spark.)
Hi @hkothari, what I was thinking is this: I'm not entirely convinced that it's a useful feature for users to be able to easily configure what encoding is used for what type of column, e.g. "use delta encoding for integers". The reason I say that is, a lot of the encodings depend on what's actually in the data (are the integers close together / sorted, or are they random ids?). And what happens when you've got 2 columns of the same type, but with different attributes (an int column that's sorted, and one that's random)?
One way would be to let users choose an encoding per field, instead of per type. But I worry that this will spiral into way too much to reasonably configure. Instead, I would rather we make the heuristics inside of parquet good enough that they can choose a good encoding on their own, as the data is being looked at. And because our first set of heuristics will probably not be the best ones, we can let users bring their own strategy, though ideally we would be folding those strategies back into parquet-mr if they are good / better than what we have.
That said, as you pointed out, we could implement a constant strategy that always picks the one encoding you asked it to, and we could make a shorthand for using that strategy, as this PR initially planned to. Do you think that is a feature you would still find useful, even if we have a decent automatic selection strategy? I think if we're going to do that, it should probably be per-field, not per-type.
I'm not proposing per column type. I don't think that's nearly as useful as per actual column. In a lot of the cases I've worked in, you either know certain columns will have certain distributions beforehand (I'm receiving this data sorted by "purchase_date", so delta-encode my "purchase_date" column but not my other date columns; or I know one column is pretty clustered but has tons of distinct values, so RLE it) or you don't. In the unknown case a strategy makes sense. But in the known case, which happens fairly often in my experience, a strategy can be slower on writes (a metric people have complained about already for parquet), or in the case that something goes wrong it's just suboptimal. In those cases, it's helpful to be able to explicitly override to what you know is more optimal. I'm open to doing this as an ExplicitStrategy or something, as long as it's flexible enough.
Thanks for chiming in @hkothari, good to get some additional feedback :-). I like the idea of being able to explicitly specify the encoding for a given column type (int / bool / ...). One of the reasons (apart from the ones already discussed above) is that users could want to optimize for different variables than the ones we know at write time. For example, you might want to ensure that your read path is not super expensive, and that might conflict with the write-side constraint of minimizing size on disk. We could possibly tackle this with a sophisticated WriteSelectionStrategy (that potentially accounts for this) but it might end up being easier to specify manually and prototype.
I'm a bit wary though of being able to override on a per-column basis (rather than per type). It definitely is more accurate, but what I've seen is that most of our datasets are really large, which makes this level of overriding painful. (It could conceivably be useful to others with smaller schemas though.) I was thinking of maybe using @isnotinvain's idea with a possibility of allowing column type overrides.
In order to keep the first pass of this PR simple and constrained, I think we should first build the selection strategy interface and a way to set it in the config, eg:
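For illustration (the config key that appears later in this thread for exactly this purpose is parquet.writer.factory-override):
parquet.writer.factory-override = "org.apache.parquet.hadoop.MyValuesWriterFactory"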
Just doing this part puts us in a good position to add more built-in strategies, like the manual-per-column strategy or the per-type strategy and so on. As for whether to specify a single global strategy or a strategy per-type, I was initially thinking just one strategy that handles all types, but one strategy per type would also be fine.
@isnotinvain / @hkothari, I've updated the PR based on our discussion. Here's how things work now:
Here's how things can be set up:
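A minimal sketch of what such a factory could look like, assuming the interface shape suggested by the diff fragments on this page (an initialize call taking ValuesWriterFactoryParams plus a per-column newValuesWriter); the class name and method bodies below are illustrative, not taken from the PR:

```java
import org.apache.parquet.column.ColumnDescriptor;
import org.apache.parquet.column.values.ValuesWriter;
import org.apache.parquet.column.values.factory.ValuesWriterFactory;
import org.apache.parquet.column.values.factory.ValuesWriterFactoryParams;

// Illustrative custom factory (names assumed): picks a ValuesWriter per column.
public class MyValuesWriterFactory implements ValuesWriterFactory {
  private ValuesWriterFactoryParams params;

  @Override
  public void initialize(ValuesWriterFactoryParams params) {
    // keep the writer settings (page size, allocator, dictionary thresholds, ...)
    this.params = params;
  }

  @Override
  public ValuesWriter newValuesWriter(ColumnDescriptor column) {
    // a real factory would switch on column.getType() (and possibly the column path)
    // and return a concrete ValuesWriter for the chosen encoding
    throw new UnsupportedOperationException("sketch only");
  }
}
```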
This creates a factory that is then handed to ParquetProperties, which uses it to build the ValuesWriters for each column.
@@ -97,21 +84,27 @@ public static WriterVersion fromString(String name) {
  private final int maxRowCountForPageSizeCheck;
  private final boolean estimateNextSizeCheck;
  private final ByteBufferAllocator allocator;

  private final int initialSlabSize;
  private final ValuesWriterFactory valuesWriterFactory;

  private ParquetProperties(WriterVersion writerVersion, int pageSize, int dictPageSize, boolean enableDict, int minRowCountForPageSizeCheck,
What's the story for configurable properties here? If one writes a custom ValuesWriterFactory, it seems totally reasonable that they would have other non-default settings that they would configure. I'm not really sure how this is supported in Hadoop (if at all) but I would imagine it being something like fetching all settings under parquet.writer.writerProperties.* or something.
Not sure I understand your question completely, but if you take a look at my last commit (503958a), you can see that there's a way to configure ValuesWriterFactories. To do so, you write your special ValuesWriterFactory (similar to what I've done in the unit tests) and make it extend ConfigurableFactory. When you do so, you have the Hadoop Config passed in, which you can read & use. I did mull doing something like reading everything under parquet.writer.writerProperties.* but felt this was a cleaner approach.
Ahh, yeah I totally missed ConfigurableFactory, that works perfectly.
ConfigurableFactory is now a factory that is Configurable, right?
@isnotinvain - updated based on your comments. Do take a look when you get the time.
* Due to this, they must provide a default constructor.
* Lifecycle of ValuesWriterFactories is:
* 1) Created via reflection while creating a {@link org.apache.parquet.column.ParquetProperties}
* 2) If the factory is Configurable (needs Hadoop conf), that is set, initialize is also called. This is done
Maybe let's clarify here that if the factory implements Configurable, its setConf method will be called, just so the reader understands that to opt in to getting the config they must implement Configurable.
Sure, will do
+1 for me, one minor comment about clarifying some docs, but LGTM
@rdblue / @julienledem - do you guys have the time to take a look?
Thanks for taking a detailed look @isnotinvain :-)
ValuesWriterFactoryParams params =
  new ValuesWriterFactoryParams(writerVersion, initialSlabSize, pageSizeThreshold, allocator,
    enableDictionary, dictionaryPageSizeThreshold);
valuesWriterFactory = writerFactory;
Nit: the convention is to use this.x when setting instance variable x.
I made a few comments, but I have two bigger issues as well:
First, What about adding the reader methods from
Second, That means we shouldn't make the config setting public or expose it through any public API. This makes more sense in an SPI. Maybe we should start a module for that? It would be good to document extension points as SPI interfaces, like the ColumnReadStore, PageReadStore, and ValuesWriterFactory.
@rdblue Thanks for taking a look.
I can take a stab at refactoring things from the
I wasn't aware of us requiring that
Shall try and think of any other potential options.
We don't want users to be able to provide their own column formats (like inventing a new storage format), but I thought the point of this PR was to allow users to plug in encoding selection strategies, including things that maybe change their mind mid-encoding using heuristics or something. I don't know whether that should be part of the public API or not; parquet doesn't actually distinguish between public and private API yet as far as I know (it would be great if it did). I think the most important thing is that parquet developers be able to easily swap in / test different encoding strategies. It's probably fine if the only way to do that is to fork parquet, as long as you don't have to mess with tons of layers of plumbing, so keeping an API like this private seems fine because it still makes experimenting w/ encodings easier for parquet developers.
If you guys are happy keeping the API private, then I think that makes the most sense. Then there's no need for the reflection or extra options in the OutputFormat.
Yeah, I don't mind going with that. I'll yank out the code to configure the ValuesWriterFactory via Hadoop config + creation with reflection. If folks want to override + test out other strategies, they can implement their own ValuesWriterFactory and update their code to use it. Still a manual step, but like Alex pointed out it should be fairly small.
@rdblue, a couple of questions on this:
Now if we want to in turn pass the ParquetProps to the factory we need to do:
@piyushnarang, #1 sounds fine. For #2, you're just recreating ParquetProperties with a different name and making the existing one a useless wrapper. I get that you pass the factory in to the properties only for it to configure the factory. That's so that we can maintain backward-compatibility. In the future, we would remove the factory methods from ParquetProperties so that isn't needed. I think this is still better.
@rdblue sounds good, we can tackle decoupling the factory methods from ParquetProperties in the future. I've put out an update that addresses your comments. Do take a look when you get the time. Thanks!
if (factory instanceof Configurable) {
  Configurable configurableFactory = (Configurable) factory;
  configurableFactory.setConf(conf);
}
We should not set the Hadoop configuration in a static member.
We could just create a new DefaultValuesWriterFactory every time instead.
possibly create method ParquetProperties.getValuesWriterFactory(Configuration conf)
Does this mean that configuration happens by subclassing ParquetOutputFormat?
Hmm, I don't think creating a ParquetProperties.getValuesWriterFactory(Configuration conf) is possible, because ParquetProperties is in parquet-column (which doesn't depend on Hadoop, so we don't have Configuration there), so we'll have to do this in ParquetOutputFormat.
Configuration happens by creating a ValuesWriterFactory that implements Configurable:
public class MyConfigurableValuesWriterFactory implements ValuesWriterFactory, Configurable {
...
}
Now when we create a new ValuesWriterFactory in getValuesWriterFactory() via getRecordWriter(...), we pass the config object to the factory there.
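For illustration, that opt-in might look roughly like the following (a sketch building on the stub above; the ValuesWriterFactory methods are omitted, and only Hadoop's Configurable methods, setConf/getConf, are shown):

```java
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.column.values.factory.ValuesWriterFactory;

// Sketch only: implementing Hadoop's Configurable is what makes the
// "factory instanceof Configurable" check in ParquetOutputFormat (shown above)
// hand the job Configuration to the factory.
public abstract class MyConfigurableValuesWriterFactory implements ValuesWriterFactory, Configurable {
  private Configuration conf;

  @Override
  public void setConf(Configuration conf) {
    this.conf = conf; // read any custom settings for this factory from the job config
  }

  @Override
  public Configuration getConf() {
    return conf;
  }

  // ValuesWriterFactory methods (initialize, newValuesWriter) omitted in this sketch.
}
```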
Don't mind updating this so that the getValuesWriterFactory method is non-static. Currently DEFAULT_VALUES_WRITER_FACTORY is a static declared in ParquetProperties, as Alex preferred that in one of the prior reviews, but we can revisit that if you feel strongly.
- since DefaultValuesWriterFactory does not implement Configurable maybe we just remove this if statement?
- my question on configuration was how you decide which ValuesWriterFactory to use.
@julienledem Yeah, so 1) that was one of the questions I posed to Ryan above (see the PR notes). Copying it again here:
- Currently the Configurable interface is present to allow folks to pass Hadoop config to the ValuesWriterFactory. It's not needed for the DefaultValuesWriterFactory, but I was thinking of leaving it in so that the hooks are in place to easily pass config while testing out ad-hoc ValuesWriterFactories. Ryan felt that it would be OK to leave it in place.
- The original approach was to configure which ValuesWriterFactory to use via Hadoop config. Something like:
parquet.writer.factory-override = "org.apache.parquet.hadoop.MyValuesWriterFactory"
In ParquetOutputFormat we were creating MyValuesWriterFactory by reflection and using that to create new ValuesWriters for various columns.
@rdblue wasn't keen on this, as ValuesWriter is supposed to be a private class internal to Parquet, so he didn't want us to make the ValuesWriterFactory configurable. So we decided to yank the configuration part of it out and leave the basic plumbing in place. Right now, if you wrote your own custom ValuesWriterFactory that you wanted to test out, you'd have to update your Parquet code base to use that ValuesWriterFactory (instead of the DefaultValuesWriterFactory) in ParquetProperties / ParquetOutputFormat. This is easier than what it was before (as then the values writer creation code was not decoupled from ParquetProperties) but not as flexible as our PR proposal initially was (being able to allow users to configure things).
thanks @piyushnarang for bringing me up to speed :)
- We should remove that if statement since in the current PR it can never be true. It did make sense if you could configure a class name instantiated with newInstance() but right now it is just dead code.
- If we add a configuration in the future, it should be expressed in terms of encodings so that only valid Parquet files can be written. Once you are happy with your current experiment I think it will become clearer what the configuration would look like.
Hmm ok, I can remove that piece of code. I'll add a comment in the ValuesWriterFactory interface to indicate that if someone wants to use Hadoop config in their factory they need to hook it up in ParquetOutputFormat, since it's not very obvious.
We've gone back and forth a bit on the desired approach.
My initial implementation expressed configuration in terms of encodings:
parquet.writer.encoding-override.<type> = "encoding1[,encoding2]"
"parquet.writer.encoding-override.int32" = "plain"
@isnotinvain suggested the reflection-based approach as it was more flexible - it allows manual experimentation as well as potentially automated encoding selection (see notes above).
Either this approach or the reflection based override method works for us right now (I have a subclass of ValuesWriterFactory that reads the encoding for a given type from config in a fork).
Anyway, I think for now we can go ahead with what we have right now (not exposing any of these in config) after I remove the Configurable code. This will help us break up some of the coupling in ParquetProperties. We could discuss which of these approaches is more appealing to various groups of Parquet users, and I'd be happy to add a PR. Let me know if this sounds reasonable.
@piyushnarang: merging the current approach sounds good to me.
Type or column based configuration can be added in a follow-up. I suspect that sometimes users might want to be able to force a specific encoding for a given column.
@julienledem - updated the PR to remove the Configurable hook.
@rdblue and @julienledem there's been a lot of back and forth on this PR, are there any remaining issues? Thanks!
+1
Thanks @julienledem. I can spin up a different thread to discuss how we want to configure type / column based configuration. Any preferences on how / where? Email (parquet-dev@) / GitHub issue / JIRA?
@piyushnarang: JIRA is good (we don't use GitHub issues).
@piyushnarang @isnotinvain @rdblue good to go?
@julienledem Ok, I'll spin up a JIRA and cc you, Alex & Ryan.
+1
@julienledem Tested this out and I think it is good to go from my end.
I will merge this, but I'd like a +1 / +0 / -0 from @rdblue first.
Ping @rdblue - can you take a look?
Will do, sorry I missed it when Alex pinged me earlier.
+1 Thanks, @piyushnarang!
@piyushnarang @isnotinvain @rdblue @hkothari
However, it's rather hard to configure this correctly in the hadoop
Instead, I have to copy and paste the entire
So... would it be possible to slightly modify the
Thoughts? Should I turn this into a new JIRA issue? This seems to be related to an existing one as well: https://issues.apache.org/jira/projects/PARQUET/issues/PARQUET-796.
(If PARQUET-796 is resolved, then that would also fix my use case, as then I wouldn't have to specify a different
Context:
Parquet is currently structured to choose the appropriate value writer based on the type of the column as well as the Parquet version. As of now, the writers (and hence the encoding) for each data type are hard-coded in the Parquet source code.
This PR adds support for being able to override the encodings per type via config. That allows users to experiment with various encoding strategies manually as well as enables them to override the hardcoded defaults if they don't suit their use case.
We can override encodings per data type (int32 / int64 / ...).
Something along the lines of:
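(This is the pattern quoted in the review discussion above:)
parquet.writer.encoding-override.<type> = "encoding1[,encoding2]"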
As an example:
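(Also quoted in the discussion above:)
"parquet.writer.encoding-override.int32" = "plain"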
When a primary + fallback need to be specified, we can do the following:
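(Illustrative only; the encoding names below are an assumed example of the primary-plus-fallback form, not taken verbatim from the PR:)
"parquet.writer.encoding-override.binary" = "plain_dictionary,plain"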
In such cases we can mandate that the first encoding listed must allow for Fallbacks by implementing RequiresFallback.
PR notes: