[SPARK-20961][SQL] generalize the dictionary in ColumnVector #18183
cloud-fan wants to merge 3 commits into apache:master
Test build #77667 has finished for PR 18183 at commit
 * limitations under the License.
 */

package org.apache.spark.sql.execution.vectorized;
Should this move to org.apache.spark.sql.execution.vectorized.parquet package?
package org.apache.spark.sql.execution.vectorized;

public interface Dictionary {
Do we need some JavaDoc since it is public?
+1 for JavaDoc.
@cloud-fan, do we need the `@InterfaceStability.Evolving` annotation, too?
No, it's not public yet, as it's under the execution package. I'll add JavaDoc.
package org.apache.spark.sql.execution.vectorized;

public final class ParquetDictionary implements Dictionary {
  private org.apache.parquet.column.Dictionary dictionary;
Would it be better to declare `import org.apache.parquet.column.Dictionary`?
The name conflicts with the new `Dictionary` interface in this package.
Test build #77679 has finished for PR 18183 at commit
/**
 * The general interface for dictionary in `ColumnVector`, defines how to decode a dictionary id to
 * actual value.
How about?
The interface for dictionary in `ColumnVector` to decode dictionary based encodings
LGTM except a minor comment
Test build #77722 has finished for PR 18183 at commit
Thanks! Merging to master.
What changes were proposed in this pull request?
As the first step of https://issues.apache.org/jira/browse/SPARK-20960, which aims to make `ColumnVector` public, this PR generalizes `ColumnVector.dictionary` so that it is no longer coupled to Parquet.
How was this patch tested?
Existing tests.
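To illustrate the decoupling described above, here is a minimal, self-contained sketch of a format-agnostic dictionary interface with one non-Parquet implementation. It is not the code merged in this PR: the method names `decodeToInt` and `decodeToLong`, the `IntArrayDictionary` class, and the `decodeAll` helper are assumptions made for illustration only.

```java
import java.util.Arrays;

public class DictionarySketch {

  /** Hypothetical stand-in for the format-agnostic interface: maps a dictionary id to a value. */
  public interface Dictionary {
    int decodeToInt(int id);
    long decodeToLong(int id);
  }

  /** Hypothetical array-backed dictionary with no dependency on Parquet. */
  public static final class IntArrayDictionary implements Dictionary {
    private final int[] values;

    public IntArrayDictionary(int[] values) {
      // Copy defensively so later changes to the caller's array do not alter the dictionary.
      this.values = values.clone();
    }

    @Override
    public int decodeToInt(int id) {
      return values[id];
    }

    @Override
    public long decodeToLong(int id) {
      return values[id];
    }
  }

  /** Decodes a column of dictionary ids using only the interface, not a concrete format. */
  public static int[] decodeAll(Dictionary dictionary, int[] ids) {
    int[] decoded = new int[ids.length];
    for (int i = 0; i < ids.length; i++) {
      decoded[i] = dictionary.decodeToInt(ids[i]);
    }
    return decoded;
  }

  public static void main(String[] args) {
    Dictionary dictionary = new IntArrayDictionary(new int[] {10, 20, 30});
    int[] decoded = decodeAll(dictionary, new int[] {2, 0, 0, 1});
    System.out.println(Arrays.toString(decoded)); // prints [30, 10, 10, 20]
  }
}
```

Because `decodeAll` depends only on the interface, a Parquet-backed implementation such as the `ParquetDictionary` wrapper shown in the diff could be passed in the same way without the caller importing any Parquet class.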