Skip to content

[SPARK-20961][SQL] generalize the dictionary in ColumnVector#18183

Closed
cloud-fan wants to merge 3 commits intoapache:masterfrom
cloud-fan:dictionary
Closed

[SPARK-20961][SQL] generalize the dictionary in ColumnVector#18183
cloud-fan wants to merge 3 commits intoapache:masterfrom
cloud-fan:dictionary

Conversation

@cloud-fan
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

As the first step of https://issues.apache.org/jira/browse/SPARK-20960 , to make ColumnVector public, this PR generalize ColumnVector.dictionary to not couple with parquet.

How was this patch tested?

existing tests

@cloud-fan
Copy link
Copy Markdown
Contributor Author

@SparkQA
Copy link
Copy Markdown

SparkQA commented Jun 2, 2017

Test build #77667 has finished for PR 18183 at commit b9c1a06.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public final class ParquetDictionary implements Dictionary

* limitations under the License.
*/

package org.apache.spark.sql.execution.vectorized;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this move to org.apache.spark.sql.execution.vectorized.parquet package?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


package org.apache.spark.sql.execution.vectorized;

public interface Dictionary {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need some JavaDoc since it is public?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for JavaDoc.
@cloud-fan , do we need @InterfaceStability.Evolving annotation, too?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it's not public yet, as it's under execution package. I'll add java doc

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see.

package org.apache.spark.sql.execution.vectorized;

public final class ParquetDictionary implements Dictionary {
private org.apache.parquet.column.Dictionary dictionary;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to declare import org.apache.parquet.column.Dictionary?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name is conflict

@SparkQA
Copy link
Copy Markdown

SparkQA commented Jun 2, 2017

Test build #77679 has finished for PR 18183 at commit cc7e782.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


/**
* The general interface for dictionary in `ColumnVector`, defines how to decode a dictionary id to
* actual value.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about?

The interface for dictionary in ColumnVector to decode dictionary based encodings

@gatorsmile
Copy link
Copy Markdown
Member

LGTM except a minor comment

@SparkQA
Copy link
Copy Markdown

SparkQA commented Jun 4, 2017

Test build #77722 has finished for PR 18183 at commit 88a0b24.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Copy Markdown
Member

Thanks! Merging to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants