[SPARK-13553][SPARK-13554][SQL] Migrates basic inspection and typed relational operations from DataFrame to Dataset#11431
Conversation
There was a problem hiding this comment.
This change is included in PR #11421. Without this fix, we can't use case classes defined in SQLTestData as Dataset element type.
f38c016 to
ee9c432
Compare
|
Test build #52187 has finished for PR 11431 at commit
|
|
Test build #52188 has finished for PR 11431 at commit
|
|
Can you rebase to get rid of the merged commit? |
…lational Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
|
@liancheng I just pushed two commits to resolve the conflicts. |
|
Test build #52191 has finished for PR 11431 at commit
|
| val sum = weights.sum | ||
| val normalizedCumWeights = weights.map(_ / sum).scanLeft(0.0d)(_ + _) | ||
| normalizedCumWeights.sliding(2).map { x => | ||
| new Dataset(sqlContext, Sample(x(0), x(1), withReplacement = false, seed, sorted)()) |
There was a problem hiding this comment.
Do we need to pass encoder into newly created Datasets at here?
There was a problem hiding this comment.
No, there's an implicit encoder defined in the constructor of Dataset.
|
|
||
| /** | ||
| * Returns all column names and their data types as an array. | ||
| * @since 2.0.0 |
There was a problem hiding this comment.
If the API is moved from DataFrame, should we also copy the @since? cc @rxin
There was a problem hiding this comment.
I'd put 2.0 since it didn't exist on dataset before.
There was a problem hiding this comment.
Since DataFrame will be an alias of Dataset, what will the doc for DataFrame looks like?
There was a problem hiding this comment.
If we copy the versions, It's also weird that see a method of Dataset (1.3) is introduced before Dataset is introduced (1.6).
|
Closing this in favor of #11443. |
What changes were proposed in this pull request?
This PR migrates basic inspection and typed relational operations from DataFrame to Dataset. This is the first step of unifying DataFrame and Dataset API.
TODO
explodeoperations.How was this patch tested?
Corresponding test cases are migrated from
DataFrameSuitetoDatasetSuite. These newly added test cases all share the same "df-to-ds" prefix so that we can easily execute them under SBT using:This prefix will be removed after migrating all the DataFrame operations.