[SPARK-20849][DOC][SPARKR] Document R DecisionTree by zhengruifeng · Pull Request #18067 · apache/spark

zhengruifeng · 2017-05-23T05:47:46Z

What changes were proposed in this pull request?

1. Add an example for SparkR decisionTree
2. Document it in the user guide

How was this patch tested?

Submitted the example locally.

felixcheung · 2017-05-23T06:09:19Z

I'd suggest using a data set without `.` in its column names if you can.
It would probably confuse users if the examples produced warnings when they ran them.


SparkQA · 2017-05-23T06:30:37Z

Test build #77226 has finished for PR 18067 at commit f43ebe0.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-05-23T06:42:38Z

Test build #77230 has started for PR 18067 at commit 65cf494.

zhengruifeng · 2017-05-23T07:07:17Z

Jenkins, retests this please

zhengruifeng · 2017-05-23T08:02:46Z

Jenkins, retest this please

SparkQA · 2017-05-23T08:44:20Z

Test build #77239 has finished for PR 18067 at commit 65cf494.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

zhengruifeng · 2017-05-23T08:46:32Z

@felixcheung Updated. By the way, I also updated the other formulas in sparkr-vignettes.Rmd.

felixcheung

OK, then could you check whether we can remove `, warning=FALSE`?

felixcheung · 2017-05-24T17:28:47Z

Actually, I think there's some confusion: I didn't mean that you should stop using `.` in the formula.
I mean that the reason we have `warning=FALSE` here is that `createDataFrame(longley)` causes a warning, because `longley` has column names containing `.`. We should avoid that if we can.

SparkQA · 2017-05-25T02:31:02Z

Test build #77325 has finished for PR 18067 at commit f413081.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-05-25T02:35:42Z

Test build #77324 has finished for PR 18067 at commit 76a9726.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2017-05-25T03:27:00Z

As commented before, please check: I'm pretty sure `createDataFrame(longley)` will cause a warning.

    longley
         GNP.deflator     GNP Unemployed Armed.Forces Population Year Employed
    1947         83.0 234.289      235.6        159.0    107.608 1947   60.323
    1948         88.5 259.426      232.5        145.6    108.632 1948   61.122

So our options are:

1. don't use longley (my earlier suggestion)
2. use longley but keep warning=FALSE

Option 2: do you mean using a `{r, warning=FALSE}` chunk header like the other examples?
I think both are OK.
Which do you prefer?

Yes, but as mentioned, it would be better if you can find a data set that doesn't have a dot in its column names, like as.data.frame(Titanic).
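The difference between the two candidate data sets can be checked in plain R before any SparkR call. This is a minimal sketch, not part of the PR: `has_dot` is a hypothetical helper introduced here, and the code only inspects the column names of the two built-in data sets; it does not call `createDataFrame`.

```r
# Both data sets ship with base R's "datasets" package, so no Spark is needed here.
# Hypothetical helper: TRUE for each column name that contains a literal ".".
has_dot <- function(df) grepl(".", names(df), fixed = TRUE)

# longley: some column names contain ".", e.g. "GNP.deflator" and "Armed.Forces"
longley_dotted <- names(longley)[has_dot(longley)]
print(longley_dotted)

# as.data.frame(Titanic): columns are Class, Sex, Age, Survived, Freq (no dots)
titanic <- as.data.frame(Titanic)
print(names(titanic))
print(any(has_dot(titanic)))
```

With `longley` the check flags `GNP.deflator` and `Armed.Forces`, while every column of `as.data.frame(Titanic)` passes, which is consistent with the suggestion to switch data sets.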

SparkQA · 2017-05-25T08:30:26Z

Test build #77349 has finished for PR 18067 at commit 48a9686.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-05-25T10:49:37Z

Test build #77351 has finished for PR 18067 at commit 1a97e42.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2017-05-25T17:05:13Z

Why change the tree example to classification?

zhengruifeng · 2017-05-26T01:05:57Z

@felixcheung Just because the Titanic data set is often used to illustrate classification, as in the Kaggle competition (https://www.kaggle.com/c/titanic#evaluation).

felixcheung · 2017-05-26T06:01:03Z

merged to master

felixcheung reviewed May 23, 2017

felixcheung reviewed May 24, 2017

felixcheung requested changes May 24, 2017

zhengruifeng force-pushed the dt_example branch from 65cf494 to 76a9726 on May 25, 2017 01:46

felixcheung reviewed May 25, 2017

zhengruifeng added 7 commits May 25, 2017 15:17

- create pr (d13ef8d)
- update vignettes (c988ec4)
- update sparkr.md (1003b49)
- fix bug & update r formula (377588e)
- update test with warning (6c07cff)
- revert one nit (40cc6be)
- use Titanic instead of longley (48a9686)

zhengruifeng force-pushed the dt_example branch from f413081 to 48a9686 on May 25, 2017 07:38

- use cla instead of reg (1a97e42)

felixcheung approved these changes May 26, 2017

asfgit closed this in a97c497 May 26, 2017

zhengruifeng deleted the dt_example branch May 26, 2017 06:28

zhengruifeng mentioned this pull request Dec 13, 2017

[SPARK-20849][DOC][FOLLOWUP] Document R DecisionTree - Link Classification Example #19963

Closed
