Skip to content

[SPARK-20849][DOC][SPARKR] Document R DecisionTree#18067

Closed
zhengruifeng wants to merge 8 commits intoapache:masterfrom
zhengruifeng:dt_example
Closed

[SPARK-20849][DOC][SPARKR] Document R DecisionTree#18067
zhengruifeng wants to merge 8 commits intoapache:masterfrom
zhengruifeng:dt_example

Conversation

@zhengruifeng
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

1, add an example for sparkr decisionTree
2, document it in user guide

How was this patch tested?

local submit

Comment thread R/pkg/vignettes/sparkr-vignettes.Rmd Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say try to use a data set without . in column name if you can.
Probably would be confusion when examples are causing warnings when users run them

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, I think there's a confusion - I don't mean to change not to use . in formula
I mean the reason why we have warning=FALSE here is because createDataFrame(longley) will cause a warning because it has column with name with . in it. And we should avoid that if we can

@SparkQA
Copy link
Copy Markdown

SparkQA commented May 23, 2017

Test build #77226 has finished for PR 18067 at commit f43ebe0.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link
Copy Markdown

SparkQA commented May 23, 2017

Test build #77230 has started for PR 18067 at commit 65cf494.

@zhengruifeng
Copy link
Copy Markdown
Contributor Author

Jenkins, retests this please

@zhengruifeng
Copy link
Copy Markdown
Contributor Author

Jenkins, retest this please

@SparkQA
Copy link
Copy Markdown

SparkQA commented May 23, 2017

Test build #77239 has finished for PR 18067 at commit 65cf494.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng
Copy link
Copy Markdown
Contributor Author

@felixcheung Updated. By the way, I update other formulas in sparkr-vignettes.Rmd.

Copy link
Copy Markdown
Member

@felixcheung felixcheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then could you check if we could remove
, warning=FALSE?

Comment thread R/pkg/vignettes/sparkr-vignettes.Rmd Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, I think there's a confusion - I don't mean to change not to use . in formula
I mean the reason why we have warning=FALSE here is because createDataFrame(longley) will cause a warning because it has column with name with . in it. And we should avoid that if we can

@SparkQA
Copy link
Copy Markdown

SparkQA commented May 25, 2017

Test build #77325 has finished for PR 18067 at commit f413081.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link
Copy Markdown

SparkQA commented May 25, 2017

Test build #77324 has finished for PR 18067 at commit 76a9726.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Comment thread R/pkg/vignettes/sparkr-vignettes.Rmd Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as commented, before, please check. I'm pretty sure createDataFrame(longley) will cause a warning

longley
     GNP.deflator     GNP Unemployed Armed.Forces Population Year Employed
1947         83.0 234.289      235.6        159.0    107.608 1947   60.323
1948         88.5 259.426      232.5        145.6    108.632 1948   61.122

so our options are:

  • don't use longley (my earlier suggestion)
  • use longley but keep warning=FALSE

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

option 2: do you mean using ````{r, warning=FALSE}` like other examples?
I think both are OK,.
which do you prefer?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - but as mentioned, if you can think of a data set that doesn't have dot in column name, like as.data.frame(Titanic)

@SparkQA
Copy link
Copy Markdown

SparkQA commented May 25, 2017

Test build #77349 has finished for PR 18067 at commit 48a9686.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link
Copy Markdown

SparkQA commented May 25, 2017

Test build #77351 has finished for PR 18067 at commit 1a97e42.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Copy Markdown
Member

why change to classification for trees?

@zhengruifeng
Copy link
Copy Markdown
Contributor Author

@felixcheung just because dataset Titanic is often used to illustrate classification. Like the usage in Kaggle contest[https://www.kaggle.com/c/titanic#evaluation].

@felixcheung
Copy link
Copy Markdown
Member

merged to master

@asfgit asfgit closed this in a97c497 May 26, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants