@zhengruifeng
Contributor

What changes were proposed in this pull request?

Support decision trees in SparkR.

How was this patch tested?

Added tests.

@SparkQA

SparkQA commented May 15, 2017

Test build #76926 has finished for PR 17981 at commit 68041a0.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 15, 2017

Test build #76927 has finished for PR 17981 at commit 7e383a2.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng
Contributor Author

Jenkins, retest this please

@SparkQA

SparkQA commented May 15, 2017

Test build #76930 has finished for PR 17981 at commit 7e383a2.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 15, 2017

Test build #76938 has finished for PR 17981 at commit 0ba00db.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng
Contributor Author

@felixcheung I am sending this PR, which follows your implementation of RandomForest. Could you help review it?

Member

2.3.0 please

Member

and all other instances...

Member

please keep this sorted

Member

keep this sorted

Member

nit: remove double empty line

@zhengruifeng
Contributor Author

@felixcheung Updated. Thanks for reviewing!

@SparkQA

SparkQA commented May 16, 2017

Test build #76966 has finished for PR 17981 at commit b4a68fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

#' @param cacheNodeIds If FALSE, the algorithm will pass trees to executors to match instances with
#' nodes. If TRUE, the algorithm will cache node IDs for each instance. Caching
#' can speed up training of deeper trees. Users can set how often should the
#' cache be checkpointed or disable it by setting checkpointInterval.
Contributor

This is kind of confusing:

> Users can set how often should the cache be checkpointed

Member

The wording could be improved a bit, I guess, but this matches the Scaladoc.
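The two settings described in the quoted doc comment could be exercised roughly as follows. This is a minimal sketch, not code from the patch: it assumes a local Spark installation with SparkR 2.3.0 or later, and the checkpoint path and the use of R's built-in `longley` dataset are illustrative choices. As far as the Scaladoc indicates, `checkpointInterval` only matters when `cacheNodeIds = TRUE` and a checkpoint directory has been set.

```r
library(SparkR)

# Assumes a local Spark installation that SparkR can locate (e.g. via SPARK_HOME).
sparkR.session(master = "local[2]")

# Hypothetical checkpoint location; without a checkpoint directory the
# checkpointInterval setting is ignored.
setCheckpointDir(file.path(tempdir(), "dt-checkpoints"))

df <- createDataFrame(longley)

# cacheNodeIds = TRUE caches the node ID of each instance instead of passing
# trees to executors. checkpointInterval = 10 asks for that cache to be
# checkpointed every 10 iterations; -1 would disable checkpointing.
model <- spark.decisionTree(df, Employed ~ ., type = "regression",
                            maxDepth = 10, maxBins = 16,
                            cacheNodeIds = TRUE, checkpointInterval = 10)

predictions <- predict(model, df)
head(select(predictions, "Employed", "prediction"))

sparkR.session.stop()
```

Put differently, `cacheNodeIds` switches the cache on, and `checkpointInterval` controls how often that cache is checkpointed, which may be a clearer way to read the sentence flagged above.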

Member

@felixcheung left a comment

LGTM.

function(data, formula, type = c("regression", "classification"),
         maxDepth = 5, maxBins = 32, impurity = NULL, seed = NULL,
         minInstancesPerNode = 1, minInfoGain = 0.0, checkpointInterval = 10,
         maxMemoryInMB = 256, cacheNodeIds = FALSE) {
Member

Consider adding a thresholds parameter, possibly in a follow-up PR.
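A call against the signature quoted above might look like the following sketch. It assumes a local Spark installation with SparkR 2.3.0 or later; the dataset (R's built-in `Titanic` table), the formula, and the seed value are illustrative assumptions. No `thresholds` argument is shown because the quoted signature does not include one.

```r
library(SparkR)

# Assumes a local Spark installation that SparkR can locate (e.g. via SPARK_HOME).
sparkR.session(master = "local[2]")

# Titanic is a built-in R contingency table; flatten it to a data.frame first.
df <- createDataFrame(as.data.frame(Titanic))

# type is passed explicitly rather than relying on the first element of the
# default vector. maxDepth, maxBins, minInstancesPerNode and minInfoGain
# restate the defaults from the signature; seed = 42 is an arbitrary value
# chosen only to make the run reproducible.
model <- spark.decisionTree(df, Survived ~ Freq + Age, type = "classification",
                            maxDepth = 5, maxBins = 32, seed = 42,
                            minInstancesPerNode = 1, minInfoGain = 0.0)

summary(model)
predictions <- predict(model, df)
head(select(predictions, "Survived", "prediction"))

sparkR.session.stop()
```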

@felixcheung
Member

Any more comments?

@felixcheung
Member

Merged to master. Thanks!

@asfgit asfgit closed this in 4be3375 May 22, 2017
@zhengruifeng zhengruifeng deleted the dt_r branch May 23, 2017 01:44
lycplus pushed a commit to lycplus/spark that referenced this pull request May 24, 2017
## What changes were proposed in this pull request?
support decision tree in R

## How was this patch tested?
added tests

Author: Zheng RuiFeng <[email protected]>

Closes apache#17981 from zhengruifeng/dt_r.