[SPARK-11612] [ML] Pipeline and PipelineModel persistence #9674

jkbradley · 2015-11-12T20:11:16Z

Pipeline and PipelineModel extend Readable and Writable. Persistence succeeds only when all stages are Writable.

Note: This PR reinstates tests for other read/write functionality. It should probably not get merged until [https://issues.apache.org/jira/browse/SPARK-11672] gets fixed.

CC: @mengxr
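
The rule stated in the description, that persistence succeeds only when all stages are Writable, can be sketched in isolation as below. This is a minimal illustration, not the code from this PR: `Stage`, `WritableStage`, `Tokenizer`, `CustomStage`, and `validateStages` are hypothetical stand-ins introduced here and are not Spark ML APIs.

```scala
// Hypothetical stand-ins for illustration; these are NOT the Spark ML traits.
trait Stage { def uid: String }

// A stage that knows how to persist itself.
trait WritableStage extends Stage { def save(path: String): Unit }

final case class Tokenizer(uid: String) extends WritableStage {
  def save(path: String): Unit = println(s"saving $uid to $path")
}

// A stage without save support.
final case class CustomStage(uid: String) extends Stage

object PipelineSketch {
  // Returns Left with the uids of non-writable stages, or Right with all
  // stages typed as writable when every stage supports saving.
  def validateStages(stages: Seq[Stage]): Either[Seq[String], Seq[WritableStage]] = {
    val notWritable = stages.collect {
      case s if !s.isInstanceOf[WritableStage] => s.uid
    }
    if (notWritable.nonEmpty) Left(notWritable)
    else Right(stages.collect { case w: WritableStage => w })
  }
}
```

Checking every stage before writing anything means a pipeline with one unsupported stage can be rejected up front instead of leaving a partially written directory behind.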

SparkQA · 2015-11-12T21:14:00Z

Test build #45764 has finished for PR 9674 at commit 3700091.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
  * class Pipeline(override val uid: String) extends Estimator[PipelineModel] with Writable

mengxr · 2015-11-13T21:39:04Z

test this please

SparkQA · 2015-11-13T22:28:41Z

Test build #45897 has finished for PR 9674 at commit caf57c2.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
  * class Pipeline(override val uid: String) extends Estimator[PipelineModel] with Writable

mengxr · 2015-11-16T20:05:46Z

test this please

SparkQA · 2015-11-16T21:03:59Z

Test build #46012 has finished for PR 9674 at commit caf57c2.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
  * class Pipeline(override val uid: String) extends Estimator[PipelineModel] with Writable

mengxr · 2015-11-16T21:12:57Z

mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala

Should users be able to save an incomplete pipeline? For example, I could make a template pipeline, send it to other users, and they only need to fill in some required params like inputCol after they load it back.

mengxr · 2015-11-16T21:40:41Z

One suggestion is to merge PipelineSharedWriter and PipelineSharedReader into a single object under object Pipeline, e.g., called SharedReadWrite. Then move PipelineReader and PipelineWriter to object Pipeline, and PipelineModelReader and PipelineModelWriter to object PipelineModel. The main purpose is to avoid polluting the package namespace in Java; otherwise, they are all visible under org.apache.spark.ml in Java.
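
The layout suggested in this comment can be sketched as follows. Only the class and object names come from the comment; the bodies (`describe`, `summary`) are hypothetical placeholders added for illustration and do not reflect the real reader and writer logic.

```scala
// Illustrative sketch only: bodies are hypothetical stand-ins, not Spark's code.
class Pipeline(val uid: String)

object Pipeline {
  // Logic shared by the read and write paths, nested so that it is not a
  // top-level class in the package.
  object SharedReadWrite {
    def describe(uid: String): String = s"pipeline:$uid"
  }

  class PipelineWriter(instance: Pipeline) {
    def summary: String = "write " + SharedReadWrite.describe(instance.uid)
  }

  class PipelineReader {
    def summary(uid: String): String = "read " + SharedReadWrite.describe(uid)
  }
}

class PipelineModel(val uid: String)

object PipelineModel {
  // The model-side helpers reuse the shared object defined under Pipeline.
  class PipelineModelWriter(instance: PipelineModel) {
    def summary: String = "write " + Pipeline.SharedReadWrite.describe(instance.uid)
  }

  class PipelineModelReader {
    def summary(uid: String): String = "read " + Pipeline.SharedReadWrite.describe(uid)
  }
}
```

With this nesting, callers refer to the helpers through the enclosing object, for example `new Pipeline.PipelineWriter(new Pipeline("p1"))`, so the package itself exposes only `Pipeline` and `PipelineModel` at the top level.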


jkbradley · 2015-11-16T23:08:36Z

@mengxr Thanks for reviewing! I believe I addressed everything, except where I quibbled in responses above.

mengxr · 2015-11-16T23:18:55Z

LGTM pending Jenkins.

SparkQA · 2015-11-17T00:58:19Z

Test build #46026 has finished for PR 9674 at commit f791010.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
  * class Pipeline(override val uid: String) extends Estimator[PipelineModel] with Writable

jkbradley · 2015-11-17T01:12:14Z

@mengxr Thank you for reviewing! Merging with master and branch-1.6

Pipeline and PipelineModel extend Readable and Writable. Persistence succeeds only when all stages are Writable.

Note: This PR reinstates tests for other read/write functionality. It should probably not get merged until [https://issues.apache.org/jira/browse/SPARK-11672] gets fixed.

CC: mengxr

Author: Joseph K. Bradley <[email protected]>

Closes #9674 from jkbradley/pipeline-io.

(cherry picked from commit 1c5475f)
Signed-off-by: Joseph K. Bradley <[email protected]>

jkbradley added 5 commits November 13, 2015 11:16

added save/load to logreg in spark.ml

3b92783

fixed read, write for logreg

a367e74

added Pipeline save, load but not PipelineModel

38d262c

added PipelineModel save/load

5d13393

reorder for Pipeline.scala classes

caf57c2

jkbradley force-pushed the pipeline-io branch from 3700091 to caf57c2 on November 13, 2015 19:17

mengxr reviewed Nov 16, 2015

jkbradley added 2 commits November 16, 2015 14:59

Cleanups per code review, including adding stage index to stage paths

1d1d31c

refactored Pipeline reader and writer classes to be under Pipeline, PipelineModel, to clean up namespace

f791010

asfgit closed this in 1c5475f Nov 17, 2015

jkbradley deleted the pipeline-io branch November 17, 2015 01:20
