[SPARK-16431] [ML] Add a unified method that accepts single instances to feature transformers and predictors #14101
Closed
hazimehh wants to merge 3 commits into apache:master
Conversation
Can one of the admins verify this patch?
Author
@rxin @jkbradley @mengxr can you review this?
Contributor
I don't know ML that well.
Member
I just responded on the main JIRA. Can you please check that out and close this issue for now? Thanks!
vanzin pushed a commit to vanzin/spark that referenced this pull request on Aug 4, 2016
Closing the following PRs due to requests or unresponsive users.
Closes apache#13923
Closes apache#14462
Closes apache#13123
Closes apache#14423 (requested by srowen)
Closes apache#14424 (requested by srowen)
Closes apache#14101 (requested by jkbradley)
Closes apache#10676 (requested by srowen)
Closes apache#10943 (requested by yhuai)
Closes apache#9936
Closes apache#10701
Summary
Add a new transformation method, `transformInstance`, that operates on single instances. This method can reduce the latency of predictions by 200x for typical ML tasks, which facilitates serving models in production. See the JIRA ticket for details.
What changes were proposed in this pull request?
Current feature transformers in spark.ml can only operate on DataFrames and have no method that accepts a single instance. A typical transformer defines a user-defined function (udf) in its `transform` method, and that udf contains the set of operations applied to the features of a single instance.
Adding a new method called `transformInstance` that operates directly on single instances, and calling it from the udf instead, makes that logic available for single instances.
Predictors also lack a public method that makes predictions on single instances. `transformInstance` can easily be added to predictors as a wrapper around the internal method `predict` (which takes features as input).
The proposed method is added to all predictors and feature transformers except OneHotEncoder, VectorSlicer, and Word2Vec, which might require bigger changes because they depend on the dataset's schema. (They can be fixed using simple hacks, but this needs to be discussed.)
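The pattern described above can be sketched without Spark. This is a minimal illustration, not the code from this patch: a plain `Seq` stands in for a DataFrame, and the names `InstanceTransformer`, `Scaler`, and `LinearPredictor` are hypothetical. Only `transformInstance`, `transform`, and `predict` come from the description.

```scala
// Plain-Scala sketch: Seq[Vec] stands in for a DataFrame so this runs without Spark.
object TransformInstanceSketch {
  type Vec = Array[Double]

  // Hypothetical trait: a feature transformer with a single-instance entry point.
  trait InstanceTransformer {
    def transformInstance(features: Vec): Vec

    // The batch path reuses the single-instance logic, the way the udf
    // inside transform would call transformInstance in the proposal.
    def transform(dataset: Seq[Vec]): Seq[Vec] = dataset.map(transformInstance)
  }

  // Toy transformer (assumption): scales every feature by a constant factor.
  class Scaler(factor: Double) extends InstanceTransformer {
    def transformInstance(features: Vec): Vec = features.map(_ * factor)
  }

  // Toy predictor (assumption): transformInstance is a thin public wrapper
  // around the internal predict method, which takes features as input.
  class LinearPredictor(weights: Vec, intercept: Double) {
    protected def predict(features: Vec): Double =
      features.zip(weights).map { case (x, w) => x * w }.sum + intercept

    def transformInstance(features: Vec): Double = predict(features)
  }

  def main(args: Array[String]): Unit = {
    val scaler = new Scaler(2.0)
    val model = new LinearPredictor(Array(0.5, 0.25), 1.0)

    // Serve one instance directly, with no dataset construction involved.
    val scaled = scaler.transformInstance(Array(1.0, 2.0))
    println(scaled.mkString(", "))
    println(model.transformInstance(scaled))
  }
}
```

Keeping the batch `transform` defined in terms of `transformInstance` means both paths share one implementation, which is consistent with the existing transformer tests exercising the new method indirectly.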
How was this patch tested?
The current tests for transformers and predictors, which invoke `transformInstance` internally, passed.