Classification

Classification identifies which of a set of categories a new observation belongs to, based on a training set of observations whose categories are already known.

http://spark.apache.org/docs/latest/mllib-classification-regression.html

Logistic regression

A probabilistic statistical classification model.[2] It predicts the outcome of a categorical dependent variable (i.e., a class label), most commonly a binary response, from one or more predictor variables (features). In other words, it estimates the parameters of a qualitative response model.

method: LogisticRegressionWithSGD, LogisticRegressionWithLBFGS
model: LogisticRegressionModel
ruby: classification/logistic_regression.rb

data = [
  LabeledPoint.new(0.0, [0.0, 1.0]),
  LabeledPoint.new(1.0, [1.0, 0.0]),
]
lrm = LogisticRegressionWithSGD.train($sc.parallelize(data))

lrm.predict([1.0, 0.0])
# => 1
lrm.predict([0.0, 1.0])
# => 0

lrm.clear_threshold
lrm.predict([0.0, 1.0])
# => 0.123...
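
The difference between the thresholded and unthresholded predictions above can be sketched in plain Ruby. This is only an illustration of the idea, not the library's implementation: the helper names `sigmoid` and `logistic_predict` and the weights are hypothetical, and the default threshold of 0.5 is assumed to mirror Spark MLlib's behaviour.

```ruby
# Illustrative sketch only: the weights below are made up, not learned by Spark.

# Logistic (sigmoid) function: maps any real number into (0, 1).
def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# Returns a 0/1 label when a threshold is given, or the raw
# probability when threshold is nil (analogous to clear_threshold).
def logistic_predict(weights, intercept, features, threshold = 0.5)
  margin = weights.zip(features).sum { |w, x| w * x } + intercept
  probability = sigmoid(margin)
  return probability if threshold.nil?

  probability > threshold ? 1 : 0
end

weights = [2.0, -2.0] # hypothetical weights

label       = logistic_predict(weights, 0.0, [1.0, 0.0])      # thresholded label
probability = logistic_predict(weights, 0.0, [0.0, 1.0], nil) # raw probability
```

With a threshold in place the result is a class label; without one it is the estimated probability of class 1, which is useful for ranking or for choosing a custom cut-off.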

Support Vector Machine

Support vector machines (SVMs) are supervised learning models, with associated learning algorithms, that analyze data and recognize patterns; they are used for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier.

method: SVMWithSGD
model: SVMModel
ruby: classification/svm.rb

data = [
  LabeledPoint.new(0.0, [0.0]),
  LabeledPoint.new(1.0, [1.0]),
  LabeledPoint.new(1.0, [2.0]),
  LabeledPoint.new(1.0, [3.0])
]
svm = SVMWithSGD.train($sc.parallelize(data))

svm.predict([1.0])
# => 1
svm.clear_threshold
svm.predict([1.0])
# => 1.25...
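
As a rough illustration of what "non-probabilistic binary linear classifier" means, the sketch below computes a linear decision value and the hinge loss that SVM training typically minimizes. The helper names, weights, and intercept are hypothetical, and the default threshold of 0.0 is an assumption modelled on Spark MLlib; this is not how the library is implemented internally.

```ruby
# Illustrative sketch only: weights and intercept are made up, not learned by Spark.

# Signed score of a point relative to the separating hyperplane: w . x + b
def decision_value(weights, intercept, features)
  weights.zip(features).sum { |w, x| w * x } + intercept
end

# Returns a 0/1 label, or the raw margin when threshold is nil
# (analogous to calling clear_threshold on the model).
def svm_predict(weights, intercept, features, threshold = 0.0)
  margin = decision_value(weights, intercept, features)
  return margin if threshold.nil?

  margin > threshold ? 1 : 0
end

# Hinge loss for a 0/1 label: zero once the point is on the correct
# side of the hyperplane with a margin of at least 1.
def hinge_loss(label, margin)
  y = 2.0 * label - 1.0 # map {0, 1} to {-1, +1}
  [0.0, 1.0 - y * margin].max
end

weights   = [0.5] # hypothetical
intercept = -0.25 # hypothetical

svm_label  = svm_predict(weights, intercept, [1.0])      # thresholded label
svm_margin = svm_predict(weights, intercept, [3.0], nil) # raw margin
loss       = hinge_loss(1.0, svm_margin)
```

Unlike the logistic regression score, the raw margin is not a probability: it is an unbounded signed value whose sign gives the side of the hyperplane.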

Naive Bayes

Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.

method: NaiveBayes
model: NaiveBayesModel
ruby: classification/naive_bayes.rb

data = [
  LabeledPoint.new(0.0, [0.0, 0.0]),
  LabeledPoint.new(0.0, [0.0, 1.0]),
  LabeledPoint.new(1.0, [1.0, 0.0])
]
model = NaiveBayes.train($sc.parallelize(data))

model.predict([0.0, 1.0])
# => 0.0
model.predict([1.0, 0.0])
# => 1.0
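
To make the independence assumption concrete, here is a minimal plain-Ruby sketch of a multinomial Naive Bayes classifier with additive (Laplace) smoothing, run on the same three points as above. The function names and the smoothing value of 1.0 are assumptions made for illustration; this is not the code the library runs.

```ruby
# Illustrative sketch only: a tiny multinomial Naive Bayes, not the Spark implementation.

# data is an array of [label, features] pairs.
# Returns { label => { log_prior:, log_likelihoods: } }.
def train_naive_bayes(data, smoothing = 1.0)
  labels = data.map(&:first).uniq
  num_features = data.first.last.size

  labels.map do |label|
    rows = data.select { |l, _| l == label }.map(&:last)
    # Smoothed class prior.
    log_prior = Math.log((rows.size + smoothing) / (data.size + smoothing * labels.size))
    # Per-feature totals within this class, smoothed and normalized.
    feature_sums = rows.transpose.map(&:sum)
    total = feature_sums.sum + smoothing * num_features
    log_likelihoods = feature_sums.map { |s| Math.log((s + smoothing) / total) }
    [label, { log_prior: log_prior, log_likelihoods: log_likelihoods }]
  end.to_h
end

# Picks the label with the highest log posterior. Features contribute
# independently, so their log-likelihoods are simply added up.
def predict_naive_bayes(model, features)
  model.max_by do |_, params|
    params[:log_prior] + params[:log_likelihoods].zip(features).sum { |ll, x| ll * x }
  end.first
end

training = [
  [0.0, [0.0, 0.0]],
  [0.0, [0.0, 1.0]],
  [1.0, [1.0, 0.0]]
]
nb = train_naive_bayes(training)

first_prediction  = predict_naive_bayes(nb, [0.0, 1.0])
second_prediction = predict_naive_bayes(nb, [1.0, 0.0])
```

Working in log space avoids numerical underflow when many feature probabilities are multiplied together.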
