Skip to content
Ondřej Moravčík edited this page Mar 25, 2015 · 4 revisions

Identifying to which of a set of categories a new observation belongs, on the basis of a training set of data.

Logistic regression

Type of probabilistic statistical classification model.[2] It is also used to predict a binary response from a binary predictor, used for predicting the outcome of a categorical dependent variable (i.e., a class label) based on one or more predictor variables (features). That is, it is used in estimating the parameters of a qualitative response model.

data = [
  LabeledPoint.new(0.0, [0.0, 1.0]),
  LabeledPoint.new(1.0, [1.0, 0.0]),
]
lrm = LogisticRegressionWithSGD.train($sc.parallelize(data))

lrm.predict([1.0, 0.0])
# => 1
lrm.predict([0.0, 1.0])
# => 0

lrm.clear_threshold
lrm.predict([0.0, 1.0])
# => 0.123...

Support Vector Machine

Supervised learning models are associated with learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples into one category or the other, making it a non-probabilistic binary linear classifier

data = [
  LabeledPoint.new(0.0, [0.0]),
  LabeledPoint.new(1.0, [1.0]),
  LabeledPoint.new(1.0, [2.0]),
  LabeledPoint.new(1.0, [3.0])
]
svm = SVMWithSGD.train($sc.parallelize(data))

svm.predict([1.0])
# => 1
svm.clear_threshold
svm.predict([1.0])
# => 1.25...

NaiveBayes

Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.

data = [
  LabeledPoint.new(0.0, [0.0, 0.0]),
  LabeledPoint.new(0.0, [0.0, 1.0]),
  LabeledPoint.new(1.0, [1.0, 0.0])
]
model = NaiveBayes.train($sc.parallelize(data))

model.predict([0.0, 1.0])
# => 0.0
model.predict([1.0, 0.0])
# => 1.0
Clone this wiki locally