Closed
Changes from 15 commits
Commits
25 commits
3bf2718
add trait offset
actuaryzhang Jan 24, 2017
0e240eb
add offset setter
actuaryzhang Jan 24, 2017
9c41453
implement offset in GLM
actuaryzhang Jan 25, 2017
7823f8a
add test for glm with offset
actuaryzhang Jan 25, 2017
a1f5695
minor cleanup
actuaryzhang Jan 25, 2017
d071b95
add doc for GLRInstance
actuaryzhang Jan 25, 2017
d2afcb0
remove offset from shared param
actuaryzhang Jan 25, 2017
9eca1a6
fix style issue
actuaryzhang Jan 25, 2017
d44974c
rename to OffsetInstance and add param check
actuaryzhang Jan 25, 2017
9c320ee
create separate instance definition when initializing
actuaryzhang Jan 26, 2017
e183c08
fix style in test
actuaryzhang Jan 26, 2017
58f93af
resolve conflict
actuaryzhang Jan 27, 2017
da4174a
add test for tweedie
actuaryzhang Jan 27, 2017
52bc32b
cast offset and add in instrumentation
actuaryzhang Jan 28, 2017
59e10f7
update var name
actuaryzhang Jan 30, 2017
1d41bdd
add test for intercept only
actuaryzhang Feb 8, 2017
fb372ad
update test
actuaryzhang Feb 8, 2017
2bc3ae7
pull and merge
actuaryzhang Feb 8, 2017
afb4643
implement null dev for offset model
actuaryzhang Feb 9, 2017
fc64d32
fix null deviance calculation and add tests
actuaryzhang Feb 10, 2017
90d68a6
allow missing offset in prediction
actuaryzhang Feb 14, 2017
e95c25b
clean up
actuaryzhang Feb 14, 2017
4b336be
Merge branch 'master' of https://github.com/apache/spark into offset
actuaryzhang Feb 14, 2017
1e47a11
address comments
actuaryzhang Jun 27, 2017
db0ac93
address comments
actuaryzhang Jun 29, 2017
22 changes: 22 additions & 0 deletions mllib/src/main/scala/org/apache/spark/ml/feature/Instance.scala
@@ -27,3 +27,25 @@ import org.apache.spark.ml.linalg.Vector
* @param features The vector of features for this data point.
*/
private[ml] case class Instance(label: Double, weight: Double, features: Vector)

/**
* Case class that represents an instance of data point with
* label, weight, offset and features.
Contributor

Add doc: "This is mainly used in GeneralizedLinearRegression currently."

*
* @param label Label for this data point.
* @param weight The weight of this instance.
* @param offset The offset used for this data point.
* @param features The vector of features for this data point.
*/
private[ml] case class OffsetInstance(label: Double, weight: Double, offset: Double,
Contributor

I don't have a strong preference, but maybe calling it GLMInstance is clearer?

Contributor

Also, we should be using Spark style indentation:

private[ml] case class OffsetInstance(
    label: Double,
    weight: Double,
    offset: Double,
    features: Vector) {

Contributor Author

Updated the style.
I think I'll stick with OffsetInstance. There may be other models in the future that use offset.

features: Vector) {

/** Constructs from an [[Instance]] object and offset */
def this(instance: Instance, offset: Double = 0.0) = {
Contributor

Remove it if it is no longer used.

this(instance.label, instance.weight, offset, instance.features)
}

/** Converts to an [[Instance]] object by leaving out the offset. */
private[ml] def toInstance: Instance = Instance(label, weight, features)
Contributor

It looks like this method is only used once, in a test case. Might it be better to remove it?

Contributor

actually, another alternative might be to make OffsetInstance inherit from Instance (which I wrote below as well). What do you think about this idea?

Contributor Author

Yes, this is only used once in the current code, and I can get rid of it. But I feel other regression-type models may use offset at some point and having this method will make it easier to switch between Instance and OffsetInstance.

Contributor

Remove private[ml] since you have marked the whole class as private[ml].


}
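The conversion discussed in this thread can be sketched outside Spark. The following is a simplified stand-in, assuming features are a plain Array[Double] rather than org.apache.spark.ml.linalg.Vector, and using a hypothetical companion method fromInstance in place of the auxiliary constructor:

```scala
// Standalone stand-ins for the Spark classes (assumption: Array[Double] features).
final case class Instance(label: Double, weight: Double, features: Array[Double])

final case class OffsetInstance(
    label: Double,
    weight: Double,
    offset: Double,
    features: Array[Double]) {

  /** Drops the offset, mirroring toInstance in the diff. */
  def toInstance: Instance = Instance(label, weight, features)
}

object OffsetInstance {
  /** Hypothetical helper: lifts an Instance, defaulting the offset to 0.0. */
  def fromInstance(instance: Instance, offset: Double = 0.0): OffsetInstance =
    OffsetInstance(instance.label, instance.weight, offset, instance.features)
}

val base = Instance(1.0, 2.0, Array(3.0, 4.0))
val lifted = OffsetInstance.fromInstance(base)
val roundTrip = lifted.toInstance
```

Keeping the two types separate, as the author prefers, means code that never needs an offset (such as WeightedLeastSquares) keeps working on Instance unchanged.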
@@ -18,7 +18,7 @@
package org.apache.spark.ml.optim

import org.apache.spark.internal.Logging
- import org.apache.spark.ml.feature.Instance
+ import org.apache.spark.ml.feature.{Instance, OffsetInstance}
import org.apache.spark.ml.linalg._
import org.apache.spark.rdd.RDD

@@ -43,7 +43,7 @@ private[ml] class IterativelyReweightedLeastSquaresModel(
* find M-estimator in robust regression and other optimization problems.
*
* @param initialModel the initial guess model.
- * @param reweightFunc the reweight function which is used to update offsets and weights
+ * @param reweightFunc the reweight function which is used to update working labels and weights
* at each iteration.
* @param fitIntercept whether to fit intercept.
* @param regParam L2 regularization parameter used by WLS.
@@ -57,13 +57,13 @@ private[ml] class IterativelyReweightedLeastSquaresModel(
*/
private[ml] class IterativelyReweightedLeastSquares(
val initialModel: WeightedLeastSquaresModel,
- val reweightFunc: (Instance, WeightedLeastSquaresModel) => (Double, Double),
+ val reweightFunc: (OffsetInstance, WeightedLeastSquaresModel) => (Double, Double),
val fitIntercept: Boolean,
val regParam: Double,
val maxIter: Int,
val tol: Double) extends Logging with Serializable {

- def fit(instances: RDD[Instance]): IterativelyReweightedLeastSquaresModel = {
+ def fit(instances: RDD[OffsetInstance]): IterativelyReweightedLeastSquaresModel = {

var converged = false
var iter = 0
@@ -75,10 +75,10 @@ private[ml] class IterativelyReweightedLeastSquares(

oldModel = model

- // Update offsets and weights using reweightFunc
+ // Update working labels and weights using reweightFunc
val newInstances = instances.map { instance =>
- val (newOffset, newWeight) = reweightFunc(instance, oldModel)
- Instance(newOffset, newWeight, instance.features)
+ val (newLabel, newWeight) = reweightFunc(instance, oldModel)
+ Instance(newLabel, newWeight, instance.features)
}

// Estimate new model
@@ -24,7 +24,7 @@ import org.apache.spark.SparkException
import org.apache.spark.annotation.{Experimental, Since}
import org.apache.spark.internal.Logging
import org.apache.spark.ml.PredictorParams
- import org.apache.spark.ml.feature.Instance
+ import org.apache.spark.ml.feature.{Instance, OffsetInstance}
import org.apache.spark.ml.linalg.{BLAS, Vector}
import org.apache.spark.ml.optim._
import org.apache.spark.ml.param._
@@ -134,6 +134,17 @@ private[regression] trait GeneralizedLinearRegressionBase extends PredictorParam
@Since("2.0.0")
def getLinkPredictionCol: String = $(linkPredictionCol)

/**
* Param for offset column name. If this is not set or empty, we treat all
* instance offsets as 0.0.
* @group param
*/
final val offsetCol: Param[String] = new Param[String](this, "offsetCol", "The offset " +
Contributor

@Since("2.3.0")

"column name. If this is not set or empty, we treat all instance offsets as 0.0")

/** @group getParam */
def getOffsetCol: String = $(offsetCol)
Contributor

it looks like you will need to update the validateAndTransformSchema method below to validate these parameters - eg check if the column exists? (similar to what the base class does for features/label columns)

Contributor

thanks for fixing!

Contributor

@Since("2.3.0")


/** Checks whether we should output link prediction. */
private[regression] def hasLinkPredictionCol: Boolean = {
isDefined(linkPredictionCol) && $(linkPredictionCol).nonEmpty
@@ -168,6 +179,9 @@ private[regression] trait GeneralizedLinearRegressionBase extends PredictorParam
}

val newSchema = super.validateAndTransformSchema(schema, fitting, featuresDataType)
if (isSet(offsetCol) && $(offsetCol).nonEmpty) {
SchemaUtils.checkNumericType(schema, $(offsetCol))
}
Contributor

We need to check the numeric type for both fit and transform, since the offset column is also used for model prediction when applicable.

if (hasLinkPredictionCol) {
SchemaUtils.appendColumn(newSchema, $(linkPredictionCol), DoubleType)
} else {
@@ -302,6 +316,17 @@ class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val
@Since("2.0.0")
def setWeightCol(value: String): this.type = set(weightCol, value)

/**
* Sets the value of param [[offsetCol]].
* The feature specified as offset has a constant coefficient of 1.0.
Contributor

Move this line to the param doc. We usually keep the most complete doc in the param annotation; for the set method, we can just say "Sets the value of param [[offsetCol]]".

* If this is not set or empty, we treat all instance offsets as 0.0.
* Default is not set, so all instances have offset 0.0.
*
* @group setParam
*/
@Since("2.2.0")
Contributor

2.2.0 -> 2.3.0

def setOffsetCol(value: String): this.type = set(offsetCol, value)
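
To make the "constant coefficient of 1.0" statement concrete, the linear predictor with an offset can be written as below. The symbols $o_i$ (offset) and $E_i$ (exposure) are notation introduced here for illustration; the exposure case is one common use of an offset under a log link, not something this PR prescribes.

```latex
\begin{aligned}
\eta_i &= \beta_0 + x_i^{\top}\beta + o_i, \qquad \mu_i = g^{-1}(\eta_i) \\
% Log link with o_i = \log E_i:
\mu_i &= \exp\!\left(\beta_0 + x_i^{\top}\beta + \log E_i\right)
       = E_i \exp\!\left(\beta_0 + x_i^{\top}\beta\right)
\end{aligned}
```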

/**
* Sets the solver algorithm used for optimization.
* Currently only supports "irls" which is also the default solver.
@@ -325,7 +350,7 @@ class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val

val numFeatures = dataset.select(col($(featuresCol))).first().getAs[Vector](0).size
val instr = Instrumentation.create(this, dataset)
- instr.logParams(labelCol, featuresCol, weightCol, predictionCol, linkPredictionCol,
+ instr.logParams(labelCol, featuresCol, weightCol, offsetCol, predictionCol, linkPredictionCol,
family, solver, fitIntercept, link, maxIter, regParam, tol)
instr.logNumFeatures(numFeatures)

@@ -336,14 +361,19 @@ class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val
}

val w = if (!isDefined(weightCol) || $(weightCol).isEmpty) lit(1.0) else col($(weightCol))
- val instances: RDD[Instance] =
- dataset.select(col($(labelCol)), w, col($(featuresCol))).rdd.map {
- case Row(label: Double, weight: Double, features: Vector) =>
- Instance(label, weight, features)
- }
+ val offset = if (!isDefined(offsetCol) || $(offsetCol).isEmpty) {
+ lit(0.0)
+ } else {
+ col($(offsetCol)).cast(DoubleType)
+ }
Contributor

minor style comment - I think these 4 lines here need to be indented with 2 more spaces:
lit(0.0)
} else {
col($(offsetCol)).cast(DoubleType)
}

Contributor Author

I'm following the style in existing code, for example:

  lazy val rank: Long = if (model.getFitIntercept) {
    model.coefficients.size + 1
  } else {
    model.coefficients.size
  }


val model = if (familyAndLink.family == Gaussian && familyAndLink.link == Identity) {
// TODO: Make standardizeFeatures and standardizeLabel configurable.
val instances: RDD[Instance] =
dataset.select(col($(labelCol)), w, offset, col($(featuresCol))).rdd.map {
case Row(label: Double, weight: Double, offset: Double, features: Vector) =>
Instance(label - offset, weight, features)
}
val optimizer = new WeightedLeastSquares($(fitIntercept), $(regParam), elasticNetParam = 0.0,
standardizeFeatures = true, standardizeLabel = true)
val wlsModel = optimizer.fit(instances)
Contributor

What do you think of adding a new interface fit(instances: RDD[OffsetInstance]) for WeightedLeastSquares? Then we can remove some redundant code.

Contributor Author

I would suggest we leave WeightedLeastSquares as is, since it is a general-purpose optimization tool and offset is more specific to GLM. I have not seen a weighted least squares implementation that supports offset.

We discussed something related above. I originally defined val instances: RDD[OffsetInstance] outside the if/else and then converted it to RDD[Instance] for the Gaussian family with identity link. But this incurs one extra map, and there was some concern that this could be expensive. However, if the extra conversion is not a big deal, I can revert to that approach, which is basically the implementation of the OffsetInstance interface for WeightedLeastSquares.
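
For the Gaussian family with identity link, the offset can simply be subtracted from the label before calling the existing weighted least squares solver, which is what Instance(label - offset, weight, features) does in the diff. A minimal sketch of why this works, using a hypothetical closed-form fitWls for one feature plus intercept (not Spark's WeightedLeastSquares) and made-up data:

```scala
/** Weighted simple linear regression (intercept + one slope) in closed form. */
def fitWls(x: Array[Double], y: Array[Double], w: Array[Double]): (Double, Double) = {
  val sw = w.sum
  val xBar = x.zip(w).map { case (xi, wi) => xi * wi }.sum / sw
  val yBar = y.zip(w).map { case (yi, wi) => yi * wi }.sum / sw
  val sxy = x.indices.map(i => w(i) * (x(i) - xBar) * (y(i) - yBar)).sum
  val sxx = x.indices.map(i => w(i) * (x(i) - xBar) * (x(i) - xBar)).sum
  val slope = sxy / sxx
  (yBar - slope * xBar, slope)
}

val x = Array(1.0, 2.0, 3.0, 4.0)
val offset = Array(0.5, -1.0, 2.0, 0.0)
val w = Array(1.0, 2.0, 3.0, 4.0)
// Labels are generated as 2 + 3 * x + offset, so a fit on the
// offset-adjusted labels should recover intercept 2 and slope 3.
val y = x.indices.map(i => 2.0 + 3.0 * x(i) + offset(i)).toArray
val adjusted = y.indices.map(i => y(i) - offset(i)).toArray
val (intercept, slope) = fitWls(x, adjusted, w)
```

Because the identity link makes the model linear in the label, no change to the solver itself is needed, which supports leaving WeightedLeastSquares untouched.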

@@ -354,6 +384,11 @@ class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val
wlsModel.diagInvAtWA.toArray, 1, getSolver)
model.setSummary(Some(trainingSummary))
} else {
val instances: RDD[OffsetInstance] =
dataset.select(col($(labelCol)), w, offset, col($(featuresCol))).rdd.map {
case Row(label: Double, weight: Double, offset: Double, features: Vector) =>
OffsetInstance(label, weight, offset, features)
}
// Fit Generalized Linear Model by iteratively reweighted least squares (IRLS).
val initialModel = familyAndLink.initialize(instances, $(fitIntercept), $(regParam))
val optimizer = new IterativelyReweightedLeastSquares(initialModel,
@@ -417,12 +452,12 @@ object GeneralizedLinearRegression extends DefaultParamsReadable[GeneralizedLine
* Get the initial guess model for [[IterativelyReweightedLeastSquares]].
*/
def initialize(
- instances: RDD[Instance],
+ instances: RDD[OffsetInstance],
fitIntercept: Boolean,
regParam: Double): WeightedLeastSquaresModel = {
val newInstances = instances.map { instance =>
val mu = family.initialize(instance.label, instance.weight)
- val eta = predict(mu)
+ val eta = predict(mu) - instance.offset
Instance(eta, instance.weight, instance.features)
}
// TODO: Make standardizeFeatures and standardizeLabel configurable.
@@ -436,13 +471,13 @@ object GeneralizedLinearRegression extends DefaultParamsReadable[GeneralizedLine
* The reweight function used to update offsets and weights
Contributor

update offsets -> update working labels

* at each iteration of [[IterativelyReweightedLeastSquares]].
*/
- val reweightFunc: (Instance, WeightedLeastSquaresModel) => (Double, Double) = {
- (instance: Instance, model: WeightedLeastSquaresModel) => {
+ val reweightFunc: (OffsetInstance, WeightedLeastSquaresModel) => (Double, Double) = {
+ (instance: OffsetInstance, model: WeightedLeastSquaresModel) => {
val eta = model.predict(instance.features)
- val mu = fitted(eta)
- val offset = eta + (instance.label - mu) * link.deriv(mu)
- val weight = instance.weight / (math.pow(this.link.deriv(mu), 2.0) * family.variance(mu))
- (offset, weight)
+ val mu = fitted(eta + instance.offset)
+ val newLabel = eta + (instance.label - mu) * link.deriv(mu)
+ val newWeight = instance.weight / (math.pow(this.link.deriv(mu), 2.0) * family.variance(mu))
+ (newLabel, newWeight)
}
}
}
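
Written out, the updated reweightFunc above computes the following per instance. The notation is assumed here for illustration: $o_i$ is the offset, $g$ the link function (so $g'$ corresponds to link.deriv), and $V$ the variance function (family.variance).

```latex
\begin{aligned}
\eta_i &= \beta_0 + x_i^{\top}\beta && \text{(model.predict, offset excluded)} \\
\mu_i &= g^{-1}(\eta_i + o_i) \\
z_i &= \eta_i + (y_i - \mu_i)\, g'(\mu_i) \\
\tilde{w}_i &= \frac{w_i}{g'(\mu_i)^{2}\, V(\mu_i)}
\end{aligned}
```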
@@ -940,15 +975,27 @@ class GeneralizedLinearRegressionModel private[ml] (
private lazy val familyAndLink = FamilyAndLink(this)

override protected def predict(features: Vector): Double = {
- val eta = predictLink(features)
+ if (!isSet(offsetCol) || $(offsetCol).isEmpty) {
+ val eta = BLAS.dot(features, coefficients) + intercept
+ familyAndLink.fitted(eta)
+ } else {
+ throw new SparkException("Must supply offset to predict when offset column is set.")
+ }
}

/**
* Calculates the predicted value when offset is set.
*/
protected def predict(features: Vector, offset: Double): Double = {
Contributor

Why make it protected?

val eta = predictLink(features, offset)
familyAndLink.fitted(eta)
}

/**
- * Calculate the link prediction (linear predictor) of the given instance.
+ * Calculates the link prediction (linear predictor) of the given instance.
*/
- private def predictLink(features: Vector): Double = {
- BLAS.dot(features, coefficients) + intercept
+ private def predictLink(features: Vector, offset: Double): Double = {
+ BLAS.dot(features, coefficients) + intercept + offset
}

override def transform(dataset: Dataset[_]): DataFrame = {
@@ -957,14 +1004,19 @@ class GeneralizedLinearRegressionModel private[ml] (
}

override protected def transformImpl(dataset: Dataset[_]): DataFrame = {
Contributor

I summarized all four cases for making predictions as follows:

| Estimator (training data) | Transformer (prediction data) | How R predicts | How Spark predicts |
|---|---|---|---|
| w/ offset column | w/ offset column | use offset of prediction data | use offset of prediction data |
| w/ offset column | w/o offset column | use offset of training data | not use offset |
| w/o offset column | w/ offset column | not use offset | not use offset |
| w/o offset column | w/o offset column | not use offset | not use offset |

Cases 1 and 4 are not controversial.
For case 2, we handle it differently because we can't store all the offset data in our model the way R does, but we should print a warning log to let users know that this differs from R.
For case 3, your current implementation ignores whether the model was trained with an offset. I think this is worth discussing: the correct behavior should take into account whether the model was trained with an offset. If it was trained without one, we should ignore the offset column when making predictions on a new dataset, or at least print a warning to remind users.
However, I think we can discuss and resolve this issue in follow-up work. @actuaryzhang What do you think of my proposal for how Spark makes predictions? Thanks.

Contributor Author

Thanks for summarizing the different cases. I think this is worth a deeper discussion as follow-up work. Let me work on this in another PR.
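
The Spark column of the reviewer's table can be written as a small decision function. This is a sketch of the reviewer's proposal only, not of the PR's current behavior (which, according to the comment, ignores whether the model was trained with an offset); predictionOffset, trainedWithOffset, and rowOffset are hypothetical names introduced here.

```scala
/**
 * Offset to apply to one row at prediction time under the proposed rules.
 *
 * @param trainedWithOffset whether the estimator was fitted with an offset column
 * @param rowOffset the row's offset value, or None if the prediction data has no offset column
 */
def predictionOffset(trainedWithOffset: Boolean, rowOffset: Option[Double]): Double =
  (trainedWithOffset, rowOffset) match {
    case (true, Some(o)) => o   // case 1: use the offset of the prediction data
    case (true, None)    => 0.0 // case 2: no offset (R would reuse the training offsets)
    case (false, _)      => 0.0 // cases 3 and 4: model has no offset, so ignore any column
  }

val case1 = predictionOffset(trainedWithOffset = true, rowOffset = Some(0.3))
val case3 = predictionOffset(trainedWithOffset = false, rowOffset = Some(0.3))
```

Cases 2 and 3 are the ones where the reviewer suggests logging a warning, since the result either differs from R or silently drops a column the user supplied.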

- val predictUDF = udf { (features: Vector) => predict(features) }
- val predictLinkUDF = udf { (features: Vector) => predictLink(features) }
+ val predictUDF = udf { (features: Vector, offset: Double) => predict(features, offset) }
+ val predictLinkUDF = udf { (features: Vector, offset: Double) => predictLink(features, offset) }
+ val offset = if (!isSet(offsetCol) || $(offsetCol).isEmpty) {
+ lit(0.0)
+ } else {
+ col($(offsetCol)).cast(DoubleType)
+ }
Contributor

Can we simplify it as follows?

val offset = if (!isSetOffsetCol(this)) lit(0.0) else col($(offsetCol)).cast(DoubleType)

Here it's not necessary to run checkNumericType, since it has already been checked in validateAndTransformSchema.

var output = dataset
if ($(predictionCol).nonEmpty) {
- output = output.withColumn($(predictionCol), predictUDF(col($(featuresCol))))
+ output = output.withColumn($(predictionCol), predictUDF(col($(featuresCol)), offset))
}
if (hasLinkPredictionCol) {
- output = output.withColumn($(linkPredictionCol), predictLinkUDF(col($(featuresCol))))
+ output = output.withColumn($(linkPredictionCol), predictLinkUDF(col($(featuresCol)), offset))
}
output.toDF()
}
@@ -18,16 +18,16 @@
package org.apache.spark.ml.optim

import org.apache.spark.SparkFunSuite
- import org.apache.spark.ml.feature.Instance
+ import org.apache.spark.ml.feature.{Instance, OffsetInstance}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.util.TestingUtils._
import org.apache.spark.mllib.util.MLlibTestSparkContext
import org.apache.spark.rdd.RDD

class IterativelyReweightedLeastSquaresSuite extends SparkFunSuite with MLlibTestSparkContext {

- private var instances1: RDD[Instance] = _
- private var instances2: RDD[Instance] = _
+ private var instances1: RDD[OffsetInstance] = _
+ private var instances2: RDD[OffsetInstance] = _

override def beforeAll(): Unit = {
super.beforeAll()
@@ -43,7 +43,7 @@ class IterativelyReweightedLeastSquaresSuite extends SparkFunSuite with MLlibTes
Instance(0.0, 2.0, Vectors.dense(1.0, 2.0)),
Instance(1.0, 3.0, Vectors.dense(2.0, 1.0)),
Instance(0.0, 4.0, Vectors.dense(3.0, 3.0))
- ), 2)
+ ), 2).map(new OffsetInstance(_))
Contributor

Please construct OffsetInstance with offset = 0.0 directly; it will make the code easier to understand.

/*
R code:

@@ -56,7 +56,7 @@ class IterativelyReweightedLeastSquaresSuite extends SparkFunSuite with MLlibTes
Instance(8.0, 2.0, Vectors.dense(1.0, 7.0)),
Instance(3.0, 3.0, Vectors.dense(2.0, 11.0)),
Instance(9.0, 4.0, Vectors.dense(3.0, 13.0))
- ), 2)
+ ), 2).map(new OffsetInstance(_))
}

test("IRLS against GLM with Binomial errors") {
@@ -156,7 +156,7 @@ class IterativelyReweightedLeastSquaresSuite extends SparkFunSuite with MLlibTes
var idx = 0
for (fitIntercept <- Seq(false, true)) {
val initial = new WeightedLeastSquares(fitIntercept, regParam = 0.0, elasticNetParam = 0.0,
- standardizeFeatures = false, standardizeLabel = false).fit(instances2)
+ standardizeFeatures = false, standardizeLabel = false).fit(instances2.map(_.toInstance))
Contributor

See my above comment about adding interface fit(instances: RDD[OffsetInstance]).

val irls = new IterativelyReweightedLeastSquares(initial, L1RegressionReweightFunc,
fitIntercept, regParam = 0.0, maxIter = 200, tol = 1e-7).fit(instances2)
val actual = Vectors.dense(irls.intercept, irls.coefficients(0), irls.coefficients(1))
@@ -169,29 +169,29 @@ class IterativelyReweightedLeastSquaresSuite extends SparkFunSuite with MLlibTes
object IterativelyReweightedLeastSquaresSuite {

def BinomialReweightFunc(
- instance: Instance,
+ instance: OffsetInstance,
model: WeightedLeastSquaresModel): (Double, Double) = {
- val eta = model.predict(instance.features)
+ val eta = model.predict(instance.features) + instance.offset
val mu = 1.0 / (1.0 + math.exp(-1.0 * eta))
- val z = eta + (instance.label - mu) / (mu * (1.0 - mu))
+ val z = eta - instance.offset + (instance.label - mu) / (mu * (1.0 - mu))
Contributor

Why - instance.offset? I suspect it's wrong; the test doesn't fail because the offset is zero.

Contributor Author

Indeed this is the correct implementation: in the IRWLS, we only include offset when computing mu and use Xb (without offset) when updating the working label. To see this clearly, one would have to derive the IRWLS. But for a quick reference, below is R's implementation:

eta <- drop(x %*% start)
mu <- linkinv(eta <- eta + offset)
z <- (eta - offset)[good] + (y - mu)[good]/mu.eta.val[good]
w <- sqrt((weights[good] * mu.eta.val[good]^2)/variance(mu)[good])
fit <- .Call(C_Cdqrls, x[good, , drop = FALSE] * 
              w, z * w, min(1e-07, control$epsilon/1000), check = FALSE)
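
The working-label update discussed in this thread can be checked on a toy problem. The following is a minimal sketch, not the PR's code: an intercept-only Poisson model with log link, where fitPoissonIntercept and the sample data are hypothetical names and values introduced here. The intercept-only case is chosen because its maximum-likelihood estimate has a closed form, log(sum(w * y) / sum(w * exp(offset))), which the IRLS result can be compared against.

```scala
/** Intercept-only Poisson regression (log link) fitted by IRLS with a per-row offset. */
def fitPoissonIntercept(
    y: Array[Double],
    weight: Array[Double],
    offset: Array[Double],
    maxIter: Int = 100,
    tol: Double = 1e-12): Double = {
  var beta = 0.0
  var converged = false
  var iter = 0
  while (iter < maxIter && !converged) {
    var sumWz = 0.0
    var sumW = 0.0
    for (i <- y.indices) {
      val eta = beta + offset(i)                   // linear predictor, offset included
      val mu = math.exp(eta)                       // inverse log link
      val z = (eta - offset(i)) + (y(i) - mu) / mu // working label, offset excluded
      val w = weight(i) * mu                       // working weight
      sumWz += w * z
      sumW += w
    }
    val next = sumWz / sumW // weighted least squares with only an intercept
    converged = math.abs(next - beta) < tol
    beta = next
    iter += 1
  }
  beta
}

val y = Array(2.0, 3.0, 6.0, 7.0)
val weight = Array(1.0, 2.0, 3.0, 4.0)
val offset = Array(0.0, 0.5, 1.0, 1.5)
val beta = fitPoissonIntercept(y, weight, offset)
// Closed-form MLE for the intercept-only case, used as a cross-check.
val expected = math.log(
  y.zip(weight).map { case (yi, wi) => yi * wi }.sum /
    offset.zip(weight).map { case (oi, wi) => wi * math.exp(oi) }.sum)
```

If the working label kept the offset (z built from eta rather than eta - offset), the fixed point of the iteration would in general no longer match the closed-form estimate when offsets are nonzero, which is why a test with all-zero offsets cannot detect the difference.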

val w = mu * (1 - mu) * instance.weight
(z, w)
}

def PoissonReweightFunc(
- instance: Instance,
+ instance: OffsetInstance,
model: WeightedLeastSquaresModel): (Double, Double) = {
- val eta = model.predict(instance.features)
+ val eta = model.predict(instance.features) + instance.offset
val mu = math.exp(eta)
- val z = eta + (instance.label - mu) / mu
+ val z = eta - instance.offset + (instance.label - mu) / mu
val w = mu * instance.weight
(z, w)
}

def L1RegressionReweightFunc(
- instance: Instance,
+ instance: OffsetInstance,
model: WeightedLeastSquaresModel): (Double, Double) = {
- val eta = model.predict(instance.features)
+ val eta = model.predict(instance.features) + instance.offset
val e = math.max(math.abs(eta - instance.label), 1e-7)
val w = 1 / e
val y = instance.label