This package provides a set of utilities to help with evaluating prescriptive problems.
The package assumes your data is in the following format:
- `X`: a `Matrix` or `DataFrame` where each row contains the covariates for one observation.
- `T`: a `Vector{Int}` giving the treatment applied to each observation. We assume the treatments are labelled as integers from 1 to the number of treatments.
- `y`: a `Vector{Float64}` giving the outcome for each observation. We adopt the convention that smaller outcomes are better.
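As a concrete illustration of this layout, the sketch below builds a small made-up dataset in plain Julia and checks the stated assumptions. The values and the helper name `check_format` are hypothetical and are not part of the package.

```julia
# Toy data in the expected layout (all values are made up for illustration).
X = [1.0 0.5;
     0.2 1.5;
     0.9 0.1;
     0.4 0.8]                 # 4 observations, 2 covariates (one row each)
T = [1, 2, 2, 1]              # treatment applied to each observation
y = [3.2, 1.7, 2.4, 2.9]      # observed outcome (smaller is better)

# Hypothetical helper: verify the shapes and the 1..m treatment labelling.
function check_format(X::AbstractMatrix, T::Vector{Int}, y::Vector{Float64})
    n = size(X, 1)
    length(T) == n || error("T must have one entry per row of X")
    length(y) == n || error("y must have one entry per row of X")
    sort(unique(T)) == 1:maximum(T) || error("treatments must be labelled 1, 2, ..., m")
    return true
end

check_format(X, T, y)
```

A check like this can catch treatments labelled from 0 or with gaps before the data reaches the package functions.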
For the test set, the counterfactuals can be imputed with the following function:
cf = getcounterfactuals(X, y, T, impute_method)
`X`, `y`, and `T` are the test data, and `impute_method` specifies the regression method to use for counterfactual estimation (one of `:knn`, `:random_forest`, or `:lasso`).
`cf` is a `Matrix{Float64}` containing the estimated outcome for each observation under each treatment.
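To make the shape of `cf` concrete, here is a minimal sketch of one way such a matrix could be imputed with nearest neighbours. It does not reproduce the package's implementation: the names `knn_estimate` and `sketch_counterfactuals` and the parameter `k` are hypothetical, and the sketch assumes rows index observations and columns index treatments.

```julia
using Statistics

# Hypothetical helper: k-nearest-neighbour estimate of the outcome at point x,
# using only the observations that actually received treatment t.
function knn_estimate(X::AbstractMatrix, y::Vector{Float64}, T::Vector{Int},
                      x::AbstractVector, t::Int; k::Int=2)
    idx = findall(==(t), T)                          # observations given treatment t
    dists = [sum(abs2, X[i, :] .- x) for i in idx]   # squared Euclidean distances
    nearest = idx[partialsortperm(dists, 1:min(k, length(idx)))]
    return mean(y[nearest])
end

# Hypothetical sketch: fill an (observations x treatments) matrix of estimates.
function sketch_counterfactuals(X::AbstractMatrix, y::Vector{Float64},
                                T::Vector{Int}; k::Int=2)
    n = size(X, 1)
    m = maximum(T)
    cf = Matrix{Float64}(undef, n, m)
    for i in 1:n, t in 1:m
        cf[i, t] = knn_estimate(X, y, T, X[i, :], t; k=k)
    end
    return cf
end

# Made-up data: one covariate, two treatments.
X = reshape([0.0, 0.1, 1.0, 1.1], 4, 1)
T = [1, 2, 1, 2]
y = [1.0, 2.0, 3.0, 4.0]
cf = sketch_counterfactuals(X, y, T; k=1)   # 4x2 matrix: [1 2; 1 2; 3 4; 3 4]
```

Each row of the result holds one observation's estimated outcome under every treatment, which is the layout the evaluation functions consume.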
Evaluate the outcomes of the current treatments on the data `X`, `y` and `T` using the counterfactuals `cf`:
baseline_outcomes = evaluatebaseline(cf, X, y, T)
`baseline_outcomes` is a `Vector{Float64}` containing the predicted outcome for each observation.
Evaluate the outcomes of a clairvoyant oracle using the counterfactuals `cf`:
oracle_outcomes, oracle_prescriptions = evaluateoracle(cf, allowed_prescriptions)
`oracle_outcomes` is a `Vector{Float64}` containing the predicted outcome for each observation. `oracle_prescriptions` is a `Vector{Int}` containing the prescribed treatment for each observation.
`allowed_prescriptions` is an optional argument that specifies, for each observation, the set of allowed treatments, so that the oracle respects any prescription rules. If not specified, we assume all treatments are available for all observations.
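The oracle idea can be sketched in a few lines: for each observation, pick the allowed treatment with the smallest counterfactual outcome. This is an illustration only, not the package's code. The name `sketch_oracle` is hypothetical, and representing the allowed treatments as a `Vector{Vector{Int}}` is an assumption, since the expected format of `allowed_prescriptions` is not stated here.

```julia
# Hypothetical sketch of a clairvoyant oracle over a counterfactual matrix
# (rows = observations, columns = treatments). By default every treatment
# is allowed for every observation.
function sketch_oracle(cf::Matrix{Float64},
                       allowed::Vector{Vector{Int}} =
                           [collect(1:size(cf, 2)) for _ in 1:size(cf, 1)])
    n = size(cf, 1)
    outcomes = Vector{Float64}(undef, n)
    prescriptions = Vector{Int}(undef, n)
    for i in 1:n
        # Smallest outcome among the allowed treatments (smaller is better).
        best, pos = findmin(cf[i, allowed[i]])
        outcomes[i] = best
        prescriptions[i] = allowed[i][pos]
    end
    return outcomes, prescriptions
end

# Made-up counterfactuals for 2 observations and 3 treatments.
cf = [2.0 1.0 3.0;
      0.5 4.0 0.7]

free_outcomes, free_prescriptions = sketch_oracle(cf)            # ([1.0, 0.5], [2, 1])
allowed = [[1, 3], [2, 3]]                                       # assumed format
rule_outcomes, rule_prescriptions = sketch_oracle(cf, allowed)   # ([2.0, 0.7], [1, 3])
```

Restricting the allowed set can only keep the oracle's outcome the same or make it worse, which is why the second call returns larger values than the first.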
Evaluate the outcomes of `prescriptions` using the counterfactuals `cf`:
predicted_outcomes = evaluateprescriptions(cf, prescriptions)
`predicted_outcomes` is a `Vector{Float64}` containing the predicted outcome for each observation.
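Conceptually, this evaluation is a lookup: for each observation, read off the counterfactual outcome under the treatment that was prescribed. The sketch below illustrates that reading under the assumption that `cf` has one row per observation and one column per treatment; `sketch_evaluate` is a hypothetical name, not a package function.

```julia
# Hypothetical sketch: outcome of each prescription according to cf.
sketch_evaluate(cf::Matrix{Float64}, prescriptions::Vector{Int}) =
    [cf[i, prescriptions[i]] for i in eachindex(prescriptions)]

# Made-up counterfactuals for 2 observations and 3 treatments.
cf = [2.0 1.0 3.0;
      0.5 4.0 0.7]

predicted = sketch_evaluate(cf, [2, 3])   # [1.0, 0.7]
```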
The package also has simple utility functions for applying regress-and-compare methods to prescription problems of the form described.
Train the regress-and-compare method (one of `:knn`, `:random_forest`, or `:lasso`) on training data `train_X`, `train_y` and `train_T`, and predict the outcomes for each treatment on testing data `test_X`:
outcomes = getoutcomes(train_X, train_y, train_T, test_X, method)
`outcomes` is a `Matrix{Float64}` containing the predicted outcome from the regress-and-compare method for each observation and treatment pair.
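The regress-and-compare idea is to fit one regression model per treatment, using only the training observations that received that treatment, and then predict every test observation's outcome under each treatment. The sketch below uses ordinary least squares as a stand-in regressor (it is not kNN, random forest, or lasso, and it is not the package's implementation). The name `sketch_outcomes` is hypothetical, and the (observations x treatments) orientation of the result is an assumption.

```julia
using LinearAlgebra

# Hypothetical sketch: one least-squares model per treatment.
function sketch_outcomes(train_X::AbstractMatrix, train_y::Vector{Float64},
                         train_T::Vector{Int}, test_X::AbstractMatrix)
    m = maximum(train_T)
    n_test = size(test_X, 1)
    outcomes = Matrix{Float64}(undef, n_test, m)
    test_design = hcat(ones(n_test), test_X)       # add an intercept column
    for t in 1:m
        rows = findall(==(t), train_T)             # observations given treatment t
        design = hcat(ones(length(rows)), train_X[rows, :])
        beta = design \ train_y[rows]              # least-squares coefficients
        outcomes[:, t] = test_design * beta        # predictions under treatment t
    end
    return outcomes
end

# Made-up data: treatment 1 follows y = 1 + x, treatment 2 follows y = 5 - x.
train_X = reshape([0.0, 1.0, 2.0, 0.0, 1.0, 2.0], 6, 1)
train_T = [1, 1, 1, 2, 2, 2]
train_y = [1.0, 2.0, 3.0, 5.0, 4.0, 3.0]
test_X  = reshape([0.5, 3.0], 2, 1)

outcomes = sketch_outcomes(train_X, train_y, train_T, test_X)
# approximately [1.5 4.5; 4.0 2.0]
```

In this toy example treatment 1 looks better for the first test point and treatment 2 for the second, which is exactly the comparison the prescription step acts on.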
Make prescriptions from the predicted regress-and-compare outcomes, subject to the `allowed_prescriptions` (see 'Oracle evaluation'):
prescriptions = makeprescriptions(outcomes, allowed_prescriptions)
`prescriptions` is a `Vector{Int}` containing the prescribed treatment for each observation.
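When every treatment is allowed, making a prescription amounts to choosing, for each observation, the treatment with the smallest predicted outcome. The sketch below shows only that unrestricted case; `sketch_prescribe` is a hypothetical name, and how the package breaks ties is not stated here (this sketch picks the lowest-numbered treatment).

```julia
# Hypothetical sketch: prescribe the treatment with the smallest predicted
# outcome in each row (rows = observations, columns = treatments).
sketch_prescribe(outcomes::Matrix{Float64}) =
    [argmin(row) for row in eachrow(outcomes)]

# Made-up predicted outcomes for 3 observations and 2 treatments.
outcomes = [1.5 4.5;
            4.0 2.0;
            3.0 3.0]

prescriptions = sketch_prescribe(outcomes)   # [1, 2, 1]
```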
Assume we have training and testing data: `train_X`, `train_y`, `train_T`, `test_X`, `test_y` and `test_T`.
We can estimate the counterfactual outcomes on the test set with kNN using:
cf = getcounterfactuals(test_X, test_y, test_T, :knn)
We can then evaluate the baseline and the oracle to obtain lower and upper bounds on performance:
baseline_outcomes = evaluatebaseline(cf, test_X, test_y, test_T)
oracle_outcomes, oracle_prescriptions = evaluateoracle(cf)
Now we can compare the various regress-and-compare methods. First kNN:
knn = getoutcomes(train_X, train_y, train_T, test_X, :knn)
knn_prescriptions = makeprescriptions(knn)
knn_outcomes = evaluateprescriptions(cf, knn_prescriptions)
Similarly for random forests:
rf = getoutcomes(train_X, train_y, train_T, test_X, :random_forest)
rf_prescriptions = makeprescriptions(rf)
rf_outcomes = evaluateprescriptions(cf, rf_prescriptions)
And for lasso regression:
lasso = getoutcomes(train_X, train_y, train_T, test_X, :lasso)
lasso_prescriptions = makeprescriptions(lasso)
lasso_outcomes = evaluateprescriptions(cf, lasso_prescriptions)
This gives a vector of estimated outcomes on the test set for the baseline, oracle, and each of the three regress-and-compare methods. We can now use whatever metrics we want to compare these approaches.
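As one possible comparison, the sketch below computes the mean outcome of each approach and the share of the baseline-to-oracle gap that a method closes. The vectors are made-up stand-ins for the results of the calls above, and `gap_closed` is a hypothetical helper rather than a package function.

```julia
using Statistics

# Made-up outcome vectors standing in for the results computed above.
toy_baseline = [3.0, 2.5, 4.0, 3.5]
toy_oracle   = [1.0, 1.5, 2.0, 1.5]
toy_knn      = [2.0, 2.5, 3.0, 2.5]

# Hypothetical metric: fraction of the baseline-to-oracle gap closed by a
# method (0 = no better than baseline, 1 = as good as the oracle).
function gap_closed(method::Vector{Float64}, baseline::Vector{Float64},
                    oracle::Vector{Float64})
    return (mean(baseline) - mean(method)) / (mean(baseline) - mean(oracle))
end

mean(toy_knn)                                  # 2.5 (smaller is better)
gap_closed(toy_knn, toy_baseline, toy_oracle)  # 3/7, roughly 0.43
```

Because smaller outcomes are better, a method improves on the baseline when its mean outcome is lower, and the oracle's mean is the best value any method can reach under the same counterfactuals.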