Update abstract PyFunc to utilise returned treatment_config #164
Context
Turing SDK offers an abstract class `PyFunc` from which the concrete `PyFuncEnsembler` is implemented. In particular, this `PyFunc` class implements the abstract method `predict` from its parent `mlflow.pyfunc.PythonModel`, which gets called whenever the `mlflow` model is used to generate predictions (or in this case, ensembling results).

This `predict` method partially implements the batch ensembling logic, i.e. it performs some simple `DataFrame` manipulations by separating a single input `DataFrame` into `DataFrames` corresponding to `features`, `predictions` and `treatment_config`, that subsequently get passed to a user-defined `ensemble` method, which contains the rest (and the crux) of the ensembling logic.

Previously, the `treatment_config` field has always been unused, since in batch ensembling, a treatment can equivalently be defined in the features columns, hence the `None` value being passed to `ensemble` when it gets called by `predict`.

However, as we are moving to implement real-time pyfunc ensemblers that ultimately use the same `PyFuncEnsembler` class (which would allow a user to use the very same ensembler in both batch and real-time ensembling, when the same batch columns/live payload naming convention is used), there is a need to handle any `treatment_config` that appears in a live-ensembling request.

This PR thus aims to expose this `treatment_config` to the user in the `ensemble` method (which has previously always been a null object), allowing users to manipulate the payload from a live Turing router with a user-defined ensembling method in a more intuitive manner.

Extra Context
In batch ensembling, `predictions` are currently passed to `predict` through the `model_input` argument. These predictions are contained in columns with the `__predictions__` header, due to a tabular join operation upstream by other batch ensembling components. In order to not break the existing naming convention of columns in `model_input`, the future engine for the real-time ensembler will similarly pass `predictions` as well as `treatment_config` as columns in `model_input` with the same naming convention.

Features
- Separates the `treatment_config` data that gets passed to `predict` as part of the `model_input` argument into a separate `DataFrame`, which then gets passed to the `ensemble` method as the `treatment_config` argument.
- Moves the logic that extracts columns with a `__[prefix_name]__` prefix into a static method.

Modifications
- `sdk/turing/ensembler.py` - addition of logic to consider `treatment_config` being passed to `predict` as part of the `model_input` argument
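
To make the flow described above concrete, here is a minimal sketch of how `predict` might split `model_input` by column prefix and hand the pieces to `ensemble`. It is not the code in `sdk/turing/ensembler.py`: the class names `SketchPyFunc` and `RouteByTreatment`, the helper `_get_columns_with_prefix`, the `__treatment_config__` prefix, and the `model` treatment field are all assumptions made for illustration, and the sketch skips the `mlflow.pyfunc.PythonModel` parent so that it only needs pandas.

```python
from typing import Any, Optional

import pandas as pd

# `__predictions__` comes from the PR description; the treatment-config
# prefix is an assumption that simply follows the same pattern.
PREDICTION_PREFIX = "__predictions__"
TREATMENT_CONFIG_PREFIX = "__treatment_config__"


class SketchPyFunc:
    """Hypothetical stand-in for the abstract PyFunc class (no mlflow here)."""

    @staticmethod
    def _get_columns_with_prefix(df: pd.DataFrame, prefix: str) -> pd.DataFrame:
        """Select the columns whose names start with `prefix` and strip it."""
        selected = [col for col in df.columns if col.startswith(prefix)]
        return df[selected].rename(columns=lambda col: col[len(prefix):])

    def predict(self, context: Any, model_input: pd.DataFrame) -> pd.Series:
        predictions = self._get_columns_with_prefix(model_input, PREDICTION_PREFIX)
        treatment_config: Optional[pd.DataFrame] = self._get_columns_with_prefix(
            model_input, TREATMENT_CONFIG_PREFIX
        )
        # Assumption: batch inputs without treatment columns keep the old
        # behaviour of passing None to `ensemble`.
        if treatment_config.shape[1] == 0:
            treatment_config = None

        # Everything that carries neither prefix is treated as a feature.
        feature_cols = [
            col
            for col in model_input.columns
            if not col.startswith((PREDICTION_PREFIX, TREATMENT_CONFIG_PREFIX))
        ]
        features = model_input[feature_cols]
        return self.ensemble(features, predictions, treatment_config)

    def ensemble(
        self,
        features: pd.DataFrame,
        predictions: pd.DataFrame,
        treatment_config: Optional[pd.DataFrame],
    ) -> pd.Series:
        raise NotImplementedError("user-defined ensembling logic goes here")


class RouteByTreatment(SketchPyFunc):
    """Example user ensembler that works for batch and live-style inputs."""

    def ensemble(self, features, predictions, treatment_config):
        if treatment_config is None:
            # No treatment supplied: fall back to averaging all predictions.
            return predictions.mean(axis=1)
        # Hypothetical `model` field names the prediction column to return.
        chosen = [
            predictions.at[idx, name]
            for idx, name in treatment_config["model"].items()
        ]
        return pd.Series(chosen, index=predictions.index)


# Batch-style input: predictions only, no treatment columns.
batch = pd.DataFrame(
    {
        "customer_id": [1, 2],
        "__predictions__model_a": [0.25, 0.5],
        "__predictions__model_b": [0.75, 1.0],
    }
)

# Live-style input: same naming convention plus a treatment-config column.
live = batch.copy()
live["__treatment_config__model"] = ["model_b", "model_a"]

if __name__ == "__main__":
    ensembler = RouteByTreatment()
    print(ensembler.predict(None, batch))
    print(ensembler.predict(None, live))
```

Because both inputs share one column naming convention, the same `ensemble` implementation serves either path; only the presence of the treatment columns changes what the user code receives.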