Iterative Proportional Fitting (IPF) is a technique to create sample weights such that the data is representative of the target dataset or distributions. In statistics, a sample statistic is biased if the sample is not representative of the population. In an ML/AB-testing context, the impact of a treatment in an experiment (treatment vs. control), or the impact of an attribute in an "explainable" model, would be biased/incorrect if the distribution of the sample over features is not accurate.
Often, with a designed experiment, the effect of a treatment (vs. control) can be measured using randomised controlled trials. However, it might not always be feasible to divide traffic/users/devices/entities in this manner.
- The outcome/target/metric could be driven by treatment-vs-control differences. Besides this, there could be other dimensions that affect the outcome: exogenous effect variables.
- In some cases there might also be "endogenous / confounder" variables.
- In such instances, one could follow matching/propensity-scoring techniques so that the treatment-vs-control effect can be measured after accounting for the exogenous effect and endogenous confounder variables/dimensions (a minimal propensity-weighting sketch follows this list).
- For many linear problems, econometric modeling methods such as ANOVA/ANCOVA [linear models] can be used. More recent advances, such as counterfactual techniques and conditional average treatment effect (CATE) methods, could also be applied.
- To your armoury of techniques, add the iterative proportional fitting method too.
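As a quick illustration of the propensity-scoring idea from the list above, here is a minimal inverse-propensity-weighting (IPW) sketch; the data-generating process and all variable names are made up for the example:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
# Treatment assignment depends on the confounders (no randomisation).
p_treat = 1 / (1 + np.exp(-(0.8 * x1 - 0.5 * x2)))
t = rng.binomial(1, p_treat)
# Outcome: true treatment effect is 2.0, plus confounder effects.
y = 2.0 * t + 1.5 * x1 + 1.0 * x2 + rng.normal(size=n)
X = np.column_stack([x1, x2])

# Propensity model: P(treated | confounders).
ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]
w = np.where(t == 1, 1 / ps, 1 / (1 - ps))  # inverse-propensity weights

naive = y[t == 1].mean() - y[t == 0].mean()
ipw = (np.average(y[t == 1], weights=w[t == 1])
       - np.average(y[t == 0], weights=w[t == 0]))
print(f"naive: {naive:.2f}  IPW: {ipw:.2f}  (true effect: 2.0)")
```

The naive difference is inflated because treatment assignment is correlated with the confounders; weighting by the inverse propensity recovers an estimate close to the true effect.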
**KEY POINTS**
- It is simple, efficient, and can be used to match unconventional distributions.
- It can be used to match non-parametric, discretised distributions of features in the sample.
- You may not know the joint distribution of the features. You can use IPF to match the "marginal distributions" (see the sketch after this list).
- Note that a "feasible set" of weights might not exist in some scenarios.
  - For example, assume there are two dimensions which are (very highly) negatively correlated.
  - But the user provides target distributions for dimensions 1 and 2 that imply a high positive correlation.
  - A "feasible set" of weights might not exist. Or the weights could be highly skewed, so that a few rows in the data carry very large weight and most rows have very small (close to zero) weight.
- So choose targets wisely.
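A minimal sketch of the core IPF (raking) loop, using hypothetical features `gender` and `region` with illustrative targets; this is not the notebook's IPF class, just the underlying idea:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Dummy sample whose raw marginals do NOT match the targets below.
sample = pd.DataFrame({
    "gender": rng.choice(["F", "M"], size=1000, p=[0.3, 0.7]),
    "region": rng.choice(["N", "S", "E"], size=1000, p=[0.5, 0.3, 0.2]),
})
# User-supplied target marginal distributions (each sums to 1).
targets = {
    "gender": {"F": 0.5, "M": 0.5},
    "region": {"N": 0.4, "S": 0.4, "E": 0.2},
}

w = np.ones(len(sample))
for _ in range(50):                    # raking iterations
    for col, target in targets.items():
        # Current weighted marginal of this feature.
        current = pd.Series(w).groupby(sample[col].values).sum() / w.sum()
        # Rescale each row's weight by (target share / current share).
        ratio = {k: target[k] / current[k] for k in target}
        w = w * sample[col].map(ratio).values
    w = w * len(sample) / w.sum()      # keep the mean weight at 1

# Verify the fitted weighted marginals against the targets.
for col, target in targets.items():
    fitted = pd.Series(w).groupby(sample[col].values).sum() / w.sum()
    print(col, fitted.round(3).to_dict(), "target:", target)
```

Each pass scales the weights so one feature's weighted marginal matches its target; cycling over the features converges when a feasible set of weights exists.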
The notebook below does the following:
- Creates targets
- Creates a dummy sample
- IPF class
  - trains weights
  - computes the standardized mean difference between the target and simulated sample distributions [raw distributions]
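As a rough illustration of the SMD check in the last step, here is a self-contained sketch; the feature, weights, and target below are stand-ins for the notebook's trained IPF weights:

```python
import numpy as np
import pandas as pd

def smd_proportions(p_sample: float, p_target: float) -> float:
    """SMD between two proportions, using the pooled standard deviation."""
    pooled_var = (p_sample * (1 - p_sample) + p_target * (1 - p_target)) / 2
    return (p_sample - p_target) / np.sqrt(pooled_var) if pooled_var > 0 else 0.0

# Hypothetical inputs: a discretised feature, fitted weights, a target marginal.
rng = np.random.default_rng(7)
feature = pd.Series(rng.choice(["F", "M"], size=1000, p=[0.3, 0.7]))
weights = rng.uniform(0.5, 2.0, size=1000)  # stand-in for trained IPF weights
target = {"F": 0.5, "M": 0.5}

# Treat each category as a 0/1 indicator and compare the weighted sample
# proportion against the target proportion.
for level, p_target in target.items():
    indicator = (feature == level).astype(float)
    p_weighted = np.average(indicator, weights=weights)
    print(f"gender={level}: SMD = {smd_proportions(p_weighted, p_target):+.4f}")
```

A common rule of thumb treats an absolute SMD below 0.1 as acceptably balanced.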