Behavior of the MAPE objective when the target follows an exponential distribution #6776

sktin commented Jan 5, 2025

Description

It seems that the MAPE objective does not actually minimize MAPE when the target follows an exponential distribution.

Reproducible example

The following is a toy example.

import numpy as np
from sklearn.datasets import make_regression
from scipy.stats import zscore, expon, norm

# Generate a synthetic regression target, then map it onto an exponential
# distribution via the probability integral transform
# (standardize -> normal CDF -> exponential PPF).
X, y = make_regression(10000, random_state=42)
y = expon.ppf(norm.cdf(zscore(y))) * 10000
print(f'{X.shape=}, {y.min()=}, {y.max()=}')

Expected output:

X.shape=(10000, 100), y.min()=1.0543257427765556, y.max()=131745.43307629693

Suppose we fit a regression on this data with the goal of minimizing MAPE. LightGBM already supports the MAPE objective natively, so that is one option. Mathematically, the MAPE objective is just the MAE objective with sample weights $1/y_{true}$, so MAE with those weights is a second option. A third option is the MAE objective on a log-transformed target, based on the approximation

$$\frac{|y_{pred}-y_{true}|}{y_{true}}=|\log(y_{pred})-\log(y_{true})|+o(|\log(y_{pred})-\log(y_{true})|)$$
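A quick numerical sanity check of this approximation (an illustrative snippet, nothing more): writing $r = y_{pred}/y_{true}$, the left-hand side equals $|r-1|$ and the leading term on the right is $|\log r|$, and the two agree closely for $r$ near 1.

import numpy as np

# Compare the relative error |r - 1| with |log r| for ratios r near 1;
# the two columns agree to first order and diverge as r moves away from 1.
r = np.linspace(0.8, 1.25, 10)
print(np.c_[r, np.abs(r - 1), np.abs(np.log(r))])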

On this dataset, the results from the three options differ substantially. Surprisingly, the third option achieves a much lower MAPE.

from sklearn.model_selection import cross_val_score, KFold
from sklearn.compose import TransformedTargetRegressor
from lightgbm import LGBMRegressor

import sklearn; print(f'{sklearn.__version__=}')
import lightgbm; print(f'{lightgbm.__version__=}')

params = {'n_jobs': 4, 'random_state': 0, 'verbose': -1}
models = {
    # Option 1: LightGBM's native MAPE objective.
    'mape': (LGBMRegressor(objective='mape', **params), None),
    # Option 2: MAE (l1) objective with sample weights 1/y.
    'mae+sample_weight': (LGBMRegressor(objective='l1', **params), 1/y),
    # Option 3: MAE (l1) objective on the log-transformed target.
    'mae+log_transformed_target': (TransformedTargetRegressor(
        LGBMRegressor(objective='l1', **params), func=np.log,
        inverse_func=np.exp
    ), None)
}
for m in models:
    model, sample_weight = models[m]
    # 5-fold CV; all three options are scored on the same untransformed MAPE.
    scores = -cross_val_score(
        model, X, y, scoring='neg_mean_absolute_percentage_error',
        cv=KFold(5, shuffle=True, random_state=0), n_jobs=1,
        params={'sample_weight': sample_weight}
    )
    print(f'{m}: {scores=}')

Expected output:

sklearn.__version__='1.5.2'
lightgbm.__version__='4.5.0'
mape: scores=array([0.78408993, 0.80927358, 0.79041595, 0.76549926, 0.78440253])
mae+sample_weight: scores=array([0.62120737, 0.55867845, 0.57467397, 0.60479041, 0.60400053])
mae+log_transformed_target: scores=array([0.34786906, 0.37569717, 0.31335129, 0.29947731, 0.30349381])

Environment info

LightGBM version or commit hash: 4.5.0

Command(s) you used to install LightGBM

!pip install -U lightgbm

Additional Comments
