
In AUC and AUCPR metrics, detect whether weights are per-instance or per-group #4216

Merged · 4 commits · May 4, 2019

Conversation

@xydrolase (Contributor) commented Mar 5, 2019

Overview

This is a proposed bugfix building on PR #3379, originally submitted by @ngoyal2707, which changed how the instance weight vector is defined for ranking objectives.

For a training dataset of K groups and N total training examples/instances, @ngoyal2707's original PR requires a weight vector of length K: each group has its own weight, shared by all instances in that group. While this scheme of specifying group weights works without issues in isolation, it conflicts with the computation of certain ranking metrics (e.g. auc and aucpr) in https://github.com/dmlc/xgboost/blob/master/src/metric/rank_metric.cc, which expects the weight vector to contain one entry per training example.

Example

To demonstrate the bug, we can use the following example Python code:

import numpy as np
import xgboost as xgb

# some mock training data with 20 training examples and 10 features, divided into 4 groups
# each group has 5 training examples.
X = np.random.rand(20, 10)
y = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
weights = np.array([1.0, 2.0, 3.0, 4.0])  # one weight per group (length 4)

# construct DMatrix
group = np.array([5, 5, 5, 5], dtype=np.uint)
dmatrix = xgb.DMatrix(X, label=y, weight=weights)
dmatrix.set_group(group)

train_params = {
    "learning_rate": 0.1,
    "min_child_weight": 1,
    "tree_method": "exact",
    "objective": "rank:ndcg",
    "nthread": -1
}

xgb.train(train_params, dmatrix, 10, evals=[(dmatrix, 'train')])

The above code will run without issues. But if we add a metric that also uses weights (e.g. auc) via the eval_metric parameter:

train_params = {
    "learning_rate": 0.1,
    "min_child_weight": 1,
    "tree_method": "exact",
    "objective": "rank:ndcg",
    "nthread": -1,
    "eval_metric": "auc"
}

xgb.train(train_params, dmatrix, 10, evals=[(dmatrix, 'train')])

The xgb.train function call will fail with the following exception (most likely because the underlying C++ code accesses memory out of bounds):

XGBoostError: [12:31:37] src/metric/rank_metric.cc:137: Check failed: !auc_error AUC: the dataset only contains pos or neg samples

Stack trace returned 7 entries:
[bt] (0) 0 libxgboost.dylib 0x0000001a12bff100 dmlc::StackTrace(unsigned long) + 464
[bt] (1) 1 libxgboost.dylib 0x0000001a12bfede4 dmlc::LogMessageFatal::~LogMessageFatal() + 52
[bt] (2) 2 libxgboost.dylib 0x0000001a12c6fea6 xgboost::metric::EvalAuc::Eval(std::__1::vector<float, std::__1::allocator<float> > const&, xgboost::MetaInfo const&, bool) const + 1846
[bt] (3) 3 libxgboost.dylib 0x0000001a12bfbe1b xgboost::LearnerImpl::EvalOneIter(int, std::__1::vector<xgboost::DMatrix*, std::__1::allocator<xgboost::DMatrix*> > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) + 1275
[bt] (4) 4 libxgboost.dylib 0x0000001a12c1773d XGBoosterEvalOneIter + 717
[bt] (5) 5 libffi.6.dylib 0x000000010cdfb884 ffi_call_unix64 + 76
[bt] (6) 6 ??? 0x00007ffee4554950 0x0 + 140732729215312
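
To make the mismatch concrete, the two weighting conventions can be written out for the same mock data. This is only an illustrative sketch of the shapes involved (per-group for rank_obj.cc after #3379, per-instance for the auc/aucpr code in rank_metric.cc), not a workaround:

import numpy as np

# 4 query groups of 5 rows each, 20 rows total (same layout as the example above)
group = np.array([5, 5, 5, 5])

# Per-group weights (length K = 4): the shape the ranking objective in
# rank_obj.cc expects after PR #3379.
per_group_weights = np.array([1.0, 2.0, 3.0, 4.0])

# Per-instance weights (length N = 20): the shape indexed by the auc/aucpr
# metrics in rank_metric.cc, i.e. one weight per training example.
per_instance_weights = np.repeat(per_group_weights, group)

print(per_group_weights.shape)     # (4,)
print(per_instance_weights.shape)  # (20,)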

Proposed fix

To resolve the issue demonstrated above, this PR proposes to change the calculation of the ranking objective in rank_obj.cc so that its use of weights is aligned with the implementation in rank_metric.cc.

Note that the alternative is to change how the weights are accessed in rank_metric.cc, so that when training a ranking model, the calculation of the AUC and AUC-PR metrics assumes the weights are provided on a per-group basis. However, a per-group weighting scheme reduces the overall flexibility of specifying weights.
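
For intuition, the detection rule implied by the final title of this PR (decide from the weight vector's length whether it is per-group or per-instance) can be sketched in Python. This is a hypothetical helper for illustration only, not the actual C++ implementation:

import numpy as np

def detect_weight_scheme(weights, group_sizes, num_rows):
    # Hypothetical helper: infer the weighting scheme from the vector length.
    if len(weights) == num_rows:
        return "per-instance"            # one weight per training example
    if len(weights) == len(group_sizes):
        return "per-group"               # one weight per query group
    raise ValueError("weights must have one entry per row or one per group")

group_sizes = [5, 5, 5, 5]  # 4 groups, 20 rows in total
print(detect_weight_scheme(np.array([1.0, 2.0, 3.0, 4.0]), group_sizes, 20))                # per-group
print(detect_weight_scheme(np.repeat([1.0, 2.0, 3.0, 4.0], group_sizes), group_sizes, 20))  # per-instance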

ToDo

Update the documentation in https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.DMatrix.set_weight so that it reflects the changes.

@xydrolase (Contributor, Author) commented:

@CodingCat @ngoyal2707 @hcho3

hcho3 previously requested changes Mar 5, 2019
src/objective/rank_obj.cc (review comment, now outdated and resolved)
@hcho3 hcho3 requested a review from trivialfis March 5, 2019 18:16
@xydrolase xydrolase changed the title from "Bugfix: More consistent weight vector for ranking objectives." to "[WIP] Bugfix: More consistent weight vector for ranking objectives." Mar 5, 2019
@hcho3 (Collaborator) commented Mar 12, 2019

@xydrolase Any updates? How can I help?

@hcho3 hcho3 removed the request for review from trivialfis May 1, 2019 01:53
@hcho3 hcho3 changed the title from "[WIP] Bugfix: More consistent weight vector for ranking objectives." to "In AUC and AUCPR metrics, detect whether weights are per-instance or per-group" May 1, 2019
@hcho3 hcho3 dismissed their stale review May 1, 2019 03:13: "Made changes as suggested"

@hcho3 hcho3 mentioned this pull request May 1, 2019
@hcho3 (Collaborator) commented May 2, 2019

@trivialfis I noticed a lack of tests for ranking. And when I tried some weighted data, I got AUCPR > 1.0 :( Will need to look deeper...

@hcho3 (Collaborator) commented May 2, 2019

It turns out that AUCPR is quite broken for learning-to-rank tasks (rank:pairwise, rank:ndcg, etc.). This is bad...

@hcho3 (Collaborator) commented May 2, 2019

@trivialfis @RAMitchell I'm inclined to merge this pull request, and submit another pull request to fix the AUCPR metric. What do you think?

See #4431

@hcho3 (Collaborator) commented May 3, 2019

Don't merge this yet; #4436 should be merged first.

@hcho3 hcho3 requested a review from CodingCat May 3, 2019 06:03
@hcho3 hcho3 merged commit 8d1098a into dmlc:master May 4, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Aug 2, 2019