Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random assignment for a subset of variables in predict. #5177

Open
jaguerrerod opened this issue Apr 25, 2022 · 2 comments
Open

Random assignment for a subset of variables in predict. #5177

jaguerrerod opened this issue Apr 25, 2022 · 2 comments

Comments

@jaguerrerod
Copy link

Summary

Add random assignment for a subset of variables in predict.

Motivation

Random assignment for a variable V in a tree is a way to drop the effect of that variable in the model. The procedure is assign randomly in every node that uses V for spliting and in the rest do as usual.
Random assignment of a subset of variables W = {V1, V2..Vi} is similar. The assignment in nodes which splitting use any variable in W is random and the rest as usual.

This is usefull for computing honest variable importance over a new test set as the difference between the prediction error with and without perturbation.
This method can be used for interaction computation of two subsets of variables V and W, defined as the difference between paired importance (importance when both V and W are assigned randomly) vs added importance (sum of importance when each of W or V is assigned randomly).

References

The method is referenced in:
Hemant Ishwaran. "Variable importance in binary regression trees and forests." Electron. J. Statist. 1 519 - 537, 2007. https://doi.org/10.1214/07-EJS039

@suncanghuai
Copy link

suncanghuai commented Jul 9, 2023

Call Path For Prediction

I have organized most of the call paths for LightGBM's prediction functions to figure out how to implement this feature. There may be some trivial mistakes because I'm new to LightGBM's source code, but these shouldn't be problems:smirk:
LightGBM_Predict_call_stack

Thoughts on implementation

It seems to me that this feature is about adding one or more input parameters in Apis in the External Interface Layer and passing them down through a call path to control prediction behavior in Model Layer.
As to add more input parameters, I think we may add two variables to Class Config which is used as an expandable cross-language parameter set. One variable represents the type of prediction strategy, another variable contains control parameters of prediction strategy as "a subset of variables" mentioned in this issue. We should also do some similar work to Class Predictor.

  //this piece of code in c_api.c shows the way we pass an expandable cross-language parameter set from Python/R to C/C++
  API_BEGIN();
  auto param = Config::Str2Map(para meter);
  Config config;
  config.Set(param);
  OMP_SET_NUM_THREADS(config.num_threads);
  Booster* ref_booster = reinterpret_cast<Booster*>(handle);
  ref_booster->Predict(start_iteration, num_iteration, predict_type, data_filename, data_has_header,
                       config, result_filename);
  API_END();

As to control prediction behavior in Model Layer, I think there is a choice here:

  1. add a new function as predict/predictRaw/predictContrib to Class Boosting and Class Tree
    image
  2. introduce a new Class named as PredictStrategy(to be determined), which is a wrapper class for Tree::Predict. We create a PredictStratgy object in Booster Layer and pass it down to Class Tree.
inline int Tree::GetLeafByMap(const std::unordered_map<int, double>& feature_values) const {
  int node = 0;
  auto decision_func = pred_strategy.getDecisonFunc(self); 
  auto num_decision_func = pred_strategy.getNumericalDecisonFunc(self); 
  if (num_cat_ > 0) {
    while (node >= 0) {
      node = decision_func(feature_values.count(split_feature_[node]) > 0 ? feature_values.at(split_feature_[node]) : 0.0f, node);
    }
  } else {
    while (node >= 0) {
      node = num_decision_func(feature_values.count(split_feature_[node]) > 0 ? feature_values.at(split_feature_[node]) : 0.0f, node);
    }
  }
  return ~node;
}

Consider that we may have some similar feature requests in the future, choice 2 is a more scalable option. But choice 2 is a obviously more complex modification that possibly causes some potential code defects. In contrast, choice 1 is more easy to understand and less couple with other code designs. Although stacking more similar functions is not a elegant code design, but also not a problem. So I think choice 1 is a better option.

It seems this issue has no one working on it. I'd like to work on it and already have an preliminary scheme. May I have this issue assigned to me and get some suggestions for my preliminary scheme? @jameslamb

@guolinke
Copy link
Collaborator

Hi @suncanghuai , thank you very much for the detailed plan and careful thoughts! The scheme looks good to me, I think option 1 is better for now; option 2 could be left in the future if there are indeed more similar functions.

Please go ahead, and ping us if you have any further questions.

suncanghuai added a commit to suncanghuai/LightGBM that referenced this issue Aug 27, 2023
suncanghuai added a commit to suncanghuai/LightGBM that referenced this issue Aug 28, 2023
* introduce a struct of parameters PredictionControlParameter

* add a new field random_assign_features to class Config

* add new GetLeaf functions that apply random assign mechanism to class Tree
suncanghuai added a commit to suncanghuai/LightGBM that referenced this issue Aug 28, 2023
* introduce a struct of parameters PredictionControlParameter

* add a new field random_assign_features to class Config

* add new GetLeaf functions that apply random assign mechanism to class Tree
suncanghuai added a commit to suncanghuai/LightGBM that referenced this issue Aug 28, 2023
* Introduce a struct of parameters PredictionControlParameter

* Add a new field random_assign_features to class Config

* Add new GetLeaf functions that apply random assign mechanism to class Tree
suncanghuai added a commit to suncanghuai/LightGBM that referenced this issue Aug 28, 2023
* Introduce a struct of parameters PredictionControlParameter

* Add a new field random_assign_features to class Config

* Add new GetLeaf functions that apply random assign mechanism to class Tree
suncanghuai added a commit to suncanghuai/LightGBM that referenced this issue Aug 28, 2023
* Introduce a struct of parameters PredictionControlParameter

* Add a new field random_assign_features to class Config

* Add new GetLeaf functions that apply random assign mechanism to class Tree
suncanghuai added a commit to suncanghuai/LightGBM that referenced this issue Aug 28, 2023
* Introduce a struct of parameters PredictionControlParameter

* Add a new field random_assign_features to class Config

* Add new GetLeaf functions that apply random assign mechanism to class Tree
suncanghuai added a commit to suncanghuai/LightGBM that referenced this issue Aug 28, 2023
* Introduce a struct of parameters PredictionControlParameter

* Add a new field random_assign_features to class Config

* Add new GetLeaf functions that apply random assign mechanism to class Tree
suncanghuai added a commit to suncanghuai/LightGBM that referenced this issue Aug 28, 2023
* Introduce a struct of parameters PredictionControlParameter

* Add a new field random_assign_features to class Config

* Add new GetLeaf functions that apply random assign mechanism to class Tree
suncanghuai added a commit to suncanghuai/LightGBM that referenced this issue Aug 28, 2023
* Introduce a struct of parameters PredictionControlParameter

* Add a new field random_assign_features to class Config

* Add new GetLeaf functions that apply random assign mechanism to class Tree
suncanghuai added a commit to suncanghuai/LightGBM that referenced this issue Aug 28, 2023
* Introduce a struct of parameters PredictionControlParameter

* Add a new field random_assign_features to class Config

* Add new GetLeaf functions that apply random assign mechanism to class Tree
suncanghuai added a commit to suncanghuai/LightGBM that referenced this issue Aug 28, 2023
* Introduce a struct of parameters PredictionControlParameter

* Add a new field random_assign_features to class Config

* Add new GetLeaf functions that apply random assign mechanism to class Tree
suncanghuai added a commit to suncanghuai/LightGBM that referenced this issue Aug 28, 2023
* Introduce a struct of parameters PredictionControlParameter

* Add a new field random_assign_features to class Config

* Add new GetLeaf functions that apply random assign mechanism to class Tree
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants