Description
Please fill out the form below.
System Information
- Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): SKLearn/Custom
- Framework Version: 0.20.0
- Python Version: 3.5
- CPU or GPU: CPU
- Python SDK Version: 1.18.2
- Are you using a custom image: No
Describe the problem
My prediction time is not proportional to the number of trees in a Random Forest
Minimal repro / logs
My estimation strategy consists of using a set of Random Forest models, each one covering a subset of the data (e.g. RF_A if feature == A). I mention this only for the sake of completeness, as I don't think it affects my issue.
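For completeness, a minimal sketch of this pattern (the splitting column, target name and model parameters below are illustrative, not my exact lib code):

```python
from sklearn.ensemble import RandomForestRegressor

def train_models_in_dict(raw_training_data):
    """Fit one Random Forest per value of a splitting feature (illustrative sketch).

    raw_training_data is a pandas DataFrame; "feature" and "target" are placeholder column names.
    """
    models = {}
    for key, subset in raw_training_data.groupby("feature"):
        rf = RandomForestRegressor(n_estimators=100)  # 100 or 300 trees depending on the scenario
        rf.fit(subset.drop(columns=["feature", "target"]), subset["target"])
        models[key] = rf
    return models
```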
My deployment strategy:
- Fit: return a pickle that contains a dictionary of fitted sklearn Random Forest models.
- Deploy: load this dictionary of models into memory.
- Inference:
-- map each observation to the correct model in the already loaded dictionary,
-- for each observation, compute the prediction given by each individual tree, in order to allow for an elementary confidence interval computation:
http://blog.datadive.net/prediction-intervals-for-random-forests/
Note that this last operation is the most time-consuming part of inference, and its duration is proportional to the number of trees in my RF (a Python loop over the trees); a minimal sketch is shown below.
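To illustrate, this is roughly what that per-tree loop looks like (a simplified sketch, not my exact prediction.predict code; model is one fitted RandomForestRegressor from the dictionary, and the percentile bounds are just examples):

```python
import numpy as np

def predict_with_interval(model, X, lower=5, upper=95):
    """Predict with an elementary confidence interval by querying every tree."""
    # One prediction per tree: this loop is what scales linearly with n_estimators.
    per_tree = np.stack([tree.predict(X) for tree in model.estimators_], axis=0)
    return {
        "mean": per_tree.mean(axis=0),
        "lower": np.percentile(per_tree, lower, axis=0),
        "upper": np.percentile(per_tree, upper, axis=0),
    }
```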
My code (my custom code lives in lib):

import argparse
import os
import sys

import pandas as pd
from sklearn.externals import joblib

# Make my custom modules importable inside the container.
module_path = os.path.abspath('/opt/ml/code')
if module_path not in sys.path:
    sys.path.append(module_path)

from lib import training, prediction
from data.transactions import raw


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--output-data-dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    args = parser.parse_args()

    # Training: fit one Random Forest per subset and dump the dictionary of models.
    grid_models_dict = training.train_models_in_dict(raw_training_data=raw)
    joblib.dump(grid_models_dict, os.path.join(args.model_dir, "model"))


def model_fn(model_dir):
    # Loaded once per worker by the SageMaker scikit-learn serving container.
    grid_models_dict = joblib.load(os.path.join(model_dir, "model"))
    return grid_models_dict


def predict_fn(input_data, model):
    # Maps each observation to its model and loops over the trees (see the sketch above).
    predicted = prediction.predict(input_data, model)
    return predicted
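For reference, this is roughly how I fit and deploy this entry point with the Python SDK (a simplified sketch; the role, entry point name and payload are placeholders):

```python
from sagemaker.sklearn.estimator import SKLearn

estimator = SKLearn(
    entry_point='entry_point.py',        # placeholder name for the script above
    framework_version='0.20.0',
    train_instance_count=1,
    train_instance_type='ml.c5.4xlarge',
    role='my-sagemaker-execution-role',  # placeholder IAM role
)
estimator.fit()

predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.c5.4xlarge',       # also tried ml.t2.xlarge
)
result = predictor.predict(observation)  # `observation` is a placeholder payload
```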
My problem:
I have two deployment scenarios: one with 100 trees/RF and one with 300 trees/RF.
Fit is performed without issues. On S3, the compressed 100 trees/RF pickle is 261 MB and the compressed 300 trees/RF pickle is 784 MB.
Deploy works but with some issues: some worker timeouts with the 300 trees/RF model, already reported for example in aws/amazon-sagemaker-examples#556, but it does deploy in the end.
Prediction is performed:
- with the 100 trees/RF: in around 500 ms, always, with the same observation;
- with the 300 trees/RF, on paper: with the same observation, since my prediction is a for loop over the trees, I should predict in at most about 1.5 seconds (3x the 100-tree time);
- with the 300 trees/RF, in practice, with the same observation:
-- sometimes (33% of cases) in 700 ms,
-- sometimes (33% of cases) in 40 to 50 seconds,
-- and sometimes (33% of cases) I get a timeout error (inference timeout is limited to 60 seconds).
This behavior persists when I deploy on a bigger/more recent instance (ml.t2.xlarge to ml.c5.4xlarge).
My guess is that there is a memory swapping mechanism at play, or that the container's memory is not fully privately allocated to my model beyond some threshold.
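One way I could check this hypothesis (illustrative only; psutil is an assumption and would have to be added to the container's requirements) would be to log the worker's resident memory from inside predict_fn:

```python
import os
import psutil  # assumption: not in the default image, would need to be added to requirements

def log_worker_memory(tag=""):
    """Print the resident memory of the current worker process (illustrative helper)."""
    rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1024.0 ** 2
    print("[{}] worker pid={} rss={:.0f} MB".format(tag, os.getpid(), rss_mb))
```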
Is there any solution to predict consistently with more than 100 trees/RF?
Thanks in advance.