Exercise: Random Forests

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3789e639",
   "metadata": {
    "papermill": {
     "duration": 0.003431,
     "end_time": "2024-06-06T13:34:30.404080",
     "exception": false,
     "start_time": "2024-06-06T13:34:30.400649",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "**This notebook is an exercise in the [Introduction to Machine Learning](https://www.kaggle.com/learn/intro-to-machine-learning) course.  You can reference the tutorial at [this link](https://www.kaggle.com/dansbecker/random-forests).**\n",
    "\n",
    "---\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5eb6fa70",
   "metadata": {
    "papermill": {
     "duration": 0.002673,
     "end_time": "2024-06-06T13:34:30.409904",
     "exception": false,
     "start_time": "2024-06-06T13:34:30.407231",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## Recap\n",
    "Here's the code you've written so far."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "51f72315",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-06-06T13:34:30.417561Z",
     "iopub.status.busy": "2024-06-06T13:34:30.417103Z",
     "iopub.status.idle": "2024-06-06T13:34:33.021127Z",
     "shell.execute_reply": "2024-06-06T13:34:33.019763Z"
    },
    "papermill": {
     "duration": 2.611373,
     "end_time": "2024-06-06T13:34:33.024139",
     "exception": false,
     "start_time": "2024-06-06T13:34:30.412766",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Validation MAE when not specifying max_leaf_nodes: 29,653\n",
      "Validation MAE for best value of max_leaf_nodes: 27,283\n",
      "\n",
      "Setup complete\n"
     ]
    }
   ],
   "source": [
    "# Code you have previously used to load data\n",
    "import pandas as pd\n",
    "from sklearn.metrics import mean_absolute_error\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.tree import DecisionTreeRegressor\n",
    "\n",
    "\n",
    "# Path of the file to read\n",
    "iowa_file_path = '../input/home-data-for-ml-course/train.csv'\n",
    "\n",
    "home_data = pd.read_csv(iowa_file_path)\n",
    "# Create target object and call it y\n",
    "y = home_data.SalePrice\n",
    "# Create X\n",
    "features = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']\n",
    "X = home_data[features]\n",
    "\n",
    "# Split into validation and training data\n",
    "train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)\n",
    "\n",
    "# Specify Model\n",
    "iowa_model = DecisionTreeRegressor(random_state=1)\n",
    "# Fit Model\n",
    "iowa_model.fit(train_X, train_y)\n",
    "\n",
    "# Make validation predictions and calculate mean absolute error\n",
    "val_predictions = iowa_model.predict(val_X)\n",
    "val_mae = mean_absolute_error(val_predictions, val_y)\n",
    "print(\"Validation MAE when not specifying max_leaf_nodes: {:,.0f}\".format(val_mae))\n",
    "\n",
    "# Using best value for max_leaf_nodes\n",
    "iowa_model = DecisionTreeRegressor(max_leaf_nodes=100, random_state=1)\n",
    "iowa_model.fit(train_X, train_y)\n",
    "val_predictions = iowa_model.predict(val_X)\n",
    "val_mae = mean_absolute_error(val_predictions, val_y)\n",
    "print(\"Validation MAE for best value of max_leaf_nodes: {:,.0f}\".format(val_mae))\n",
    "\n",
    "\n",
    "# Set up code checking\n",
    "from learntools.core import binder\n",
    "binder.bind(globals())\n",
    "from learntools.machine_learning.ex6 import *\n",
    "print(\"\\nSetup complete\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d7bf7d54",
   "metadata": {
    "papermill": {
     "duration": 0.002714,
     "end_time": "2024-06-06T13:34:33.030009",
     "exception": false,
     "start_time": "2024-06-06T13:34:33.027295",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Exercises\n",
    "Data science isn't always this easy. But replacing the decision tree with a Random Forest is going to be an easy win."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1da4921e",
   "metadata": {
    "papermill": {
     "duration": 0.002623,
     "end_time": "2024-06-06T13:34:33.035531",
     "exception": false,
     "start_time": "2024-06-06T13:34:33.032908",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## Step 1: Use a Random Forest"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "fe771394",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-06-06T13:34:33.043309Z",
     "iopub.status.busy": "2024-06-06T13:34:33.042361Z",
     "iopub.status.idle": "2024-06-06T13:34:33.632528Z",
     "shell.execute_reply": "2024-06-06T13:34:33.631484Z"
    },
    "papermill": {
     "duration": 0.596363,
     "end_time": "2024-06-06T13:34:33.634681",
     "exception": false,
     "start_time": "2024-06-06T13:34:33.038318",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Validation MAE for Random Forest Model: 21857.15912981083\n"
     ]
    },
    {
     "data": {
      "application/javascript": [
       "parent.postMessage({\"jupyterEvent\": \"custom.exercise_interaction\", \"data\": {\"outcomeType\": 1, \"valueTowardsCompletion\": 1.0, \"interactionType\": 1, \"questionType\": 2, \"questionId\": \"1_CheckRfScore\", \"learnToolsVersion\": \"0.3.4\", \"failureMessage\": \"\", \"exceptionClass\": \"\", \"trace\": \"\"}}, \"*\")"
      ],
      "text/plain": [
       "<IPython.core.display.Javascript object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "<span style=\"color:#33cc33\">Correct</span>"
      ],
      "text/plain": [
       "Correct"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from sklearn.ensemble import RandomForestRegressor\n",
    "\n",
    "# Define the model. Set random_state to 1\n",
    "rf_model = RandomForestRegressor(random_state=1)\n",
    "\n",
    "# fit your model\n",
    "rf_model.fit(train_X, train_y)\n",
    "\n",
    "# Calculate the mean absolute error of your Random Forest model on the validation data\n",
    "melb_preds = rf_model.predict(val_X)\n",
    "\n",
    "rf_val_mae = mean_absolute_error(val_y, melb_preds)\n",
    "\n",
    "print(\"Validation MAE for Random Forest Model: {}\".format(rf_val_mae))\n",
    "\n",
    "# Check your answer\n",
    "step_1.check()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "dbcffdda",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-06-06T13:34:33.643418Z",
     "iopub.status.busy": "2024-06-06T13:34:33.642346Z",
     "iopub.status.idle": "2024-06-06T13:34:33.647310Z",
     "shell.execute_reply": "2024-06-06T13:34:33.646223Z"
    },
    "papermill": {
     "duration": 0.011627,
     "end_time": "2024-06-06T13:34:33.649555",
     "exception": false,
     "start_time": "2024-06-06T13:34:33.637928",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# The lines below will show you a hint or the solution.\n",
    "# step_1.hint() \n",
    "# step_1.solution()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0a358d66",
   "metadata": {
    "papermill": {
     "duration": 0.003683,
     "end_time": "2024-06-06T13:34:33.656696",
     "exception": false,
     "start_time": "2024-06-06T13:34:33.653013",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "So far, you have followed specific instructions at each step of your project. This helped learn key ideas and build your first model, but now you know enough to try things on your own. \n",
    "\n",
    "Machine Learning competitions are a great way to try your own ideas and learn more as you independently navigate a machine learning project. \n",
    "\n",
    "# Keep Going\n",
    "\n",
    "You are ready for **[Machine Learning Competitions](https://www.kaggle.com/alexisbcook/machine-learning-competitions).**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a35fdbdf",
   "metadata": {
    "papermill": {
     "duration": 0.003172,
     "end_time": "2024-06-06T13:34:33.663368",
     "exception": false,
     "start_time": "2024-06-06T13:34:33.660196",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "---\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "*Have questions or comments? Visit the [course discussion forum](https://www.kaggle.com/learn/intro-to-machine-learning/discussion) to chat with other learners.*"
   ]
  }
 ],
 "metadata": {
  "kaggle": {
   "accelerator": "none",
   "dataSources": [
    {
     "databundleVersionId": 111096,
     "sourceId": 10211,
     "sourceType": "competition"
    },
    {
     "datasetId": 11167,
     "sourceId": 15520,
     "sourceType": "datasetVersion"
    },
    {
     "datasetId": 2709,
     "sourceId": 38454,
     "sourceType": "datasetVersion"
    }
   ],
   "isGpuEnabled": false,
   "isInternetEnabled": false,
   "language": "python",
   "sourceType": "notebook"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  },
  "papermill": {
   "default_parameters": {},
   "duration": 6.734578,
   "end_time": "2024-06-06T13:34:34.288444",
   "environment_variables": {},
   "exception": null,
   "input_path": "__notebook__.ipynb",
   "output_path": "__notebook__.ipynb",
   "parameters": {},
   "start_time": "2024-06-06T13:34:27.553866",
   "version": "2.5.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}