Thera Bank is focused on converting liability customers (depositors) into personal loan customers while retaining them as depositors. The goal of this project is to build a predictive model that identifies potential customers who are more likely to accept personal loan offers. This will enable the bank to increase the success ratio of its campaigns and reduce costs.
The management at Thera Bank is interested in expanding its asset customer base (borrowers) by converting existing liability customers into personal loan customers. Last year’s campaign showed a conversion rate of 9.6%, and the retail marketing department aims to improve this by targeting customers more effectively. The dataset provided contains demographic and financial information for 5,000 customers, along with their response to the previous loan campaign.
The project has four objectives:
- Understand the Dataset: Analyze the attributes and their distributions.
- Target Column Analysis: Examine the distribution of the target variable (Personal Loan) to understand the response rate.
- Model Building: Develop predictive models using Logistic Regression, K-Nearest Neighbors (K-NN), and Naïve Bayes.
- Model Evaluation: Evaluate the models using confusion matrices and identify the best-performing model.
The dataset, Bank.xls, contains records for 5,000 customers with the following key attributes:
- Demographic Information: Age, Income, etc.
- Banking Relationship: Mortgage, Securities Account, etc.
- Previous Campaign Response: Personal Loan (Target Variable).
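The distribution and target-column checks can be sketched with Python's standard library alone. This is a minimal sketch, assuming each column of Bank.xls has already been loaded into a plain list of numbers (the loading step is not shown) and that Personal Loan is coded 1 for accepted and 0 otherwise. The helper names `summarize`, `pearson` and `target_summary` are hypothetical, and the moment-based skewness and 1.5 x IQR outlier fence are illustrative choices, not the project's stated method.

```python
import statistics
from collections import Counter


def summarize(values):
    """Mean, median, moment-based skewness and 1.5*IQR outlier count for one numeric column."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)
    # Fisher-Pearson skewness: > 0 means a long right tail, < 0 a long left tail.
    skew = sum((v - mean) ** 3 for v in values) / (n * sd ** 3) if sd > 0 else 0.0
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = sum(1 for v in values if v < low or v > high)
    return {"mean": mean, "median": median, "skew": skew, "outliers": outliers}


def pearson(x, y):
    """Pearson correlation between two numeric columns of equal length."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def target_summary(labels):
    """Class counts, positive (loan accepted = 1) rate, and majority-class accuracy."""
    counts = Counter(labels)
    n = len(labels)
    return {
        "counts": dict(counts),
        "positive_rate": counts.get(1, 0) / n,
        # Accuracy of a trivial model that always predicts the most common class.
        "majority_baseline": counts.most_common(1)[0][1] / n,
    }


# Illustrative values only, not taken from Bank.xls.
print(summarize([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
print(target_summary([1] * 480 + [0] * 4520))
```

With a 9.6% positive rate (480 of 5,000), the majority-class baseline is 90.4% accuracy, which is why accuracy alone is a weak yardstick for comparing the three models.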
- Data Understanding and Distribution Analysis
  - Column Descriptions: Understand each attribute, including the demographic and banking-relationship attributes.
  - Data Distribution: Analyze the distribution of each attribute (e.g., Age, Income).
  - Findings: Summarize key observations from the distributions, such as skewness, the presence of outliers, and correlations between attributes.
- Target Column Distribution
  - Analysis of Target Variable: Examine the distribution of the target variable (Personal Loan).
  - Comments: Discuss the class imbalance in the target variable: only 9.6% of customers accepted the loan, so a model that always predicts "no loan" would already reach 90.4% accuracy.
- Data Splitting
  - Training and Test Set: Split the dataset into training (70%) and test (30%) sets.
  - Rationale: Explain why a 70:30 split was chosen for this analysis.
- Model Building
  - Logistic Regression: Build a model using Logistic Regression.
  - K-Nearest Neighbors (K-NN): Build a model using K-NN.
  - Naïve Bayes: Build a model using Naïve Bayes.
- Model Evaluation
  - Confusion Matrix: Generate and analyze a confusion matrix for each model.
  - Logistic Regression: Provide the confusion matrix and discuss its performance.
  - K-NN: Provide the confusion matrix and discuss its performance.
  - Naïve Bayes: Provide the confusion matrix and discuss its performance.
- Model Comparison and Selection
  - Best Model Selection: Discuss which model performed best based on accuracy, precision, recall, and F1 score.
  - Reasoning: Justify why the selected model outperforms the others in predicting loan acceptance, considering model complexity, interpretability, and performance metrics.
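Because only 9.6% of customers accepted the loan, a 70:30 split is commonly stratified on the target so that the training and test sets keep roughly the same acceptance rate. The sketch below is a hypothetical standard-library helper, not the project's code; with scikit-learn the equivalent call would be `train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)`. The seed value is arbitrary.

```python
import random


def stratified_split(labels, test_frac=0.3, seed=42):
    """Hypothetical helper: return (train_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)  # same test share within every class
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [1] * 480 + [0] * 4520  # illustrative 9.6% positive rate
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
print(sum(labels[i] for i in test_idx) / len(test_idx))
```

With 480 positives out of 5,000, this yields 3,500 training rows and 1,500 test rows, 144 of them positive, so both sets keep the 9.6% rate.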
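The three classifiers in the model-building step can be sketched from scratch to show what each one computes. This is an illustrative, standard-library sketch, not the project's implementation; in practice scikit-learn's `LogisticRegression`, `KNeighborsClassifier` and `GaussianNB` would be the usual choices. The Gaussian variant of Naïve Bayes, the learning rate, the epoch count and `k` are assumptions, and the features are assumed to be standardized beforehand, because both K-NN distances and gradient descent are sensitive to scale (for example Income versus 0/1 flags such as Securities Account).

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Logistic regression fitted by batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    # Predict 1 when the estimated probability is at least 0.5 (i.e. the logit is >= 0).
    return lambda x: int(b + sum(wj * xj for wj, xj in zip(w, x)) >= 0.0)


def fit_knn(X, y, k=5):
    """K-NN with squared Euclidean distance and a majority vote (use an odd k to avoid ties)."""
    def predict(x):
        nearest = sorted(
            (sum((a - c) ** 2 for a, c in zip(xi, x)), yi) for xi, yi in zip(X, y)
        )[:k]
        return int(2 * sum(yi for _, yi in nearest) > k)
    return predict


def fit_gaussian_nb(X, y, eps=1e-9):
    """Gaussian Naive Bayes: class prior times independent normal likelihoods per feature."""
    params = {}
    for c in set(y):
        rows = [xi for xi, yi in zip(X, y) if yi == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        # eps keeps a constant feature from having zero variance (a simplified smoothing).
        variances = [sum((v - m) ** 2 for v in col) / len(col) + eps for col, m in zip(cols, means)]
        params[c] = (math.log(len(rows) / len(X)), means, variances)

    def log_posterior(c, x):
        """Unnormalized log posterior of class c for feature vector x."""
        log_prior, means, variances = params[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
            for xj, m, v in zip(x, means, variances)
        )

    return lambda x: max(params, key=lambda c: log_posterior(c, x))


# Toy, already-standardized features (illustrative, not Bank.xls columns).
X = [[-2, -2], [-2, -1], [-1, -2], [-1, -1], [1, 1], [1, 2], [2, 1], [2, 2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
models = {
    "Logistic Regression": fit_logistic(X, y),
    "K-NN (k=3)": fit_knn(X, y, k=3),
    "Naive Bayes": fit_gaussian_nb(X, y),
}
for name, predict in models.items():
    print(name, predict([-1.5, -1.5]), predict([1.5, 1.5]))
```

Logistic regression yields interpretable coefficients, K-NN keeps the whole training set and pays its cost at prediction time, and Naïve Bayes assumes features are independent within each class; these are the kinds of trade-offs the comparison step weighs alongside the metrics.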
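The confusion matrices and the selection metrics follow directly from each model's test-set predictions. The sketch assumes 0/1 labels and uses the [[TN, FP], [FN, TP]] layout, which matches `sklearn.metrics.confusion_matrix` for labels 0 and 1; the function names are hypothetical, and the example matrices are made up for illustration, not results from this project.

```python
def confusion_matrix(y_true, y_pred):
    """2x2 matrix for 0/1 labels: rows = actual, columns = predicted -> [[TN, FP], [FN, TP]]."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def scores(cm):
    """Accuracy, precision, recall and F1 for the positive class (loan accepted = 1)."""
    (tn, fp), (fn, tp) = cm
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of targeted customers who accept
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of actual acceptors the model finds
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


print(confusion_matrix([0, 0, 1, 1, 1, 0], [0, 1, 1, 0, 1, 0]))
# Two made-up matrices: a model that finds most acceptors, and one that always
# predicts "no loan" on a 1,500-row test set with 144 acceptors (9.6%).
print(scores([[90, 10], [5, 15]]))
print(scores([[1356, 0], [144, 0]]))
```

The second example shows why accuracy is not enough here: a model that never predicts a loan reaches 90.4% accuracy yet has zero recall, so precision, recall and F1 on the acceptor class should carry most of the weight when choosing the model.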
Summarize the key findings of the analysis and the benefits of using the selected model for future campaigns. Highlight the potential business impact and any recommendations for the bank's marketing strategy.