
This project involves building a machine learning model to predict airfare prices based on features such as airline, date of journey, source, destination, and duration. The dataset is sourced from Kaggle, and the notebook demonstrates data preprocessing, feature engineering, model training, and evaluation.


ahmedatia456123/Predicting-Flight-Prices


Flight Ticket Price Prediction by Machine Learning and Exploratory Data Analysis (EDA)

Project Overview

This project aims to predict flight ticket prices using various machine learning algorithms and comprehensive exploratory data analysis (EDA). The dataset used in this project is sourced from Kaggle, and the objective is to build a model that can accurately forecast the price of flight tickets based on multiple features.

Objectives

  • Perform extensive exploratory data analysis (EDA) to understand the data distribution and feature relationships.
  • Preprocess the data to handle missing values, encode categorical variables, and scale numerical features.
  • Implement and compare different machine learning algorithms to identify the best-performing model.
  • Fine-tune the chosen model to achieve optimal performance.
  • Evaluate the model's performance using appropriate metrics.

Skills Demonstrated

  • Data Wrangling and Preprocessing: Cleaning, transforming, and preparing the data for analysis and modeling.
  • Exploratory Data Analysis (EDA): Visualizing and interpreting data to uncover insights and relationships.
  • Feature Engineering: Creating new features to enhance model performance.
  • Machine Learning Algorithms: Implementing and comparing multiple algorithms, including Linear Regression, Decision Trees, Random Forest, and Gradient Boosting.
  • Model Evaluation and Tuning: Using metrics like RMSE, MAE, and R² to evaluate models and applying hyperparameter tuning for optimization.
  • Data Visualization: Utilizing libraries such as Matplotlib, Seaborn, and Plotly for insightful visualizations.

Project Workflow

1. Data Collection and Loading

The dataset was imported and loaded into a Pandas DataFrame for initial examination and preprocessing.
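A minimal sketch of this loading step, assuming the data arrives as a CSV file. The column names and the three sample rows below are made up for illustration; the actual Kaggle file may use different names and formats.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the Kaggle CSV (hypothetical columns).
SAMPLE_CSV = """airline,date_of_journey,source,destination,duration_min,price
AirlineA,24/03/2019,CityX,CityY,170,3900
AirlineB,01/05/2019,CityY,CityZ,445,7700
AirlineA,09/06/2019,CityX,CityZ,,13900
"""


def load_flights(source):
    """Read a CSV path or file-like object into a DataFrame."""
    return pd.read_csv(source)


df = load_flights(io.StringIO(SAMPLE_CSV))

# First look: size, column types, and missing values per column.
print(df.shape)
print(df.dtypes)
print(df.isna().sum())
```

With a real file, `load_flights` would take the path to the downloaded CSV instead of the in-memory sample.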

2. Exploratory Data Analysis (EDA)

  • Univariate Analysis: Analyzed the distribution of individual features.
  • Bivariate Analysis: Explored relationships between pairs of features and the target variable.
  • Multivariate Analysis: Investigated complex interactions between multiple features.
  • Visualization: Used Matplotlib, Seaborn, and Plotly to create plots such as histograms, box plots, scatter plots, and heatmaps.
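The analysis steps above can be sketched with plain pandas summaries, which are the numbers the plots would display. The data frame and its column names are hypothetical; plotting calls are left out so the example runs without a display.

```python
import pandas as pd

# Hypothetical toy data; the real dataset has more rows and columns.
df = pd.DataFrame({
    "airline": ["A", "A", "B", "B", "C", "C"],
    "duration_min": [120, 150, 300, 320, 90, 100],
    "price": [4000, 4500, 9000, 9500, 3000, 3200],
})

# Univariate: distribution of the target (what a histogram would show).
price_summary = df["price"].describe()

# Bivariate: target by category (what a box plot per airline would show).
median_by_airline = (
    df.groupby("airline")["price"].median().sort_values(ascending=False)
)

# Numeric relationships (the matrix behind a correlation heatmap).
corr = df[["duration_min", "price"]].corr()

print(price_summary)
print(median_by_airline)
print(corr)
```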

3. Data Preprocessing

  • Handling Missing Values: Imputed missing values using appropriate techniques.
  • Encoding Categorical Variables: Applied One-Hot Encoding to convert categorical features into numerical format.
  • Feature Scaling: Standardized numerical features (zero mean, unit variance) using StandardScaler.
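One way to combine these three steps is a scikit-learn `ColumnTransformer`. This is a sketch under assumptions: the column names are hypothetical, and the imputation strategies (most frequent value for categorical columns, median for numerical ones) are illustrative choices, since the README does not say which techniques the notebook uses.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with one missing value in each kind of column.
df = pd.DataFrame({
    "airline": ["A", "B", np.nan, "A"],
    "source": ["X", "Y", "X", "Y"],
    "duration_min": [120.0, np.nan, 300.0, 90.0],
})

categorical = ["airline", "source"]
numeric = ["duration_min"]

preprocess = ColumnTransformer(
    transformers=[
        # Categorical: fill gaps with the most frequent value, then one-hot encode.
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]), categorical),
        # Numerical: fill gaps with the median, then scale to zero mean, unit variance.
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), numeric),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

X = preprocess.fit_transform(df)
print(X.shape)
```

Fitting the transformer on the training split only, and reusing it on the test split, keeps test-set statistics out of the imputation and scaling.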

4. Feature Engineering

Created new features based on domain knowledge to improve model performance. For instance, extracted day, month, and year from the date features.
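The date example can be sketched as follows. The column name `date_of_journey` and the day/month/year string format are assumptions; adjust `format` to match the actual data.

```python
import pandas as pd

# Hypothetical column name and date format.
df = pd.DataFrame({"date_of_journey": ["24/03/2019", "01/05/2019"]})

# Parse once with an explicit format so day and month are not swapped.
dates = pd.to_datetime(df["date_of_journey"], format="%d/%m/%Y")

df["journey_day"] = dates.dt.day
df["journey_month"] = dates.dt.month
df["journey_year"] = dates.dt.year

print(df)
```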

5. Model Building and Evaluation

  • Model Selection: Implemented multiple machine learning algorithms, including Linear Regression, Decision Trees, Random Forest, and Gradient Boosting.
  • Model Evaluation: Evaluated models using metrics like RMSE, MAE, and R².
  • Model Tuning: Applied hyperparameter tuning techniques such as Grid Search and Random Search to optimize model performance.
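The comparison and tuning loop might look like the sketch below. It runs on synthetic data from `make_regression`, so its scores say nothing about the flight dataset, and the parameter grid is an illustrative assumption rather than the grid used in the notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the preprocessed flight features and prices.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}


def evaluate(model):
    """Return RMSE, MAE, and R² of a fitted model on the held-out test split."""
    pred = model.predict(X_test)
    return {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "mae": float(mean_absolute_error(y_test, pred)),
        "r2": float(r2_score(y_test, pred)),
    }


results = {name: evaluate(m.fit(X_train, y_train)) for name, m in models.items()}
for name, scores in results.items():
    print(name, scores)

# Grid search over a small, illustrative grid with 3-fold cross-validation.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 10]},
    scoring="neg_root_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)
tuned = evaluate(search.best_estimator_)
print(search.best_params_, tuned)
```

Because the synthetic target is linear in the features, linear regression is likely to score best here; on real fare data the ranking can differ, as the Results section reports.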

6. Model Deployment

The final model was saved and prepared for deployment to predict flight ticket prices on new, unseen data.
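A minimal sketch of saving and reloading a fitted model, assuming `joblib` as the serialization tool (the README does not name one). The file name and the toy training data are hypothetical.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy training data standing in for the preprocessed flight features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 2.0, 3.0])

model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "flight_price_model.joblib"  # hypothetical file name
    joblib.dump(model, path)       # persist the fitted model
    restored = joblib.load(path)   # load it back, as a serving process would

# The restored model should reproduce the original predictions.
same = bool(np.allclose(model.predict(X), restored.predict(X)))
print(same)
```

Saving the preprocessing steps together with the model, for example as one scikit-learn `Pipeline`, avoids a mismatch between training-time and prediction-time transformations.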

7. Conclusion and Insights

Summarized the findings and insights gained from the analysis and modeling process. Highlighted the best-performing model and its practical implications.

Results

  • Best Model: The Random Forest Regressor outperformed other models with the lowest RMSE and highest R² score.
  • Performance Metrics: Achieved an RMSE of 0.11, MAE of 0.069, and R² of 0.94 on the test set.

Technologies and Tools Used

  • Programming Language: Python
  • Libraries: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, Plotly
  • Jupyter Notebook: For interactive analysis and visualization

Contact

For questions or collaboration opportunities, feel free to reach out via LinkedIn or email.
