A web-based application for anyone who wants to learn about and create machine learning models without coding.
CodelessML presents a general workflow for easing the process of creating a machine learning model and using the ML model for prediction. This application aims to help non-expert users experience the journey of how a machine learning model is built without requiring them to code. This application has three main menus: EDA, Modelling, and Prediction.
https://codelessml.streamlit.app/

For optimal performance, we recommend running CodelessML on your local machine.
We’re excited to hear your thoughts about CodelessML, our tool designed to simplify the machine learning process. This project is solely for educational purposes and aims to make it easier for everyone to learn about machine learning. Your feedback is crucial for us to enhance its features and usability. Please take a moment to complete our short survey. It will only take a few minutes and will greatly assist us in making CodelessML even better for you and future users.
https://forms.gle/iYJHgDeAWrCaTVCV6
Thank you for your time and valuable input :)
The following describes how each menu is meant to be used.
- The EDA menu is used to explore any dataset and visualise it automatically without writing a single line of code. Just upload your data in a compatible file format (CSV or Excel).
- The Modelling menu is used to create machine learning models. Select the type of task you want to perform: Classification or Regression. This menu also processes the data and evaluates all 9 ML algorithms available in this application. For now, only default data processing is available: dropping columns in which 40% or more of the values are missing, dropping columns whose values are 100% unique, and imputing missing values with the mean for numeric columns or the mode for string columns. These steps run automatically when you click the submit button on the Modelling menu, and they are necessary because most machine learning algorithms cannot handle missing values.
- The Prediction menu is used to predict new data for which you do not know the target using a trained model.
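
The default data processing described for the Modelling menu can be sketched with pandas. This is a minimal illustration under stated assumptions, not the application's actual code: the function name `default_preprocess`, its `missing_threshold` parameter, and the sample columns are all hypothetical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def default_preprocess(df: pd.DataFrame, missing_threshold: float = 0.40) -> pd.DataFrame:
    """Illustrative version of the three default cleaning steps (hypothetical helper)."""
    out = df.copy()

    # 1. Drop columns where 40% or more of the values are missing.
    missing_ratio = out.isna().mean()
    out = out.loc[:, missing_ratio < missing_threshold]

    # 2. Drop columns in which every value is unique (for example, ID columns).
    all_unique = [col for col in out.columns if out[col].nunique(dropna=False) == len(out)]
    out = out.drop(columns=all_unique)

    # 3. Impute missing values: mean for numeric columns, mode for all others.
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Made-up sample data covering each rule.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],                         # 100% unique, dropped
    "mostly_empty": [None, None, None, 1.0, 2.0],  # 60% missing, dropped
    "age": [20.0, None, 40.0, 30.0, 30.0],         # numeric, mean-imputed
    "city": ["A", "B", None, "A", "A"],            # string, mode-imputed
})
clean = default_preprocess(df)
print(clean)
```

Note that the "100% unique" rule also removes a numeric feature whose values all happen to differ, so it is worth checking which columns were dropped before trusting a model's results.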
- Explore and understand your data using the EDA menu (e.g., descriptive statistics, data visualisation, correlation matrix, etc.)
- Go to the Modelling menu: [Modelling - Classification] for a classification task or [Modelling - Regression] for a regression task. Choose the menu that best suits your problem, then select the target variable and set the ratio of data used for training and testing your model. Click submit, and the application will automatically process the data and train the ML models.
- Decide which model you want to save and click “Select model to download”. You will get a zipped folder containing model.joblib, encoder.pkl, and the original data encoded with a label encoder.
- Use model.joblib and encoder.pkl to predict your new data in the Prediction menu, and save the result by clicking the “Download Output File” button.
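
For readers curious about what happens behind these steps, the workflow can be approximated in a few lines of scikit-learn. This is a sketch under assumptions, not the application's implementation: it assumes encoder.pkl holds a single fitted `LabelEncoder` for the target (the real file may be structured differently), uses a Random Forest as the chosen model, and takes the Wine dataset bundled with scikit-learn as stand-in data.

```python
import pickle
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Stand-in data: Wine features with string class labels.
wine = load_wine(as_frame=True)
X = wine.data
y_text = wine.target.map(lambda i: wine.target_names[i])

# Encode the target, then hold out 20% of the rows for testing.
encoder = LabelEncoder()
y = encoder.fit_transform(y_text)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Train one model and measure its accuracy on the held-out rows.
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

# Save both artefacts, then load them back as a prediction step would.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.joblib"
    encoder_path = Path(tmp) / "encoder.pkl"
    joblib.dump(model, model_path)
    with open(encoder_path, "wb") as fh:
        pickle.dump(encoder, fh)

    loaded_model = joblib.load(model_path)
    with open(encoder_path, "rb") as fh:
        loaded_encoder = pickle.load(fh)

# Predict on "new" rows and decode the numeric classes back to labels.
predicted_labels = loaded_encoder.inverse_transform(loaded_model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
print(predicted_labels[:5])
```

Keeping the encoder alongside the model matters because the model only outputs encoded integers; without the encoder, predictions cannot be mapped back to the original class names.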
Classification algorithms:

Name | Reference |
---|---|
Logistic Regression | [1] |
Linear SVC | [1] |
K-Nearest Neighbor | [1] |
Multinomial Naive Bayes | [1] |
Decision Tree | [1] |
Random Forest | [1] |
Gradient Boosting | [1] |
LightGBM | [2] |
XGBoost | [3] |

Regression algorithms:

Name | Reference |
---|---|
Linear Regression | [1] |
Support Vector Regression | [1] |
K-Nearest Neighbor | [1] |
Elastic Net | [1] |
Passive Aggressive Regressor | [1] |
Random Forest | [1] |
Gradient Boosting | [1] |
LightGBM | [2] |
XGBoost | [3] |
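
Evaluating several algorithms side by side, as the Modelling menu does, might look roughly like the sketch below. It covers only a subset of the scikit-learn regressors listed above to stay short, runs on synthetic data from `make_regression`, and scores with R², which is an assumption: the metric the application reports is not stated here.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic regression data (illustrative only).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A subset of the regressors from the table, all with default settings.
models = {
    "Linear Regression": LinearRegression(),
    "Elastic Net": ElasticNet(),
    "K-Nearest Neighbor": KNeighborsRegressor(),
    "Random Forest": RandomForestRegressor(random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# Fit every model on the training split and score it on the test split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

# Print the models from best to worst.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} R^2 = {score:.3f}")
```

Using the same train/test split for every model keeps the comparison fair, since each score is computed on identical held-out rows.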
This application is built with Streamlit. To deploy it, clone this repo and follow the official Streamlit deployment guide.
To run it on a local machine, you can use Anaconda and import this environment. Once your environment is ready, download this repo and run the following commands:
```
cd <path to your CodelessML directory>
streamlit run About.py
```
If ticking “categorical data distribution” on the EDA menu raises the error `AttributeError: 'int' object has no attribute 'astype'`, delete the following code from lines 65 and 66:
.astype(int).astype(object)
The following datasets were used for the development and testing of this application:
- Wine [4]
- Automobile [5]
[1] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
[2] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., & Liu, T.-Y. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Advances in Neural Information Processing Systems, 30, 3146–3154.
[3] Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). ACM.
[4] Aeberhard, S., & Forina, M. (1991). Wine. UCI Machine Learning Repository. https://doi.org/10.24432/C5PC7J.
[5] Schlimmer, J. (1987). Automobile. UCI Machine Learning Repository. https://doi.org/10.24432/C5B01C.