Template Repo for ML Project

This template repo gives you a starting point for your second project. Besides the files used for creating a virtual environment, it contains a minimal example of how to build a model in a Python script. We train a simple model in the Jupyter notebook, where we select only a few features and do minimal cleaning. The result is then stored in plain Python scripts.

The data used for this is the coffee quality dataset.

Requirements and Environment

Requirements:

pyenv with Python: 3.9.8

Environment:

To set up the virtual environment, either use the Makefile and run make setup, or install it manually with the following commands:

pyenv local 3.9.8
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
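
After running the commands above, it can be useful to confirm that the active interpreter is the one you expect. The following is a minimal sketch, assuming you want to check for exactly Python 3.9.8 and for an active virtual environment; the helper names python_matches and in_virtualenv are hypothetical and not part of this repo.

```python
import sys

# Version stated in the Requirements section of this README.
REQUIRED = (3, 9, 8)


def python_matches(required=REQUIRED, actual=None):
    """Return True if the interpreter version equals the required one.

    `actual` defaults to the running interpreter; it can be passed
    explicitly to make the check easy to test.
    """
    actual = tuple(sys.version_info[:3]) if actual is None else tuple(actual)
    return actual == tuple(required)


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True if a venv is active (sys.prefix differs from sys.base_prefix)."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix
```

Calling python_matches() and in_virtualenv() without arguments inspects the interpreter that is currently running.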

Usage

To train the model, store the test data in the data folder, and save the model in the models folder, run:

#activate env
source .venv/bin/activate

python example_files/train.py
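
As a rough illustration of what such a training step involves, the following dependency-free sketch splits some rows, fits a one-feature least-squares line, and writes the test split and the pickled model to disk. It is not the contents of example_files/train.py: the function names fit_line and train_and_store, the column headers x and y, and the hand-rolled fit (swapped in for whatever modeling library the real script uses) are assumptions made for illustration. Only the output file names follow the ones used elsewhere in this README.

```python
import csv
import pickle
import random
from pathlib import Path


def fit_line(xs, ys):
    """Ordinary least squares for one feature: y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def write_column(path, header, values):
    """Write a single-column CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([header])
        writer.writerows([v] for v in values)


def train_and_store(xs, ys, data_dir="data", models_dir="models",
                    test_size=0.25, seed=42):
    """Split the rows, fit on the training part, persist test data and model."""
    rows = list(zip(xs, ys))
    random.Random(seed).shuffle(rows)  # reproducible shuffle
    n_test = max(1, int(len(rows) * test_size))
    test_rows, train_rows = rows[:n_test], rows[n_test:]

    model = fit_line([x for x, _ in train_rows], [y for _, y in train_rows])

    data_dir, models_dir = Path(data_dir), Path(models_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    models_dir.mkdir(parents=True, exist_ok=True)

    # Hold-out data goes to the data folder, the fitted model to models.
    write_column(data_dir / "X_test.csv", "x", [x for x, _ in test_rows])
    write_column(data_dir / "y_test.csv", "y", [y for _, y in test_rows])

    model_path = models_dir / "linear_regression_model.sav"
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return model_path
```

Keeping the hold-out rows on disk means a later prediction run can be evaluated on data the model never saw during fitting.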

To check that prediction works on the test set you created, run:

python example_files/predict.py models/linear_regression_model.sav data/X_test.csv data/y_test.csv
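
The command takes three positional arguments: the model file, the test features, and the test targets. A minimal sketch of a script with that interface is shown below. It is not the actual example_files/predict.py: it assumes the model was pickled as a dict with hypothetical keys slope and intercept, that each CSV has one column with a header row, and that mean squared error is the metric of interest.

```python
import csv
import pickle


def read_column(path):
    """Read a single-column CSV with a header row into a list of floats."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return [float(row[0]) for row in reader if row]


def evaluate(model_path, x_path, y_path):
    """Load the pickled model, predict on X, and return the MSE against y."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)  # only unpickle files you trust
    xs = read_column(x_path)
    ys = read_column(y_path)
    preds = [model["intercept"] + model["slope"] * x for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def main(argv):
    """Entry point taking the same three positional arguments as the command."""
    if len(argv) != 3:
        raise SystemExit("usage: predict.py MODEL X_TEST_CSV Y_TEST_CSV")
    mse = evaluate(*argv)
    print(f"MSE on test set: {mse:.4f}")
    return mse
```

To use this as a command-line script, call main(sys.argv[1:]) under an if __name__ == "__main__" guard.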

Limitations

Development libraries are part of the production environment. Normally these would be kept separate, as the production code should be as slim as possible.

Variable definitions

Client:

Client_id: Unique id for client
District: District where the client is
Client_catg: Category client belongs to
Region: Area where the client is
Creation_date: Date client joined
Target: fraud: 1, not fraud: 0

Invoice data:

Client_id: Unique id for the client
Invoice_date: Date of the invoice
Tarif_type: Type of tax
Counter_number:
Counter_statue: Takes up to 5 values such as working fine, not working, on hold status, etc.
Counter_code:
Reading_remarque: Notes that the STEG agent takes during his visit to the client (e.g., if the counter shows something wrong, the agent gives a bad score)
Counter_coefficient: An additional coefficient to be added when standard consumption is exceeded
Consommation_level_1: Consumption_level_1
Consommation_level_2: Consumption_level_2
Consommation_level_3: Consumption_level_3
Consommation_level_4: Consumption_level_4
Old_index: Old index
New_index: New index
Months_number: Month number
Counter_type: Type of counter
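
Client and invoice records share the Client_id field, so invoice-level values can be aggregated per client and combined with the Target label. The sketch below shows one possible way to do that with plain dictionaries. The helper names, the derived field total_consumption, and the sample rows in the usage lines are hypothetical; only the column names come from the definitions above.

```python
from collections import defaultdict

# Consumption columns as named in the invoice data definitions.
LEVELS = [
    "Consommation_level_1",
    "Consommation_level_2",
    "Consommation_level_3",
    "Consommation_level_4",
]


def total_consumption_per_client(invoices):
    """Sum all four consumption levels over every invoice of each client."""
    totals = defaultdict(float)
    for invoice in invoices:
        totals[invoice["Client_id"]] += sum(invoice[level] for level in LEVELS)
    return dict(totals)


def attach_target(clients, totals):
    """Combine each client's Target label with its aggregated consumption."""
    return [
        {
            "Client_id": client["Client_id"],
            "Target": client["Target"],
            # Clients without invoices get a total of 0.0 (an assumption).
            "total_consumption": totals.get(client["Client_id"], 0.0),
        }
        for client in clients
    ]


# Hypothetical sample rows, for illustration only.
clients = [{"Client_id": "c1", "Target": 0}, {"Client_id": "c2", "Target": 1}]
invoices = [
    {"Client_id": "c1", "Consommation_level_1": 10, "Consommation_level_2": 0,
     "Consommation_level_3": 0, "Consommation_level_4": 0},
    {"Client_id": "c1", "Consommation_level_1": 5, "Consommation_level_2": 2,
     "Consommation_level_3": 0, "Consommation_level_4": 0},
]
rows = attach_target(clients, total_consumption_per_client(invoices))
```

Each entry in rows then holds one client with its label and a single aggregated feature, which is the shape most tabular models expect.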


tomduese/ds-fraud-detection
