ID3

Implementation of the ID3 algorithm in Python

Thomas MARTIN, Victor CAVERO, Séraphin HENRY

Description

In CLI mode, the program reads a CSV file that is already filled with data and uses it to generate a decision tree with the ID3 algorithm. The CLI then asks the user a series of questions until it announces the predicted result. The user can either confirm or reject this prediction.

If the user confirms the prediction, nothing else happens. If not, the CLI asks the user to enter additional data.

The web app works the same way, except that the data is stored in a PostgreSQL database.

The project can be used through the CLI, and it is also deployed as a web app.
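The question loop described above amounts to walking the decision tree from the root to a leaf. The sketch below is a minimal illustration, not the project's main.py: it assumes the tree is a nested dict of the form {attribute: {value: subtree or leaf}} with "yes"/"no" leaves, and the names `predict` and `ask_in_terminal` are hypothetical.

```python
from typing import Callable, Union

# Assumed tree shape (hypothetical): {attribute: {value: subtree or "yes"/"no" leaf}}
Tree = Union[str, dict]


def predict(tree: Tree, ask: Callable[[str, list], str]) -> str:
    """Walk the tree, asking one question per internal node, until a leaf is reached."""
    node = tree
    while isinstance(node, dict):
        # Each internal node has a single key: the attribute to ask about.
        attribute, branches = next(iter(node.items()))
        answer = ask(attribute, sorted(branches))
        if answer not in branches:
            raise ValueError(f"Unknown value {answer!r} for {attribute!r}")
        node = branches[answer]
    return node  # leaf label, e.g. "yes" or "no"


def ask_in_terminal(attribute: str, choices: list) -> str:
    """Interactive question for a CLI: repeat until the answer is one of the choices."""
    while True:
        answer = input(f"{attribute}? {choices} ").strip()
        if answer in choices:
            return answer
```

In a CLI, `predict(tree, ask_in_terminal)` would produce the prediction, after which the user is asked to confirm it. Passing the question function as a parameter keeps the traversal testable without a terminal.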

The web app is composed of a React client, two microservices (py-id3, written in Python, and pg-controller, written in Go) and a PostgreSQL database.

The React client interacts with these two microservices to retrieve a decision tree (py-id3) and post data to the database (pg-controller).

Documentation regarding py-id3, the ID3 tree generator exposed as a REST API

Documentation regarding pg-controller, the postgres db controller, exposed as a REST API

The two microservices are exposed as fully open REST APIs (no authentication), because we preferred to focus on ID3 rather than on security for now. Admin access to the PostgreSQL database, however, is protected using Docker secrets and Docker Swarm.

The database model used by the web app is available here: (link)

The app runs at https://furio.team and is deployed with Docker and Docker Swarm.

Webapp dev mode

Run docker-compose -f docker-compose.base.yml -f docker-compose.dev.yml up, then, in a separate terminal, cd client && yarn start

ID3 algorithm

To build the tree, we follow these key steps:

Read the .csv file and store it in a variable df
Calculate the entropy:
- Isolate the result column to count the occurrences of yes and no
- Calculate the overall entropy of the dataset
Build the tree with the function buildTree():
- Find the attribute with the best information gain (find_winner())
- Get the distinct values of this attribute
- If the subtable for a value contains only yes or only no, it becomes a leaf labelled yes or no
- If it contains both yes and no, the algorithm calls buildTree() recursively on that subtable
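The steps above can be sketched in plain Python. This is a hedged illustration of ID3, not the project's code: it uses a list of dicts (one per CSV row) instead of df, assumes the output column is named result with yes/no values, and the names `entropy`, `information_gain` and `id3` are hypothetical. It also returns a majority-vote leaf when no attribute is left to split on, a case the steps above do not describe.

```python
import math
from collections import Counter


def entropy(rows: list, target: str = "result") -> float:
    """Shannon entropy (base 2) of the target column, e.g. over yes/no."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows: list, attribute: str, target: str = "result") -> float:
    """Entropy of the table minus the weighted entropy of each value's subtable."""
    total = len(rows)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subtable = [row for row in rows if row[attribute] == value]
        remainder += len(subtable) / total * entropy(subtable, target)
    return entropy(rows, target) - remainder


def id3(rows: list, attributes: list, target: str = "result"):
    """Return a nested dict {attribute: {value: subtree}} or a leaf label."""
    labels = {row[target] for row in rows}
    if len(labels) == 1:
        # Only yes or only no: this branch becomes a leaf.
        return labels.pop()
    if not attributes:
        # Assumption: no attribute left, so fall back to the majority label.
        return Counter(row[target] for row in rows).most_common(1)[0][0]
    # Pick the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in sorted({row[best] for row in rows}):
        subtable = [row for row in rows if row[best] == value]
        # Recurse; pure subtables come back as leaves from the first check.
        tree[best][value] = id3(subtable, remaining, target)
    return tree
```

For rows read with csv.DictReader, `id3(rows, [c for c in fieldnames if c != "result"])` would return the whole tree in the nested-dict shape.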

Functions explained

Function	Parameters	Description
`calc_df_entropy`	`df`: records of the .csv ; `attribute`: unique values of the result column	Calculates the entropy of the whole dataset
`calc_subtable_entropy`	`df`: dataset of a subtable	Calculates the entropy of a subtable
`calc_subtable_attribute_entropy`	`df`: dataset of a subtable ; `attribute`: an attribute of the subtable	Calculates the entropy of a subtable for a single attribute
`find_best_attribute`	`df`: dataset of a subtable	Finds the attribute that gives the most information (highest information gain)
`get_subtable`	`df`: dataset of a subtable ; `attribute`: the attribute being explored ; `value`: the value of that attribute	Filters the dataset to keep only the rows of the explored branch
`build_tree`	`df`: dataset being explored	Builds the tree, stored in a variable `tree[][]`
`generate_decision_tree`	`csv_filename`: name of the .csv file that provides the data	Top-level function: reads the CSV, calculates the entropy, builds the tree and returns it
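For reference, these are the standard ID3 quantities that the entropy functions and find_best_attribute presumably compute (the exact mapping to each function is an assumption). Here S is a table, p_yes and p_no are the proportions of yes and no in its result column, and S_v is the subtable where attribute A equals v. The last line is a worked example for a hypothetical table with 9 yes and 5 no.

```latex
H(S) = -p_{\text{yes}} \log_2 p_{\text{yes}} - p_{\text{no}} \log_2 p_{\text{no}}

\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \, H(S_v)

H(S) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940
```

A pure subtable (only yes or only no) has H = 0, which is why it can be turned into a leaf.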

Run the thing in CLI mode

git clone the project (or download and unzip the archive)
cd into it
cd py-id3

Then run it either locally or with Docker:

Run locally

You'll need pipenv, unless you're ready to install the dependencies by hand.

Install pipenv

pip install pipenv

Install the dependencies

pipenv install, then pipenv shell

Run!

python3.8 main.py (inside the pipenv shell) or pipenv run python3.8 main.py

... or with Docker

docker build -f Dockerfile.cli -t id3py .

Then, if you're using Linux, macOS or another Unix-like system:

docker run -v $(pwd)/db:/app/db -it id3py

Or, if you're using Windows (PowerShell):

docker run -v ${PWD}/db:/app/db -it id3py

The data.csv file in the db folder is overwritten whenever the program determines that the database needs to be updated.

You can also mount another folder instead, as long as it contains the data.csv file you want to use:

docker run -v PATH_TO_YOUR_FOLDER:/app/db -it id3py

Use your own knowledge base

For now, the program uses the file db/data.csv as its knowledge base. You can use your own, as long as the output column is named result and contains only yes and no values.
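Before pointing the program at your own file, it can help to check that requirement first. The helper below is a hypothetical sketch, not part of the project: it reads the CSV with Python's standard csv module and reports a missing result column or any value other than yes/no.

```python
import csv
from typing import Iterable


def check_knowledge_base(lines: Iterable[str]) -> list:
    """Return a list of problems; an empty list means the CSV looks usable."""
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or "result" not in reader.fieldnames:
        return ["missing a column named 'result'"]
    problems = []
    for row in reader:
        if row["result"] not in ("yes", "no"):
            # reader.line_num is the number of source lines read so far.
            problems.append(
                f"line {reader.line_num}: result is {row['result']!r}, expected 'yes' or 'no'"
            )
    return problems
```

Usage, assuming the default location: `with open("db/data.csv", newline="") as f: print(check_knowledge_base(f) or "OK")`.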
