- Started to implement fundamental algorithms and data structures to recap and enhance my python skills. The software packages should do their jobs -- I tested them on small scales. However, this is more of a ''practice notebook'' for me.
- Besides recapping algorithms and data structures, I also code granularized and object-oriented, to exercise a good coding habit, as well.
Binarized Search-Trees help to find objects rapidly.
They are realized to work with key-value-pairs
.
That is, each object is a value and can be addressed by a unique key.
This results in an interface, which is similar to the dictionaries (dict
) data structure in python.
However, the on-board python dictionaries are implemented as hash-maps.
I overwrote magic methods like __getitem__()
or __setitem__()
which enables writing and
reading elements in a tree t
with t[key] = value
, making them behave like dictionaries
as close as possible.
Splitting a tree according to the highest information gain possible.
This is done via mutual information. Mutal Information (MI) determines the amount of dependency between two random variables. It estimates the amount of information gained on an unknown secondary variable by observing the primary variable. In the concrete implementation the observed primary variable(s) is (are) features. From that the probability of the classes, i.e., the secondary variable, is calculated.
The module InformationGain/informationgain.py
can be used on its own to analyze
custom problems on mutual information.
Additionally, the module InformationGain/random_tree.py
can be used to build
a tree estimator, which finds its decisions during training based on this
mutual information implementation.
Lastly, a random forest can be built by having an ensemble of (decorrelated) random trees.
Sudoku is used as example for backtracking
.
Backtracking is a method to solve tasks by ``trial and error''.
That means a partial solution that is valid at the current state is proposed.
Only if a later state can not proceed the solution is updated to the next valid partial
solution.
The method terminates if the entire problem set is solved correctly -- or if no solution for the entire process can be found. In latter case the concrete implementation returns an error indicating that it is an ill-posed problem.