Feature transformation for supervised learning using equation discovery.
For motivation, background and results see the paper or the (much) longer thesis.
You need a Common Lisp compiler, such as SBCL. To install SBCL with Homebrew:
brew install sbcl
Link the hokuspokus module into ASDF's default system definition folder (create ~/common-lisp/ first if it does not exist):
ln -s ~/projects/hokuspokus/ ~/common-lisp/
Start SBCL and load the hokuspokus module:
sbcl
(require :asdf)
(asdf:load-system :hokuspokus)
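If loading fails, it can help to check whether ASDF sees the system through the link at all. The following is a minimal sketch, assuming the system is named :hokuspokus as in the load call above. It relies only on standard ASDF functions: asdf:find-system returns NIL instead of signalling an error when its second argument is NIL.

```lisp
(require :asdf)

;; Look the system up without signalling an error if it is missing.
;; The NIL second argument makes FIND-SYSTEM return NIL in that case.
(let ((system (asdf:find-system :hokuspokus nil)))
  (if system
      (format t "Found ~A at ~A~%"
              (asdf:component-name system)
              (asdf:system-source-directory system))
      (format t "ASDF cannot find hokuspokus; check the link in ~~/common-lisp/~%")))
```

If the second message is printed, the link is probably missing or points to the wrong directory.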
Example run using the kepler dataset:
(hokuspokus:with
(hokuspokus::in-file #p"data/kepler.arff")
(hokuspokus::operators '+ '- '* '/ 'sqrt)
(hokuspokus::pre-select :percent 0.25)
(hokuspokus::post-select :abs 4)
(hokuspokus::depth 3))
Look at the resulting feature space:
hokuspokus::*feature-space*
One of the resulting formulas can be simplified to Kepler's third law.
Kepler's Third Law: r^3/T^2 is constant, where r is the orbital radius and T the orbital period.
Matching formula:
(((r * r) * (r * r)) / ((t * r) * t))
Reduces to:
(r * r * r) / (t * t)
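The reduction cancels one factor of r: the numerator is r^4 and the denominator is t^2 * r. It can also be checked numerically, as in the sketch below. This is an illustration only and not part of hokuspokus; the function names matching-formula and reduced-formula are hypothetical. The period is named p because t is a reserved constant in Common Lisp. With rational inputs the arithmetic is exact, so = compares the results without rounding error.

```lisp
;; Illustrative check only; these helpers are not part of hokuspokus.
(defun matching-formula (r p)
  "The discovered expression (((r * r) * (r * r)) / ((t * r) * t)), with P for t."
  (/ (* (* r r) (* r r))
     (* (* p r) p)))

(defun reduced-formula (r p)
  "The simplified expression (r * r * r) / (t * t), with P for t."
  (/ (* r r r) (* p p)))

;; Compare both expressions on a few exact rational sample points.
;; Returns T when every pair agrees.
(loop for (r p) in '((1 1) (2 3) (5/2 7/3))
      always (= (matching-formula r p) (reduced-formula r p)))
```

Agreement on sample points is a sanity check, not a proof; the algebraic cancellation is what establishes the identity.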
Data files are read and written in the Weka ARFF format.
To write the results of a run back to an ARFF file:
(hokuspokus:with
(hokuspokus::in-file #p"data/kepler.arff")
(hokuspokus::operators '+ '- '* '/ 'sqrt)
(hokuspokus::pre-select :percent 0.25)
(hokuspokus::post-select :abs 4)
(hokuspokus::out-file #p"data/kepler.processed.arff")
(hokuspokus::depth 3))
Only numerical, non-sparse data sets are supported.