## Cortex

A classification/regression tool written in the snake tongue. Implements a neural network, plus logistic and linear regression, using no external libraries.

Performs batch gradient descent across all three techniques, and employs sigmoid activation for the neural network and logistic regression. A bold driver heuristic adjusts the learning rate on the fly based on gradient descent performance (achieving multi-fold efficiency gains in some cases). A momentum term helps alleviate hangups on local minima and reaches convergence in fewer epochs. A regularization factor helps address overfitting by penalizing large weights.
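To make those mechanics concrete, here is a rough, hypothetical sketch of how a bold driver, a momentum term, and a regularization factor can combine in a single batch gradient descent loop. The function and parameter names below are illustrative only, not the library's internals:

```python
# A minimal sketch -- not the library's internal code.

def descend(weights, gradient_fn, cost_fn,
            learn_rate=0.25, momentum=0.1, reg_rate=0.0,
            quick_factor=1.1, brake_factor=0.6, max_iters=1000):
    velocity = [0.0] * len(weights)
    prev_cost = cost_fn(weights)
    for _ in range(max_iters):
        grads = gradient_fn(weights)
        # Regularization: add reg_rate * w to each gradient, squashing large weights.
        grads = [g + reg_rate * w for g, w in zip(grads, weights)]
        # Momentum: blend a fraction of the previous step into the current one.
        velocity = [momentum * v - learn_rate * g for v, g in zip(velocity, grads)]
        candidate = [w + v for w, v in zip(weights, velocity)]
        cost = cost_fn(candidate)
        if cost < prev_cost:
            # Bold driver: the step helped, so keep it and speed up.
            weights, prev_cost = candidate, cost
            learn_rate *= quick_factor
        else:
            # Bold driver: the step overshot, so "rewind" (keep old weights) and brake.
            learn_rate *= brake_factor
    return weights

# Toy usage: minimize f(w) = (w - 3)^2 starting from w = 0.
print(descend([0.0],
              gradient_fn=lambda w: [2 * (w[0] - 3)],
              cost_fn=lambda w: (w[0] - 3) ** 2))   # close to [3.0]
```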

Vectorized implementations exist in Octave and will be ported to NumPy eventually. In the meantime, please enjoy some for loops and list comprehensions.

### Neural Network

#### Example: Training the XOR function

```python
from cortex import NeuralNet

data = [
    {'input': [1, 0], 'output': [1]},     # also accepts as [1, 0, 1]
    {'input': [0, 1], 'output': [1]},     # also accepts as [0, 1, 1]
    {'input': [0, 0], 'output': [0]},
    {'input': [1, 1], 'output': [0]},
]

options = {
    'hidden_sizes': [3, 2],
    'log_progress': True
}

net = NeuralNet()       # also accepts `net = NeuralNet(data)`
net.load_data(data)
net.train(options)

print(net.run([0, 1]))     # 0.979
print(net.run([1, 1]))     # 0.029
```

NOTE: Occasionally, your neural nets may return higher error results than anticipated. If so, try training the network again. Batch descent is sensitive to initial conditions and can hang on local minima, but each training call will randomize the starting weights.
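One way to automate that retry, using the dict-format data from the XOR example and assuming `run()` returns a single number for a one-element output (as the printed values above suggest). The `average_error` helper is our own, not part of the library:

```python
def average_error(net, data):
    # Mean absolute error of the net over its own training examples.
    total = 0.0
    for example in data:
        total += abs(net.run(example['input']) - example['output'][0])
    return total / len(data)

net = NeuralNet(data)
net.train(options)
attempts = 1
while average_error(net, data) > 0.05 and attempts < 5:
    net.train(options)   # each call re-randomizes the starting weights
    attempts += 1
```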

##### Guidelines

Handles any number of input features of any size, and any number of output elements with values between 0 and 1 (preferably binary -- 0 or 1). Trains multi-class scenarios via multiple-element output vectors (i.e. [1, 0, 0], [0, 1, 0], [0, 0, 1] representing Class I, Class II, Class III). This multi-class output API will be abstracted away in the future so you can just use unique integers and strings to represent different classes. Handles row vectors where the last element is the output, [x1, x2, y], as well as the {'input': [x1, x2], 'output': [y]} format shown above. Formatting must be consistent throughout the training data.
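For instance, a hypothetical three-class training set in the dict format would look something like this (the feature values are made up):

```python
# Each class gets its own slot in the output vector.
data = [
    {'input': [5.1, 3.5], 'output': [1, 0, 0]},   # Class I
    {'input': [6.4, 3.2], 'output': [0, 1, 0]},   # Class II
    {'input': [5.9, 3.0], 'output': [0, 0, 1]},   # Class III
]
```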

##### Optional Model Parameters

These go in the options dict from the XOR example; a combined example follows this list.

- hidden_sizes: Sets the hidden node architecture using a list. The model will have len(hidden_sizes) hidden layers, with each element being the size of its corresponding layer. [2, 3, 4] creates 3 hidden layers with 2, 3, and 4 nodes respectively. Uses reasonable defaults otherwise. The more hidden layers / nodes per layer, the lower the final training error (generally), and the more computationally expensive the training process.
- learn_rate: Determines how aggressively gradient descent runs (default 0.25). Setting it too low will result in less progress made per iteration (and longer processing time). Setting it too high may result in "overshooting" the optimum, or a divergent learning process.
- error_threshold: The maximum acceptable average error of the model (default 0.05). The learning process will conclude once errors are below the threshold or max_iters has been reached, whichever comes first.
- max_iters: The maximum number of iterations to be performed before concluding (default 10000).
- epsilon: Determines the range of the initial weights, drawn from (-epsilon, +epsilon). Defaults to 2.
- reg_rate: Governs the regularization term (default 0). Setting this to a high value squashes large weights (especially when fitting high-order features), but pushes the whole system of weights toward 0. Test different values with your dataset -- the ideal regularization rate may be on the order of 0.0001, or in the single digits.
- momentum: Value >= 0 (default 0.1). Set to 0 to turn it off; setting it too high may cause you to "overshoot" the minimum. Dramatically speeds up the descent process on "ravine"-style cost surfaces (think low slope along the floor, steep sides).
- log_progress: Boolean value determining whether to log progress (default False).
- log_interval: Numerical value (default 1000). The model will log relevant stats every log_interval iterations.
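Putting several of these together (the values below are illustrative, not recommendations):

```python
options = {
    'hidden_sizes': [4, 4],      # two hidden layers of 4 nodes each
    'learn_rate': 0.3,
    'error_threshold': 0.01,
    'max_iters': 20000,
    'epsilon': 1,
    'reg_rate': 0.0001,
    'momentum': 0.2,
    'log_progress': True,
    'log_interval': 500
}
net.train(options)
```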

### Linear Regression

#### Example: Training the y = 2 + 4(x1) + 3(x2) function

```python
from cortex import LnrReg

data = [
    {'input': [2, 3], 'output': [19]},     # x1 = 2, x2 = 3, y = 19
    {'input': [1, 1], 'output': [9]},      # also accepts as [1, 1, 9]
    {'input': [-5, 2], 'output': [-12]},
    {'input': [3, -4], 'output': [2]}
]

options = {'log_progress': True}

regression = LnrReg()         # also accepts `regression = LnrReg(data)`
regression.load_data(data)
regression.train(options)

print(regression.run([2, 2]))        # 15.999
```

##### Guidelines

Handles any number of input features of any size. For now, only supports a single output element. Handles row vectors where the last element is the output, [x1, x2, y], as well as the {'input': [x1, x2], 'output': [y]} format used in the neural net. Formatting must be consistent throughout the training data. The model will log a theta vector at the conclusion of training, which corresponds to the "weights" of each respective input (with the first weight being the bias, or intercept, value).
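For the y = 2 + 4(x1) + 3(x2) example above, the logged theta should land near [2, 4, 3]; a prediction is then just the bias plus the weighted sum of the inputs. The predict helper below is our own, shown only to illustrate what theta means:

```python
theta = [2.0, 4.0, 3.0]   # illustrative values, read off the training log

def predict(theta, features):
    # Bias term first, then one weight per input feature.
    return theta[0] + sum(t * x for t, x in zip(theta[1:], features))

print(predict(theta, [2, 2]))   # 16.0, in line with regression.run([2, 2])
```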

##### Optional Model Parameters

These go in the options dict from the example; a combined example follows this list.

- threshold: Instead of using an error_threshold as in the neural net, linear regression uses a convergence threshold (default 0.00001). If the difference in error between two successful gradient descent steps falls below the threshold, the learning process will conclude.
- max_iters: Same concept as in the neural net (default 50000). The learning process will stop once max_iters epochs are reached, unless it has already converged.
- learn_rate: Same concept as in the neural net (default 0.01). You shouldn't need to adjust this if you leave momentum on.
- use_driver: Boolean value determining whether to dynamically scale the learning rate (default True). Strongly recommended you keep this on; there's nothing wrong with turning it off, but it greatly speeds up the descent process.
- quick_factor: If use_driver is on, the learning rate is multiplied by this factor every time a gradient descent step is successful (default 1.1).
- brake_factor: If use_driver is on, the learning rate is multiplied by this factor every time a gradient descent step fails (default 0.6). Note that in this case, the weights will be "rewound" and gradient descent attempted anew with the new learn_rate.
- log_progress: Same as in the neural net (default False).
- log_interval: Same as in the neural net (default 2000).
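A combined example, with illustrative (not tuned) values:

```python
options = {
    'threshold': 0.000001,   # stricter convergence check
    'max_iters': 100000,
    'learn_rate': 0.01,
    'use_driver': True,      # bold driver: scale the learning rate up or down on the fly
    'quick_factor': 1.2,
    'brake_factor': 0.5,
    'log_progress': True,
    'log_interval': 5000
}
regression.train(options)
```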

### Logistic Regression

#### Example: Training the x1 = x2 decision boundary

```python
from cortex import LogReg

data = [
    [1, 0.9, 0],      # x1 = 1, x2 = 0.9, y = 0
    [5, 4, 0],        # also accepts as {'input': [5, 4], 'output': [0]}
    [6, 1, 0],
    [8, 7, 0],
    [1, 3, 1],
    [1.1, 1.3, 1],
    [5, 6, 1],
    [6, 6.1, 1],
    [4, 4.5, 1]
]

regression = LogReg(data)
regression.train()

print(regression.run([1, 0.5]))      # 0.01
print(regression.run([5, 4]))        # 0.00
print(regression.run([9, 10]))       # 1.00
print(regression.run([10, 15]))      # 1.00
```

##### Guidelines

Handles any number of input features of any size. For now, only supports a single output element, of discrete value 0 or 1. Will support multi-class in the future. Also supports both the [x1, x2, y] and {'input': [x1, x2], 'output': [y]} formats. Formatting must be consistent throughout the training data. As with linear regression, the model will log a theta vector at the conclusion of training.
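Since run() returns a value between 0 and 1 (as in the example above), a simple way to turn it into a hard class label is to threshold at 0.5. The classify helper below is our own, not part of the library:

```python
def classify(model, features, cutoff=0.5):
    # Round the sigmoid output to a hard 0/1 label.
    return 1 if model.run(features) >= cutoff else 0

print(classify(regression, [1, 0.5]))    # 0
print(classify(regression, [10, 15]))    # 1
```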

##### Optional Model Parameters

Same API as linear regression, hallelujah.

### TODOs

- Dry up boilerplate setup code into a separate module
- Pruning for neural nets to "trim" redundant nodes
- More intelligent setting of epsilons
- Serialization of weights, allowing the user to save and resume work on large data sets
- Prettify multi-class learning for the ANN by vectorizing a user-given output number into 1s and 0s
- Improve ANN performance: consider islice() and fewer comprehensions, one sweep through the set per epoch (vs. separate sweeps for the error and gradient calculations)
- Add normalization
- Add a regularization term to the regressions
- More extensive error handling
