Sunday, September 18, 2016

Supervised Machine Learning: BackPropagation by Gradient Descent


Neural networks are really interesting to me because it can be proven that they can approximate essentially any problem that can be expressed in terms of numbers - that for any such problem, there exists a neural network which can solve it (this is, loosely speaking, the universal approximation theorem).

In my opinion, there are three kinds of problems - those which we can easily define, those which contain some unknowns, and those which we know nothing about and cannot clearly define.
Neural networks can solve them all!

Before we get too excited, it's time to learn about traditional BackPropagation learning for Feed-Forward Artificial Neural Networks. This is a "training method", and it works for problems that fall into the first category: if we know what we're trying to do, and have a means to quickly and cleanly evaluate whether a given network output is good or bad, and how bad, then we're in business. We can use this training method for things like pattern recognition (Optical Character Recognition, Voice Recognition, Image Recognition, Mouse Gestures, and so on) or just telling an elevator door when it should open and close. There are countless real-world situations where a neural network could be BackProp-trained to perform some specific task. It should be noted that BackProp-training tends to produce networks that are experts at specific tasks with specific inputs; they do not tend to generalize very well beyond that, although they do cope well with very noisy input data.

This technique only works if we have access to some "training data", and a function which can give us a "fitness score" - a number that tells us, for a given set of inputs applied to our neural network, how close the network's outputs are to our expectations.

The training data consists of many individual test cases.
Each test case contains one set of network input data and a matching set of expected output data.
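
For concreteness, here is a minimal sketch in Python of what that might look like. The data (the XOR truth table) and the names are purely illustrative placeholders, and I've used a simple mean-squared-error score, which is a common choice but not the only one:

    # Each test case pairs one set of network inputs with the outputs we expect for them.
    training_data = [
        ([0.0, 0.0], [0.0]),
        ([0.0, 1.0], [1.0]),
        ([1.0, 0.0], [1.0]),
        ([1.0, 1.0], [0.0]),
    ]

    def fitness_score(actual_outputs, expected_outputs):
        # Mean squared error: 0.0 means a perfect match; larger means worse.
        return sum((e - a) ** 2 for a, e in zip(actual_outputs, expected_outputs)) / len(expected_outputs)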

One by one, we "show" the neural network our test input data, and measure the error term at each network output (I'll define the error term shortly). After this step we have an array of error terms, one for each neuron in the output layer, since each network output is produced by exactly one neuron in the last layer of the network.
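
As a sketch of that measurement step (assuming sigmoid-activated output neurons, which is the common textbook setup - the function name here is my own), the output-layer error terms can be computed like this:

    def output_error_terms(actual_outputs, expected_outputs):
        # For a sigmoid neuron with output y, the derivative of its activation is y * (1 - y).
        # Each error term blends "how wrong this output was" with "how sensitive the neuron is".
        return [(expected - actual) * actual * (1.0 - actual)
                for actual, expected in zip(actual_outputs, expected_outputs)]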


Before I get into the math, it might be helpful to some readers to learn this stuff in the same way that I did (since I am rather dyslexic, and can't easily grasp things like Greek Notation).

In order to perform traditional backpropagation training, we need to make two passes over our network: one to measure error terms, and another to adjust the weights. There are a few ways to do the adjustment; the most popular are variations on Gradient Descent. Basically, it works by determining how much each input is to blame for the error term at a neuron's output, taking the connection weights into account, and then apportioning an adjustment to each input weight so that the output tends toward the ideal (represented by the training data). Typically the correction term is kept very small, so as not to greatly disturb the network, lest it become unreliable for previously trained inputs.
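
For a single weight, that adjustment boils down to something like the following sketch (the names are hypothetical; learning_rate is the small correction factor just mentioned):

    def adjust_weight(weight, input_value, error_term, learning_rate=0.05):
        # The blame assigned to this weight is proportional to the input that flowed through it.
        return weight + learning_rate * error_term * input_value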

During the first pass, we'll go backwards through the layers, calculating the error term for each neuron based partly on the error terms we already calculated for the layer ahead of it (the one nearer the output).
Once we have an error term for every neuron in the entire network, we'll perform a second pass; this time we actually adjust the weights themselves, based on each destination neuron's error term and the activation feeding each weight.
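
Putting the two passes together, here is a compact sketch in plain Python. The representation (nested lists of weights, activations saved from a forward pass) and all the names are my own illustrative choices rather than any particular library, it assumes sigmoid activations throughout, and biases are left out to keep it short:

    def backpropagate(weights, activations, expected_outputs, learning_rate=0.05):
        # weights[l][j][i] is the weight from neuron i in layer l to neuron j in layer l + 1.
        # activations[l][i] is the output of neuron i in layer l (activations[0] is the network input).
        # deltas[l] holds the error terms for layer l + 1, so deltas[-1] is the output layer.
        deltas = [None] * len(weights)
        outputs = activations[-1]
        deltas[-1] = [(e - a) * a * (1.0 - a) for a, e in zip(outputs, expected_outputs)]

        # Pass 1 (backwards): give every hidden neuron an error term based on the layer it feeds into.
        for l in range(len(weights) - 2, -1, -1):
            layer_out = activations[l + 1]
            deltas[l] = []
            for j, out in enumerate(layer_out):
                downstream = sum(weights[l + 1][k][j] * deltas[l + 1][k]
                                 for k in range(len(weights[l + 1])))
                deltas[l].append(downstream * out * (1.0 - out))

        # Pass 2: nudge every weight in proportion to its neuron's error term and its own input.
        for l in range(len(weights)):
            for j in range(len(weights[l])):
                for i in range(len(weights[l][j])):
                    weights[l][j][i] += learning_rate * deltas[l][j] * activations[l][i]

In practice you would call something like this once per test case, and keep looping over the whole training set until the fitness score stops improving.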

Oh - and when I do get into the math, I will be glossing over a few things that are not relevant unless you happen to love math, in which case you can investigate them on your own time :)

