Monday, August 29, 2016

Basic Mathematics required for Feed-Forward Artificial Neural Networks

As a child I was always pulling things apart to "see how they worked".

The best way to understand how a Neural Network produces output signals from its inputs is to take a look at a single Neuron in isolation, and see how that operates.

The neuron has a single output, whose value we seek to calculate, and it also has several inputs.
For now, we'll disregard what the inputs and outputs are connected to; suffice it to say that they are represented by simple floating-point values (ideally in the normalized range, from 0.0 to 1.0).
Associated with each input signal is a Weight (also a normalized float), which is analogous to electrical resistance per input channel. Importantly, the input weights are held internally by the neuron, while the input values are external values presented to the neuron's input channels.
Neurons also have one additional input connector known as the "Bias" input, which is typically connected directly to a constant input value known as the Bias Threshold.
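
To make that concrete, here is a minimal sketch (in C++, with field names of my own choosing rather than from any particular library) of how such a neuron might be represented:

#include <vector>

// A single neuron: it holds one weight per input channel, plus a bias weight.
// The input values themselves live outside the neuron and are presented to it
// only when the network is evaluated.
struct Neuron {
    std::vector<float> weights;  // one weight per input connection
    float bias;                  // weight applied to the constant bias input
};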

The most important calculation that we need to know is called the Transfer Function, which yields a value known as the Activation Sum for a given neuron. Given the set of input weights for a neuron, and given a set of input signal values, we simply perform a per-component multiplication and then add the resulting values together - basically a Dot Product operation for N dimensions, where N is the number of inputs on the neuron. Finally, we account for the bias.
So, for weights W and inputs I, the sum is W[0] * I[0] + W[1] * I[1] + W[2] * I[2] + ... + Bias
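
As a rough sketch, reusing the hypothetical Neuron struct from above, the whole Transfer Function is just a loop:

#include <cstddef>

// Transfer Function: dot product of the neuron's weights with the incoming
// signal values, plus the bias term. Assumes one input value per weight.
float ActivationSum(const Neuron& n, const std::vector<float>& inputs) {
    float sum = n.bias;                    // start with the bias term
    for (std::size_t i = 0; i < inputs.size(); ++i)
        sum += n.weights[i] * inputs[i];   // per-component multiply, then add
    return sum;
}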

Next we apply what is known as an Activation Function, and there are a few choices.
I won't go into them all here because they produce broadly similar results. The purpose of such a function is to squish our Activation Sum back into an appropriate numerical range, rather like a Normalize or other scaling operation. The one I prefer right now is called the Sigmoid, after the name of the curve it produces when plotted on a graph. Here's the function for reference:

#include <math.h>
const float NeuralResponseThreshold = 0.95f;  // the "Response Threshold" described below

float Sigmoid(float netinput) {
    return 1.0f / (1.0f + expf(-netinput / NeuralResponseThreshold));
}

where the "Response Threshold" constant describes the level required to "turn on" the output of any neuron, usually something like 0.95 - if you have a background in electronics, compare this to CMOS and TTL logic switching thresholds.

Once we've shoved our Activation Sum through that function, we have the output value for the neuron. Typically in our software models, each neuron has only one such output, and thanks to the Activation Function its value is in the normalized range, just like the input values and weights.
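
Putting the two pieces together - still just a sketch built on the helpers above - a neuron's output is simply:

// A neuron's single output: the activation sum squashed into the 0..1 range.
float NeuronOutput(const Neuron& n, const std::vector<float>& inputs) {
    return Sigmoid(ActivationSum(n, inputs));
}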

Neural networks arrange their neurons in layers.
The first layer in a neural network is the input layer.
This is the network's interface with the external environment - we plug in external values here, something I'll talk about later.
There are no neurons on this layer; it's just an array of floats provided from external sources.

Behind the input layer is the first real layer of neurons.
If we look at just one neuron on this layer, each of its inputs is connected to one of the values in the Input Layer. That is to say, each neuron on this layer has the same number of input connectors as there are values in the layer before it. We say that such a layer of neurons is 'fully-connected' to the previous layer. The outputs of the neurons on this layer are treated as inputs to the next layer of neurons, should further layers exist. Layers whose neurons are connected only to other neurons are called 'hidden layers'.
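
Continuing the sketch, a fully-connected layer can be modelled as a plain vector of neurons, each fed the same input vector (the previous layer's values):

using Layer = std::vector<Neuron>;

// Evaluate one fully-connected layer: every neuron sees the same input vector
// and contributes one value to the layer's output vector.
std::vector<float> EvaluateLayer(const Layer& layer, const std::vector<float>& inputs) {
    std::vector<float> outputs;
    outputs.reserve(layer.size());
    for (const Neuron& n : layer)
        outputs.push_back(NeuronOutput(n, inputs));
    return outputs;
}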

The outputs of the final layer of neurons form the set of outputs for the entire network.
If you've been following, and this topic is new to you, then you should be starting to realize how the network outputs are calculated.

In order to 'run' a neural network, we simply shove floating point values into its inputs and then solve the network layers one at a time: calculate the activation sum for each neuron on the current layer, pass it through the activation function, hand the resulting outputs to the next layer's inputs, solve that layer, and so on, until the final outputs are obtained.
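
In code, the whole feed-forward pass is just a loop over the layers, carrying the current vector of values forward - again only a sketch assembled from the pieces above:

// Feed-forward pass: the input layer is just the initial vector of floats;
// each subsequent layer consumes the previous layer's outputs.
std::vector<float> RunNetwork(const std::vector<Layer>& layers,
                              std::vector<float> values) {
    for (const Layer& layer : layers)
        values = EvaluateLayer(layer, values);
    return values;  // outputs of the final layer of neurons
}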

Making a neural network run is actually not hard at all, and to be honest, training a neural network via traditional back-propagation methods such as gradient descent is also quite easy.
In the next post I will briefly discuss the math and methodology for training neural networks using test data before discussing an alternative which has very different requirements, strengths and weaknesses.

