Sunday, September 18, 2016

Supervised Machine Learning: BackPropagation by Gradient Descent


Neural networks are really interesting to me because it can be proven that they can essentially solve any problem that can be expressed in terms of numbers: for any such problem, there exists a neural network which can approximate the solution to an arbitrary degree of accuracy (a result known as the universal approximation theorem).

In my opinion, there are three kinds of problems - those which we can easily define, those which contain some unknowns, and those which we know nothing about and cannot clearly define.
Neural networks can solve them all!

Before we get too excited, it's time to learn about traditional BackPropagation learning for Feed-Forward Artificial Neural Networks. This is a "training method" that works for problems in the first category: if we know what we're trying to do, and have a means to quickly and cleanly evaluate whether a given network output is good or bad, and how bad, then we're in business. We can use this training method for things like pattern recognition (Optical Character Recognition, Voice Recognition, Image Recognition, Mouse Gestures, and so on), or just telling an elevator door when it should open and close. There are millions of real-world situations where a neural network could be BackProp-trained to perform some specific task. It should be noted that BackProp training tends to produce networks that are experts at specific tasks assuming specific inputs; they do not tend to generalize very well, although they do cope well with very noisy input data.

This technique can only work if we have access to some "training data", and a function which can give us a "fitness score" that tells us, for a given set of environmental inputs (applied to our neural network), how close the outputs of the neural network are to our expectations.

The training data consists of many individual test cases.
Each test case contains one set of network input data, and a matching set of expected output data.
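In code, that structure might look something like this (a minimal C++ sketch - the type names are my own, not from any particular library):

#include <vector>

// One test case: a set of network inputs paired with the outputs
// we expect the network to produce for them.
struct TrainingPair {
    std::vector<float> inputs;   // one value per network input
    std::vector<float> targets;  // one value per network output
};

// The full training set is just a list of such pairs.
typedef std::vector<TrainingPair> TrainingSet;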

One by one, we "show" the neural network our test input data, and measure the error term at each network output (I'll define the error term shortly). After this step, we have an array of error terms, one for each neuron in the output layer (since each network output is driven by exactly one neuron in the last layer of the network).
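For readers who want a preview, here's the error term commonly used with a sigmoid activation function (a minimal sketch - the function name is my own):

// Error term (delta) for one output neuron, assuming a sigmoid
// activation: the (target - output) difference, scaled by the
// derivative of the sigmoid, which is output * (1 - output).
float OutputErrorTerm(float output, float target) {
    return output * (1.0f - output) * (target - output);
}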


Before I get into the math, it might be helpful to some readers to learn this stuff in the same way that I did (since I am rather dyslexic, and can't easily grasp things like Greek Notation).

In order to perform traditional backpropagation training, we basically need to perform two passes over our network: one to measure error terms, and another to adjust the weights. There are a few ways to do that; the most popular are variations on something called Gradient Descent. Basically, it works by determining how much each input is to blame for the error term at a neuron's output, while accounting for weight, and it apportions an adjustment to each input weight in order to tend the output toward the ideal (represented by the training data). Typically, the correction term is very small, so as not to greatly disturb the network, lest it become unreliable for previously trained inputs.

During the first pass, we'll go backwards through the layers, calculating the error term for each neuron based partly on the error terms we already calculated for the layer nearer the output.
Once we have an error term for every neuron in the entire network, we'll perform a second pass; this time we're going to actually adjust the weights themselves, based on those error terms.
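As a sketch of what that second pass does to a single neuron's weights - the standard gradient-descent nudge (the names and the example learning rate are my own assumptions):

#include <vector>

// Second pass, for one neuron: nudge each input weight toward the
// ideal. delta is this neuron's error term from the first pass;
// inputs are the values it saw on the forward pass. The bias weight
// is updated the same way, with a constant input of 1.
void AdjustWeights(std::vector<float>& weights,
                   const std::vector<float>& inputs,
                   float delta, float learningRate /* e.g. 0.1f */) {
    for (size_t i = 0; i < inputs.size(); ++i)
        weights[i] += learningRate * delta * inputs[i];
}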

Oh - and when I do get into the math, I will be brushing over a few things that are not relevant unless you happen to love math, in which case you can investigate them on your own time :)


Monday, August 29, 2016

Basic Mathematics required for Feed-Forward Artificial Neural Networks

As a child I was always pulling things apart to "see how they worked".

The best way to understand how a Neural Network produces output signals from its inputs is to take a look at a single Neuron in isolation, and see how that operates.

The neuron has a single output, whose value we seek to calculate, and it also has several inputs.
For now, we'll disregard what the inputs and outputs are connected to; suffice it to know that these are represented by simple floating-point values (ideally in the Normal range, from 0.0 to 1.0).
Associated with each input signal is a Weight (also a Normal float), which is analogous to electrical resistance per input channel. Importantly, the Input Weights are internally held by the neuron, while the input values are external values presented to the neuron's input channels.
Neurons also have one additional input connector known as the "Bias" input, which is typically connected directly to a constant input value known as the Bias Threshold.

The most important calculation that we need to know is called the Transfer Function, which yields a value known as the Activation Sum for a given neuron. Given the set of input weight values for a neuron, and a set of input signal values, we simply perform a per-component multiplication and then add the resulting values - basically a Dot Product operation for N dimensions, where N is the number of inputs on the neuron. Finally, we account for bias.
So, for weights W and inputs I, the sum is W[0] * I[0] + W[1] * I[1] + W[2] * I[2] + ... + Bias
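Or, as a minimal C++ sketch (the function name is mine):

#include <vector>

// The Transfer Function: dot the input signals with the input
// weights, then account for bias.
float ActivationSum(const std::vector<float>& weights,
                    const std::vector<float>& inputs, float bias) {
    float sum = bias;
    for (size_t i = 0; i < inputs.size(); ++i)
        sum += weights[i] * inputs[i];
    return sum;
}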

Next we apply what is known as an Activation Function, and there are a few choices.
I won't bother going into all of them because they produce similar results. The purpose of such a function is to squish our Activation Sum back into an appropriate numerical range, rather like a Normalize or other scaling operation. The one I prefer right now is called Sigmoid, after the shape of the curve it produces when plotted on a graph. Here's the function for reference:

#include <math.h>
static const float NeuralResponseThreshold = 1.0f; // curve steepness

float Sigmoid(float netinput) {
    return 1.0f / (1.0f + expf(-netinput / NeuralResponseThreshold));
}

where the "Response Threshold" constant controls how sharply a neuron "turns on" as its activation sum rises (a value near 1.0 is typical) - if you have a background in electronics, compare this to CMOS and TTL logic switching thresholds.

Once we've shoved our Activation Sum through that function, we have the output value for the neuron. Typically in our software models, each neuron has only one such output, and due to the Activation Function, its value is typically in the Normal range, just like the input values and weights.

Neural networks arrange their neurons in layers.
The first layer in a neural network is the input layer.
This is the network's interface with the external environment - we plug in external values here, something I'll talk about later.
There are no neurons on this layer; it's just an array of floats provided from external sources.

Behind the input layer is the first real layer of neurons.
If we look at just one neuron on this layer, each of its inputs is connected to one of the values in the Input Layer. That is to say, each neuron on this layer has the same number of input connectors as there are values in the layer before it. We can say that the layer of neurons is 'fully-connected' to the previous layer. The outputs of the neurons on this layer are treated as inputs to the next layer of neurons, should further layers exist. Layers whose neurons are connected only to other neurons are called 'hidden layers'.

The outputs of the final layer of neurons form the set of outputs for the entire network.
If you've been following, and this topic is new to you, then you should be starting to realize how the network outputs are calculated.

In order to 'run' a neural network, we simply shove floating point values into its inputs, and then solve the network layers one at a time, calculating all the activation sums for the neurons on the current layer, then handing their outputs to the next layer's inputs, solving that layer and so on, until the outputs are obtained.
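Here's a minimal sketch of that whole process, reusing the Sigmoid and ActivationSum functions from above (the Neuron and Layer types are my own sketch types, not from any library):

#include <vector>

// Sketch types: each neuron holds its own input weights and bias.
struct Neuron { std::vector<float> weights; float bias; };
typedef std::vector<Neuron> Layer;

// Solve the network one layer at a time: this layer's outputs
// become the next layer's inputs, until we run out of layers.
std::vector<float> RunNetwork(const std::vector<Layer>& layers,
                              std::vector<float> signals) {
    for (const Layer& layer : layers) {
        std::vector<float> next;
        for (const Neuron& n : layer)
            next.push_back(Sigmoid(ActivationSum(n.weights, signals, n.bias)));
        signals = next; // hand this layer's outputs to the next layer
    }
    return signals;     // the outputs of the final layer
}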

Making a neural network run is actually not hard at all, and to be honest, training a neural network via traditional back-propagation methods such as gradient descent is also quite easy.
In the next post I will briefly discuss the math and methodology for training neural networks using test data before discussing an alternative which has very different requirements, strengths and weaknesses.


Wednesday, August 10, 2016

A few words about Fuzzy Logic with respect to Artificially Intelligent Washing Machines


So, if you've been following along, you'll be expecting to see some math required for any feed-forward neural-network artificial-intelligence system. Try saying that with a mouth full of stuff!

You're aware that Neural Networks are generally designed around the concept of Neurons, which I compared to a logic gate. My friend Bryant jumped on this point, probably because I elsewhere used the term 'binary', which perhaps implied that Neural systems are binary in nature. They too are logic systems that have inputs and outputs, and 'pure' binary systems (like computers) also have inputs and outputs - but neural networks are not based on binary logic: all the values inside a neural network are signed floating-point values, as are the input and output values.
Neural logic systems, even the simple Neuron, which I compared to a Logic Gate, have more in common with Fuzzy Logic Systems than with binary, or pure digital logic systems.
For example, while a binary logic gate has two inputs and one output, a fuzzy logic gate has at LEAST three inputs and one output - one of the inputs is called 'bias', funnily enough, which should bring us all back to the ground plane with respect to what is meant by 'neural logic' and 'neurological systems' (terms that will be appearing in the whitepaper that I am writing concurrently with this blog and the project code itself).

In a fuzzy logic system, there is no such thing as a 'clean signal' such as 'on' or 'off' - there is no binary concept; there are only floating-point values, signed or not. Effectively, devices can be not only ON or OFF, but ON BY SOME DEGREE, per signal.
These signals are not binary - they are just numbers, floating-point numbers; in theory they are infinitely variable and infinitely precise. Their values are not concrete, they are 'fuzzy values'. And just when you thought this sounds like some crazy thing I made up, or some weird kind of science, go and take a look at your washing machine, because it's almost certainly using this stuff - and whoever does the washing at your house probably knows it.

Your washing machine is never simply ON or OFF: it is either OFF, or it is ON BY SOME DEGREE - and even then, the MOTOR of your washing machine is not running at just one speed, nor in just one direction. Fuzzy Systems, which I studied at RMIT along with other numerical systems, are good at two things that binary systems are not designed for: handling transitional states, and measuring the difference between given and ideal states. Hardware devices that I studied, and which bear on this work, include the Programmable Logic Controller, which I believe I mentioned previously, as well as the Comparator, which is the heart of any Computerized Numerical Control system, or Robot. I know my automatons and what makes them tick, and I know that I can use this stuff to make better video games and challenge the stale status quo in this industry. There are very few who are inclined to risk their reputation in this way - thank you, Peter Molyneux, for showing me that the majority are just following each other.

Neural Networks vs Genetic Algorithms: Part 2 - Different ways to train Neural AI


I'm currently aware of exactly two approaches toward the training of Neural Networks.
They are known by several names, such as Supervised versus Unsupervised training, but as far as I can tell, there are just two methods being implemented today.

The first training method I want to talk about is the traditional 'BackPropagation' technique, also known as 'Supervised training'.
This training method has some requirements in order to work effectively, and the most important requirement is Human Supervision - the Supervised training of neural networks invariably involves the engagement of a human trainer, or trainers, whose responsibility is to confirm that the network is producing appropriate output for any given set of inputs.
The second major requirement for Supervised training is a comprehensive set of training data, comprised of a big list of test input data and associated 'desirable output' data.
The idea is to show the Neural Network one set of input data at a time, examine the outputs produced by the network, and compare them to known-good output data. If the outputs match the desired pattern, then training is complete for that specific input pattern. If not, the difference between the current output and the desired output can be measured, and the resulting 'Error Term' can be fed backwards through the network by determining, at each layer, how much each Neuron contributed to the Error Term, and adjusting the Neuron weights such that they tend toward the desired output. Such corrections are introduced 'gently', so many training iterations are required in order to achieve the correct output results. This is deliberate: modifying the Neuron Weights too sharply means the network will rapidly learn to produce correct output for the current training inputs, but will begin to produce bad output for input patterns it was previously trained on - and that's definitely counter-productive.
Anyway, the math involved in back-propagation is just a transposition of the same basic formula that is used to calculate the outputs from the inputs in the first place - if you can grasp the math required to calculate the Activation Sum for a Neuron, then you already have what you need in order to 'correct' the error terms at each Neuron; you just need to express the formula slightly differently.
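As a hedged sketch of that 'slightly different expression' for a hidden neuron (assuming a sigmoid activation; the names are mine): the neuron's error term is the derivative of its own output, times the weighted sum of the error terms of the neurons it feeds in the next layer:

#include <vector>

// Error term for a hidden neuron. weightToNext[j] is the weight by
// which this neuron's output enters next-layer neuron j, and
// nextDeltas[j] is that neuron's already-computed error term.
float HiddenErrorTerm(float output,
                      const std::vector<float>& nextDeltas,
                      const std::vector<float>& weightToNext) {
    float sum = 0.0f;
    for (size_t j = 0; j < nextDeltas.size(); ++j)
        sum += nextDeltas[j] * weightToNext[j];
    return output * (1.0f - output) * sum; // sigmoid derivative * blame
}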

The second training method that I've encountered, and the one that I am more interested in for the purposes of game development and the simulation of artificial life forms, has a completely different basis from the traditional training method discussed above. It's not based on measuring and correcting error terms; instead it's based on chaos theory - or, more precisely, a natural phenomenon called 'emergence'. In my next post, I'll first present the mathematics involved in both forward and backward propagation of neural signals, and follow that up with a discussion of the 'nature of chaos' and the very nature of nature itself - I will define 'emergence', talk about the properties of closed systems, and show how all of that can be applied to produce an Unsupervised machine-learning system.


Wednesday, August 3, 2016

Neural Networks vs Genetic Algorithms: Part 1 - Introduction to FeedForward ANN


For the purposes of this blog, I am primarily interested in discussing two artificial models of biological phenomena, namely Artificial Neural Networks and Genetic Algorithms.
Neither of these topics is trivial; however, neither is really as complex as it appears.
Although unrelated, they can be used together to create something more interesting than the sum of its components - let's start with Neural Networks.

They're a model of what we think is happening inside the brains of those living entities of high enough order to actually have a brain, which of course includes humans, but also includes the lowliest invertebrates and even some creatures whose brains are 'unconventional'.
According to the theoretical model, the brain is constructed of 'Neurons', which are effectively simple biological logic gates that, when sufficiently interconnected, are capable of performing more complex calculations and more abstract tasks such as pattern recognition (image recognition, face tracking, optical character recognition, gesture recognition, voice command recognition, voice-to-text translation - the list really does go on and on, and I sometimes wonder what problems exist where they cannot be applied).

A theoretical neuron looks a bit like a squid or jellyfish: typically it has lots of 'inputs' that can be visualized as tentacles, with which the neuron receives external stimuli, and it has a singular 'head' producing one output. Each input has its own 'weight', which could be visualized as 'electrical resistance per input tentacle'. Finally, each neuron has one additional input, known as Bias, which is not set up to sense the environment like the other inputs, but instead takes a fixed, typically constant value - like the other inputs it has a Weight (or resistance), but unlike the others, it is not connected to anything else.

There's a simple mathematical formula that we can apply to any neuron which results in a value known as the 'activation sum', and a related function known as a 'Sigmoid Operator' whose purpose is to squash Activation Sums back into the Normal number range. Simply put, based on the neuron's inputs and weights, the sigmoid-transformed activation sum indicates how strongly that neuron 'fires' - producing a graded, rather than strictly binary, output at its head.
Neurons are typically connected together in layers: the output of each neuron in layer N is connected to one of the inputs owned by EACH neuron in layer N+1 - the network is Directed and Highly-Connected. Therefore, the output of each neuron in a layer has a varying effect on the neurons in all subsequent layers. Of course, real biological brains don't quite operate this way - but they are similar enough to the theoretical model that this model can and does work very well. If it's not yet clear how to make any use of this stuff, please be patient: each post will contribute to a greater understanding of how game developers can take advantage of this technology, and how it's relevant to modern game developers targeting today's consumer devices.

Neurons do not merely resemble logic gates - one neuron can be used to implement any simple logic gate - the kind of logic gates that make your computer work!
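As a tiny illustration (the weights and bias here are chosen by hand, not learned), a single neuron can be wired up as an AND gate:

#include <math.h>

// One neuron hand-wired as an AND gate. The sum a + b - 1.5 is
// positive only when both inputs are 1, so the sigmoid output lands
// above 0.5 for (1,1) and below 0.5 for every other input pair.
float NeuronAND(float a, float b) {
    float sum = 1.0f * a + 1.0f * b - 1.5f; // weights 1, 1; bias -1.5
    return 1.0f / (1.0f + expf(-sum));      // read > 0.5 as "on"
}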
You see, both biological neurons and their artificial counterparts can be 'trained' - and this is where it gets interesting for game developers and other kinds of engineers. For any input signal pattern, we can teach a neuron to produce a known output - and this holds true not just for simple neurons in isolation, but also for entire networks of them. Given pattern X at its inputs, we can teach a neural network to produce pattern Y at its outputs - and given pattern Y, it can produce pattern Z. You can think of a neural network as a black box that can LEARN to map any given input to any given output - rather like a reprogrammable PLC (programmable logic controller) that you can find in some industrial electronics, it can do something that conventional electronic hardware (like your computer) cannot do at the hardware level: its logic can be reprogrammed to produce ideal logical outcomes for a set of given input scenarios. The number of i/o mappings supported by a neural network is related directly to the complexity of the network (number of neurons, connectivity, etc.) and inversely to the accuracy of the output. That is to say, the fewer mappings we teach a network, the more reliably it can produce the outputs we taught it, even when the inputs are 'dirty'; the more different outcomes we attempt to map (onto a network of constant complexity), the less reliable the network's output becomes (since previously-learned mappings can be corrupted), even when the input is nice and clean. The only two ways to 'repair' such a scenario are to spend more time and iterations training the network to increase its reliability, or to increase the network's complexity (by adding more neurons and connections) - which again requires more training, and more time, to achieve a high degree of confidence in the accuracy of the output for arbitrary inputs.

The way that neural networks receive 'training' can vary. The typical 'supervised training' applied to consumer software based on this sort of tech today (voice recognition, for example) involves showing a set of inputs to the network, looking at the outputs it produced, comparing them to a 'desirable' set of outputs, measuring the error term, and then propagating the error term 'backwards' through the network (from the output layer back towards the input layer), which results in adjustments to the 'input weights' at each neuron in the network. This takes a lot of iterations (developer time), and it does achieve good results in terms of producing a 'smarter network'. However, there are other training methods based on self-learning which may be more applicable to games - for example, allowing the AI to learn directly from player inputs during gameplay, a technique that some car racing games have used to teach braindead AI drivers how to race. But that's not the technique that intrigues me the most, because it still relies on humans to help improve the outcome.
My next post or two will look at a machine learning technique that does not rely on human intervention of any kind - one that is capable of self-improvement with or without human players being involved.



Tuesday, August 2, 2016

Genetic Learning for Artificial Intelligence makes for better games

Welcome visitor, to the deep recesses of my cranium.
This blog will supplement work done during the final semester of my bachelor degree (Games and Virtual Worlds); in particular, it will describe both the background knowledge and the implementation of a GPU-based neural learning system which allows your game agents to adaptively learn how best to defeat each other, as well as real players - reactive AI which brings the game to you, and which can adapt to your strategies in real time.

The system described can work for basically any game, and a lot of other well-defined problem domains, and does not rely on any expert knowledge of the game or other simulation environment.
This means that it does not matter if your game features spell-casting mages and arrow-shooting elves, or whether you are simulating crowd panic behavior in the case of some emergency scenario, or any other simulation involving intelligent automatons.

Notably, we are not working with State Machine technology - neither FSMs nor DSMs are relevant to this work. There is no talk of behavior trees (or even defined behaviors), there are no conditions tested, and there is no code or script implementing any logic that results in predefined behavior.

In nature, there are two primary mechanisms which contribute to the process we know as the evolution of species. The first one, which many are familiar with, is called Natural Selection.
Even the religious people who reject evolutionary theory (creationists) can agree that Natural Selection does occur: it simply means that living entities which are 'unfit' for any reason are culled by natural processes, resulting in a healthier population. Whether this is caused by God, is a causal effect, or is something else entirely, is a subject for another blog. What interests me is that we can create software models of life, based on our theoretical understanding, and those models are able to produce conditions beyond their design which share many similarities with those we so often discover when studying any population of living entities.

The second mechanism, which is less celebrated but nonetheless well known both to the educated and to the ignorant, covers Genetic Recombination and Mutation. Recombination we can think of as the SHUFFLING of the set of cards that the universe hands an individual Living Entity, in terms of which of its genes are actually expressed versus which lie dormant. Given two parent entities, their child usually resembles them both in some ways - but due to this genetic shuffle, things can change very quickly in one generation. Existing genes - even those dormant in the parent creatures - are the most likely to appear. Our creatures have a Genotype: well-defined characteristics that tend to survive across many generations, and which can be used to identify their Species. But they also have a Phenotype, which describes how our creatures actually look and, to a lesser extent, behave - how the active genes are expressed is a result of the environment that the creature finds itself trying to exist within. It still depends on those genes being present, but not dominantly so.
While natural selection is a Filter mechanism which tends to reduce genetic diversity within a population of Living Entities, genetic mutation is a mechanism by which the genetic potential of a population may be improved and increased - more than just a reshuffling of the hand you have been dealt.
It can involve environmental factors which actually modify the genetic code, or DNA, of a Living Entity during its lifetime, such that its offspring are genetically equipped to cope with conditions that might not be endured by the offspring of its peers, who lack such genetic diversity.
Triggers for inducing genetic mutation include radiation sources, dietary factors, and environmental factors such as prevailing weather conditions and other ambient conditions like temperature and humidity; in fact, the number of things that can cause genetic mutation is huge, and it is the reason that life still exists on this planet - of that I am convinced.
It is worth mentioning that Genetic Mutation can result in positive, negative, and neutral outcomes: negative phenotypes tend to be culled through Natural Selection and similar processes (such as the ostracization and ejection of many such individuals from their population due to their alarming differences), neutral phenotypes represent genes whose expression may occur infrequently or skip some generations, and positive phenotypes represent 'winners' within any given generation - the fittest of the species, with respect to the genetic average.
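To make the two mechanisms concrete, here's a toy C++ sketch (the names and rates are my own assumptions, not from any particular GA library), assuming both parent genomes have the same length:

#include <cstdlib>
#include <vector>

// Recombination shuffles genes drawn from two parents; a rare
// mutation perturbs a gene, injecting diversity neither parent had.
std::vector<float> Breed(const std::vector<float>& mum,
                         const std::vector<float>& dad,
                         float mutationRate /* e.g. 0.01f */) {
    std::vector<float> child(mum.size());
    for (size_t i = 0; i < child.size(); ++i) {
        child[i] = (rand() % 2) ? mum[i] : dad[i];       // recombination
        if ((float)rand() / RAND_MAX < mutationRate)     // rare mutation
            child[i] += (float)rand() / RAND_MAX - 0.5f; // small nudge
    }
    return child;
}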

The study of biology, with respect to genetics, touches on such mathematical topics as Probability and Statistics. Effectively, each Living Entity is a pool of genetic potential, and each Gene Pool is subject to variable conditions - Natural Selection (and a third genetic mechanism: husbandry, or artificial selection) and Genetic Mutation are the main influencing mechanisms, but there are others.