Neural Networks

Noughts and Crosses (Tic Tac Toe) using Artificial Neural Networks.

If simple feedback could be used to train our version of Donald Michie’s MENACE machine and Evolutionary Algorithms used to produce very acceptable solutions to the Traveling Salesman problem then, we wondered, could a combination of evolutionary techniques and artificial neural networks be applied successfully to the game of noughts and crosses?

Here is an account of some of our early experiments using forward feed neural networks. You will find enough information to assist you in building your own neural networks and you can take advantage of some of our experience to learn how to apply them effectively.

An artificial neural network is not difficult to construct in Visual Basic or most other programming languages. You could probably do a reasonable job in Microsoft Excel. Neurons are fundamentally just data with a mathematical function to help adjust their output value. Neural networks are just collections of such neurons and arrangements of logical connections between them. Click here for a description on the basics of constructing a neural network and for some simple to follow Visual Basic source code for those elements that may not be immediately obvious.

Evolving the Neural Networks

We first constructed 200 neural networks each made up of 9 input neurons and nine output neurons (one input and one output for each playing position on the grid). each neural network also had 10 hidden neurons. Why ten? Probably because it looked good on a diagram – the number and structure of hidden neurons is a bit of a black art and we had to start somewhere. This was a forward only network with each input neuron connected to each hidden neuron and each hidden neuron connected to each output neuron. We thus started with 180 connections between the 28 neurons in each network.

Each network played 100 games against the same automatic player as we used to train the Menace machine. In 50 of the games the neural network played first and in the other 50 the automated player started the game. After each neural network had played 100 games the 100 networks with the lowest scores were culled and the 100 networks with the best scores were evolved to create child neural networks taking the total back up to 200 for the next round of games. Parent and child networks then battled it out to survive and evolve again.

A “round” of 100 games for each of the 200 current neural networks was typically taking around 15 seconds so it was not long to wait to see how the networks evolved.

After 10 rounds with our first neural networks we were quite surprised at the level of play when testing the highest scoring neural network interactively. It was playing already much better than random – so much so that we added some diagnostics to the interactive program screen just to check that the network had not just produced a hierarchy of preferred squares on the grid – we wanted to ensure that things changed in repose to the progress of the game. This turned out to be something of a “false dawn”.

Which neural networks should survive?

Deciding which potential Traveling Salesman solution should survive was easy – we just took the 100 shortest routes after every evolutionary round. Noughts and Crosses can have three results – a win, a loss or a draw. We had to decide how to “score” each potential result.

We started off the evolution of our noughts and crosses playing neural networks by applying similar criteria to the feedback used within the Menace machine. That was 3 points for a win, 1 point for a draw and –1 for losing. The values felt about right – we wanted to reward winning but also to recognise that a draw was an acceptable result against strong opposition. This was an erroneous start point. The “cost” of losing compared to a draw was too small and the benefit of winning was not high enough to compensate. Scoring 3 for a win, 1 for a draw and –10 for a loss produced an evolving population that seemed much more intent upon winning and much less content with losses. The point of this simple game, after all, is not to lose if you can’t force a win. In fact these values continued to change as the experiments progressed but it does show that the feedback has to reflect the intended outcomes. In this game the requirement is “do not lose”.

The success ceiling

Our initial simple neural networks produced a standard of play that worked reasonably hard at attacking but often conspicuously failed to defend. They were winning a reasonable number of games against an automated player but were losing a lot as well – and our automated player was not that good. I would guess that the evolved neural networks (of over 100 generations) would manage just fine competing with my five year old son but fail badly against an average teenager even after 1000+ generations.

Perhaps the structure of the neural network needed some adjustment. We could, of course, change the mutation process so that the networks added or deleted hidden neurons or neuron connections at random. Alternately we could be prescriptive and try to measure the result of different changes. The first thing tried was a direct connection between each input neuron and the corresponding output neuron. This would give the network an opportunity to make a direct allowance for unused playing positions and to weight existing noughts and crosses appropriately.

The connections between the input neurons and the output neurons made only a marginal contribution by itself. Perhaps the problem was much more to do with the fact that the neural network had no way of establishing the topology of the game – it had no way of establishing that each input neuron was part of at least two rows of three cells. The neural network needed some mechanism to allow it to come up with the importance of this relationship. An additional hidden layer was added to the networks. This layer consisted of eight neurons and each of these neurons represented one of the eight winning lines of the game. These new hidden neurons were connected to the relevant input neurons and then to each of the ten neurons in the second hidden layer.

Our final designs added a few more neurons to the “topology” layer to provide input not just on the eight possible rows but on other potential groupings of adjacent cells.

Survival of the fittest

The question you have to ask is “fit for what”? This is where those with a Life Sciences background can be forgiven a smile at our expense. Our neural networks evolved – very effectively adapting themselves to our artificial game playing environment. Unfortunately, that did not necessarily make them good noughts and crosses players. The best networks adapted themselves so that they maximised their scores playing against the particular playing style of our game playing algorithm. Plus the networks were only competing against each other for survival – without a “breakthrough” that did not mean they had to become good players. Our early top players were very boring to play against when we tried them out ourselves – frankly they were not much good at the game either.

We had to go back and adjust our noughts and crosses playing software to introduce a good deal more variety into the game. With a more flexible element introduced into the play the algorithm became a more interesting player and showed a good deal more variety in responding to any given playing position. Indeed the importance of variation in artificial evolutionary environments was re-enforced by the relative successful development of neural networks that just played against each other instead of against an algorithm (see below).

Can evolution work by itself?

The final question we wanted to answer with this experiment with noughts and crosses was: did we need our pre-programmed player to test the evolving networks? A known solution is not available for all problems we might want to tackle with evolving neural networks. So an experiment was set up to see how things would progress just playing one network against another. This had the added potential benefit that if things went well the standard of play might continue to rise through the process. This might prove a more effective way of developing the specific “skills” required to play noughts and crosses effectively.

The decision was taken that for each round of games each network in the population would take it in turn to play as X (that is going first) against all of the other networks. Thus in a given round each network would play 199 games as X and 199 games as O and thus two games against each competing network. Scores would be awarded to each participant in each game. Following each round of games then the usual cull would take place together with the evolution of 100 new networks – each a child of the 100 survivors. This took longer to run that our 100 games against the automated player – partly the extra games but also the additional loading and saving of the neural networks from disk. Perhaps we should have kept more of them loaded in memory?

The results were very good indeed producing noughts and crosses playing neural networks with a more varied playing style than those developed using the algorithm. It was clear that this was a successful strategy for developing neural networks and was a good indicator that such techniques might be applied in situations where we did not have an algorithm capable of meeting a given requirement. A subjective judgment would be that these neural networks were slightly better players than our original networks but they still lacked that “killer” drive to win (or not lose) the game.

The final question

This sounds like the end of our quest to apply Neural Networks to the game of noughts and crosses (tic tac toe) but there was one more leap of understanding required before we were satisfied with the approach.We changed the question.

[ Home] [About Us] [Systems] [Software Review]

Google
  Web www.adit.co.uk