|
Our simulation was built in Visual Basic to explore Donald Michie’s Matchbox Educable Noughts And Crosses Engine. This machine demonstrated the capability of machines to learn and applied feedback to achieve that end. Some may know the game as Tic Tac Toe but the rules are the same and Noughts and Crosses is thought to be the original name in English.
The MENACE machine always played “X” and therefore always took the first turn at the start of each noughts and crosses game. Michie was reported as using 300 matchboxes - one representing each possible non winning board position facing the machine in the first four moves. Michie also had nine (9) different colours of glass bead. Each colour represented one of the nine playing positions in the game. Michie placed 4 beads for each of the 9 colours representing the 9 available cells in the initial move box, 3 beads for each of the 7 available cells in each box representing the playing positions for the machine's 2nd move, 2 beads for each of the 5 available cells for the machine's 3rd moves, 1 bead for each of the 3 available cells for the machine's 4th moves
The MENACE machine then went through a learning process by playing trial games. Michie selected the matchbox that represented the board position facing the machine at each turn in the game. He removed one of the glass beads, selected at random, from the box and placed it on top of the box. The colour of the glass bead indicated the machine’s “choice” of play. If the machine won the game then it was “rewarded” as each box with a bead on top had three beads of the same colour added to the box. If the machine managed a draw then the reward was one bead but if the machine lost then the beads on the top of the boxes were removed from the machine rather than being returned to each box.
When the MENACE “machine” runs in learning mode successful outcomes are reinforced in such a way as to bias the machine to make similar moves to those that were successful when faced with the same board playing positions in the future. It is interesting that such reinforcement is applied to all of the moves - irrespective of their individual contribution to the win or draw. Strategies that fail to result in at least a draw are less likely to be repeated. The overall effect of the feedback results in a competent game playing response to most board situations.
Clearly running a real MENACE machine using matchboxes and glass beads would be pretty tedious. Particularly if one had to take symmetry into account when identifying the correct matchbox to use for each turn in the game. The fact that Michie built and ran such a machine as more than a paper exercise or mind experiment is a strong indication of the lack of available computing resources in the UK (or the rest of the world) at that time. We are not hampered by such a limitation and can build a MENACE machine is software without too much effort.
Calculating the Possible Boards
One of the functions of the simulation is to calculate all of the non-winning boards that might face the first player in the first four moves. The recursive function identifies 52489 such boards and when duplicates are removed this results in a total of 2201 unique boards. The position facing the first player at that player’s fifth move does not have to be calculated as there will only be one empty playing position available. If symmetrical duplicates such as rotations and reflections are removed from the list then we end up with 304 truly unique boards. This is close enough to Michie’s reported 300 matchboxes for me to be content that this is the true number he used.
Running the Training Sessions
Our Menace machine simulation allows you to train the software machine interactively or using an automated noughts and crosses
algorithm that can be run at one of several skill settings. It is clear that the Menace machine learns most quickly when faced with a high standard of play although it will suffer a large number of initial defeats. A more gentle introduction to the game sees a better initial performance but it takes much longer to build a high level of skill.
Our simulation exposed one flaw in the machines design - there was a natural limit to the amount of negative feedback that could be applied. In a run of losses some board playing positions could run out of potential moves as all of the “beads” would have been removed from the box. We had to disable negative feedback to achieve a fully trained Menace machine.
Click here for more on running the simulation.
How well does the MENACE Machine perform?
Very well indeed. A trained Menace Machine produces a
high standard of play. Download the simulation and try it for yourself. We have three zipped download packages available:
|