|
The Minimax strategy
One of the most common methods of programming the automated play of board games (and other similar games) is to apply the Minimax strategy. Often this will be combined with Alpha-Beta pruning but more of that later.
The goal of the minimax strategy is to minimise the maximum damage that an opponent could inflict following a selected game move or move sequence. To apply the minimax strategy you have to come up with a way of evaluation game positions numerically with (say) +1 being a very good position for you and –1 being a good position for your opponent with values in between indicating lesser positions of strength for either player.
Consider how you might evaluate noughts and crosses playing positions. Three crosses in a row (if you were playing X) indicating a win might score +1. Two crosses in a row would be a high but lesser value. A cross in a square commanding three or four potential winning lines would be worth more than a regular position that just sits in two. The same positions for your opponent would have similar but negative score values. It is normal to experiment with different valuations of specific playing positions in order to arrive at the optimal settings. One could, I suppose, experiment with evolutionary techniques in order to optimise a set of values for a given game.
Now let us apply the minimax strategy. First you have to take a potential move and then consider all of your opponents possible replies. You score your own move and all of the possible board positions following that move. Then for each of your possible moves and your opponents replies you evaluate the potential net position.
A simple table of the scores might help here. Suppose in a given game (not noughts and crosses clearly) you have three possible moves and that your opponent has three possible replies to each of those moves.
|
Your potential move
scores
|
+0.5
|
+0.1
|
-0.2
|
|
Opponents potential
response scores
|
-0.3
|
-0.2
|
+0.6
|
+0.1
|
+0.05
|
+0.15
|
-0.8
|
-0.4
|
-0.2
|
| |
|
|
|
|
|
|
|
|
|
Now each move can be evaluated with the intention of minimising the maximum possible trouble that your opponent can cause. The first move looks superficially attractive but your opponent has the potential of reducing your net position to –0.3 (the maximum damage that can be done to you). The second move is less attractive but the worst position you can get to is a net score of +0.05. The third possible move is unattractive from every standpoint as it is always to your opponents benefit.
The minimax strategy re-evaluates the scores for your three possible moves – replacing them with the worst case values from your opponents responses. Thus your three moves are re-scored as –0.3, +0.05 and –0.8. Thus the minimax strategy decides that the second move is the one to take as the maximum possible damage that you opponent can do is to reduce the net value of your position to +0.05. While it is possible that taking the first move could lead to a very strong position valued at +0.6 you take the safe second move.
The minimax strategy is not perfect as it is always risk averse and does not allow for an opponent’s mistakes – it is based on the assumption that an opponent is always going to make the move that does the most damage to your position. However minimax is probably the best available strategy for a wide range of what are called “zero sum” games such as chess, draughts (checkers) and, of course, a game of noughts and crosses.
Evaluating one’s next move is often referred to as a “ply” in game jargon. Checking your next possible moves and your opponents possible responses is “two ply” and so on.
The minimax strategy can be extended to any number of ply – given that you have the processing power and time to do so. Examining all of the ply in a noughts and crosses game is not difficult but even that can take a few seconds. Our software emulation of the Menace machine initialises itself by calculating all of the possible board positions facing X (the first player to make a move) up to the fourth move – the fifth takes care of itself . That is 2201 possible unique playing positions at the start although things get much simpler as the game progresses. Many games have time limits – even if the limit is your own patience so it is not often practical to examine all of the possible ply to the end of a game. In any case the number of possible board positions that have to be evaluated can quickly get out of hand – just as the
Traveling Salesman problem does as the number of cities that need to be visited increases.
IBM’s Deep Blue may have evaluated up to 36 billion possible chess board positions in the three minutes allowed between moves during it’s famous match against Kasperov. So how did the IBM machine maximise the benefit of all of that analysis and how do we maximise the number of ply we can evaluate using the minimax strategy? The answer is Alpha-Beta pruning.
Alpha-Beta Pruning
Alpha-beta pruning is a technique that reduces the amount of work your software has to do to implement the minimax strategy. The idea is to identify quickly, potential moves in the game that need not be evaluated because the minimax strategy would rule them out in any case.
Consider the following two moves and the possible responses to them:
|
Your potential move
scores
|
+0.3
|
≤
-0.1
|
|
Opponents potential
response scores
|
+0.3
|
+0.4
|
-0.1
|
Unknown
|
The minimax strategy has scored your first move as +0.3 (the lower of the values after your opponents possible responses). The first response to the second move is evaluated as –0.1. This means that your second possible move will be scored at –0.1 or lower. There is now no need to evaluate the second possible response to your second move it is irrelevant - it can only change the evaluation of the move to a lower score. This is known as alpha pruning.
Beta pruning is a bit more complex to follow and needs a larger diagram to explain it clearly. You will also have to add something to your understanding of the minimax strategy. If you are evaluating more than two ply you have to assume that your opponent will make moves that minimise your maximum possible score. That is – your opponent will make moves that stop you getting to positions that you would evaluate as high scoring – if he she or it can.
OK now to the diagram. Here your possible moves are numbered 1 and 2 with some of your possible responses to your opponents next move numbered 3 to 8. Your opponents possible moves have letters to identify them.
|