Creating a Reversi-Playing Agent Using a Combination of Minimax and a Value Array
1. Introduction
Reversi (also marketed under the names “Othello” and “Iago”) is a two-player game played on an 8 by 8 board with pieces colored white on one side and black on the other. Each player takes turns adding one piece to the board in a location that converts at least one existing piece from the opponent's color to his color. The game ends when neither player can legally add a piece to the board (usually when the board is filled). The winner of the game is the player with the most pieces of his color at the end of the game.
In Reversi, the pieces on the board do not “move” as they do in Checkers or Chess, nor are they removed. Once a piece is placed on the board it can only be flipped from one player's color to the other. A player can play at any location on the board so long as the following is true. 1) The location is currently empty. 2) There is a piece on the board that is the opponent's color and is adjacent to the desired location. 3) A straight line can be made (in at least one of eight directions) from the desired location of the new piece to another piece on the board of that player's color that contains only pieces in the opponents color.
Some basic strategies arise from this. The corner locations are highly desirable since once the corner is claimed, it can never be flipped. A corollary to this is that the locations adjacent to the corner are least desirable since this may enable the opponent to claim the corner later in the game.
These strategies are the basis for the creation of a Reversi-playing agent based on the utility of board locations. This paper studies the creation of such a Reversi-playing agent.
2.Proposed Method
The seed for the Reversi agent is a purely random agent. The agent simply chooses a random position to play from all of the available positions.
2.1Value Array
To create a location value-based agent a grid of weights will be created with all locations set initially to zero. All of the plays that each agent makes during a game are recorded until the end of the game. At the end of the game the locations played by the winning agent will have their values increased by one while the locations played by the losing agents will have their values decreased by one. This is repeated until reasonably stable values are acquired (100,000 times). The values are normalized to the range [0..1].
The hypothesis is that array locations with a higher value are associated with winning games. This would imply that an agent that chooses to play at locations with the higher values will be more successful than one that doesn’t. Therefore, the second agent will choose locations simply based on the predetermined value of that location.
2.2Minimax Algorithm
The goal of Reversi is to have more pieces on the board at the end of the game than your opponent. The minimax algorithm will be used by the agent during the endgame phase. The endgame phase is defined as the portion of the game when less than ten empty spaces remain on the board. Ten was chosen as the delimiter since the minimax algorithm usually makes a determination in a reasonable amount of time (less than three seconds) when the search depth is ten or less.
3.Experimental results
3.1Random vs. Random
Generally speaking, the player that is last to play has a slight advantage. This is usually the black player since white plays first. The games of Random vs. Random support this as shown in figure 1. The random agent playing white won 46% of the games played. (Games ending in a tie are discarded.)

Figure 1.
The values created by playing random agent against random agent are recorded in an array. A Greyscale representation of the array is shown in figure 2. It should be noted that, as hypothesized, the corner locations have the highest values (white) while the adjacent locations to the corners have the lowest (black). The value-based agent that uses this array will be called Value Array Agent Iteration 1.

Figure 2. A Greyscale representation of the first value array. White is highest value, black is lowest.
3.2Value Array Agent Iteration 1 vs. Random
Value Array Agent Iteration 1 was played against the random agent. The value-based agent (playing white) won 76% of the games. The distribution is given in figure 3.

Figure 3. Value Array Agent Iteration 1 versus Random
3.3Value Array Agent Iteration 2 vs. Random
While Value Array Agent Iteration 1 was playing against the random agent a new value array was generated. The value-based agent using these new values was given the name Value Array Agent Iteration 2. Value Array Agent Iteration 2 was played against the random agent with better results shown in figure 4. Value Array Agent Iteration 2 won 83% of the games.

Figure 4. Value Array Agent Iteration 2 versus Random Agent
3.4Value Array Iteration 1 Plus Maximin
The Maximin algorithm was appended to Value Array Agent Iteration 1. This yielded a significant improvement to both Value Array Agent Iteration 1 and Value Array Agent Iteration 2. This agent won 91% of the games against the random agent. The result distribution is shown in figure 5.

Figure 5. Weighted Agent 1 Plus Maximin Agent versus Random Agent
The Maximin algorithm was also combined with Value Array Agent Iteration 2, but the results were not statistically different.
Also, another iteration of the value array was created when playing Value Array Agent Iteration 1 plus Maximin against the random agent. That array was combined with Maximin to create an agent, but it, too, did not score noticeably better when compared to Value Array Agent Iteration 1 plus Maximin.
4.Conclusions
Value Array Agent Iteration 1 was retained to be used in the final product due to concerns of overfitting.
Similar to other examples throughout the field of artificial intelligence, it is surprising that very simple agents produce very useful results. This supports the principle of 90/10: 90 percent of the value can be achieved with 10% of the effort. (The remaining 10% of the value requires about ten times as much effort.)
Addendum:
Improvements were made that increased the strength of the Reversi agent since this was initially distributed. Most notably, the minimax algorithm previously did not properly consider cases where a play resulted in a board state unplayable by the other player. This has been fixed.