CS221 – Artificial intelligence

Autumn quarter 1999

Stanford University


Programming Assigment #2 : Othello

_____________________

The French Connection
Nicolas Scapel (
nscapel@cs.stanford.edu)
Frédéric Mazzella (
mazzella@cs.stanford.edu)

Outline

A. Strategy
B. Heuristic Function
C. Learning and training
D. Search Algorithm
E. Time Management
F. Sources

A. Strategy

Othello game can be played at very different levels. We can rank the strategies the most common strategies as follow:

A novice strategy is focused only on material, i.e. the difference between RED and WHITE disks.
A more interesting strategy also takes into account special locations on the board: corners, C and X squares, center of the board. This is the strategy used by most common players.
An advanced strategy require features such as mobility and pattern identification.

The strategy used can be described as a sequence:

Opening: The first moves are played quickly, using a low depth search. People also sometimes use precomputed moves, or a move database (open book play) for these very first moves.
Early Midgame: During this period, the program tries to maximize its mobility (which can mean minimizing the number of disks in the early midgame), play in the center zone, and avoid C and X squares.
Midgame: The programm still maximizes its mobility but begins to take corners, since the algorithm avoided to use C and X positions, the opponent usually took them, and it is possible to take the corners now.
Late Game / Endgame: Now the program focuses on material. Depending on the search time, the program can start playing with a full depth search around 12 moves before the end of the game.

B. Heuristic Function

Each heuristic emphasizes some features that turn out to be important in each of the 4 game phases. We thus developed many features on which we put different weights to build different heuristics. The choice of these features was directed by researches on the different Othello strategies we found in many papers, books and web sites [1] [2] [3] [4]. We implemented 5 important features computed by 5 functions in heuristics.c:

  1. ComputeCorners
  2. ComputeC
  3. ComputeX
  4. ComputeMobility
  5. GreedyEvaluate

1. ComputeCorners

ComputeCorners returns the difference between the number of corners owned by WHITE, and those owned by RED. This feature is emphasized in the LATEGAME phase, when it is possible, and crucial to reach the corners. The weight of this feature should always be positive, since it is always good to take a corner, regardless of the phase of the game we are playing. A corner is a disk that can never be flipped, thus contributing to the final score. Actually, it would not be clever to emphasize this feature only in the LATEGAME phase without having features making it possible to reach the corners in the earlier states of the game: this is done by emphasizing the features C position count and X position count in the MIDGAME phase. We also implemented a Corner Killer procedure, see below.

2. ComputeC

ComputeC returns the difference between the C-positions owned by WHITE and those owned by RED. A C-position is a square near the corner and on the edge of the board, when the neighbor corner is empty. It's a bad to own this square, because the opponent can take the corner more easily, so the weight of this feature should always be negative. This feature along with X-positions count is emphasized in the MIDGAME part, when it is possible to reach those positions, and when it is bad to take them, because it would reduce the chances to take the corresponding corners after.

3. ComputeX

ComputeX is identical to ComputeC except that it counts the X positions instead of the C positions. An X-position is a square near a corner and on the main diagonal when the corresponding corner is empty. It is also bad to own such a square, so the weight of this feature should always be negative also.

4. ComputeMobility

ComputeMobility represents the number of possible moves. Once again, the feature gives actually the difference between the mobility of WHITE and the one of RED. This feature is especially emphasized in the EARLYGAME phase, and is still important in the MIDGAME phase, less important in the LATEGAME phase, where the heuristic is essentially aimed at taking the corners.

When the game is launched, the PrecomputeMobility function computes the mobility for each possible combination of the three possible occupations of each square in a line: WHITE, RED or EMPTY [5]. The results are stored in arrays of type int[3][3][3][3][3][3][3][3][3][3], named rowMobility and diagMobility. When we want to compute the mobility of an entire board,we simply sum the mobility of each row, column and diagonal of the board. Doing that way, a same square may be counted a two mobilities for a same color, if there is a mobility in two different directions, but it turned out to be a very good feature to know in the EARLYGAME and MIDGAME phases. Maximizing one's mobility means two things: keeping a lot of different possible moves, thus enlarging the chances of finding very good moves, and reducing the number of ! ! possible moves for the opponent, thus forcing him to sometimes play at very bad locations (e.g. X or C).

5. GreedyEvaluate

GreedyEvaluate returns the difference between WHITE's disks and RED's disks on a board. It's actually a good feature to have in the LATEGAME phase, and the only good feature to have in the ENDGAME phase when the search algorithm actually reaches the end of the game to get its heuristic value. However, it is a very bad feature at the beginning of the game, since it does really not deal with taking good or bad squares for a further evolution. In the ENDGAME phase, we are able to search at depth 11 using only this greedy function, and we remarked that the expected values given by the GreedyEvaluate function were very similar to the actual final result.

Comments and additional procedures

All these features were normalized, having the same scale [-100,100]. This allowed us to make good guesses when initializing the initial weights before the learning phase, thus leading to a satisfying convergence.

When trying our heuristics against other players, we remarked that since the weights were not emphasizing the "take the corner" attitude in the EARLYGAME and MIDGAME phases (we were emphasizing the "don't take X or C positions" instead, we were sometimes loosing corners that we could have taken earlier (i.e. our opponents were on X or C positions, but be weren't taking the corner...). Thus we implemented a Corner Killer procedure, which had the double effect of always taking a corner when it was possible, and saving computation time for the next searches, since the Corner Killer procedure was not thinking at all!

We thought that during the EARLYGAME phase, situations where taking X and C positions would not be frequent, and thus put little weights on these features in the EARLYGAME phase. But when playing in the tournament, we lost some strategic squares because of that. We should have implemented a procedure penalizing these moves in the EARLYGAME phase, as we had implemented Corner Killer to encourage our algorithm to take the corners when it was possible. That's what we learnt, but that's not what our learning procedure had learnt, since the program had never played against such players before the tournament.

C. Learning and Training

We use the following learning rule, referred to as Method 1 in the handout:

At state s (value v(s), features vector f(s)) we perform a search at depth d to find the optimal move. v' is the value we get after the opponent has taken the move we planned in the search tree. This means the program plays against a 1-depth search version of itself. The UpdateWeights function implements the classic learning rule to get v(s) closer to v'.

Achieving the weights convergence in such a learning process is never granted :

we are dealing with a 5-dimensions space, of unknown topology.
we only know the value of a decision at the end of the game, so a "bad" program is only going to reinforce its bad behaviors. The hope is that a "beginner" is going to reinforce its good decisions.

Clearly, this method is prone to get stuck in a local minimum if we start with random weights. Here are two possible approaches to avoid this :

Brute-force approach : we take a large number of samples in this space (n = 100, 1000, 10 000, depending on your network admin...), test them, and train only the best one. This means we take n programs with equally distributed random weights, let them play against each other, and keep the winner. Chances are that the winner has 'fairly good' weights.
Our approach : careful initialization. The idea is to normalize the features and the weights, so that the features all vary in the same range and the weights simply represent the percentage of a feature in the heuristic evaluation. This is done by dynamically rescaling the features depending on their possible values at a point in the game, and by weights normalization in the learning process (see part B). Then, we can directly "translate" our strategy (described in part A) into numbers, and use them as a starting point.

Using this approach, we were able to achieve weights convergence. The program easily defeated melloyello and scissorhands, defeated zizou a few times, depending on the parameters learned but it proved to be weak when confronted to a very different strategy (in the tournament, several mobility-based programs have been completely exterminated in the early game by greedy programs) or confronted to a very agessive strategy (our program doesn't try to control corners in the early game). This proves that a strong and robust Othello program must also be trained against very different programs or human-players.

D. Search Algorithm

The search algorithm is based on alpha-beta pruning. This technique can decrease the branching factor from b to SQRT(b) in the best case, which is when the nodes are sorted in best-first order. This bound cannot be achieved, but we can come closer to it by doing a rough sort of the nodes before expanding them.

We first tried to do a bucket-sort of the child nodes based on their heuristic value. The number of expanded nodes drops significantly but the search is then really expensive…the search takes twice more time !

We decided not to do a full search, but just to expand in the first place the best node according to our heuristic value. This seemed to be almost as efficient in terms of pruning, and not too expensive.

E. Time Management

Time repartition

Time management is a big issue, and is treated by assigning a certain fraction of the total time to each phase of the game. We did the repartition of time as follows:

Number of remaining moves:

#define EARLYGAME 100
#define MIDGAME 75
#define LATEGAME 25
#define ENDGAME 12

Fraction of time allocated to each phase:

#define TIMEEARLYGAME 0.20
#define TIMEMIDGAME 0.45
#define TIMELATEGAME 0.25
#define TIMEENDGAME 0.10

Standard phase decisions

The two first moves are played with a search at depth 5. Searching at depth 5 at this state of the game turned out to be almost "free" in terms of computation time on the pup cluster machines, and we did not want to spend time on useless computations, since the first two moves are never significant for the final issue. We compute in each phase an average authorized time to spend to compute a move. This automatically builds a schedule for the 100 moves to do. The default search is depth 7 for the first three phases of the game (using alpha-beta pruning and sorting, see the Search algorithm part). We use the msg.time_left variable sent by the server to know how well we are doing. If we are late on the schedule, then we decide to search only at depth 6, and it seemed to be well tuned on the pup cluster machines, since one search at depth 6 was always sufficient to catch up in time and do a search at depth 7 the followin! ! g move. The time repartition among the phases seemed to be reasonable, because in most parties, we had only one or two switches from depth 7 to depth 6. TIMEENDGAME is only 10% because even if we are doing a search at depth 11, theses searches are only for the first moves in this phase and even at this point, they are not too big, because as we approach the end of the game, the branching factor drastically drops and reaches 1 for the last move.

Emergency decisions

We implemented emergency procedures, in case our standard phase decisions would not be sufficient to catch up in time. These emergency decisions would reduce the depth of the search if we would reach a certain lower bound in terms of remaining time. These emergency procedures were never used though, neither in the training steps nor during the tournament. We usually finished with 20% of the time left.

F. Sources

  1. Othello Strategy Guide, VOG Othello Club, http://www.vog.ru/othello/
  2. Othello Strategy and Tactics, (articles authored by Johathan Cerf, David Shaman, (both have been world champion), Karsten Feldborg, (1994 world championship finalist), and other Othello masters.
  3. The guide to the Game of Othello, http://www.armory.com/~iioa/
  4. Thomas Wolf. The Anatomy of a Game Program, http://lglwww.epfl.ch/~wolf/java/html/othello_desc.html
  5. M. Buro. A strong Learning Othello Program, NEC Research Institute, Princeton, NJ

Wednesday, November 24, 1999