|
| |
CS221
Artificial intelligence
Autumn
quarter 1999
Stanford
University
Programming Assigment #2 : Othello
_____________________
The French
Connection
Nicolas Scapel (nscapel@cs.stanford.edu)
Frédéric Mazzella (mazzella@cs.stanford.edu)
Outline
 | A. Strategy |
 | B. Heuristic Function |
 | C. Learning and training
|
 | D. Search Algorithm |
 | E. Time Management |
 | F. Sources |
A. Strategy
Othello game can be played at very
different levels. We can rank the strategies the most
common strategies as follow:
 | A novice strategy is focused
only on material, i.e. the difference between RED
and WHITE disks. |
 | A more interesting strategy
also takes into account special locations on the
board: corners, C and X squares, center of the
board. This is the strategy used by most common
players. |
 | An advanced strategy require
features such as mobility and pattern
identification. |
The strategy used can be described
as a sequence:
 | Opening: The first moves are
played quickly, using a low depth search. People
also sometimes use precomputed moves, or a move
database (open book play) for these very first
moves. |
 | Early Midgame: During this
period, the program tries to maximize its
mobility (which can mean minimizing the number of
disks in the early midgame), play in the center
zone, and avoid C and X squares. |
 | Midgame: The programm still
maximizes its mobility but begins to take
corners, since the algorithm avoided to use C and
X positions, the opponent usually took them, and
it is possible to take the corners now. |
 | Late Game / Endgame: Now the
program focuses on material. Depending on the
search time, the program can start playing with a
full depth search around 12 moves before the end
of the game. |
B. Heuristic
Function
Each heuristic emphasizes some features that turn out
to be important in each of the 4 game phases. We thus
developed many features on which we put different weights
to build different heuristics. The choice of these
features was directed by researches on the different
Othello strategies we found in many papers, books and web
sites [1] [2] [3] [4]. We implemented 5 important
features computed by 5 functions in heuristics.c:
- ComputeCorners
- ComputeC
- ComputeX
- ComputeMobility
- GreedyEvaluate
1.
ComputeCorners
ComputeCorners
returns the difference between the number of corners
owned by WHITE, and those owned by RED. This feature is
emphasized in the LATEGAME phase, when it is possible,
and crucial to reach the corners. The weight of this
feature should always be positive, since it is always
good to take a corner, regardless of the phase of the
game we are playing. A corner is a disk that can never be
flipped, thus contributing to the final score. Actually,
it would not be clever to emphasize this feature only in
the LATEGAME phase without having features making it
possible to reach the corners in the earlier states of
the game: this is done by emphasizing the features C
position count and X position count in the MIDGAME phase.
We also implemented a Corner
Killer procedure, see below.
2. ComputeC
ComputeC
returns the difference between the C-positions owned by
WHITE and those owned by RED. A C-position is a square
near the corner and on the edge of the board, when the
neighbor corner is empty. It's a bad to own this square,
because the opponent can take the corner more easily, so
the weight of this feature should always be negative.
This feature along with X-positions count is emphasized
in the MIDGAME part, when it is possible to reach those
positions, and when it is bad to take them, because it
would reduce the chances to take the corresponding
corners after.
3. ComputeX
ComputeX is
identical to ComputeC except
that it counts the X positions instead of the C
positions. An X-position is a square near a corner and on
the main diagonal when the corresponding corner is empty.
It is also bad to own such a square, so the weight of
this feature should always be negative also.
4.
ComputeMobility
ComputeMobility represents
the number of possible moves. Once again, the feature
gives actually the difference between the mobility of
WHITE and the one of RED. This feature is especially
emphasized in the EARLYGAME phase, and is still important
in the MIDGAME phase, less important in the LATEGAME
phase, where the heuristic is essentially aimed at taking
the corners.
When the game is launched, the PrecomputeMobility
function computes the mobility for each possible
combination of the three possible occupations of each
square in a line: WHITE, RED or EMPTY [5]. The results
are stored in arrays of type
int[3][3][3][3][3][3][3][3][3][3], named rowMobility and diagMobility. When we
want to compute the mobility of an entire board,we simply
sum the mobility of each row, column and diagonal of the
board. Doing that way, a same square may be counted a two
mobilities for a same color, if there is a mobility in
two different directions, but it turned out to be a very
good feature to know in the EARLYGAME and MIDGAME phases.
Maximizing one's mobility means two things: keeping a lot
of different possible moves, thus enlarging the chances
of finding very good moves, and reducing the number of !
! possible moves for the opponent, thus forcing him to
sometimes play at very bad locations (e.g. X or C).
5.
GreedyEvaluate
GreedyEvaluate returns
the difference between WHITE's disks and RED's disks on a
board. It's actually a good feature to have in the
LATEGAME phase, and the only good feature to have in the
ENDGAME phase when the search algorithm actually reaches
the end of the game to get its heuristic value. However,
it is a very bad feature at the beginning of the game,
since it does really not deal with taking good or bad
squares for a further evolution. In the ENDGAME phase, we
are able to search at depth 11 using only this greedy
function, and we remarked that the expected values given
by the GreedyEvaluate
function were very similar to the actual final result.
Comments and
additional procedures
All these features were normalized, having
the same scale [-100,100]. This allowed us to make good
guesses when initializing the initial weights before the
learning phase, thus leading to a satisfying convergence.
When trying our heuristics against other players, we
remarked that since the weights were not emphasizing the
"take the corner" attitude in the
EARLYGAME and MIDGAME phases (we were emphasizing the "don't
take X or C positions" instead, we were
sometimes loosing corners that we could have taken
earlier (i.e. our opponents were on X or C positions, but
be weren't taking the corner...). Thus we implemented a Corner
Killer procedure, which had the double effect of
always taking a corner when it was possible, and saving
computation time for the next searches, since the Corner
Killer procedure was not thinking at all!
We thought that during the EARLYGAME phase, situations
where taking X and C positions would not be frequent, and
thus put little weights on these features in the
EARLYGAME phase. But when playing in the tournament, we
lost some strategic squares because of that. We should
have implemented a procedure penalizing these moves in
the EARLYGAME phase, as we had implemented Corner
Killer to encourage our algorithm to take the
corners when it was possible. That's what we
learnt, but that's not what our learning procedure had
learnt, since the program had never played against such
players before the tournament.
C. Learning and
Training
We use the following learning rule,
referred to as Method 1 in the handout:
At state s (value v(s),
features vector f(s)) we perform a search at
depth d to find the optimal move. v' is the
value we get after the opponent has taken the move we
planned in the search tree. This means the program plays
against a 1-depth search version of itself. The UpdateWeights function implements the classic learning
rule to get v(s) closer to v'.
Achieving the weights convergence
in such a learning process is never granted :
 | we are dealing with a
5-dimensions space, of unknown topology. |
 | we only know the value of a
decision at the end of the game, so a
"bad" program is only going to
reinforce its bad behaviors. The hope is that a
"beginner" is going to reinforce its
good decisions. |
Clearly, this method is prone to
get stuck in a local minimum if we start with random
weights. Here are two possible approaches to avoid this :
 | Brute-force approach
: we take a large number of samples in this space
(n = 100, 1000, 10 000, depending on your network
admin...), test them, and train only the best
one. This means we take n programs with equally
distributed random weights, let them play against
each other, and keep the winner. Chances are that
the winner has 'fairly good' weights. |
 | Our approach : careful
initialization. The idea is to normalize
the features and the weights, so that the
features all vary in the same range and the
weights simply represent the percentage of a
feature in the heuristic evaluation. This is done
by dynamically rescaling the features depending
on their possible values at a point in the game,
and by weights normalization in the learning
process (see part B). Then, we can directly
"translate" our strategy (described in
part A) into numbers, and use them as a starting
point. |
Using this approach, we were able
to achieve weights convergence. The program easily
defeated melloyello and scissorhands, defeated zizou
a few times, depending on the parameters learned but it
proved to be weak when confronted to a very different
strategy (in the tournament, several mobility-based
programs have been completely exterminated in the early
game by greedy programs) or confronted to a very agessive
strategy (our program doesn't try to control corners in
the early game). This proves that a strong and robust
Othello program must also be trained against very
different programs or human-players.
D. Search
Algorithm
The search algorithm is based on
alpha-beta pruning. This technique can decrease the
branching factor from b to SQRT(b) in the best case,
which is when the nodes are sorted in best-first order.
This bound cannot be achieved, but we can come closer to
it by doing a rough sort of the nodes before expanding
them.
We first tried to do a bucket-sort
of the child nodes based on their heuristic value.
The number of expanded nodes drops significantly but the
search is then really expensive
the search takes
twice more time !
We decided not to do a full search,
but just to expand in the first place the best node
according to our heuristic value. This seemed to be
almost as efficient in terms of pruning, and not too
expensive.
E. Time Management
 | Time repartition
|
Time management is a big issue, and is treated by
assigning a certain fraction of the total time to each
phase of the game. We did the repartition of time as
follows:
Number of remaining moves:
#define EARLYGAME 100
#define MIDGAME 75
#define LATEGAME 25
#define ENDGAME 12
Fraction of time allocated to each phase:
#define TIMEEARLYGAME
0.20
#define TIMEMIDGAME 0.45
#define TIMELATEGAME 0.25
#define TIMEENDGAME 0.10
 | Standard phase decisions |
The two first moves are played with a search at depth
5. Searching at depth 5 at this state of the game turned
out to be almost "free" in terms of computation
time on the pup cluster machines, and we did not want to
spend time on useless computations, since the first two
moves are never significant for the final issue. We
compute in each phase an average authorized time to spend
to compute a move. This automatically builds a schedule
for the 100 moves to do. The default search is depth 7
for the first three phases of the game (using alpha-beta
pruning and sorting, see the Search algorithm
part). We use the msg.time_left
variable sent by the server to know how well we are
doing. If we are late on the schedule, then we decide to
search only at depth 6, and it seemed to be well tuned on
the pup cluster machines, since one search at depth 6 was
always sufficient to catch up in time and do a search at
depth 7 the followin! ! g move. The time repartition
among the phases seemed to be reasonable, because in most
parties, we had only one or two switches from depth 7 to
depth 6. TIMEENDGAME is only 10% because even if we are
doing a search at depth 11, theses searches are only for
the first moves in this phase and even at this point,
they are not too big, because as we approach the end of
the game, the branching factor drastically drops and
reaches 1 for the last move.
 | Emergency decisions |
We implemented emergency procedures, in case our
standard phase decisions would not be sufficient to catch
up in time. These emergency decisions would reduce the
depth of the search if we would reach a certain lower
bound in terms of remaining time. These emergency
procedures were never used though, neither in the
training steps nor during the tournament. We usually
finished with 20% of the time left.
F. Sources
- Othello Strategy Guide, VOG Othello
Club, http://www.vog.ru/othello/
- Othello Strategy and Tactics, (articles
authored by Johathan Cerf, David Shaman, (both
have been world champion), Karsten Feldborg,
(1994 world championship finalist), and other
Othello masters.
- The guide to the Game of Othello, http://www.armory.com/~iioa/
- Thomas Wolf. The Anatomy of a Game Program,
http://lglwww.epfl.ch/~wolf/java/html/othello_desc.html
- M. Buro. A strong Learning Othello Program,
NEC Research Institute, Princeton, NJ
|

Wednesday, November 24, 1999
|
|