Motivation

Adversarial search

Competitive environments, in which two or more agents have conflicting goals, give rise to adversarial search problems. The games considered here are:

  • Fully observable
  • Deterministic
  • Discrete
  • Terminal states exist
    • No infinite runs
  • Two-player zero-sum
    • Zero-sum means there is no win-win outcome: what is good for one player is exactly as bad for the other.
  • Turn-taking

Problem Formulation

  • States
  • Initial state
  • Terminal states
  • Actions
  • Transition
  • Utility function

Terminal states are the states in which the game is over.

The utility function defines the final numeric value to a player when the game ends in a terminal state.
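
For instance, a tiny made-up game in which players alternately remove 1 or 2 sticks from a pile, and whoever takes the last stick wins, could be formulated roughly like this (the names expand, is_terminal, utility, MAX and MIN are illustrative, not a fixed API):

MAX, MIN = "MAX", "MIN"

initial_state = (5, MAX)              # (sticks remaining, player to move)

def is_terminal(state):
	sticks, _ = state
	return sticks == 0                # the game is over once no sticks remain

def utility(state):
	# value from MAX's point of view; the player to move at a terminal
	# state did NOT take the last stick, so that player has lost
	_, to_move = state
	return -1 if to_move == MAX else +1

def expand(state):
	# transition model: (action, next_state) pairs
	sticks, to_move = state
	other = MIN if to_move == MAX else MAX
	return [(take, (sticks - take, other)) for take in (1, 2) if take <= sticks]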

Minimax

The expand function outputs a set of (action, next_state) pairs. The function is_terminal returns true if the state is terminal, false otherwise.

def minimax(state):
	v = max_value(state)
	# return an action from expand(state) whose successor has value v
	for action, next_state in expand(state):
		if min_value(next_state) == v:
			return action

def max_value(state):
	if is_terminal(state):
		return utility(state)
	v = float("-inf")
	for action, next_state in expand(state):
		v = max(v, min_value(next_state))
	return v

def min_value(state):
	if is_terminal(state):
		return utility(state)
	v = float("inf")
	for action, next_state in expand(state):
		v = min(v, max_value(next_state))
	return v

The idea of the minimax algorithm is that it assumes the opponent plays optimally and tries to minimise the player’s value.

In max_value, the opponent is assumed to play optimally, so the value of each successor is taken from min_value, and vice versa in min_value, recursing until a terminal state is reached.
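
With the stick-taking sketch from the Problem Formulation section plugged into the functions above, the search can be run directly:

# assumes initial_state, expand, is_terminal and utility from the sketch above
best = minimax(initial_state)      # initial_state = (5, MAX)
print(best)                        # 2: taking two sticks leaves a pile of 3, which loses for MIN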

Analysis

Optimal against an optimal opponent, and complete when the game tree is finite.

Alpha-Beta Pruning

Motivation

There is often no point in exploring the full game tree: branches that cannot affect the final decision can be pruned, so less time is spent evaluating nodes.

Alpha-beta pruning keeps track of two bounds along the current path, essentially:

  • alpha: the highest-value choice found so far at any choice point along the path for MAX
  • beta: the lowest-value choice found so far at any choice point along the path for MIN

def alpha_beta_search(state):
	v = max_value(state, float("-inf"), float("inf"))
	# return an action from expand(state) whose successor has value v
	for action, next_state in expand(state):
		if min_value(next_state, float("-inf"), float("inf")) == v:
			return action

def max_value(state, alpha, beta):
	if is_terminal(state):
		return utility(state)
	v = float("-inf")
	for action, next_state in expand(state):
		v = max(v, min_value(next_state, alpha, beta))
		alpha = max(alpha, v)
		if v >= beta:
			return v
	return v

def min_value(state, alpha, beta):
	if is_terminal(state):
		return utility(state)
	v = float("inf")
	for action, next_state in expand(state):
		v = min(v, max_value(next_state, alpha, beta))
		beta = min(beta, v)
		if v <= alpha:
			return v
	return v
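
To see the pruning on something concrete, a tiny hand-built depth-2 tree can stand in for a game: internal states are lists of children, leaves are plain utilities (the tree and helper definitions below are illustrative, redefining expand, is_terminal and utility for this toy and reusing the max_value above). Recording every leaf that utility touches shows which leaves are never evaluated:

evaluated = []

def is_terminal(state):
	return not isinstance(state, list)        # leaves are plain numbers

def utility(state):
	evaluated.append(state)                   # record each leaf actually evaluated
	return state

def expand(state):
	return list(enumerate(state))             # (child index, child) pairs

root = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(max_value(root, float("-inf"), float("inf")))   # 3
print(evaluated)   # [3, 12, 8, 2, 14, 5, 2] -- the leaves 4 and 6 were pruned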

Analysis

Perfect ordering: if the best moves are examined first at every node, alpha-beta needs to examine only O(b^(m/2)) nodes instead of O(b^m) (b = branching factor, m = depth), effectively doubling the depth that can be searched in the same time.

Note that alpha-beta pruning does not change the eventual solution: it returns the same minimax value as the plain algorithm, only skipping branches that cannot influence the decision.
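
One practical way to move toward that ideal is move ordering: search the most promising successors first so that alpha and beta tighten early. A sketch, assuming the expand from earlier and some cheap scoring heuristic cheap_eval (a made-up name):

def ordered_expand(state, maximizing, cheap_eval):
	# sort successors so the heuristically best moves are searched first,
	# which tightens alpha/beta sooner and increases how much gets pruned
	successors = list(expand(state))
	successors.sort(key=lambda pair: cheap_eval(pair[1]), reverse=maximizing)
	return successors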

Using a Cutoff

To save time, when a good-enough move suffices, a cutoff can be used: the search stops either when it hits a terminal state or when it reaches a specified depth.

This is important in handling large/infinite game trees.

def minimax(state):
	v = max_value(state, 0)
	# return an action from expand(state) whose successor has value v
	for action, next_state in expand(state):
		if min_value(next_state, 1) == v:
			return action

def max_value(state, depth):
	if is_cutoff(state, depth):
		return eval(state)
	v = float("-inf")
	for action, next_state in expand(state):
		v = max(v, min_value(next_state, depth + 1))
	return v

def min_value(state, depth):
	if is_cutoff(state, depth):
		return eval(state)
	v = float("inf")
	for action, next_state in expand(state):
		v = min(v, max_value(next_state, depth + 1))
	return v

Note that the differences between this and regular minimax are that:

  • is_terminal is replaced with is_cutoff
  • utility is replaced with eval
  • a depth counter is threaded through the recursion so is_cutoff can enforce the depth limit
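
A rough sketch of what the is_cutoff hook might look like with a fixed depth budget (the limit and names here are illustrative):

DEPTH_LIMIT = 4                        # illustrative; tuned per game in practice

def is_cutoff(state, depth):
	# stop at genuine terminal states, or once the depth budget is spent
	return is_terminal(state) or depth >= DEPTH_LIMIT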

Evaluation Function

Evaluation function

Used to estimate the “goodness” of a state; a common concrete form is sketched at the end of this section.

There are multiple ways to come up with such a heuristic:

  • relax the game (works in principle, but too costly in practice)
  • expert knowledge
  • learn from data
    • reinforcement learning, self play
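
Whichever route is taken, a common concrete form is a weighted linear sum of features, eval(s) = w1*f1(s) + w2*f2(s) + ...; a sketch with made-up features, weights, and state layout:

def material(state):
	# made-up feature: piece difference, assuming a dict-shaped state
	return state["my_pieces"] - state["their_pieces"]

def mobility(state):
	# made-up feature: difference in number of available legal moves
	return len(state["my_moves"]) - len(state["their_moves"])

WEIGHTS = [(1.0, material), (0.1, mobility)]   # weights from expert knowledge or learned

def eval(state):
	return sum(w * f(state) for w, f in WEIGHTS)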