make-mdp-agent function (&key body name mdp program algorithm)
An MDP agent constructs a policy from the MDP once, then uses that policy repeatedly to choose actions. The ALGORITHM keyword specifies the algorithm used to create the policy; don't confuse it with the PROGRAM keyword, which determines what actions to take.
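For illustration, a hypothetical call might look like this; *4x3-mdp* and value-iteration-policy are defined below, and passing the algorithm as a function object is an assumption here:

    ;; A sketch, not a documented invocation: whether :ALGORITHM expects a
    ;; function object or a symbol is an assumption.
    (make-mdp-agent :name 'vi-agent
                    :mdp *4x3-mdp*
                    :algorithm #'value-iteration-policy)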
mdp-agent type (total-reward)
mdp type (initial-state model rewards terminal-states hash-key name)
mdp-action-model type (transitions times-executed)
transition type (destination probability times-achieved)
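As an illustration, one outcome distribution might be built with the defstruct-style constructors; representing states as (x y) coordinate lists, as in the 4x3 world, is an assumption:

    ;; A sketch of the intended 0.8/0.1/0.1 motion model of the 4x3 world,
    ;; assuming states are (x y) coordinate lists.
    (make-mdp-action-model
     :transitions (list (make-transition :destination '(2 1) :probability 0.8)
                        (make-transition :destination '(1 2) :probability 0.1)
                        (make-transition :destination '(1 1) :probability 0.1)))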
action-model function (a s m)
transitions function (a s m)
Returns the transitions resulting from executing action a in state s according to model M.
actions function (s m)
Returns the list of actions feasible in state s according to model M.
*4x3-mdp* variable
*4x3-m-data* variable
*4x3-r-data* variable
mdp-environment type (mdp epochs-left)
An MDP-environment is driven by an MDP (Markov Decision Process), which (probabilistically) says what state to transition to for each action. To make an MDP into an environment, we basically just keep track of the current state and ask the MDP's model to determine the new state; this makes sense for the case of a single agent in the environment.
mdp-percept type (state reward terminalp)
A percept gives the current state, the reward received, and whether it is a terminal state.
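A sketch of an agent program that reads such percepts and follows a fixed policy table; representing the policy as a hash table from states to actions is an assumption:

    ;; A sketch: look the state up in the policy, doing nothing on a
    ;; terminal percept.  MDP-PERCEPT-STATE and MDP-PERCEPT-TERMINALP are
    ;; the defstruct accessors for the slots above.
    (defun make-policy-program (policy)
      #'(lambda (percept)
          (unless (mdp-percept-terminalp percept)
            (gethash (mdp-percept-state percept) policy))))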
initialize method ((env mdp-environment))
get-percept method ((env mdp-environment) agent)
The percept is the current state, the reward, and whether this is terminal.
update-fn method ((env mdp-environment))
We update by transitioning to a new state. When we hit a terminal state, we restart in the initial state (until there are no more epochs left).
performance-measure method ((env mdp-environment) agent)
Return a number saying how well this agent is doing.
termination? method ((env mdp-environment))
mdp-next-state function (action state mdp)
mdp-transitions function (action state-model)
random-transition function (transitions)
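Presumably random-transition samples a transition in proportion to its probability; a minimal sketch of that roulette-wheel selection (the name SAMPLE-TRANSITION is hypothetical):

    ;; A sketch: draw r in [0,1) and walk down the list subtracting
    ;; probabilities; fall back to the last transition if floating-point
    ;; round-off exhausts the loop.
    (defun sample-transition (transitions)
      (let ((r (random 1.0)))
        (dolist (tr transitions (first (last transitions)))
          (decf r (transition-probability tr))
          (when (<= r 0) (return tr)))))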
value-iteration-policy function (mdp)
Given an environment model M, value iteration determines the values of the states, U. The basic equation is U(i) <- r(i) + max_a sum_j M(a,i,j) U(j), where U(j) MUST be the old value, not the new.
value-iteration function (mdp &optional uold &key epsilon)
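A minimal sketch of one synchronous sweep of the update above, assuming U and R are hash tables keyed by state and using the ACTIONS and TRANSITIONS accessors; this is not the AIMA implementation itself:

    ;; One sweep of U(i) <- r(i) + max_a sum_j M(a,i,j) U(j), reading only
    ;; the old U and writing a fresh table.
    (defun value-iteration-sweep (u m r)
      (let ((unew (make-hash-table :test #'equal)))
        (maphash
         #'(lambda (s old) (declare (ignore old))
            (let ((as (actions s m)))
              (setf (gethash s unew)
                    (+ (gethash s r 0)
                       (if (null as)
                           0  ; terminal or sink state: reward only
                           (loop for a in as maximize
                                 (loop for tr in (transitions a s m)
                                       sum (* (transition-probability tr)
                                              (gethash (transition-destination tr) u 0)))))))))
         u)
        unew))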
A state is a sink if there are no actions that can lead to another state. Sinks can arise by accident during reinforcement learning of an environment model. Because they cause infinite loops, they must be detected.
sink? function (s m)
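A sketch of the test, named SINK-P here to avoid shadowing the documented function; how states with no actions at all are treated is an assumption:

    ;; A sketch: S is a sink if every transition of every action leads
    ;; back to S itself.
    (defun sink-p (s m)
      (loop for a in (actions s m)
            always (loop for tr in (transitions a s m)
                         always (equal (transition-destination tr) s))))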
Given an initial policy P and initial utilities U, calculate the optimal policy. Do this by value determination alternating with policy update.
policy-iteration function (mdp &optional u)
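A minimal sketch of that alternation, using VALUE-DETERMINATION and OPTIMAL-POLICY below; here M and R stand for the mdp's model and rewards, and comparing policy tables with EQUALP is an assumption:

    ;; A sketch of the policy-iteration loop: evaluate the current policy,
    ;; then greedily improve it, until the policy stops changing.
    (defun policy-iteration-sketch (p u m r)
      (loop
        (setf u (value-determination p u m r))
        (let ((new-p (optimal-policy u m r)))
          (when (equalp new-p p)
            (return (values p u)))
          (setf p new-p))))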
Given a fixed policy and a model, calculate the value of each state. This version does it by an iterative process similar to value iteration. The basic equation is U(i) <- r(i) + sum_j M(P(i),i,j) U(j), where U(j) MUST be the old value, not the new. A better alternative is to set up the value equations and solve them using matrix methods.
value-determination function (p uold m r &key epsilon)
Compute optimal policy given U and M
optimal-policy function (u m r)
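A sketch of the greedy extraction, assuming U is a hash table over states and using Q-VALUE below:

    ;; A sketch: in each state pick an action maximizing Q(a,s).
    (defun greedy-policy (u m r)
      (let ((p (make-hash-table :test #'equal)))
        (maphash
         #'(lambda (s val) (declare (ignore val))
            (let ((best nil) (best-q nil))
              (dolist (a (actions s m))
                (let ((q (q-value a s u m r)))
                  (when (or (null best-q) (> q best-q))
                    (setf best a best-q q))))
              (setf (gethash s p) best)))
         u)
        p))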
The following functions select actions in particular states.
policy-choice function (state p)
Pick a random action
random-choice function (state u m r)
Pick the currently best action, breaking ties at random
max-choice function (state u m r)
Simply pick a currently best action deterministically
dmax-choice function (state u m r)
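For illustration, MAX-CHOICE-style selection with random tie-breaking might look like this; exact-equality tie detection and a state with at least one action are assumptions:

    ;; A sketch: collect all actions whose Q-value equals the maximum,
    ;; then pick one of them at random.
    (defun max-choice-sketch (state u m r)
      (let* ((as (actions state m))
             (qs (mapcar #'(lambda (a) (q-value a state u m r)) as))
             (best (reduce #'max qs))
             (ties (loop for a in as
                         for q in qs
                         when (= q best) collect a)))
        (nth (random (length ties)) ties)))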
Q(a,s) is the value of doing a in s, calculated by averaging over the utilities of the possible outcomes. Used in several update equations.
q-value function (action state u m r)
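A sketch of that average; whether r(s) is folded into Q here or added by the caller is an assumption, so the R argument is omitted:

    ;; A sketch: Q as the expectation of U over the outcomes of ACTION,
    ;; with U a hash table keyed by state.
    (defun q-value-sketch (action state u m)
      (loop for tr in (transitions action state m)
            sum (* (transition-probability tr)
                   (gethash (transition-destination tr) u 0))))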
*policy-fn* variable
The policy used by the agent in acting.
*correct-u* variable
*correct-m* variable
*correct-r* variable
u-rms-error function (u1 u2)
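A sketch of the root-mean-square error between two utility tables, assuming both are hash tables over the same states:

    ;; A sketch: sqrt of the mean squared difference over all states in U1.
    (defun u-rms-error-sketch (u1 u2)
      (let ((sum 0.0) (n 0))
        (maphash #'(lambda (s v1)
                     (let ((d (- v1 (gethash s u2 0))))
                       (incf sum (* d d))
                       (incf n)))
                 u1)
        (sqrt (/ sum (max n 1)))))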
The policy loss of a utility function U for an MDP is defined as the difference in utility between the corresponding policy and the optimal policy, for the agent's current state. Calculate it using value determination with respect to the current policy.
loss function (mdp u)
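A sketch of that calculation against the true model and rewards, assuming *correct-u*, *correct-m*, and *correct-r* hold the correct values and that MDP-INITIAL-STATE is the defstruct accessor for the mdp's initial-state slot:

    ;; A sketch: utility of the optimal policy minus utility of the policy
    ;; greedy in U, both evaluated at the mdp's initial state.
    (defun loss-sketch (mdp u)
      (let* ((p  (optimal-policy u *correct-m* *correct-r*))
             (up (value-determination p u *correct-m* *correct-r*)))
        (- (gethash (mdp-initial-state mdp) *correct-u* 0)
           (gethash (mdp-initial-state mdp) up 0))))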