function (unclassified-examples goals h performance-element)
function (examples goals h performance-element)
Coded examples have goal values (in a single list) followed by attribute values, both in fixed order function (examples attributes goals)
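For illustration only, a minimal sketch of what a coded example might look like under this convention (the attribute names and values here are hypothetical, not taken from the AIMA data files):

;; Goal values in a single list, then attribute values, all in fixed order.
;; Here the single goal is will-wait and the attributes are
;; (patrons hungry raining), purely for illustration.
(defparameter *coded-example* '((yes) full t nil))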
function (example attributes goals)
function (example attributes goals)
function (problem &optional stream depth)
File learning/algorithms/learning-curves.lisp
Functions for testing induction algorithms
Tries to be as generic as possible
Mainly for NN purposes, allows multiple goal attributes
A prediction is correct if it agrees on ALL goal attributes
Arguments: induction-algorithm maps examples -> hypothesis; performance-element maps hypothesis + example -> prediction.
function (induction-algorithm performance-element examples attributes goals trials training-size-increment &optional error-fn)
this version uses incremental data sets rather than a new batch each time
function (induction-algorithm performance-element examples attributes goals trials training-size-increment &optional error-fn)
function (h performance-element test-set goals &optional error-fn)
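To make the flow concrete, here is a minimal sketch of a learning-curve experiment using such an accuracy measure: for each training-set size, induce a hypothesis from a prefix of the examples and test it on the remainder, counting a prediction as correct only if it agrees on all goal values. The names sketch-learning-curve and sketch-accuracy and the example representation (goal list first, per the coded-example convention above) are assumptions for illustration, not the actual AIMA definitions.

;; Hedged sketch, not the AIMA learning-curve code.
(defun sketch-learning-curve (induction-algorithm performance-element
                              examples attributes goals increment)
  (loop for n from increment to (length examples) by increment
        for training = (subseq examples 0 n)      ; incremental training set
        for test     = (subseq examples n)        ; remaining examples for testing
        for h        = (funcall induction-algorithm training attributes goals)
        collect (cons n (sketch-accuracy h performance-element test goals))))

(defun sketch-accuracy (h performance-element test-set goals)
  ;; Fraction of test examples whose prediction agrees with ALL goal values,
  ;; assuming the goal values are stored as the first element of each example.
  (declare (ignore goals))
  (if (null test-set)
      1.0
      (/ (count-if #'(lambda (example)
                       (equal (funcall performance-element h example)
                              (first example)))
                   test-set)
         (length test-set))))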
File learning/algorithms/dtl.lisp
decision tree learning algorithm - the standard "induction algorithm"
returns a tree in the format
(a1 (v11 . subtree) (v12 . subtree) ...), i.e. the attribute tested at the root followed by one (value . subtree) pair per attribute value
function (examples attributes goal &optional prior)
function (examples attributes goal)
dtpredict is the standard "performance element" that interfaces with the example-generation and learning-curve functions function (dt example)
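As an illustration of how the tree format above might be consumed, here is a minimal prediction sketch. The representation details (the attribute stored as an index into the example's attribute values, leaves as bare classifications) and the name sketch-dtpredict are assumptions for illustration, not the actual dtpredict.

(defun sketch-dtpredict (tree example)
  ;; A leaf is a bare classification; an interior node is
  ;; (attribute-index (value1 . subtree1) (value2 . subtree2) ...).
  (if (atom tree)
      tree
      (let* ((value  (nth (first tree) example))        ; value of tested attribute
             (branch (assoc value (rest tree) :test #'equal)))
        (if branch
            (sketch-dtpredict (cdr branch) example)
            'unknown))))                                 ; no branch for this value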
File learning/algorithms/dll.lisp
decision list learning algorithm (Rivest)
returns a decision list, each element of which is
a test of the form (x . term), where each term is
of the form ((a1 . v1) (a2 . v2) ... (an . vn)).
The last element is the test (0).
only works for purely boolean attributes.
function (k
problem)
function (k examples attributes goal)
select-test finds a test of size at most k that picks out a set of examples with uniform classification. Returns test and subset. function (k examples attributes goal)
function (k examples attributes goal test-attributes)
dlpredict is the standard "performance element" that interfaces with the example-generation and learning-curve functions function (dl example)
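A minimal sketch of how such a decision list might be applied to an example; the example representation (a list of boolean attribute values indexed from 0) and the name sketch-dlpredict are assumptions for illustration, not the actual dlpredict.

(defun sketch-dlpredict (dl example)
  ;; Each element is (x . term); return x for the first term whose
  ;; (attribute . value) pairs all hold of the example.  The final
  ;; default test (0) has an empty term, so it always matches.
  (dolist (element dl)
    (let ((output (first element))
          (term   (rest element)))
      (when (every #'(lambda (pair)
                       (eql (nth (car pair) example) (cdr pair)))
                   term)
        (return-from sketch-dlpredict output)))))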
File learning/algorithms/nn.lisp
Code for layered feed-forward networks
Network is represented as a list of lists of units.
Inputs assumed to be the ordered attribute values in examples
Every unit gets input 0 set to -1
type (parents   - sequence of indices of units in previous layer
      children  - sequence of indices of units in subsequent layer
      weights   - weights on links from parents
      g         - activation function
      dg        - activation gradient function g' (if it exists)
      a         - activation level
      in        - total weighted input
      gradient  - g'(in_i))
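A hedged defstruct sketch of a unit with these slots (the slot defaults, including the sigmoid used for g, are illustrative assumptions rather than the AIMA definitions):

(defstruct sketch-unit
  parents                       ; indices of units in the previous layer
  children                      ; indices of units in the subsequent layer
  weights                       ; weights on links from parents
  (g #'(lambda (in) (/ 1.0 (+ 1.0 (exp (- in))))))  ; activation function (sigmoid here)
  dg                            ; activation gradient g', if it exists
  a                             ; activation level
  in                            ; total weighted input
  gradient)                     ; g'(in_i)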
make-connected-nn returns a multi-layer network with layers given by sizes function (sizes &optional previous g dg)
nn-learning establishes the basic epoch structure for updating, calling the desired updating mechanism to improve the network until either all examples are correct or it runs out of epochs function (problem network learning-method &key tolerance limit)
nn-output is the standard "performance element" for neural networks and interfaces to example-generating and learning-curve functions. Since performance elements are required to take only two arguments (hypothesis and example), nn-output is used in an appropriate lambda-expression function (network unclassified-example attributes goals)
unit-output computes the output of a unit given a set of inputs; it always adds a bias input of -1 as the zeroth input function (inputs unit)
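A minimal sketch of that computation, using the hypothetical sketch-unit structure above rather than the actual unit-output: prepend the fixed -1 bias input, form the weighted sum, and pass it through the activation function.

(defun sketch-unit-output (inputs unit)
  (let ((in (reduce #'+ (mapcar #'* (sketch-unit-weights unit)
                                (cons -1 inputs)))))   ; bias input of -1 at index 0
    (setf (sketch-unit-in unit) in
          (sketch-unit-a unit)  (funcall (sketch-unit-g unit) in))))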
print-nn prints out the network relatively prettily function (network)
File learning/algorithms/perceptron.lisp
perceptron learning - single-layer neural networks
make-perceptron returns a one-layer network with m units, n inputs each function (n m &optional g)
perceptron-learning is the standard "induction algorithm" and interfaces to the learning-curve functions function (problem)
Perceptron updating - simple version without lower bound on delta. Hertz, Krogh, and Palmer, eq. 5.19 (p. 97). function (perceptron actual-inputs predicted target &optional learning-rate)
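A minimal sketch of that update rule, in the spirit of the equation cited: each weight moves by learning-rate * (target - predicted) * input, with the bias input fixed at -1. Representing a single unit's weights as a plain list is an assumption for illustration, not the AIMA representation.

(defun sketch-perceptron-update (weights inputs predicted target
                                 &optional (learning-rate 0.1))
  ;; Returns the updated weight list for one unit.
  (let ((err (- target predicted)))
    (mapcar #'(lambda (w x) (+ w (* learning-rate err x)))
            weights
            (cons -1 inputs))))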
File learning/algorithms/multilayer.lisp
back-propagation learning - multi-layer neural networks
backprop-learning is the standard "induction algorithm" and interfaces to the learning-curve functions function (problem &optional hidden)
Backprop updating - Hertz, Krogh, and Palmer, p.117 function (network actual-inputs predicted target &optional learning-rate)
Arguments: rnetwork is the network in reverse order; inputs are the inputs to the network; deltas are the "errors" for the current layer.
function (rnetwork inputs deltas learning-rate)
function (layer all-inputs deltas learning-rate)
compute-deltas propagates the deltas back from layer i to layer j. Pretty ugly, partly because the weights Wji are stored only at layer i. function (jlayer ilayer ideltas)
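A minimal sketch of that delta propagation, assuming fully connected layers and the hypothetical sketch-unit structure above (so W_ji sits at position j+1 of unit i's weight vector, position 0 being the bias weight); an illustration, not the actual compute-deltas.

(defun sketch-compute-deltas (jlayer ilayer ideltas)
  ;; delta_j = g'(in_j) * sum_i W_ji * delta_i
  (loop for junit in jlayer
        for j from 0
        collect (* (sketch-unit-gradient junit)
                   (loop for iunit in ilayer
                         for idelta in ideltas
                         sum (* (nth (1+ j) (sketch-unit-weights iunit))
                                idelta)))))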
File learning/algorithms/q-iteration.lisp
Data structures and algorithms for calculating the Q-table for an
MDP. Q(a,i) is the value of doing action a in state i.
function (q a i)
Given an MDP, determine the q-values of the states. Q-iteration iterates on the Q-values instead of the U-values. Basic equation is Q(a,i) <- R(i) + sum_j M(a,i,j) max_a' Q(a',j), where Q(a',j) MUST be the old value, not the new. function (mdp &optional qold &key epsilon)
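A minimal sketch of one sweep of that update. Representing the Q tables as hash tables keyed by (action . state), and passing the reward and transition model in as functions, are assumptions made for illustration, not the AIMA data structures.

(defun sketch-q-iteration-step (qold states actions reward-fn transition-fn)
  ;; reward-fn: state -> R(i)
  ;; transition-fn: action, state -> list of (next-state . probability)
  (let ((qnew (make-hash-table :test #'equal)))
    (dolist (i states qnew)
      (dolist (a actions)
        (setf (gethash (cons a i) qnew)
              (+ (funcall reward-fn i)
                 (loop for (j . p) in (funcall transition-fn a i)
                       sum (* p (loop for a2 in actions
                                      maximize (gethash (cons a2 j) qold 0))))))))))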
Compute optimal policy from Q table function (q)
Choice functions select an action under specific circumstances.
Pick a random action function (s q)
Pick the currently best action function (s q)
Pick the currently best action with tie-breaking function (s q)
File learning/domains/restaurant-multivalued.lisp
Restaurant example from chapter 18, encoded
using multivalued input attributes suitable for
decision-tree learning.
variable
File learning/domains/restaurant-real.lisp
Restaurant example from chapter 18, encoded
using real-valued input attributes suitable for
decision-tree learning or neural network learning.
variable
File learning/domains/restaurant-boolean.lisp
Restaurant learning problem encoded using boolean attributes only,
as appropriate for decision-list learning.
Target is encoded as a decision list.
variable
File learning/domains/majority-boolean.lisp
variable
File learning/domains/ex-19-4-boolean.lisp
Inductive learning example from exercise 19.4
variable
File learning/domains/and-boolean.lisp
Data for Boolean AND function
variable
File learning/domains/xor-boolean.lisp
Data for Boolean XOR function
variable
File learning/domains/4x3-passive-mdp.lisp
Passive, stochastic 4x3 environment for chapter 20.
Just one possible action (no-op); the agent arrives at a uniformly chosen neighbouring square.
variable
File learning/agents/passive-lms-learner.lisp
Passive LMS learning agent.
When a given training sequence terminates,
update the utility of each state visited in the sequence
to reflect the rewards received from then on.
function ()
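A minimal sketch of that reward-to-go bookkeeping: walk the finished sequence backwards accumulating the reward received from each state onward, and fold each sample into a running average of that state's utility. The hash tables for utilities and visit counts, and the (state . reward) sequence representation, are illustrative assumptions rather than the AIMA agent's internals.

(defun sketch-lms-update (state-reward-sequence utilities counts)
  ;; state-reward-sequence: (state . reward) pairs in chronological order.
  (let ((reward-to-go 0))
    (dolist (pair (reverse state-reward-sequence) utilities)
      (let ((state (car pair)))
        (incf reward-to-go (cdr pair))
        (incf (gethash state counts 0))
        (setf (gethash state utilities)
              (+ (gethash state utilities 0)
                 (/ (- reward-to-go (gethash state utilities 0))
                    (gethash state counts))))))))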
File learning/agents/passive-adp-learner.lisp
Reinforcement learning agent that uses dynamic
programming to solve the Markov process
that it learns from its experience. Thus, the
main job is to update the model over time.
Being a passive agent, it simply does no-op
at each step, watching the world go by.
function ()
Updating the transition model according to observed transition i->j. Fairly tedious because of initializing new transition records. Arguments: j is the current state (destination of the transition); percepts are in reverse chronological order; m is the transition model, indexed by state.
function (j percepts m)
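A minimal sketch of the counting involved, with the model represented as a hash table mapping a state to a hash table of successor counts; this representation and the name sketch-update-model are assumptions for illustration, not the AIMA transition records.

(defun sketch-update-model (i j model)
  ;; Record observed transition i -> j and return the resulting estimate
  ;; of M(i,j) = count(i,j) / total transitions observed out of i.
  (let ((row (or (gethash i model)
                 (setf (gethash i model) (make-hash-table :test #'equal)))))
    (incf (gethash j row 0))
    (/ (gethash j row)
       (loop for c being the hash-values of row sum c))))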
(passive-policy M) makes a policy of no-ops for use in value determination function (m)
File learning/agents/passive-td-learner.lisp
Passive temporal-difference learning agent.
After each transition, update the utility of the
source state i to make it agree more closely with that
of the destination state j.
variable
initial learning rate parameter function ()
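A minimal sketch of that update for a transition i -> j with reward r, using a hash table of utilities and a fixed learning rate; both choices are illustrative assumptions, not the AIMA agent's code.

(defun sketch-td-update (utilities i j r &optional (alpha 0.1))
  ;; U(i) <- U(i) + alpha * (r + U(j) - U(i))
  (let ((ui (gethash i utilities 0))
        (uj (gethash j utilities 0)))
    (setf (gethash i utilities)
          (+ ui (* alpha (- (+ r uj) ui))))))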
File learning/agents/active-adp-learner.lisp
Reinforcement learning agent that uses dynamic
programming to solve the Markov decision process
that it learns from its experience. Thus, the
main job is to update the model over time.
function (actions)
function (actions choice-function)
Update current model to reflect the evidence from the most recent action. Arguments: mdp is the current description of the environment; percepts are in reverse chronological order; action is the last action taken.
function (mdp percepts action)
File learning/agents/active-qi-learner.lisp
Reinforcement learning agent that uses dynamic
programming to solve the Markov decision process
that it learns from its experience. Thus, the
main job is to update the model over time.
function (actions)
function (actions choice-function)
File learning/agents/exploring-adp-learner.lisp
Reinforcement learning agent that uses dynamic
programming to solve the Markov decision process
that it learns from its experience. Thus, the
main job is to update the model over time.
Unlike the active-adp-learner, this agent uses
an "intelligent" exploration policy to make sure it
explores the state space reasonably quickly.
variable
Given an environment model M, determine the values of states U. Use value iteration, with initial values given by U itself. Basic equation is U(i) <- r(i) + max_a f(sum_j M(a,i,j) U(j), N(a,i)), where f is the exploration function. Does not apply to terminal states. function (mdp &optional uold &key epsilon)
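A minimal sketch of one common choice for the exploration function f(u, n): stay optimistic (return a large reward estimate) until the state-action pair has been tried enough times, then use the utility estimate. The constants here are illustrative, not the values used by this agent.

(defun sketch-exploration-fn (u n &optional (r-plus 2.0) (n-e 5))
  ;; Optimistic estimate until tried at least n-e times, then the real estimate.
  (if (< n n-e) r-plus u))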
File learning/agents/exploring-tdq-learner.lisp
Exploratory reinforcement learning agent using temporal differences.
Works without a model, using stochastic sampling to
mirror the effect of averaging with a model.
function (actions)
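A minimal sketch of the model-free TD Q-update after doing action a in state i, receiving reward r, and arriving in state j; the hash-table Q representation keyed by (action . state) and the fixed alpha are assumptions for illustration, not the AIMA agent's code.

(defun sketch-tdq-update (q a i j r actions &optional (alpha 0.1))
  ;; Q(a,i) <- Q(a,i) + alpha * (r + max_a' Q(a',j) - Q(a,i))
  (let ((old  (gethash (cons a i) q 0))
        (best (loop for a2 in actions
                    maximize (gethash (cons a2 j) q 0))))
    (setf (gethash (cons a i) q)
          (+ old (* alpha (- (+ r best) old))))))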