MARBLES AND MEMORY: TWO URN MODELS OF LEARNING

Phenomenon to be explained: When people are instructed to learn paired associates, the proportion correct increases as a monotonic, negatively accelerated function of the number of repetitions (reinforcements of stimulus and response) of the pair of items. The exact form of this relationship is called the empirical learning curve.

Notation: Cn is the event "Correct on trial n". The probability correct on trial n will be denoted P(Cn) or Pn.

On the first trial, there have been no reinforcements; therefore

P(C1) = G, where G is the guessing rate.

Sometimes G is known from the design of the study (e.g., with two answers and no prior training, G = .5). Other times, P(C1) will represent the initial state of knowledge.

Urn Analogy: Each marble in the urn represents an association from the stimulus to a response. Perhaps each marble represents a neural link. Each marble is either White (for wrong) or Red (for right), and the marbles are assumed to be randomly mixed between trials. On each trial, a marble is chosen at random, as though the subject selects one association on each trial.

If the marble is Red, the response is right (correct).

If the marble is White, it is wrong.

Let Rn = the number of red marbles on trial n.

Wn = the number of white marbles on trial n.

Tn = Rn + Wn = the total number of marbles on trial n.

P(Cn) = (theoretical) probability of a correct response on trial n

P(Cn) = Rn/Tn

The probability of being correct on trial n corresponds to the probability of selecting a red marble on trial n.
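The correspondence between drawing a red marble and responding correctly can be checked with a small simulation. This is only an illustrative sketch: the function name `simulate_proportion_correct`, the counts (3 red, 7 white), the number of draws, and the seed are all hypothetical choices, not part of the handout.

```python
import random

def simulate_proportion_correct(red, white, draws=100_000, seed=0):
    """Estimate P(C) by drawing one marble at random, many times over."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = red + white
    # Number the marbles 0..total-1 and treat the first `red` of them as red.
    hits = sum(1 for _ in range(draws) if rng.randrange(total) < red)
    return hits / draws

theoretical = 3 / (3 + 7)  # Rn/Tn for the hypothetical urn
estimate = simulate_proportion_correct(3, 7)
print(f"theoretical = {theoretical:.3f}, simulated = {estimate:.3f}")
```

With many draws, the simulated proportion should fall close to Rn/Tn.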

In this analogy, learning corresponds to changing the proportion of Red marbles in the urn. There are two simple ways to change the proportion of red marbles in the urn:

(1) Add red marbles on each trial. (Accumulation)

(2) Replace marbles in the urn with red marbles. (Replacement)

In this handout, we will assume that every reinforcement is equally effective and that the subject would eventually learn perfectly. Therefore, the asymptote is 1.0 for the probability correct; as n gets large, P(Cn) approaches 1.0. Later, we will consider a more general case where this asymptote, a, need not be 1.0.

Accumulation Theory:

According to accumulation theory, new associations are added, but old associations (like "bad habits") are never wiped out. The total number of marbles increases on each trial.

Let R1 = the number of red marbles on trial 1

W1 = the number of white marbles on trial 1

Key Assumption: For each reinforcement, k red marbles are added to the urn. Therefore,

Rn = R1 + (n - 1)k

Wn = W1

(Note that on trial n, there have been n - 1 reinforcements.) Therefore,

P(Cn) = [R1+(n - 1)k]/[R1 +(n - 1)k + W1]

Dividing both numerator and denominator by R1 + W1, using the fact that

G = R1/(R1 + W1), and letting Q = k/(R1 + W1), we have:

P(Cn) = [G + (n - 1)Q]/[1 + (n - 1)Q]      (1)

Write a computer program to calculate this curve for n = 1 to 10, for any G and Q.
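One possible solution is sketched below in Python. The function name `accumulation_curve` and the example values G = 0.5 and Q = 0.2 are assumptions made for illustration; any G between 0 and 1 and any positive Q can be substituted.

```python
def accumulation_curve(G, Q, n_trials=10):
    """Return [P(C1), ..., P(C_n_trials)] computed from Equation 1."""
    curve = []
    for n in range(1, n_trials + 1):
        reinforcements = n - 1  # trial n follows n - 1 reinforcements
        curve.append((G + reinforcements * Q) / (1 + reinforcements * Q))
    return curve

# Example values (hypothetical): G = 0.5, Q = 0.2
for n, p in enumerate(accumulation_curve(G=0.5, Q=0.2), start=1):
    print(f"trial {n:2d}: P(C) = {p:.4f}")
```

The first value printed equals G, and the values rise toward 1.0 by smaller and smaller steps, as the learning curve requires.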

Replacement Theory:

According to this theory, the total number of associations remains constant, and wrong associations can be replaced by correct ones. The total number of marbles, T, stays the same from trial to trial.

Key Assumption: For each reinforcement, k marbles are randomly sampled and replaced with k red marbles. Because the marbles are randomly sampled, the replaced marbles contain some red and some white; on average, the proportion of red among them equals the proportion of red in the urn. The number of red marbles on trial n is therefore equal to the previous number, minus the (expected) number of red marbles taken out, plus k red:

R_n = R_{n-1} - k(R_{n-1}/T) + k

Therefore, dividing by T (so that each number of red marbles becomes a probability correct) and letting Q = k/T:

P(C_n) = P(C_{n-1}) + Q[1 - P(C_{n-1})]      (2)

Write a program to calculate predictions for replacement theory. Remember, P(C1) = G.
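A minimal Python sketch of such a program follows. It applies the recursion of Equation 2 once per reinforcement, starting from P(C1) = G. The function name `replacement_curve` and the example values G = 0.5 and Q = 0.2 are assumptions chosen for illustration.

```python
def replacement_curve(G, Q, n_trials=10):
    """Return [P(C1), ..., P(C_n_trials)] by iterating Equation 2."""
    p = G            # P(C1) = G: no reinforcements yet
    curve = [p]
    for _ in range(n_trials - 1):
        # Each reinforcement closes a fraction Q of the remaining gap to 1.0.
        p = p + Q * (1 - p)
        curve.append(p)
    return curve

# Example values (hypothetical): G = 0.5, Q = 0.2
for n, p in enumerate(replacement_curve(G=0.5, Q=0.2), start=1):
    print(f"trial {n:2d}: P(C) = {p:.4f}")
```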

It can be shown (by induction) that Equation 2 is equivalent to the following:

P(C_n) = 1 - (1 - G)(1 - Q)^{n-1}       (3)
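The induction argument can be sketched as follows. It works with the probability of an error, 1 - P(C_n), which the recursion multiplies by (1 - Q) on each trial. This is one way to arrange the proof, not necessarily the one intended in the course.

```latex
% Base case (n = 1): the closed form reduces to the guessing rate.
P(C_1) = 1 - (1 - G)(1 - Q)^{0} = G

% Rewrite the recursion (Equation 2) in terms of the error probability:
1 - P(C_n) = 1 - P(C_{n-1}) - Q\,[1 - P(C_{n-1})]
           = (1 - Q)\,[1 - P(C_{n-1})]

% Inductive step: assume 1 - P(C_{n-1}) = (1 - G)(1 - Q)^{n-2}. Then
1 - P(C_n) = (1 - Q)(1 - G)(1 - Q)^{n-2} = (1 - G)(1 - Q)^{n-1}

% Hence
P(C_n) = 1 - (1 - G)(1 - Q)^{n-1}
```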

Write a program to calculate the learning curves according to both Equations 2 and 3 (the results will be the same).
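A sketch of one such program is given below. It computes the curve both ways and prints the two columns side by side. The function names and the example values G = 0.5 and Q = 0.2 are assumptions for illustration. Because of floating-point rounding, the two results are compared with `math.isclose` rather than exact equality.

```python
import math

def replacement_recursive(G, Q, n_trials=10):
    """Iterate the recursion (Equation 2), starting from P(C1) = G."""
    p = G
    curve = [p]
    for _ in range(n_trials - 1):
        p = p + Q * (1 - p)
        curve.append(p)
    return curve

def replacement_closed_form(G, Q, n_trials=10):
    """Evaluate the closed form (Equation 3): 1 - (1 - G)(1 - Q)^(n - 1)."""
    return [1 - (1 - G) * (1 - Q) ** (n - 1) for n in range(1, n_trials + 1)]

# Example values (hypothetical): G = 0.5, Q = 0.2
G, Q = 0.5, 0.2
pairs = zip(replacement_recursive(G, Q), replacement_closed_form(G, Q))
for n, (p2, p3) in enumerate(pairs, start=1):
    agree = math.isclose(p2, p3)
    print(f"trial {n:2d}: Eq. 2 = {p2:.6f}   Eq. 3 = {p3:.6f}   agree: {agree}")
```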